
    Identifying the topic-specific influential users in Twitter

    Social influence can be described as the ability to have an effect on the thoughts or actions of others. Influential members of online communities are becoming the new media for marketing products and swaying opinions, and their guidance and recommendations can save others search time and assist their decision making. The objective of this research is to detect the influential users on a specific topic on Twitter. More specifically, from a collection of tweets matching a specified query, we want to detect the influential users in an online fashion. To address this objective, we first focus our search on individuals who write in their personal accounts, so we investigate how to differentiate between personal and non-personal accounts. Secondly, we investigate which set of features can best lead us to the topic-specific influential users, and how these features can be combined in a model to produce a ranked list of influential users. Finally, we look into the use of language and whether it can serve as a supporting feature for detecting an author's influence.
To decide how to differentiate between personal and non-personal accounts, we compared the effectiveness of an SVM classifier against a manually assembled list of non-personal accounts. To decide on the features that can best lead us to the influential users, we ran experiments on a set of features inspired by the literature. Two ranking methods were then developed, using feature combinations, to identify candidate influential users. For evaluation we manually examined the users, looking at their tweets and profile pages, in order to decide on their influence. To address our final objective, we ran experiments to investigate whether a statistical language model (SLM) could be used to identify influential users' tweets.
For classifying user accounts into personal and non-personal, the SVM was found to be domain independent, reliable and consistent, with a precision of over 0.9. The results showed that the list's performance deteriorates over time, and when the domain of the test data was changed, the SVM performed better than the list, with higher precision and specificity values. We extracted eight independent features from a set of 12, ran experiments on these eight, and found the best features for identifying influential users to be the Followers count, the Average Retweets count, the Average Retweet Frequency and the Age_Activity combination. Two ranking methods were developed and tested on a set of tweets retrieved using a specific query. In the first method, these best four features were combined in different ways; the best combination was the one that took the average of the Followers count and the Average Retweets count, producing a precision at 10 of 0.9. In the second method, the users were ranked according to each of the eight independent features and the top 50 users for each were included in separate lists; the users were then ranked according to their appearance frequency in these lists. The best result was obtained when we considered the users who appeared in six or more of the lists, which resulted in a precision of 1.0. Both ranking methods were then run on 20 different collections of retrieved tweets to verify their effectiveness in detecting influential users and to compare their performance. The best result was obtained by the second method, for the set of users who appeared in six or more of the lists, with the highest mean precision of 0.692. Finally, for the SLM, we found a correlation between the users' average Retweets counts and their tweets' perplexity values, which supports the hypothesis that an SLM can be trained to detect highly retweeted tweets. However, using perplexity to identify influential users resulted in very low precision values.
The contributions of this thesis can be summarized as follows. A method to classify personal accounts was proposed. The features that help detect influential users were identified as the Followers count, the Average Retweets count, the Average Retweet Frequency and the Age_Activity combination. Two methods for identifying influential users were proposed. Finally, the simplistic approach using an SLM did not produce good results, and there is still much work to be done before SLMs can be used to identify influential users.
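As a rough illustration of the two ranking methods summarized above, the sketch below scores users either by averaging normalized follower and average-retweet counts, or by counting how often a user appears in per-feature top-k lists. The field names, toy values, and list sizes are illustrative placeholders, not the thesis's implementation.

```python
from collections import Counter

# Hypothetical per-user feature values (placeholders, not real Twitter data).
users = [
    {"id": "u1", "followers": 12000, "avg_retweets": 35, "avg_rt_freq": 0.4, "age_activity": 0.7},
    {"id": "u2", "followers": 800,   "avg_retweets": 60, "avg_rt_freq": 0.9, "age_activity": 0.5},
    {"id": "u3", "followers": 45000, "avg_retweets": 5,  "avg_rt_freq": 0.1, "age_activity": 0.3},
]

def normalise(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

# Method 1: rank by the average of normalised Followers and Average Retweets counts.
foll = normalise([u["followers"] for u in users])
rts = normalise([u["avg_retweets"] for u in users])
scores = {u["id"]: (f + r) / 2 for u, f, r in zip(users, foll, rts)}
method1 = sorted(scores, key=scores.get, reverse=True)

# Method 2: rank users on each feature separately, keep the top-k of each list,
# and count in how many lists a user appears (the thesis used the top 50 of
# each of eight features and a threshold of six lists).
features = ["followers", "avg_retweets", "avg_rt_freq", "age_activity"]
top_k, min_lists = 2, 3
appearances = Counter()
for feat in features:
    best = sorted(users, key=lambda u: u[feat], reverse=True)[:top_k]
    appearances.update(u["id"] for u in best)
candidates = [uid for uid, n in appearances.items() if n >= min_lists]

print("method 1 ranking:", method1)
print("method 2 candidates:", candidates)
```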

    Opening up to big data: computer-assisted analysis of textual data in social sciences

    "Two developments in computational text analysis may change the way qualitative data analysis in social sciences is performed: 1. the availability of digital text worth to investigate is growing rapidly, and 2. the improvement of algorithmic information extraction approaches, also called text mining, allows for further bridging the gap between qualitative and quantitative text analysis. The key factor hereby is the inclusion of context into computational linguistic models which extends conventional computational content analysis towards the extraction of meaning. To clarify methodological differences of various computer-assisted text analysis approaches the article suggests a typology from the perspective of a qualitative researcher. This typology shows compatibilities between manual qualitative data analysis methods and computational, rather quantitative approaches for large scale mixed method text analysis designs." (author's abstract

    The Blurred Line Between Form and Process: A Comparison of Stream Channel Classification Frameworks

    Stream classification provides a means to understand the diversity and distribution of channels and floodplains that occur across a landscape while identifying links between geomorphic form and process. Accordingly, stream classification is frequently employed as a watershed planning, management, and restoration tool. At the same time, there has been intense debate and criticism of particular frameworks, on the grounds that these frameworks classify stream reaches based largely on their physical form, rather than on direct measurements of their component hydrogeomorphic processes. Despite this debate surrounding stream classifications, and their ongoing use in watershed management, direct comparisons of channel classification frameworks are rare. Here we implement four stream classification frameworks and explore the degree to which each makes inferences about hydrogeomorphic process from channel form within the Middle Fork John Day Basin, a watershed of high conservation interest within the Columbia River Basin, U.S.A. We compare the results of the River Styles Framework, Natural Channel Classification, Rosgen Classification System, and a channel form-based statistical classification at 33 field-monitored sites. We found that the four frameworks consistently classified reach types into similar groups based on each reach or segment’s dominant hydrogeomorphic elements. Where classified channel types diverged, differences could be attributed to (a) the spatial scale of the input data used, (b) the requisite metrics and their order in a framework’s decision tree, and/or (c) whether the framework attempts to classify current or historic channel form. Divergence in framework agreement was also observed at reaches where channel planform was decoupled from valley setting. Overall, the relative agreement between frameworks indicates that criticism of individual classifications for their use of form in grouping stream channels may be overstated. These form-based criticisms may also ignore the geomorphic tenet that channel form reflects formative hydrogeomorphic processes across a given landscape.
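For a sense of what the channel form-based statistical classification (the fourth framework compared above) involves, the toy sketch below clusters reaches purely on form metrics. The metrics and values are invented for illustration and do not reproduce any of the compared frameworks' decision rules.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per surveyed reach; columns: slope, sinuosity, width/depth ratio
# (illustrative form metrics and values only).
reaches = np.array([
    [0.020, 1.05, 8.0],    # steep, straight, confined
    [0.004, 1.40, 25.0],   # moderate gradient, partly confined
    [0.001, 1.90, 40.0],   # low gradient, meandering, unconfined
    [0.018, 1.10, 10.0],
    [0.002, 1.75, 35.0],
])

X = StandardScaler().fit_transform(reaches)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)   # statistically derived reach types, inferred from form alone
```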

    Applications across Co-located Devices

    We live surrounded by many computing devices. However, their presence has yet to be fully explored to create a richer ubiquitous computing environment. There is an opportunity to take better advantage of those devices by combining them into a unified user experience. To realize this vision, we studied and explored the use of a framework which provides the tools and abstractions needed to develop applications that distribute UI components across co-located devices. The framework comprises the following components: authentication and authorization services; a broker to sync information across multiple application instances; background services that gather the capabilities of the devices; and a library to integrate web applications with the broker, determine which components to show based on UI requirements and device capabilities, and provide custom elements to manage the distribution of UI components and the multiple application states. Collaboration between users is supported by sharing application states. An indoor positioning solution had to be developed in order to determine when devices are close to each other and trigger the automatic redistribution of UI components. The research questions that we set out to answer are presented along with the contributions that have been produced. Those contributions include a framework for cross-device applications, an indoor positioning solution for pervasive indoor environments, prototypes, end-user studies and a developer-focused evaluation. To contextualize our research, we studied previous work on cross-device applications, proxemic interactions and indoor positioning systems. We presented four application prototypes. The first three were used to perform studies evaluating the user experience; the last one was used to study the developer experience provided by the framework. The results were largely positive, with users showing a preference for using multiple devices in some circumstances. Developers were also able to grasp the concepts provided by the framework relatively well.
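The capability-matching idea at the core of the framework, assigning UI components to whichever co-located device satisfies their requirements, can be sketched roughly as below. This is a schematic Python illustration with hypothetical component and capability names, not the framework's actual web library or API.

```python
# Each UI component declares the capabilities it needs; components are then
# assigned to the first co-located device that satisfies those requirements.
# All component and capability names are hypothetical.
REQUIREMENTS = {
    "video_player":   {"screen_min_inches": 20, "audio_out": True},
    "remote_control": {"touch": True},
    "chat_panel":     {"keyboard": True},
}

devices = [
    {"id": "tv",    "screen_min_inches": 55, "audio_out": True, "touch": False, "keyboard": False},
    {"id": "phone", "screen_min_inches": 6,  "audio_out": True, "touch": True,  "keyboard": True},
]

def satisfies(device, requirements):
    for key, needed in requirements.items():
        have = device.get(key, False)
        if isinstance(needed, bool):
            if needed and not have:
                return False
        elif have < needed:
            return False
    return True

def distribute(components, devices):
    """Greedy assignment of each component to the first capable device."""
    assignment = {}
    for name, req in components.items():
        for dev in devices:
            if satisfies(dev, req):
                assignment[name] = dev["id"]
                break
    return assignment

print(distribute(REQUIREMENTS, devices))
# {'video_player': 'tv', 'remote_control': 'phone', 'chat_panel': 'phone'}
```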

    A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text

    Financial fraud rampages onwards seemingly uncontained. The annual cost of fraud in the UK is estimated to be as high as £193bn a year [1]. From a data science perspective, one hitherto less explored, this thesis demonstrates how linguistic features used to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of 102 annual reports/10-K (narrative sections) from firms formally indicted for FSF, juxtaposed with 306 non-fraud firms of similar size and industrial grouping. Unlike other similar studies, this thesis takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through the extraction of new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine ‘what’ was said as opposed to ‘how’. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts of keywords unearthed in a previous content analysis study of financial narrative are also used. These features are then used to drive machine learning based classification and clustering algorithms to determine whether they aid in discriminating a fraud from a non-fraud firm. The results derived from the battery of models built typically exceed a classification accuracy of 70%. The above process is amalgamated into a framework. The process outlined, driven by empirical data, demonstrates in a practical way how linguistic analysis could aid in fraud detection and also constitutes a unique contribution made to deception detection studies.
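A compressed sketch of the pipeline described above follows: linguistic features extracted from report narratives (here, simple n-grams plus a tone word-list count) drive a classifier that separates fraud from non-fraud firms. The texts, word list, and labels are toy stand-ins for the thesis's 10-K corpus and feature battery.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from scipy.sparse import hstack, csr_matrix

texts = [
    "revenue growth remained strong and guidance is confirmed",
    "certain adjustments were required following a restatement of revenue",
    "cash flow improved and liquidity remains robust",
    "management believes the allegations are without merit",
]
labels = [0, 1, 0, 1]   # 0 = non-fraud, 1 = indicted (toy labels)

# Illustrative word list standing in for the customised tone/intention lists.
UNCERTAINTY_WORDS = {"believes", "certain", "adjustments", "allegations"}

vec = TfidfVectorizer(ngram_range=(1, 2))
ngrams = vec.fit_transform(texts)
tone = csr_matrix([[sum(w in UNCERTAINTY_WORDS for w in t.split())] for t in texts])

X = hstack([ngrams, tone])          # n-gram and word-list features side by side
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```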

    A Systematic Review of Deep Learning Approaches to Educational Data Mining

    Educational Data Mining (EDM) is a research field that focuses on the application of data mining, machine learning, and statistical methods to detect patterns in large collections of educational data. Different machine learning techniques have been applied in this field over the years, but only recently has Deep Learning gained increasing attention in the educational domain. Deep Learning is a machine learning method based on neural network architectures with multiple layers of processing units, which has been successfully applied to a broad set of problems in the areas of image recognition and natural language processing. This paper surveys the research carried out on Deep Learning techniques applied to EDM, from its origins to the present day. The main goals of this study are to identify the EDM tasks that have benefited from Deep Learning and those that remain to be explored, to describe the main datasets used, to provide an overview of the key concepts, main architectures, and configurations of Deep Learning and its applications to EDM, and to discuss the current state of the art and future directions in this area of research.
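For readers unfamiliar with the basic setup, the sketch below shows the kind of model the survey covers at its simplest: a multi-layer neural network predicting a student outcome from interaction features. The features, data, and layer sizes are invented placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Columns: forum posts, videos watched, quiz average (one row per student).
X = np.array([[2, 10, 0.55], [15, 40, 0.85], [0, 3, 0.30], [8, 25, 0.70]])
y = np.array([0, 1, 0, 1])   # 1 = passed the course (toy labels)

# A small network with two hidden layers of processing units.
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict([[5, 20, 0.6]]))
```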

    Predicting Forex Currency Fluctuations Using a Novel Bio-inspired Modular Neural Network

    This thesis explores the intricate interplay of rational choice theory (RCT), brain modularity, and artificial neural networks (ANNs) for modelling and forecasting hourly rate fluctuations in the foreign exchange (Forex) market. While RCT traditionally models human decision-making by emphasising self-interest and rational choices, this study extends its scope to encompass emotions, recognising their significant impact on investor decisions. Recent advances in neuroscience, particularly in understanding the cognitive and emotional processes associated with decision-making, have inspired computational methods to emulate these processes. ANNs, in particular, have shown promise in simulating neuroscience findings and translating them into effective models of financial market dynamics. However, the monolithic architectures of ANNs, characterised by fixed structures, pose challenges in adaptability and flexibility when faced with data perturbations, limiting overall performance. To address these limitations, this thesis proposes a Modular Convolutional orthogonal Recurrent Neural Network with Monte Carlo dropout ANN (MCoRNNMCD-ANN) inspired by recent neuroscience findings. A comprehensive literature review contextualises the challenges associated with monolithic architectures, leading to the identification of neural network structures that could enhance predictions of Forex price fluctuations, such as in the prominently traded EUR/GBP pairing. The proposed MCoRNNMCD-ANN is thoroughly evaluated through a detailed comparative analysis against state-of-the-art techniques such as BiCuDNNLSTM, CNN–LSTM, LSTM–GRU, CLSTM, ensemble modelling, and single monolithic CNN and RNN models. Results indicate that the MCoRNNMCD-ANN outperforms its competitors, reducing prediction errors on the test sets by between 19.70% and 195.51%, as measured by objective evaluation metrics such as mean square error. This neurobiologically inspired model not only capitalises on modularity but also integrates partial transfer learning to improve forecasting accuracy in anticipating Forex price fluctuations when less data is available, as in the EUR/USD currency pair. The proposed bio-inspired modular approach, incorporating transfer learning from a similar task, brings advantages such as robust forecasts and enhanced generalisation performance, which are especially valuable in domains where prior knowledge guides modular learning processes. The proposed model presents a promising avenue for advancing predictive modelling of Forex prices by incorporating transfer learning principles.
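The sketch below illustrates, in PyTorch, three ingredients named in the abstract: a convolutional module, a recurrent module with orthogonally initialised recurrent weights, and Monte Carlo dropout kept active at prediction time so that repeated forward passes yield a predictive distribution. It is not the MCoRNNMCD-ANN itself; all layer sizes and data are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvRecurrentMC(nn.Module):
    def __init__(self, n_features=1, hidden=32, p_drop=0.2):
        super().__init__()
        self.conv = nn.Conv1d(n_features, 16, kernel_size=3, padding=1)
        self.rnn = nn.GRU(16, hidden, batch_first=True)
        for name, param in self.rnn.named_parameters():
            if "weight_hh" in name:            # orthogonal recurrent weights
                nn.init.orthogonal_(param)
        self.head = nn.Linear(hidden, 1)
        self.p_drop = p_drop

    def forward(self, x):                      # x: (batch, time, features)
        z = F.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        out, _ = self.rnn(z)
        # Monte Carlo dropout: dropout stays on even at prediction time.
        out = F.dropout(out[:, -1], p=self.p_drop, training=True)
        return self.head(out)

model = ConvRecurrentMC()
x = torch.randn(8, 24, 1)                      # 8 toy sequences of 24 hourly rates
samples = torch.stack([model(x) for _ in range(50)])   # 50 stochastic forward passes
print(samples.mean(0).shape, samples.std(0).shape)     # predictive mean and uncertainty
```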

    Making Machines Learn. Applications of Cultural Analytics to the Humanities

    The digitization of several million books by Google in 2011 meant the popularization of a new kind of humanities research powered by the treatment of cultural objects as data. Culturomics, as it is called, was born, and other initiatives resonated with this methodological approach, as is the case with the recently formed Digital Humanities or Cultural Analytics. Intrinsically, these new quantitative approaches to culture all borrow from techniques and methods developed under the wing of the exact sciences, such as computer science, machine learning and statistics. There are numerous examples of studies that take advantage of the possibilities that treating cultural objects as data offers for understanding the human. This new data science, now applied to current trends in culture, can also be applied to the study of the more traditional humanities. Led by proper intellectual inquiry, an adequate use of technology may bring answers to questions intractable by other means, or add evidence to long-held assumptions based on a canon built from few examples. This dissertation argues in favor of such an approach. Three different case studies are considered. First, in the more general sense of big and smart data, we collected and analyzed more than 120,000 pictures of paintings from all periods of art history, to gain clear insight into how the beauty of depicted faces, in the framework of neuroscience and evolutionary theory, has changed over time. A second study covers the nuances of the modes of emotion employed by the Spanish Golden Age playwright Calderón de la Barca to empathize with his audience. By means of sentiment analysis, a technique strongly supported by machine learning, we shed some light on the different fictional characters, and on how they interact and convey messages otherwise invisible to the public. The last case is a study of non-traditional authorship attribution techniques applied to the forefather of the modern novel, the Lazarillo de Tormes. In the end, we conclude that the successful application of cultural analytics and computer science techniques to traditional humanistic endeavours has been enriching and validating.
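As a toy illustration of the non-traditional authorship attribution mentioned in the third case study, the sketch below compares texts by z-scored frequencies of common function words, in the spirit of Burrows's Delta. The texts and word list are placeholders, not the Lazarillo corpus.

```python
import numpy as np

# A tiny list of function words standing in for a proper most-frequent-word set.
FUNCTION_WORDS = ["the", "of", "and", "to", "in"]

def profile(text):
    words = text.lower().split()
    return np.array([words.count(w) / len(words) for w in FUNCTION_WORDS])

candidates = {
    "author_A": "the house of the lord stood in the middle of the town and the square",
    "author_B": "to go to the market and to return in time was hard to do and to bear",
}
disputed = "the road of the pilgrims led in silence to the gates of the city"

profiles = np.vstack([profile(t) for t in candidates.values()])
mu, sigma = profiles.mean(axis=0), profiles.std(axis=0) + 1e-9
z = (profiles - mu) / sigma
z_disputed = (profile(disputed) - mu) / sigma

# Delta: mean absolute difference of z-scores; the smaller, the closer the style.
deltas = np.abs(z - z_disputed).mean(axis=1)
print(dict(zip(candidates, deltas)))
```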