43 research outputs found

    A classification-based approach to economic event detection in Dutch news text

    Breaking news on economic events such as stock splits or mergers and acquisitions has been shown to have a substantial impact on the financial markets. As it is important to be able to automatically identify events in news items accurately and in a timely manner, we present in this paper proof-of-concept experiments for a supervised machine learning approach to economic event detection in newswire text. For this purpose, we created a corpus of Dutch financial news articles in which 10 types of company-specific economic events were annotated. We trained classifiers using various lexical, syntactic and semantic features. We obtain good results based on a basic set of shallow features, thus showing that this method is a viable approach for economic event detection in news text

    Exploratory Analysis of Pairwise Interactions in Online Social Networks

    Over the last few decades, sociologists have tried to explain human behaviour by analysing social networks, which requires access to data about interpersonal relationships. This was a major obstacle in the field until the emergence of online social networks (OSNs), which greatly simplified the collection of such data. Nowadays, by crawling public profiles on OSNs, it is possible to build a social graph in which "friends" on an OSN are represented as connected nodes. An OSN connection does not necessarily indicate a close real-life relationship, but OSN interaction records may reveal real-life relationship intensities, a topic that has inspired a number of recent studies. Still, published research currently lacks an extensive exploratory analysis of OSN interaction records, i.e. a comprehensive overview of how users interact through the different OSN interaction channels. In this paper we provide such an overview by leveraging the results of an extensive social experiment that collected records for over 3,200 Facebook users interacting with over 1,400,000 of their friends. Our exploratory analysis focuses on extracting population distributions and correlation parameters for 13 interaction parameters, providing valuable insight into online social network interaction for future research in this field. Comment: Journal Article published 2 Oct 2017 in Automatika volume 58 issue 4 on pages 422 to 42

    A Modified Weight Balanced Algorithm for Influential Users Community Detection in Online social Network (OSNs)

    In the modern era, the number of online users is increasing day by day. Different users use various social networks in different forms. The behavior and attitude of users of social networking sites vary from user to user (U2U). In online social networking, users join many groups and communities according to their interests and to each group's or community's influential user. This paper consists of seven sections. The first section introduces communities and community evolution. The second section describes movement between communities, and the third covers related work. The fourth section gives the problem definition, and the fifth presents the methodology (Proposed Algorithm Process, Get Community Matrix, Community Detection). The sixth section covers the implementation, including datasets, quantitative performance, graphical results, and enhancements over existing work. The last section contains the conclusion, followed by the references. In this paper, we propose and implement community detection in social media. In the proposed approach, we deploy a Longest Chain Subsequence metric for finding the number of connections to the kernel community

    Crawling Facebook for Social Network Analysis Purposes

    We describe our work in the collection and analysis of massive data describing the connections between participants to online social networks. Alternative approaches to social network data collection are defined and evaluated in practice, against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, i.e., among others, degree distribution, centrality measures, scaling laws and distribution of friendship.
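
One of the graph properties named in this abstract, the degree distribution of an undirected graph, can be illustrated with a minimal sketch. This is not the authors' tooling: the function name `degree_distribution` and the tiny edge list are hypothetical and chosen only for illustration.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected graph.

    `edges` is an iterable of (u, v) pairs. Duplicate edges and reversed
    duplicates such as (1, 2) and (2, 1) are counted once; self-loops are ignored.
    """
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop carries no friendship information
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return dict(Counter(len(ns) for ns in neighbours.values()))


# Hypothetical anonymised sample: node ids stand in for crawled profiles.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 1)]
dist = degree_distribution(edges)
```

Storing neighbours in sets keeps the graph simple (no multi-edges), which matches the undirected friendship graph the abstract describes.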

    Towards measuring the complexity of introducing semantics into a company

    The Semantics Difficulty Model (SDM) is a model that measures the difficulty of introducing semantic technology into a company. SDM manages three descriptions of stages, which we will refer to as "snapshots": a company semantic snapshot, a data snapshot and a semantic application snapshot. Understanding a priori the complexity of introducing semantics into a company is important because it allows the organization to take early decisions, thus saving time and money, mitigating risks and improving innovation, time to market and productivity. SDM works by measuring the Euclidean distance between each initial snapshot and its reference model (the company semantic snapshots reference model, the data snapshots reference model, and the semantic application snapshots reference model). The difficulty level is "not at all difficult" when the distance is small and becomes "extremely difficult" when the distance is large. SDM has been tested experimentally with 2000 simulated companies with arrangements and several initial stages. The output is measured by five linguistic values: "not at all difficult", "slightly difficult", "averagely difficult", "very difficult" and "extremely difficult". As the preliminary results of our SDM simulation model indicate, transforming a search application into integrated data from different sources with semantics is "slightly difficult", in contrast with data and opinion extraction applications, for which it is "very difficult"
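
The distance-to-label idea in this abstract can be sketched as follows. The abstract confirms only the Euclidean distance and the five linguistic values. The vector encoding of a snapshot, the `max_distance` normaliser and the equal-width bins are assumptions made purely for illustration and are not the published SDM.

```python
import math

# The five linguistic values listed in the abstract, from easiest to hardest.
LEVELS = [
    "not at all difficult",
    "slightly difficult",
    "averagely difficult",
    "very difficult",
    "extremely difficult",
]


def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    if len(a) != len(b):
        raise ValueError("snapshot and reference must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def difficulty(snapshot, reference, max_distance):
    """Map the snapshot-to-reference distance onto one of the five levels.

    Assumption: distances are normalised by `max_distance` and split into
    five equal-width bins; the real model may use different thresholds.
    """
    ratio = min(euclidean(snapshot, reference) / max_distance, 1.0)
    index = min(int(ratio * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]


# Hypothetical two-feature snapshots compared against a reference at the origin.
near = difficulty([0, 0], [0, 0], max_distance=10)
middle = difficulty([3, 4], [0, 0], max_distance=10)
far = difficulty([10, 0], [0, 0], max_distance=10)
```

A small distance lands in the first bin and a distance at or beyond `max_distance` lands in the last, which mirrors the small-to-large ordering the abstract describes.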

    Categorizing and measuring social ties

    The analysis of social networks has boomed recently, mainly because online social networking systems such as Twitter allow researchers to access these data. However, the research is less and less focused on the fundamental question of the validity of the data and the interpretation of the results. For example, Golder et al. (2007) use the word 'friend' in quotes while describing their results. To enhance the discussion around the validity of results, our work contributes a categorization of social network data. We also discuss the differences between data sources, especially highlighting the fact that different data sources disclose different kinds of networks. Our approach is to examine social networks based on several sources of data, and thus acquire a richer data set. Based on this extended data set, we are better equipped to understand the social relations represented via links between nodes. After reviewing the existing literature, we make two observations about social relationships in online services. Firstly, the friendship data may be shared in public or with the specific group of users of that service - this may affect how people perceive and use these relationships, especially when compared with the private displays of relations (e.g., Donath & boyd, 2004). Secondly, people interact with only part of their social relations (e.g., Golder et al., 2007), and research has started to shift its focus from static networks to more dynamic, activity-based networks (e.g., Huberman et al., 2009). Based on the existing literature, briefly discussed above, a 2x2 matrix can be developed. Relations may be public or private and active or passive. For instance, those relations with which you use Instant Messaging can be considered private and active, whereas Facebook friends are passive and public. Because they differ in nature, the conclusions based on their analysis should differ as well.
After confirming that the data measure the desired phenomenon, one should use several kinds of data sources to really understand the social structures behind the group under study. We claim that multiple data sets should be used when measuring social relations. McPherson et al. (2001) have also concluded that the priority for future social network researchers should be to gather dynamic data on multiple social relations. By studying existing research and our own empirical data (e.g., Karikoski & Nelimarkka, 2011), we discuss the opportunities and challenges of using multiple data sets to cover the same group. Peer reviewed

    Enhanced information retrieval by exploiting recommender techniques in cluster-based link analysis

    Inspired by the use of PageRank algorithms in document ranking, we develop and evaluate a cluster-based PageRank algorithm to re-rank information retrieval (IR) output with the objective of improving ad hoc search effectiveness. Unlike existing work, our methods exploit recommender techniques to extract the correlation between documents and apply detected correlations in a cluster-based PageRank algorithm to compute the importance of each document in a dataset. In this study two popular recommender techniques are examined in four proposed PageRank models to investigate the effectiveness of our approach. Comparison of our methods with strong baselines demonstrates the solid performance of our approach. Experimental results are reported on an extended version of the FIRE 2011 personal information retrieval (PIR) data collection which includes topically related queries with click-through data and relevance assessment data collected from the query creators. The search logs of the query creators are categorized based on their different topical interests. The experimental results show the significant improvement of our approach compared to results using standard IR and cluster-based PageRank methods
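
To make the re-ranking idea concrete, here is a minimal sketch of plain weighted PageRank by power iteration over a document-correlation matrix. It does not reproduce the paper's cluster-based models or its recommender step: the `weights` matrix is a hypothetical stand-in for whatever correlations those techniques would produce, and the damping factor of 0.85 is the conventional default rather than a value taken from the paper.

```python
def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank via power iteration.

    weights[i][j] is the (non-negative) strength of the link from
    document i to document j. Rows that sum to zero are treated as
    dangling and spread their rank evenly over all documents.
    """
    n = len(weights)
    rank = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_sums[i] == 0:
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j in range(n):
                    new_rank[j] += damping * rank[i] * weights[i][j] / out_sums[i]
        rank = new_rank
    return rank


# Hypothetical correlations: document 0 is correlated with documents 1 and 2.
weights = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
rank = pagerank(weights)
# Re-rank document indices by descending importance score.
reranked = sorted(range(len(rank)), key=lambda i: -rank[i])
```

In a re-ranking setting, scores like these would typically be combined with the original retrieval scores rather than used alone; how the paper combines them is not stated in the abstract.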

    In search of knowledge: text mining dedicated to technical translation

    Article published on CD and sold directly by ASLIB (http://shop.emeraldinsight.com/product_info.htm/cPath/56_59/products_id/431). Conference programme at http://aslib.co.uk/conferences/tc_2011/programme.htm