5,093 research outputs found

    Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation

    Get PDF
    We study a problem of quick detection of top-k Personalized PageRank lists. This problem has a number of important applications such as finding local cuts in large graphs, estimation of similarity distance and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that when finding top-k Personalized PageRank lists two observations are important. Firstly, it is crucial that we detect fast the top-k most important neighbours of a node, while the exact order in the top-k list as well as the exact values of PageRank are by far not so crucial. Secondly, a little number of wrong elements in top-k lists do not really degrade the quality of top-k lists, but it can lead to significant computational saving. Based on these two key observations we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide performance evaluation of the proposed methods and supply stopping criteria. Then, we apply the methods to the person name disambiguation problem. The developed algorithm for the person name disambiguation problem has achieved the second place in the WePS 2010 competition

    Name Disambiguation from link data in a collaboration graph using temporal and topological features

    Get PDF
    In a social community, multiple persons may share the same name, phone number or some other identifying attributes. This, along with other phenomena, such as name abbreviation, name misspelling, and human error leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, database integration, and more importantly, improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons with the objective that each decomposed partition is composed of records of a unique person. Existing solutions to this task use either biographical attributes, or auxiliary features that are collected from external sources, such as Wikipedia. However, for many scenarios, such auxiliary features are not available, or they are costly to obtain. Besides, the attempt of collecting biographical or external data sustains the risk of privacy violation. In this work, we propose a method for solving entity disambiguation task from link information obtained from a collaboration network. Our method is non-intrusive of privacy as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.Comment: The short version of this paper has been accepted to ASONAM 201

    MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

    Full text link
    Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-based agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages.Comment: Accepted in K-CAP 2017: Knowledge Capture Conferenc

    A Hybrid Approach to Domain-Specific Entity Linking

    Get PDF
    The current state-of-the-art Entity Linking (EL) systems are geared towards corpora that are as heterogeneous as the Web, and therefore perform sub-optimally on domain-specific corpora. A key open problem is how to construct effective EL systems for specific domains, as knowledge of the local context should in principle increase, rather than decrease, effectiveness. In this paper we propose the hybrid use of simple specialist linkers in combination with an existing generalist system to address this problem. Our main findings are the following. First, we construct a new reusable benchmark for EL on a corpus of domain-specific conversations. Second, we test the performance of a range of approaches under the same conditions, and show that specialist linkers obtain high precision in isolation, and high recall when combined with generalist linkers. Hence, we can effectively exploit local context and get the best of both worlds.Comment: SEM'1

    Identifying Geographic Clusters: A Network Analytic Approach

    Get PDF
    In recent years there has been a growing interest in the role of networks and clusters in the global economy. Despite being a popular research topic in economics, sociology and urban studies, geographical clustering of human activity has often studied been by means of predetermined geographical units such as administrative divisions and metropolitan areas. This approach is intrinsically time invariant and it does not allow one to differentiate between different activities. Our goal in this paper is to present a new methodology for identifying clusters, that can be applied to different empirical settings. We use a graph approach based on k-shell decomposition to analyze world biomedical research clusters based on PubMed scientific publications. We identify research institutions and locate their activities in geographical clusters. Leading areas of scientific production and their top performing research institutions are consistently identified at different geographic scales
    • 

    corecore