Search CORE

1,735 research outputs found

Name Disambiguation from link data in a collaboration graph using temporal and topological features

Author: Hasan Mohammad Al
Saha Tanay Kumar
Zhang Baichuan
Publication date: 01/12/2015

In a social community, multiple persons may share the same name, phone number or some other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, database integration, and more importantly, cause improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons with the objective that each decomposed partition is composed of records of a unique person. Existing solutions to this task use either biographical attributes, or auxiliary features that are collected from external sources, such as Wikipedia. However, for many scenarios, such auxiliary features are not available, or they are costly to obtain. Besides, the attempt of collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method is non-intrusive of privacy as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance. Comment: The short version of this paper has been accepted to ASONAM 201

arXiv.org e-Print Archive

IUPUIScholarWorks

The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication

Author: Baus
Beaulieu
Birnholtz
Boyack
Börner
Cambrosio
Crane
Cronin
Fry
Fry
Galison
Geels
Gläser
Gläser
Gläser
Guimera
Guimera
Hellsten
Hine
Howard
Huang
Jansen
Kling
Kling
Kling
Knorr Cetina
Kretschmer
Lambiotte
Lancichinetti
Laurens
Lievrouw
Lievrouw
Melin
Mogoutov
Moran-Ellis
Morris
Mulkay
Nentwich
Rafols
Rosvall
Seglen
Shibata
Small
Strotmann
Van den Besselaar
Van House
Velden
Veugelers
Walsh
Whitley
Zitt
Zitt
Zuccala
Publication date: 09/01/2013

The scientific community of researchers in a research specialty is an important unit of analysis for understanding the field-specific shaping of scientific communication practices. These scientific communities are, however, a challenging unit of analysis to capture and compare because they overlap, have fuzzy boundaries, and evolve over time. We describe a network analytic approach that reveals the complexities of these communities through examination of their publication networks in combination with insights from ethnographic field studies. We suggest that the structures revealed indicate overlapping sub-communities within a research specialty and we provide evidence that they differ in disciplinary orientation and research practices. By mapping the community structures of scientific fields we aim to increase confidence about the domain of validity of ethnographic observations as well as of collaborative patterns extracted from publication networks, thereby enabling the systematic study of field differences. The network analytic methods presented include methods to optimize the delineation of a bibliographic data set in order to adequately represent a research specialty, and methods to extract community structures from this data. We demonstrate the application of these methods in a case study of two research specialties in the physical and chemical sciences. Comment: Accepted for publication in JASIS

arXiv.org e-Print Archive

Crossref

Deep Blue Documents at the University of Michigan

Exploiting citation networks for large-scale author name disambiguation

Author: Helbing Dirk
Mazloumian Amin
Penner Orion
Petersen Alexander M
Schulz Christian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios. Comment: 14 pages, 5 figure
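
The two-step clustering summarized in this abstract can be illustrated with a rough sketch. It is not the authors' implementation. The equal weighting of the four signals, both thresholds, the average-linkage merge rule, the function names and the toy records are all assumptions made here for illustration; the paper's parameterized metric differs.

```python
from itertools import combinations

# Toy records for one ambiguous author name (entirely hypothetical data).
papers = {
    "p1": {"coauthors": {"X", "Y"}, "refs": {"r1", "r2"}, "cited_by": {"c1"}},
    "p2": {"coauthors": {"X"}, "refs": {"r1", "p1"}, "cited_by": {"c1"}},
    "p3": {"coauthors": {"Y"}, "refs": {"r2"}, "cited_by": set()},
    "p4": {"coauthors": {"Z"}, "refs": {"r9"}, "cited_by": {"c7"}},
    "p5": {"coauthors": {"Z"}, "refs": {"r9", "r8"}, "cited_by": {"c7"}},
}

def similarity(a, b, papers):
    """Unweighted count of shared evidence (assumed weights of 1 each)."""
    pa, pb = papers[a], papers[b]
    shared_coauthors = len(pa["coauthors"] & pb["coauthors"])
    shared_refs = len(pa["refs"] & pb["refs"])
    shared_citers = len(pa["cited_by"] & pb["cited_by"])
    # Self-citation: one of the two papers cites the other.
    self_citation = int(a in pb["refs"] or b in pa["refs"])
    return shared_coauthors + shared_refs + shared_citers + self_citation

def link_papers(papers, threshold):
    """Step 1: connect individual papers whose similarity meets the threshold."""
    parent = {p: p for p in papers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        if similarity(a, b, papers) >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for p in papers:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())

def merge_clusters(clusters, papers, threshold):
    """Step 2: merge clusters whose average cross-cluster similarity is high."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(similarity(a, b, papers)
                        for a in clusters[i] for b in clusters[j])
            if total / (len(clusters[i]) * len(clusters[j])) >= threshold:
                clusters[i] |= clusters[j]
                del clusters[j]
                merged = True
                break  # restart the scan, since indices have shifted
    return clusters

first_pass = link_papers(papers, threshold=3)
final = merge_clusters(first_pass, papers, threshold=1.0)
print(sorted(sorted(c) for c in final))
```

On this toy input the first pass links only the strongly connected pairs, and the second pass attaches the weakly linked paper p3 to the cluster it shares a co-author and a reference with.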

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data

Springer - Publisher Connector

eScholarship - University of California

IMT Institutional Repository