25 research outputs found

    Distributed human computation framework for linked data co-reference resolution

    Distributed Human Computation (DHC) is a technique for solving computational problems by incorporating the collaborative effort of large numbers of humans. It is also a solution to AI-complete problems such as natural language processing. The Semantic Web, with its roots in AI, is envisioned as a decentralised, world-wide information space for sharing machine-readable data with minimal integration costs. Many research problems in the Semantic Web are considered AI-complete. An example is co-reference resolution, which involves determining whether different URIs refer to the same entity; this is considered a significant hurdle to the realisation of large-scale Semantic Web applications. In this paper, we propose a framework for building a DHC system on top of the Linked Data Cloud to solve various computational problems. To demonstrate the concept, we focus on co-reference resolution in the Semantic Web when integrating distributed datasets. The traditional way to solve this problem is to design machine-learning algorithms, but these are often computationally expensive, error-prone and do not scale. We designed a DHC system named iamResearcher, which solves the scientific-publication author-identity co-reference problem when integrating distributed bibliographic datasets. In our system, we aggregated 6 million bibliographic records from various publication repositories. Users can sign up to the system to audit and align their own publications, thus solving the co-reference problem in a distributed manner. The aggregated results are published to the Linked Data Cloud.
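
    As a rough illustration of the kind of output such a system produces, the sketch below turns user-confirmed author-identity matches into owl:sameAs links suitable for publishing as Linked Data. The URIs and the confirmation data are hypothetical placeholders; the paper does not specify this representation.

```python
# Minimal sketch: turn user-confirmed co-reference decisions into
# owl:sameAs triples (N-Triples syntax) for the Linked Data Cloud.
# The URIs below are hypothetical, not from the paper.

OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"

# Each pair records a user's judgement that two author URIs
# denote the same real-world researcher.
confirmed_pairs = [
    ("http://example.org/repoA/author/j-smith",
     "http://example.org/repoB/author/john.smith"),
]

def same_as_triples(pairs):
    """Serialise confirmed co-references as owl:sameAs N-Triples."""
    for left, right in pairs:
        yield f"<{left}> {OWL_SAME_AS} <{right}> ."

for triple in same_as_triples(confirmed_pairs):
    print(triple)
```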

    Benchmarking some Portuguese S&T system research units: 2nd Edition

    The increasing use of productivity and impact metrics for the evaluation and comparison not only of individual researchers but also of institutions, universities and even countries has prompted the development of bibliometrics. Metrics are becoming widely accepted as an easy and balanced way to assist the peer review and evaluation of scientists and/or research units, provided they have adequate precision and recall. This paper presents a benchmarking study of a selected list of representative Portuguese research units, based on a fairly complete set of parameters: bibliometric indicators, the number of competitive projects and the number of PhDs produced. The study aimed at collecting productivity and impact data from the selected research units under comparable conditions, i.e., using objective metrics based on public information, retrievable on-line and/or from official sources, and thus verifiable and repeatable. The study therefore focused on activity in the 2003-06 period, for which such data was available from the latest official evaluation. The main advantage of our study was the application of automatic tools, achieving relevant results at a reduced cost. Moreover, the results for the selected units suggest that this kind of analysis will be very useful for benchmarking scientific productivity and impact, and for assisting peer review.
    Comment: 26 pages, 20 figures. F. Couto, D. Faria, B. Tavares, P. Gonçalves, and P. Verissimo, Benchmarking some Portuguese S&T system research units: 2nd edition, DI/FCUL TR 13-03, Department of Informatics, University of Lisbon, February 2013.
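
    By way of example, one standard bibliometric indicator used in studies of this kind is the h-index; the sketch below computes it from a unit's per-publication citation counts (the citation figures are made up for illustration).

```python
# Minimal sketch of a common bibliometric indicator: the h-index.
# A unit (or researcher) has h-index h if h of its publications
# have at least h citations each. Citation counts are illustrative.

def h_index(citations):
    """Return the h-index for a list of per-publication citation counts."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 1]))  # -> 3 (three papers with >= 3 citations)
```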

    Author disambiguation using multi-aspect similarity indicators

    Key to accurate bibliometric analyses is the ability to correctly link individuals to their corpus of work, with an optimal balance between precision and recall. We have developed an algorithm that performs this disambiguation task with very high recall and precision. The method addresses the problem of records discarded because of null data fields, and the resulting effect on recall, precision and F-measure. We have implemented a dynamic approach to similarity calculation based on all available data fields. We have also included differences in author contribution and the age difference between publications, both of which have meaningful effects on overall similarity measurements, resulting in significantly higher recall and precision of returned records. Results are presented for a test dataset of heterogeneous catalysis publications and demonstrate high average F-measure scores and substantial improvements over previous and stand-alone techniques.
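
    The abstract does not give the exact formula, but the "dynamic" idea — weighting only the fields both records actually have, so null fields do not drag the score down — can be sketched roughly as below. The field names, weights and per-field comparators are illustrative assumptions, not the paper's configuration.

```python
# Rough sketch of multi-aspect similarity that adapts to missing fields:
# only fields present in BOTH records contribute, and the weights are
# renormalised over those fields.

WEIGHTS = {"coauthors": 0.4, "affiliation": 0.3, "journal": 0.2, "year": 0.1}

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

COMPARATORS = {
    "coauthors":   jaccard,
    "affiliation": lambda a, b: 1.0 if a == b else 0.0,
    "journal":     lambda a, b: 1.0 if a == b else 0.0,
    # Publications close in time are more likely to share an author.
    "year":        lambda a, b: max(0.0, 1.0 - abs(a - b) / 10.0),
}

def similarity(rec1, rec2):
    """Weighted similarity over the fields both records actually have."""
    shared = [f for f in WEIGHTS
              if rec1.get(f) is not None and rec2.get(f) is not None]
    total = sum(WEIGHTS[f] for f in shared)
    if total == 0:
        return 0.0
    return sum(WEIGHTS[f] * COMPARATORS[f](rec1[f], rec2[f]) for f in shared) / total

r1 = {"coauthors": ["Lee", "Chen"], "journal": "J. Catal.", "year": 2004}
r2 = {"coauthors": ["Chen"], "journal": "J. Catal.", "year": 2006, "affiliation": None}
print(similarity(r1, r2))  # affiliation is null, so it is simply left out
```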

    Effect of forename string on author name disambiguation

    In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string-matching) and algorithmic disambiguation is not well understood. This study assesses the contribution of forenames to author name disambiguation using multiple labeled datasets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forename (homonyms). The results show that increasing the ratio of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent; as the ratio of full forenames increases, however, these gains become marginal compared to those from string matching. Using only a small portion of each forename string does not greatly reduce the performance of either heuristic or algorithmic disambiguation compared to using full-length strings. These findings suggest practical measures, such as restoring initialized forenames to full-string form via record linkage, for improved disambiguation performance.
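
    As a concrete illustration of the heuristic (string-matching) side, the sketch below implements one common forename rule: two name instances are candidate matches when their surnames agree and one forename is an initialized form of the other. This is a generic rule for illustration, not the specific heuristic evaluated in the study.

```python
# Generic forename string-matching heuristic (illustrative): merge two
# name instances when surnames match and the forenames are compatible,
# i.e. equal, or one is the initialized form of the other ("J." vs "John").

def forenames_compatible(f1, f2):
    f1, f2 = f1.rstrip(".").lower(), f2.rstrip(".").lower()
    if not f1 or not f2:
        return True             # a missing forename rules nothing out
    if len(f1) == 1 or len(f2) == 1:
        return f1[0] == f2[0]   # initial vs full form: compare initials
    return f1 == f2             # two full forenames must match exactly

def same_author_candidate(name1, name2):
    """name = (surname, forename); a candidate match, not a certainty."""
    (s1, f1), (s2, f2) = name1, name2
    return s1.lower() == s2.lower() and forenames_compatible(f1, f2)

print(same_author_candidate(("Kim", "J."), ("Kim", "Jinseok")))      # True (synonym)
print(same_author_candidate(("Kim", "Jinseok"), ("Kim", "Jinhyuk")))  # False
```

    Note that the rule accepts any pair of identical full names, so homonyms (different authors sharing a forename) slip through; that is exactly the failure mode the study's algorithmic methods are meant to handle.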

    Researchers’ publication patterns and their use for author disambiguation

    In recent years, the need for advanced bibliometric indicators on individual researchers and research groups has grown, and such indicators require author disambiguation. Using the complete population of university professors and researchers in the Canadian province of Québec (N=13,479), their papers, and the papers authored by their homonyms, this paper provides evidence of regularities in researchers' publication patterns. It shows how these patterns can be used to automatically assign papers to individuals and to remove papers authored by their homonyms. Two types of patterns were found: 1) at the level of individual researchers and 2) at the level of disciplines. On the whole, these patterns allow the construction of an algorithm that provides assignment information for at least one paper for 11,105 (82.4%) of the 13,479 researchers, with a very low percentage of false positives (3.2%).
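
    The paper's actual patterns are statistical, but the underlying intuition — a researcher tends to keep publishing in the same venues and with the same co-authors — can be sketched as a simple profile check. The features and threshold below are illustrative assumptions, not the paper's model.

```python
# Illustrative sketch of pattern-based paper assignment: accept a
# candidate paper for a researcher if it overlaps enough with the
# researcher's established profile (journals and co-authors).

def matches_profile(paper, profile, threshold=1):
    """Count profile overlaps; assign only when evidence is sufficient."""
    evidence = 0
    if paper["journal"] in profile["journals"]:
        evidence += 1
    evidence += len(set(paper["coauthors"]) & profile["coauthors"])
    return evidence >= threshold

profile = {"journals": {"Scientometrics", "JASIST"},
           "coauthors": {"Gingras", "Archambault"}}

paper_ok  = {"journal": "JASIST", "coauthors": ["Gingras", "Smith"]}
paper_bad = {"journal": "Phys. Rev. B", "coauthors": ["Tanaka"]}  # likely a homonym

print(matches_profile(paper_ok, profile))   # True  -> assign to researcher
print(matches_profile(paper_bad, profile))  # False -> leave with homonym
```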

    ORCID Integration Among Publishing and Funding Organizations: An Examination of Process and Rationale

    This study examines how a sample of scholarly publishers and granting organizations have integrated the Open Researcher and Contributor ID (ORCID) into their grant-application and manuscript-submission workflows. The study was conducted to discover what benefits these organizations gain from using ORCID's unique author identifiers and how effective they are at introducing scholars to ORCID as a service. The data was collected through interviews with representatives of a sample of publishing and funding organizations: the National Institutes of Health, the U.S. Department of Energy's Office of Scientific and Technical Information, the Wellcome Trust, Autism Speaks, Elsevier, and Oxford University Press. A representative of eJournalPress, a software company that provides manuscript-management tools, was also interviewed. The result is an analysis of best practices for ORCID integration at these types of organizations, together with suggestions for improvement. The conclusions drawn are generalizable to other institutions seeking to adopt ORCID themselves.
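
    For context on what such an integration consumes, ORCID exposes a public read API; the sketch below fetches a record as JSON, assuming the v3.0 public endpoint at pub.orcid.org and its usual response shape (check current ORCID documentation before relying on either). The ORCID iD shown is ORCID's well-known example record.

```python
# Minimal sketch of reading a public ORCID record, assuming the
# v3.0 public API at pub.orcid.org.

import json
import urllib.request

ORCID_ID = "0000-0002-1825-0097"  # ORCID's documentation example record
url = f"https://pub.orcid.org/v3.0/{ORCID_ID}/record"

req = urllib.request.Request(url, headers={"Accept": "application/json"})
with urllib.request.urlopen(req) as resp:
    record = json.load(resp)

# Pull the researcher's display name out of the record.
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```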

    A survey of the author-topic model, its extensions, and their applications

    [Purpose/Significance] The author-topic model, a probabilistic model that has drawn considerable attention in computer science in recent years, is already widely used in text mining and natural language processing. Analysing the ideas behind the author-topic model and its extensions, and their applications in China and abroad, gives a clearer picture of the current state of research and offers a reference for researchers in computer science, library and information science, and related fields. [Method/Process] This review takes the Web of Science core collection, DBLP and the CNKI database as literature sources. By defining retrieval rules, removing duplicates and manually screening the results, we distilled a set of publications on the author-topic model and its extensions, and summarised the existing research from the perspective of the model-application process, combined with literature analysis. [Results/Conclusions] The analysis shows that existing work has converged on a fairly complete analytical pipeline, and both the directions in which the model is extended and the domains to which it is applied are increasingly diverse. However, performance optimisation, the standardisation of model evaluation metrics, and further applications in library and information science still await deeper exploration.
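
    To make the model concrete: an author-topic model learns, for each author, a distribution over latent topics inferred from the documents that author wrote. Below is a minimal sketch using gensim's AuthorTopicModel (an assumption that gensim is installed and that its 4.x API is available; the toy corpus is made up).

```python
# Toy author-topic model sketch using gensim's AuthorTopicModel.
# The corpus is illustrative, not from the survey.

from gensim.corpora import Dictionary
from gensim.models import AuthorTopicModel

docs = [
    ["topic", "model", "text", "mining"],
    ["citation", "analysis", "bibliometrics"],
    ["topic", "model", "author", "inference"],
]
author2doc = {"alice": [0, 2], "bob": [1]}  # author -> indices of their docs

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

model = AuthorTopicModel(corpus=corpus, num_topics=2, id2word=dictionary,
                         author2doc=author2doc, passes=50, random_state=1)

# Each author gets a distribution over the latent topics.
for author in ("alice", "bob"):
    print(author, model.get_author_topics(author))
```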