171 research outputs found

    Conference Models to Bridge Micro and Macro Studies of Science

    We propose using community-centered analyses and agent-based models of scientific gatherings such as conferences, symposia, and workshops as a way to understand how scientific practices evolve and transition between local, community, and systems levels in science. We suggest using robotics as a case study of global, cross-cultural, interdisciplinary scientific practice. What is needed is a set of modeling frameworks for simulating both the internal and population dynamics of scientific gatherings. In this paper we make the case for conference models as a mid-level unit of analysis that can advance the ways scientists and citizens design systems for transferring and producing knowledge.
    Keywords: Science of Science, Conferences, Community-Based Complex Models, Group Size, Methodology

    Which Conference Is That? A Case Study in Computer Science

    Conferences play a major role in some disciplines such as computer science and are often used in research quality evaluation exercises. Unlike journals and books, for which ISSN and ISBN codes provide unambiguous keys, recognizing the conference series in which a paper was published is a rather complex endeavor: there is no unique code assigned to conferences, and the way their names are written may vary greatly across years and catalogs. In this article, we propose a technique for the entity resolution of conferences based on the analysis of different semantic parts of their names. We present the results of an investigation of our technique on a dataset of 42,395 distinct computer science conference names excerpted from the DBLP computer science repository, which we automatically link to different authority files. With suitable data cleaning, the precision of our record linkage algorithm can be as high as 94%. A comparison with results obtainable using state-of-the-art general-purpose record linkage algorithms rounds off the article, showing that our ad hoc solution largely outperforms them in terms of the quality of the results.
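    The semantic-parts idea can be illustrated with a small sketch. The parsing rules and function names below are our own illustration of the general approach, not the authors' algorithm: split a raw conference name into acronym, year, and series title, then compare names part by part.

```python
import re

def parse_conference_name(name: str) -> dict:
    """Split a raw conference name into rough semantic parts."""
    # Acronym: a parenthesized all-caps token, e.g. "(ICSE)".
    acronym = None
    m = re.search(r"\(([A-Z][A-Za-z]{1,9})[^)]*\)", name)
    if m:
        acronym = m.group(1).upper()
    # Year: a four-digit token between 1900 and 2099.
    year = None
    y = re.search(r"\b(19|20)\d{2}\b", name)
    if y:
        year = int(y.group(0))
    # Series title: what remains after stripping ordinal, acronym, and year.
    title = re.sub(r"\b\d+(st|nd|rd|th)\b", " ", name)
    title = re.sub(r"\([^)]*\)", " ", title)
    title = re.sub(r"\b(19|20)\d{2}\b", " ", title)
    title = re.sub(r"\s+", " ", title).strip().lower()
    return {"acronym": acronym, "year": year, "title": title}

def same_series(a: str, b: str) -> bool:
    """Heuristic match: same acronym, or highly overlapping title tokens."""
    pa, pb = parse_conference_name(a), parse_conference_name(b)
    if pa["acronym"] and pa["acronym"] == pb["acronym"]:
        return True
    ta, tb = set(pa["title"].split()), set(pb["title"].split())
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) > 0.6
```

    Comparing on parts rather than on the whole string is what makes the linkage robust to year tokens and ordinals ("21st", "22nd") that change between editions of the same series.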

    Linking named entities to Wikipedia

    Natural language is fraught with problems of ambiguity, including name reference. A name in text can refer to multiple entities, just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to, or, if the KB does not contain the correct entry, to return NIL. Entity linking systems can be complex, and we present a framework for analysing their different components. We use this framework to analyse three seminal systems, evaluated on a common dataset, and show the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research, and we report on our submissions to its entity linking shared task in 2010, 2011, and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities, and model its syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. We then generalise from apposition to local descriptions specified close to the mention, adding them to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. Not only does this make for a more precise match, we are also able to model failure to match. Local descriptions help disambiguate entities, further improving our state-of-the-art linker. The work in this thesis links textual entity mentions to knowledge bases. Linking is important for any task that uses external world knowledge, and resolving ambiguity is fundamental to advancing research into these problems.
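    The generic search-then-disambiguate pipeline that the thesis analyses can be sketched in a few lines. The toy KB, scoring rule, and function names here are invented for illustration, not taken from any of the systems discussed: generate candidate entries by name match, then rank them by lexical overlap between the mention's context and each entry's description, returning NIL when nothing matches.

```python
# A tiny invented KB mapping entry names to descriptions.
TOY_KB = {
    "Paris (city)": "capital of France on the Seine river",
    "Paris (mythology)": "prince of Troy in Greek mythology",
}

def link(mention: str, context: str, kb: dict) -> str:
    """Link a mention to a KB entry, or return 'NIL' if none fits."""
    # Candidate generation: KB entries whose key contains the mention.
    candidates = [k for k in kb if mention.lower() in k.lower()]
    if not candidates:
        return "NIL"
    # Disambiguation: score candidates by context words shared with the entry.
    ctx = set(context.lower().split())
    def score(key):
        return len(ctx & set(kb[key].lower().split()))
    best = max(candidates, key=score)
    return best if score(best) > 0 else "NIL"
```

    The thesis's point about precise search corresponds to the candidate-generation step: if the right entry never enters `candidates`, no amount of disambiguation can recover it.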

    OpenCitations Meta

    OpenCitations Meta is a new database that contains bibliographic metadata of scholarly publications involved in citations indexed by the OpenCitations infrastructure. It adheres to Open Science principles and provides data under a CC0 license for maximum reuse. The data can be accessed through a SPARQL endpoint, REST APIs, and dumps. OpenCitations Meta serves three important purposes. Firstly, it enables disambiguation of citations between publications described using different identifiers from various sources; for example, it can link publications identified by DOIs in Crossref and PMIDs in PubMed. Secondly, it assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to bibliographic resources without existing external persistent identifiers such as DOIs. Lastly, by hosting the bibliographic metadata internally, OpenCitations Meta improves the speed of metadata retrieval for citing and cited documents. The database is populated through automated data curation, including deduplication, error correction, and metadata enrichment. The data is stored in RDF format following the OpenCitations Data Model, and changes and provenance information are tracked. OpenCitations Meta currently incorporates data from Crossref, DataCite, and the NIH Open Citation Collection, and among semantic publishing datasets it is currently the largest by data volume.
    Comment: 26 pages, 7 figures
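    Access via the SPARQL endpoint might look like the sketch below. Both the endpoint URL and the property names are our assumptions based on a reading of the SPAR ontologies used by the OpenCitations Data Model; consult the official OpenCitations documentation before relying on them. To stay self-contained, the sketch only builds the query and parses the standard SPARQL 1.1 JSON results format, without touching the network.

```python
# Assumed endpoint address; verify against opencitations.net documentation.
ENDPOINT = "https://opencitations.net/meta/sparql"

def title_query(doi: str) -> str:
    """SPARQL query (our sketch, per SPAR/OCDM conventions) for the title of
    the bibliographic resource identified by the given DOI."""
    return f"""
    PREFIX datacite: <http://purl.org/spar/datacite/>
    PREFIX literal: <http://www.essepuntato.it/2010/06/literalreification/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?title WHERE {{
      ?id datacite:usesIdentifierScheme datacite:doi ;
          literal:hasLiteralValue "{doi}" .
      ?br datacite:hasIdentifier ?id ;
          dcterms:title ?title .
    }}"""

def extract_bindings(results: dict, var: str) -> list:
    """Pull one variable's values out of a SPARQL 1.1 JSON results document."""
    return [b[var]["value"] for b in results["results"]["bindings"]]
```

    The JSON results structure (`results.bindings[*].<var>.value`) is fixed by the SPARQL 1.1 standard, so `extract_bindings` works unchanged against any conformant endpoint.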

    CiteSeerX: A Scholarly Big Dataset

    The CiteSeerX digital library stores and indexes research articles in Computer Science and related fields. Although its main purpose is to make it easier for researchers to search for scientific information, CiteSeerX has proven to be a powerful resource in many data mining, machine learning, and information retrieval applications that use rich metadata, e.g., titles, abstracts, authors, venues, reference lists, etc. The metadata extraction in CiteSeerX is done using automated techniques. Although fairly accurate, these techniques still result in noisy metadata. Since the performance of models trained on these data depends highly on the quality of the data, we propose an approach to CiteSeerX metadata cleaning that incorporates information from an external data source. The result is a subset of CiteSeerX which is substantially cleaner than the entire set. Our goal is to make the new dataset available to the research community to facilitate future work in Information Retrieval.
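    The cleaning idea — validate noisy extracted records against an external authority catalog — can be sketched as follows. The record layout, field names, and matching rule are invented for illustration; the paper's actual pipeline and external source may differ. Records whose normalized title appears in the authority are kept, with the authority's metadata taking precedence.

```python
import re

def norm_title(t: str) -> str:
    """Lowercase and strip punctuation so titles match despite extraction noise."""
    return re.sub(r"[^a-z0-9]+", " ", t.lower()).strip()

def clean_with_authority(noisy_records: list, authority: list) -> list:
    """Keep noisy records whose normalized title appears in the authority
    catalog, overwriting their fields with the authority's version."""
    index = {norm_title(r["title"]): r for r in authority}
    cleaned = []
    for rec in noisy_records:
        match = index.get(norm_title(rec["title"]))
        if match:
            cleaned.append({**rec, **match})  # authority fields win on conflict
    return cleaned
```

    The trade-off is the one the abstract names: the cleaned set is a subset, so records absent from the external source are dropped rather than repaired.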

    A New Approach to Journal and Conference Name Disambiguation through K-Means Clustering of Internet and Document Surrogates

    Bibliometrics has a long history in Information Science. The validity of any bibliometric analysis depends on accurate citations. We introduce an approach that combines author names and Internet document surrogates with K-means clustering to disambiguate journal and conference titles automatically. To evaluate the quality of this approach, we used records from the Digital Bibliography & Library Project (DBLP). We found there are 2.54±1.52 authors per article. A manual analysis of 125 articles selected at random from the 1.18 million DBLP citations revealed only seven article pairs from the same publication venue. We describe the changes in cluster properties as the number of articles increases from 100 to 25,000. Our findings suggest that additional features are required to disambiguate journal and conference names accurately. As 60.86% of the DBLP articles are published at conferences, future efforts should focus on conference name disambiguation.
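    The clustering step can be sketched with a minimal K-means over bag-of-words vectors built from venue names (in the paper, augmented with author names and surrogate text). The vectorization and the deterministic seeding below are our own simplifications, not the authors' configuration.

```python
import math

def bow(text: str) -> dict:
    """Bag-of-words vector for a venue name (plus any surrogate text)."""
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def dist(a: dict, b: dict) -> float:
    """Euclidean distance between two sparse dict vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def kmeans(vecs, k, iters=10):
    """Tiny K-means; the first k points seed the centroids (deterministic)."""
    centroids = [dict(v) for v in vecs[:k]]
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vecs]
        for c in range(k):
            members = [vecs[i] for i in range(len(vecs)) if labels[i] == c]
            if members:
                keys = set().union(*members)
                centroids[c] = {key: sum(m.get(key, 0) for m in members) / len(members)
                                for key in keys}
    return labels
```

    On token overlap alone, near-identical venue names cluster together, but — as the paper's findings suggest — richer features are needed once names share little vocabulary.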

    Citations: Indicators of Quality? The Impact Fallacy

    We argue that citation is a composed indicator: short-term citations can be considered as currency at the research front, whereas long-term citations can contribute to the codification of knowledge claims into concept symbols. Knowledge claims at the research front are more likely to be transitory and are therefore problematic as indicators of quality. Citation impact studies focus on short-term citation, and therefore tend to measure not epistemic quality, but involvement in current discourses in which contributions are positioned by referencing. We explore this argument using three case studies: (1) citations of the journal Soziale Welt as an example of a venue that tends not to publish papers at a research front, unlike, for example, JACS; (2) Robert Merton as a concept symbol across theories of citation; and (3) the Multi-RPYS ("Multi-Referenced Publication Year Spectroscopy") of the journals Scientometrics, Gene, and Soziale Welt. We show empirically that the measurement of "quality" in terms of citations can further be qualified: short-term citation currency at the research front can be distinguished from longer-term processes of incorporation and codification of knowledge claims into bodies of knowledge. The recently introduced Multi-RPYS can be used to distinguish between short-term and long-term impacts.
    Comment: accepted for publication in Frontiers in Research Metrics and Analysis; doi: 10.3389/frma.2016.0000
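    The basic RPYS computation underlying Multi-RPYS can be sketched briefly: count cited references per referenced publication year, then report each year's deviation from the median count of its surrounding window, so that peaks flag years containing historically influential works. Window size and the exact deviation measure below are our simplifying choices.

```python
from statistics import median

def rpys(ref_years, window=5):
    """Referenced Publication Year Spectroscopy: per-year reference counts
    and each year's deviation from the median of its surrounding window."""
    counts = {}
    for y in ref_years:
        counts[y] = counts.get(y, 0) + 1
    half = window // 2
    deviations = {}
    for y in range(min(counts), max(counts) + 1):
        neighbors = [counts.get(n, 0) for n in range(y - half, y + half + 1)]
        deviations[y] = counts.get(y, 0) - median(neighbors)
    return counts, deviations
```

    Multi-RPYS extends this by computing the spectrogram per citing year, which is what separates short-term currency from long-term codification in the paper's analysis.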

    Rule based autonomous citation mining with TIERL

    Citation management is an important task in managing digital libraries. Citations provide valuable information, e.g., for evaluating an author's influence or scholarly quality (the impact factor of research journals). But although reliable and effective autonomous citation management is essential, manual citation management can be extremely costly. Automatic citation mining, on the other hand, is a non-trivial task, mainly due to non-conforming citation styles, spelling errors, and the difficulty of reliably extracting text from PDF documents. In this paper we propose a novel rule-based autonomous citation mining technique to address this important task. We define a set of common heuristics that together allow us to improve the state of the art in automatic citation mining. Moreover, by first disambiguating citations based on venues, our technique significantly enhances the correct discovery of citations. Our experiments show that the proposed approach is indeed able to overcome limitations of current leading citation indexes such as ISI Web of Knowledge, CiteSeer, and Google Scholar.
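    A single rule of the kind such a system stacks up might look like the sketch below. This is one invented heuristic for illustration, not a rule from TIERL: assume a rough "Authors. Title. Venue, Year." layout, take the last four-digit year-like token as the year, and split the rest on sentence boundaries.

```python
import re

def parse_citation(raw: str) -> dict:
    """One heuristic rule: 'Authors. Title. Venue, Year.' with the year as
    the last four-digit token. Real systems combine many such rules."""
    year = None
    # Last year-like token: a 19xx/20xx number not followed by another one.
    m = re.search(r"\b(19|20)\d{2}\b(?!.*\b(19|20)\d{2}\b)", raw)
    if m:
        year = int(m.group(0))
    # Split on '. ' into rough fields: authors, title, tail (venue etc.).
    parts = [p.strip(" .,") for p in raw.split(". ") if p.strip(" .,")]
    rec = {"authors": None, "title": None, "venue": None, "year": year}
    if len(parts) >= 3:
        rec["authors"], rec["title"] = parts[0], parts[1]
        venue = ". ".join(parts[2:])
        venue = re.sub(r"\b(19|20)\d{2}\b", "", venue).strip(" .,")
        rec["venue"] = venue
    return rec
```

    A rule like this fails on abbreviated author initials ("A. Smith"), which is exactly why the paper layers multiple heuristics and disambiguates venues first before trusting any single split.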