171 research outputs found
Conference Models to Bridge Micro and Macro Studies of Science
We propose using community-centered analyses and agent-based models of scientific gatherings such as conferences, symposia and workshops as a way to understand how scientific practices evolve and transition between local, community, and systems levels in science. We suggest using robotics as a case study of global, cross-cultural, interdisciplinary scientific practice. What is needed is a set of modeling frameworks for simulating both the internal and population dynamics of scientific gatherings. In this paper we make the case for conference models as a mid-level unit of analysis that can advance the ways scientists and citizens design systems for transferring and producing knowledge. Keywords: Science of Science, Conferences, Community-Based Complex Models, Group Size, Methodology
Which Conference Is That? A Case Study in Computer Science
Conferences play a major role in some disciplines such as computer science and are often used in research quality evaluation exercises. Unlike journals and books, for which ISSN and ISBN codes provide unambiguous keys, recognizing the conference series in which a paper was published is a rather complex endeavor: there is no unique code assigned to conferences, and the way their names are written may vary greatly across years and catalogs. In this article, we propose a technique for the entity resolution of conferences based on the analysis of different semantic parts of their names. We present the results of an investigation of our technique on a dataset of 42,395 distinct computer science conference names excerpted from the DBLP computer science repository, which we automatically link to different authority files. With suitable data cleaning, the precision of our record linkage algorithm can be as high as 94%. A comparison with results obtainable using state-of-the-art general-purpose record linkage algorithms rounds off the article, showing that our ad hoc solution largely outperforms them in terms of the quality of the results.
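The abstract above does not spell out its matching rules, so the following is only a minimal sketch of the general idea of comparing conference names by their semantic parts. The stop-word list, the regular expressions, the `name_parts` and `same_series` helpers, and the 0.6 threshold are all illustrative assumptions, not the authors' method.

```python
import re

# Hypothetical stop list: generic words that say little about the series.
STOPWORDS = {"proceedings", "of", "the", "on", "in", "and", "for", "annual",
             "international", "conference", "symposium", "workshop"}
ORDINAL = re.compile(r"\b\d+(st|nd|rd|th)\b")   # e.g. "25th"
YEAR = re.compile(r"\b(19|20)\d{2}\b")          # e.g. "1999"
ACRONYM = re.compile(r"\(([A-Z]{2,})\)")        # e.g. "(VLDB)"


def name_parts(raw):
    """Split a raw conference name into year, acronym, and topic tokens."""
    year_match = YEAR.search(raw)
    acronym_match = ACRONYM.search(raw)
    text = raw.lower()
    text = YEAR.sub(" ", text)
    text = ORDINAL.sub(" ", text)
    text = re.sub(r"\([^)]*\)", " ", text)      # drop parenthesised parts
    topic = set(re.findall(r"[a-z]+", text)) - STOPWORDS
    return {
        "year": year_match.group(0) if year_match else None,
        "acronym": acronym_match.group(1) if acronym_match else None,
        "topic": topic,
    }


def same_series(name_a, name_b, threshold=0.6):
    """Guess whether two names denote the same series (assumed threshold)."""
    a, b = name_parts(name_a), name_parts(name_b)
    if a["acronym"] and a["acronym"] == b["acronym"]:
        return True
    union = a["topic"] | b["topic"]
    if not union:
        return False
    # Jaccard overlap of the topic tokens, ignoring year and edition number.
    return len(a["topic"] & b["topic"]) / len(union) >= threshold


a = "Proceedings of the 25th International Conference on Very Large Data Bases (VLDB) 1999"
b = "VLDB 2001: International Conference on Very Large Data Bases"
c = "International Conference on Machine Learning 2001"
print(same_series(a, b), same_series(a, c))
```

Year and edition number are deliberately excluded from the comparison, since they distinguish editions rather than series.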
Linking named entities to Wikipedia
Natural language is fraught with problems of ambiguity, including name reference. A name in text can refer to multiple entities just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case, Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to; or, if the KB does not contain the correct entry, return NIL. Entity linking systems can be complex, and we present a framework for analysing their different components. We use it to analyse three seminal systems, evaluated on a common dataset, and show the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research. We report on our submissions to the entity linking shared task in 2010, 2011 and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities. We model syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. We generalise from apposition to examine local descriptions specified close to the mention. We add local description to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. Not only does this make for a more precise match, we are also able to model failure to match. Local descriptions help disambiguate entities, further improving our state-of-the-art linker. The work in this thesis seeks to link textual entity mentions to knowledge bases. Linking is important for any task where external world knowledge is used, and resolving ambiguity is fundamental to advancing research into these problems.
OpenCitations Meta
OpenCitations Meta is a new database that contains bibliographic metadata of
scholarly publications involved in citations indexed by the OpenCitations
infrastructure. It adheres to Open Science principles and provides data under a
CC0 license for maximum reuse. The data can be accessed through a SPARQL
endpoint, REST APIs, and dumps. OpenCitations Meta serves three important
purposes. Firstly, it enables disambiguation of citations between publications
described using different identifiers from various sources. For example, it can
link publications identified by DOIs in Crossref and PMIDs in PubMed. Secondly,
it assigns new globally persistent identifiers (PIDs), known as OpenCitations
Meta Identifiers (OMIDs), to bibliographic resources without existing external
persistent identifiers like DOIs. Lastly, by hosting the bibliographic metadata
internally, OpenCitations Meta improves the speed of metadata retrieval for
citing and cited documents. The database is populated through automated data
curation, including deduplication, error correction, and metadata enrichment.
The data is stored in RDF format following the OpenCitations Data Model, and
changes and provenance information are tracked. OpenCitations Meta and its
production. OpenCitations Meta currently incorporates data from Crossref,
DataCite, and the NIH Open Citation Collection. In terms of semantic publishing
datasets, it is currently the first in data volume. Comment: 26 pages, 7 figures
CiteSeerx: A Scholarly Big Dataset
The CiteSeerx digital library stores and indexes research articles in Computer Science and related fields. Although its main purpose is to make it easier for researchers to search for scientific information, CiteSeerx has proven to be a powerful resource in many data mining, machine learning and information retrieval applications that use rich metadata, e.g., titles, abstracts, authors, venues, reference lists, etc. The metadata extraction in CiteSeerx is done using automated techniques. Although fairly accurate, these techniques still result in noisy metadata. Since the performance of models trained on these data depends heavily on the quality of the data, we propose an approach to CiteSeerx metadata cleaning that incorporates information from an external data source. The result is a subset of CiteSeerx that is substantially cleaner than the entire set. Our goal is to make the new dataset available to the research community to facilitate future work in Information Retrieval.
A New Approach to Journal and Conference Name Disambiguation through K-Means Clustering of Internet and Document Surrogates
Bibliometrics has a long history in Information Science. The validity of any bibliometric analysis depends on accurate citations. We introduce an approach that combines author names and Internet document surrogates with K-means clustering to disambiguate journal and conference titles automatically. To evaluate the quality of this approach we used records from the Digital Bibliography & Library Project (DBLP). We found there are 2.54±1.52 authors per article. A manual analysis of 125 articles selected at random from the 1.18 million DBLP citations revealed only seven article pairs from the same publication venue. We describe the changes in cluster properties as the number of articles increases from 100 to 25,000. Our findings suggest that additional features are required to disambiguate journal and conference names accurately. As 60.86% of the DBLP articles are published at conferences, future efforts should focus on conference name disambiguation.
Citations: Indicators of Quality? The Impact Fallacy
We argue that citation is a composed indicator: short-term citations can be
considered as currency at the research front, whereas long-term citations can
contribute to the codification of knowledge claims into concept symbols.
Knowledge claims at the research front are more likely to be transitory and are
therefore problematic as indicators of quality. Citation impact studies focus
on short-term citation, and therefore tend to measure not epistemic quality,
but involvement in current discourses in which contributions are positioned by
referencing. We explore this argument using three case studies: (1) citations
of the journal Soziale Welt as an example of a venue that tends not to publish
papers at a research front, unlike, for example, JACS; (2) Robert Merton as a
concept symbol across theories of citation; and (3) the Multi-RPYS
("Multi-Referenced Publication Year Spectroscopy") of the journals
Scientometrics, Gene, and Soziale Welt. We show empirically that the
measurement of "quality" in terms of citations can further be qualified:
short-term citation currency at the research front can be distinguished from
longer-term processes of incorporation and codification of knowledge claims
into bodies of knowledge. The recently introduced Multi-RPYS can be used to
distinguish between short-term and long-term impacts. Comment: accepted for publication in Frontiers in Research Metrics and
Analysis; doi: 10.3389/frma.2016.0000
Rule based autonomous citation mining with TIERL
Citation management is an important task in managing digital libraries. Citations provide valuable information used, for example, in evaluating an author’s influence or scholarly quality (the impact factor of research journals). Although reliable and effective citation management is essential, manual citation management can be extremely costly. Automatic citation mining, on the other hand, is a non-trivial task, mainly due to non-conforming citation styles, spelling errors and the difficulty of reliably extracting text from PDF documents. In this paper we propose a novel rule-based autonomous citation mining technique to address this important task. We define a set of common heuristics that together improve on the state of the art in automatic citation mining. Moreover, by first disambiguating citations based on venues, our technique significantly enhances the correct discovery of citations. Our experiments show that the proposed approach is indeed able to overcome limitations of current leading citation indexes such as ISI Web of Knowledge, Citeseer and Google Scholar.
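To give a feel for what rule-based citation mining can look like, here is a minimal sketch that pulls a title, year, and page range out of one citation string with regular expressions. The rules, the `extract_fields` helper, and the sample citation are hypothetical illustrations; they are not the heuristics defined by TIERL, which the abstract does not list.

```python
import re

# Illustrative rules only; real citation styles need many more patterns.
YEAR_RULE = re.compile(r"\b(?:19|20)\d{2}\b")
PAGES_RULE = re.compile(r"pp?\.\s*(\d+)\s*[-\u2013]\s*(\d+)")
TITLE_RULE = re.compile(r'"([^"]+)"')   # assumes the title is double-quoted


def extract_fields(citation):
    """Apply each rule in turn; a field is None when its rule does not fire."""
    title = TITLE_RULE.search(citation)
    year = YEAR_RULE.search(citation)
    pages = PAGES_RULE.search(citation)
    return {
        # Strip punctuation that some styles place inside the quotes.
        "title": title.group(1).rstrip(",. ") if title else None,
        "year": year.group(0) if year else None,
        "pages": (int(pages.group(1)), int(pages.group(2))) if pages else None,
    }


# Made-up citation used purely as sample input.
sample = ('J. Smith and A. Jones, "Mining citations from PDF files," '
          'in Proc. Example Conf., 2009, pp. 12-34.')
fields = extract_fields(sample)
print(fields)
```

Returning None for a rule that does not fire keeps partial matches usable, which matters when citation styles do not conform to any single pattern.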