
Search Facets and Ranking in Geospatial Dataset Search

Authors: Hervey, Thomas; Kuhn, Werner; Lafia, Sara
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th International Conference on Geographic Information Science (GIScience 2021) - Part I
Publication date: 01/01/2020

Dagstuhl Research Online Publication Server

A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature

Authors: Fan, Lizhou; Hemphill, Libby; Lafia, Sara
Publication venue: Wiley
Publication date: 23/05/2022

Discovering authoritative links between publications and the datasets that they use can be a labor-intensive process. We introduce a natural language processing pipeline that retrieves and reviews publications for informal references to research datasets, which complements the work of data librarians. We first describe the components of the pipeline and then apply it to expand an authoritative bibliography linking thousands of social science studies to the data-related publications in which they are used. The pipeline increases recall when identifying literature to review for inclusion in data-related collections of publications and makes it possible to detect informal data references at scale. We contribute (1) a novel Named Entity Recognition (NER) model that reliably detects informal data references and (2) a dataset connecting items from social science literature with the datasets they reference. Together, these contributions enable future work on data reference, data citation networks, and data reuse. Comment: 13 pages, 7 figures, 3 tables.

arXiv.org e-Print Archive

Deep Blue Documents

Spatial Discovery and the Research Library: Linking Research Datasets and Documents

Author: Lafia, Sara Katherine
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Academic libraries have always supported research across disciplines by integrating access to diverse content and resources. They now have the opportunity to reinvent their role in facilitating interdisciplinary work by offering researchers new ways of sharing, curating, discovering, and linking research data. Spatial data and metadata support this process because location often integrates disciplinary perspectives, enabling researchers to make their own research data more discoverable, to discover the data of other researchers, and to integrate data from multiple sources. The Center for Spatial Studies at the University of California, Santa Barbara (UCSB) and the UCSB Library are undertaking joint research to better enable the discovery of research data and publications. The research addresses the question of how to spatially enable data discovery in a setting that allows for mapping and analysis in a GIS while connecting the data to publications about them. It suggests a framework for an integrated data discovery mechanism and shows how publications may be linked to associated datasets exposed either directly or through metadata on Esri’s Open Data platform. The results demonstrate a simple form of linking data to publications through spatially referenced metadata and persistent identifiers. This linking adds value to research products and increases their discoverability across disciplinary boundaries. Current data publishing practices in academia result in datasets that are not easily discovered, hard to integrate across domains, and typically not linked to publications about them. For example, discovering that two datasets, such as archaeological observations and specimen data collections, share a spatial extent in Mesoamerica is not currently supported, nor is it easy to get from those datasets to relevant publications or other documents. In our previous work, we developed a basic linked metadata model relating spatially referenced datasets to documents.
The research reported here applies the model to a collection of spatially referenced researcher datasets, capturing metadata and encoding them as linked open data. We use existing RDF vocabularies to triplify the metadata, to make them spatially explicit, and to link them thematically. Our latest research has produced a simple and extensible method for exposing metadata of research objects as a library service and for spatially integrating collections across repositories.

EZID

eScholarship - University of California
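The "Spatial Discovery and the Research Library" abstract describes using existing RDF vocabularies to triplify dataset metadata, make it spatially explicit, and link it to publications. A minimal sketch of that idea follows, using only the Python standard library to emit N-Triples. The choice of Dublin Core Terms (`dcterms:title`, `dcterms:isReferencedBy`) and OGC GeoSPARQL (`geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral`) is an assumption, and the helper functions `iri`, `literal`, and `triplify` and all example URIs are hypothetical; this is not the linked metadata model described in the paper.

```python
from typing import List, Optional

# Illustrative vocabularies (assumption): Dublin Core Terms and OGC GeoSPARQL.
DCT = "http://purl.org/dc/terms/"
GEO = "http://www.opengis.net/ont/geosparql#"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Quote a literal, escaping backslashes and double quotes; optionally attach a datatype."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def triplify(dataset_uri: str, title: str, wkt: str, publication_uri: str) -> List[str]:
    """Return N-Triples lines giving a dataset a title, a WKT footprint, and a linked publication."""
    geometry_uri = dataset_uri + "#geometry"  # hypothetical node for the spatial extent
    triples = [
        (iri(dataset_uri), iri(DCT + "title"), literal(title)),
        (iri(dataset_uri), iri(GEO + "hasGeometry"), iri(geometry_uri)),
        (iri(geometry_uri), iri(GEO + "asWKT"), literal(wkt, GEO + "wktLiteral")),
        (iri(dataset_uri), iri(DCT + "isReferencedBy"), iri(publication_uri)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # Hypothetical identifiers and a rough bounding polygon, for illustration only.
    for line in triplify(
        "https://example.org/dataset/archaeology-obs",
        "Archaeological observations",
        "POLYGON((-92 14, -86 14, -86 22, -92 22, -92 14))",
        "https://example.org/publication/1",
    ):
        print(line)
```

Two datasets described this way could then be compared by their `geo:asWKT` footprints in a GeoSPARQL-capable triple store, which is one possible route to the "shared spatial extent" question the abstract raises.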

DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization

Authors: Fan, Lizhou; Hemphill, Libby; Lafia, Sara; Li, Lingyao; Yang, Fangyuan
Publication date: 26/05/2023

Data users need context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter-university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine-readability of data and its documentation. There are opportunities to enhance dataset search by improving users' ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot-based search system, DataChat, that leverages a graph database and a large language model to provide new ways for users to interact with and search for research data. DataChat complements data archives' and institutional repositories' ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data. Comment: 6 pages, 2 figures, and 1 table. Accepted to the 86th Annual Meeting of the Association for Information Science & Technology.

arXiv.org e-Print Archive

How and Why do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles

Authors: Bleckley, David; Hemphill, Libby; Lafia, Sara; Moss, Elizabeth; Thomer, Andrea
Publication date: 16/02/2023

Data reuse is a common practice in the social sciences. While published data play an essential role in the production of social science research, they are not consistently cited, which makes it difficult to assess their full scholarly impact and give credit to the original data producers. Furthermore, it can be challenging to understand researchers' motivations for referencing data. Like references to academic literature, data references perform various rhetorical functions, such as paying homage, signaling disagreement, or drawing comparisons. This paper studies how and why researchers reference social science data in their academic writing. We develop a typology to model relationships between the entities that anchor data references, along with their features (access, actions, locations, styles, types) and functions (critique, describe, illustrate, interact, legitimize). We illustrate the use of the typology by coding multidisciplinary research articles (n=30) referencing social science data archived at the Inter-university Consortium for Political and Social Research (ICPSR). We show how our typology captures researchers' interactions with data and purposes for referencing data. Our typology provides a systematic way to document and analyze researchers' narratives about data use, extending our ability to give credit to data that support research. Comment: 35 pages, 2 appendices, 1 table.

arXiv.org e-Print Archive

The University of Arizona

Enabling the discovery of thematically related research objects with systematic spatializations

Authors: Kuhn, Werner; Lafia, Sara; Last, Christina
Publication date: 01/01/2019

It is challenging for scholars to discover thematically related research in a multidisciplinary setting, such as that of a university library. In this work, we use spatialization techniques to convey the relatedness of research themes without requiring scholars to have specific knowledge of disciplinary search terminology. We approach this task conceptually by revisiting existing spatialization techniques and reframing them in terms of core concepts of spatial information, highlighting their different capacities. To apply our design, we spatialize master's and doctoral theses (two kinds of research objects available through a university library repository), using topic modeling to assign a relatively small number of research topics to the objects. We discuss and implement two distinct spaces for exploration: a field view of research topics and a network view of research objects. We find that each space enables distinct visual perceptions and questions about the relatedness of research themes. A field view enables questions about the distribution of research objects in the topic space, while a network view enables questions about connections between research objects or about their centrality. Our work contributes a systematic choice of spaces, informed by core concepts of spatial information, to spatialization theory. Its application to the design of library discovery tools offers two distinct and intuitive ways to gain insights into the thematic relatedness of research objects, regardless of the disciplinary terms used to describe them.

Dagstuhl Research Online Publication Server

eScholarship - University of California

Explore Bristol Research
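The systematic spatializations abstract describes assigning a small number of topics to each thesis with topic modeling and then building a network view of research objects that supports questions about connections and centrality. Assuming topic mixtures have already been inferred by some topic model (the vectors in the usage lines are made up), one simple way to sketch such a network view is to link objects whose topic mixtures are similar and then compute degree centrality. Cosine similarity, the 0.8 threshold, degree centrality, and the function names `topic_network` and `degree_centrality` are illustrative assumptions, not necessarily the choices made in the paper.

```python
import math
from typing import Dict, List, Set


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two topic-mixture vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_network(doc_topics: Dict[str, List[float]], threshold: float = 0.8) -> Dict[str, Set[str]]:
    """Link research objects whose topic mixtures meet a (hypothetical) similarity threshold."""
    names = list(doc_topics)
    edges: Dict[str, Set[str]] = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(doc_topics[a], doc_topics[b]) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges


def degree_centrality(edges: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the other nodes each research object is directly linked to."""
    n = len(edges)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0) for node, nbrs in edges.items()}


if __name__ == "__main__":
    # Made-up three-topic mixtures for three hypothetical theses.
    theses = {
        "thesis_a": [0.7, 0.2, 0.1],
        "thesis_b": [0.6, 0.3, 0.1],
        "thesis_c": [0.1, 0.1, 0.8],
    }
    network = topic_network(theses)
    print(network)
    print(degree_centrality(network))
```

With these made-up mixtures, the first two theses share a dominant topic and end up linked, while the third stays isolated. A field view would instead place the same vectors in a continuous topic space rather than drawing edges.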

How do properties of data, their curation, and their funding relate to reuse?

Authors: Akmon, Dharma; Bleckley, David; Hemphill, Libby; Lafia, Sara; Pienta, Amy
Publication date: 17/06/2021

Despite large public investments in facilitating the secondary use of data, there is little information about the specific factors that predict data’s reuse. Using data download logs from the Inter-university Consortium for Political and Social Research (ICPSR), this study examines how data properties, curation decisions, and repository funding models relate to data reuse. We find that datasets deposited by institutions, subject to many curatorial tasks, and whose access and preservation are funded externally are used more often. Our findings confirm that investments in data collection, curation, and preservation are associated with more data reuse. Funding: National Science Foundation grant 1930645 (LH, AP, DA); Institute of Museum and Library Services grant LG-37-19-0134-19 (LH, DA); National Institute on Drug Abuse contract number N01DA-14-5576 (AP). Full text: http://deepblue.lib.umich.edu/bitstream/2027.42/168212/5/Hemphill et al Data downloads.pdf

PubMed Central

Deep Blue Documents

How and Why Do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles

Authors: Bleckley, David; Hemphill, Libby; Lafia, Sara; Moss, Elizabeth; Thomer, Andrea
Publication venue: Informatic and Data Science Journal STMIK Muhammadiyah Banten

Data reuse is a common practice in the social sciences. It can be difficult to understand the motivations for referencing data. This article studies how and why researchers reference scientific data in their academic writing. We illustrate the use of the typology by coding multidisciplinary research articles. The typology offers a systematic way to document and analyze researchers' narratives.

Bibliothèque numérique de l'enssib

Detecting Informal Data References in Academic Literature

Authors: Hemphill, Libby; Kim, Jinseok; Ko, Jeong-Woo; Lafia, Sara; Moss, Elizabeth; Thomer, Andrea
Publication date: 22/07/2021

The Inter-university Consortium for Political and Social Research (ICPSR) is developing a machine learning approach using natural language processing (NLP) to assist in the detection of informal data references. Formal data citations that reference unique identifiers are readily discoverable; however, informal references indicating research data reuse are challenging to infer and detect. We contribute a model that uses a combination of cues, such as the presence of indicator terms and syntactic patterns, to assign a likelihood score to dataset mentions and extract candidate data citations from academic text. In production, the model will support the evaluation of candidate documents for ingest into the ICPSR Bibliography of Data-related Literature. This work supports a larger effort to measure the impact of research data. Preprint: http://deepblue.lib.umich.edu/bitstream/2027.42/168392/1/Detecting_Informal_Data_Refs.pdf

Deep Blue Documents
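The "Detecting Informal Data References" abstract describes combining cues, such as indicator terms and syntactic patterns, into a likelihood score for dataset mentions and then extracting candidates. The toy, rule-based sketch below shows only that cue-combination idea. The indicator terms, regular expressions, weights, threshold, and the function names `mention_score` and `candidate_mentions` are all invented for illustration. The abstract describes ICPSR's model as a machine learning approach, so this hand-weighted scorer is a stand-in, not that model.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical indicator terms and weights (not taken from the ICPSR model).
INDICATOR_TERMS = {"data": 0.2, "dataset": 0.3, "survey": 0.3, "study": 0.1, "panel": 0.2}

# Hypothetical syntactic cue: "data/survey/dataset from|of the <Capitalized Name>".
SOURCE_PATTERN = re.compile(
    r"\b(?:data|survey|dataset)s?\s+(?:from|of)\s+the\s+[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*"
)

# Hypothetical syntactic cue: a capitalized multi-word name followed by an acronym, e.g. "... Survey (GSS)".
ACRONYM_PATTERN = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\s+\([A-Z]{2,}\)")


def mention_score(sentence: str) -> float:
    """Sum the weights of the cues present in a sentence, capped at 1.0."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    score = sum(INDICATOR_TERMS.get(token, 0.0) for token in tokens)
    if SOURCE_PATTERN.search(sentence):
        score += 0.4
    if ACRONYM_PATTERN.search(sentence):
        score += 0.3
    return min(score, 1.0)


def candidate_mentions(sentences: Iterable[str], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (sentence, score) pairs whose score meets a hypothetical threshold."""
    scored = [(sentence, mention_score(sentence)) for sentence in sentences]
    return [(sentence, score) for sentence, score in scored if score >= threshold]


if __name__ == "__main__":
    text = [
        "We use data from the General Social Survey (GSS) to measure trust.",
        "The weather was pleasant throughout the conference.",
    ]
    for sentence, score in candidate_mentions(text):
        print(f"{score:.2f}  {sentence}")
```

In a learned model, these cues would typically become features whose weights are estimated from labeled examples rather than set by hand.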


Designing for Serendipity: Research Data Curation in Topic Spaces

Author: Lafia, Sara Katherine
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Researchers seeking relevant data across disciplines confront the challenge of navigating technical descriptions. How can curation support the serendipitous discovery of related research data? Everyday spaces like bookshelves are designed to support browsing and exploration by placing similar resources closer together. Space and time are foundational ordering relations for knowledge organization. I ask how this ordering, which is well-established in the geographic context, can be translated to locate and organize research data in abstract topic spaces. This dissertation develops methods for making the latent topics of research metadata explicit. These methods produce spatial configurations where related research topics are co-located in neighborhoods. This co-location has the potential to support serendipitous discovery by offering researchers ways to discover related data. I test this notion in three studies that develop topic spaces for research data curation. The first part of this dissertation, Chapter 2, focuses on supporting research data discovery with a common terminology. I develop a crosscutting base vocabulary of geospatial topics to help users discover related government data in a ubiquitous open civic data platform. Semantic annotation expands search terms by mapping users’ vernacular onto the language of metadata. In the second part of this dissertation, I shift from addressing terminological search to supporting spatial curation by developing topic spaces. In Chapter 3, I develop two kinds of topic spaces for curating research theses and dissertations: landscapes and networks. I use topic modeling to determine the latent semantic similarity of research metadata and then produce topic spaces from these using spatialization techniques. In Chapter 4, I spatialize an institute’s multidisciplinary body of research, producing topic maps at two distinct levels of detail.
Emerging spatial patterns, like centrality and proximity, support high-level narratives about cross-disciplinary research activities that complement the quantitative metrics currently cited in reviews of institutional research. Together, these three studies demonstrate strategies for developing topic spaces in which diverse, yet related, multidisciplinary research data are curated. Future research will extend these methods by tracing the impact of specific curatorial actions contributing to research data discovery and reuse.

eScholarship - University of California
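The "Designing for Serendipity" abstract notes that semantic annotation expands search terms by mapping users' vernacular onto the language of metadata. As a hedged illustration of that idea only, the sketch below performs simple dictionary-based query expansion. The mapping `VERNACULAR_TO_METADATA`, its example terms, and the functions `expand_query` and `matches` are hypothetical and are not the base vocabulary of geospatial topics developed in the dissertation.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from everyday search terms to metadata vocabulary terms.
VERNACULAR_TO_METADATA: Dict[str, List[str]] = {
    "potholes": ["street maintenance", "road condition"],
    "parks": ["recreation", "open space"],
    "crime": ["public safety", "police incidents"],
}


def expand_query(query: str, mapping: Dict[str, List[str]] = VERNACULAR_TO_METADATA) -> List[str]:
    """Return the query's own terms followed by any mapped metadata terms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for mapped in mapping.get(term, []):
            if mapped not in expanded:
                expanded.append(mapped)
    return expanded


def matches(record_keywords: Iterable[str], query_terms: Iterable[str]) -> bool:
    """True if any (expanded) query term equals one of a record's metadata keywords."""
    keywords = {keyword.lower() for keyword in record_keywords}
    return any(term in keywords for term in query_terms)


if __name__ == "__main__":
    terms = expand_query("Parks near me")
    print(terms)
    # A hypothetical civic dataset tagged only with formal metadata terms.
    print(matches(["Open Space", "Zoning"], terms))
```

Without expansion, a search for "parks" would miss a record tagged only "Open Space". That gap between vernacular and metadata terms is the one the abstract says semantic annotation addresses.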