
    End-user Searching of Web Resources: Subject Needs and Zero-hits

    This study analyzed a log file capturing users' queries executed on the Web site of the University of Tennessee, Knoxville during March 1997. The purpose of the study was threefold: to understand what information needs the users of this Web site have, to investigate how successful these end-users are in searching for information, and to identify problems related to unsuccessful queries. Content analysis of each query focused on the type of information need and the type of error that caused a zero-hit result. Fifteen classes of information needs were identified based on content analysis of the queries; the most frequent queries were searches for an institutional unit and searches for academic information (accounting for 40.0% of the total queries). More than 33.5% of the queries were unsuccessful, as measured by zero-hit outcomes. Two types of errors that caused zero hits were identified: syntactic and semantic. Syntactic errors occurred more often than semantic errors (53.6% vs. 46.4%). The findings suggest that end-users of Web resources need guidance and help in performing searches. Syntactic errors may be corrected by the search engine automatically, while semantic errors require a better information representation scheme on the Web site.
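
    As one illustration of that last point, the following is a minimal sketch (not from the study) of how a site search engine might automatically retry a zero-hit query after correcting common syntactic errors; the normalization rules and the search() callable are hypothetical:

        import re

        def normalize_query(query: str) -> str:
            """Correct common syntactic errors in a query: casing, stray
            punctuation, unbalanced quotes, and extra whitespace
            (illustrative normalization steps, not the site's actual rules)."""
            q = query.strip().lower()
            q = re.sub(r"[^\w\s]", " ", q)      # drop punctuation and unbalanced quotes
            q = re.sub(r"\s+", " ", q).strip()  # collapse whitespace
            return q

        def search_with_fallback(query: str, search):
            """Run `search` (a hypothetical engine callable returning a list of
            hits); on a zero-hit result, retry once with the corrected query."""
            hits = search(query)
            if hits:
                return hits, query
            corrected = normalize_query(query)
            if corrected != query:
                hits = search(corrected)
            return hits, corrected  # semantic errors will still return zero hits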

    Revisiting the Data Lifecycle with Big Data Curation

    As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented. In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research. As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on data management and curation. We use the methodological framework of the data life cycle model, assessing two models developed in the context of Big Data projects and finding them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity. We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high-performance computing centers, as well as reproducibility in computational science. We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
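
    As a rough illustration of that mapping, here is a minimal sketch (our own, not the authors' artifact) of how the life cycle activities and the four characteristics could be paired with curation questions; the questions shown are placeholders rather than the paper's actual list:

        # Activities from the proposed Big Data life cycle model; describing the
        # data and assuring quality are treated as cross-cutting concerns.
        ACTIVITIES = ["planning", "acquiring", "preparing", "analyzing",
                      "preserving", "discovering"]
        CHARACTERISTICS = ["volume", "variety", "velocity", "veracity"]

        # Map each (activity, characteristic) pair to practitioner questions.
        # The questions below are placeholders, not the paper's own wording.
        questions = {
            ("planning", "volume"): ["What storage will the projected data volume require?"],
            ("acquiring", "velocity"): ["Can ingest keep pace with the data production rate?"],
            ("preparing", "variety"): ["How will heterogeneous formats be documented and harmonized?"],
            ("preserving", "veracity"): ["What provenance must be retained to support reproducibility?"],
        }

        def questions_for(activity: str) -> list[str]:
            """Collect the questions a practitioner should ask for one activity."""
            return [q for (act, _), qs in questions.items() if act == activity for q in qs]

        print(questions_for("planning"))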

    Cataloging for digital libraries: The TEI scheme and the TEI header

    This article describes the uses and advantages of the Text Encoding Initiative (TEI) guidelines for cataloging electronic texts. The TEI guidelines have been developed through an international and collaborative effort, and their applications in digital libraries such as the University of Virginia Electronic Text Center have required close collaboration between catalogers and humanities computing researchers. Detailed description and examples of the TEI header, a vehicle for meta-information written in SGML and the part of the TEI scheme most useful to librarians, are provided. Possible congruence between TEI headers and USMARC records implies that the granularity of the TEI header and the flexibility of the MARC record can be simultaneously improved.
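
    For readers unfamiliar with the header, the following is a minimal sketch of its required file description (fileDesc) with its titleStmt, publicationStmt, and sourceDesc children, built here as XML with Python's standard library rather than the SGML discussed in the article; the record content is invented:

        import xml.etree.ElementTree as ET

        # Minimal sketch of a TEI header's required fileDesc section.
        # Element names follow the TEI guidelines; the content is a made-up record.
        tei_header = ET.Element("teiHeader")
        file_desc = ET.SubElement(tei_header, "fileDesc")

        title_stmt = ET.SubElement(file_desc, "titleStmt")
        ET.SubElement(title_stmt, "title").text = "Example Electronic Text: a TEI edition"
        ET.SubElement(title_stmt, "author").text = "Doe, Jane"

        pub_stmt = ET.SubElement(file_desc, "publicationStmt")
        ET.SubElement(pub_stmt, "publisher").text = "Example Electronic Text Center"
        ET.SubElement(pub_stmt, "date").text = "1999"

        source_desc = ET.SubElement(file_desc, "sourceDesc")
        ET.SubElement(source_desc, "bibl").text = "Print source of the transcribed text"

        ET.indent(tei_header)  # pretty-print (Python 3.9+)
        print(ET.tostring(tei_header, encoding="unicode"))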

    Ontology Alignment with Mappings Published in the Purdue Research Repository

    ESIP has been collecting and hosting Earth and environmental ontologies in the ESIP Ontology Portal, but these ontologies have very different degrees of quality and curation. One way to ascertain this degree of quality is to locate terms with similar semantics between two ontologies and upload these mappings into the Portal. This work describes how we provide backend mappings between classes in ontologies using Agreement Maker Light, a winning algorithm from the Ontology Alignment Evaluation Initiative. We use the OWL equivalence relationship to discover similarities between concept labels. The mappings, in the form of RDF triples, are published in the Purdue University Research Repository and assigned Digital Object Identifiers.
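
    A minimal sketch, under assumed ontology namespaces and invented class pairs, of how such equivalence mappings can be expressed as owl:equivalentClass triples and serialized with rdflib; in practice, the matched classes come from the Agreement Maker Light alignment run rather than being listed by hand:

        from rdflib import Graph, Namespace
        from rdflib.namespace import OWL

        # Hypothetical namespaces standing in for two hosted ontologies.
        ONT_A = Namespace("http://example.org/ontologyA#")
        ONT_B = Namespace("http://example.org/ontologyB#")

        # Class pairs an alignment tool might report as equivalent (illustrative only).
        matches = [
            (ONT_A.SeaSurfaceTemperature, ONT_B.SST),
            (ONT_A.Precipitation, ONT_B.Rainfall),
        ]

        g = Graph()
        for class_a, class_b in matches:
            # Each mapping is published as an owl:equivalentClass triple.
            g.add((class_a, OWL.equivalentClass, class_b))

        print(g.serialize(format="turtle"))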

    A pilot “big data” education module curriculum for engineering graduate education: Development and implementation

    Projects in engineering higher education increasingly produce data whose volume, variety, velocity, and need for veracity are such that the output of the research is considered “Big Data”. While engineering faculty members do conceive of and direct the research producing this data, there may be gaps in their knowledge when training graduate and undergraduate research assistants in the practical management of Big Data. The project described in this research paper details the development of a Big Data education module for graduate researchers in Electrical and Computer Engineering. The project has the following objectives: to document and describe current data management practices within a specific research group; to identify gaps in knowledge that need to be addressed in order for research assistants to successfully manage Big Data; and to create curricular interventions to address these gaps. This paper details the motivation, relevant literature, research methodology, curricular intervention, and pilot presentation of the Big Data module. Results indicate that the fundamental concepts governing the management of Big Data have been covered only cursorily in previous coursework and that students need a comprehensive introduction to the topic, contextualized to the work that they are performing in the research or classroom environment.

    A pilot big data education modular curriculum for engineering graduate education: Development and implementation

    Engineering higher education increasingly produces data whose volume, variety, velocity, and need for veracity are such that the output of the research is considered “Big Data”. While engineering faculty members do conceive of and direct the research producing this data, there may be gaps in their knowledge when training graduate and undergraduate research assistants in the management of Big Data. The project described herein details the development of a Big Data education module for a group of graduate researchers and undergraduate research assistants in Electrical and Computer Engineering. This project has the following objectives: to document and describe current data management practices; to identify gaps in knowledge that need to be addressed in order for research assistants to successfully manage Big Data; and to create curricular interventions to address these gaps. This paper details the motivation, relevant literature, research methodology, curricular intervention, and pilot presentation of the module. Results indicate that, generally, students involved in Big Data projects need a comprehensive introduction to the topic, which will be most effective when contextualized to the work that they are performing in the research or classroom environment.

    Data Narratives: Increasing Scholarly Value

    Data narratives or data stories have emerged as a new form of scholarly communication focused on data. In this paper, we explore the potential value of data narratives and the requirements for data stories to enhance scholarly communication. We examine three types of data stories that form a continuum from less to more structured: the DataONE data stories, the Data Curation Profiles, and the Data Descriptors from the journal Scientific Data. We take the position that these data stories will increase the value of scholarly communication if they are linked to the datasets and to the publications that describe results, and if they have instructional value.

    A Rigorous Uncertainty-Aware Quantification Framework Is Essential for Reproducible and Replicable Machine Learning Workflows

    The ability to replicate predictions by machine learning (ML) or artificial intelligence (AI) models, and results from scientific workflows that incorporate such ML/AI predictions, is driven by numerous factors. An uncertainty-aware metric that can quantitatively assess the reproducibility of quantities of interest (QoI) would contribute to the trustworthiness of results obtained from scientific workflows involving ML/AI models. In this article, we discuss how uncertainty quantification (UQ) in a Bayesian paradigm can provide a general and rigorous framework for quantifying reproducibility for complex scientific workflows. Such a framework has the potential to fill a critical gap that currently exists in ML/AI for scientific workflows, as it will enable researchers to determine the impact of ML/AI model prediction variability on the predictive outcomes of ML/AI-powered workflows. We expect that the envisioned framework will contribute to the design of more reproducible and trustworthy workflows for diverse scientific applications, and ultimately, accelerate scientific discoveries.
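
    As a rough illustration of the general idea (not the framework proposed in the article), the sketch below compares posterior predictive samples of a QoI from two independent workflow runs via credible intervals and a simple overlap score; the samples and the overlap metric are invented for illustration:

        import numpy as np

        rng = np.random.default_rng(0)

        def credible_interval(samples: np.ndarray, level: float = 0.95):
            """Equal-tailed credible interval from posterior predictive samples."""
            lo, hi = np.quantile(samples, [(1 - level) / 2, 1 - (1 - level) / 2])
            return lo, hi

        def interval_overlap(a, b) -> float:
            """Fraction of the union of two intervals covered by their intersection:
            1.0 means the runs' credible intervals coincide, 0.0 means disjoint.
            (A simple stand-in for a rigorous reproducibility metric.)"""
            inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
            union = max(a[1], b[1]) - min(a[0], b[0])
            return inter / union if union > 0 else 1.0

        # Posterior predictive samples of the same QoI from two executions of an
        # ML-powered workflow (synthetic numbers for illustration only).
        qoi_run1 = rng.normal(loc=1.00, scale=0.05, size=10_000)
        qoi_run2 = rng.normal(loc=1.02, scale=0.06, size=10_000)

        ci1, ci2 = credible_interval(qoi_run1), credible_interval(qoi_run2)
        print(f"run 1 95% CI: {ci1}, run 2 95% CI: {ci2}")
        print(f"reproducibility score (interval overlap): {interval_overlap(ci1, ci2):.2f}")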