10,049 research outputs found
Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT) is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. Report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006Bekhuis; licensee BioMed Central Ltd
From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web
In the process of scientific research, many information objects are
generated, all of which may remain valuable indefinitely. However, artifacts
such as instrument data and associated calibration information may have little
value in isolation; their meaning is derived from their relationships to each
other. Individual artifacts are best represented as components of a life cycle
that is specific to a scientific research domain or project. Current cataloging
practices do not describe objects at a sufficient level of granularity nor do
they offer the globally persistent identifiers necessary to discover and manage
scholarly products with World Wide Web standards. The Open Archives
Initiative's Object Reuse and Exchange data model (OAI-ORE) meets these
requirements. We demonstrate a conceptual implementation of OAI-ORE to
represent the scientific life cycles of embedded networked sensor applications
in seismology and environmental sciences. By establishing relationships between
publications, data, and contextual research information, we illustrate how to
obtain a richer and more realistic view of scientific practices. That view can
facilitate new forms of scientific research and learning. Our analysis is
framed by studies of scientific practices in a large, multi-disciplinary,
multi-university science and engineering research center, the Center for
Embedded Networked Sensing (CENS).Comment: 28 pages. To appear in the Journal of the American Society for
Information Science and Technology (JASIST
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality of traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Modeling Faceted Browsing with Category Theory for Reuse and Interoperability
Faceted browsing (also called faceted search or faceted navigation) is an exploratory search model where facets assist in the interactive navigation of search results. Facets are attributes that have been assigned to describe resources being explored; a faceted taxonomy is a collection of facets provided by the interface and is often organized as sets, hierarchies, or graphs. Faceted browsing has become ubiquitous with modern digital libraries and online search engines, yet the process is still difficult to abstractly model in a manner that supports the development of interoperable and reusable interfaces. We propose category theory as a theoretical foundation for faceted browsing and demonstrate how the interactive process can be mathematically abstracted in order to support the development of reusable and interoperable faceted systems.
Existing efforts in facet modeling are based upon set theory, formal concept analysis, and light-weight ontologies, but in many regards they are implementations of faceted browsing rather than a specification of the basic, underlying structures and interactions. We will demonstrate that category theory allows us to specify faceted objects and study the relationships and interactions within a faceted browsing system. Resulting implementations can then be constructed through a category-theoretic lens using these models, allowing abstract comparison and communication that naturally support interoperability and reuse.
In this context, reuse and interoperability are at two levels: between discrete systems and within a single system. Our model works at both levels by leveraging category theory as a common language for representation and computation. We will establish facets and faceted taxonomies as categories and will demonstrate how the computational elements of category theory, including products, merges, pushouts, and pullbacks, extend the usefulness of our model. More specifically, we demonstrate that categorical constructions such as the pullback and pushout operations can help organize and reorganize facets; these operations in particular can produce faceted views containing relationships not found in the original source taxonomy. We show how our category-theoretic model of facets relates to database schemas and discuss how this relationship assists in implementing the abstractions presented.
We give examples of interactive interfaces from the biomedical domain to help illustrate how our abstractions relate to real-world requirements while enabling systematic reuse and interoperability. We introduce DELVE (Document ExpLoration and Visualization Engine), our framework for developing interactive visualizations as modular Web-applications in order to assist researchers with exploratory literature search. We show how facets relate to and control visualizations; we give three examples of text visualizations that either contain or interact with facets. We show how each of these visualizations can be represented with our model and demonstrate how our model directly informs implementation.
With our general framework for communicating consistently about facets at a high level of abstraction, we enable the construction of interoperable interfaces and enable the intelligent reuse of both existing and future efforts
Research Data: Who will share what, with whom, when, and why?
The deluge of scientific research data has excited the general public, as well as the scientific community, with the possibilities for better understanding of scientific problems, from climate to culture. For data to be available, researchers must be willing and able to share them. The policies of governments, funding agencies, journals, and university tenure and promotion committees also influence how, when, and whether research data are shared. Data are complex objects. Their purposes and the methods by which they are produced vary widely across scientific fields, as do the criteria for sharing them. To address these challenges, it is necessary to examine the arguments for sharing data and how those arguments match the motivations and interests of the scientific community and the public. Four arguments are examined: to make the results of publicly funded data available to the public, to enable others to ask new questions of extant data, to advance the state of science, and to reproduce research. Libraries need to consider their role in the face of each of these arguments, and what expertise and systems they require for data curation.
Data literacy conceptions, community capabilities
To delineate and describe the data literacy concept, a core set of data competencies are identified, and a further four varieties – each with a different focus of attention – are described. These are named research (an academic focus), classroom (a secondary educational focus), carpentry (a practical training focus) and inclusion (a community development focus). It is argued that the inclusion focus helps data literacy to be construed as a community capability. The capabilities approach enables us to see data literacy as conferring a holistic freedom to operate, where the technical gains only make sense within a framework of social functions and goals
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.Comment: Major revision, to appear in SIAM Revie
Aerospace medicine and biology: A continuing bibliography with indexes (supplement 349)
This bibliography lists 149 reports, articles and other documents introduced into the NASA Scientific and Technical Information System during April, 1991. Subject coverage includes: aerospace medicine and psychology, life support systems and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance
- …