573 research outputs found
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
The classification of gene products in the molecular biology domain: Realism, objectivity, and the limitations of the Gene Ontology
Background: Controlled vocabularies in the molecular biology domain exist to facilitate data integration across database resources. One such tool is the Gene Ontology (GO), a classification designed to act as a universal index for gene products from any species. The Gene Ontology is used extensively in annotating gene products and analysing gene expression data, yet very little research exists from a library and information science perspective exploring the design principles, philosophy and social role of ontologies in biology.
Aim: To explore how molecular biologists, in creating the Gene Ontology, devised guidelines and rules for determining which scientific concepts are included in the ontology, and the criteria for how these concepts are represented.
Methods: A domain analysis approach was used to devise a mixed methodology to study the design of the Gene Ontology. Concept analysis of a GO term and a critical discourse analysis of GO developer mailing list texts were used to test whether ontological realism is a tenable basis for constructing objective ontologies. A comparison of the current GO vocabulary construction guidelines and a study of the reasons why GO terms are removed from the ontology further explored the justifications for the design of the Gene Ontology. Finally, a content analysis of published GO papers examined how authors use and cite GO data and terminology.
Results: Gene Ontology terms can be presented according to different epistemologies for concepts, indicating that ontological realism is not the only way objective ontologies can be designed. Social roles and the exercise of power were found to play an important role in determining ontology content, and poor synonym control, a lack of clear warrant for deciding terminology and arbitrary decisions to delete and invent new terms undermine the objectivity and universal applicability of the Gene Ontology. Authors exhibited poor compliance with GO data citation policies, and in re-wording and misquoting GO terminology, risk exacerbating the semantic problems this controlled vocabulary was designed to solve.
Conclusions: The failure of the Gene Ontology to define what is meant by a molecular function, the exercise of power by GO developers in clearing contentious concepts from the ontology, and the strict adherence to ontological realism, which marginalises social and subjective ways of classifying scientific concepts, limit the utility of the ontology as a tool to unify the molecular biology domain. These limitations to the Gene Ontology design could be overcome with the development of lighter, pluralistic, user-controlled "open ontologies" for gene products that can work alongside more traditional, "top-down" developed vocabularies.
A meta-information structure for representing arguments in science text
The research for this thesis has been concerned with defining and demonstrating the existence of certain semantic elements in English natural language science text which can be called meta-information. Meta-information is described as being the organisational, rather than the conceptual, properties of an author's 'message' in text. Conceptual information is that subject-related output from a document which readers assimilate or synthesise with their current state-of-knowledge. Meta-information reflects the organisation or structural format used by an author to present conceptual information for transfer from text to readers. The example used here to demonstrate the existence of meta-information is a format for the presentation of empirical argument in science text. At its simplest, a meta-informational element could be a report section-heading like INTRODUCTION, which describes (we assume) the contents of the subsequent text. At a lower level of analysis, the phrase 'This paper describes' contains some semantic inference that the complete statement is one of an introductory nature; therefore, such a statement could be labelled as one of INTRODUCTION for meta-informational purposes. A 'grammar', or set of meta-informational elements, has been developed as a means of identifying certain semantic aspects of text. This grammar is based on some experimental evidence and the consensus view of readers and writers of science text who produced what has been called a conventional format for empirical argument presentation. An initial set of rules for implementing this grammar has also been developed. The rules have been tested for replicability with positive results. Although analysis of full text has shown deviation from a 'conventional' argument structure, readers' summaries of the same text conform to this structure.
Thus, a model of the phenomenon of information transfer from text to readers, which includes a structural transformation process based on the experimental results, has been built. A computer simulation is given to demonstrate the model in an interactive program-user system designed to produce summaries of whole text. The thesis is that evidence exists for the presence of meta-information in science text and that if a grammar appropriate to the kind of output information required by users is built, highly structured text could be produced so that the process of information transfer is optimised.
Purposive variation in recordkeeping in the academic molecular biology laboratory
This thesis presents an investigation into the role played by laboratory records in the disciplinary discourse of academic molecular biology laboratories.
The motivation behind this study stems from two areas of concern. Firstly, the laboratory record has received comparatively little attention as a linguistic genre in spite of its central role in the daily work of laboratory scientists. Secondly, laboratory records have become a focus for technologically driven change through the advent of computing systems that aim to support a transition away from the traditional paper-based approach towards electronic recordkeeping. Electronic recordkeeping raises the potential for increased sharing of laboratory records across laboratory communities. However, the uptake of electronic laboratory notebooks has been, and remains, markedly low in academic laboratories.
The investigation employs a multi-perspective research framework combining ethnography, genre analysis, and reading protocol analysis in order to evaluate both the organizational practices and linguistic practices at work in laboratory recordkeeping, and to examine these practices from the viewpoints of both producers and consumers of laboratory records. Particular emphasis is placed on assessing variation in the practices used by different scientists when keeping laboratory records, and on assessing the types of articulation work used to achieve mutual intelligibility across laboratory members.
The findings of this investigation indicate that the dominant viewpoint held by laboratory staff other than principal investigators conceptualized laboratory records as a personal resource rather than a community archive. Readers other than the original author relied almost exclusively on the recontextualization of selected information from laboratory records into "public genres" such as laboratory talks, research articles, and progress reports as the preferred means of accessing the information held in the records. The consistent use of summarized forms of recording experimental data rendered most laboratory records as both unreliable and of limited usability in the records management sense that they did not form full and accurate descriptions that could support future organizational activities.
These findings offer a counterpoint to other studies, notably a number of studies undertaken as part of technology developments for electronic recordkeeping, that report sharing of laboratory records or assume a "cyberbolic" view of laboratory records as a shared resource.
Engines of Order
Over the last decades, and in particular since the widespread adoption of the Internet, encounters with algorithmic procedures for "information retrieval" (the activity of getting some piece of information out of a collection or repository of some kind) have become everyday experiences for most people in large parts of the world.
Study on open science: The general state of the play in Open Science principles and practices at European life sciences institutes
Open science is currently a prominent topic at all levels and is also one of the priorities of the European Research Area. Components that are commonly associated with open science are open access, open data, open methodology, open source, open peer review, open science policies and citizen science. Open science may have great potential to connect and influence the practices of researchers, funding institutions and the public. In this paper, we evaluate the level of openness based on public surveys at four European life sciences institutes.
Third International Symposium on Artificial Intelligence, Robotics, and Automation for Space 1994
The Third International Symposium on Artificial Intelligence, Robotics, and Automation for Space (i-SAIRAS 94), held October 18-20, 1994, in Pasadena, California, was jointly sponsored by NASA, ESA, and Japan's National Space Development Agency, and was hosted by the Jet Propulsion Laboratory (JPL) of the California Institute of Technology. i-SAIRAS 94 featured presentations covering a variety of technical and programmatic topics, ranging from underlying basic technology to specific applications of artificial intelligence and robotics to space missions. i-SAIRAS 94 featured a special workshop on planning and scheduling and provided scientists, engineers, and managers with the opportunity to exchange theoretical ideas, practical results, and program plans in such areas as space mission control, space vehicle processing, data analysis, autonomous spacecraft, space robots and rovers, satellite servicing, and intelligent instruments