Biomedical ontology alignment: An approach based on representation learning
While representation learning techniques have shown great promise in a number of NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work, which has focused on feature engineering, we present a novel representation learning approach tailored to the ontology matching task. Our approach embeds ontological terms in a high-dimensional Euclidean space. The embedding is derived through a novel phrase retrofitting strategy by which semantic similarity information is inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism, based on a denoising autoencoder, that is shown to improve performance. An ontology matching system derived from the proposed framework achieved an F-score of 94% on an alignment scenario with the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best-performing systems in the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to the NCI Thesaurus and to SNOMED CT, based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% on these experiments, achieving state-of-the-art results.
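The retrofitting idea can be illustrated with a minimal pure-Python sketch. It assumes the generic word-vector retrofitting update (as in Faruqui et al., 2015), in which each vector is iteratively replaced by a weighted average of its original pre-trained value and the current values of its neighbours in a similarity graph. The paper's phrase retrofitting strategy is a variant whose details are not given here, so this is not the authors' method; the function `retrofit`, the toy graph, and the weights `alpha` and `beta` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def retrofit(
    vectors: Dict[str, Vector],
    graph: Dict[str, List[str]],
    alpha: float = 1.0,
    beta: float = 1.0,
    iterations: int = 10,
) -> Dict[str, Vector]:
    """Pull each vector toward its neighbours in a similarity graph.

    vectors: pre-trained embeddings (the input dict is left unchanged).
    graph: hypothetical similarity lexicon, e.g. synonyms of an ontology term.
    alpha: weight keeping a vector close to its original pre-trained value.
    beta: weight given to each neighbour (uniform here, an assumption).
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in graph.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in vectors or not nbrs:
                continue  # no embedding or no usable neighbours: skip
            denom = alpha + beta * len(nbrs)
            # Weighted average of the original vector and the neighbours'
            # current vectors, computed coordinate by coordinate.
            new[word] = [
                (alpha * vectors[word][k] + beta * sum(new[n][k] for n in nbrs)) / denom
                for k in range(len(vectors[word]))
            ]
    return new
```

For example, with `{"heart": [0.0, 0.0], "cardiac": [1.0, 1.0]}` linked to each other and a single iteration, "heart" moves to `[0.5, 0.5]` and then "cardiac" to `[0.75, 0.75]`, so terms declared similar end up closer in the embedding space while `alpha` keeps them anchored to their pre-trained values.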
Global integration of public sector information
This paper deals with technological methods for consolidating asset lists of public sector information (PSI) available for re-use. To this end, we review the state of the art in delivering access to PSI throughout the world and prioritize the work needed to join the available PSI catalogues. We propose an architectural framework grounded in Semantic Web technologies to deliver a global platform for federated searching. A speculative survey of available PSI portals is presented, and the initial implementation, results, and analysis of the proposed architecture are covered in detail.
Crowdsourcing Linked Data on listening experiences through reuse and enhancement of library data
Research has approached the practice of musical reception in a multitude of ways, such as the analysis of professional critique, sales figures and the psychological processes activated by the act of listening. Studies in the Humanities, on the other hand, have been hindered by the lack of structured evidence of actual listening experiences as reported by the listeners themselves, a concern voiced since the early Web era. It was assumed, however, that such evidence existed, albeit in purely textual form, but that it could not be leveraged until it was digitised and aggregated. The Listening Experience Database (LED) responds to this research need by providing a centralised hub for evidence of listening in the literature. Not only does LED support search and reuse across nearly 10,000 records, but it also provides machine-readable structured data describing the contexts of listening. To take advantage of the mass of formal knowledge about these contexts that already exists on the Web, the entire framework adopts Linked Data principles and technologies. This also allows LED to directly reuse open data from the British Library for source documentation that has already been published. Reused data are re-published as open data with enhancements obtained by extending the model of the original data, such as the partitioning of published books and collections into individual stand-alone documents. The database was populated through crowdsourcing and seamlessly incorporates data reuse from the earliest phases of data entry. As the sources of the evidence often contain vague, fragmentary or uncertain information, facilities were put in place to generate structured data from such fuzziness.
In addition to elaborating on these functionalities, this article provides insights into the most recent features of the latest instalment of the dataset and portal, such as interlinking with the MusicBrainz database, the relaxation of geographical input constraints through text mining, and the plotting of key locations in an interactive geographical browser.
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is highly motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data of Plasmodium falciparum, but also using the millions of genomic data entries from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences; 2) the high-throughput reconstruction of molecular phylogenies; 3) the representation of biological processes, particularly metabolic pathways; 4) versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments; and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
An algebra of qualitative taxonomical relations for ontology alignments
Algebras of relations have been shown to be useful in managing ontology alignments. They make it possible to aggregate alignments disjunctively or conjunctively and to propagate alignments within a network of ontologies. The previously considered algebra of relations contains taxonomical relations between classes. However, compositional inference using this algebra is sound only if we assume that the classes occurring in alignments have nonempty extensions. Moreover, this algebra covers relations only between classes. Here we introduce a new algebra of relations that, first, overcomes the limitation of the previous one and, second, incorporates all qualitative taxonomical relations that occur between individuals and concepts, including the relations "is a" and "is not". We prove that this algebra is coherent with respect to the simple semantics of alignments.
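Disjunctive aggregation, conjunctive aggregation, and composition can be sketched in Python. The sketch assumes that the earlier algebra uses five base relations between classes (equivalence, subsumed-by, subsumes, overlap, disjointness) with an RCC5-style composition table; the ASCII symbols, the table encoding, and the functions `compose`, `meet` and `join` are hypothetical. It illustrates the earlier algebra, including its reliance on nonempty extensions, and not the new algebra introduced in the paper.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical ASCII stand-ins for the five base relations between classes.
EQ, LT, GT, OV, DJ = "=", "<", ">", "%", "!"  # equiv, subsumed-by, subsumes, overlap, disjoint
ALL: FrozenSet[str] = frozenset({EQ, LT, GT, OV, DJ})

# COMPOSE[(r, s)]: possible relations between x and z when x r y and y s z,
# assuming every class has a nonempty extension.
COMPOSE: Dict[Tuple[str, str], FrozenSet[str]] = {
    (LT, LT): frozenset({LT}),
    (LT, GT): ALL,
    (LT, OV): frozenset({LT, OV, DJ}),
    (LT, DJ): frozenset({DJ}),
    # x and z both contain y; disjointness is excluded only because y is nonempty.
    (GT, LT): frozenset({EQ, LT, GT, OV}),
    (GT, GT): frozenset({GT}),
    (GT, OV): frozenset({GT, OV}),
    (GT, DJ): frozenset({GT, OV, DJ}),
    (OV, LT): frozenset({LT, OV}),
    (OV, GT): frozenset({GT, OV, DJ}),
    (OV, OV): ALL,
    (OV, DJ): frozenset({GT, OV, DJ}),
    (DJ, LT): frozenset({LT, OV, DJ}),
    (DJ, GT): frozenset({DJ}),
    (DJ, OV): frozenset({LT, OV, DJ}),
    (DJ, DJ): ALL,
}
# Equivalence is the identity element of composition.
for _r in ALL:
    COMPOSE[(EQ, _r)] = frozenset({_r})
    COMPOSE[(_r, EQ)] = frozenset({_r})


def compose(r1: Iterable[str], r2: Iterable[str]) -> FrozenSet[str]:
    """Compose two disjunctive relations (sets of base relations)."""
    result: Set[str] = set()
    for a, b in product(r1, r2):
        result |= COMPOSE[(a, b)]
    return frozenset(result)


def meet(r1: Iterable[str], r2: Iterable[str]) -> FrozenSet[str]:
    """Conjunctive aggregation: keep only relations allowed by both alignments."""
    return frozenset(r1) & frozenset(r2)


def join(r1: Iterable[str], r2: Iterable[str]) -> FrozenSet[str]:
    """Disjunctive aggregation: allow any relation allowed by either alignment."""
    return frozenset(r1) | frozenset(r2)
```

For instance, if x subsumes y and y is subsumed by z, `compose({GT}, {LT})` yields `{=, <, >, %}`. Disjointness is ruled out only because y is assumed to have a nonempty extension, which is the soundness condition on the earlier algebra described above.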
Initial Implementation of a Comparative Data Analysis Ontology
Comparative analysis is used throughout biology. When entities under comparison (e.g. proteins, genomes, species) are related by descent, evolutionary theory provides a framework that, in principle, allows N-ary comparisons of entities while controlling for non-independence due to relatedness. Powerful software tools exist for specialized applications of this approach, yet it remains under-utilized in the absence of a unifying informatics infrastructure. A key step in developing such an infrastructure is the definition of a formal ontology. The analysis of use cases and existing formalisms suggests that a significant component of evolutionary analysis involves a core problem of inferring a character history, relying on key concepts: “Operational Taxonomic Units” (OTUs), representing the entities to be compared; “character-state data”, representing the observations compared among OTUs; “phylogenetic tree”, representing the historical path of evolution among the entities; and “transitions”, the inferred evolutionary changes in states of characters that account for observations. Using the Web Ontology Language (OWL), we have defined these and other fundamental concepts in a Comparative Data Analysis Ontology (CDAO). CDAO has been evaluated for its ability to represent token data sets and to support simple forms of reasoning. With further development, CDAO will provide a basis for tools (for semantic transformation, data retrieval, validation, integration, etc.) that make it easier for software developers and biomedical researchers to apply evolutionary methods of inference to diverse types of data, so as to integrate this powerful framework for reasoning into their research.
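The core problem of inferring a character history can be made concrete with a small sketch. It assumes Fitch's small-parsimony algorithm, a classic method that is not part of CDAO itself, to count the minimum number of transitions a single character needs on a rooted binary phylogenetic tree. OTUs are leaves, character-state data map each OTU to its observed state, and the tree is a nested tuple; the function `fitch` and this encoding are hypothetical illustrations, not CDAO's OWL representation.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either an OTU name (leaf) or a pair of subtrees (internal node).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate states at this node, minimum transitions below it).

    states: hypothetical character-state data, one observed state per OTU.
    """
    if isinstance(tree, str):  # leaf: the OTU's observed state, no transitions
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra transition is needed here.
        return shared, left_cost + right_cost
    # Children disagree: one transition must occur on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1
```

For example, on the tree `(("A", "B"), ("C", "D"))` with states A=0, B=0, C=1, D=1, a single transition suffices, and the root may carry either state. Returning the candidate set alongside the count keeps the sketch close to the ontology's distinction between observed character states at OTUs and inferred transitions on the tree.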
Automated extension of biomedical ontologies
Developing and extending a biomedical ontology is a very demanding process, particularly because biomedical knowledge is diverse, complex, and continuously changing and growing. Existing automated and semi-automated techniques are not tailored to handling the issues involved in extending biomedical ontologies.
This thesis advances the state of the art in semi-automated ontology extension by presenting a framework, as well as methods and methodologies, for automating ontology extension that are specifically designed to address the features of biomedical ontologies. The overall strategy is first to predict the areas of the ontology that need extension and then to apply ontology learning and ontology matching techniques to extend them. A novel machine learning approach for predicting these areas based on features of past ontology versions was developed and successfully applied to the Gene Ontology. Methods and techniques were also specifically designed for matching biomedical ontologies and retrieving relevant biomedical concepts from text, and these were shown to be successful in several applications.
Fundação para a Ciência e a Tecnologia