472 research outputs found

A Semantic Network for Modeling Biological Knowledge in Multiple Databases

Author: Greenblatt Marc
Stone Jeffrey
Wu Xindong
Publication venue: CSUSB ScholarWorks
Publication date: 04/02/2015

We have developed a semantic network of biological terminology to aid in the retrieval and integration of biological information from a variety of disparate information sources. Our semantic network strives to provide a categorization of biological concepts and relationships among these concepts. The semantic network will impart a knowledge structure through which computers can reason and draw conclusions about biological data objects and will provide a federated view of the many disparate databases of interest to biologists. In the development of our system, we have included the concepts from several established controlled vocabularies, chief among them being the National Library of Medicine's Unified Medical Language System (UMLS). While the UMLS Metathesaurus provides an excellent controlled vocabulary, we have found their semantic network lacking in sufficient detail to be useful as a tool for categorization of biological concepts in databases. We would like to provide a categorization of concepts that provides finer detail than their semantic network without the considerable size and complexity of their Metathesaurus. Our complete semantic network consists of 183 semantic types and 69 relationships.

CSUSB ScholarWorks

Mapping the Gene Ontology Into the Unified Medical Language System

Author: Lomax Jane
McCray Alexa T.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2004

We have recently mapped the Gene Ontology (GO), developed by the Gene Ontology Consortium, into the National Library of Medicine's Unified Medical Language System (UMLS). GO has been developed for the purpose of annotating gene products in genome databases, and the UMLS has been developed as a framework for integrating large numbers of disparate terminologies, primarily for the purpose of providing better access to biomedical information sources. The mapping of GO to UMLS highlighted issues in both terminology systems. After some initial explorations and discussions between the UMLS and GO teams, the GO was integrated with the UMLS. Overall, a total of 23% of the GO terms either matched directly (3%) or linked (20%) to existing UMLS concepts. All GO terms now have a corresponding, official UMLS concept, and the entire vocabulary is available through the web-based UMLS Knowledge Source Server. The mapping of the Gene Ontology, with its focus on structures, processes and functions at the molecular level, to the existing broad coverage UMLS should contribute to linking the language and practices of clinical medicine to the language and practices of genomics

Crossref

Directory of Open Access Journals

PubMed Central

Abstraction, extension and structural auditing with the UMLS semantic network

Author: Chen Yan
Publication venue: Digital Commons @ NJIT
Publication date: 27/01/2008

The Unified Medical Language System (UMLS) is a two-level biomedical terminological knowledge base, consisting of the Metathesaurus (META) and the Semantic Network (SN), which is an upper-level ontology of broad categories called semantic types (STs). The two levels are related via assignments of one or more STs to each concept of the META. Although the SN provides a high-level abstraction for the META, it is not compact enough. Various metaschemas, which are compact higher-level abstraction networks of the SN, have been derived. A methodology is presented to evaluate and compare two given metaschemas, based on their structural properties. A consolidation algorithm is designed to yield a consolidated metaschema maintaining the best and avoiding the worst of the two given metaschemas. The methodology and consolidation algorithm were applied to the pair of heuristic metaschemas, the top-down metaschema and the bottom-up metaschema, which have been derived from two studies involving two groups of UMLS experts. The results show that the consolidated metaschema has better structural properties than either of the two input metaschemas. Better structural properties are expected to lead to better utilization of a metaschema in orientation and visualization of the SN. Repetitive consolidation, which leads to further structural improvements, is also shown. The META and SN were created in the absence of a comprehensive curated genomics terminology. The internal consistency of the SN's categories which are relevant to genomics is evaluated and changes to improve its ability to express genomic knowledge are proposed. The completeness of the SN with respect to genomic concepts is evaluated and corresponding extensions to the SN to fill identified gaps are proposed. Due to the size and complexity of the UMLS, errors are inevitable. A group auditing methodology is presented, where the ST assignments for groups of similar concepts are audited.
The extent of an ST, which is the group of all concepts assigned this ST, is divided into groups of concepts that have been assigned exactly the same set of STs. An algorithm finds subgroups of suspicious concepts. The auditor is presented with these subgroups, which purportedly exhibit the same semantics, and thus he will notice different concepts with wrong or missing ST assignments. Another methodology partitions these groups into smaller, singly rooted, hierarchically organized sets used to audit the hierarchical relationships. The algorithmic methodologies are compared with a comprehensive manual audit and show a very high error recall with a much higher precision than the manual exhaustive review
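The grouping step described in this abstract can be illustrated with a minimal Python sketch. It only shows the partition of an ST's extent into groups of concepts that share exactly the same set of STs; the function name `partition_extent`, the concept identifiers and the ST labels are hypothetical and are not taken from the UMLS or from the dissertation, and the algorithm that flags suspicious subgroups is not reproduced here.

```python
from collections import defaultdict


def partition_extent(assignments, semantic_type):
    """Split the extent of `semantic_type` into groups of concepts
    that were assigned exactly the same set of semantic types."""
    groups = defaultdict(set)
    for concept, types in assignments.items():
        if semantic_type in types:  # concept belongs to the extent
            groups[frozenset(types)].add(concept)
    return dict(groups)


# Hypothetical toy assignments (illustrative labels, not real UMLS data).
assignments = {
    "C1": {"ST_A", "ST_B"},
    "C2": {"ST_A", "ST_B"},
    "C3": {"ST_A"},
    "C4": {"ST_C"},
}

groups = partition_extent(assignments, "ST_A")
for st_set, concepts in groups.items():
    print(sorted(st_set), sorted(concepts))
```

Under these assumptions, an auditor would review each group as a unit, since concepts in the same group are expected to share the same semantics.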

Digital Commons @ New Jersey Institute of Technology (NJIT)

A Semantic Framework Supporting Multilayer Networks Analysis for Rare Diseases

Author: Capuano N.
Foggia P.
Greco L.
Ritrovato P.
Publication venue: IGI Global
Publication date: 01/01/2022

Understanding the role played by genetic variations in diseases, exploring genomic variants, and discovering disease-associated loci are among the most pressing challenges of genomic medicine. A huge and ever-increasing amount of information is available to researchers to address these challenges. Unfortunately, it is stored in fragmented ontologies and databases, which use heterogeneous formats and poorly integrated schemas. To overcome these limitations, the authors propose a linked data approach, based on the formalism of multilayer networks, able to integrate and harmonize biomedical information from multiple sources into a single dense network covering different aspects of Neuroendocrine Neoplasms (NENs). The proposed integration schema consists of three interconnected layers representing, respectively, information on the disease, on the affected genes, and on the related biological processes and molecular functions. An easy-to-use client-server application was also developed to browse and search for information on the model supporting multilayer network analysis.

Archivio della Ricerca - Università di Salerno

Enriching and designing metaschemas for the UMLS semantic network

Author: Zhang Li
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2004

The disparate terminologies used by various biomedical applications or professionals make the communication between them more difficult. The Unified Medical Language System (UMLS) of the National Library of Medicine (NLM) is an attempt to integrate different medical terminologies into a unified representation framework to improve decision making and the quality of patient care as well as research in the health-care field. Metathesaurus (META) and Semantic Network (SN) are two main components of the UMLS system, where the SN provides a high-level abstract of the concepts in the META. This dissertation addresses three problems of the SN. First, the SN's two-tree structure is restrictive because it does not allow a semantic type to be a specialization of several other semantic types. This restriction leads to the omission of some subsumption knowledge in the SN. Secondly, the SN is large and complex for comprehension purposes and it does not come with a pictorial representation for users. As a partial solution for this problem, several metaschemas were previously built as higher-level abstractions for the SN to help users' orientation. Third, there is no efficient method to evaluate each metaschema. There is no technique to obtain a consolidated metaschema acceptable for a majority of the UMLS's users. In this dissertation work the author attacked the described problems by using the following approaches. (1) The SN was expanded into the Enriched Semantic Network (ESN), a multiple subsumption structure with a directed acyclic graph (DAG) IS-A hierarchy, allowing a semantic type to have multiple parents. New viable IS-A links were added as warranted. Two methodologies were presented to identify and add new viable IS-A links. The ESN serves as an extended high-level abstract of the META. (2) The ESN's semantic relationship distribution and concept configuration were studied.
Rules were defined to derive the ESN's semantic relationship distribution from the current SN's semantic relationship distribution. A mapping function was defined to map the SN's concept configuration to the ESN's concept configuration, avoiding redundant classifications in the ESN's concept configuration. (3) Several new metaschemas for the SN and the ESN were built and evaluated based on several different partitioning techniques. Each of these metaschemas can serve as a higher-level abstraction of the SN (or the ESN).
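The idea of turning a tree-shaped IS-A hierarchy into a DAG with multiple parents can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `ancestors` and `add_is_a` and the node labels are hypothetical, and the acyclicity check shown here is a generic graph technique, not the dissertation's methodology for deciding which new IS-A links are viable.

```python
def ancestors(parents, node):
    """Return every node reachable from `node` by following IS-A links upward."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def add_is_a(parents, child, parent):
    """Add the link `child IS-A parent` only if the hierarchy stays acyclic.

    Returns True if the link was added, False if it would create a cycle."""
    if child == parent or child in ancestors(parents, parent):
        return False
    parents.setdefault(child, set()).add(parent)
    return True


# Hypothetical tree: each node starts with a single parent.
parents = {"B": {"A"}, "C": {"A"}, "D": {"B"}}

added = add_is_a(parents, "D", "C")     # D now has two parents, B and C
rejected = add_is_a(parents, "A", "D")  # refused: A is already an ancestor of D
print(added, rejected, sorted(ancestors(parents, "D")))
```

Storing parents as a set per node is what allows multiple subsumption; the check before insertion keeps the structure a DAG.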

Digital Commons @ New Jersey Institute of Technology (NJIT)

Methods and trends of biomedical and genomic information retrieval based on semantic relations of thesauri and MeSH

Author: Morán Reyes Ariel Antonio
Naumis Peñas Catalina
Publication venue: UNAM, Instituto de Investigaciones Bibliotecológicas y de la Información
Publication date: 01/01/2016

There are two methods of retrieving information from documents in the field of genomic science and medicine in general, namely: 1) through the combined use of associations determined by the Medical Subject Headings, and 2) by employing specific terminologies, such as in folksonomies, alternative medical-genomic terms in use in the general language, or acronyms or apocopes from the genomics field. To some extent, many thinkers and indexers hold that the combination of two methods may be the best approach. While few authors advocate for keeping the structure of controlled vocabularies, built up over many years of content interpretation, unchanged, there are numerous proposals for expanding the search horizons of thesauri, whether through social cataloging, algorithmic domain analyses that contrast indicators or the semantic web using markers of meaningful semantic lexicons contained in digitized text

E-LIS

Elsevier - Publisher Connector

Crossref

Structural auditing methodologies for controlled terminologies

Author: Min Hua
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2006

Several auditing methodologies for large controlled terminologies are developed. These are applied to the Unified Medical Language System (UMLS) and the National Cancer Institute Thesaurus (NCIT). Structural auditing methodologies are based on structural aspects such as IS-A hierarchy relationships, groups of concepts assigned to semantic types, and groups of relationships defined for concepts. Structurally uniform groups of concepts tend to be semantically uniform. Structural auditing methodologies focus on concepts with unlikely or rare configurations. These concepts have a high likelihood of errors. One of the methodologies is based on comparing hierarchical relationships between the META and SN, two major knowledge sources of the UMLS. In general, a correspondence between them is expected since the SN hierarchical relationships should abstract the META hierarchical relationships. A mismatch may indicate an error. The UMLS SN has 135 categories called semantic types. However, in spite of its medium size, the SN has limited use for comprehension purposes because it cannot be easily represented in a pictorial form, as it has many (about 7,000) relationships. Therefore, a higher-level abstraction for the SN, called a metaschema, is constructed. Its nodes are meta-semantic types, each representing a connected group of semantic types of the SN. One of the auditing methodologies is based on a kind of metaschema called a cohesive metaschema. The focus is placed on concepts of intersections of meta-semantic types. As is shown, such concepts have a high likelihood of errors. Another auditing methodology is based on dividing the NCIT into areas according to the roles of its concepts. Moreover, each multi-rooted area is further divided into p-areas that are singly rooted. Each p-area contains a group of structurally and semantically uniform concepts.
These groups, as well as two derived abstraction networks called taxonomies, help in focusing on concepts with potential errors. With genomic research being at the forefront of bioscience, this auditing methodology is applied to the Gene hierarchy as well as the Biological Process hierarchy of the NCIT, since processes are very important for gene information. The results support the hypothesis that the occurrence of errors is related to the size of p-areas. Errors are more frequent for small p-areas
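The division into areas and singly rooted p-areas mentioned in this abstract can be sketched in Python. The sketch assumes that an area is the set of concepts sharing exactly the same set of roles, that a root of an area is a concept with no parent inside that area, and that a p-area is a root together with its descendants within the area. These definitions, the function names `areas` and `p_areas`, and the toy data are assumptions made for illustration, not the author's implementation.

```python
from collections import defaultdict


def areas(roles):
    """Group concepts by their exact set of roles."""
    result = defaultdict(set)
    for concept, concept_roles in roles.items():
        result[frozenset(concept_roles)].add(concept)
    return dict(result)


def p_areas(area, parents):
    """Split one area into singly rooted groups, keyed by their root."""
    children = defaultdict(set)
    for concept in area:
        for parent in parents.get(concept, ()):
            if parent in area:
                children[parent].add(concept)
    # A root has no parent inside the same area.
    roots = [c for c in area
             if not any(p in area for p in parents.get(c, ()))]
    result = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:
            current = stack.pop()
            if current not in members:
                members.add(current)
                stack.extend(children[current])
        result[root] = members
    return result


# Hypothetical toy hierarchy (not NCIT data).
roles = {"top": set(), "g1": {"r"}, "g2": {"r"}, "g3": {"r"}, "g4": {"r"}}
parents = {"g1": {"top"}, "g2": {"g1"}, "g3": {"top"}, "g4": {"g1"}}

area_r = areas(roles)[frozenset({"r"})]
groups = p_areas(area_r, parents)
print({root: sorted(members) for root, members in groups.items()})
```

In this toy case the area with role `r` has two roots, so it splits into one p-area of three concepts and one of a single concept; under the abstract's hypothesis, the smaller one would be the first candidate for review.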

Digital Commons @ New Jersey Institute of Technology (NJIT)

PhenoHM: human–mouse comparative phenome–genome server

Author: Amberger
Anil G. Jegga
Aronson
Ashburner
Becker
Bilder
Bodenreider
Bogue
Botstein
Bruce J. Aronow
Burgun
Clarke
Davis
Divya Sardana
Eppig
Freimer
Groth
Hamosh
Jing Chen
Johnson
Kahraman
Korbel
Lussier
Maglott
Morgan
Nishanth Vepachedu
O’Brien
Perez-Iratxeta
Ranga Chandra Gudivada
Robinson
Shannon
Smith
Suresh Vasa
Tatusova
Publication venue: Oxford University Press
Publication date

PhenoHM is a human–mouse comparative phenome–genome server that facilitates cross-species identification of genes associated with orthologous phenotypes (http://phenome.cchmc.org; full open access, login not required). Combining and extrapolating the knowledge about the roles of individual gene functions in the determination of phenotype across multiple organisms improves our understanding of gene function in normal and perturbed states and offers the opportunity to complement biologically the rapidly expanding strategies in comparative genomics. The Mammalian Phenotype Ontology (MPO), a structured vocabulary of phenotype terms that leverages observations encompassing the consequences of mouse gene knockout studies, is a principal component of the mouse phenotype knowledge source. On the other hand, the Unified Medical Language System (UMLS) is a composite collection of various human-centered biomedical terminologies. In the present study, we mapped terms reciprocally from the MPO to human disease concepts such as clinical findings from the UMLS and clinical phenotypes from the Online Mendelian Inheritance in Man knowledgebase. By cross-mapping mouse–human phenotype terms, extracting implicated genes and extrapolating phenotype-gene associations between species, PhenoHM provides a resource that enables rapid identification of genes that trigger similar outcomes in human and mouse and facilitates identification of potentially novel disease causal genes. The PhenoHM server can be accessed freely at http://phenome.cchmc.org

Crossref

PubMed Central