Ontology-Based MEDLINE Document Classification
An increasing and overwhelming amount of biomedical information is available in the research literature, mainly in the form of free text. Biologists need tools that automate their information search and deal with the high volume and ambiguity of free text. Ontologies can help automatic information processing by providing standard concepts and information about the relationships between concepts. The Medical Subject Headings (MeSH) ontology is already available and used by MEDLINE indexers to annotate the conceptual content of biomedical articles. This paper presents a domain-independent method that uses the MeSH ontology inter-concept relationships to extend the existing MeSH-based representation of MEDLINE documents. The extension method is evaluated within a document triage task organized by the Genomics track of the 2005 Text REtrieval Conference (TREC). Our method for extending the representation of documents leads to an improvement of 17% over a non-extended baseline in terms of normalized utility, the metric defined for the task. The SVMlight software is used to classify documents.
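As an illustration of what extending a concept-based document representation through inter-concept relationships can look like, here is a minimal sketch. It assumes the extension adds ancestor concepts, which is only one possible reading of the abstract. The hierarchy fragment, the function name `extend_with_ancestors`, and the `max_hops` parameter are hypothetical and are not taken from the paper or from the real MeSH tree.

```python
from collections import deque

# Hypothetical, simplified concept hierarchy (child -> list of parents).
# Illustrative only: this is NOT the actual MeSH tree structure.
PARENTS = {
    "Breast Neoplasms": ["Neoplasms by Site"],
    "Neoplasms by Site": ["Neoplasms"],
    "Neoplasms": [],
    "Mice": ["Rodentia"],
    "Rodentia": [],
}


def extend_with_ancestors(concepts, parents, max_hops=None):
    """Return the input concepts plus every ancestor within max_hops levels.

    max_hops=None means "follow parent links all the way to the roots".
    """
    extended = set(concepts)
    # Multi-source breadth-first search, so each ancestor is first reached
    # along a shortest path from one of the original concepts.
    queue = deque((concept, 0) for concept in concepts)
    while queue:
        concept, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue
        for parent in parents.get(concept, []):
            if parent not in extended:
                extended.add(parent)
                queue.append((parent, hops + 1))
    return extended


if __name__ == "__main__":
    doc_concepts = ["Breast Neoplasms", "Mice"]
    print(sorted(extend_with_ancestors(doc_concepts, PARENTS)))
```

The extended concept set could then be turned into a feature vector for a classifier. How the original method weights or selects the added concepts is not stated in the abstract.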
Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings
We consider the task of inferring is-a relationships from large text corpora.
For this purpose, we propose a new method combining hyperbolic embeddings and
Hearst patterns. This approach allows us to set appropriate constraints for
inferring concept hierarchies from distributional contexts while also being
able to predict missing is-a relationships and to correct wrong extractions.
Moreover -- and in contrast with other methods -- the hierarchical nature of
hyperbolic space allows us to learn highly efficient representations and to
improve the taxonomic consistency of the inferred hierarchies. Experimentally,
we show that our approach achieves state-of-the-art performance on several
commonly used benchmarks.
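To make the "hierarchical nature of hyperbolic space" concrete, the sketch below computes the standard distance in the Poincaré ball model. The abstract does not say which model of hyperbolic space the authors use, so the Poincaré ball is an assumption here, and the function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    Both points must have Euclidean norm strictly less than 1.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


if __name__ == "__main__":
    # Two pairs with the same Euclidean direction: the pair nearer the
    # boundary is much farther apart in hyperbolic terms.
    print(poincare_distance((0.1, 0.0), (-0.1, 0.0)))
    print(poincare_distance((0.9, 0.0), (-0.9, 0.0)))
```

Distances grow rapidly toward the boundary of the ball, so general concepts can sit near the origin and specific ones near the edge. This is the usual intuition for why such spaces suit tree-like hierarchies.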
Using Neural Networks for Relation Extraction from Biomedical Literature
Using different sources of information to support the automated extraction of
relations between biomedical concepts contributes to the development of our
understanding of biological systems. The primary comprehensive source of these
relations is the biomedical literature. Several relation extraction approaches have
been proposed to identify relations between concepts in the biomedical literature,
notably those using neural network algorithms. The use of multichannel architectures
composed of multiple data representations, as in deep neural networks, is
leading to state-of-the-art results. The right combination of data
representations can eventually lead us to even higher evaluation scores in
relation extraction tasks. Thus, biomedical ontologies play a fundamental role
by providing semantic and ancestry information about an entity. The
incorporation of biomedical ontologies has already been shown to improve on
previous state-of-the-art results.
Comment: Artificial Neural Networks book (Springer) - Chapter 1
Multi-task Deep Neural Networks in Automated Protein Function Prediction
In recent years, deep learning algorithms have outperformed state-of-the-art
methods in several areas thanks to efficient methods for training and
for preventing overfitting, advances in computer hardware, and the availability
of vast amounts of data. The high performance of multi-task deep neural networks in
drug discovery has drawn attention to deep learning algorithms in
bioinformatics. Here, we proposed a hierarchical multi-task deep neural
network architecture based on Gene Ontology (GO) terms as a solution to the protein
function prediction problem and investigated various aspects of the proposed
architecture by performing several experiments. First, we showed that there is
a positive correlation between the performance of the system and the size of
the training datasets. Second, we investigated whether the level of GO terms in the GO
hierarchy is related to their performance. We showed that there is no relation
between the depth of GO terms in the GO hierarchy and their performance. In
addition, we included all annotations in the training of a set of GO terms to
investigate whether including noisy data in the training datasets changes the
performance of the system. The results showed that including less reliable
annotations in the training of deep neural networks significantly increased the
performance of low-performing GO terms. We evaluated the performance of the
system using a hierarchical evaluation method. The Matthews correlation coefficient
was calculated as 0.75, 0.49 and 0.63 for molecular function, biological
process and cellular component categories, respectively. We showed that deep
learning algorithms have great potential for protein function prediction.
We plan to further improve DEEPred by including other types of annotations
from various biological data sources. We also plan to make DEEPred available as an
open-access online tool.
Comment: 19 pages, 4 figures, 4 tables
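For readers unfamiliar with the Matthews correlation coefficient reported in the abstract above, here is a minimal sketch of the standard binary formula computed from confusion-matrix counts. The function name and the convention of returning 0.0 when the denominator is zero are assumptions. The abstract does not describe how the authors aggregate the score across GO terms.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases (an empty row or column in
        # the confusion matrix) are reported as 0.0.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))
    print(matthews_corrcoef(tp=25, tn=25, fp=25, fn=25))
```

Unlike plain accuracy, this score stays informative when positive and negative classes are very unbalanced, which is common for individual GO terms.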