11,618 research outputs found
Protein (Multi-)Location Prediction: Using Location Inter-Dependencies in a Probabilistic Framework
Knowing the location of a protein within the cell is important for
understanding its function, role in biological processes, and potential use as
a drug target. Much progress has been made in developing computational methods
that predict single locations for proteins, assuming that proteins localize to
a single location. However, it has been shown that proteins localize to
multiple locations. While a few recent systems have attempted to predict
multiple locations of proteins, they typically treat locations as independent
or capture inter-dependencies by treating each locations-combination present in
the training set as an individual location-class. We present a new method and a
preliminary system we have developed that directly incorporates
inter-dependencies among locations into the multiple-location-prediction
process, using a collection of Bayesian network classifiers. We evaluate our
system on a dataset of single- and multi-localized proteins. Our results,
obtained by incorporating inter-dependencies are significantly higher than
those obtained by classifiers that do not use inter-dependencies. The
performance of our system on multi-localized proteins is comparable to a top
performing system (YLoc+), without restricting predictions to be based only on
location-combinations present in the training set.Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013
A lexicographic multi-objective genetic algorithm for multi-label correlation-based feature selection
This paper proposes a new Lexicographic multi-objective Genetic Algorithm for Multi-Label Correlation-based Feature Selection (LexGA-ML-CFS), which is an extension of the previous single-objective Genetic Algorithm for Multi-label Correlation-based Feature Selection (GA-ML-CFS). This extension uses a LexGA as a global search method for generating candidate feature subsets. In our experiments, we compare the results obtained by LexGA-ML-CFS with the results obtained by the original hill climbing-based ML-CFS, the single-objective GA-ML-CFS and a baseline Binary Relevance method, using ML-kNN as the multi-label classifier. The results from our experiments show that LexGA-ML-CFS improved predictive accuracy, by comparison with other methods, in some cases, but in general there was no statistically significant different between the results of LexGA-ML-CFS and other methods
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processin
- …