Extracting protein-protein interactions from text using rich feature vectors and feature selection
Because of the intrinsic complexity of natural language, automatically extracting accurate information from text remains a challenge. We have applied rich feature vectors derived from dependency graphs to predict protein-protein interactions using machine learning techniques. We present the first extensive analysis of applying feature selection in this domain, and show that it can produce more cost-effective models. For the first time, our technique was also evaluated on several large-scale cross-dataset experiments, which offers a more realistic view on model performance.
During benchmarking, we encountered several fundamental problems hindering comparability with other methods. We present a set of practical guidelines to set up a meaningful evaluation.
Finally, we have analysed the feature sets from our experiments before and after feature selection, and evaluated the contribution of both lexical and syntactic information to our method. The gained insight will be useful to develop better performing methods in this domain.
Learning Dictionaries for Named Entity Recognition using Minimal Supervision
This paper describes an approach for automatic construction of dictionaries
for Named Entity Recognition (NER) using large amounts of unlabeled data and a
few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower
dimensional embeddings (representations) for candidate phrases and classify
these phrases using a small number of labeled examples. Our method achieves
16.5% and 11.3% F-1 score improvement over co-training on disease and virus NER
respectively. We also show that adding candidate phrase embeddings as
features in a sequence tagger gives better performance than using word
embeddings.
Comment: In 14th Conference of the European Chapter of the Association for
Computational Linguistics, 201
One Decade of Development and Evolution of MicroRNA Target Prediction Algorithms
Nearly two decades have passed since the publication of the first study reporting the discovery of microRNAs (miRNAs). The key role of miRNAs in post-transcriptional gene regulation has led to an increasing number of studies focusing on the origins, mechanisms of action and functionality of miRNAs. In order to associate each miRNA with a specific functionality, it is essential to unveil the rules that govern miRNA action. Although there has been significant progress in exposing the structural characteristics of the miRNA-mRNA interaction, the entire physical mechanism is not yet fully understood. In this respect, the development of computational algorithms for miRNA target prediction becomes increasingly important. This manuscript summarizes the research done on miRNA target prediction. It describes the experimental data currently available and used in the field and presents three lines of computational approaches for target prediction. Finally, the authors put forward a number of considerations regarding current challenges and future directions.
Efficient Correlated Topic Modeling with Topic Embedding
Correlated topic modeling has been limited to small model and problem sizes
due to its high computational cost and poor scaling. In this paper, we
propose a new model which learns compact topic embeddings and captures topic
correlations through the closeness between the topic vectors. Our method
enables efficient inference in the low-dimensional embedding space, reducing
previous cubic or quadratic time complexity to linear w.r.t. the topic size. We
further speed up variational inference with a fast sampler to exploit sparsity
of topic occurrence. Extensive experiments show that our approach is capable of
handling model and data scales which are several orders of magnitude larger
than existing correlation results, without sacrificing modeling quality by
providing competitive or superior performance in document classification and
retrieval.
Comment: KDD 2017 oral. The first two authors contributed equally.
Towards a Protein-Protein Interaction information extraction system: recognizing named entities
The majority of biological functions of any living being are related to Protein-Protein Interactions (PPI). PPI discoveries are reported in the form of research publications whose volume grows day after day. Consequently, automatic PPI information extraction systems are a pressing need for biologists. In this paper we are mainly concerned with the named entity detection module of PPIES (the PPI information extraction system we are implementing), which recognizes twelve entity types relevant in the PPI context. It is composed of two sub-modules: a dictionary look-up with extensive normalization and acronym detection, and a Conditional Random Field classifier. The dictionary look-up module has been tested with the Interaction Method Task (IMT), and it improves by approximately 10% on the current solutions that do not use Machine Learning (ML). The second module has been used to create a classifier using the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA 04) data set. It does not use any external resources, or complex or ad hoc post-processing, and obtains 77.25%, 75.04% and 76.13% for precision, recall, and F1-measure, respectively, improving on all previous results obtained for this data set.
This work has been funded by MICINN, Spain, as part of the "Juan de la Cierva" Program and the Project DIANA-Applications (TIN2012-38603-C02-01), as well as by the European Commission as part of the WIQ-EI IRSES Project (Grant No. 269180) within the FP 7 Marie Curie People Framework.
Danger Mercaderes, RM.; Pla Santamaría, F.; Molina Marco, A.; Rosso, P. (2014). Towards a Protein-Protein Interaction information extraction system: recognizing named entities. Knowledge-Based Systems. 57:104-118. https://doi.org/10.1016/j.knosys.2013.12.010
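As a quick consistency check on the figures reported in the abstract above, the F1-measure is the harmonic mean of precision and recall. The helper name `f1_score` below is ours, chosen for illustration; only the two input percentages come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the JNLPBA 04 data set, in percent.
reported_f1 = round(f1_score(77.25, 75.04), 2)
print(reported_f1)
```

Rounded to two decimals, the result agrees with the 76.13 quoted for the F1-measure.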
DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
Identification of drug-target interactions (DTIs) plays a key role in drug
discovery. The high cost and labor-intensive nature of in vitro and in vivo
experiments have highlighted the importance of in silico-based DTI prediction
approaches. In several computational models, conventional protein descriptors
are shown to be not informative enough to predict accurate DTIs. Thus, in this
study, we employ a convolutional neural network (CNN) on raw protein sequences
to capture local residue patterns participating in DTIs. With CNN on protein
sequences, our model performs better than previous protein descriptor-based
models. In addition, our model performs better than the previous deep learning
model for massive prediction of DTIs. By examining the pooled convolution
results, we found that our model can detect binding sites of proteins for DTIs.
In conclusion, our prediction model for detecting local residue patterns of
target proteins successfully enriches the protein features of a raw protein
sequence, yielding better prediction results than previous approaches.
Comment: 26 pages, 7 figures
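The idea of convolving over a raw protein sequence and reading the pooled result back to a sequence position can be illustrated with a minimal sketch. This is not the DeepConv-DTI architecture: it uses one hand-built filter instead of learned ones, one-hot encoding as an assumed input representation, and a made-up sequence and motif (`sequence`, `motif`) chosen purely for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]

def motif_kernel(motif):
    """Hypothetical filter that responds to one exact motif (stands in for a learned filter)."""
    return one_hot(motif)

def conv1d_max_pool(encoded, kernel):
    """Slide the kernel along the sequence, apply ReLU, and global-max-pool.

    Returns the pooled activation and the window start that produced it.
    """
    width = len(kernel)
    best_score, best_pos = float("-inf"), -1
    for start in range(len(encoded) - width + 1):
        activation = sum(
            kernel[k][c] * encoded[start + k][c]
            for k in range(width)
            for c in range(len(AMINO_ACIDS))
        )
        activation = max(activation, 0.0)  # ReLU
        if activation > best_score:
            best_score, best_pos = activation, start
    return best_score, best_pos

sequence = "MKTAYIAKQRHELGS"  # made-up sequence, not real data
motif = "HEL"                 # made-up local residue pattern
score, position = conv1d_max_pool(one_hot(sequence), motif_kernel(motif))
print(score, position)
```

Because global max pooling keeps only the strongest window, the index of that window points back to the residues that drove the prediction, which is the kind of inspection the abstract describes for locating binding sites.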
Non-linear dynamical analysis of resting tremor for demand-driven deep brain stimulation.
Parkinson's Disease (PD) is currently the second most common neurodegenerative disease. One of the most characteristic symptoms of PD is resting tremor. Local Field Potentials (LFPs) have been widely studied to investigate deviations from the typical patterns of healthy brain activity. However, the inherent dynamics of the Sub-Thalamic Nucleus (STN) LFPs and their spatiotemporal dynamics have not been well characterized. In this work, we study the non-linear dynamical behaviour of STN-LFPs of Parkinsonian patients using ε-recurrence networks (RNs). RNs are a non-linear analysis tool that encodes the geometric information of the underlying system, which can be characterised (for example, using graph theoretical measures) to extract information on the geometric properties of the attractor. Results show that the activity of the STN becomes more non-linear during the tremor episodes and that ε-recurrence network analysis is a suitable method to distinguish the transitions between movement conditions, anticipating the onset of the tremor, with the potential for application in a demand-driven deep brain stimulation system.
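For readers unfamiliar with ε-recurrence networks, the construction can be sketched as follows: embed the signal in a reconstructed state space, then connect any two states that lie within ε of each other. The time-delay embedding, the Euclidean norm, the parameter values, and the sine test signal are all assumptions made for illustration; the abstract does not specify them, and no LFP data is used here.

```python
import math

def delay_embed(series, dim, lag):
    """Time-delay embedding: map a scalar series to points in R^dim."""
    n = len(series) - (dim - 1) * lag
    return [tuple(series[i + k * lag] for k in range(dim)) for i in range(n)]

def recurrence_network(points, eps):
    """Adjacency matrix linking distinct states whose distance is at most eps."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                adj[i][j] = adj[j][i] = 1
    return adj

def edge_density(adj):
    """Fraction of possible edges that are present (a simple graph measure)."""
    n = len(adj)
    edges = sum(sum(row) for row in adj) / 2
    return edges / (n * (n - 1) / 2)

# Synthetic periodic signal standing in for a recorded time series.
signal = [math.sin(0.3 * t) for t in range(200)]
points = delay_embed(signal, dim=3, lag=5)
adj = recurrence_network(points, eps=0.2)
print(len(points), round(edge_density(adj), 3))
```

Graph measures computed on `adj` (edge density here; clustering or path-based measures in practice) are what summarise the geometry of the attractor.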
Multi-Class Classifier in Parkinson’s Disease Using an Evolutionary Multi-Objective Optimization Algorithm
This work was funded by the Spanish Ministry of Sciences, Innovation and Universities under Project RTI-2018-101674-B-I00 and the projects from Junta de Andalucia B-TIC-414, A-TIC-530-UGR20 and P20-00163.
In this contribution, a novel methodology for multi-class classification in the field of
Parkinson’s disease is proposed. The methodology is structured in two phases. In a first phase,
the most relevant volumes of interest (VOI) of the brain are selected by means of an evolutionary
multi-objective optimization (MOE) algorithm. Each of these VOIs is subjected to volumetric feature
extraction using the Three-Dimensional Discrete Wavelet Transform (3D-DWT). When applying
3D-DWT, a high number of coefficients is obtained, requiring the use of feature selection/reduction
algorithms to find the most relevant features. The method used in this contribution is based on
Mutual Information (MI), minimum Redundancy Maximum Relevance (mRMR), and PCA. To optimize
the VOI selection, a first group of 550 MRI was used for the 5 classes: PD, SWEDD, Prodromal,
GeneCohort and Normal. Once the Pareto Front of the solutions is obtained (with varying degrees of
complexity, reflected in the number of selected VOIs), these solutions are tested in a second phase.
In order to analyze the SVM classifier accuracy, a test set of 367 MRI was used. The methodology
obtains relevant results in multi-class classification, presenting several solutions with different levels
of complexity and precision (Pareto Front solutions), reaching a result of 97% as the highest precision
in the test data.
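The mRMR step mentioned in the abstract above can be sketched in its common greedy form: repeatedly pick the feature whose mutual information with the class label is highest after subtracting its mean mutual information with the features already chosen. This is a generic sketch on discrete toy data, not the authors' pipeline; the feature names (`f_a` to `f_d`) and values are invented for illustration, and real wavelet coefficients would first need discretisation or a continuous MI estimator.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def mrmr(features, labels, k):
    """Greedy mRMR: relevance to the labels minus mean redundancy with chosen features."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max((name for name in features if name not in selected), key=score)
        selected.append(best)
    return selected

# Invented toy data: eight samples, two classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "f_b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of f_a
    "f_c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the labels
    "f_d": [0, 0, 1, 0, 1, 1, 0, 1],  # weaker, but adds new information
}
chosen = mrmr(features, labels, k=2)
print(chosen)
```

On this toy data the duplicate `f_b` is passed over in favour of `f_d`, even though `f_b` is individually more relevant: the redundancy penalty is what separates mRMR from ranking features by mutual information alone.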