1,733 research outputs found

    Extracting protein-protein interactions from text using rich feature vectors and feature selection

    Because of the intrinsic complexity of natural language, automatically extracting accurate information from text remains a challenge. We have applied rich feature vectors derived from dependency graphs to predict protein-protein interactions using machine learning techniques. We present the first extensive analysis of applying feature selection in this domain, and show that it can produce more cost-effective models. For the first time, our technique was also evaluated on several large-scale cross-dataset experiments, which offers a more realistic view on model performance. During benchmarking, we encountered several fundamental problems hindering comparability with other methods. We present a set of practical guidelines to set up a meaningful evaluation. Finally, we have analysed the feature sets from our experiments before and after feature selection, and evaluated the contribution of both lexical and syntactic information to our method. The gained insight will be useful to develop better performing methods in this domain.

    Learning Dictionaries for Named Entity Recognition using Minimal Supervision

    This paper describes an approach for automatic construction of dictionaries for Named Entity Recognition (NER) using large amounts of unlabeled data and a few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower dimensional embeddings (representations) for candidate phrases and classify these phrases using a small number of labeled examples. Our method achieves 16.5% and 11.3% F-1 score improvement over co-training on disease and virus NER respectively. We also show that adding candidate phrase embeddings as features in a sequence tagger gives better performance than using word embeddings. Comment: In 14th Conference of the European Chapter of the Association for Computational Linguistics, 201

    One Decade of Development and Evolution of MicroRNA Target Prediction Algorithms

    Nearly two decades have passed since the publication of the first study reporting the discovery of microRNAs (miRNAs). The key role of miRNAs in post-transcriptional gene regulation has led to an increasing number of studies focusing on the origins, mechanisms of action and functionality of miRNAs. In order to associate each miRNA with a specific functionality, it is essential to unveil the rules that govern miRNA action. Although there has been significant progress in exposing structural characteristics of the miRNA-mRNA interaction, the entire physical mechanism is not yet fully understood. In this respect, the development of computational algorithms for miRNA target prediction becomes increasingly important. This manuscript summarizes the research done on miRNA target prediction. It describes the experimental data currently available and used in the field and presents three lines of computational approaches for target prediction. Finally, the authors put forward a number of considerations regarding current challenges and future direction.

    Efficient Correlated Topic Modeling with Topic Embedding

    Correlated topic modeling has been limited to small model and problem sizes due to its high computational cost and poor scaling. In this paper, we propose a new model which learns compact topic embeddings and captures topic correlations through the closeness between the topic vectors. Our method enables efficient inference in the low-dimensional embedding space, reducing previous cubic or quadratic time complexity to linear w.r.t. the topic size. We further speed up variational inference with a fast sampler to exploit sparsity of topic occurrence. Extensive experiments show that our approach is capable of handling model and data scales which are several orders of magnitude larger than existing correlation results, without sacrificing modeling quality by providing competitive or superior performance in document classification and retrieval. Comment: KDD 2017 oral. The first two authors contributed equally

    Towards a Protein-Protein Interaction information extraction system: recognizing named entities

    The majority of biological functions of any living being are related to Protein-Protein Interactions (PPI). PPI discoveries are reported in the form of research publications whose volume grows day after day. Consequently, automatic PPI information extraction systems are a pressing need for biologists. In this paper we are mainly concerned with the named entity detection module of PPIES (the PPI information extraction system we are implementing), which recognizes twelve entity types relevant in the PPI context. It is composed of two sub-modules: a dictionary look-up with extensive normalization and acronym detection, and a Conditional Random Field classifier. The dictionary look-up module has been tested with the Interaction Method Task (IMT), and it improves by approximately 10% on the current solutions that do not use Machine Learning (ML). The second module has been used to create a classifier using the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA 04) data set. It does not use any external resources, or complex or ad hoc post-processing, and obtains 77.25%, 75.04% and 76.13% for precision, recall, and F1-measure, respectively, improving all previous results obtained for this data set. This work has been funded by MICINN, Spain, as part of the "Juan de la Cierva" Program and the Project DIANA-Applications (TIN2012-38603-C02-01), as well as by the European Commission as part of the WIQ-EI IRSES Project (Grant No. 269180) within the FP 7 Marie Curie People Framework. Danger Mercaderes, RM.; Pla Santamaría, F.; Molina Marco, A.; Rosso, P. (2014). Towards a Protein-Protein Interaction information extraction system: recognizing named entities. Knowledge-Based Systems. 57:104-118. https://doi.org/10.1016/j.knosys.2013.12.010

    DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

    Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to be insufficiently informative to predict DTIs accurately. Thus, in this study, we employ a convolutional neural network (CNN) on raw protein sequences to capture local residue patterns participating in DTIs. With CNN on protein sequences, our model performs better than previous protein descriptor-based models. In addition, our model performs better than the previous deep learning model for massive prediction of DTIs. By examining the pooled convolution results, we found that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Comment: 26 pages, 7 figures

    Non-linear dynamical analysis of resting tremor for demand-driven deep brain stimulation.

    Parkinson's Disease (PD) is currently the second most common neurodegenerative disease. One of the most characteristic symptoms of PD is resting tremor. Local Field Potentials (LFPs) have been widely studied to investigate deviations from the typical patterns of healthy brain activity. However, the inherent dynamics of the Sub-Thalamic Nucleus (STN) LFPs and their spatiotemporal dynamics have not been well characterized. In this work, we study the non-linear dynamical behaviour of STN-LFPs of Parkinsonian patients using ε-recurrence networks (RNs). RNs are a non-linear analysis tool that encodes the geometric information of the underlying system, which can be characterised (for example, using graph theoretical measures) to extract information on the geometric properties of the attractor. Results show that the activity of the STN becomes more non-linear during the tremor episodes and that ε-recurrence network analysis is a suitable method to distinguish the transitions between movement conditions, anticipating the onset of the tremor, with the potential for application in a demand-driven deep brain stimulation system.

    Multi-Class Classifier in Parkinson’s Disease Using an Evolutionary Multi-Objective Optimization Algorithm

    This work was funded by the Spanish Ministry of Sciences, Innovation and Universities under Project RTI-2018-101674-B-I00 and the projects from Junta de Andalucia B-TIC-414, A-TIC-530-UGR20 and P20-00163. In this contribution, a novel methodology for multi-class classification in the field of Parkinson’s disease is proposed. The methodology is structured in two phases. In a first phase, the most relevant volumes of interest (VOI) of the brain are selected by means of an evolutionary multi-objective optimization (MOE) algorithm. Each of these VOIs is subjected to volumetric feature extraction using the Three-Dimensional Discrete Wavelet Transform (3D-DWT). When applying 3D-DWT, a high number of coefficients is obtained, requiring the use of feature selection/reduction algorithms to find the most relevant features. The method used in this contribution is based on Mutual Information (MI), minimum Redundancy Maximum Relevance (mRMR) and PCA. To optimize the VOI selection, a first group of 550 MRI was used for the 5 classes: PD, SWEDD, Prodromal, GeneCohort and Normal. Once the Pareto Front of the solutions is obtained (with varying degrees of complexity, reflected in the number of selected VOIs), these solutions are tested in a second phase. In order to analyze the SVM classifier accuracy, a test set of 367 MRI was used. The methodology obtains relevant results in multi-class classification, presenting several solutions with different levels of complexity and precision (Pareto Front solutions), reaching a result of 97% as the highest precision in the test data.