95 research outputs found

    Fast calibrated additive quantile regression

    We propose a novel framework for fitting additive quantile regression models, which provides well calibrated inference about the conditional quantiles and fast automatic estimation of the smoothing parameters, for model structures as diverse as those usable with distributional GAMs, while maintaining equivalent numerical efficiency and stability. The proposed methods are at once statistically rigorous and computationally efficient, because they are based on the general belief updating framework of Bissiri et al. (2016) to loss based inference, but compute by adapting the stable fitting methods of Wood et al. (2016). We show how the pinball loss is statistically suboptimal relative to a novel smooth generalisation, which also gives access to fast estimation methods. Further, we provide a novel calibration method for efficiently selecting the 'learning rate' balancing the loss with the smoothing priors during inference, thereby obtaining reliable quantile uncertainty estimates. Our work was motivated by a probabilistic electricity load forecasting application, used here to demonstrate the proposed approach. The methods described here are implemented by the qgam R package, available on the Comprehensive R Archive Network (CRAN)

    HIV envelope trimer-elicited autologous neutralizing antibodies bind a region overlapping the N332 glycan supersite

    To date, immunization studies of rabbits with the BG505 SOSIP.664 HIV envelope glycoprotein trimers have revealed the 241/289 glycan hole as the dominant neutralizing antibody epitope. Here, we isolated monoclonal antibodies from a rabbit that did not exhibit glycan hole–dependent autologous serum neutralization. The antibodies did not compete with a previously isolated glycan hole–specific antibody but did compete with N332 glycan supersite broadly neutralizing antibodies. A 3.5-Å cryoEM structure of one of the antibodies in complex with the BG505 SOSIP.v5.2 trimer demonstrated that while the epitope recognized overlapped the N332 glycan supersite by contacting the GDIR motif at the base of V3, primary contacts were located in the variable V1 loop. These data suggest that strain-specific responses to V1 may interfere with broadly neutralizing responses to the N332 glycan supersite and vaccine immunogens may require engineering to minimize these off-target responses or steer them toward a more desirable pathway

    Linguistic feature analysis for protein interaction extraction

    Background: The rapid growth of the amount of publicly available reports on biomedical experimental results has recently caused a boost of text mining approaches for protein interaction extraction. Most approaches rely implicitly or explicitly on linguistic, i.e., lexical and syntactic, data extracted from text. However, only few attempts have been made to evaluate the contribution of the different feature types. In this work, we contribute to this evaluation by studying the relative importance of deep syntactic features, i.e., grammatical relations, shallow syntactic features (part-of-speech information) and lexical features. For this purpose, we use a recently proposed approach that uses support vector machines with structured kernels. Results: Our results reveal that the contribution of the different feature types varies for the different data sets on which the experiments were conducted. The smaller the training corpus compared to the test data, the more important the role of grammatical relations becomes. Moreover, deep syntactic information based classifiers prove to be more robust on heterogeneous texts where no or only limited common vocabulary is shared. Conclusion: Our findings suggest that grammatical relations play an important role in the interaction extraction task. Moreover, the net advantage of adding lexical and shallow syntactic features is small related to the number of added features. This implies that efficient classifiers can be built by using only a small fraction of the features that are typically being used in recent approaches.

    Using Unsupervised Patterns to Extract Gene Regulation Relationships for Network Construction

    BACKGROUND: Gene expression is usually described in the literature as a transcription factor X that regulates the target gene Y. Previously, some studies discovered gene regulations by using information from the biomedical literature, and most of them require the effort of human annotators to build the training dataset. Moreover, the large amount of textual knowledge recorded in the biomedical literature grows very rapidly, and the creation of manual patterns from the literature becomes more difficult. There is an increasing need to automate the process of establishing patterns. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we describe an unsupervised pattern generation method called AutoPat. It is a gene expression mining system that can generate unsupervised patterns automatically from a given set of seed patterns. The high scalability and low maintenance cost of the unsupervised patterns could help our system to extract gene expression from PubMed abstracts more precisely and effectively. CONCLUSIONS/SIGNIFICANCE: Experiments on several regulators show reasonable precision and recall rates which validate AutoPat's practical applicability. The conducted regulation networks could also be built precisely and effectively. The system in this study is available at http://ikmbio.csie.ncku.edu.tw/AutoPat/

    Resistance to the CCR5 Inhibitor 5P12-RANTES Requires a Difficult Evolution from CCR5 to CXCR4 Coreceptor Use

    Viral resistance to small molecule allosteric inhibitors of CCR5 is well documented, and involves either selection of preexisting CXCR4-using HIV-1 variants or envelope sequence evolution to use inhibitor-bound CCR5 for entry. Resistance to macromolecular CCR5 inhibitors has been more difficult to demonstrate, although selection of CXCR4-using variants might be expected. We have compared the in vitro selection of HIV-1 CC1/85 variants resistant to either the small molecule inhibitor maraviroc (MVC) or the macromolecular inhibitor 5P12-RANTES. High level resistance to MVC was conferred by the same envelope mutations as previously reported after 16–18 weeks of selection by increasing levels of MVC. The MVC-resistant mutants were fully sensitive to inhibition by 5P12-RANTES. By contrast, only transient and low level resistance to 5P12-RANTES was achieved in three sequential selection experiments, and each resulted in a subsequent collapse of virus replication. A fourth round of selection by 5P12-RANTES led, after 36 weeks, to a “resistant” variant that had switched from CCR5 to CXCR4 as a coreceptor. Envelope sequences diverged by 3.8% during selection of the 5P12-RANTES resistant, CXCR4-using variants, with unique and critical substitutions in the V3 region. A subset of viruses recovered from control cultures after 44 weeks of passage in the absence of inhibitors also evolved to use CXCR4, although with fewer and different envelope mutations. Control cultures contained both viruses that evolved to use CXCR4 by deleting four amino acids in V3, and others that maintained entry via CCR5. These results suggest that coreceptor switching may be the only route to resistance for compounds like 5P12-RANTES. This pathway requires more mutations and encounters more fitness obstacles than development of resistance to MVC, confirming the clinical observations that resistance to small molecule CCR5 inhibitors very rarely involves coreceptor switching

    Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

    To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. The presented approach bridges the gap between fast, cooccurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches

    A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature

    The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein–protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed - convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods

    Toll-Like Receptor 3 (TLR3) Plays a Major Role in the Formation of Rabies Virus Negri Bodies

    Human neurons express the innate immune response receptor, Toll-like receptor 3 (TLR3). TLR3 levels are increased in pathological conditions such as brain virus infection. Here, we further investigated the production, cellular localisation, and function of neuronal TLR3 during neuronotropic rabies virus (RABV) infection in human neuronal cells. Following RABV infection, TLR3 is not only present in endosomes, as observed in the absence of infection, but also in detergent-resistant perinuclear inclusion bodies. As well as TLR3, these inclusion bodies contain the viral genome and viral proteins (N and P, but not G). The size and composition of inclusion bodies and the absence of a surrounding membrane, as shown by electron microscopy, suggest they correspond to the previously described Negri Bodies (NBs). NBs are not formed in the absence of TLR3, and TLR3−/− mice—in which brain tissue was less severely infected—had a better survival rate than WT mice. These observations demonstrate that TLR3 is a major molecule involved in the spatial arrangement of RABV–induced NBs and viral replication. This study shows how viruses can exploit cellular proteins and compartmentalisation for their own benefit

    Linking genes to literature: text mining, information extraction, and retrieval applications for biology

    Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet