30 research outputs found

    Text Mining for Protein Docking

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
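
The residue-extraction step described in this abstract could look roughly like the sketch below. It is not the authors' procedure: it assumes residues are written as a three-letter amino acid code followed by a number (for example "Arg68" or "Tyr-281"), and the function name `extract_residues` is hypothetical.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed mention format: three-letter code, optional hyphen or space, number.
_RESIDUE = re.compile(r"\b(" + "|".join(THREE_TO_ONE) + r")[- ]?(\d{1,4})\b")


def extract_residues(text):
    """Return unique (one_letter_code, position) pairs in order of appearance."""
    found = []
    for match in _RESIDUE.finditer(text):
        item = (THREE_TO_ONE[match.group(1)], int(match.group(2)))
        if item not in found:
            found.append(item)
    return found


abstract = ("Mutation of Arg68 and Tyr-281 abolished binding, "
            "whereas Ala 12 had no effect.")
print(extract_residues(abstract))  # [('R', 68), ('Y', 281), ('A', 12)]
```

A pattern like this also picks up residues that have nothing to do with the interface, which is the irrelevant information the abstract says the SVM filtering was meant to reduce.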

    The Gut Microbiome, Aging, and Longevity: A Systematic Review.


    Natural language processing in text mining for structural modeling of protein complexes

    Background: Structural modeling of protein-protein interactions produces a large number of putative configurations of the protein complexes. Identification of the near-native models among them is a serious challenge. Publicly available results of biomedical research may provide constraints on the binding mode, which can be essential for the docking. Our text-mining (TM) tool, which extracts binding site residues from the PubMed abstracts, was successfully applied to protein docking (Badal et al., PLoS Comput Biol, 2015; 11: e1004630). Still, many extracted residues were not relevant to the docking. Results: We present an extension of the TM tool, which utilizes natural language processing (NLP) for analyzing the context of the residue occurrence. The procedure was tested using generic and specialized dictionaries. The results showed that the keyword dictionaries designed for identification of protein interactions are not adequate for the TM prediction of the binding mode. However, our dictionary designed to distinguish keywords relevant to the protein binding sites led to considerable improvement in the TM performance. We investigated the utility of several methods of context analysis, based on dissection of the sentence parse trees. The machine learning-based NLP filtered the pool of the mined residues significantly more efficiently than the rule-based NLP. Constraints generated by NLP were tested in docking of unbound proteins from the DOCKGROUND X-ray benchmark set 4. The output of the global low-resolution docking scan was post-processed, separately, by constraints from the basic TM, constraints re-ranked by NLP, and the reference constraints. The quality of a match was assessed by the interface root-mean-square deviation. The results showed significant improvement of the docking output when using the constraints generated by the advanced TM with NLP. Conclusions: The basic TM procedure for extracting protein-protein binding site residues from the PubMed abstracts was significantly advanced by the deep parsing (NLP techniques for contextual analysis) in purging of the initial pool of the extracted residues. Benchmarking showed a substantial increase of the docking success rate based on the constraints generated by the advanced TM with NLP.
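
As a rough illustration of dictionary-based context filtering, the sketch below keeps a residue mention only if its sentence contains a binding-related keyword. This is a much simpler rule than the parse-tree and machine-learning methods the abstract describes. The keyword set, the residue pattern, and the function name are all hypothetical and are not taken from the paper's dictionaries.

```python
import re

# Hypothetical binding-site keyword dictionary (an assumption, not the paper's).
BINDING_KEYWORDS = {
    "bind", "binds", "binding", "interface", "interact",
    "interacts", "interaction", "contact", "contacts",
}

# Assumed residue format: capitalized three-letter code plus a number.
RESIDUE = re.compile(r"\b[A-Z][a-z]{2}-?\d{1,4}\b")


def residues_in_binding_context(abstract):
    """Keep residue mentions only from sentences containing a binding keyword."""
    kept = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & BINDING_KEYWORDS:
            kept.extend(RESIDUE.findall(sentence))
    return kept


text = ("Arg68 contacts the SH2 domain at the interface. "
        "Ser12 is phosphorylated in vivo.")
print(residues_in_binding_context(text))  # ['Arg68']
```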

    Dynamics of task-based confidence in schizophrenia using seasonal decomposition approach

    Objective: Introspective Accuracy (IA) is a metacognitive construct that refers to alignment of self-generated accuracy judgments, confidence, and objective information regarding performance. IA not only refers to accuracy and confidence during tasks, but also predicts functional outcomes. The consistency and magnitude of IA deficits suggest a sustained disconnect between self-assessments and actual performance. The cognitive origins of IA are unclear and are not simply due to poor performance. We tried to capture task and diagnosis-related differences through examining confidence as a time series. Method: This relatively large sample (N = 171; Bipolar = 71, Schizophrenia = 100) study used item-by-item confidence judgments for tasks including the Wisconsin Card Sorting Task (WCST) and the Emotion Recognition task (ER-40). Using a seasonal decomposition approach and AutoRegressive Integrated Moving Average (ARIMA) time-series analyses, we tested for the presence of randomness and perseveration. Results: For the WCST, comparisons across participants with schizophrenia and bipolar disorder found similar trends and residuals, thus excluding perseverative or random responding. However, seasonal components were weaker in participants with schizophrenia, reflecting a reduced impact of feedback on confidence. In contrast, for the ER-40, which does not require identification of a sustained construct, seasonal, trend, and residual analyses were highly comparable. Conclusion: Seasonal analysis revealed that confidence judgments in participants with schizophrenia on tasks requiring responses to feedback reflected diminished incorporation of external information, not random or perseverative responding. These analyses highlight how time series analyses can specify potential faulty processes for future intervention.
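
For readers unfamiliar with seasonal decomposition, the sketch below shows a classical additive split of a series into trend, seasonal, and residual parts. It is a generic textbook procedure, not the study's analysis. The function name and the `period` argument (the assumed length of one repeating cycle, in items) are illustrative assumptions.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Trend is a centered moving average (undefined near the ends, stored as
    None); seasonal is the mean detrended value at each phase of the cycle.
    """
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need period >= 2 and at least two full cycles")
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2:
            trend[i] = sum(window) / period
        else:
            # Even period: end points of the window get half weight.
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    # Average the detrended values at each phase of the cycle.
    by_phase = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            by_phase[i % period].append(series[i] - t)
    means = [sum(v) / len(v) for v in by_phase]
    center = sum(means) / period  # force the seasonal part to sum to zero
    seasonal = [means[i % period] - center for i in range(n)]
    residual = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
                for i in range(n)]
    return trend, seasonal, residual


# Synthetic example: linear drift plus a repeating 4-item pattern.
pattern = [2.0, -1.0, 0.0, -1.0]
confidence = [0.5 * i + pattern[i % 4] for i in range(16)]
trend, seasonal, residual = decompose_additive(confidence, period=4)
```

In this framing, a weaker seasonal component means the repeating cycle explains less of the variation in confidence, which is how the abstract interprets the schizophrenia group's WCST results.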

    Performance of basic and SVM-enhanced TM protocols.

    (a) Number of complexes for which TM protocol found at least one abstract with residues
    (b) Number of complexes with at least one interface residue found in abstracts
    (c) Ratio of L_tot and total number of complexes
    (d) Ratio of L_int and total number of complexes
    (e) Ratio of L_int and L_tot
    The SVM models were trained and tested on abstracts retrieved by the AND-queries. Best models were applied to abstracts retrieved by the OR-queries (see Methods, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004630#sec002). Total number of complexes in the dataset is 579, if not specified otherwise.

    Docking with TM constraints.

    The results of benchmarking on the unbound X-ray set from Dockground. A complex was predicted successfully if at least one in top ten matches had ligand Cα interface RMSD ≤ 5 Å (A), and one in top hundred had RMSD ≤ 8 Å (B). The success rate is the percentage of successfully predicted complexes in the set. The low-resolution geometric scan output (20,000 matches) from GRAMM docking, with no post-processing, except removal of redundant matches, was scored by TM results. The reference bars show scoring by the actual interface residues (see text).
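
The success criterion in this caption can be expressed as a short calculation. The sketch below is a minimal illustration: the function name, the input layout (complex ID mapped to interface RMSD values in ranked order), and the toy numbers are all made up, not taken from the benchmark.

```python
def success_rate(ranked_rmsds, top_n, cutoff):
    """Percentage of complexes with at least one match in the top_n
    ranked matches whose interface RMSD (in angstroms) is <= cutoff."""
    if not ranked_rmsds:
        raise ValueError("no complexes given")
    hits = sum(
        1 for rmsds in ranked_rmsds.values()
        if any(r <= cutoff for r in rmsds[:top_n])
    )
    return 100.0 * hits / len(ranked_rmsds)


# Toy data (hypothetical): RMSD of each match, best-scored first.
toy = {
    "cplx1": [12.0, 4.2, 9.0],
    "cplx2": [7.5, 6.1],
    "cplx3": [3.0],
    "cplx4": [20.0, 15.0],
}
print(success_rate(toy, top_n=10, cutoff=5.0))   # criterion A: 50.0
print(success_rate(toy, top_n=100, cutoff=8.0))  # criterion B: 75.0
```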

    Examples of residues extracted from an abstract retrieved by OR-query.

    The structure, chain ID, and residue numbers are from 1m27. Interface and non-interface residues are in brown and magenta, respectively.