72 research outputs found

    Global alignment of pairwise protein interaction networks for maximal common conserved patterns

    A number of tools for the alignment of protein-protein interaction (PPI) networks have laid the foundation for PPI network analysis. Most alignment tools focus on finding conserved interaction regions across the PPI networks through either local or global mapping of similar sequences. Researchers are still trying to improve the speed, scalability, and accuracy of network alignment. In view of this, we introduce a fast connected-components-based algorithm, HopeMap, for network alignment. Observing that the number of true orthologs across species is small compared to the total number of proteins in all species, we take a different approach based on a precompiled list of homologs identified by KO terms. Applying this approach to S. cerevisiae (yeast) and D. melanogaster (fly), E. coli K12 and S. typhimurium, and E. coli K12 and C. crescentus, we analyze all clusters identified in the alignment. The results are evaluated through up-to-date known gene annotations, gene ontology (GO), and KEGG ortholog groups (KO). Compared to existing tools, our approach is fast, with linear computational cost; highly accurate in terms of KO and GO term specificity and sensitivity; and easily extended to multiple alignments.

    Learning Contextual Embeddings for Knowledge Graph Completion

    Knowledge graphs capture entities and their relationships. However, many knowledge graphs are afflicted by missing data. Recently, embedding methods have been used to alleviate this issue via knowledge graph completion. However, most existing methods only consider the relationship in triples, even though contextual relation types, consisting of the surrounding relation types of a triple, can substantially improve prediction accuracy. Therefore, we propose a contextual embedding method that learns the embeddings of entities and predicates while taking contextual relation types into account. The main benefits of our approach are: (1) improved scalability via a reduced number of epochs needed to achieve comparable or better results with the same memory complexity, (2) higher prediction accuracy (an average of 14%) compared to the related algorithms, and (3) high accuracy for both missing entity and predicate predictions. The source code and the YAGO43k dataset of this paper are available at https://github.ncsu.edu/cmoon2/kg

    WebBANC: Building Semantically-Rich Annotated Corpora from Web User Annotations of Minority Languages

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 48-56. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    Impact of Pretreated Switchgrass and Biomass Carbohydrates on Clostridium thermocellum ATCC 27405 Cellulosome Composition: A Quantitative Proteomic Analysis

    Background: Economic feasibility and sustainability of lignocellulosic ethanol production requires the development of robust microorganisms that can efficiently degrade and convert plant biomass to ethanol. The anaerobic thermophilic bacterium Clostridium thermocellum is a candidate microorganism as it is capable of hydrolyzing cellulose and fermenting the hydrolysis products to ethanol and other metabolites. C. thermocellum achieves efficient cellulose hydrolysis using multiprotein extracellular enzymatic complexes, termed cellulosomes. Methodology/Principal Findings: In this study, we used quantitative proteomics (multidimensional LC-MS/MS and 15N-metabolic labeling) to measure relative changes in levels of cellulosomal subunit proteins (per CipA scaffoldin basis) when C. thermocellum ATCC 27405 was grown on a variety of carbon sources [dilute-acid pretreated switchgrass, cellobiose, amorphous cellulose, crystalline cellulose (Avicel) and combinations of crystalline cellulose with pectin or xylan or both]. Cellulosome samples isolated from cultures grown on these carbon sources were compared to 15N labeled cellulosome samples isolated from crystalline cellulose-grown cultures. In total from all samples, proteomic analysis identified 59 dockerin- and 8 cohesin-module containing components, including 16 previously undetected cellulosomal subunits. Many cellulosomal components showed differential protein abundance in the presence of non-cellulose substrates in the growt

    A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Background: High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results: In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions: Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode

    Complex biomarker discovery in neuroimaging data: Finding a needle in a haystack

    Neuropsychiatric disorders such as schizophrenia, bipolar disorder and Alzheimer's disease are major public health problems. However, despite decades of research, we currently have no validated prognostic or diagnostic tests that can be applied at an individual patient level. Many neuropsychiatric diseases are due to a combination of alterations that occur in a human brain rather than the result of localized lesions. While there is hope that newer imaging technologies such as functional and anatomic connectivity MRI or molecular imaging may offer breakthroughs, the single biomarkers that are discovered using these datasets are limited by their inability to capture the heterogeneity and complexity of most multifactorial brain disorders. Recently, complex biomarkers have been explored to address this limitation using neuroimaging data. In this manuscript we consider the nature of complex biomarkers being investigated in the recent literature and present techniques to find such biomarkers that have been developed in related areas of data mining, statistics, machine learning and bioinformatics.