22,218 research outputs found

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Sensitivity dependent model of protein-protein interaction networks

Author: Estojak J
Eugene I Shakhnovich
Jingshan Zhang
Wagner A
Publication venue: 'IOP Publishing'
Publication date: 05/09/2008

The scale-free structure p(k)~k^{-gamma} of protein-protein interaction networks can be reproduced by a static physical model in simulation. We inspect the model theoretically and find the key reason why the model generates apparent scale-free degree distributions. This explanation provides a generic mechanism of "scale-free" networks. Moreover, we predict the dependence of gamma on experimental protein concentrations or other sensitivity factors in detecting interactions, and find experimental evidence to support the prediction. Comment: organization improved, and experimental evidence of predicted dependence on sensitivity is addressed

arXiv.org e-Print Archive

Crossref

DART-ID increases single-cell proteome coverage.

Author: Chen Albert Tian
Franks Alexander
Slavov Nikolai
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional data points provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net

Directory of Open Access Journals

eScholarship - University of California

Deducing topology of protein-protein interaction networks from experimentally measured sub-networks.

Author: Han Zhangang
Maclellan W Robb
Qu Zhilin
Vondriska Thomas M
Weiss James N
Yang Ling
Publication venue: eScholarship, University of California
Publication date: 01/07/2008

Background: Protein-protein interaction networks are commonly sampled using yeast two-hybrid approaches. However, whether topological information reaped from these experimentally measured sub-networks can be extrapolated to complete protein-protein interaction networks is unclear. Results: By analyzing various experimental protein-protein interaction datasets, we found that they are not random samples of the parent networks. Based on the experimental bait-prey behaviors, our computer simulations show that these non-random sampling features may affect the topological information. We tested the hypothesis that a core sub-network exists within the experimentally sampled network that better maintains the topological characteristics of the parent protein-protein interaction network. We developed a method to filter the experimentally sampled network to result in a core sub-network that more accurately reflects the topology of the parent network. These findings have fundamental implications for large-scale protein interaction studies and for our understanding of the behavior of cellular networks. Conclusion: The topological information from experimentally measured networks, taken as is, may not be the correct source for topological information about the parent protein-protein interaction network. We define a core sub-network that more accurately reflects the topology of the parent network.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Large scale localization of protein phosphorylation by use of electron capture dissociation mass spectrometry.

Author: Akbarzadeh
Bailey
Beausoleil
Blom
Chi
Christopher M. Bailey
Cooper
Creese
Debbie L. Cunningham
Ding
Ficarro
Geer
Good
Helen J. Cooper
John K. Heath
Kelleher
Kjeldsen
Lind
Littlefield
Mirgorodskaya
Molina
Nielsen
Olsen
Palumbo
Perkins
Rush
Ruttenberg
Satake
Savitski
Shen
Steen
Stensballe
Steve M. M. Sweet
Sweet
Syka
Thingholm
Wan
Woodling
Zubarev
Publication date: 01/01/2009

We used on-line electron capture dissociation (ECD) for the large-scale identification and localization of sites of phosphorylation. Each FT-ICR ECD event was paired with a linear ion trap collision-induced dissociation (CID) event, allowing a direct comparison of the relative merits of ECD and CID for phosphopeptide identification and site localization. Linear ion trap CID was shown to be most efficient for phosphopeptide identification, whereas FT-ICR ECD was superior for localization of sites of phosphorylation. The combination of confident CID and ECD identification and confident CID and ECD localization is particularly valuable in cases where a phosphopeptide is identified just once within a phosphoproteomics experiment.

Crossref

University of Birmingham Research Portal

PubMed Central

Sussex Research Online

A mass spectrometry-guided genome mining approach for natural product peptidogenomics.

Author: Cimermancic Peter
Dorrestein Pieter C
Fenical William
Fischbach Michael A
Kersten Roland D
Moore Bradley S
Nam Sang-Jip
Xu Yuquan
Yang Yu-Liang
Publication venue: eScholarship, University of California
Publication date: 01/10/2011

Peptide natural products show broad biological properties and are commonly produced by orthogonal ribosomal and nonribosomal pathways in prokaryotes and eukaryotes. To harvest this large and diverse resource of bioactive molecules, we introduce here natural product peptidogenomics (NPP), a new MS-guided genome-mining method that connects the chemotypes of peptide natural products to their biosynthetic gene clusters by iteratively matching de novo tandem MS (MS(n)) structures to genomics-based structures following biosynthetic logic. In this study, we show that NPP enabled the rapid characterization of over ten chemically diverse ribosomal and nonribosomal peptide natural products of previously unidentified composition from Streptomycete bacteria as a proof of concept to begin automating the genome-mining process. We show the identification of lantipeptides, lasso peptides, linaridins, formylated peptides and lipopeptides, many of which are from well-characterized model Streptomycetes, highlighting the power of NPP in the discovery of new peptide natural products from even intensely studied organisms.

eScholarship - University of California

Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

Author: Patil Ashwini
Srihari Sriganesh
Wong Limsoon
Yong Chern Han
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight challenges faced by these methods, in particular the detection of sparse and small complexes or sub-complexes and the discernment of overlapping complexes. We describe methods for integrating diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation, for instance, the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Comment: 1 Table

arXiv.org e-Print Archive

Elsevier - Publisher Connector

University of Queensland eSpace