Predictive and Personalized Medicine with Systems Biology Solutions
Poster abstract. Systems biology refers to the use of systems engineering and systems science techniques to understand biological systems. At the Indiana Center for Systems Biology and Personalized Medicine (ICSBPM), we are particularly interested in developing systems biology techniques that can help shorten the gaps between basic biomedical research and clinical applications of genome sciences toward predictive and personalized medicine. In the past several years, ICSBPM has developed many critical informatics resources for the systems biology and personalized medicine community.
The databases and software tools that we developed have supported systems biology and personalized medicine research communities at the national scale. These tools include: HPD, an integrated human pathway database and analysis tool (Chowbina et al., in BMC Bioinformatics 2009, 10(S11):S5); HAPPI, a human annotated and predicted protein interaction database (Chen et al., in BMC Genomics 2009, 10(S1):S16); HIP2, a Database of Healthy Human Individual's Integrated Plasma Proteome (Saha et al., in BMC Medical Genomics 2008, 1(1):12); PEPPI, a Peptidomic Database of Protein Isoforms (Zhou et al., in BMC Bioinformatics 2010, 11(S6):S7); ProteoLens, a multi-scale network visualization and data mining tool (Huan et al., in BMC Bioinformatics 2008, 9(S9):S5); GeneTerrain, a visual exploration tool for network-organized expression panel biomarker development (You et al., in Information Visualization 2010, 9(1)); and C-Maps, comprehensive molecular connectivity maps between disease-specific proteins and drugs (Li et al., in PLoS Computational Biology, 5(7):e1000450).
These tools have been demonstrated to help improve tumor classifications, understand cancer biology at the systems scale, tackle biomarker discovery challenges, and facilitate clinical adoption of predictive models developed from computational techniques. We hope that our experience and resources can cement collaborative translational medicine research towards predictive and personalized medicine applications.
Discovery of pathway biomarkers from coupled proteomics and systems biology methods
Background: Worldwide, breast cancer is the second most common type of cancer after lung cancer. Plasma proteome profiling may have a higher chance of identifying protein changes between plasma samples such as normal and breast cancer tissues. Breast cancer cell lines have long been used by researchers as a model system for identifying protein biomarkers. A comparison of the set of proteins which change in plasma with previously published findings from proteomic analysis of human breast cancer cell lines may identify, with higher confidence, a subset of candidate protein biomarkers.
Results: In this study, we analyzed a liquid chromatography (LC) coupled tandem mass spectrometry (MS/MS) proteomics dataset from plasma samples of 40 healthy women and 40 women diagnosed with breast cancer. Using two-sample t-statistics and a permutation procedure, we identified 254 statistically significant, differentially expressed proteins, among which 208 are over-expressed and 46 are under-expressed in breast cancer plasma. We validated this result against previously published proteomic results of human breast cancer cell lines and signaling pathways to derive 25 candidate protein biomarkers in a panel. Using pathway analysis, we observed that the 25 "activated" plasma proteins were present in several cancer pathways, including "Complement and coagulation cascades", "Regulation of actin cytoskeleton", and "Focal adhesion", and match well with previously reported studies. Additional gene ontology analysis of the 25 proteins also showed that cellular metabolic process and response to external stimulus (especially proteolysis and acute inflammatory response) were enriched functional annotations of the proteins identified in the breast cancer plasma samples. By cross-validation using two additional proteomics studies, we obtained 86% and 83% similarities in the pathway-protein matrix between the first study and the two testing studies, which is much better than the similarity we measured with proteins.
Conclusions: We presented a "systems biology" method to identify, characterize, analyze and validate panel biomarkers in breast cancer proteomics data, which includes 1) t-statistics and a permutation process, 2) network, pathway and function annotation analysis, and 3) cross-validation of multiple studies. Our results showed that the systems biology approach is essential to understanding the molecular mechanisms of panel protein biomarkers.
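The first step named in the abstract above, a two-sample t-statistic combined with a permutation procedure, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the Welch form of the statistic, the function names `welch_t` and `permutation_p_value`, the number of permutations, and the intensity values are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t-statistic (Welch form, unequal variances assumed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0  # degenerate case: no spread in either group
    return (mean(a) - mean(b)) / se

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value: share of label shufflings with |t| >= observed |t|."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(welch_t(perm_a, perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical intensities for one protein (not data from the study).
healthy = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
cancer = [12.0, 11.7, 12.3, 11.9, 12.4, 12.1]
p_value = permutation_p_value(cancer, healthy)
over_expressed = welch_t(cancer, healthy) > 0
```

In a full analysis this test would be repeated per protein and followed by a multiple-testing correction, which the sketch leaves out.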
An integrated proteomics analysis of bone tissues in response to mechanical stimulation
Bone cells can sense physical forces and convert mechanical stimulation conditions into biochemical signals that lead to expression of mechanically sensitive genes and proteins. However, it is still poorly understood how genes and proteins in bone cells are orchestrated to respond to mechanical stimulations. In this research, we applied integrated proteomics, statistical, and network biology techniques to study proteome-level changes to bone tissue cells in response to two different conditions, normal loading and fatigue loading. We harvested ulna midshafts and isolated proteins from the control, loaded, and fatigue-loaded rats. Using a label-free liquid chromatography tandem mass spectrometry (LC-MS/MS) experimental proteomics technique, we derived a comprehensive list of 1,058 proteins that are differentially expressed among normal loading, fatigue loading, and controls. By carefully developing protein selection filters and statistical models, we were able to identify 42 proteins representing 21 rat genes that were significantly associated with bone cells' response to quantitative changes between normal loading and fatigue loading conditions. We further applied network biology techniques by building a fatigue loading activated protein-protein interaction subnetwork involving 9 of the human-homolog counterparts of the 21 rat genes in a large connected network component. Our study shows that the combination of decreased anti-apoptotic factor, Raf1, and increased pro-apoptotic factor, PDCD8, results in a significant increase in the number of apoptotic osteocytes following fatigue loading. We believe controlling osteoblast differentiation/proliferation and osteocyte apoptosis could be promising directions for developing future therapeutic solutions for related bone diseases.
PEPPI: a peptidomic database of human protein isoforms for proteomics experiments
Background
Protein isoform generation, which may derive from alternative splicing, genetic polymorphism, and posttranslational modification, is an essential means by which eukaryotic cells achieve molecular diversity. Previous studies have shown that protein isoforms play critical roles in disease diagnosis, risk assessment, sub-typing, prognosis, and treatment outcome predictions. Understanding the types, presence, and abundance of different protein isoforms in different cellular and physiological conditions is a major task in functional proteomics, and may pave the way to molecular biomarker discovery for human diseases. In tandem mass spectrometry (MS/MS) based proteomics analysis, peptide peaks with exact matches to protein sequence records in the proteomics database may be identified with mass spectrometry (MS) search software. However, due to limited annotation and poor coverage of protein isoforms in proteomics databases, high-throughput protein isoform identifications, particularly those arising from alternative splicing and genetic polymorphism, have not been possible.
Results
Therefore, we present the PEPtidomics Protein Isoform Database (PEPPI, http://bio.informatics.iupui.edu/peppi), a comprehensive database of computationally-synthesized human peptides that can identify protein isoforms derived from either alternatively spliced mRNA transcripts or SNP variations. We collected genome, pre-mRNA alternative splicing and SNP information from Ensembl. We synthesized in silico isoform transcripts that cover all exons and theoretically possible junctions of exons and introns, as well as all their variations derived from known SNPs. With three case studies, we further demonstrated that the database can help researchers discover and characterize new protein isoform biomarkers from experimental proteomics data.
Conclusions
We developed a new tool for the proteomics community to characterize protein isoforms from MS-based proteomics experiments. By cataloguing each peptide configuration in the PEPPI database, users can study genetic variations and alternative splicing events at the proteome level. They can also batch-download peptide sequences in FASTA format to search for MS/MS spectra derived from human samples. The database can help generate novel hypotheses on molecular risk factors and molecular mechanisms of complex diseases, leading to identification of potentially highly specific protein isoform biomarkers.
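The in silico synthesis described in the Results above (sequences covering theoretically possible junctions, plus their SNP variants) can be illustrated with a toy sketch. It is a simplification under stated assumptions and not the PEPPI build procedure: it works on nucleotide strings, covers only exon-exon junctions, and the helper names `junction_sequences` and `apply_snp`, the `flank` parameter, and the exon strings are hypothetical.

```python
from itertools import combinations

def junction_sequences(exons, flank=6):
    """Yield (i, j, seq) for each exon pair i < j.

    seq joins the last `flank` bases of exon i to the first `flank`
    bases of exon j, i.e. the sequence spanning a possible splice junction.
    """
    for i, j in combinations(range(len(exons)), 2):
        yield i, j, exons[i][-flank:] + exons[j][:flank]

def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position `pos` replaced by `alt`."""
    return seq[:pos] + alt + seq[pos + 1:]

# Hypothetical exon sequences, for illustration only.
exons = ["ATGGCC", "GATTAC", "TTTGGA"]
junctions = list(junction_sequences(exons, flank=3))
variant = apply_snp(junctions[0][2], 2, "T")
```

A real pipeline would also translate each junction sequence in the correct reading frame and digest it into peptides before writing FASTA records; those steps are omitted here.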
HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions
BACKGROUND:
Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine.
RESULTS:
The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPI) linked by 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs contain both physical/direct interactions and high-quality functional/indirect interactions. Compared with the HAPPI 1.0 database release, HAPPI database version 2.0 (HAPPI-2) represents a 485% increase in human PPI data coverage and a 73% increase in protein coverage. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by academic users at http://discovery.informatics.uab.edu/HAPPI .
CONCLUSIONS:
While the underlying data for HAPPI-2 are integrated from diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.
Predicting adverse side effects of drugs
Background
Studies of toxicity and unintended side effects can lead to improved drug safety and efficacy. One promising form of study comes from molecular systems biology in the form of "systems pharmacology". Systems pharmacology combines data from clinical observation and molecular biology. This approach is new, however, and there are few examples of how it can practically predict adverse reactions (ADRs) from an experimental drug with acceptable accuracy.
Results
We have developed a new and practical computational framework to accurately predict ADRs of trial drugs. We combine clinical observation data with drug target data, protein-protein interaction (PPI) networks, and gene ontology (GO) annotations. We use cardiotoxicity, one of the major causes for drug withdrawals, as a case study to demonstrate the power of the framework. Our results show that an in silico model built on this framework can achieve a satisfactory cardiotoxicity ADR prediction performance (median AUC = 0.771, Accuracy = 0.675, Sensitivity = 0.632, and Specificity = 0.789). Our results also demonstrate the significance of incorporating prior knowledge, including gene networks and gene annotations, to improve future ADR assessments.
Conclusions
Biomolecular network and gene annotation information can significantly improve the predictive accuracy of ADR of drugs under development. The use of PPI networks can increase prediction specificity and the use of GO annotations can increase prediction sensitivity. Using cardiotoxicity as an example, we are able to further identify cardiotoxicity-related proteins among drug target expanding PPI networks. The systems pharmacology approach that we developed in this study can be generally applicable to all future developmental drug ADR assessments and predictions.
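For readers unfamiliar with the four performance measures quoted in the abstract above (AUC, accuracy, sensitivity, specificity), the sketch below shows how they are conventionally computed for a binary classifier. It uses the standard definitions only and says nothing about the authors' model; the function names, labels, scores, and the 0.5 threshold are illustrative assumptions.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = ADR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports them separately.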
New threats to health data privacy
Background
Along with the rapid digitalization of health data (e.g. Electronic Health Records), there is increasing concern about maintaining data privacy while garnering the benefits, especially when the data are required to be published for secondary use. Most current research on protecting health data privacy is centered around data de-identification and data anonymization, which remove identifiable information from the published health data to prevent an adversary from reasoning about the privacy of the patients. However, published health data is not the only source that adversaries can count on: with the large amount of information that people voluntarily share on the Web, sophisticated attacks against health data privacy that join disparate pieces of information from multiple sources become practical. Limited effort has so far been devoted to studying these attacks.
Results
We study how patient privacy could be compromised with the help of today's information technologies. In particular, we show that private healthcare information could be collected by aggregating and associating disparate pieces of information from multiple online data sources, including online social networks, public records and search engine results. We present a real-world case study showing that user identity and privacy are highly vulnerable to attribution, inference and aggregation attacks. We also show, with real data analysis, that people are highly identifiable to adversaries even when the information pieces about the target are inaccurate.
Conclusion
We claim that so much information has been made available electronically and online that people are very vulnerable without effective privacy protection.
SLDR: a computational technique to identify novel genetic regulatory relationships
We developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. Our technique takes advantage of functional genomics data for the same species under different perturbation conditions and is therefore complementary to currently popular computational techniques. In particular, it can identify "rare" activation/inhibition relationship events that can be difficult to find in experimental results. In SLDR, we model each candidate target gene as being controlled by N binary-state regulators that lead to ≤2^N observable states ("step-levels") for the target. We applied SLDR to the GEO microarray data set GSE25644, which consists of 158 different mutant S. cerevisiae gene expression profiles. For each target gene t, we first clustered ordered samples into various clusters, each approximating an observable step-level of t, to screen out the "de-centric" target. Then, we ordered each gene x as a candidate regulator and aligned t to x to examine the step-level correlations between the low expression set of x (Ro) and the high expression set of x (Rh) from the regulator x to t, by finding max f(t, x): |Ro-Rh| over all candidate x in the genome for each t. We thereby obtained activation and inhibition events from different combinations of Ro and Rh. Furthermore, we developed criteria for filtering out less-confident regulators, estimated the number of regulators for each target t, and evaluated the identified top-ranking regulator-target relationships. Our results can be cross-validated with the Yeast Fitness database. SLDR is also computationally efficient, with O(N²) complexity. In summary, we believe SLDR can be applied to the mining of functional genomics big data for future network biology and network medicine applications.
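One possible reading of the |Ro-Rh| scoring step in the abstract above is sketched below. The abstract does not define Ro, Rh, or f(t, x) precisely, so everything here is an assumption made for illustration: samples are sorted by the regulator's expression, Ro and Rh are taken as the mean target expression in the low-x and high-x groups, and the score is the largest absolute gap over all split points. The function name `step_level_response` and the `min_size` parameter are hypothetical, and this is not the published SLDR implementation.

```python
from statistics import mean

def step_level_response(t_expr, x_expr, min_size=2):
    """Score how strongly target t steps between low-x and high-x samples.

    Returns (score, direction), where score is max |Rh - Ro| over split
    points and direction is "activation" if t is higher when x is high,
    "inhibition" if lower, and "none" if no difference is found.
    """
    # Order the samples by the candidate regulator's expression.
    order = sorted(range(len(x_expr)), key=lambda i: x_expr[i])
    t_sorted = [t_expr[i] for i in order]
    best_diff = 0.0
    for cut in range(min_size, len(t_sorted) - min_size + 1):
        ro = mean(t_sorted[:cut])   # target level where x is low
        rh = mean(t_sorted[cut:])   # target level where x is high
        if abs(rh - ro) > abs(best_diff):
            best_diff = rh - ro
    if best_diff > 0:
        direction = "activation"
    elif best_diff < 0:
        direction = "inhibition"
    else:
        direction = "none"
    return abs(best_diff), direction

# Hypothetical expression values across six samples.
x_expr = [1, 2, 3, 4, 5, 6]
t_expr = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
score, direction = step_level_response(t_expr, x_expr)
```

Ranking every candidate x by this score for a given t would give a regulator shortlist; the filtering criteria and regulator-count estimation mentioned in the abstract are not modeled here.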
"Super Gene Set" Causal Relationship Discovery from Functional Genomics Data
In this article, we present a computational framework to identify "causal relationships" among super gene sets. By "causal relationships," we refer to both stimulatory and inhibitory regulatory relationships, whether through direct or indirect mechanisms. By super gene sets, we refer to "pathways, annotated lists, and gene signatures," or PAGs. To identify causal relationships among PAGs, we extend the previous work on identifying PAG-to-PAG regulatory relationships by further requiring them to be significantly enriched with gene-to-gene co-expression pairs across the two PAGs involved. This is achieved by developing a quantitative metric based on PAG-to-PAG Co-expressions (PPC), which we use to infer the likelihood that PAG-to-PAG relationships under examination are causal, either stimulatory or inhibitory. Since true causal relationships are unknown, we approximate the overall performance of inferring causal relationships with the performance of recalling known r-type PAG-to-PAG relationships from causal PAG-to-PAG inference, using a functional genomics benchmark dataset from the GEO database. We report an area-under-curve (AUC) performance of 0.81 for both precision and recall. By applying our framework to a myeloid-derived suppressor cells (MDSC) dataset, we further demonstrate that this framework is effective in helping build multi-scale biomolecular systems models with new insights on regulatory and causal links for downstream biological interpretations.