Statistical Approaches to Use a Model Organism for Regulatory Sequences Annotation of Newly Sequenced Species

Author: Claudia Angelini (90871)
Italia De Feis (137741)
Pietro Liò (34327)
Viet-Anh Nguyen (137745)
Publication date: 11/09/2012

A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from closely related model organisms. Here we present a general algorithm that identifies transcription factor binding sites in a newly sequenced species by performing Bayesian regression on the annotated species. First we establish the rationale of our method by applying it within a single species; then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case in which several experiments, from several species close to the species on which inference is made, are available. To show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compare the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate the utility of the method for annotating the sequences of new species using all the available replicates from model species. The third case, where the gene network size varies among species, shows that combining information is less powerful but still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.
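The abstract does not spell out the regression model. As a hedged illustration only, the sketch below scores candidate motifs in the spirit of regression-based motif discovery (the general idea behind tools such as MatrixReduce, with which the method is compared): expression is regressed on per-promoter motif counts, here with an assumed conjugate Gaussian prior on the slope so that the posterior mean has a closed form. The model, the default variances, and the names `count_occurrences`, `bayes_slope` and `score_motifs` are hypothetical, not the authors' method.

```python
# Hypothetical sketch, not the authors' implementation: score candidate
# motifs by a univariate Bayesian linear regression of expression on
# per-promoter motif counts, with a conjugate Gaussian prior on the slope.
from typing import Dict, List


def count_occurrences(seq: str, motif: str) -> int:
    """Count overlapping exact matches of `motif` on the given strand only."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def bayes_slope(x: List[float], y: List[float],
                noise_var: float = 1.0, prior_var: float = 1.0) -> float:
    """Posterior mean of b in y = b * x + e after centring x and y,
    assuming e ~ N(0, noise_var) and prior b ~ N(0, prior_var)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    sxx = sum(v * v for v in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    # Posterior precision = data precision + prior precision.
    precision = sxx / noise_var + 1.0 / prior_var
    return (sxy / noise_var) / precision


def score_motifs(promoters: List[str], expression: List[float],
                 motifs: List[str]) -> Dict[str, float]:
    """Posterior-mean slope for each candidate motif (larger = stronger
    positive association between motif counts and expression)."""
    scores = {}
    for motif in motifs:
        counts = [float(count_occurrences(p, motif)) for p in promoters]
        scores[motif] = bayes_slope(counts, expression)
    return scores


# Toy usage with made-up promoters and expression log-ratios.
promoters = ["ACGCGTAAACGCGT", "TTTTACGCGTTTTT",
             "AAAAAAAAAAAAAA", "GGGGCCCCGGGGCC"]
expression = [2.0, 1.0, 0.0, 0.1]
scores = score_motifs(promoters, expression, ["ACGCGT", "AAAAAA"])
print(scores)
```

In this toy example the motif whose counts track the expression signal gets a positive posterior slope, while the other gets a negative one. The prior variance controls how strongly slopes are shrunk toward zero, which plays the role of regularisation when many candidate motifs are scored.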

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

FigShare

Motifs detected for RAM transcriptional network in the Ascomycota (case study 3).

Author: Claudia Angelini (90871)
Italia De Feis (137741)
Pietro Liò (34327)
Viet-Anh Nguyen (137745)

FigShare

Motifs detected for cytokinesis transcriptional network in Candida clade (case study 2).

Author: Claudia Angelini (90871)
Italia De Feis (137741)
Pietro Liò (34327)
Viet-Anh Nguyen (137745)

FigShare

RAM ML tree.

Author: Claudia Angelini (90871)
Italia De Feis (137741)
Pietro Liò (34327)
Viet-Anh Nguyen (137745)

Maximum Likelihood tree, based on the JTT model of evolution, inferred from RAM protein sequences of the species in figure 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0042489#pone-0042489-g001). We validate this phylogeny against a phylogeny with the same number of species, based on cdc5, a regulator of the G2/M transition of the mitotic cell cycle, using the same visualisation as in [28] (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0042489#pone.0042489-Nye1).

FigShare

Image_2_Combining Pathway Identification and Breast Cancer Survival Prediction via Screening-Network Methods.PDF

Author: Annalisa Occhipinti (739792)
Antonella Iuliano (5393873)
Claudia Angelini (90871)
Italia De Feis (137741)
Pietro Liò (34327)

Breast cancer is one of the most common invasive tumors and causes high mortality among women. It is highly heterogeneous in its biological and clinical characteristics. Several high-throughput assays have been used to collect genome-wide information for many patients in large collaborative studies. This knowledge has improved our understanding of breast cancer biology and led to new methods of diagnosing and treating the disease. In particular, systems biology has become a valid approach to gain better insight into the biological mechanisms of breast cancer. A crucial component of current research lies in identifying novel biomarkers that can predict patient prognosis on the basis of the molecular signature of the tumor sample. However, the high dimensionality and small sample size of the data greatly increase the difficulty of cancer survival analysis and call for ad hoc statistical methods. In this work, we propose novel screening-network methods that predict patient survival outcome by screening key survival-related genes, and we assess the proposed approaches on the METABRIC dataset. In particular, we first identify a subset of genes by applying variable screening techniques to gene expression data. Then, we perform Cox regression analysis incorporating network information associated with the selected subset of genes. The novelty of this work lies in the improved prediction of survival responses obtained through different types of screening (biomedical-driven, data-driven, and a combination of the two) before building the network-penalized model. Indeed, combining the two screening approaches allows us to use the available biological knowledge on breast cancer and complement it with additional information emerging from the data used in the analysis.
Moreover, we illustrate how to extend the proposed approaches to integrate an additional omic layer, such as copy number aberrations, and we show that such strategies can further improve prediction. In conclusion, our approaches allow us to discriminate between high- and low-risk patients using a small number of potential biomarkers and, therefore, can help clinicians provide more precise prognoses and facilitate the subsequent clinical management of patients at risk.
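The data-driven screening step can be illustrated with a marginal screen. The abstract does not say which statistic is used, so the sketch below assumes genes are ranked by the univariate Cox partial-likelihood score statistic evaluated at beta = 0 (a standard survival screen); `cox_score_statistic`, `screen_genes` and the toy data are hypothetical, and the network-penalized Cox model fitted afterwards is not shown.

```python
# Hypothetical sketch of a data-driven screening step: rank genes by the
# univariate Cox partial-likelihood score statistic evaluated at beta = 0.
from typing import List, Sequence


def cox_score_statistic(x: Sequence[float], time: Sequence[float],
                        event: Sequence[int]) -> float:
    """Score test U^2 / I for one covariate; tied event times share a
    risk set (Breslow convention)."""
    n = len(x)
    u = 0.0     # score: sum over events of (x_i - risk-set mean)
    info = 0.0  # information: sum over events of risk-set variance
    for i in range(n):
        if not event[i]:
            continue  # censored patients only contribute to risk sets
        risk = [x[j] for j in range(n) if time[j] >= time[i]]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        info += sum((v - mean) ** 2 for v in risk) / len(risk)
    return u * u / info if info > 0 else 0.0


def screen_genes(expr: List[List[float]], time: Sequence[float],
                 event: Sequence[int], k: int) -> List[int]:
    """Indices of the k genes with the largest score statistic;
    expr[g][p] is the expression of gene g in patient p."""
    stats = [cox_score_statistic(g, time, event) for g in expr]
    return sorted(range(len(expr)), key=lambda g: stats[g], reverse=True)[:k]


# Toy usage: 3 genes, 6 patients; the last two patients are censored.
time = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
event = [1, 1, 1, 1, 0, 0]
expr = [[6.0, 5.0, 4.0, 3.0, 2.0, 1.0],   # high expression, early events
        [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # constant, uninformative
        [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]]   # weak, alternating pattern
print(screen_genes(expr, time, event, k=2))
```

A fixed k (or a threshold on the statistic) then defines the gene subset passed to the downstream model. A biomedical-driven screen would instead start from a curated gene list, and the combined strategy described above would merge the two subsets.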

FigShare