28 research outputs found
Predictive integration of gene functional similarity and co-expression defines treatment response of endothelial progenitor cells
Background: Endothelial progenitor cells (EPCs) have been implicated in several processes crucial to vasculature repair, which may offer the basis for new therapeutic strategies in cardiovascular disease. Despite advances facilitated by functional genomics, there is a lack of systems-level understanding of treatment response mechanisms of EPCs. In this research we aimed to characterize the EPC response to adenosine (Ado), a cardioprotective factor, based on the systems-level integration of gene expression data and prior functional knowledge. Specifically, we set out to identify novel biosignatures of Ado-treatment response in EPCs.
Results: The predictive integration of gene expression data and standardized functional similarity information enabled us to identify new treatment response biosignatures. Gene expression data originated from Ado-treated and untreated EPC samples, and functional similarity was estimated with Gene Ontology (GO)-based similarity information. These information sources enabled us to implement and evaluate an integrated prediction approach based on the concept of k-nearest neighbours learning (kNN). The method can be run with expert- and data-driven input queries to guide the search for biologically meaningful biosignatures. The resulting integrated kNN system identified new candidate EPC biosignatures that can offer high classification performance (areas under the receiver operating characteristic curve > 0.8). We also showed that the proposed models can outperform those discovered by standard gene expression analysis. Furthermore, we report an initial independent in vitro experimental follow-up, which provides additional evidence of the potential validity of the top biosignature.
Conclusion: Response to Ado treatment in EPCs can be accurately characterized with a new method based on the combination of gene co-expression data and GO-based similarity information. The method also incorporates human expert-driven queries as a strategy to guide the automated search for candidate biosignatures. The proposed biosignature improves the systems-level characterization of EPCs. The new integrative predictive modeling approach can also be applied to other phenotype characterization or biomarker discovery problems.
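As a rough illustration of the integration idea in the abstract above, the sketch below ranks genes around a query gene by a weighted mix of co-expression (absolute Pearson correlation) and a precomputed functional similarity score, then keeps the k nearest. The weighting parameter `alpha`, the function names, and the toy values are assumptions made for illustration; the abstract does not say how the two information sources are actually combined.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def integrated_neighbours(query, expr, go_sim, k=2, alpha=0.5):
    """Return the k genes closest to `query` under a combined similarity.

    expr:   gene -> expression profile across samples
    go_sim: frozenset({gene_a, gene_b}) -> functional similarity in [0, 1]
    alpha:  hypothetical weight on co-expression (1 - alpha goes to GO similarity)
    """
    scores = {}
    for gene, profile in expr.items():
        if gene == query:
            continue
        coexpr = abs(pearson(expr[query], profile))
        func = go_sim.get(frozenset((query, gene)), 0.0)
        scores[gene] = alpha * coexpr + (1 - alpha) * func
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data, invented for illustration only.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.1],
    "G3": [4.0, 3.0, 2.0, 1.5],
    "G4": [1.0, 1.2, 0.9, 1.1],
}
go_sim = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G1", "G3")): 0.1,
    frozenset(("G1", "G4")): 0.8,
}
neighbours = integrated_neighbours("G1", expr, go_sim, k=2)
```

Setting `alpha=0.0` in this sketch ranks purely by functional similarity, which changes the second neighbour from G3 to G4 and shows how the two sources can disagree.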
Coordinated modular functionality and prognostic potential of a heart failure biomarker-driven interaction network
Background: The identification of potentially relevant biomarkers and a deeper understanding of molecular mechanisms related to heart failure (HF) development can be enhanced by the implementation of biological network-based analyses. To support these efforts, here we report a global network of protein-protein interactions (PPIs) relevant to HF, which was characterized through integrative bioinformatic analyses of multiple sources of "omic" information.
Results: We found that the structural and functional architecture of this PPI network is highly modular. These network modules can be assigned to specialized processes and specific cellular regions, and their functional roles tend to partially overlap. Our results suggest that HF biomarkers may be defined as key coordinators of intra- and inter-module communication. Putative biomarkers can, in general, be distinguished as "information traffic" mediators within this network. The top high-traffic proteins are encoded by genes that are not highly differentially expressed between HF and non-HF patients. Nevertheless, we present evidence that the integration of expression patterns from high-traffic genes may support accurate prediction of HF. We quantitatively demonstrate that intra- and inter-module functional activity may be controlled by a family of transcription factors known to be associated with the prevention of hypertrophy.
Conclusion: The systems-driven analysis reported here provides the basis for the identification of potentially novel biomarkers and for understanding HF-related mechanisms in a more comprehensive and integrated way.
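The abstract above does not state which measure of "information traffic" was used. One common proxy for how much communication passes through a node is betweenness centrality, so the sketch below uses it as a stand-in, computed with Brandes' algorithm on an unweighted, undirected toy network. The node names and the two-module layout are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj: node -> set of neighbouring nodes (each edge listed in both directions)
    Returns node -> unnormalised betweenness, i.e. the sum over node pairs of
    the fraction of shortest paths between them that pass through the node.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical network: two triangular modules joined through node "H".
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "H"),
         ("H", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

traffic = betweenness(adj)
top_node = max(traffic, key=traffic.get)
```

In this toy network the bridging node carries all inter-module shortest paths even though it has only two interaction partners, which mirrors the abstract's point that high-traffic nodes need not stand out by other criteria.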
Metrics for GO based protein semantic similarity: a systematic evaluation
Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another open issue is whether electronic annotations should be used in semantic similarity calculations.
Results: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation.
Conclusions: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
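The two best-performing measures named in the abstract above can be sketched on a toy ontology. simGIC is computed here as the summed information content (IC) of the terms shared by two ancestor-extended annotation sets divided by the summed IC of their union, and Resnik's term similarity as the IC of the most informative common ancestor, combined with a best-match average. The ontology, the term names, and the IC values are invented for illustration; real IC values would be derived from annotation frequencies.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the ontology."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def extended(terms, parents):
    """Annotation set extended with every ancestor of every term."""
    out = set()
    for t in terms:
        out |= ancestors(t, parents)
    return out

def sim_gic(a, b, parents, ic):
    """simGIC: IC-weighted Jaccard index of two extended annotation sets."""
    ea, eb = extended(a, parents), extended(b, parents)
    union = sum(ic[t] for t in ea | eb)
    return sum(ic[t] for t in ea & eb) / union if union else 0.0

def resnik(t1, t2, parents, ic):
    """Resnik term similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common), default=0.0)

def best_match_average(a, b, parents, ic):
    """Average each term's best match in the other set, in both directions."""
    rows = [max(resnik(x, y, parents, ic) for y in b) for x in a]
    cols = [max(resnik(x, y, parents, ic) for x in a) for y in b]
    return (sum(rows) / len(rows) + sum(cols) / len(cols)) / 2

# Hypothetical ontology (child -> parents) and IC values.
parents = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
}
ic = {"root": 0.0, "binding": 1.0, "catalysis": 1.0,
      "dna_binding": 3.0, "rna_binding": 3.0, "kinase": 2.5}

protein_1 = {"dna_binding", "kinase"}
protein_2 = {"rna_binding", "kinase"}
gic = sim_gic(protein_1, protein_2, parents, ic)
bma = best_match_average(protein_1, protein_2, parents, ic)
```

Note that the two scores are on different scales in this sketch: simGIC is bounded by 1, while the Resnik-based score is bounded by the largest IC in the ontology.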
Clustering-based approaches to SAGE data mining
Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It emphasises current limitations and opportunities in this area for supporting biologically meaningful data mining and visualisation.
Inferring adaptive regulation thresholds and association rules from gene expression data through combinatorial optimization learning
There is a need to design computational methods to support the prediction of gene regulatory networks (GRNs). Such models should offer both biologically meaningful and computationally accurate predictions which, in combination with other techniques, may improve large-scale integrative studies. This paper presents a new machine-learning method for the prediction of putative regulatory associations from expression data. The method exhibits properties that other recently published techniques address only partially or not at all. It was tested on a Saccharomyces cerevisiae gene expression data set. The results were statistically validated and compared with the relationships inferred by two machine-learning approaches to GRN prediction. Furthermore, the resulting predictions were assessed using domain knowledge. The proposed algorithm may be able to accurately predict relevant biological associations between genes. One of the most relevant features of this new method is the prediction of adaptive regulation thresholds for the discretization of gene expression values, which is required prior to the association rule learning process. Moreover, an important advantage is its low computational cost for inferring association rules. The proposed system may significantly support exploratory large-scale studies of automated identification of potentially relevant gene expression associations.
Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
Affiliation: Azuaje, Francisco J.. University of Ulster; United Kingdom
Affiliation: Augusto, Juan C.. University of Ulster; United Kingdom
Affiliation: Glass, David H.. University of Ulster; United Kingdom
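The discretization step mentioned in the abstract above can be illustrated with a minimal sketch: each gene's expression values are mapped to up or down using a gene-specific threshold, and a candidate rule is then scored by support and confidence. Using each gene's mean as its threshold is only a placeholder assumption; the paper's combinatorial optimization for learning adaptive thresholds is not reproduced here, and the gene names and values are invented.

```python
def discretize(profile, threshold):
    """Map expression values to 1 (up) or 0 (not up) with a gene-specific threshold."""
    return [1 if v > threshold else 0 for v in profile]

def rule_stats(antecedent, consequent):
    """Support and confidence of the rule 'antecedent up -> consequent up'."""
    n = len(antecedent)
    both = sum(1 for a, c in zip(antecedent, consequent) if a and c)
    n_antecedent = sum(antecedent)
    support = both / n
    confidence = both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Toy expression profiles over six conditions (invented values).
expr = {
    "geneA": [0.2, 1.5, 1.8, 0.1, 1.6, 0.3],
    "geneB": [5.0, 9.0, 9.5, 5.5, 6.0, 5.2],
}

# Placeholder for learned adaptive thresholds: one threshold per gene (its mean).
thresholds = {gene: sum(values) / len(values) for gene, values in expr.items()}
states = {gene: discretize(values, thresholds[gene]) for gene, values in expr.items()}

support, confidence = rule_stats(states["geneA"], states["geneB"])
```

A single global threshold would not work for these two toy genes because their expression ranges do not overlap, which is one motivation for gene-specific thresholds.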