182 research outputs found

    Analysis of correlation-based biomolecular networks from different omics data by fitting stochastic block models

    Background: Biological entities such as genes, promoters, mRNA, metabolites or proteins do not act alone, but in concert in their network context. Modules, i.e., groups of nodes with similar topological properties in these networks, characterize important biological functions of the underlying biomolecular system. Edges in such molecular networks represent regulatory and physical interactions, and comparing them between conditions provides valuable information on differential molecular mechanisms. However, biological data is inherently noisy, and network reduction techniques can propagate errors, particularly to the level of edges. We aim to improve the analysis of networks of biological molecules by deriving modules together with edge relevance estimations that are based on global network characteristics. Methods: The key challenge we address here is investigating the capability of stochastic block models (SBMs) for representing and analyzing different types of biomolecular networks. Fitting SBMs to these networks both delivers network modules and enables the derivation of edge confidence scores, an approach that has not yet been investigated for biomolecular networks. We apply SBM-based analysis independently to three correlation-based networks of breast cancer data originating from high-throughput measurements of different molecular layers: transcriptomics, proteomics, or metabolomics. The networks were reduced by thresholding for correlation significance or by requirements on scale-freeness. Results and discussion: We find that the networks are best represented by the hierarchical version of the SBM, and many of the predicted blocks have a biologically and phenotypically relevant functional annotation. The edge confidence scores are overall in concordance with the biological evidence given by the measurements. We conclude that biomolecular networks can be appropriately represented and analyzed by fitting SBMs.
As the SBM-derived edge confidence scores are based on global network connectivity characteristics and take potential hierarchies within the biomolecular networks into account, they could be used as additional, integrated features in network-based data comparisons.
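
    To illustrate what fitting a block model scores, the following minimal sketch computes the log-likelihood of a simple (non-hierarchical) Bernoulli SBM for a fixed partition of an undirected graph. It is not the hierarchical SBM used in the paper; the function name, the toy graph and both partitions are assumptions made for illustration only.

```python
import math
from itertools import combinations


def sbm_log_likelihood(nodes, edges, block_of):
    """Log-likelihood of an undirected simple graph under a Bernoulli SBM.

    The partition `block_of` is fixed; each pair of blocks gets its
    maximum-likelihood edge probability (observed edges / possible pairs).
    """
    edge_set = {frozenset(e) for e in edges}
    possible = {}  # block pair -> number of node pairs
    observed = {}  # block pair -> number of edges
    for u, v in combinations(nodes, 2):
        key = tuple(sorted((block_of[u], block_of[v])))
        possible[key] = possible.get(key, 0) + 1
        if frozenset((u, v)) in edge_set:
            observed[key] = observed.get(key, 0) + 1

    log_lik = 0.0
    for key, n_pairs in possible.items():
        m = observed.get(key, 0)
        p = m / n_pairs
        # Block pairs with p == 0 or p == 1 contribute exactly 0.
        if 0 < p < 1:
            log_lik += m * math.log(p) + (n_pairs - m) * math.log(1 - p)
    return log_lik


# Hypothetical toy network: two triangles joined by a single edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_blocks = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_block = {n: 0 for n in nodes}

print(sbm_log_likelihood(nodes, edges, two_blocks))
print(sbm_log_likelihood(nodes, edges, one_block))
```

    The two-block partition scores higher than the single block here. Because this raw likelihood never decreases as blocks are split further, practical SBM inference adds a complexity penalty or prior on the partition; that step is omitted from the sketch.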

    User relationship classification of facebook messenger mobile data using WEKA

    © Springer Nature Switzerland AG 2018. Mobile devices hold a wealth of information about their users and their digital and physical activities (e.g. online browsing and physical location). Therefore, in any crime investigation, artifacts obtained from a mobile device can be crucial. However, the variety of mobile platforms and applications (apps) and the significant size of the data compound existing challenges in forensic investigations. In this paper, we explore the potential of machine learning in mobile forensics, specifically in the context of Facebook messenger artifact acquisition and analysis. Using Quick and Choo (2017)’s Digital Forensic Intelligence Analysis Cycle (DFIAC) as the guiding framework, we demonstrate how one can acquire Facebook messenger app artifacts from an Android device and an iOS device (the latter is, using existing forensic tools. Based on the acquired evidence, we create 199 data instances to train WEKA classifiers (i.e. ZeroR, J48 and Random tree) with the aim of classifying the device owner’s contacts and determining their mutual relationship strength.
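
    ZeroR, the simplest of the classifiers named above, ignores all features and always predicts the most frequent class, which makes it a useful baseline for the tree-based models. The sketch below is plain Python, not WEKA's implementation; the function names and the relationship labels are hypothetical.

```python
from collections import Counter


def zero_r_fit(labels):
    """Return the majority class of the training labels (the ZeroR 'model')."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the true label."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical relationship-strength labels, not taken from the paper.
train_labels = ["weak", "weak", "strong", "weak", "strong"]
test_labels = ["weak", "strong", "weak", "weak"]

majority = zero_r_fit(train_labels)
predictions = [majority] * len(test_labels)
print(majority, accuracy(predictions, test_labels))
```

    Any classifier that uses the message features should beat this baseline accuracy to be considered informative.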

    Transforming growth factor β receptor 1 is a new candidate prognostic biomarker after acute myocardial infarction

    Background: Prediction of left ventricular (LV) remodeling after acute myocardial infarction (MI) is clinically important and would benefit from the discovery of new biomarkers. Methods: Blood samples were obtained upon admission from patients with acute ST-elevation MI who underwent primary percutaneous coronary intervention. Messenger RNA was extracted from whole blood cells. LV function was evaluated by echocardiography at 4 months. Results: In a test cohort of 32 MI patients, integrated analysis of microarrays with a network of protein-protein interactions identified subgroups of genes which predicted LV dysfunction (ejection fraction ≤ 40%) with areas under the receiver operating characteristic curve (AUC) above 0.80. Candidate genes included transforming growth factor beta receptor 1 (TGFBR1). In a validation cohort of 115 MI patients, TGFBR1 was up-regulated in patients with LV dysfunction (P < 0.001) and was associated with LV function at 4 months (P = 0.003). TGFBR1 predicted LV function with an AUC of 0.72, while peak levels of troponin T (TnT) provided an AUC of 0.64. Adding TGFBR1 to the prediction of TnT resulted in a net reclassification index of 8.2%. When added to a mixed clinical model including age, gender and time to reperfusion, TGFBR1 reclassified 17.7% of misclassified patients. TGFB1, the ligand of TGFBR1, was also up-regulated in patients with LV dysfunction (P = 0.004), was associated with LV function (P = 0.006), and provided an AUC of 0.66. In the rat MI model induced by permanent coronary ligation, the TGFB1-TGFBR1 axis was activated in the heart and correlated with the extent of remodeling at 2 months. Conclusions: We identified TGFBR1 as a new candidate prognostic biomarker after acute MI.
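
    The AUC values quoted above can be read as the probability that a randomly chosen patient with LV dysfunction has a higher marker value than a randomly chosen patient without it. A minimal sketch of that pairwise definition follows; the function name and the marker values are invented for illustration and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker levels: patients with and without LV dysfunction.
with_dysfunction = [0.9, 0.8, 0.4]
without_dysfunction = [0.7, 0.3, 0.2]

print(auc(with_dysfunction, without_dysfunction))
```

    An AUC of 0.5 corresponds to a marker that ranks patients no better than chance, and 1.0 to perfect separation.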

    Data Mining Approaches to Diffuse Large B–Cell Lymphoma Gene Expression Data Interpretation

    This paper presents a comprehensive study of gene expression patterns originating from a diffuse large B–cell lymphoma (DLBCL) database. It focuses on the implementation of feature selection and classification techniques. It first tackles the identification of relevant genes for the prediction of DLBCL types. It also allows the determination of key biomarkers to differentiate two subtypes of DLBCL samples: Activated B–Like and Germinal Centre B–Like DLBCL. Decision trees provide knowledge–based models to predict types and subtypes of DLBCL. This research suggests that the data may be insufficient to accurately predict DLBCL types or even detect functionally relevant genes. However, these methods represent reliable and understandable tools to start thinking about possible interesting non–linear interdependencies.

    Coordinated modular functionality and prognostic potential of a heart failure biomarker-driven interaction network

    Background: The identification of potentially relevant biomarkers and a deeper understanding of molecular mechanisms related to heart failure (HF) development can be enhanced by the implementation of biological network-based analyses. To support these efforts, here we report a global network of protein-protein interactions (PPIs) relevant to HF, which was characterized through integrative bioinformatic analyses of multiple sources of "omic" information. Results: We found that the structural and functional architecture of this PPI network is highly modular. These network modules can be assigned to specialized processes and specific cellular regions, and their functional roles tend to partially overlap. Our results suggest that HF biomarkers may be defined as key coordinators of intra- and inter-module communication. Putative biomarkers can, in general, be distinguished as "information traffic" mediators within this network. The top high traffic proteins are encoded by genes that are not highly differentially expressed across HF and non-HF patients. Nevertheless, we present evidence that the integration of expression patterns from high traffic genes may support accurate prediction of HF. We quantitatively demonstrate that intra- and inter-module functional activity may be controlled by a family of transcription factors known to be associated with the prevention of hypertrophy. Conclusion: The systems-driven analysis reported here provides the basis for the identification of potentially novel biomarkers and understanding HF-related mechanisms in a more comprehensive and integrated way.

    Predictive integration of gene functional similarity and co-expression defines treatment response of endothelial progenitor cells

    Background: Endothelial progenitor cells (EPCs) have been implicated in different processes crucial to vasculature repair, which may offer the basis for new therapeutic strategies in cardiovascular disease. Despite advances facilitated by functional genomics, there is a lack of systems-level understanding of treatment response mechanisms of EPCs. In this research we aimed to characterize the EPC response to adenosine (Ado), a cardioprotective factor, based on the systems-level integration of gene expression data and prior functional knowledge. Specifically, we set out to identify novel biosignatures of Ado-treatment response in EPCs. Results: The predictive integration of gene expression data and standardized functional similarity information enabled us to identify new treatment response biosignatures. Gene expression data originated from Ado-treated and -untreated EPC samples, and functional similarity was estimated with Gene Ontology (GO)-based similarity information. These information sources enabled us to implement and evaluate an integrated prediction approach based on the concept of k-nearest neighbours learning (kNN). The method can be executed by expert- and data-driven input queries to guide the search for biologically meaningful biosignatures. The resulting integrated kNN system identified new candidate EPC biosignatures that can offer high classification performance (areas under the receiver operating characteristic curve > 0.8). We also showed that the proposed models can outperform those discovered by standard gene expression analysis. Furthermore, we report an initial independent in vitro experimental follow-up, which provides additional evidence of the potential validity of the top biosignature. Conclusion: Response to Ado treatment in EPCs can be accurately characterized with a new method based on the combination of gene co-expression data and GO-based similarity information. It also exploits the incorporation of human expert-driven queries as a strategy to guide the automated search for candidate biosignatures. The proposed biosignature improves the systems-level characterization of EPCs. The new integrative predictive modeling approach can also be applied to other phenotype characterization or biomarker discovery problems.

    Metrics for GO based protein semantic similarity: a systematic evaluation

    Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should be used in semantic similarity calculations. Results: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. Conclusions: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
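
    As a rough illustration of the simGIC measure mentioned above: it is a Jaccard index over the two proteins' GO term sets (each extended with all ancestor terms) in which every term is weighted by its information content (IC). The sketch below assumes a toy ontology; the term names, parent links and IC values are hypothetical and not taken from GO or from the paper.

```python
def with_ancestors(terms, parents):
    """Return the given terms together with all of their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def sim_gic(terms_a, terms_b, parents, ic):
    """IC-weighted Jaccard index of two ancestor-extended term sets."""
    a = with_ancestors(terms_a, parents)
    b = with_ancestors(terms_b, parents)
    denominator = sum(ic[t] for t in a | b)
    if denominator == 0:
        return 0.0
    return sum(ic[t] for t in a & b) / denominator


# Hypothetical toy ontology (child -> parents) and IC values.
parents = {
    "binding": ["root"],
    "catalysis": ["root"],
    "protein_binding": ["binding"],
    "dna_binding": ["binding"],
}
ic = {"root": 0.0, "binding": 1.0, "catalysis": 1.5,
      "protein_binding": 2.0, "dna_binding": 3.0}

print(sim_gic({"protein_binding"}, {"dna_binding"}, parents, ic))
```

    In this toy case the two annotation sets share only "binding" and the root, so the shared IC (1.0) is divided by the total IC of the union (6.0). Because the score is a ratio over the whole term sets, it needs no separate step for combining term pairs, unlike the average, maximum and best-match average approaches discussed in the abstract.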