
16 research outputs found

A Common-Factor Approach for Multivariate Data Cleaning with an Application to Mars Phoenix Mission Data

Author: Ding Wei
Fang Dongping
Kounaves Samuel P.
Oberlin Elizabeth
Publication date: 07/10/2015

Data quality is fundamentally important to ensure the reliability of data for stakeholders to make decisions. In real-world applications, such as scientific exploration of extreme environments, it is unrealistic to require raw data collected to be perfect. As data miners, when it is infeasible to physically know the why and the how in order to clean up the data, we propose to seek the intrinsic structure of the signal to identify the common factors of multivariate data. Using our new data-driven learning method, the common-factor data cleaning approach, we address an interdisciplinary challenge on multivariate data cleaning when complex external impacts appear to interfere with multiple data measurements. Existing data analyses typically process one signal measurement at a time without considering the associations among all signals. We analyze all signal measurements simultaneously to find the hidden common factors that drive all measurements to vary together, but not as a result of the true data measurements. We use common factors to reduce the variations in the data without changing the base mean level of the data to avoid altering the physical meaning. Comment: 12 pages, 10 figures, 1 table

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Controlling the Precision-Recall Tradeoff in Differential Dependency Network Analysis

Author: Clark Vincent P.
Niculescu-Mizil Alexandru
Ostroff Rachel
Oyen Diane
Stewart Alex
Publication date: 09/07/2013

Graphical models have gained a lot of attention recently as a tool for learning and representing dependencies among variables in multivariate data. Often, domain scientists are looking specifically for differences among the dependency networks of different conditions or populations (e.g. differences between regulatory networks of different species, or differences between dependency networks of diseased versus healthy populations). The standard method for finding these differences is to learn the dependency networks for each condition independently and compare them. We show that this approach is prone to high false discovery rates (low precision) that can render the analysis useless. We then show that by imposing a bias towards learning similar dependency networks for each condition, the false discovery rates can be reduced to acceptable levels, at the cost of finding a reduced number of differences. Algorithms developed in the transfer learning literature can be used to vary the strength of the imposed similarity bias and provide a natural mechanism to smoothly adjust this differential precision-recall tradeoff to cater to the requirements of the analysis conducted. We present real case studies (oncological and neurological) where domain experts use the proposed technique to extract useful differential networks that shed light on the biological processes involved in cancer and brain function.

arXiv.org e-Print Archive

CiteSeerX

Differential Regulatory Analysis Based on Coexpression Network in Cancer Research

Publication venue: Hindawi Limited
Publication date: 01/01/2016

Crossref

Knowledge-fused differential dependency network models for detecting significant rewiring in biological networks

Author: Clarke Robert
Herrington David M.
Hoffman Eric P.
Shih Ie-Ming
Tian Ye
Wang Yue
Xuan Jianhua
Zhang Bai
Zhang Zhen
Publication date: 01/01/2014

Modeling biological networks serves as both a major goal and an effective tool of systems biology in studying mechanisms that orchestrate the activities of gene products in cells. Biological networks are context specific and dynamic in nature. To systematically characterize the selectively activated regulatory components and mechanisms, the modeling tools must be able to effectively distinguish significant rewiring from random background fluctuations. We formulated the inference of differential dependency networks that incorporates both conditional data and prior knowledge as a convex optimization problem, and developed an efficient learning algorithm to jointly infer the conserved biological network and the significant rewiring across different conditions. We used a novel sampling scheme to estimate the expected error rate due to random knowledge and, based on this estimate, developed a strategy that fully exploits the benefit of this data-knowledge integrated approach. We demonstrated and validated the principle and performance of our method using synthetic datasets. We then applied our method to yeast cell line and breast cancer microarray data and obtained biologically plausible results. Comment: 7 pages, 7 figures

arXiv.org e-Print Archive

Springer - Publisher Connector

PubMed Central

George Washington University: Health Sciences Research Commons (HSRC)

Link-based quantitative methods to identify differentially coexpressed genes and gene Pairs

Author: Li Chun
Li Yi-Xue
Li Yuan-Yuan
Liu Bao-Hong
Ye Zhi-Qiang
Yu Hui
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Differential coexpression analysis (DCEA) is increasingly used for investigating the global transcriptional mechanisms underlying phenotypic changes. Current DCEA methods mostly adopt a gene connectivity-based strategy to estimate differential coexpression, which is characterized by comparing the numbers of gene neighbors in different coexpression networks. Although it simplifies the calculation, this strategy mixes up the identities of different coexpression neighbors of a gene, and fails to differentiate significant differential coexpression changes from trivial ones. In particular, correlation reversal is easily missed, although it probably indicates remarkable biological significance. Results: We developed two link-based quantitative methods, DCp and DCe, to identify differentially coexpressed genes and gene pairs (links). Because they exploit the quantitative coexpression change of each gene pair in the coexpression networks, both methods proved to be superior to currently popular methods in simulation studies. Re-mining of a publicly available type 2 diabetes (T2D) expression dataset from the perspective of differential coexpression analysis led to discoveries beyond those from differential expression analysis. Conclusions: This work pointed out the critical weakness of current popular DCEA methods, and proposed two link-based DCEA algorithms that will contribute to the development of DCEA and help extend it to a broader spectrum.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Diffany: an ontology-driven framework to infer, visualise and analyse differential molecular networks

Author: Dubois Marieke
Inzé Dirk
Van de Peer Yves
Van Landeghem Sofie
Van Parys Thomas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time. Results: In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view on differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study and we experimentally confirmed a predicted regulator. Availability: Diffany is freely available as open-source java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

UPSpace at the University of Pretoria

An inferential framework for biological network hypothesis tests

Author: Mukhopadhyay Nitai D.
Yates Phillip D.
Publication venue: VCU Scholars Compass
Publication date: 01/01/2013

Background: Networks are ubiquitous in modern cell biology and physiology. A large literature exists for inferring/proposing biological pathways/networks using statistical or machine learning algorithms. Despite these advances, a formal testing procedure for analyzing network-level observations is in need of further development. Comparing the behaviour of a pharmacologically altered pathway to its canonical form is an example of a salient one-sample comparison. Locating which pathways differentiate disease from no-disease phenotype may be recast as a two-sample network inference problem. Results: We outline an inferential method for performing one- and two-sample hypothesis tests where the sampling unit is a network and the hypotheses are stated via network model(s). We propose a dissimilarity measure that incorporates nearby neighbour information to contrast one or more networks in a statistical test. We demonstrate and explore the utility of our approach with both simulated and microarray data; random graphs and weighted (partial) correlation networks are used to form network models. Using both a well-known diabetes dataset and an ovarian cancer dataset, the methods outlined here could better elucidate co-regulation changes for one or more pathways between two clinically relevant phenotypes. Conclusions: Formal hypothesis tests for gene- or protein-based networks are a logical progression from existing gene-based and gene-set tests for differential expression. Commensurate with the growing appreciation and development of systems biology, the dissimilarity-based testing methods presented here may allow us to improve our understanding of pathways and other complex regulatory systems. The benefit of our method was illustrated under select scenarios.

Crossref

Springer - Publisher Connector

PubMed Central

VCU Scholars Compass

An inferential framework for biological network hypothesis tests

Publication venue: BioMed Central
Publication date: 14/03/2013

Springer - Publisher Connector