16 research outputs found
A Common-Factor Approach for Multivariate Data Cleaning with an Application to Mars Phoenix Mission Data
Data quality is fundamentally important to ensure the reliability of data for
stakeholders to make decisions. In real world applications, such as scientific
exploration of extreme environments, it is unrealistic to require raw data
collected to be perfect. As data miners, when it is infeasible to physically
know the why and the how in order to clean up the data, we propose to seek the
intrinsic structure of the signal to identify the common factors of
multivariate data. Using our new data driven learning method, the common-factor
data cleaning approach, we address an interdisciplinary challenge on
multivariate data cleaning when complex external impacts appear to interfere
with multiple data measurements. Existing data analyses typically process one
signal measurement at a time without considering the associations among all
signals. We analyze all signal measurements simultaneously to find the hidden
common factors that drive all measurements to vary together, but not as a
result of the true data measurements. We use common factors to reduce the
variations in the data without changing the base mean level of the data to
avoid altering the physical meaning. Comment: 12 pages, 10 figures, 1 table
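The idea of removing shared variation while keeping each signal's mean level can be illustrated with a minimal sketch. This is not the authors' common-factor method: it assumes a single common factor, estimated crudely as the cross-signal average of the centered values, and the function name `remove_common_factor` and the sensor data are hypothetical.

```python
from statistics import mean

def remove_common_factor(signals):
    """Subtract one shared factor from equal-length signals, keeping each mean."""
    means = [mean(s) for s in signals]
    centered = [[v - m for v in s] for s, m in zip(signals, means)]
    n = len(signals[0])
    # Crude common factor (assumption): the cross-signal average of the
    # centered values at each time step. It has zero mean by construction.
    factor = [mean(c[t] for c in centered) for t in range(n)]
    ff = sum(f * f for f in factor)
    cleaned = []
    for c, m in zip(centered, means):
        # Least-squares loading of this signal on the common factor.
        loading = sum(a * f for a, f in zip(c, factor)) / ff if ff else 0.0
        # Remove the fitted common component, then restore the original mean.
        cleaned.append([m + a - loading * f for a, f in zip(c, factor)])
    return cleaned

# Hypothetical data: three sensors sharing one disturbance at different scales.
disturbance = [0.0, 2.0, -2.0, 1.0, -1.0]
signals = [
    [10.0 + d for d in disturbance],
    [50.0 + 2.0 * d for d in disturbance],
    [-5.0 + 0.5 * d for d in disturbance],
]
cleaned = remove_common_factor(signals)
```

Because the estimated factor has zero mean, subtracting any multiple of it leaves each signal's mean unchanged, which matches the abstract's requirement that the base level of the data is not altered.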
Controlling the Precision-Recall Tradeoff in Differential Dependency Network Analysis
Graphical models have gained a lot of attention recently as a tool for
learning and representing dependencies among variables in multivariate data.
Often, domain scientists are looking specifically for differences among the
dependency networks of different conditions or populations (e.g. differences
between regulatory networks of different species, or differences between
dependency networks of diseased versus healthy populations). The standard
method for finding these differences is to learn the dependency networks for
each condition independently and compare them. We show that this approach is
prone to high false discovery rates (low precision) that can render the
analysis useless. We then show that by imposing a bias towards learning similar
dependency networks for each condition the false discovery rates can be reduced
to acceptable levels, at the cost of finding a reduced number of differences.
Algorithms developed in the transfer learning literature can be used to vary
the strength of the imposed similarity bias and provide a natural mechanism to
smoothly adjust this differential precision-recall tradeoff to cater to the
requirements of the analysis conducted. We present real case studies
(oncological and neurological) where domain experts use the proposed technique
to extract useful differential networks that shed light on the biological
processes involved in cancer and brain function.
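The tradeoff described above can be sketched with a toy example. It does not use the transfer learning algorithms the abstract refers to: it assumes edge weights have already been estimated for each condition, and it models the similarity bias as simple shrinkage of both estimates toward their shared mean. The names `differential_edges`, `bias`, and `threshold`, and the edge weights, are hypothetical.

```python
def differential_edges(weights_a, weights_b, bias, threshold):
    """Report edges whose weights still differ after shrinking toward a shared mean.

    bias = 0.0 treats the two conditions independently (the standard approach);
    bias = 1.0 forces both networks to be identical.
    """
    reported = []
    for edge in sorted(weights_a):
        a, b = weights_a[edge], weights_b[edge]
        shared = (a + b) / 2.0
        # Pull each condition's estimate toward the shared value.
        a_shrunk = shared + (1.0 - bias) * (a - shared)
        b_shrunk = shared + (1.0 - bias) * (b - shared)
        if abs(a_shrunk - b_shrunk) > threshold:
            reported.append(edge)
    return reported

# Hypothetical edge-weight estimates for two conditions.
weights_a = {("x", "y"): 0.9, ("x", "z"): 0.1, ("y", "z"): 0.50}
weights_b = {("x", "y"): 0.1, ("x", "z"): 0.3, ("y", "z"): 0.45}

no_bias = differential_edges(weights_a, weights_b, bias=0.0, threshold=0.15)
some_bias = differential_edges(weights_a, weights_b, bias=0.5, threshold=0.15)
strong_bias = differential_edges(weights_a, weights_b, bias=0.9, threshold=0.15)
```

In this toy the bias only rescales the differences, so it behaves like a stricter threshold. It still shows the direction of the tradeoff: a stronger bias reports fewer differential edges, and the ones that survive are those with the largest estimated change.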
Knowledge-fused differential dependency network models for detecting significant rewiring in biological networks
Modeling biological networks serves as both a major goal and an effective
tool of systems biology in studying mechanisms that orchestrate the activities
of gene products in cells. Biological networks are context specific and dynamic
in nature. To systematically characterize the selectively activated regulatory
components and mechanisms, the modeling tools must be able to effectively
distinguish significant rewiring from random background fluctuations. We
formulated the inference of differential dependency networks that incorporates
both conditional data and prior knowledge as a convex optimization problem, and
developed an efficient learning algorithm to jointly infer the conserved
biological network and the significant rewiring across different conditions. We
used a novel sampling scheme to estimate the expected error rate due to random
knowledge and, based on that estimate, developed a strategy that fully exploits the
benefit of this data-knowledge integrated approach. We demonstrated and
validated the principle and performance of our method using synthetic datasets.
We then applied our method to yeast cell line and breast cancer microarray data
and obtained biologically plausible results. Comment: 7 pages, 7 figures
Link-based quantitative methods to identify differentially coexpressed genes and gene pairs
Background: Differential coexpression analysis (DCEA) is increasingly used for investigating the global transcriptional mechanisms underlying phenotypic changes. Current DCEA methods mostly adopt a gene connectivity-based strategy to estimate differential coexpression, which is characterized by comparing the numbers of gene neighbors in different coexpression networks. Although it simplifies the calculation, this strategy mixes up the identities of different coexpression neighbors of a gene and fails to distinguish significant differential coexpression changes from trivial ones. In particular, correlation reversal is easily missed, although it is probably of considerable biological significance.
Results: We developed two link-based quantitative methods, DCp and DCe, to identify differentially coexpressed genes and gene pairs (links). By exploiting the quantitative coexpression change of each gene pair in the coexpression networks, both methods proved superior to currently popular methods in simulation studies. Re-mining a publicly available type 2 diabetes (T2D) expression dataset from the perspective of differential coexpression analysis led to discoveries beyond those from differential expression analysis.
Conclusions: This work pointed out the critical weakness of currently popular DCEA methods and proposed two link-based DCEA algorithms that will contribute to the development of DCEA and help extend it to a broader spectrum.
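The contrast between connectivity-based and link-based scoring can be sketched as follows. This is not an implementation of DCp or DCe: the link-based score here is simply the mean absolute change in Pearson correlation over a gene's links, and the function names, the cutoff of 0.7, and the expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coexpression(expr):
    """Map each gene pair to its correlation; expr maps gene -> expression values."""
    return {(g, h): pearson(expr[g], expr[h])
            for g, h in combinations(sorted(expr), 2)}

def connectivity_change(net_a, net_b, gene, cutoff=0.7):
    """Connectivity-based score: change in the number of coexpression neighbors."""
    def degree(net):
        return sum(1 for pair, r in net.items() if gene in pair and abs(r) >= cutoff)
    return degree(net_b) - degree(net_a)

def link_change(net_a, net_b, gene):
    """Link-based score: mean absolute correlation change over the gene's links."""
    diffs = [abs(net_a[p] - net_b[p]) for p in net_a if gene in p]
    return sum(diffs) / len(diffs)

# Hypothetical data: the g1-g2 correlation flips from +1 to -1 between conditions.
cond_a = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10], "g3": [5, 3, 4, 1, 2]}
cond_b = {"g1": [1, 2, 3, 4, 5], "g2": [10, 8, 6, 4, 2], "g3": [5, 3, 4, 1, 2]}
net_a, net_b = coexpression(cond_a), coexpression(cond_b)

by_connectivity = connectivity_change(net_a, net_b, "g1")
by_links = link_change(net_a, net_b, "g1")
```

In this toy case g1 keeps the same number of neighbors in both conditions, so the connectivity-based score sees no change, while the link-based score registers the correlation reversal on the g1-g2 link.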
Diffany: an ontology-driven framework to infer, visualise and analyse differential molecular networks
Background: Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time.
Results: In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view on differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study and we experimentally confirmed a predicted regulator.
Availability: Diffany is freely available as an open-source Java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/
An inferential framework for biological network hypothesis tests
Background
Networks are ubiquitous in modern cell biology and physiology. A large literature exists for inferring/proposing biological pathways/networks using statistical or machine learning algorithms. Despite these advances a formal testing procedure for analyzing network-level observations is in need of further development. Comparing the behaviour of a pharmacologically altered pathway to its canonical form is an example of a salient one-sample comparison. Locating which pathways differentiate disease from no-disease phenotype may be recast as a two-sample network inference problem.
Results
We outline an inferential method for performing one- and two-sample hypothesis tests where the sampling unit is a network and the hypotheses are stated via network model(s). We propose a dissimilarity measure that incorporates nearby neighbour information to contrast one or more networks in a statistical test. We demonstrate and explore the utility of our approach with both simulated and microarray data; random graphs and weighted (partial) correlation networks are used to form network models. Using both a well-known diabetes dataset and an ovarian cancer dataset, the methods outlined here could better elucidate co-regulation changes for one or more pathways between two clinically relevant phenotypes.
Conclusions
Formal hypothesis tests for gene- or protein-based networks are a logical progression from existing gene-based and gene-set tests for differential expression. Commensurate with the growing appreciation and development of systems biology, the dissimilarity-based testing methods presented here may allow us to improve our understanding of pathways and other complex regulatory systems. The benefit of our method was illustrated under select scenarios.
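A two-sample network comparison of this general kind can be sketched with a permutation test. The sketch below makes several assumptions that are not from the paper: each network is a plain Pearson correlation network, the dissimilarity is the sum of absolute edge-weight differences (it does not incorporate the neighbour information the authors describe), and the names `network`, `dissimilarity`, `permutation_test`, and the data are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom else 0.0

def network(samples):
    """Correlation network from rows of samples (one column per gene)."""
    cols = list(zip(*samples))
    return {(i, j): pearson(cols[i], cols[j])
            for i, j in combinations(range(len(cols)), 2)}

def dissimilarity(net_a, net_b):
    """Edge-wise distance between two networks over the same genes."""
    return sum(abs(net_a[e] - net_b[e]) for e in net_a)

def permutation_test(group_a, group_b, n_perm=500, seed=0):
    """Permute group labels to get a null distribution for the dissimilarity."""
    rng = random.Random(seed)
    observed = dissimilarity(network(group_a), network(group_b))
    pooled = group_a + group_b  # new list, so the inputs are not mutated
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if dissimilarity(network(perm_a), network(perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical phenotypes: genes 0 and 1 are co-regulated in A, anti-regulated in B.
group_a = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 5], [5, 5, 4]]
group_b = [[1, 5, 3], [2, 4, 1], [3, 3, 2], [4, 2, 5], [5, 1, 4]]
observed, p_value = permutation_test(group_a, group_b)
```

Shuffling phenotype labels keeps the pooled data fixed and only breaks the association between samples and groups, which is what the null hypothesis of no network difference implies. A neighbour-aware dissimilarity could be swapped in for `dissimilarity` without changing the rest of the procedure.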