A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data
Due to rapid technological advances, a wide range of different measurements
can be obtained from a given biological sample including single nucleotide
polymorphisms, copy number variation, gene expression levels, DNA methylation
and proteomic profiles. Each of these distinct measurements provides the means
to characterize a certain aspect of biological diversity, and a fundamental
problem of broad interest concerns the discovery of shared patterns of
variation across different data types. Such data types are heterogeneous in the
sense that they represent measurements taken at very different scales or
described by very different data structures. We propose a distance-based
statistical test, the generalized RV (GRV) test, to assess whether there is a
common and non-random pattern of variability between paired biological
measurements obtained from the same random sample. The measurements enter the
test through distance measures which can be chosen to capture particular
aspects of the data. An approximate null distribution is proposed to compute
p-values in closed-form and without the need to perform costly Monte Carlo
permutation procedures. Compared to the classical Mantel test for association
between distance matrices, the GRV test has been found to be more powerful in a
number of simulation settings. We also report on an application of the GRV test
to detect biological pathways in which genetic variability is associated with
variation in gene expression levels in ovarian cancer samples, and present
results obtained from two independent cohorts.
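As a rough illustration of how paired measurements can enter a test purely through distance matrices, the sketch below computes an RV-style coefficient from two distance matrices. It assumes the common construction in which each distance matrix is double-centred into a Gram matrix and the two Gram matrices are compared by a normalised trace inner product; the abstract does not spell out the GRV statistic or its closed-form null distribution, so the function names and the construction here are illustrative assumptions, not the authors' implementation.

```python
import math


def abs_distance_matrix(values):
    """Pairwise absolute differences between scalar observations."""
    return [[abs(a - b) for b in values] for a in values]


def double_center(dist):
    """Map a distance matrix D to the centred Gram matrix -1/2 * J D^2 J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total)
             for j in range(n)] for i in range(n)]


def frobenius_inner(a, b):
    """Sum of elementwise products; equals trace(A B) for symmetric A, B."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_from_distances(dx, dy):
    """RV-style coefficient between two distance matrices on the same samples."""
    gx, gy = double_center(dx), double_center(dy)
    return frobenius_inner(gx, gy) / math.sqrt(
        frobenius_inner(gx, gx) * frobenius_inner(gy, gy))


# Hypothetical toy data: two scalar measurements on the same four samples.
dx = abs_distance_matrix([0.0, 1.0, 2.0, 3.0])
dy = abs_distance_matrix([0.0, 1.0, 0.0, 1.0])
rv = rv_from_distances(dx, dy)
```

For one-dimensional Euclidean distances, as in the toy data, this coefficient reduces to the squared Pearson correlation between the two measurements. Other distance measures can be substituted without changing the rest of the computation.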
Adaptive Mantel Test for Association Testing in Imaging Genetics Data
Mantel's test (MT) for association is conducted by testing the linear
relationship of similarity of all pairs of subjects between two observational
domains. Motivated by applications to neuroimaging and genetics data, and
following the success of shrinkage and kernel methods for prediction with
high-dimensional data, we here introduce the adaptive Mantel test as an
extension of the MT. By utilizing kernels and penalized similarity measures,
the adaptive Mantel test is able to achieve higher statistical power relative
to the classical MT in many settings. Furthermore, the adaptive Mantel test is
designed to simultaneously test over multiple similarity measures such that the
correct type I error rate under the null hypothesis is maintained without the
need to directly adjust the significance threshold for multiple testing. The
performance of the adaptive Mantel test is evaluated on simulated data, and the
test is used to investigate associations between genetic markers related to
Alzheimer's Disease and healthy brain physiology with data from a working
memory study of 350 college students from Beijing Normal University.
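The classical Mantel test that this abstract extends can be sketched as follows. This is a minimal version under stated assumptions: the statistic is the Pearson correlation between the upper-triangular entries of two distance matrices, and the null distribution is obtained by jointly permuting the rows and columns of one matrix. The adaptive extension (kernels, penalized similarities, testing over several measures at once) is not shown, and the function names are illustrative.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Upper-triangular entries, optionally after relabelling the subjects."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mantel_test(dx, dy, n_perm=999, seed=0):
    """Return the Mantel correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(dx)
    flat_x = upper_triangle(dx)
    observed = pearson(flat_x, upper_triangle(dy))
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permuting subject labels keeps each matrix a valid distance matrix.
        if pearson(flat_x, upper_triangle(dy, perm)) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: six subjects observed in two domains.
points = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
dist_a = [[abs(p - q) for q in points] for p in points]
dist_b = [[2.0 * abs(p - q) for q in points] for p in points]
r, p_value = mantel_test(dist_a, dist_b)
```

Permuting subject labels, rather than individual matrix entries, preserves the dependence among distances that share a subject, which is why the test is not run as an ordinary correlation test on the flattened entries.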
Differential analysis of biological networks
In cancer research, the comparison of gene expression or DNA methylation
networks inferred from healthy controls and patients can lead to the discovery
of biological pathways associated with the disease. As a cancer progresses, its
signalling and control networks are subject to some degree of localised
re-wiring. Being able to detect disrupted interaction patterns induced by the
presence or progression of the disease can lead to the discovery of novel
molecular diagnostic and prognostic signatures. Currently there is a lack of
scalable statistical procedures for two-network comparisons aimed at detecting
localised topological differences. We propose the dGHD algorithm, a methodology
for detecting differential interaction patterns in two-network comparisons. The
algorithm relies on a statistic, the Generalised Hamming Distance (GHD), for
assessing the degree of topological difference between networks and evaluating
its statistical significance. dGHD builds on a non-parametric permutation
testing framework but achieves computational efficiency through an asymptotic
normal approximation. We show that the GHD is able to detect more subtle
topological differences compared to a standard Hamming distance between
networks. This results in the dGHD algorithm achieving high performance in
simulation studies as measured by sensitivity and specificity. An application
to the problem of detecting differential DNA co-methylation subnetworks
associated with ovarian cancer demonstrates the potential benefits of the
proposed methodology for discovering network-derived biomarkers associated with
a trait of interest.
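To make the baseline in this comparison concrete, the sketch below computes the standard Hamming distance between two undirected networks on the same node set and attaches a permutation p-value obtained by relabelling the nodes of one network. It is not the GHD or the dGHD algorithm, whose definitions are not given in the abstract. The lower-tail p-value, the normalisation by the number of node pairs, and all function names are assumptions made for illustration.

```python
import random


def hamming_distance(adj_a, adj_b):
    """Fraction of node pairs whose edge status differs between two networks."""
    n = len(adj_a)
    differing = sum(adj_a[i][j] != adj_b[i][j]
                    for i in range(n) for j in range(i + 1, n))
    return differing / (n * (n - 1) / 2)


def relabel(adj, perm):
    """Adjacency matrix after renaming node i to perm[i]."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def permutation_pvalue(adj_a, adj_b, n_perm=999, seed=0):
    """Observed distance and the share of relabellings at least as close."""
    rng = random.Random(seed)
    n = len(adj_a)
    observed = hamming_distance(adj_a, adj_b)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        if hamming_distance(adj_a, relabel(adj_b, perm)) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def path_graph(n, skip=None):
    """Adjacency matrix of a path 0-1-...-(n-1), optionally omitting one edge."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        if (i, i + 1) != skip:
            adj[i][i + 1] = adj[i + 1][i] = 1
    return adj


# Hypothetical toy networks: a six-node path and the same path minus one edge.
net_a = path_graph(6)
net_b = path_graph(6, skip=(4, 5))
distance, p_value = permutation_pvalue(net_a, net_b)
```

The permutation loop is the costly part of such a procedure, which is the step the abstract says is replaced by an asymptotic normal approximation.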
Ensemble Analysis of Adaptive Compressed Genome Sequencing Strategies
Acquiring genomes at single-cell resolution has many applications such as in
the study of microbiota. However, deep sequencing and assembly of all the
millions of cells in a sample is prohibitively costly. A property that can come
to the rescue is that deep sequencing of every cell should not be necessary to
capture all distinct genomes, as the majority of cells are biological
replicates. Biologically important samples are often sparse in that sense. In
this paper, we propose an adaptive compressed method, also known as distilled
sensing, to capture all distinct genomes in a sparse microbial community with
reduced sequencing effort. As opposed to group testing in which the number of
distinct events is often constant and sparsity is equivalent to rarity of an
event, sparsity in our case means scarcity of distinct events in comparison to
the data size. Previously, we introduced the problem and proposed a distilled
sensing solution based on the breadth first search strategy. We simulated the
whole process which constrained our ability to study the behavior of the
algorithm for the entire ensemble due to its computational intensity. In this
paper, we modify our previous breadth first search strategy and introduce the
depth first search strategy. Instead of simulating the entire process, which is
intractable for a large number of experiments, we provide a dynamic programming
algorithm to analyze the behavior of the method for the entire ensemble. The
ensemble analysis algorithm recursively calculates the probability of capturing
every distinct genome and also the expected total sequenced nucleotides for a
given population profile. Our results suggest that the expected total sequenced
nucleotides grows in proportion to the logarithm of the number of cells and
linearly with the number of distinct genomes.