Correlation structures in applied probability
This thesis examines consequences of correlation structure in three areas of applied probability: mathematical population genetics, birth processes, and "exchangeable" measures on distributive lattices. The first three chapters concern probabilistic models in genetics. Initially we generalize the Moran model to allow more than one individual to reproduce per generation and investigate the effect of this on the behaviour of the model. The conclusion is that although events appear to happen faster, the basic properties of the model are unchanged. This model also serves to unify conventional neutral theory, as it links the Moran model to the Wright-Fisher model. We then consider aspects of the neutral theory. Neutral models commonly assume that successive generations behave independently, which may well be unrealistic. Here we adapt the Moran model to allow for correlations in offspring numbers between generations. An analysis of the model shows that the conditional distribution of allele frequencies is unchanged, although the expected number of alleles present decreases. Similar results are obtained when correlation is introduced into the more general model with more than one reproducer per generation. In each case the approach involves a detailed study of the genealogy of the models. Next we consider the effect of correlation in Markov birth processes. We show that if the birth rates form a super-linear (sub-linear) sequence, then the family sizes are positively (negatively) correlated. From this we prove a conjecture of Faddy: if the birth rates of a process X(t) are super-linear (sub-linear), then the variance ratio V(t) = Var X(t) / (E X(t) [E X(t)/X(0) - 1]) is greater than (less than) 1. Finally we study correlation inequalities. The FKG inequality is a well-known result giving sufficient conditions for positive correlation in probability measures on distributive lattices. There are few analogous results concerning negative correlation.
We give sufficient conditions for a particular form of negative correlation when the underlying distributions possess a certain exchangeability property.
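Faddy's variance ratio can be checked numerically for particular rate sequences. The sketch below is illustrative only and is not the proof technique of the thesis: it integrates the forward (Kolmogorov) equations of a pure birth process with RK4 on a state space truncated at a hypothetical cut-off `n_max`, then evaluates V(t). The names `birth_pmf` and `variance_ratio` are ours, and the rates n log(n + 1) are just one convex, non-explosive example of super-linear rates. Two cases have closed forms that serve as checks: linear (Yule) rates give V(t) = 1 exactly, and a constant rate λ gives X(t) - X(0) ~ Poisson(λt), so V(t) = X(0)/(X(0) + λt) < 1.

```python
import math
from typing import Callable, List


def birth_pmf(rate: Callable[[int], float], x0: int, t: float,
              n_max: int = 200, steps: int = 1000) -> List[float]:
    """Approximate P(X(t) = n), n = 0..n_max, for a pure birth process.

    Integrates dp_n/dt = rate(n-1) p_{n-1} - rate(n) p_n with classical RK4.
    State n_max (an assumed truncation point) is made absorbing, so p[n_max]
    holds the mass that would have left the truncated state space.
    """
    lam = [rate(n) if n < n_max else 0.0 for n in range(n_max + 1)]

    def deriv(q: List[float]) -> List[float]:
        d = [0.0] * (n_max + 1)
        for n in range(x0, n_max + 1):
            inflow = lam[n - 1] * q[n - 1] if n > x0 else 0.0
            d[n] = inflow - lam[n] * q[n]
        return d

    p = [0.0] * (n_max + 1)
    p[x0] = 1.0
    dt = t / steps
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([a + 0.5 * dt * b for a, b in zip(p, k1)])
        k3 = deriv([a + 0.5 * dt * b for a, b in zip(p, k2)])
        k4 = deriv([a + dt * b for a, b in zip(p, k3)])
        p = [a + dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]
    return p


def variance_ratio(p: List[float], x0: int) -> float:
    """V(t) = Var X(t) / (E X(t) [E X(t)/X(0) - 1]) from a probability vector."""
    mean = sum(n * q for n, q in enumerate(p))
    var = sum(n * n * q for n, q in enumerate(p)) - mean ** 2
    return var / (mean * (mean / x0 - 1.0))
```

With X(0) = 1 and t = 0.5, the linear, constant (λ = 2) and n log(n + 1) rates should give V close to 1, close to 0.5, and above 1 respectively. The mass left in `p[n_max]` should be checked before trusting any result, since super-linear rates push probability towards the truncation point.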
Mutational signatures in colon cancer.
Objective: Recently, many tumor sequencing studies have inferred and reported on mutational signatures, short nucleotide patterns at which particular somatic base substitutions appear more often. A number of signatures reflect biological processes in the patient and factors associated with cancer risk. Our goal is to infer mutational signatures appearing in colon cancer, a cancer for which environmental risk factors vary by cancer subtype, and to compare the signatures to those in adult stem cells from normal colon. We also compare the mutational signatures to others in the literature. Results: We apply a probabilistic mutation signature model to somatic mutations previously reported for six adult normal colon stem cells and 431 colon adenocarcinomas. We infer six mutational signatures in colon cancer, four of which are specific to tumors with hypermutation. Just two signatures explained the majority of mutations in the small number of normal aging colon samples. All six signatures are independently identified in a series of 295 Chinese colorectal cancers.
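Signature models are typically fitted to counts of substitutions classified by their flanking bases. The abstract does not state the feature encoding used by the authors' model, so the sketch below assumes the widely used 96-class convention: six substitution types, each written relative to the pyrimidine (C or T) of the mutated base pair, times 16 combinations of 5' and 3' neighbors. The names `SBS96`, `sbs96_label` and `mutation_catalog` are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The 96 classes: C>A, C>G, C>T, T>A, T>C, T>G, each with 4 x 4 flanking bases.
SBS96 = [f"{five}[{ref}>{alt}]{three}"
         for ref, alts in (("C", "AGT"), ("T", "ACG"))
         for alt in alts
         for five in "ACGT"
         for three in "ACGT"]


def sbs96_label(context: str, alt: str) -> str:
    """Label a substitution from its reference trinucleotide, e.g. ('ACG', 'T')."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or alt not in "ACGT" or alt == context[1]:
        raise ValueError("need a 3-base reference context and a different alt base")
    if context[1] in "AG":
        # Purine reference: describe the same mutation on the opposite strand
        # (reverse-complement the context, complement the alternate base).
        context, alt = context.translate(_COMPLEMENT)[::-1], alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_catalog(mutations: Iterable[Tuple[str, str]]) -> List[int]:
    """Count (context, alt) pairs into the fixed 96-class order of SBS96."""
    counts = Counter(sbs96_label(ctx, alt) for ctx, alt in mutations)
    return [counts[label] for label in SBS96]
```

For example, `sbs96_label("AGT", "C")` gives `A[C>G]T`, because a G>C change read on one strand is a C>G change on the other. A catalog of this kind, one per sample, is the input a signature model would then decompose.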
Fast "coalescent" simulation
BACKGROUND: The amount of genome-wide molecular data is increasing rapidly, as is interest in developing methods appropriate for such data. There is a consequent and growing need for methods that can efficiently simulate such data. In this paper we implement the sequentially Markovian coalescent algorithm described by McVean and Cardin and present a further modification to that algorithm which slightly improves the closeness of the approximation to the full coalescent model. The algorithm ignores a class of recombination events known to affect the behavior of the genealogy of the sample, but which do not appear to affect the behavior of generated samples to any substantial degree. RESULTS: We show that our software is able to simulate large chromosomal regions, such as those appropriate in a consideration of genome-wide data, several orders of magnitude faster than existing coalescent algorithms. CONCLUSION: This algorithm provides a useful resource for those needing to simulate large quantities of data for chromosomal-length regions using an approach that is much more efficient than traditional coalescent models.
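The core move of a sequentially Markovian coalescent can be sketched compactly. The code below is a simplified illustration, not the authors' software: it implements the basic SMC step of McVean and Cardin (the detached lineage may not re-coalesce with its own old branch), not the modification described in this paper. It assumes time in coalescent units (each pair of lineages merges at rate 1) and that `rho` is a population-scaled recombination rate per unit of sequence, so breakpoints arrive along the sequence at rate rho/2 times the total branch length; all class and function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple


class Tree:
    """Rooted binary genealogy; leaves 0..n-1 sit at time 0."""

    def __init__(self, n: int) -> None:
        self.time: Dict[int, float] = {i: 0.0 for i in range(n)}
        self.parent: Dict[int, int] = {}
        self.children: Dict[int, List[int]] = {}
        self.root, self.next_id = 0, n

    def new_node(self, t: float, kids: List[int]) -> int:
        v, self.next_id = self.next_id, self.next_id + 1
        self.time[v], self.children[v] = t, list(kids)
        for c in kids:
            self.parent[c] = v
        return v

    def nodes_below(self, v: int) -> List[int]:
        out, stack = [], [v]
        while stack:
            x = stack.pop()
            out.append(x)
            stack.extend(self.children.get(x, []))
        return out

    def branch_length(self, v: int) -> float:
        return self.time[self.parent[v]] - self.time[v]

    def total_length(self) -> float:
        return sum(self.branch_length(v) for v in self.nodes_below(self.root) if v != self.root)


def kingman(n: int, rng: random.Random) -> Tree:
    """Standard coalescent without recombination: each pair merges at rate 1."""
    tree, active, t = Tree(n), list(range(n)), 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(active, 2)
        active = [x for x in active if x not in (a, b)]
        active.append(tree.new_node(t, [a, b]))
    tree.root = active[0]
    return tree


def smc_update(tree: Tree, rng: random.Random) -> None:
    """Apply one SMC recombination event to the tree, in place."""
    # Breakpoint: uniform over the total branch length of the current tree.
    branches = [v for v in tree.nodes_below(tree.root) if v != tree.root]
    u = rng.choices(branches, weights=[tree.branch_length(v) for v in branches])[0]
    h = rng.uniform(tree.time[u], tree.time[tree.parent[u]])
    # Cut above h: delete u's parent p and splice u's sibling s into p's place.
    p = tree.parent.pop(u)
    s = next(c for c in tree.children.pop(p) if c != u)
    g = tree.parent.pop(p, None)
    del tree.time[p]
    if g is None:
        del tree.parent[s]
        tree.root = s
    else:
        tree.children[g] = [s if c == p else c for c in tree.children[g]]
        tree.parent[s] = g
    # Subtree under u re-coalesces from height h with the remaining tree only.
    remaining = tree.nodes_below(tree.root)
    current = h
    while True:
        live = [v for v in remaining if tree.time[v] <= current
                and (v == tree.root or current < tree.time[tree.parent[v]])]
        later = [tree.time[v] for v in remaining if tree.time[v] > current]
        horizon = min(later) if later else float("inf")
        t_coal = current + rng.expovariate(len(live))  # rate 1 per live lineage
        if t_coal < horizon:
            break
        current = horizon  # lineage count changes here; redraw (memoryless)
    v = rng.choice(live)
    g2 = tree.parent.get(v)
    q = tree.new_node(t_coal, [v, u])
    if g2 is None:
        tree.root = q
    else:
        tree.children[g2] = [q if x == v else x for x in tree.children[g2]]
        tree.parent[q] = g2


def simulate_smc(n: int, rho: float, length: float, seed: int = 1) -> List[Tuple[float, float, float]]:
    """Return (start, end, TMRCA) for each tree segment along [0, length]."""
    rng = random.Random(seed)
    tree, segments, x = kingman(n, rng), [], 0.0
    while True:
        rate = 0.5 * rho * tree.total_length()  # assumed scaling: rho/2 per unit length
        end = min(x + rng.expovariate(rate), length) if rate > 0 else length
        segments.append((x, end, tree.time[tree.root]))
        if end >= length:
            return segments
        x = end
        smc_update(tree, rng)
```

For example, `simulate_smc(10, 5.0, 1.0)` returns contiguous segments covering [0, 1], each carrying the TMRCA of its local tree. Because each tree is generated from the previous one alone, the simulation proceeds left to right without tracking the full ancestral recombination graph.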
Threshold Response to Stochasticity in Morphogenesis
During development of biological organisms, multiple complex structures are
formed. In many instances, these structures need to exhibit a high degree of
order to be functional, although many of their constituents are intrinsically
stochastic. Hence, it has been suggested that biological robustness ultimately
must rely on complex gene regulatory networks and clean-up mechanisms. Here we
explore developmental processes that have evolved inherent robustness against
stochasticity. In the context of the Drosophila eye disc, multiple optical
units, ommatidia, develop into crystal-like patterns. During the larva-to-pupa
stage of metamorphosis, the centers of the ommatidia are specified initially
through the diffusion of morphogens, followed by the specification of R8 cells.
Establishing the R8 cell is crucial for setting up the geometric and
functional relationships of cells within an ommatidium and among neighboring
ommatidia. Here we study a mathematical model of these spatio-temporal
processes in the presence of stochasticity, defining and applying measures that
quantify order within the resulting spatial patterns. We observe a universal
sigmoidal response to increasing transcriptional noise. Ordered patterns
persist up to a threshold noise level in the model parameters. As the noise is
further increased past a threshold point of no return, these ordered patterns
rapidly become disordered. Such robustness in development allows for the
accumulation of genetic variation without any observable changes in phenotype.
We argue that the observed sigmoidal dependence introduces robustness allowing
for sizable amounts of genetic variation and transcriptional noise to be
tolerated in natural populations without resulting in phenotypic variation.
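The abstract does not say which order measures the authors define. One standard choice for crystal-like (hexagonal) point patterns, such as an array of ommatidial centers, is the bond-orientational order parameter ψ6. The sketch below is a minimal illustration under stated assumptions: cell centers are given as 2D coordinates, and "neighbors" are all points within a fixed, hypothetical cut-off `radius`. It is not necessarily the measure used in the paper.

```python
import cmath
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def psi6(points: List[Point], radius: float) -> float:
    """Global order |mean_j psi6_j|, psi6_j = mean over neighbors k of exp(6i*theta_jk).

    Neighbors are other points within `radius` (an assumed cut-off); points
    with no neighbors are skipped. A perfect triangular lattice gives 1.0.
    """
    local = []
    for i, (xi, yi) in enumerate(points):
        terms = [cmath.exp(6j * math.atan2(yk - yi, xk - xi))
                 for k, (xk, yk) in enumerate(points)
                 if k != i and math.hypot(xk - xi, yk - yi) <= radius]
        if terms:
            local.append(sum(terms) / len(terms))
    return abs(sum(local) / len(local)) if local else 0.0


def triangular_lattice(rows: int, cols: int, spacing: float = 1.0) -> List[Point]:
    """Hexagonally packed centers: alternate rows shifted by half a spacing."""
    return [(spacing * (c + 0.5 * (r % 2)), spacing * r * math.sqrt(3) / 2)
            for r in range(rows) for c in range(cols)]


def random_points(n: int, size: float, seed: int = 0) -> List[Point]:
    """Uniformly scattered points in a size x size square, for comparison."""
    rng = random.Random(seed)
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]
```

A perfect triangular lattice scores 1.0 and uniformly scattered points score close to 0, so a measure of this kind could be tracked as noise in a model is increased to see where ordered patterns break down.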
Modeling measurement error in tumor characterization studies
Background: Etiologic studies of cancer increasingly use molecular features such as gene expression, DNA methylation and sequence mutation to subclassify the cancer type. In large population-based studies, the tumor tissues available for study are archival specimens that provide variable amounts of amplifiable DNA for molecular analysis. As molecular features measured from small amounts of tumor DNA are inherently noisy, we propose a novel approach to improve statistical efficiency when comparing groups of samples. We illustrate the phenomenon using the MethyLight technology, applying our proposed analysis to compare MLH1 DNA methylation levels in males and females studied in the Colon Cancer Family Registry. Results: We introduce two methods for computing empirical weights to model heteroscedasticity that is caused by sampling variable quantities of DNA for molecular analysis. In a simulation study, we show that using these weights in a linear regression model is more powerful for identifying differentially methylated loci than standard regression analysis. The increase in power depends on the underlying relationship between variation in outcome measure and input DNA quantity in the study samples. Conclusions: Tumor characteristics measured from small amounts of tumor DNA are inherently noisy. We propose a statistical analysis that accounts for the measurement error due to sampling variation of the molecular feature and show how it can improve the power to detect differential characteristics between patient groups.
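The abstract does not specify its two weighting methods. As a hedged illustration of the general idea only, the sketch below assumes a hypothetical variance model var_i = sigma2 + tau2 / q_i, where q_i is the input DNA quantity of sample i, uses inverse-variance weights w_i = 1/var_i, and fits a weighted least-squares line. The parameters `sigma2` and `tau2` and both function names are assumptions, not quantities from the study.

```python
from typing import List, Sequence, Tuple


def inverse_variance_weights(dna_quantity: Sequence[float], sigma2: float,
                             tau2: float) -> List[float]:
    """Hypothetical model: var_i = sigma2 + tau2 / q_i, weight_i = 1 / var_i.

    Measurement noise shrinks as more DNA is assayed, so low-input samples
    receive smaller weights.
    """
    return [1.0 / (sigma2 + tau2 / q) for q in dna_quantity]


def wls_fit(x: Sequence[float], y: Sequence[float],
            w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope
```

With x coded 0/1 for sex and y the MLH1 methylation measurement, `b1` is the difference between the two groups' weighted means, so samples assayed from little DNA influence the estimated group difference less than they would in an unweighted fit.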
Copy number variation in the Framingham Heart Study
In this paper we test for association between copy number variation and diabetes in a subset of individuals from the Framingham Heart Study. We used the 500K SNP data and called copy number variation using two algorithms: the genome alteration detection algorithm of Pique-Regi et al. and the Golden Helix software. We then tested for association between copy number and diabetes using a gene-based analysis. Our results show little evidence of association between copy number and diabetes status. Furthermore, our results indicate a relatively poor level of agreement between the copy number calls produced by the two programs. We then examined potential causes of this difference in results and the implications for future studies.
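The abstract reports poor agreement between the two callers without naming a statistic. One common way to quantify agreement between two sets of categorical calls (for example loss, normal or gain per gene) is Cohen's kappa; the sketch below is illustrative and is not necessarily the measure the authors used.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two equally long lists of calls."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty call lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Agreement expected if both callers assigned categories independently
    # at their own marginal frequencies.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        return 1.0  # both callers made the same single call everywhere
    return (observed - expected) / (1.0 - expected)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance given each caller's marginal call frequencies; for instance, calls `["gain", "normal", "normal", "loss"]` and `["gain", "normal", "loss", "loss"]` agree on 3 of 4 genes but give kappa = 7/11.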