Modelling the correlation between the activities of adjacent genes in Drosophila
BACKGROUND: Correlation between the expression levels of genes which are located close to each other on the genome has been found in various organisms, including yeast, Drosophila and humans. Since such a correlation could be explained by several biochemical, evolutionary, genetic and technological factors, there is a need for statistical models that correspond to specific biological models for the correlation structure. RESULTS: We modelled the pairwise correlation between the expressions of the genes in a Drosophila microarray experiment as a normal mixture under Fisher's z-transform, and fitted the model to the correlations of expressions of adjacent as well as non-adjacent genes. We also analyzed simulated data for comparison. The model provided a good fit to the data. Further, correlation between the activities of two genes could, in most cases, be attributed to either of two factors: the two genes both being active in the same age group (adult or embryo), or the two genes being in close proximity to each other on the chromosome. The interaction between these two factors was weak. CONCLUSIONS: Correlation between the activities of adjacent genes is higher than between non-adjacent genes. In the data we analyzed, this appeared, for the most part, to be a constant effect that applied to all pairs of adjacent genes.
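The mixture-after-z-transform idea can be sketched numerically. The snippet below is an illustrative toy example, not the paper's actual fitting procedure: it simulates correlation coefficients from two invented subpopulations of gene pairs, applies Fisher's z-transform (which makes each component approximately normal with variance 1/(n-3)), and fits a two-component normal mixture with a plain EM loop. All sample sizes, weights and starting values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: simulate pairwise correlations from two subpopulations --
# "background" gene pairs (rho ~ 0) and "co-expressed" pairs (rho ~ 0.5).
n = 50  # observations behind each correlation estimate
r_bg = np.tanh(rng.normal(np.arctanh(0.0), 1 / np.sqrt(n - 3), 800))
r_co = np.tanh(rng.normal(np.arctanh(0.5), 1 / np.sqrt(n - 3), 200))
r = np.concatenate([r_bg, r_co])

# Fisher's z-transform: z = arctanh(r) is approximately normal with
# variance 1/(n-3), which makes a normal mixture on z plausible.
z = np.arctanh(r)

# Fit a two-component normal mixture with a basic EM loop.
mu, sd, w = np.array([-0.1, 0.4]), np.array([0.3, 0.3]), np.array([0.5, 0.5])
for _ in range(200):
    dens = w * np.exp(-0.5 * ((z[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)   # E-step responsibilities
    nk = resp.sum(axis=0)
    mu = (resp * z[:, None]).sum(axis=0) / nk       # M-step mean update
    sd = np.sqrt((resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk)
    w = nk / len(z)

print(np.tanh(mu), w)  # component means back on the correlation scale
```

The recovered component means, transformed back with tanh, sit near the two simulated correlation levels, and the weights near the simulated 80/20 split.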
Associating multiple longitudinal traits with high-dimensional single-nucleotide polymorphism data: application to the Framingham Heart Study
Cardiovascular diseases are associated with combinations of phenotypic traits, which are in turn caused by a combination of environmental and genetic factors. Because of the diversity of pathways that may lead to cardiovascular diseases, we examined the so-called intermediate phenotypes, which are often repeatedly measured. We developed a penalized nonlinear canonical correlation analysis to associate multiple repeatedly measured traits with high-dimensional single-nucleotide polymorphism data.
Association of repeatedly measured intermediate risk factors for complex diseases with high dimensional SNP data
BACKGROUND: The causes of complex diseases are difficult to grasp since many different factors play a role in their onset. To find a common genetic background, many of the existing studies divide their population into controls and cases; a classification that is likely to cause heterogeneity within the two groups. Rather than dividing the study population into cases and controls, it is better to characterize the phenotype of a complex disease by a set of intermediate risk factors. These risk factors often vary over time, however, and are therefore repeatedly measured. RESULTS: We introduce a method to associate multiple repeatedly measured intermediate risk factors with a high-dimensional set of single nucleotide polymorphisms (SNPs). In a two-step approach, we first summarized the time courses of each individual and then applied penalized nonlinear canonical correlation analysis to these summaries to obtain sparse results. CONCLUSIONS: Application of this method to two datasets that study the genetic background of cardiovascular diseases shows that, compared to progression over time, mainly the constant levels over time are associated with sets of SNPs.
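The flavor of a penalized canonical correlation between trait summaries and SNPs can be sketched with a toy alternating scheme. This is not the authors' penalized nonlinear CCA; it is a simple sparse-CCA-style iteration in which each canonical weight vector is regressed on the other and the SNP weights are soft-thresholded to force sparsity. All data, the two "causal" SNPs, and the penalty value are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented toy data: Y holds per-subject summaries of repeatedly measured
# traits (say, baseline level and slope); X holds SNP dosages (0/1/2).
n, p = 100, 30
X = rng.integers(0, 3, size=(n, p)).astype(float)
signal = X[:, 0] - X[:, 1]                    # two "causal" SNPs
Y = np.column_stack([signal + rng.normal(0, 1, n),
                     0.5 * signal + rng.normal(0, 1, n)])
X = (X - X.mean(0)) / X.std(0)
Y = (Y - Y.mean(0)) / Y.std(0)

def soft(v, lam):
    """Soft-thresholding operator, the usual lasso-style penalty step."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Alternate between the two weight vectors; soft-thresholding keeps the
# SNP weights sparse. lam is chosen ad hoc for this toy example.
v = np.ones(2)
u = np.zeros(p)
for _ in range(100):
    u = soft(X.T @ (Y @ v), lam=30.0)
    u /= np.linalg.norm(u) + 1e-12
    v = Y.T @ (X @ u)
    v /= np.linalg.norm(v) + 1e-12

corr = np.corrcoef(X @ u, Y @ v)[0, 1]
print(sorted(np.nonzero(u)[0].tolist()), round(corr, 2))
```

With these settings the nonzero SNP weights concentrate on the two simulated causal loci, and the canonical correlation between the sparse SNP score and the trait score is clearly positive.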
Modeling SAGE data with a truncated gamma-Poisson model
BACKGROUND: Serial Analysis of Gene Expression (SAGE) produces gene expression measurements on a discrete scale, due to the finite number of molecules in the sample. This means that part of the variance in SAGE data should be understood as the sampling error in a binomial or Poisson distribution, whereas other variance sources, in particular biological variance, should be modeled using a continuous distribution function, i.e. a prior on the intensity of the Poisson distribution. One challenge is that such a model predicts a large number of genes with zero counts, which cannot be observed. RESULTS: We present a hierarchical Poisson model with a gamma prior and three different algorithms for estimating the parameters in the model. It turns out that the rate parameter in the gamma distribution can be estimated on the basis of a single SAGE library, whereas the estimate of the shape parameter becomes unstable. This means that the number of zero counts cannot be estimated reliably. When a bivariate model is applied to two SAGE libraries, however, the number of predicted zero counts becomes more stable and in approximate agreement with the number of transcripts observed across a large number of experiments. In all the libraries we analyzed there was a small population of very highly expressed tags, typically 1% of the tags, that could not be accounted for by the model. To handle those tags we chose to augment our model with a non-parametric component. We also show some results based on a log-normal distribution instead of the gamma distribution. CONCLUSION: By modeling SAGE data with a hierarchical Poisson model it is possible to separate the sampling variance from the variance in gene expression. If expression levels are reported at the gene level rather than at the tag level, genes mapped to multiple tags must be kept separate, since their expression levels show a different statistical behavior. A log-normal prior provided a better fit to our data than the gamma prior, but except for a small subpopulation of tags with very high counts, the two priors are similar.
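The gamma-Poisson construction can be illustrated with a small simulation. A Poisson count whose intensity is Gamma(shape, rate) is marginally negative binomial, so fitting the model to observed (nonzero) tags amounts to maximizing a zero-truncated negative binomial likelihood. The sketch below is illustrative only, not one of the paper's three estimation algorithms; all parameter values and the library size are invented.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)

# Simulate a SAGE-like library: each tag has a Gamma-distributed
# intensity, and the observed count is Poisson(intensity).
shape_true, rate_true, n_tags = 0.5, 2.0, 20000
lam = rng.gamma(shape_true, 1.0 / rate_true, n_tags)
counts = rng.poisson(lam)
observed = counts[counts > 0]        # tags with zero counts are never seen

def neg_loglik(theta):
    """Zero-truncated negative binomial log-likelihood (gamma-Poisson)."""
    a, b = np.exp(theta)             # optimize shape/rate on the log scale
    p = b / (b + 1.0)                # Gamma(a, b) mixed Poisson = nbinom(a, p)
    logpmf = stats.nbinom.logpmf(observed, a, p)
    log_trunc = np.log1p(-stats.nbinom.pmf(0, a, p))
    return -(logpmf - log_trunc).sum()

res = optimize.minimize(neg_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)

# The fitted model implies how many zero-count tags went unobserved.
p0 = stats.nbinom.pmf(0, a_hat, b_hat / (b_hat + 1.0))
n_total_hat = len(observed) / (1.0 - p0)
print(round(a_hat, 2), round(b_hat, 2), int(n_total_hat))
```

The implied total tag count extrapolates from the observed tags through the fitted zero probability; as the abstract notes, this extrapolation is sensitive to the shape estimate when only a single library is used.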
Comparing transformation methods for DNA microarray data
BACKGROUND: When DNA microarray data are used for gene clustering, genotype/phenotype correlation studies, or tissue classification the signal intensities are usually transformed and normalized in several steps in order to improve comparability and signal/noise ratio. These steps may include subtraction of an estimated background signal, subtracting the reference signal, smoothing (to account for nonlinear measurement effects), and more. Different authors use different approaches, and it is generally not clear to users which method they should prefer. RESULTS: We used the ratio between biological variance and measurement variance (which is an F-like statistic) as a quality measure for transformation methods, and we demonstrate a method for maximizing that variance ratio on real data. We explore a number of transformation issues, including Box-Cox transformation, baseline shift, partial subtraction of the log-reference signal and smoothing. It appears that the optimal choice of parameters for the transformation methods depends on the data. Further, the behavior of the variance ratio, under the null hypothesis of zero biological variance, appears to depend on the choice of parameters. CONCLUSIONS: The use of replicates in microarray experiments is important. Adjustment for the null-hypothesis behavior of the variance ratio is critical to the selection of transformation method.
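The F-like quality measure can be sketched on toy replicated data: biological variance is the between-gene variance of the transformed signal, measurement variance the within-gene variance across replicates, and one scans a transformation parameter (here a baseline shift before the log, one simple member of the families discussed). The noise model and all values below are invented; as the abstract says, which parameter value maximizes the ratio depends on the data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented replicated toy data: each gene has a true expression level,
# measured r times with an additive background and multiplicative noise.
n_genes, r = 200, 3
true = rng.lognormal(4.0, 1.0, n_genes)
background = 50.0
y = background + true[:, None] * rng.lognormal(0.0, 0.2, (n_genes, r))

def variance_ratio(x):
    """F-like quality measure: biological (between-gene) variance over
    measurement (within-gene, across-replicate) variance."""
    within = x.var(axis=1, ddof=1).mean()
    between = x.mean(axis=1).var(ddof=1)
    return between / within

# Scan a baseline shift c applied before the log transform and report
# the resulting variance ratio for each candidate value.
for c in [0.0, 25.0, 50.0, 75.0]:
    z = np.log(np.maximum(y - c, 1.0))
    print(c, round(variance_ratio(z), 1))
```

The scan prints one ratio per candidate shift; with replicates available, the same loop applies unchanged to real intensities.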
Responsiveness: a reinvention of the wheel?
BACKGROUND: Since the mid-1980s, responsiveness has been considered a separate property of health status questionnaires, distinct from reliability and validity. The aim of the study was to assess the strength of the relationship between internal consistency reliability, referring to an instrument's sensitivity to differences in health status among subjects at one point in time, and responsiveness, referring to sensitivity to health status changes over time. METHODS: We used three different datasets comprising the scores of patients on the Barthel, the SIP and the GO-QoL instruments at two points in time. The internal consistency was reduced stepwise by removing the item that contributed most to a scale's reliability. We calculated the responsiveness expressed by the Standardized Response Mean (SRM) on each set of remaining items. The strength of the relationship between the thus obtained internal consistency coefficients and SRMs was quantified by Spearman rank correlation coefficients. RESULTS: Strong to perfect correlations (0.90-1.00) were found between internal consistency coefficients and SRMs for all instruments, indicating that the two can be used interchangeably. CONCLUSION: The results contradict the conviction that responsiveness is a separate psychometric property. The internal consistency coefficient adequately reflects an instrument's potential sensitivity to changes over time.
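The two coefficients in question are easy to compute side by side. The sketch below uses invented simulated questionnaire data (not the Barthel/SIP/GO-QoL scores) and a simplified reduction that drops the last items rather than the most reliability-contributing one; it shows Cronbach's alpha and the SRM declining together as items are removed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented questionnaire data: n subjects answer k items at two time
# points; items share one latent health score that improves by 0.5 SD.
n, k = 120, 10
latent_t1 = rng.normal(0.0, 1.0, n)
t1 = latent_t1[:, None] + rng.normal(0.0, 1.0, (n, k))
t2 = (latent_t1 + 0.5)[:, None] + rng.normal(0.0, 1.0, (n, k))

def cronbach_alpha(x):
    """Internal consistency: k/(k-1) * (1 - sum(item var) / var(total))."""
    m = x.shape[1]
    return m / (m - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

def srm(x1, x2):
    """Standardized Response Mean: mean change / SD of change."""
    d = x2.sum(axis=1) - x1.sum(axis=1)
    return d.mean() / d.std(ddof=1)

# Remove items stepwise (here simply the last items) and watch the two
# coefficients fall together, as in the study's reduction procedure.
for kept in range(k, 2, -2):
    print(kept,
          round(cronbach_alpha(t1[:, :kept]), 2),
          round(srm(t1[:, :kept], t2[:, :kept]), 2))
```

Under this one-latent-factor model both quantities are driven by the same signal-to-noise ratio of the summed scale, which is the mechanism behind the near-perfect rank correlations the study reports.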
Cardiovascular research: data dispersion issues
Biological processes are full of variations, and so are responses to therapy as measured in clinical research. Estimators of clinical efficacy are, therefore, usually reported with a measure of uncertainty, otherwise called dispersion. This study aimed to review both the flaws of data reports without a measure of dispersion and those with over-dispersion.
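The common dispersion measures are worth distinguishing: the standard deviation describes the spread of individual patients, while the standard error of the mean (and the confidence interval built from it) describes the precision of the estimated mean, and the two are often confused in reports. A minimal sketch on invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# An invented treatment response: the mean alone hides the spread.
x = rng.normal(10.0, 4.0, 40)      # e.g. change in some clinical marker

mean = x.mean()
sd = x.std(ddof=1)                 # dispersion of individual patients
sem = sd / np.sqrt(len(x))         # precision of the estimated mean
lo, hi = stats.t.interval(0.95, len(x) - 1, loc=mean, scale=sem)

print(f"mean={mean:.1f}  SD={sd:.1f}  SEM={sem:.2f}  "
      f"95% CI=({lo:.1f}, {hi:.1f})")
```

Reporting the SEM where the SD is meant understates patient-to-patient variability by a factor of sqrt(n), which is one of the flaws such a review targets.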