Gains in Power from Structured Two-Sample Tests of Means on Graphs
We consider multivariate two-sample tests of means, where the location shift
between the two populations is expected to be related to a known graph
structure. An important application of such tests is the detection of
differentially expressed genes between two patient populations, as shifts in
expression levels are expected to be coherent with the structure of graphs
reflecting gene properties such as biological process, molecular function,
regulation, or metabolism. For a fixed graph of interest, we demonstrate that
accounting for graph structure can yield more powerful tests under the
assumption of smooth distribution shift on the graph. We also investigate the
identification of non-homogeneous subgraphs of a given large graph, which poses
both computational and multiple testing problems. The relevance and benefits of
the proposed approach are illustrated on synthetic data and on breast cancer
gene expression data analyzed in the context of KEGG pathways.
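The power gain described in this abstract can be pictured with a small sketch. It assumes that "smooth on the graph" means the mean shift lies mostly in the span of the first k eigenvectors of the graph Laplacian, and it applies a standard Hotelling T^2 statistic after projecting onto those k directions. The function name `graph_hotelling_t2` and the parameter `k` are illustrative choices, not the authors' implementation.

```python
import numpy as np


def graph_hotelling_t2(x1, x2, adjacency, k):
    """Hotelling T^2 computed on the k smoothest graph-Fourier components.

    x1, x2    : arrays of shape (n1, p) and (n2, p), one column per graph node
    adjacency : symmetric (p, p) adjacency matrix of the known graph
    k         : number of low-frequency components kept (an assumed tuning choice)
    """
    # Combinatorial Laplacian L = D - A of the known graph.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # eigh returns eigenvalues in ascending order, so the first k
    # eigenvectors are the smoothest directions on the graph.
    _, eigvecs = np.linalg.eigh(laplacian)
    basis = eigvecs[:, :k]

    # Project both samples onto the low-frequency subspace.
    z1, z2 = x1 @ basis, x2 @ basis
    n1, n2 = len(z1), len(z2)
    diff = z1.mean(axis=0) - z2.mean(axis=0)

    # Pooled within-group covariance in the reduced space.
    pooled = ((n1 - 1) * np.cov(z1, rowvar=False)
              + (n2 - 1) * np.cov(z2, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a scalar array when k == 1

    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)
    return float(t2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 6
    # Path graph on p nodes (hypothetical toy graph).
    adj = np.diag(np.ones(p - 1), 1)
    adj = adj + adj.T
    group_a = rng.normal(size=(30, p))
    group_b = rng.normal(size=(30, p)) + 0.5  # constant shift, smooth on the graph
    print(graph_hotelling_t2(group_a, group_b, adj, k=2))
```

With `k` equal to the number of nodes, the projection is an orthogonal change of basis and the statistic coincides with the ordinary Hotelling T^2. Choosing a smaller `k` reduces the dimension of the test, which is where a gain in power would come from if the shift really is smooth.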
Multiple tests of association with biological annotation metadata
We propose a general and formal statistical framework for multiple tests of
association between known fixed features of a genome and unknown parameters of
the distribution of variable features of this genome in a population of
interest. The known gene-annotation profiles, corresponding to the fixed
features of the genome, may concern Gene Ontology (GO) annotation, pathway
membership, regulation by particular transcription factors, nucleotide
sequences, or protein sequences. The unknown gene-parameter profiles,
corresponding to the variable features of the genome, may be, for example,
regression coefficients relating possibly censored biological and clinical
outcomes to genome-wide transcript levels, DNA copy numbers, and other
covariates. A generic question of great interest in current genomic research
regards the detection of associations between biological annotation metadata
and genome-wide expression measures. This biological question may be translated
as the test of multiple hypotheses concerning association measures between
gene-annotation profiles and gene-parameter profiles. A general and rigorous
formulation of the statistical inference question allows us to apply the
multiple hypothesis testing methodology developed in [Multiple Testing
Procedures with Applications to Genomics (2008) Springer, New York] and related
articles, to control a broad class of Type I error rates, defined as
generalized tail probabilities and expected values for arbitrary functions of
the numbers of Type I errors and rejected hypotheses. The resampling-based
single-step and stepwise multiple testing procedures of [Multiple Testing
Procedures with Applications to Genomics (2008) Springer, New York] take into
account the joint distribution of the test statistics and provide Type I error
control in testing problems involving general data generating distributions
(with arbitrary dependence structures among variables), null hypotheses, and
test statistics.
Comment: Published at http://dx.doi.org/10.1214/193940307000000446 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org).
GenomeGraphs: integrated genomic data visualization with R.
Background: Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. Results: We developed GenomeGraphs, an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. Conclusion: GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.
Quantification and Visualization of LD Patterns and Identification of Haplotype Blocks
Classical measures of linkage disequilibrium (LD) between two loci, based only on the joint distribution of alleles at these loci, present noisy patterns. In this paper, we propose a new distance-based LD measure, R, which takes into account multilocus haplotypes around the two loci in order to exploit information from neighboring loci. The LD measure R yields a matrix of pairwise distances between markers, based on the correlation between the lengths of shared haplotypes among chromosomes around these markers. Data analysis demonstrates that visualization of LD patterns through the R matrix reveals more deterministic patterns, with much less noise, than using classical LD measures. Moreover, the patterns are highly compatible with recently suggested models of haplotype block structure. We propose to apply the new LD measure to define haplotype blocks through cluster analysis. Specifically, we present a distance-based clustering algorithm, DHPBlocker, which performs hierarchical partitioning of an ordered sequence of markers into disjoint and adjacent blocks with a hierarchical structure. The proposed method integrates information on the two main existing criteria in defining haplotype blocks, namely, LD and haplotype diversity, through the use of silhouette width and description length as cluster validity measures, respectively. The new LD measure and clustering procedure are applied to single nucleotide polymorphism (SNP) datasets from the human 5q31 region (Daly et al. 2001) and the class II region of the human major histocompatibility complex (Jeffreys et al. 2001). Our results are in good agreement with published results. In addition, analyses performed on different subsets of markers indicate that the method is robust with regard to the allele frequency and density of the genotyped markers. Unlike previously proposed methods, our new cluster-based method can uncover hierarchical relationships among blocks and can be applied to polymorphic DNA markers or amino acid sequence data.
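For context, the classical two-locus measures that this abstract contrasts with its new measure R can be computed directly from haplotype frequencies. The sketch below shows the standard D, |D'| and r^2 statistics for two biallelic loci. It is not the proposed measure R or the DHPBlocker algorithm, and the function name `pairwise_ld` and the two-character haplotype encoding are assumptions made for illustration.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic loci.

    `haplotypes` is a list of 2-character strings such as "AB" or "ab":
    the first character is the allele at locus 1, the second the allele
    at locus 2 (a hypothetical encoding chosen for this example).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Use the alleles of the first haplotype as reference alleles.
    ref_a, ref_b = haplotypes[0][0], haplotypes[0][1]

    p_a = sum(c for h, c in counts.items() if h[0] == ref_a) / n
    p_b = sum(c for h, c in counts.items() if h[1] == ref_b) / n
    p_ab = counts[ref_a + ref_b] / n

    # D: departure of the joint haplotype frequency from independence.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom

    # D is bounded by the allele frequencies; D' rescales it to [0, 1].
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return d, d_prime, r2


if __name__ == "__main__":
    sample = ["AB"] * 6 + ["ab"] * 3 + ["Ab"] * 1
    print(pairwise_ld(sample))
```

These statistics use only the joint distribution of alleles at the two loci, which is the property the abstract identifies as the source of noisy LD patterns.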
Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data
This work studies the theoretical rules of feature selection in linear
discriminant analysis (LDA), and a new feature selection method is proposed for
sparse linear discriminant analysis. An ℓ1 minimization method is used to
select the important features from which the LDA will be constructed. The
asymptotic results of this proposed two-stage LDA (TLDA) are studied,
demonstrating that TLDA is an optimal classification rule whose convergence
rate is the best compared to existing methods. The experiments on simulated and
real datasets are consistent with the theoretical results and show that TLDA
performs favorably in comparison with current methods. Overall, TLDA uses a
lower minimum number of features or genes than other approaches to achieve a
better result with a reduced misclassification rate.
Comment: 20 pages, 3 figures, 5 tables, accepted by Computational Statistics
and Data Analysis.
The Discursive Effects of the Haiku-based SADUPA Poetry Technique in Palliative Care
This qualitative study seeks to present the discursive effects of SADUPA, a new poetry-based technique centered on haiku, in the context of psycho-oncological treatment. The technique is used with a terminal cancer patient, Mr. A. The psychological processes involved in the technique and the poetic writings arising from it are discussed. In particular, the discursive variations in Mr. A’s narrative of his illness are described as they occurred before and after his poetry writing. The authors suggest that writing workshops based on the brief poetic structures of the haiku can enable patients to produce a larger and more singular narrative about their end-of-life experiences.