Patterns of genomic and phenomic diversity in wine and table grapes.
Grapes are one of the most economically and culturally important crops worldwide, and they have been bred for both winemaking and fresh consumption. Here we evaluate patterns of diversity across 33 phenotypes collected over a 17-year period from 580 table and wine grape accessions that belong to one of the world's largest grape gene banks, the grape germplasm collection of the United States Department of Agriculture. We find that phenological events throughout the growing season are correlated, and quantify the marked difference in size between table and wine grapes. By pairing publicly available historical phenotype data with genome-wide polymorphism data, we identify large-effect loci controlling traits that have been targeted during domestication and breeding, including hermaphroditism, lighter skin pigmentation and muscat aroma. Breeding for larger berries in table grapes was traditionally concentrated in geographic regions where Islam predominates and alcohol was prohibited, whereas wine grapes retained the ancestral smaller size that is more desirable for winemaking in predominantly Christian regions. We uncover a novel locus with a suggestive association with berry size that harbors a signature of positive selection for larger berries. Our results suggest that religious rules concerning alcohol consumption have had a marked impact on patterns of phenomic and genomic diversity in grapes.
Prediction with Dimension Reduction of Multiple Molecular Data Sources for Patient Survival
Predictive modeling from high-dimensional genomic data is often preceded by a
dimension reduction step, such as principal components analysis (PCA). However,
the application of PCA is not straightforward for multi-source data, wherein
multiple sources of 'omics data measure different but related biological
components. In this article we utilize recent advances in the dimension
reduction of multi-source data for predictive modeling. In particular, we apply
exploratory results from Joint and Individual Variation Explained (JIVE), an
extension of PCA for multi-source data, for prediction of differing response
types. We conduct simulations to illustrate the practical
advantages and interpretability of our approach. As an application example we
consider predicting survival for Glioblastoma Multiforme (GBM) patients from
three data sources measuring mRNA expression, miRNA expression, and DNA
methylation. We also introduce a method to estimate JIVE scores for new samples
that were not used in the initial dimension reduction, and study its
theoretical properties; this method is implemented in the R package R.JIVE on
CRAN, in the function 'jive.predict'.
Comment: 11 pages, 9 figures
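The idea of scoring a new sample against loadings that were estimated earlier can be illustrated with a deliberately simplified sketch. It assumes a single data source and orthonormal loading vectors, in which case the least-squares scores are plain dot products. This is not the algorithm implemented in 'jive.predict', which handles joint and individual structure across several sources; the function name `project_scores` and all numbers below are hypothetical.

```python
def project_scores(loadings, sample):
    """Least-squares scores of one sample on orthonormal loading vectors.

    Assumption: the loading vectors are orthonormal, so each score is
    simply the dot product of the sample with one loading vector.
    """
    return [sum(l * x for l, x in zip(vec, sample)) for vec in loadings]


# Hypothetical loadings: 2 components over 3 features, orthonormal by construction.
loadings = [
    [1.0, 0.0, 0.0],
    [0.0, 0.6, 0.8],
]

# Hypothetical new sample that was not used to estimate the loadings.
new_sample = [2.0, 3.0, 4.0]

scores = project_scores(loadings, new_sample)
print(scores)
```

The resulting scores could then be passed to whatever predictive model was fitted on the training-sample scores.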
Unconventional machine learning of genome-wide human cancer data
Recent advances in high-throughput genomic technologies coupled with
exponential increases in computer processing and memory have allowed us to
interrogate the complex aberrant molecular underpinnings of human disease from
a genome-wide perspective. While the deluge of genomic information is expected
to increase, a bottleneck in conventional high-performance computing is rapidly
approaching. Inspired in part by recent advances in physical quantum
processors, we evaluated several unconventional machine learning (ML)
strategies on actual human tumor data. Here we show for the first time the
efficacy of multiple annealing-based ML algorithms for classification of
high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas.
To assess algorithm performance, we compared these classifiers to a variety of
standard ML methods. Our results indicate the feasibility of using
annealing-based ML to provide competitive classification of human cancer types
and associated molecular subtypes and superior performance with smaller
training datasets, thus providing compelling empirical evidence for the
potential future application of unconventional computing architectures in the
biomedical sciences.
Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes
By expressing prior distributions as general stochastic processes,
nonparametric Bayesian methods provide a flexible way to incorporate prior
knowledge and constrain the latent structure in statistical inference. The
Indian buffet process (IBP) is such an example that can be used to define a
prior distribution on infinite binary features, where the exchangeability among
subjects is assumed. The phylogenetic Indian buffet process (pIBP), a
derivative of IBP, enables the modeling of non-exchangeability among subjects
through a stochastic process on a rooted tree, which is similar to that used in
phylogenetics, to describe relationships among the subjects. In this paper, we
study the theoretical properties of IBP and pIBP under a binary factor model.
We establish the posterior contraction rates for both IBP and pIBP and
substantiate the theoretical results through simulation studies. This is the
first work addressing the frequentist property of the posterior behaviors of
IBP and pIBP. We also demonstrate its practical usefulness by applying the pIBP
prior to a real data example arising in the field of cancer genomics where the
exchangeability among subjects is violated.
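For readers unfamiliar with the IBP, the standard generative ("restaurant") description of the prior can be sketched as follows: subject i takes each existing feature k with probability m_k / i, where m_k is the number of earlier subjects that have it, and then adds Poisson(alpha / i) new features. This sketch covers only the exchangeable IBP, not the tree-structured pIBP studied in the paper; the function names and parameter values are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_subjects, alpha, seed=0):
    """Sample a binary feature matrix from the IBP prior.

    Returns a list of rows (one per subject); each row is a list of 0/1
    entries, one per feature that was created during sampling.
    """
    rng = random.Random(seed)
    feature_counts = []  # m_k: how many subjects have feature k so far
    rows = []            # per subject: set of feature indices it holds
    for i in range(1, num_subjects + 1):
        row = set()
        # Existing features are taken with probability m_k / i.
        for k, m_k in enumerate(feature_counts):
            if rng.random() < m_k / i:
                row.add(k)
        # Poisson(alpha / i) brand-new features for this subject.
        num_new = sample_poisson(alpha / i, rng)
        first_new = len(feature_counts)
        row.update(range(first_new, first_new + num_new))
        feature_counts.extend([0] * num_new)
        for k in row:
            feature_counts[k] += 1
        rows.append(row)
    num_features = len(feature_counts)
    return [[1 if k in row else 0 for k in range(num_features)] for row in rows]


for row in sample_ibp(num_subjects=5, alpha=2.0, seed=1):
    print(row)
```

Because each subject is treated identically up to ordering, this construction yields the exchangeability that the pIBP relaxes.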
Applicability of in vivo staging of regional amyloid burden in a cognitively normal cohort with subjective memory complaints: the INSIGHT-preAD study.
BACKGROUND: Current methods of amyloid PET interpretation based on the binary classification of global amyloid signal fail to identify early phases of amyloid deposition. A recent analysis of 18F-florbetapir PET data from the Alzheimer's Disease Neuroimaging Initiative cohort suggested a hierarchical four-stage model of regional amyloid deposition that resembles neuropathologic estimates and can be used to stage an individual's amyloid burden in vivo. Here, we evaluated the validity of this in vivo amyloid staging model in an independent cohort of older people with subjective memory complaints (SMC). We further examined its potential association with subtle cognitive impairments in this population at elevated risk for Alzheimer's disease (AD). METHODS: The monocentric INSIGHT-preAD cohort includes 318 cognitively intact older individuals with SMC. All individuals underwent 18F-florbetapir PET scanning and extensive neuropsychological testing. We projected the regional amyloid uptake signal into the previously proposed hierarchical staging model of in vivo amyloid progression. We determined the adherence to this model across all cases and tested the association between increasing in vivo amyloid stage and cognitive performance using ANCOVA models. RESULTS: In total, 156 participants (49%) showed evidence of regional amyloid deposition, and all but 2 of these (99%) adhered to the hierarchical regional pattern implied by the in vivo amyloid progression model. According to a conventional binary classification based on global signal (SUVRCereb = 1.10), individuals in stages III and IV were classified as amyloid-positive (except one in stage III), but 99% of individuals in stage I and even 28% of individuals in stage II were classified as amyloid-negative. Neither in vivo amyloid stage nor conventional binary amyloid status was significantly associated with cognitive performance in this preclinical cohort.
CONCLUSIONS: The proposed hierarchical staging scheme of PET-evidenced amyloid deposition generalizes well to data from an independent cohort of older people at elevated risk for AD. Future studies will determine the prognostic value of the staging approach for predicting longitudinal cognitive decline in older individuals at increased risk for AD.
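The contrast between a single global cutoff and a hierarchical regional stage can be made concrete with a minimal sketch. It assumes each participant is summarized by four regional positivity flags ordered from the earliest-affected to the latest-affected region set, that the stage is the highest positive set, and that a participant adheres to the hierarchy when every earlier set is also positive. Those rules, the strict ">" comparison against the 1.10 cutoff, and the function names are assumptions for illustration and are not taken from the study's methods.

```python
SUVR_CUTOFF = 1.10  # global cutoff quoted in the abstract


def global_status(global_suvr, cutoff=SUVR_CUTOFF):
    """Conventional binary call from one global SUVR value.

    Assumption: values strictly above the cutoff count as amyloid-positive.
    """
    return "positive" if global_suvr > cutoff else "negative"


def regional_stage(region_positive):
    """Return (stage, adheres) from four ordered regional positivity flags.

    stage   : 0 if no region set is positive, otherwise the index (1-4)
              of the latest positive region set.
    adheres : True when every region set up to that stage is positive,
              i.e. the pattern is consistent with hierarchical spread.
    """
    highest = 0
    for index, positive in enumerate(region_positive, start=1):
        if positive:
            highest = index
    adheres = all(region_positive[:highest])
    return highest, adheres


# Hypothetical participants: (global SUVR, regional flags for sets I-IV).
participants = [
    (1.02, [True, False, False, False]),  # early regional deposition only
    (1.08, [True, True, False, False]),
    (1.25, [True, True, True, False]),
    (1.05, [True, False, True, False]),   # skips set II: non-adherent
]

for suvr, flags in participants:
    stage, adheres = regional_stage(flags)
    print(global_status(suvr), stage, adheres)
```

In this toy data the first two participants would be called amyloid-negative by the global cutoff even though they show regional deposition, which is the kind of discrepancy the abstract reports for stages I and II.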
Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum.
Background: Plasmodium falciparum, the deadliest malaria-causing parasite, has an extremely AT-rich (80.7%) genome. Because of high AT-content, sequence-based annotation of genes and functional elements remains challenging. In order to better understand the regulatory network controlling gene expression in the parasite, a more complete genome annotation as well as analysis tools adapted for AT-rich genomes are needed. Recent studies on genome-wide nucleosome positioning in eukaryotes have shown that nucleosome landscapes exhibit regular characteristic patterns at the 5'- and 3'-end of protein and non-protein coding genes. In addition, nucleosome depleted regions can be found near transcription start sites. These unique nucleosome landscape patterns may be exploited for the identification of novel genes. In this paper, we propose a computational approach to discover novel putative genes based exclusively on nucleosome positioning data in the AT-rich genome of P. falciparum. Results: Using binary classifiers trained on nucleosome landscapes at the gene boundaries from two independent nucleosome positioning data sets, we were able to detect a total of 231 regions containing putative genes in the genome of Plasmodium falciparum, of which 67 highly confident genes were found in both data sets. Eighty-eight of these 231 newly predicted genes exhibited transcription signal in RNA-Seq data, indicative of active transcription. In addition, 20 out of 21 selected gene candidates were further validated by RT-PCR, and 28 out of the 231 genes showed significant matches using BLASTN against an expressed sequence tag (EST) database. Furthermore, 108 (47%) out of the 231 putative novel genes overlapped with previously identified but unannotated long non-coding RNAs. Collectively, these results provide experimental validation for 163 predicted genes (70.6%).
Finally, 73 out of 231 genes were found to be potentially translated based on their signal in polysome-associated RNA-Seq representing transcripts that are actively being translated. Conclusion: Our results clearly indicate that nucleosome positioning data contains sufficient information for novel gene discovery. As distinct nucleosome landscapes around genes are found in many other eukaryotic organisms, this methodology could be used to characterize the transcriptome of any organism, especially when coupled with other DNA-based gene finding and experimental methods (e.g., RNA-Seq).
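The proportions quoted in this abstract can be rechecked from the reported counts with a few lines of arithmetic. The counts below are copied from the text; the dictionary keys are informal labels chosen here, not terms from the paper.

```python
TOTAL_PREDICTED = 231  # putative gene regions reported in the abstract

# Counts as stated in the abstract, each out of the 231 predictions.
counts = {
    "found in both data sets": 67,
    "RNA-Seq transcription signal": 88,
    "BLASTN match against EST database": 28,
    "overlap with unannotated lncRNAs": 108,
    "experimentally validated (collectively)": 163,
    "polysome-associated RNA-Seq signal": 73,
}

# Percentage of the 231 predictions in each category, to one decimal place.
percentages = {
    label: round(100 * n / TOTAL_PREDICTED, 1) for label, n in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{TOTAL_PREDICTED} = {pct}%")
```

The validated and lncRNA-overlap proportions computed this way agree with the 70.6% and 47% given in the text after rounding.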