2,769 research outputs found
Gene expression meta-analysis supports existence of molecular apocrine breast cancer with a role for androgen receptor and implies interactions with ErbB family
<p>Abstract</p> <p>Background</p> <p>Pathway discovery from gene expression data can provide important insight into the relationship between signaling networks and cancer biology. Oncogenic signaling pathways are commonly inferred by comparison with signatures derived from cell lines. We use the Molecular Apocrine subtype of breast cancer to demonstrate our ability to infer pathways directly from patients' gene expression data with pattern analysis algorithms.</p> <p>Methods</p> <p>We combine data from two studies that propose the existence of the Molecular Apocrine phenotype. We use quantile normalization and XPN to minimize institutional bias in the data. We use hierarchical clustering, principal components analysis, and comparison of gene signatures derived from Significance Analysis of Microarrays to establish the existence of the Molecular Apocrine subtype and the equivalence of its molecular phenotype across both institutions. Statistical significance was computed using the Fasano & Franceschini test for separation of principal components and the hypergeometric probability formula for significance of overlap in gene signatures. We perform pathway analysis using LeFEminer and Backward Chaining Rule Induction to identify a signaling network that differentiates the subset. We identify a larger cohort of samples in the public domain, and use Gene Shaving and Robust Bayesian Network Analysis to detect pathways that interact with the defining signal.</p> <p>Results</p> <p>We demonstrate that the two separately introduced ER<sup>- </sup>breast cancer subsets represent the same tumor type, called Molecular Apocrine breast cancer. LeFEminer and Backward Chaining Rule Induction support a role for AR signaling as a pathway that differentiates this subset from others. Gene Shaving and Robust Bayesian Network Analysis detect interactions between the AR pathway, EGFR trafficking signals, and ErbB2.</p> <p>Conclusion</p> <p>We propose criteria for meta-analysis that are able to demonstrate statistical significance in establishing molecular equivalence of subsets across institutions. Data mining strategies used here provide an alternative method to comparison with cell lines for discovering seminal pathways and interactions between signaling networks. Analysis of Molecular Apocrine breast cancer implies that therapies targeting AR might be hampered if interactions with ErbB family members are not addressed.</p
Unsupervised empirical Bayesian multiple testing with external covariates
In an empirical Bayesian setting, we provide a new multiple testing method,
useful when an additional covariate is available, that influences the
probability of each null hypothesis being true. We measure the posterior
significance of each test conditionally on the covariate and the data, leading
to greater power. Using covariate-based prior information in an unsupervised
fashion, we produce a list of significant hypotheses which differs in length
and order from the list obtained by methods not taking covariate-information
into account. Covariate-modulated posterior probabilities of each null
hypothesis are estimated using a fast approximate algorithm. The new method is
applied to expression quantitative trait loci (eQTL) data.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS158 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Expression profiles of switch-like genes accurately classify tissue and infectious disease phenotypes in model-based classification
<p>Abstract</p> <p>Background</p> <p>Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge in the form of identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes consist of 15% of known human genes, and are enriched with genes coding for extracellular and membrane proteins. It is of interest to determine the prediction potential of bimodal genes for class discovery in large-scale datasets.</p> <p>Results</p> <p>Use of a model-based clustering algorithm accurately classified more than 400 microarray samples into 19 different tissue types on the basis of bimodal gene expression. Bimodal expression patterns were also highly effective in differentiating between infectious diseases in model-based clustering of microarray data. Supervised classification with feature selection restricted to switch-like genes also recognized tissue specific and infectious disease specific signatures in independent test datasets reserved for validation. Determination of "on" and "off" states of switch-like genes in various tissues and diseases allowed for the identification of activated/deactivated pathways. Activated switch-like genes in neural, skeletal muscle and cardiac muscle tissue tend to have tissue-specific roles. A majority of activated genes in infectious disease are involved in processes related to the immune response.</p> <p>Conclusion</p> <p>Switch-like bimodal gene sets capture genome-wide signatures from microarray data in health and infectious disease. A subset of bimodal genes coding for extracellular and membrane proteins are associated with tissue specificity, indicating a potential role for them as biomarkers provided that expression is altered in the onset of disease. Furthermore, we provide evidence that bimodal genes are involved in temporally and spatially active mechanisms including tissue-specific functions and response of the immune system to invading pathogens.</p
Partition Decoupling for Multi-gene Analysis of Gene Expression Profiling Data
We present the extention and application of a new unsupervised statistical
learning technique--the Partition Decoupling Method--to gene expression data.
Because it has the ability to reveal non-linear and non-convex geometries
present in the data, the PDM is an improvement over typical gene expression
analysis algorithms, permitting a multi-gene analysis that can reveal
phenotypic differences even when the individual genes do not exhibit
differential expression. Here, we apply the PDM to publicly-available gene
expression data sets, and demonstrate that we are able to identify cell types
and treatments with higher accuracy than is obtained through other approaches.
By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM may
be used to find sets of mechanistically-related genes that discriminate
phenotypes.Comment: Revise
SOM-VAE: Interpretable Discrete Representation Learning on Time Series
High-dimensional time series are common in many domains. Since human
cognition is not optimized to work well in high-dimensional spaces, these areas
could benefit from interpretable low-dimensional representations. However, most
representation learning algorithms for time series data are difficult to
interpret. This is due to non-intuitive mappings from data features to salient
properties of the representation and non-smoothness over time. To address this
problem, we propose a new representation learning framework building on ideas
from interpretable discrete dimensionality reduction and deep generative
modeling. This framework allows us to learn discrete representations of time
series, which give rise to smooth and interpretable embeddings with superior
clustering performance. We introduce a new way to overcome the
non-differentiability in discrete representation learning and present a
gradient-based version of the traditional self-organizing map algorithm that is
more performant than the original. Furthermore, to allow for a probabilistic
interpretation of our method, we integrate a Markov model in the representation
space. This model uncovers the temporal transition structure, improves
clustering performance even further and provides additional explanatory
insights as well as a natural representation of uncertainty. We evaluate our
model in terms of clustering performance and interpretability on static
(Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST
images, a chaotic Lorenz attractor system with two macro states, as well as on
a challenging real world medical time series application on the eICU data set.
Our learned representations compare favorably with competitor methods and
facilitate downstream tasks on the real world data.Comment: Accepted for publication at the Seventh International Conference on
Learning Representations (ICLR 2019
Machine Learning Modeling from Omics Data as Prospective Tool for Improvement of Inflammatory Bowel Disease Diagnosis and Clinical Classifications
Research of inflammatory bowel disease (IBD) has identified numerous molecular players involved in the disease development. Even so, the understanding of IBD is incomplete, while disease treatment is still far from the precision medicine. Reliable diagnostic and prognostic biomarkers in IBD are limited which may reduce efficient therapeutic outcomes. High-throughput technologies and artificial intelligence emerged as powerful tools in search of unrevealed molecular patterns that could give important insights into IBD pathogenesis and help to address unmet clinical needs. Machine learning, a subtype of artificial intelligence, uses complex mathematical algorithms to learn from existing data in order to predict future outcomes. The scientific community has been increasingly employing machine learning for the prediction of IBD outcomes from comprehensive patient data-clinical records, genomic, transcriptomic, proteomic, metagenomic, and other IBD relevant omics data. This review aims to present fundamental principles behind machine learning modeling and its current application in IBD research with the focus on studies that explored genomic and transcriptomic data. We described different strategies used for dealing with omics data and outlined the best-performing methods. Before being translated into clinical settings, the developed machine learning models should be tested in independent prospective studies as well as randomized controlled trials
Exploring Patterns of Epigenetic Information With Data Mining Techniques
[Abstract] Data mining, a part of the Knowledge Discovery in Databases process (KDD), is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information which are mitotically and/or meiotically heritable determining gene expression and cellular differentiation, as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in losing control over cell growth and, thus, causing cancer development. Data mining techniques could be then used to extract the previous patterns. This work reviews some of the most important applications of data mining to epigenetics.Programa Iberoamericano de Ciencia y TecnologĂa para el Desarrollo; 209RT-0366Galicia. ConsellerĂa de EconomĂa e Industria; 10SIN105004PRInstituto de Salud Carlos III; RD07/0067/000
- …