
Sparse integrative clustering of multiple omics data sets

Author: Mo Qianxing
Shen Ronglai
Wang Sijian
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 13/02/2012

High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such an approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91-108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek "experimental" points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets. Comment: Published at http://dx.doi.org/10.1214/12-AOAS578 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
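
For readers unfamiliar with the three sparsity-inducing penalties named in this abstract, their standard textbook forms are sketched below. This is an illustration only: the symbols (a coefficient vector \beta of length p and tuning parameters \lambda, \lambda_1, \lambda_2) are assumptions introduced here, and the paper's own parameterization may differ.

```latex
% Standard textbook forms; the paper's exact parameterization may differ.
\begin{align*}
\text{lasso:} \quad & P_{\lambda}(\beta) = \lambda \sum_{j=1}^{p} |\beta_j| \\
\text{elastic net:} \quad & P_{\lambda_1,\lambda_2}(\beta) = \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \\
\text{fused lasso:} \quad & P_{\lambda_1,\lambda_2}(\beta) = \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|
\end{align*}
```

The absolute-value terms can shrink individual coefficients exactly to zero, which is what yields sparse coefficient vectors. The fused lasso's second term additionally penalizes differences between neighbouring coefficients, which suits features with a natural ordering.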

arXiv.org e-Print Archive

Crossref

PubMed Central

Collection Of Biostatistics Research Archive

Extending colonic mucosal microbiome analysis - Assessment of colonic lavage as a proxy for endoscopic colonic biopsies

Author: A Durban
A Jain
AC Ouwehand
AD Kostic
B Willing
CL O’Brien
E Pruesse
EG Zoetendal
EH Simpson
F Backhed
F Chierico Del
G Li
GL Hold
HJ Flint
HL Cash
I Mukhopadhya
I Rangel
J Handelsman
J Jalanka
J Qin
JJ Kozich
JM Choo
JR Marchesi
L Chen
L Drago
L Harrell
M Morotomi
MG Langille
MH McLean
N Segata
NA Kennedy
P Lepage
P Louis
PB Eckburg
PD Schloss
PJ Turnbaugh
R Bibiloni
R Hansen
RE Ley
RL Warren
RM Shobar
S Delgado
SJ Salter
T Vatanen
Team RC
TZ DeSantis
V Mai
Y Momozawa
Y Xie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/11/2016

This study was supported through GI Research funds and MRC Grant Ref: MR/M00533X/1 to GH. Peer reviewed.

Aberdeen University Research

Crossref

Springer - Publisher Connector

PubMed Central

UNSWorks

FigShare


Broad and thematic remodeling of the surfaceome and glycoproteome on isogenic cells transformed with driving proliferative oncogenes.

Author: Coon Joshua
Kirkemo Lisa
Leung Kevin
Riley Nicholas
Wells James
Wilson Gary
Publication venue: eScholarship, University of California
Publication date: 07/04/2020

The cell surface proteome, the surfaceome, is the interface for engaging the extracellular space in normal and cancer cells. Here we apply quantitative proteomics of N-linked glycoproteins to reveal how a collection of some 700 surface proteins is dramatically remodeled in an isogenic breast epithelial cell line stably expressing any of six of the most prominent proliferative oncogenes, including the receptor tyrosine kinases, EGFR and HER2, and downstream signaling partners such as KRAS, BRAF, MEK, and AKT. We find that each oncogene has somewhat different surfaceomes, but the functions of these proteins are harmonized by common biological themes including up-regulation of nutrient transporters, down-regulation of adhesion molecules and tumor suppressing phosphatases, and alteration in immune modulators. Addition of a potent MEK inhibitor that blocks MAPK signaling brings each oncogene-induced surfaceome back to a common state reflecting the strong dependence of the oncogene on the MAPK pathway to propagate signaling. Cell surface protein capture is mediated by covalent tagging of surface glycans, yet current methods do not afford sequencing of intact glycopeptides. Thus, we complement the surfaceome data with whole cell glycoproteomics enabled by a recently developed technique called activated ion electron transfer dissociation (AI-ETD). We found massive oncogene-induced changes to the glycoproteome and differential increases in complex hybrid glycans, especially for KRAS and HER2 oncogenes. Overall, these studies provide a broad systems-level view of how specific driver oncogenes remodel the surfaceome and the glycoproteome in a cell autologous fashion, and suggest possible surface targets, and combinations thereof, for drug and biomarker discovery

eScholarship - University of California

A computational pipeline to identify phenotypic manifestations related to genes

Author: Ilhéu Ana Cristina Gonçalves
Publication date: 01/01/2022

Master's thesis, Bioinformatics and Computational Biology, 2022, Universidade de Lisboa, Faculdade de Ciências.

In most neurodevelopmental diseases, a proportion of patients carries a known gene mutation directly linked to their illness. Autism Spectrum Disorder (ASD) is a neurodevelopmental pathology with a very heterogeneous clinical presentation (Cummings et al., 2005). ASD is characterized by repetitive patterns of actions or interests and by difficulties or limitations in social interaction and communication that appear from childhood. These symptoms affect more men than women and can vary in severity. Perhaps the greatest advance in understanding the pathophysiology of ASD is the recognition of the genetic contribution to its etiology, helped by the emergence of NGS and WES methods (Daniel H. Geschwind, 2011; Asif et al., 2018). Several genes and mutations are associated with ASD, which points to a heterogeneous origin of the disease. The combination of a complex and poorly understood genetic architecture, phenotypic heterogeneity and the involvement of multiple interacting loci hinders efforts to discover the genes with specific mutations that lead to ASD. Consequently, the genetic etiology of ASD-related disorders remains largely unknown (Gupta et al., 2006). Several studies have demonstrated that duplications or deletions of genome segments called Copy Number Variants (CNVs), single nucleotide polymorphisms (SNPs) and single nucleotide variants (SNVs) are likely to have a causal role in ASD (Chang et al., 2014; Soler et al., 2018). Establishing the relationship between different genes and ASD phenotype variants may facilitate diagnosis and thus enable patients to obtain the correct treatment at a younger age.

Owing to recent advances in genomic technologies, large-scale genetic studies are uncovering large numbers of genetic variants potentially contributing to disease risk. The global objective of this work was to propose a pipeline to identify the phenotypic manifestation of putative disease-causing genetic variants. Two specific objectives were pursued: (1) identification of clusters of functionally similar genes; (2) inference of the disease phenotype for each cluster separately.

To achieve these goals, a dataset containing 3707 genes from patients diagnosed with ASD was used. Tools such as DishIn and GoSemSim were applied to this dataset to calculate the semantic similarity of each pair of genes, yielding a square semantic similarity matrix. The tools obtain this value by quantifying the information shared between two GO terms associated with each gene, for example as the information content of the most informative common ancestor of the two terms. The information-content-based similarity measures used in this work are Lin, Jiang & Conrath and Rel. From the semantic similarity matrix, a distance matrix is calculated, to which the DBSCAN, k-means and hierarchical clustering algorithms are applied to obtain clusters of functionally similar genes.

The results show that genes disrupted by genetic variants in patients can be clustered using semantic similarity measures. Clustered genes were functionally similar, as also indicated by gene interaction networks, and can lead to different ASD sub-phenotypes. Gene clusters were enriched for different pathways and phenotypes related to ASD subtypes
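
The similarity-to-clusters step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis pipeline: the gene names and similarity values are made up, the distance is taken as 1 minus similarity, and a hand-written single-linkage hierarchical clustering stands in for the DBSCAN, k-means and hierarchical implementations the thesis used (whose settings the abstract does not give). The `lin_similarity` helper follows the standard Lin formula, 2 x IC(MICA) / (IC(a) + IC(b)).

```python
from itertools import combinations


def lin_similarity(ic_a, ic_b, ic_mica):
    """Lin similarity from the information content (IC) of two terms
    and of their most informative common ancestor (MICA)."""
    denom = ic_a + ic_b
    return 0.0 if denom == 0 else 2.0 * ic_mica / denom


def to_distance(sim):
    """Convert a square similarity matrix with values in [0, 1] to distances."""
    return [[1.0 - s for s in row] for row in sim]


def single_linkage(dist, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # Pick the pair of clusters whose closest members are nearest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe: combinations() guarantees a < b
    return clusters


# Hypothetical similarity matrix for four genes (illustrative values only).
genes = ["geneA", "geneB", "geneC", "geneD"]
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
groups = single_linkage(to_distance(sim), k=2)
labels = [[genes[i] for i in g] for g in groups]
print(labels)
```

With real data, the similarity matrix would come from a semantic similarity tool, and an established clustering library would normally replace the hand-written loop.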

Universidade de Lisboa: Repositório.UL

Identification of transcriptional regulatory networks specific to pilocytic astrocytoma.

Author: Deshmukh Hrishikesh
Gutmann David H
MacDonald Tobey J
Nagarajan Rakesh
Payton Jacqueline E
Perry Arie
Shaik Jahangheer
Watson Mark A
Yu Jinsheng
Publication venue: eScholarship, University of California
Publication date: 01/01/2011

Background: Pilocytic Astrocytomas (PAs) are common low-grade central nervous system malignancies for which few recurrent and specific genetic alterations have been identified. In an effort to better understand the molecular biology underlying the pathogenesis of these pediatric brain tumors, we performed higher-order transcriptional network analysis of a large gene expression dataset to identify gene regulatory pathways that are specific to this tumor type, relative to other, more aggressive glial or histologically distinct brain tumours.

Methods: RNA derived from frozen human PA tumours was subjected to microarray-based gene expression profiling, using Affymetrix U133Plus2 GeneChip microarrays. This data set was compared to similar data sets previously generated from non-malignant human brain tissue and other brain tumour types, after appropriate normalization.

Results: In this study, we examined gene expression in 66 PA tumors compared to 15 non-malignant cortical brain tissues, and identified 792 genes that demonstrated consistent differential expression between independent sets of PA and non-malignant specimens. From this entire 792 gene set, we used the previously described PAP tool to assemble a core transcriptional regulatory network composed of 6 transcription factor genes (TFs) and 24 target genes, for a total of 55 interactions. A similar analysis of oligodendroglioma and glioblastoma multiforme (GBM) gene expression data sets identified distinct, but overlapping, networks. Most importantly, comparison of each of the brain tumor type-specific networks revealed a network unique to PA that included repressed expression of ONECUT2, a gene frequently methylated in other tumor types, and 13 other uniquely predicted TF-gene interactions.

Conclusions: These results suggest specific transcriptional pathways that may operate to create the unique molecular phenotype of PA and thus opportunities for corresponding targeted therapeutic intervention. Moreover, this study also demonstrates how integration of gene expression data with TF-gene and TF-TF interaction data is a powerful approach to generating testable hypotheses to better understand cell-type specific genetic programs relevant to cancer

Digital Commons@Becker

PubMed Central

eScholarship - University of California

Centronuclear myopathy in labrador retrievers: a recent founder mutation in the PTPLA gene has rapidly disseminated worldwide

Author: Christophe Hitte
G. Diane Shelton
Geneviève Aubin-Houzelstein
Inès Barthélémy
Jacques Penderis
Jean-Jacques Panthier
Jean-Laurent Thibaud
Jérôme Mary
Laurent Guillaud
Laurent Tiret
Manuel Pelé
Marie Maurer
Marilyn Fender
Natasha Olby
Reiner Albert Veitia
Stéphane Blot
Thomas Bilzer
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 30/11/2011

Centronuclear myopathies (CNM) are inherited congenital disorders characterized by an excessive number of internalized nuclei. In humans, CNM results from ~70 mutations in three major genes from the myotubularin, dynamin and amphiphysin families. Analysis of animal models with altered expression of these genes revealed common defects in all forms of CNM, paving the way for unified pathogenic and therapeutic mechanisms. Despite these efforts, some CNM cases remain genetically unresolved. We previously identified an autosomal recessive form of CNM in French Labrador retrievers from an experimental pedigree, and showed that a loss-of-function mutation in the protein tyrosine phosphatase-like A (PTPLA) gene segregated with CNM. Around the world, client-owned Labrador retrievers with a similar clinical presentation and histopathological changes in muscle biopsies have been described. We hypothesized that these Labradors share the same PTPLAcnm mutation. Genotyping of an international panel of 7,426 Labradors led to the identification of PTPLAcnm carriers in 13 countries. Haplotype analysis demonstrated that the PTPLAcnm allele resulted from a single and recent mutational event that may have rapidly disseminated through the extensive use of popular sires. PTPLA-deficient Labradors will help define the integrated role of PTPLA in the existing CNM gene network. They will be valuable complementary large animal models to test innovative therapies in CNM

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Towards knowledge-based gene expression data mining

Author: Bellazzi Riccardo
Zupan Blaz
Publication date: 01/01/2007

The field of gene expression data analysis has grown in the past few years from being purely data-centric to integrative, aiming at complementing microarray analysis with data and knowledge from diverse available sources. In this review, we report on the plethora of gene expression data mining techniques and focus on their evolution toward knowledge-based data analysis approaches. In particular, we discuss recent developments in gene expression-based analysis methods used in association and classification studies, phenotyping and reverse engineering of gene networks

Elsevier - Publisher Connector

ePrints.FRI

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California