
Gene communities in co-expression networks across different tissues

Author: Aqil Alber
Gokcumen Omer
Masuda Naoki
Russell Madison
Saitou Marie
Publication date: 07/12/2023

With the recent availability of tissue-specific gene expression data, such as that provided by the GTEx Consortium, there is growing interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes that are similarly expressed across individuals and that may be involved in related biological processes responding to specific environmental stimuli, or may share common regulatory variation. We construct a multilayer network in which each of the four layers is a gene co-expression network specific to an exocrine gland tissue. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further find gene co-expression communities in which the genes physically cluster in the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in the skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.
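A minimal sketch of the first step described above, building one correlation-matrix layer per tissue, assuming each tissue's data is a mapping from gene name to expression values measured in the same individuals. The gene names, tissue names, and values are hypothetical, and the sketch stops before community detection: it does not reproduce the authors' multilayer method or their null model.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant expression vector; correlation undefined")
    return cov / (sx * sy)


def correlation_layer(expr):
    """Gene-by-gene correlation matrix for one tissue (one network layer).

    expr maps gene name -> expression values across individuals.
    Returns the sorted gene list and the matrix in that gene order.
    """
    genes = sorted(expr)
    matrix = [[pearson(expr[g], expr[h]) for h in genes] for g in genes]
    return genes, matrix


def build_multilayer(tissues):
    """Map each tissue name to its correlation layer (hypothetical input layout)."""
    return {tissue: correlation_layer(expr) for tissue, expr in tissues.items()}


# Hypothetical toy data: the same three genes measured in four individuals per tissue.
tissues = {
    "tissue_A": {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]},
    "tissue_B": {"g1": [1, 3, 2, 4], "g2": [4, 1, 3, 2], "g3": [1, 3, 2, 5]},
}
layers = build_multilayer(tissues)
genes, corr_a = layers["tissue_A"]
print(genes, [round(c, 3) for c in corr_a[0]])
```

Every layer uses the same gene set, which is what allows a community to span several layers (generalist) or stay mostly within one (specialist).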

arXiv.org e-Print Archive

Model-based approaches for the detection of biologically active genomic regions from next generation sequencing data

Author: Rashid Naim Ur
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2013

Next Generation Sequencing (NGS) technologies are quickly gaining popularity in biomedical research. A popular application of NGS is to detect potential gene regulatory elements that are captured or enriched by certain experimental procedures, for example, Chromatin Immunoprecipitation (ChIP-seq), DNase hypersensitive site mapping (DNase-seq), and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE-seq), among others. While ChIP-seq can be used to identify protein-DNA interaction sites, both DNase-seq and FAIRE-seq can be used to identify open chromatin regions, which are more likely to contain elements involved in gene expression regulation. We collectively refer to these types of sequencing data as DAE-seq, where DAE stands for DNA After Enrichment. DAE-seq data can provide important insight into gene regulation, which is crucial to understanding the molecular mechanisms underlying phenotypic outcomes such as complex diseases. Here we address several practical issues facing biomedical researchers in the analysis of DAE-seq data by developing several new and relevant statistical methods. We first introduce a three-component mixture regression model to discover "enriched regions", i.e., genomic regions with more DAE-seq signal than expected relative to background regions. We demonstrate its practical utility and accuracy in detecting regions of active regulatory elements across a wide range of commonly used DAE-seq datasets and experimental conditions. We then develop a novel Autoregressive Hidden Markov Model (AR-HMM) to account for the often-ignored spatial dependence in DAE-seq data, and demonstrate that accounting for such dependence improves performance in identifying biologically active genomic regions in both simulated and real datasets. We also introduce an efficient and novel variable selection procedure for Hidden Markov Models in which the means of each state's emission distribution are modelled with covariates.
We study the asymptotic properties of the proposed variable selection procedure and apply this approach to simulated and real DAE-seq data. Lastly, we introduce a new method for the joint analysis of total and allele-specific read counts from DAE-seq data and RNA-seq data. In all, we develop several statistical procedures for the analysis of DAE-seq data that are highly relevant to biomedical researchers and have broader applicability to other problems in statistics.
Doctor of Philosophy
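To make the idea of spatial dependence concrete, here is a minimal sketch of the forward algorithm for a two-state hidden Markov model over consecutive genomic bins, in which each state's Gaussian emission mean is shifted by an AR(1) term on the previous bin's signal. This is a generic construction under stated assumptions (Gaussian emissions on transformed signal, a single shared autoregressive coefficient `phi`, hypothetical parameter values); it is not the AR-HMM specified in the thesis.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ar_hmm_loglik(y, log_init, log_trans, means, sds, phi):
    """Log-likelihood of signal y under an HMM with AR(1)-shifted Gaussian emissions.

    In state s at bin t > 0 the emission mean is means[s] + phi * (y[t-1] - means[s]),
    so neighbouring bins stay dependent even given the hidden state
    (an assumption of this sketch).
    """
    k = len(means)
    # The first bin has no predecessor, so it uses the plain state mean.
    alpha = [log_init[s] + log_normal_pdf(y[0], means[s], sds[s]) for s in range(k)]
    for t in range(1, len(y)):
        new_alpha = []
        for s in range(k):
            cond_mean = means[s] + phi * (y[t - 1] - means[s])
            emit = log_normal_pdf(y[t], cond_mean, sds[s])
            prev = logsumexp([alpha[r] + log_trans[r][s] for r in range(k)])
            new_alpha.append(emit + prev)
        alpha = new_alpha
    return logsumexp(alpha)


# Hypothetical states: 0 = background, 1 = enriched; "sticky" transitions.
log_init = [math.log(0.9), math.log(0.1)]
log_trans = [[math.log(0.95), math.log(0.05)], [math.log(0.10), math.log(0.90)]]
signal = [0.2, 0.1, 2.8, 3.1, 2.9, 0.3]
for phi in (0.0, 0.5):
    print(phi, ar_hmm_loglik(signal, log_init, log_trans, [0.0, 3.0], [0.5, 0.8], phi))
```

Setting `phi = 0` recovers an ordinary HMM whose emissions are independent given the state, i.e. the baseline that ignores spatial dependence.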

Carolina Digital Repository

Analysis of High-dimensional and Left-censored Data with Applications in Lipidomics and Genomics

Author: Pesonen Maiju
Publication venue: Annales Universitatis Turkuensis A I 548
Publication date: 24/11/2016

Recently, new kinds of high-throughput measurement techniques have emerged, enabling biological research to focus on fundamental building blocks of living organisms such as genes, proteins, and lipids. Alongside this new type of data, referred to as omics data, modern data analysis techniques have emerged. Much of this research focuses on finding biomarkers for detecting abnormalities in a person's health status, as well as on learning unobservable network structures that represent functional associations in biological regulatory systems. Omics data have certain specific qualities, such as left-censored observations due to the limitations of measurement instruments, missing data, non-normal observations, and very high dimensionality, and the interest often lies in the connections among the large number of variables. This thesis has two major aims. The first is to provide efficient methodology for dealing with various types of missing or censored omics data that can be used for visualisation and for biomarker discovery based on, for example, regularised regression techniques. A maximum likelihood based covariance estimation method for data with censored values is developed, and the algorithms are described in detail. The second major aim is to develop novel approaches for detecting interactions that display functional associations in large-scale observations. For more complicated data connections, a technique based on partial least squares regression is investigated. The technique is applied to network construction as well as to differential network analyses of both multiply imputed censored data and next-generation sequencing count data.
New measurement technologies have made it possible to deepen our holistic understanding of the molecular-level processes of living organisms. So-called omics technologies, such as genomics, proteomics, and lipidomics, can produce vast amounts of measurement data on the expression or concentration levels of individual genes, proteins, and lipids with unprecedented accuracy. At the same time, the need to develop new analysis methods has grown. Of particular interest have been the identification of biomarkers that predict the risk or prognosis of specific diseases and the reconstruction of biological networks. Omics datasets have several special characteristics that limit the direct and efficient application of conventional methods. The most important of these are left-censored and missing observations, as well as the large number of observed variables. The first aim of this dissertation is to provide tailored analysis methods for the visualisation of, and model selection for, incomplete omics datasets, using, for example, regularised regression models. We also describe a maximum likelihood estimator of the covariance matrix suitable for censored data. The second aim is to develop new methods for examining the association structures of omics datasets. For examining, visualising, and comparing more complex structures, several variations of an algorithm based on partial least squares are presented, with which association networks can be reconstructed for both multiply imputed censored data and count data.
Transferred from Doria
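The effect of left-censoring on estimation can be illustrated in one dimension. A minimal sketch, assuming a single normally distributed variable and a known detection limit: each observed value contributes the normal log-density, and each value below the limit contributes the log-probability of falling below it. The grid search is a deliberately crude stand-in for a proper optimiser, the data are hypothetical, and the thesis's multivariate covariance estimator is not reproduced here.

```python
import math
from statistics import NormalDist


def censored_loglik(observed, n_censored, limit, mu, sigma):
    """Log-likelihood of a normal model with left-censoring at `limit`.

    observed: values measured above the detection limit.
    n_censored: how many values were only known to lie below `limit`.
    """
    ll = sum(-math.log(sigma) - 0.5 * math.log(2 * math.pi)
             - (x - mu) ** 2 / (2 * sigma ** 2) for x in observed)
    if n_censored:
        p_below = NormalDist(mu, sigma).cdf(limit)
        if p_below == 0.0:
            return float("-inf")  # censored values impossible under these parameters
        ll += n_censored * math.log(p_below)
    return ll


def fit_censored_normal(observed, n_censored, limit, mus, sigmas):
    """Grid-search maximum likelihood estimate of (mu, sigma)."""
    return max(((m, s) for m in mus for s in sigmas),
               key=lambda p: censored_loglik(observed, n_censored, limit, p[0], p[1]))


mus = [i * 0.25 for i in range(-8, 17)]      # -2.0 .. 4.0
sigmas = [i * 0.25 for i in range(1, 13)]    # 0.25 .. 3.0
observed = [1.0, 2.0, 3.0]
print(fit_censored_normal(observed, 0, 0.5, mus, sigmas))  # ignores censored values
print(fit_censored_normal(observed, 3, 0.5, mus, sigmas))  # 3 values known to be below 0.5
```

Dropping the censored values, as in the first call, leaves the estimated mean too high; including their probability mass, as in the second call, pulls the estimate downwards.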

UTUPub


The interplay of global chromosomal organisation, promoter-enhancer interactions and transcription

Author: Thiecke Michiel
Publication venue: University of Cambridge
Publication date: 30/09/2019

All somatic cells within an organism contain the same genetic material, yet they display pronounced differences in function and morphology. Precise control of gene expression is of fundamental importance to allow cells to properly develop, maintain homeostasis, and respond to external stimuli. The first step in gene expression is transcription, which starts at the core promoter region. While core promoters are crucial for transcriptional initiation, they are insufficient for establishing complex tissue- and condition-specific gene expression patterns in multicellular organisms. Additional transcriptional control elements, such as gene enhancers, are required for this, with many such elements located considerable distances away from their target promoters. Enhancers commonly convey their regulatory signals to target promoters by forming physical contacts with them through three-dimensional DNA looping, underpinning the importance of chromosomal organisation in transcriptional control. In recent years, the emergence of chromosome conformation capture and related methodologies has dramatically increased our understanding of chromosomal organisation. In particular, high-throughput Hi-C analyses across cell types have led to the identification of spatial genomic structures, including Topologically Associating Domains (TADs). In parallel, high-resolution versions of these technologies (such as 5C, ChIA-PET, HiChIP and Capture Hi-C) have detected a multitude of novel looping interactions, including connections between promoters and enhancers. The interplay between precise regulatory interactions, the higher-order chromosomal organisation, and their joint contribution to transcriptional control is incompletely understood and is the focus of this work.
In the first part of this work, I take advantage of high-resolution Promoter Capture Hi-C (PCHi-C) data to investigate the localisation of promoter interactions with respect to TAD boundaries in human primary blood cells and cell-cycle-synchronised HeLa cells. I show that the majority of promoter interactions originate at, and are constrained by, TAD boundaries. However, a minority of promoter interactions appear to cross TAD boundaries in all analysed cell types. Furthermore, I identify genes with multiple TAD-boundary-crossing interactions per promoter and present evidence that these interactions may be supported by transcriptional machinery. These results suggest a role for transcriptional machinery in shaping promoter interactions in a TAD-independent manner. In the second part of this work, I investigate promoter interaction rewiring upon perturbation of architectural proteins. For this analysis, I use PCHi-C data from HeLa cells in which cohesin or CTCF is rapidly depleted using auxin-induced degradation. I show that promoter interactions that are lost, maintained, or gained upon cohesin depletion have distinct distance profiles and relate to TAD organisation in markedly different ways. I demonstrate that promoter-interacting regions that are lost upon cohesin depletion associate with architectural proteins, while those that are maintained or gained show characteristics of enhancers. Finally, I show evidence for a functional role of cohesin-mediated interactions in transcriptional regulation. Collectively, this work reveals the interplay between TADs, promoter interactions and transcription, while suggesting that promoter interactions may be supported by TAD-independent mechanisms.
MRC DTP studentship BBS/E/B/000M0816 and 164259
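The classification of promoter interactions as constrained by, or crossing, TAD boundaries can be sketched as an interval lookup. This assumes TADs are given as sorted, non-overlapping half-open intervals on a single chromosome and that each interaction is a pair of positions (for example, bait and other-end midpoints). Treating an end that falls between TADs as crossing is a choice made for this sketch, not necessarily the thesis's rule, and all coordinates are hypothetical.

```python
from bisect import bisect_right


def tad_index(pos, starts, tads):
    """Index of the TAD containing pos, or None if pos lies between TADs."""
    i = bisect_right(starts, pos) - 1
    if i >= 0 and tads[i][0] <= pos < tads[i][1]:
        return i
    return None


def crosses_tad_boundary(interaction, tads, starts):
    """True unless both interaction ends fall inside the same TAD."""
    a = tad_index(interaction[0], starts, tads)
    b = tad_index(interaction[1], starts, tads)
    return a is None or b is None or a != b


def fraction_crossing(interactions, tads):
    """Share of interactions that cross at least one TAD boundary."""
    starts = [s for s, _ in tads]  # computed once, reused for every binary search
    crossing = sum(crosses_tad_boundary(x, tads, starts) for x in interactions)
    return crossing / len(interactions)


# Hypothetical TADs (half-open intervals, bp) and promoter interactions.
tads = [(0, 100_000), (100_000, 250_000), (300_000, 400_000)]
interactions = [(10_000, 90_000), (120_000, 240_000), (50_000, 150_000), (260_000, 350_000)]
print(fraction_crossing(interactions, tads))  # 0.5
```

Binary search over the TAD start coordinates keeps each lookup logarithmic in the number of TADs, which matters when classifying many interactions per chromosome.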

Apollo (Cambridge)

Investigating the functional relevance of Fibroblast-like Synoviocytes in Rheumatoid Arthritis using integrated epigenomic datasets and fine-mapping

Author: Ge Xiangyu
Publication date: 31/12/2022

The University of Manchester - Institutional Repository

IsoDOT Detects Differential RNA-isoform Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and Specificity

Author: Chen Ting-Huei
Chu Haitao
Crowley James J.
de Villena Fernando Pardo-Manuel
Huang Shunping
Kuan Pei-Fen
Li Yuan
Liu Yufeng
McMillan Leonard
Miller Darla
Shaw Ginger
Sullivan Patrick F.
Sun Wei
Wu Yichao
Zhabotynsky Vasyl
Zhou Hua
Zou Fei
Publication date: 29/10/2014

We have developed a statistical method named IsoDOT to assess differential isoform expression (DIE) and differential isoform usage (DIU) using RNA-seq data. Here, isoform usage refers to the relative expression of an isoform given the total expression of the corresponding gene. IsoDOT performs two tasks that cannot be accomplished by existing methods: testing DIE/DIU with respect to a continuous covariate, and testing DIE/DIU for one case versus one control. The latter situation is not uncommon in practice, e.g., when comparing the paternal and maternal alleles of one individual, or the tumor and normal samples of one cancer patient. Simulation studies demonstrate the high sensitivity and specificity of IsoDOT. We apply IsoDOT to study the effects of haloperidol treatment on the mouse transcriptome and identify a group of genes whose isoform usage responds to haloperidol treatment.
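The distinction between differential isoform expression and differential isoform usage follows directly from the definition of usage as relative expression. A minimal sketch with hypothetical expression values (not IsoDOT's estimation or testing procedure): when every isoform of a gene doubles, expression changes but usage does not.

```python
def isoform_usage(isoform_expr):
    """Relative usage of each isoform given the gene's total expression."""
    total = sum(isoform_expr.values())
    if total <= 0:
        raise ValueError("gene has no expression; usage is undefined")
    return {iso: value / total for iso, value in isoform_expr.items()}


# Hypothetical one-case-versus-one-control comparison for a single gene.
normal = {"iso1": 30.0, "iso2": 10.0}
tumor = {"iso1": 60.0, "iso2": 20.0}

print(isoform_usage(normal))  # {'iso1': 0.75, 'iso2': 0.25}
print(isoform_usage(tumor))   # {'iso1': 0.75, 'iso2': 0.25}
```

Both isoforms are expressed twice as highly in the tumor sample, so this gene would show differential isoform expression but no differential isoform usage.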

arXiv.org e-Print Archive

Crossref

PubMed Central

Carolina Digital Repository

eScholarship - University of California

FigShare

FCAT: A FLEXIBLE CLASSIFICATION TOOLBOX FOR SIGNAL DETECTION IN HIGH-THROUGHPUT SEQUENCING DATA

Author: He Bing
Publication venue: The Busan Gyeongnam Mathematical Society
Publication date: 03/10/2018

As applications of high-throughput sequencing technologies continue to grow rapidly, the ability to conveniently develop effective data analysis solutions that take full advantage of application-specific data characteristics is becoming increasingly important. FCAT is a flexible classification framework and toolbox for signal detection in a wide class of high-throughput sequencing applications, where the objective is to locate signals in the genome based on their enrichment, shape, and other features. FCAT takes aligned sequence reads (BAM files) as input. It uses supervised learning to automatically extract application-specific features that distinguish signals from noise. Users can aggregate multiple learning algorithms, including random forests and L1- and L2-regularized logistic regression, to improve prediction accuracy and robustness. A non-parametric inference method is developed for estimating the false discovery rate of prediction results. We demonstrate FCAT through a variety of applications, including analyses of DNase-seq, ATAC-seq, ChIP-seq, GRO-seq and TIP-seq data. We show that FCAT not only offers flexibility and convenience in handling data from different sequencing applications, but also yields competitive or improved signal detection accuracy compared with existing tools for each application. The FCAT framework can greatly increase efficiency and reduce the burden of developing bioinformatics solutions for new sequencing applications. FCAT is an open-source software package developed in C++ and Python. It is freely available at https://github.com/HeBing/FCAT
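Aggregating several learners can be as simple as averaging their per-region probabilities. The sketch below assumes each model has already scored the same candidate regions in the same order; the model names, scores, optional weights, and the 0.5 cut-off are hypothetical, and this is neither FCAT's API nor its FDR estimation method.

```python
def aggregate_scores(model_scores, weights=None):
    """Weighted average of per-region signal probabilities across models.

    model_scores: list of equal-length lists, one per model, in region order.
    """
    if weights is None:
        weights = [1.0] * len(model_scores)
    total_weight = sum(weights)
    n_regions = len(model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total_weight
        for i in range(n_regions)
    ]


def call_signals(scores, threshold=0.5):
    """Indices of regions whose aggregated probability reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical probabilities for three candidate regions from three classifiers.
random_forest = [0.9, 0.2, 0.6]
logistic_l1 = [0.8, 0.1, 0.3]
logistic_l2 = [0.7, 0.3, 0.3]
combined = aggregate_scores([random_forest, logistic_l1, logistic_l2])
print(call_signals(combined))  # [0]
```

Region 2 passes the cut-off for the random forest alone but not after averaging, which shows how combining learners can temper the idiosyncratic calls of any single model.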

JScholarship

Revealing the vectors of cellular identity with single-cell genomics

Author: Regev Aviv
Wagner Allon
Yosef Nir
Publication venue: Springer Science and Business Media LLC
Publication date: 03/07/2018

Single-cell genomics has now made it possible to create a comprehensive atlas of human cells. At the same time, it has reopened questions about how a cell's identity is defined and how that identity is regulated by the cell's molecular circuitry. Emerging computational analysis methods, especially for single-cell RNA sequencing (scRNA-seq), have already begun to reveal, in a data-driven way, the diverse simultaneous facets of a cell's identity, from discrete cell types to continuous dynamic transitions and spatial locations. These developments will eventually allow a cell to be represented as a superposition of 'basis vectors', each determining a different (but possibly dependent) aspect of cellular organization and function. However, computational methods must also overcome considerable challenges, from handling technical noise and data scale to forming new abstractions of biology. As the scale of single-cell experiments continues to increase, new computational approaches will be essential for constructing and characterizing a reference map of cell identities.
National Institutes of Health (U.S.) (grant P50 HG006193); BRAIN Initiative (grant U01 MH105979); National Institutes of Health (U.S.) (BRAIN grant 1U01MH105960-01); National Cancer Institute (U.S.) (grant 1U24CA180922); National Institute of Allergy and Infectious Diseases (U.S.) (grant 1U24AI118672-01)

DSpace@MIT

Inferring community-driven structure in complex networks

Author: Signorelli Mirko
Publication venue: University of Groningen Press
Publication date: 01/01/2017

ARTS repository - University of Groningen