Common DNA sequence variation influences 3-dimensional conformation of the human genome.

Author: Chiou Joshua
Fletez-Brant Kipper
Gaulton Kyle J
Gorkin David U
Hansen Kasper D
Hu Ming
Li Yun
Liu Tristin
Noor Amina
Qiu Yunjiang
Ren Bing
Schmitt Anthony D
Sebat Jonathan
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

BACKGROUND: The 3-dimensional (3D) conformation of chromatin inside the nucleus is integral to a variety of nuclear processes including transcriptional regulation, DNA replication, and DNA damage repair. Aberrations in 3D chromatin conformation have been implicated in developmental abnormalities and cancer. Despite the importance of 3D chromatin conformation to cellular function and human health, little is known about how 3D chromatin conformation varies in the human population, or whether DNA sequence variation between individuals influences 3D chromatin conformation. RESULTS: To address these questions, we perform Hi-C on lymphoblastoid cell lines from 20 individuals. We identify thousands of regions across the genome where 3D chromatin conformation varies between individuals and find that this variation is often accompanied by variation in gene expression, histone modifications, and transcription factor binding. Moreover, we find that DNA sequence variation influences several features of 3D chromatin conformation including loop strength, contact insulation, contact directionality, and density of local cis contacts. We map hundreds of quantitative trait loci associated with 3D chromatin features and find evidence that some of these same variants are associated at modest levels with other molecular phenotypes as well as complex disease risk. CONCLUSION: Our results demonstrate that common DNA sequence variants can influence 3D chromatin conformation, pointing to a more pervasive role for 3D chromatin conformation in human phenotypic variation than previously recognized.

eScholarship - University of California

Knowledge Base Population using Semantic Label Propagation

Author: Deleu Johannes
Demeester Thomas
Develder Chris
Sterckx Lucas
Publication date: 01/01/2016

A crucial aspect of a knowledge base population system that extracts new facts from text corpora is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by Distant Supervision, which is a method to construct training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase of precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances to extend the training set in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort. Comment: Submitted to Knowledge Based Systems, special issue on Knowledge Bases for Natural Language Processing
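The distant supervision step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base `KNOWN_FACTS`, the helper `distant_labels`, and the plain substring matching are all assumptions made for illustration. The sketch labels every sentence that mentions both entities of a known fact, which also shows where the noise comes from.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation.
KNOWN_FACTS: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "US"): "born_in",
}


def distant_labels(sentences: List[str]) -> List[Tuple[str, str, str, str]]:
    """Label each sentence that mentions both entities of a known fact.

    Matching is naive substring search (an assumption of this sketch),
    so any sentence containing both strings becomes a positive instance.
    """
    labeled = []
    for sentence in sentences:
        for (subj, obj), relation in KNOWN_FACTS.items():
            if subj in sentence and obj in sentence:
                labeled.append((sentence, subj, obj, relation))
    return labeled


corpus = [
    "Barack Obama was born in the US.",
    # Noisy instance: mentions both entities but does not express born_in.
    "Barack Obama left the US for a state visit.",
    "Angela Merkel visited the US.",
]

labeled = distant_labels(corpus)
for sentence, subj, obj, relation in labeled:
    print(f"{relation}({subj}, {obj}) <- {sentence}")
```

The second sentence is labeled born_in even though it does not express that relation, which is the kind of noisy training instance the abstract proposes to filter out.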

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Deep-coverage whole genome sequences and blood lipids among 16,324 individuals.

Author: Abecasis Goncalo
Alver Maris
Bloom Jonathan M
Chaffin Mark
Correa Adolfo
Cupples L Adrienne
Engreitz Jesse M
Ernst Jason
Esko Tonu
Ganna Andrea
Johnson W Craig
Kathiresan Sekar
Kellis Manolis
Khera Amit V
Lander Eric S
Manichaikul Ani
Mitchell Braxton
Montasser May
Natarajan Pradeep
Neale Benjamin M
NHLBI TOPMed Lipids Working Group
O'Connell Jeffrey R
Peloso Gina M
Perry James A
Poterba Timothy
Rich Stephen S
Ripatti Samuli
Rotter Jerome I
Ruotsalainen Sanni E
Salomaa Veikko
Seed Cotton
Surakka Ida L
Vasan Ramachandran S
Willer Cristen J
Wilson James G
Zekavat Seyedeh Maryam
Zhou Wei
Publication venue: eScholarship, University of California
Publication date: 01/08/2018

Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and analyze genotypes with four quantitative traits: plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci except for a few variants previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes but rare non-coding variant association detects no signals. A high 2M-SNP LDL-C polygenic score (top 5th percentile) confers similar effect size to a monogenic mutation (~30 mg/dl higher for each); however, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited but WGS permits simultaneous assessment of monogenic and polygenic models of severe hypercholesterolemia.

DSpace@MIT

Directory of Open Access Journals

eScholarship - University of California

George Washington University: Health Sciences Research Commons (HSRC)

Topic Similarity Networks: Visual Analytics for Large Document Sets

Author: Maiya Arun S.
Rolfe Robert M.
Publication date: 26/09/2014

We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia. Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
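As a rough illustration of the idea of a topic similarity network, the sketch below builds a graph whose nodes are topics and whose links connect similar topics. The abstract does not say which similarity measure or linking rule the authors use, so the cosine similarity, the `threshold` parameter, the function names, and the toy topic-word vectors are all assumptions of this sketch.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_topic_network(
    topics: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return an edge (topic_a, topic_b, similarity) for each similar pair.

    The threshold rule is an assumption of this sketch, not a detail
    taken from the paper.
    """
    edges = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(topics.items(), 2):
        similarity = cosine(vec_a, vec_b)
        if similarity >= threshold:
            edges.append((name_a, name_b, similarity))
    return edges


# Hypothetical topic-word distributions over a four-word vocabulary.
topics = {
    "topic_0": [0.7, 0.2, 0.1, 0.0],
    "topic_1": [0.6, 0.3, 0.1, 0.0],
    "topic_2": [0.0, 0.1, 0.2, 0.7],
}

edges = build_topic_network(topics, threshold=0.5)
for name_a, name_b, similarity in edges:
    print(f"{name_a} -- {name_b} (similarity {similarity:.2f})")
```

With these toy vectors only topic_0 and topic_1 are linked, since they place most of their probability mass on the same words.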

arXiv.org e-Print Archive

Crossref

An atlas of cortical circular RNA expression in Alzheimer disease brains demonstrates clinical and pathological associations.

Author: Bateman Randall J
Budde John P
Chhatwal Jasmeer P
Cruchaga Carlos
Del-Aguila Jorge L
Dominantly Inherited Alzheimer Network (DIAN)
Dube Umber
Farias Fabiana
Fernandez Maria Victoria
Gentsch Jen
Graff-Radford Neill R
Harari Oscar
Hsu Simon
Ibanez Laura
Jiang Shan
Karch Celeste M
Lee Jae-Hong
Li Zeran
Masters Colin L
Morris John C
Norton Joanne
Salloway Stephen
Wang Fengxian
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Parietal cortex RNA-sequencing (RNA-seq) data were generated from individuals with and without Alzheimer disease (AD; n_control = 13; n_AD = 83) from the Knight Alzheimer Disease Research Center (Knight ADRC). Using this and an independent (Mount Sinai Brain Bank (MSBB)) AD RNA-seq dataset, cortical circular RNA (circRNA) expression was quantified in the context of AD. Significant associations were identified between circRNA expression and AD diagnosis, clinical dementia severity and neuropathological severity. It was demonstrated that most circRNA-AD associations are independent of changes in cognate linear messenger RNA expression or estimated brain cell-type proportions. Evidence was provided for circRNA expression changes occurring early in presymptomatic AD and in autosomal dominant AD. It was also observed that AD-associated circRNAs co-expressed with known AD genes. Finally, potential microRNA-binding sites were identified in AD-associated circRNAs for miRNAs predicted to target AD genes. Together, these results highlight the importance of analyzing non-linear RNAs and support future studies exploring the potential roles of circRNAs in AD pathogenesis.

eScholarship - University of California

Replication of linkage at chromosome 20p13 and identification of suggestive sex-differential risk loci for autism spectrum disorder.

Author: Cantor Rita M
Geschwind Daniel H
Lowe Jennifer K
Luo Rui
Werling Donna M
Publication venue: eScholarship, University of California
Publication date: 01/02/2014

Background: Autism spectrum disorders (ASDs) are male-biased and genetically heterogeneous. While sequencing of sporadic cases has identified de novo risk variants, the heritable genetic contribution and mechanisms driving the male bias are less understood. Here, we aimed to identify familial and sex-differential risk loci in the largest available, uniformly ascertained, densely genotyped sample of multiplex ASD families from the Autism Genetics Resource Exchange (AGRE), and to compare results with earlier findings from AGRE.
Methods: From a total sample of 1,008 multiplex families, we performed genome-wide, non-parametric linkage analysis in a discovery sample of 847 families, and separately on subsets of families with only male, affected children (male-only, MO) or with at least one female, affected child (female-containing, FC). Loci showing evidence for suggestive linkage (logarithm of odds ≥2.2) in this discovery sample, or in previous AGRE samples, were re-evaluated in an extension study utilizing all 1,008 available families. For regions with genome-wide significant linkage signal in the discovery stage, those families not included in the corresponding discovery sample were then evaluated for independent replication of linkage. Association testing of common single nucleotide polymorphisms (SNPs) was also performed within suggestive linkage regions.
Results: We observed an independent replication of previously observed linkage at chromosome 20p13 (P < 0.01), while loci at 6q27 and 8q13.2 showed suggestive linkage in our extended sample. Suggestive sex-differential linkage was observed at 1p31.3 (MO), 8p21.2 (FC), and 8p12 (FC) in our discovery sample, and the MO signal at 1p31.3 was supported in our expanded sample. No sex-differential signals met replication criteria, and no common SNPs were significantly associated with ASD within any identified linkage regions.
Conclusions: With few exceptions, analyses of subsets of families from the AGRE cohort identify different risk loci, consistent with extreme locus heterogeneity in ASD. Large samples appear to yield more consistent results, and sex-stratified analyses facilitate the identification of sex-differential risk loci, suggesting that linkage analyses in large cohorts are useful for identifying heritable risk loci. Additional work, such as targeted re-sequencing, is needed to identify the specific variants within these loci that are responsible for increasing ASD risk.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Expression cartography of human tissues using self organizing maps

Author: Hans Binder
Henry Wirth
Publication date: 21/03/2011

Background: The availability of parallel, high-throughput microarray and sequencing experiments poses the challenge of how best to arrange and analyze the resulting mass of multidimensional data in a concerted way. Self organizing maps (SOM), a machine learning method, enable a parallel sample- and gene-centered view of the data combined with strong visualization and second-level analysis capabilities. The paper addresses aspects of the method with practical impact in the context of expression analysis of complex data sets.
Results: The method was applied to generate a SOM characterizing the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of expression data from tens of thousands of genes to a few thousand metagenes, where each metagene acts as a representative of a minicluster of co-regulated single genes. Tissue-specific and common properties shared between groups of tissues emerge as a handful of localized spots in the tissue maps collecting groups of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue-related spots typically contain enriched populations of gene sets corresponding well to molecular processes in the respective tissues. Analysis techniques normally used at the gene level, such as two-way hierarchical clustering, provide a better signal-to-noise ratio and a better representativeness of the method if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues into essentially three clusters containing nervous, immune system and the remaining tissues.
Conclusions: The global view on the behavior of a few well-defined modules of correlated and differentially expressed genes is more intuitive and more informative than the separate discovery of the expression levels of hundreds or thousands of individual genes. The metagene approach is less sensitive to a priori selection of genes. It can detect a coordinated expression pattern whose components would not pass single-gene significance thresholds and it is able to extract context-dependent patterns of gene expression in complex data sets.

Nature Precedings

Genome-wide analyses for personality traits identify six genomic loci and show correlations with psychiatric disorders

Author: A Okbay
A Ramasamy
AE Poropat
Anders M Dale
Andrew Schork
B Bulik-Sullivan
B Devlin
B Franke
B Howie
BK Bulik-Sullivan
BM Henn
C Giambartolomei
CA Rietveld
Carol Franz
CG DeYoung
Chi-Hua Chen
Chun-Chieh Fan
CJ Soto
CJ Willer
D Falush
D Trabzuni
Daniel J Smith
David A Hinds
DF Gudbjartsson
DJ Smith
Dominic Holland
DP Hibar
DR Nyholt
E Green
G Bjornsdottir
Gyda Bjornsdottir
HC So
HN Kim
Hreinn Stefansson
J Van Os
J Yang
JA Gray
JH Barnett
JK Pickrell
JM Hettema
Joyce Y Tung
JR Gulcher
K Åberg
Kari Stefansson
Karolina Kauppi
KS Kendler
L Mezquita
Linda K McEvoy
LR Goldberg
MA Distel
MH de Moor
Michael O'Donovan
Min-Tzu Lo
Nilotpal Sanyal
Olav B Smeland
Ole A Andreassen
P Bůžková
R Plomin
R Tabarés-Seisdedos
RA Power
RJ Pruim
S Jakobwitz
S Purcell
S Ripke
SBG Eysenck
SL Karalunas
SM van den Berg
SR Browning
T Insel
T Vukasović
TA Greenwood
Thorgeir E Thorgeirsson
TJ Trull
V Boraska
Valentina Escott-Price
WK Thompson
Y Hu
Y Ono
Y Wang
Yunpeng Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 05/12/2016

Personality is influenced by genetic and environmental factors [1] and associated with mental health. However, the underlying genetic determinants are largely unknown. We identified six genetic loci, including five novel loci [2,3], significantly associated with personality traits in a meta-analysis of genome-wide association studies (N = 123,132–260,861). Of these genome-wide significant loci, extraversion was associated with variants in WSCD2 and near PCDH15, and neuroticism with variants on chromosome 8p23.1 and in L3MBTL2. We performed a principal component analysis to extract major dimensions underlying genetic variations among five personality traits and six psychiatric disorders (N = 5,422–18,759). The first genetic dimension separated personality traits and psychiatric disorders, except that neuroticism and openness to experience were clustered with the disorders. High genetic correlations were found between extraversion and attention-deficit–hyperactivity disorder (ADHD) and between openness and schizophrenia and bipolar disorder. The second genetic dimension was closely aligned with extraversion–introversion and grouped neuroticism with internalizing psychopathology (e.g., depression or anxiety).

Crossref

Online Research @ Cardiff

PubMed Central

eScholarship - University of California

Enlighten