56 research outputs found

Identification of trans-splicing sites in Leishmania major using probabilistic methods

Author: Habegger Lukas
Publication venue: RIT Scholar Works
Publication date: 01/01/2007

Leishmania major, a member of the Kinetoplastida family, is a primitive protozoan that causes leishmaniasis, a human disease affecting numerous people worldwide. The identification of new drug targets to combat leishmaniasis necessitates a thorough understanding of how genomic instructions are transformed into functional proteins. It requires not only the prediction and categorization of all the genes, but also a profound understanding of their regulation. Much of gene regulation may occur through a process known as trans-splicing. Trans-splicing, which is mechanistically similar to cis-splicing, is the process of cleaving a large polycistronic transcript into smaller monocistronic components. The goal of this project was to establish a model to accurately predict sites where trans-splicing occurs. After carefully analyzing the data set, a second-order log odds ratio model was created. This method achieved an overall accuracy of 89% in predicting trans-splice sites. Furthermore, this new method has been applied to a small data set with alternative trans-splice sites. Of the 70 EST-indicated alternative trans-splice sites, 60 were identified as such. This represents the first computational method for the prediction of alternative splice sites. In addition, we have found the first real evidence for the branch point signal, which plays an essential role in the trans-splicing process.
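As a rough illustration of what a second-order log odds ratio score can look like, the sketch below trains two second-order Markov models (one on windows around known sites, one on background sequence) and scores a candidate window by the summed log ratio of their conditional probabilities. This is a minimal sketch under stated assumptions, not the thesis's actual model: the position-independent formulation, the pseudocount, the function names, and the toy sequences are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_second_order(seqs, pseudocount=1.0):
    """Estimate P(base | two preceding bases) with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(2, len(s)):
            counts[s[i - 2:i]][s[i]] += 1.0
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=2)):
        total = sum(counts[ctx][b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model

def log_odds(window, site_model, background_model):
    """Sum of log(P_site / P_background) over every base with two predecessors."""
    score = 0.0
    for i in range(2, len(window)):
        ctx, base = window[i - 2:i], window[i]
        score += math.log(site_model[ctx][base] / background_model[ctx][base])
    return score

# Toy, made-up training data (hypothetical): pyrimidine-rich windows ending in AG
# stand in for site windows; GC-rich strings stand in for background.
site_windows = ["TTTTCTTTAG", "CTTTTTCTAG", "TTCTTTTCAG"]
background = ["GCGCGACGGC", "ACGGCGCGAT", "GGCACGCGCG"]

site_model = train_second_order(site_windows)
background_model = train_second_order(background)

site_like_score = log_odds("TTTTTTCTAG", site_model, background_model)
background_like_score = log_odds("GCGCGCGGCG", site_model, background_model)
```

A positive score means the window is better explained by the site model than by the background model; a threshold on the score would then decide whether to call a site.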

RIT Scholar Works

IQSeq: Integrated Isoform Quantification Analysis Based on Next-Generation Sequencing

Author: Du Jiang
Gerstein Mark
Habegger Lukas
Leng Jing
McDermott Drew
Sboner Andrea
Publication venue: Public Library of Science
Publication date: 06/01/2012

With the recent advances in high-throughput RNA sequencing (RNA-Seq), biologists are able to measure transcription with unprecedented precision. One problem that can now be tackled is that of isoform quantification: here one tries to reconstruct the abundances of isoforms of a gene. We have developed a statistical solution for this problem, based on analyzing a set of RNA-Seq reads, and a practical implementation, available from archive.gersteinlab.org/proj/rnaseq/IQSeq, in a tool we call IQSeq (Isoform Quantification in next-generation Sequencing). Here, we present the theoretical results on which IQSeq is based, and then use both simulated and real datasets to illustrate various applications of the tool. In order to measure the accuracy of an isoform-quantification result, one would try to estimate the average variance of the estimated isoform abundances for each gene (based on resampling the RNA-Seq reads), and IQSeq has a particularly fast algorithm (based on the Fisher Information Matrix) for calculating this, achieving a speedup of times compared to brute-force resampling. IQSeq also calculates an information-theoretic measure of overall transcriptome complexity to describe isoform abundance for a whole experiment. IQSeq has many features that are particularly useful in RNA-Seq experimental design, allowing one to optimally model the integration of different sequencing technologies in a cost-effective way. In particular, the IQSeq formalism integrates the analysis of different sample (i.e. read) sets generated from different technologies within the same statistical framework. It also supports a generalized statistical partial-sample-generation function to model the sequencing process. This allows one to have a modular, “plugin-able” read-generation function to support the particularities of the many evolving sequencing technologies.
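To make the isoform-quantification problem concrete, here is a generic expectation-maximization sketch that estimates the fraction of reads originating from each isoform when some reads are compatible with more than one isoform. It is a minimal illustration under stated assumptions and not IQSeq's implementation: the compatibility matrix, the function name `em_isoform_fractions`, and the toy read counts are hypothetical, and isoform-length normalization is deliberately left out.

```python
def em_isoform_fractions(compat, n_iter=200):
    """Estimate per-isoform read fractions by EM.

    compat[r][k] is the (assumed known) probability of observing read r
    given that it came from isoform k; 0 means the read is incompatible.
    """
    n_isoforms = len(compat[0])
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        for row in compat:
            # E-step: posterior responsibility of each isoform for this read.
            weights = [theta[k] * row[k] for k in range(n_isoforms)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # read compatible with no isoform; ignore it
            for k in range(n_isoforms):
                expected[k] += weights[k] / norm
        # M-step: renormalize the expected read counts into fractions.
        total = sum(expected)
        theta = [e / total for e in expected]
    return theta

# Toy example (made-up counts): 6 reads unique to isoform 0,
# 2 unique to isoform 1, and 4 reads shared by both.
reads = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[1.0, 1.0]] * 4
fractions = em_isoform_fractions(reads)
```

In this toy case the shared reads carry no information about the split, so the estimate settles on the ratio set by the unique reads (6 to 2).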

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

FigShare

Defining the human reference protein-coding gene set

Author: Adam Frankish
Chris Tyler-Smith
Daniel MacArthur
Jennifer Harrow
Lukas Habegger
M Pertea
Mark Gerstein
Rachel Harte
Suganthi Balasubramanian
Publication venue: BioMed Central

Crossref

PubMed Central

Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays

Author: Agarwal Ashish
Gerstein Mark
Habegger Lukas
Hillier LaDeana W
Koppstein David
Reinke Valerie
Rozowsky Joel
Sasidharan Rajkumar
Sboner Andrea
Waterston Robert H
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs. Results: Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center. Conclusions: Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UNSWorks

RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries

Author: Andrea Sboner
Ashish Agarwal
Joel Rozowsky
Lukas Habegger
Mark Gerstein
Michael Snyder
Tara A. Gianoulis
Publication venue: Oxford University Press

Summary: The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while still allowing one to carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses.

Crossref

PubMed Central

FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data

We have developed FusionSeq to identify fusion transcripts from paired-end RNA-sequencing. FusionSeq includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also has a module to identify exact sequences at breakpoint junctions. FusionSeq detected known and novel fusions in a specially sequenced calibration data set, including eight cancers with and without known rearrangements.

Crossref

Springer - Publisher Connector

PubMed Central

Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure

Author: Abecasis G. (Goncalo)
Almgren P. (Peter)
Andersson C. (Charlotte)
Aragam K.G. (Krishna G.)
Asselbergs F.W. (Folkert)
Backman J. (Joshua)
Backman J.D. (Joshua D.)
Bai X. (Xiaodong)
Balasubramanian S. (Suganthi)
Banerjee N. (Nilanjana)
Baras A. (Aris)
Barnard L. (Leland)
Beechert C. (Christina)
Biggs M.L. (Mary L.)
Bloom H.L. (Heather L.)
Blumenfeld A. (Andrew)
Brandimarto J. (Jeffrey)
Brown M.R. (Michael R.)
Buckbinder L. (Leonard)
Cantor M. (Michael)
Cappola T.P. (Thomas P.)
Carey D.J. (David J.)
Chaffin M.D. (Mark D.)
Chai Y. (Yating)
Chasman D.I. (Daniel I.)
Chen X. (Xing)
Chen X. (Xu)
Chung J. (Jonathan)
Chung J. (Jonathan)
Chutkow W. (William)
Cook J.P. (James P.)
Coppola G. (Giovanni)
Damask A. (Amy)
Dehghan A. (Abbas)
Delgado G.
Denaxas S. (Spiros)
Dewey F. (Frederick)
Doney A.S.F. (Alex)
Dudley S.C. (Samuel C.)
Dunn M.E. (Michael E.)
Dörr M. (Marcus)
Economides A. (Aris)
Ellinor P.T. (Patrick)
Engström G.
Eom G. (Gisu)
Esko T. (Tõnu)
Fatemifar G. (Ghazaleh)
Felix S.B. (Stephan B.)
Finan C. (Chris)
Ford I. (Ian)
Forsythe C. (Caitlin)
Fuller E.D. (Erin D.)
Ghanbari M. (Mohsen)
Ghasemi S. (Sahar)
Giedraitis V. (Vilmantas)
Giulianini F. (Franco)
Gottdiener J.S. (John)
Gross S. (Stefan)
Gu Z. (Zhenhua)
Gurski L. (Lauren)
Gutmann R. (Rebecca)
Guzzardo P.M. (Paloma M.)
Guðbjartsson D.F. (Daníel F.)
Habegger L. (Lukas)
Haggerty C.M. (Christopher M.)
Hahn Y. (Young)
Harst P. (Pim) van der
Hawes A. (Alicia)
Hedman A.K. (Asa)
Helgadottir H.T. (Hafdis)
Hemingway H.
Henry A. (Albert)
Hingorani A. (Aroon)
Holm H. (Hilma)
Holmes M.V. (Michael)
Hyde C.L. (Craig L.)
Ingelsson E. (Erik)
Jones M.B. (Marcus B.)
Jukema J.W. (Jan Wouter)
Kao W.H.L. (Wen)
Kavousi M. (Maryam)
Khalid S. (Shareef)
Khaw K.-T. (Kay-Tee)
Kleber M.E. (Marcus)
Koekemoer A. (Andrea)
Kuchenbaecker K.B. (Karoline)
Køber L. (Lars)
Lang C.C. (Chim C.)
Langenberg C. (Claudia)
Lattari M. (Michael)
Li A. (Alexander)
Lin H. (Honghuang)
Lin N. (Nan)
Lindgren C.M. (Cecilia M.)
Liu D. (Daren)
London B. (Barry)
Lopez A. (Alexander)
Lotta L.A. (Luca A.)
Lovering R.C. (Ruth C.)
Luan J.
Lubitz S.A. (Steven)
Lumbers R.T. (R. Thomas)
Magnusson P.K. (Patrik)
Mahajan A. (Anubha)
Manoochehri K. (Kia)
Marchini J. (Jonathan)
Marcketta A. (Anthony)
Margulies K.B. (Kenneth B.)
Maxwell E.K. (Evan K.)
McCarthy S. (Shane)
McMurray J.J.V. (John J. V.)
Melander O. (Olle)
Mitnaul L.J. (Lyndon)
Mordi I.R. (Ify R.)
Morgan T. (Thomas)
Morley M.P. (Michael P.)
Morris A.D. (Andrew D.)
Morris A.P. (Andrew)
Morrison A.C. (Alanna C.)
Mälarstig A. (Anders)
Nagle M.W. (Michael W.)
Nelson C.P. (Christopher P.)
Newton-Cheh C. (Christopher)
Niessner A. (Alexander)
Niiranen T. (Teemu)
Overton J.D. (John D.)
Owens A.T. (Anjali T.)
O’Donoghue M.L. (Michelle L.)
O’Dushlaine C. (Colm)
Padilla M.S. (Maria Sotiropoulos)
Palmer C.N.A. (Colin N. A.)
Parry H.M. (Helen M.)
Paulding C. (Charles)
Penn J. (John)
Perola M. (Markus)
Portilla-Fernandez E. (Eliana)
Pradhan M. (Manasi)
Psaty B.M. (Bruce M.)
Reid J.G. (Jeffrey G.)
Rice K.M. (Kenneth)
Ridker P.M. (Paul M.)
Romaine S.P.R. (Simon P. R.)
Roselli C. (Carolina)
Rotter J.I. (Jerome I.)
Salo P. (Perttu)
Salomaa V. (Veikko)
Samani N.J. (Nilesh J.)
Sattar N. (Naveed)
Schleicher T.D. (Thomas D.)
Schurmann C. (Claudia)
Setten J. (Jessica) van
Shah S. (Sonia)
Shalaby A.A. (Alaa A.)
Shuldiner A. (Alan)
Smelser D.T. (Diane T.)
Smith J.G. (J Gustav)
Smith N.L. (Nicholas L.)
Staples J.C. (Jeffrey C.)
Stefansson K. (Kari)
Stender S. (Steen)
Stott D.J. (David. J.)
Sun D. (Dylan)
Sveinbjörnsson G. (Garðar)
Svensson P. (Per)
Swerdlow D.I. (Daniel)
Tammesoo M.L.
Taylor K.D. (Kent D.)
Teder-Laving M. (Maris)
Teumer A. (Alexander)
Thorgeirsson G. (Guðmundur)
Thorsteinsdottir U. (Unnur)
Toledo K. (Karina)
Torp-Pedersen C. (Christian Tobias)
Trompet S. (Stella)
Tyl B. (Benoit)
Uitterlinden A.G. (Andre G.)
Ulloa R.H. (Ricardo H.)
van Hout C. (Cristopher)
Vasan R.S. (Ramachandran Srini)
Veluchamy A. (Abirami)
Verweij N. (Niek)
Visscher P.M. (Peter M.)
Voors A.A. (Adriaan A.)
Völker U. (Uwe)
Wang X. (Xiaosong)
Wareham N.J. (Nick)
Waterworth D. (Dawn)
Weeke P.E. (Peter E.)
Weiss R. (Ram)
Widom L. (Louis)
Wiggins K.L. (Kerri L.)
Wilk J.B. (Jemma)
Wolf S.E. (Sarah E.)
Xing H. (Heming)
Yadav A. (Ashish)
Yang J. (Jian)
Ye B. (Bin)
Ye S. (Shu)
Yerges-Armstrong L.M. (Laura)
Yu B. (Bing)
Zannad F. (Faiez)
Zhao J.H. (Jing Hua)
Ärnlöv J. (Johan)
Publication venue: Springer Science and Business Media LLC
Publication date: 09/01/2020

Heart failure (HF) is a leading cause of morbidity and mortality worldwide. A small proportion of HF cases are attributable to monogenic cardiomyopathies and existing genome-wide association studies (GWAS) have yielded only limited insights, leaving the observed heritability of HF largely unexplained. We report results from a GWAS meta-analysis of HF comprising 47,309 cases and 930,014 controls. Twelve independent variants at 11 genomic loci are associated with HF, all of which demonstrate one or more associations with coronary artery disease (CAD), atrial fibrillation, or reduced left ventricular function, suggesting shared genetic aetiology. Functional analysis of non-CAD-associated loci implicates genes involved in cardiac development (MYOZ1, SYNPO2L), protein homoeostasis (BAG3), and cellular senescence (CDKN1A). Mendelian randomisation analysis supports causal roles for several HF risk factors, and demonstrates CAD-independent effects for atrial fibrillation, body mass index, and hypertension. These findings extend our knowledge of the pathways underlying HF and may inform new therapeutic strategies.
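For readers unfamiliar with Mendelian randomisation, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate, one common way to combine per-variant effects into a single causal-effect estimate. The abstract does not state which estimator the study used, so this is a generic illustration only; the function name `ivw_estimate` and the per-variant numbers are made up for the example.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure-to-outcome effect.

    Each variant contributes the ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    numer = sum(bx * by / se ** 2
                for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denom = sum(bx ** 2 / se ** 2
                for bx, se in zip(beta_exposure, se_outcome))
    return numer / denom, math.sqrt(1.0 / denom)

# Toy summary statistics (hypothetical): three variants whose outcome
# effects are exactly half their exposure effects.
bx = [0.10, 0.20, 0.15]      # variant effects on the risk factor
by = [0.05, 0.10, 0.075]     # variant effects on the outcome (log odds)
se = [0.01, 0.02, 0.015]     # standard errors of the outcome effects

estimate, std_error = ivw_estimate(bx, by, se)
```

Because every toy variant has the same outcome-to-exposure ratio, the pooled estimate equals that ratio (0.5); with real data the weights favour variants with strong exposure effects and precise outcome estimates.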

Erasmus University Digital Repository

Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure

Author: Abecasis Goncalo
Almgren Peter
Andersson Charlotte
Aragam Krishna G
Arnlov Johan
Asselbergs Folkert W
Backman Joshua
Backman Joshua D
Bai Xiaodong
Balasubramanian Suganthi
Banerjee Nilanjana
Baras Aris
Barnard Leland
Beechert Christina
Biggs Mary L
Bloom Heather L
Blumenfeld Andrew
Brandimarto Jeffrey
Brown Michael R
Buckbinder Leonard
Cantor Michael
Cappola Thomas P
Carey David J
Chaffin Mark D
Chai Yating
Chasman Daniel I
Chen Xing
Chen Xu
Chung Jonathan
Chung Jonathan
Chutkow William
Cook James P
Coppola Giovanni
Ctr Regeneron Genetics
Damask Amy
Dehghan Abbas
Delgado Graciela E
Denaxas Spiros
Dewey Frederick
Doerr Marcus
Doney Alexander S
Dudley Samuel C
Dunn Michael E
Economides Aris
Ellinor Patrick T
Engstrom Gunnar
Eom Gisu
Esko Tonu
Fatemifar Ghazaleh
Felix Stephan B
Finan Chris
Ford Ian
Forsythe Caitlin
Fuller Erin D
Ghanbari Mohsen
Ghasemi Sahar
Giedraitis Vilmantas
Giulianini Franco
Gottdiener John S
Gross Stefan
Gu Zhenhua
Gudbjartsson Daniel F
Gurski Lauren
Gutmann Rebecca
Guzzardo Paloma M
Habegger Lukas
Haggerty Christopher M
Hahn Young
Hawes Alicia
Hedman Asa K
Helgadottir Anna
Hemingway Harry
Henry Albert
Hingorani Aroon D
Holm Hilma
Holmes Michael V
Hyde Craig L
Ingelsson Erik
Jones Marcus B
Jukema J Wouter
Kavousi Maryam
Khalid Shareef
Khaw Kay-Tee
Kleber Marcus E
Kober Lars
Koekemoer Andrea
Kuchenbaecker Karoline
Lang Chim C
Langenberg Claudia
Lattari Michael
Li Alexander
Lin Honghuang
Lin Nan
Lind Lars
Lindgren Cecilia M
Liu Daren
London Barry
Lopez Alexander
Lotta Luca A
Lovering Ruth C
Luan Jian'an
Lubitz Steven A
Lumbers R Thomas
Maerz Winfried
Magnusson Patrik
Mahajan Anubha
Malarstig Anders
Manoochehri Kia
Marchini Jonathan
Marcketta Anthony
Margulies Kenneth B
Maxwell Evan K
McCarthy Shane
McMurray John JV
Melander Olle
Mitnaul Lyndon J
Mordi Ify R
Morgan Thomas
Morley Michael P
Morris Andrew D
Morris Andrew P
Morrison Alanna C
Nagle Michael W
Nelson Christopher P
Newton-Cheh Christopher
Niessner Alexander
Niiranen Teemu
O'Donoghue Michelle L
O'Dushlaine Colm
Overton John D
Owens Anjali T
Padilla Maria Sotiropoulos
Palmer Colin NA
Parry Helen M
Paulding Charles
Penn John
Perola Markus
Portilla-Fernandez Eliana
Pradhan Manasi
Psaty Bruce M
Reid Jeffrey G
Rice Kenneth M
Ridker Paul M
Romaine Simon PR
Roselli Carolina
Rotter Jerome I
Salo Perttu
Salomaa Veikko
Samani Nilesh J
Sattar Naveed
Schleicher Thomas D
Schurmann Claudia
Shah Sonia
Shalaby Alaa A
Shuldiner Alan
Smelser Diane T
Smith J Gustav
Smith Nicholas L
Staples Jeffrey C
Stefansson Kari
Stender Steen
Stott David J
Sun Dylan
Sveinbjornsson Gardar
Svensson Per
Swerdlow Daniel I
Tammesoo Mari-Liis
Taylor Kent D
Teder-Laving Maris
Teumer Alexander
Thorgeirsson Gudmundur
Thorsteinsdottir Unnur
Toledo Karina
Torp-Pedersen Christian
Trompet Stella
Tyl Benoit
Uitterlinden Andre G
Ulloa Ricardo H
van der Harst Pim
van Hout Cristopher
van Setten Jessica
Vasan Ramachandran S
Veluchamy Abirami
Verweij Niek
Visscher Peter M
Voelker Uwe
Voors Adriaan A
Wang Xiaosong
Wareham Nicholas J
Waterworth Dawn
Weeke Peter E
Weiss Raul
Widom Louis
Wiggins Kerri L
Wilk Jemma B
Wolf Sarah E
Xing Heming
Yadav Ashish
Yang Jian
Ye Bin
Yerges-Armstrong Laura M
Yu Bing
Zannad Faiez
Zhao Jing Hua
Publication venue: Nature Communications
Publication date: 01/01/2020

Abstract: Heart failure (HF) is a leading cause of morbidity and mortality worldwide. A small proportion of HF cases are attributable to monogenic cardiomyopathies and existing genome-wide association studies (GWAS) have yielded only limited insights, leaving the observed heritability of HF largely unexplained. We report results from a GWAS meta-analysis of HF comprising 47,309 cases and 930,014 controls. Twelve independent variants at 11 genomic loci are associated with HF, all of which demonstrate one or more associations with coronary artery disease (CAD), atrial fibrillation, or reduced left ventricular function, suggesting shared genetic aetiology. Functional analysis of non-CAD-associated loci implicates genes involved in cardiac development (MYOZ1, SYNPO2L), protein homoeostasis (BAG3), and cellular senescence (CDKN1A). Mendelian randomisation analysis supports causal roles for several HF risk factors, and demonstrates CAD-independent effects for atrial fibrillation, body mass index, and hypertension. These findings extend our knowledge of the pathways underlying HF and may inform new therapeutic strategies.