
2,830 research outputs found

Assessment of Alignment Algorithms, Variant Discovery and Genotype Calling Strategies in Exome Sequencing Data

Author: Corbett Anthony
Publication venue: RIT Scholar Works
Publication date: 24/09/2015

Advances in next generation sequencing (NGS) technologies in the past half decade have enabled many novel genomic applications and have generated unprecedented amounts of new knowledge that is quickly changing how biomedical research is conducted, as well as how we view human diseases and diversity. As the methods, algorithms and software used to process NGS data are constantly being developed and improved, performing analysis and determining the validity of the results become complex. Moreover, as sequencing moves from being a research tool into a clinical diagnostic tool, understanding the performance and limitations of bioinformatics pipelines and the results they produce becomes imperative. This thesis aims to assess the performance of nine bioinformatics pipelines for sequence read alignment, variant calling and genotyping in a Mendelian inherited disease, parent-trio exome sequencing design. A well-characterized reference variant call set from the National Institute of Standards and Technology and the Genome in a Bottle Consortium is used for producing and comparing the analytical performance of each pipeline on the GRCh37 and GRCh38 human references.

RIT Scholar Works

Pipeline design to identify key features and classify the chemotherapy response on lung cancer patients using large-scale genetic data

Author: de Cid Rafael
Duran Albareda Xavier
Galván Femenía Iván
Gavaldà Mestre Ricard
Rafael-Palou Xavier
Ribas Ripoll Vicent
Valdés María Gabriela
Yokota Jun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Background: During the last decade, interest in applying machine learning algorithms to genomic data has increased in many bioinformatics applications. Analyzing this type of data entails difficulties in managing high-dimensional data, handling class imbalance for knowledge extraction, identifying important features and classifying individuals. In this study, we propose a general framework to tackle these challenges with different machine learning algorithms and techniques. We apply the configuration of this framework to lung cancer patients, identifying genetic signatures for classifying response to drug treatment. We intersect these relevant SNPs with the GWAS Catalog of the National Human Genome Research Institute and explore the RegulomeDB and GTEx databases for functional analysis purposes. Results: The machine learning based solution proposed in this study is a scalable and flexible alternative to the classical uni-variate regression approach for analyzing large-scale data. From 36 experiments executed using the machine learning framework design, we obtain good classification performance from the top 5 models with the highest cross-validation score and the smallest standard deviation. One thousand two hundred twenty-four SNPs corresponding to the key features from the top 20 models (cross-validation F1 mean >= 0.65) were compared with the GWAS Catalog, finding no intersection with genome-wide significant reported hits. From these, new genetic signatures in MAE, CEP104, PRKCZ and ADRB2 show relevant biological regulatory functionality related to lung physiology. Conclusions: We have defined a machine learning framework using a large, unbalanced data set of SNP arrays and imputed genotyping data from a pharmacogenomics study in lung cancer patients subjected to first-line platinum-based treatment. This approach found genome signals with no genome-wide significance in the uni-variate regression approach (GWAS Catalog) that are valuable for classifying patients, only a few of them with related biological function. The effects of these variants can be explained by the recently proposed omnigenic model hypothesis, which states that complex traits can be influenced not only by the “core genes”, mainly found by the genome-wide significant SNPs, but also by the rest of the genes outside of the “core pathways” with apparently unrelated biological functionality.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals

FigShare

Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences

Author: Alonso
Aranda
Aretz
Ashburner
Babtie
Bell
Ben-Ari Fuchs
Bender
Berg
Bernfield
Betzen
Breda
Brum
Bulik-Sullivan
Bumgarner
Burgess
Byron
Cerami
Chatr-Aryamontri
Cho
Claudia Manzoni
Colantuoni
Collins
Consortium GT
Consortium UK
Cook
Croft
Cusick
Darmanis
Demis A Kia
Dilthey
Eichler
ENCODE Project Consortium
Finkbeiner
Fonseca
Gaul
Genomes
Gentleman
Gholami
Gillis
Giordano
Gonzaga-Jauregui
Gostev
Gudbjartsson
Guinney
Harrow
Harrow
Haug
Homer
Horaitis
Huang
Huang
Huang da
International HapMap Consortium
International Human Genome Sequencing Consortium
Jana Vandrovcova
Jin
John Hardy
Jostins
Kaiser
Kanehisa
Kang
Kannan
Kartashov
Kellis
Kerrien
Khatri
Koh
Kohler
Koufaris
Kristensen
Kukurba
Kulkarni
Lamb
Langfelder
Lappalainen
Law
Li
Liang
Londin
Manolio
Manolio
Manzoni
Marchini
Martens
Martens
Marx
Mattick
McKenna
McWilliam
Menzel
Metzker
Mi
Modelska
MSI Board Members
Nagalakshmi
Naifang
Nalls
Nicholas W Wood
Orchard
Orchard
Orchard
Orchard
Pantaleo
Pathan
Patrick A Lewis
Pearson
Perez-Riverol
Pible
Pop
Pritchard
Protein
Purcell
Raffaele Ferrari
Ramasamy
Reimand
Rhee
Rivas
Roider
Salek
Sales
Sanger
Scalbert
Schaefer
Schneider
Searls
Shendure
Shi
Smith
Speir
Szklarczyk
Tasan
Turing
Uda
UniProt Consortium
van Dijk
van Karnebeek
Vaughan
Venter
Vizcaino
Vizcaino
Vogelzang
Von Bertalanffy
Wain
Wang
Wang
Wanichthanarak
Warde-Farley
Watson
Westra
Wild
Williams
Wingender
Wishart
Wishart
Wu
Yang
Yao
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/03/2018

Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of biomedical sciences. While genomics, transcriptomics and proteomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually with distinct approaches generating monothematic rather than integrated knowledge. As other areas of biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, are moving towards the omics scale, we are witnessing the rise of inter-disciplinary data integration strategies to support a better understanding of biological systems and eventually the development of successful precision medicine. This review cuts across the boundaries between genomics, transcriptomics and proteomics, summarizing how omics data are generated, analysed and shared, and provides an overview of the current strengths and weaknesses of this global approach. This work intends to target students and researchers seeking knowledge outside of their field of expertise and fosters a leap from the reductionist to the global-integrative analytical approach in research.

Central Archive at the University of Reading

Crossref

UCL Discovery

Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations

Author: Bigdeli TB
Brick L
Carey CE
Chen C-Y
Chen J
Cuellar-Barboza A
Duncan LE
Edenberg HJ
Edwards AC
Huang H
Iyegbe C
Kalungi A
Koen N
Kuchenbaecker K
Lam M
Majara L
Martin AR
Meyers JL
Mowry B
Periyasamy S
Peterson RE
Popejoy AB
Prieto ML
Schwarz E
Smoller JW
Stahl EA
Strawbridge RJ
Su J
Sullivan PF
Vassos E
Walters RK
Publication date: 17/10/2019

Genome-wide association studies (GWASs) have focused primarily on populations of European descent, but it is essential that diverse populations become better represented. Increasing diversity among study participants will advance our understanding of genetic architecture in all populations and ensure that genetic research is broadly applicable. To facilitate and promote research in multi-ancestry and admixed cohorts, we outline key methodological considerations and highlight opportunities, challenges, solutions, and areas in need of development. Despite the perception that analyzing genetic data from diverse populations is difficult, it is scientifically and ethically imperative, and there is an expanding analytical toolbox to do it well.

UCL Discovery

A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases

Crossref

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

African genomes illuminate the early history and transition to selfing in Arabidopsis thaliana

Author: Alacakaptan S.
Alonso-Blanco C.
Burbano H.
Durvasula A.
Flood P.
Fulgione A.
Gutaker R.
Hancock A.
Neto C.
Tsuchimatsu T.
Xavier Pico F.
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2017

Over the past 20 years, many studies have examined the history of the plant ecological and molecular model, Arabidopsis thaliana, in Europe and North America. Although these studies informed us about the recent history of the species, the early history has remained elusive. In a large-scale genomic analysis of African A. thaliana, we sequenced the genomes of 78 modern and herbarium samples from Africa and analyzed these together with over 1,000 previously sequenced Eurasian samples. In striking contrast to expectations, we find that all African individuals sampled are native to this continent, including those from sub-Saharan Africa. Moreover, we show that Africa harbors the greatest variation and represents the deepest history in the A. thaliana lineage. Our results also reveal evidence that selfing, a major defining characteristic of the species, evolved in a single geographic region, best represented today within Africa. Demographic inference supports a model in which the ancestral A. thaliana population began to split by 120-90 kya, during the last interglacial and Abbassia pluvial, and Eurasian populations subsequently separated from one another at around 40 kya. This bears striking similarities to the patterns observed for diverse species, including humans, implying a key role for climatic events during interglacial and pluvial periods in shaping the histories and current distributions of a wide range of species.

Digital.CSIC

MPG.PuRe

Beyond Greed and Grievance: Feasibility and Civil War

Author: Anke Hoeffler
Dominic Rohner
Paul Collier

A key distinction among theories of civil war is between those that are built upon motivation and those that are built upon feasibility. We analyze a comprehensive global sample of civil wars for the period 1965-2004 and subject the results to a range of robustness tests. The data constitute a substantial advance on previous work. We find that variables that are close proxies for feasibility have powerful consequences for the risk of a civil war. Our results substantiate the ‘feasibility hypothesis’ that where civil war is feasible it will occur without reference to motivation.

Research Papers in Economics

Updates in metabolomics tools and resources: 2014-2015

Author: Misra Biswapriya B.
van der Hooft Justin
Publication venue: 'Wiley'
Publication date: 01/01/2016

Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

Enlighten

Assembly of 809 whole mitochondrial genomes with clinical, imaging, and fluid biomarker phenotyping

Author: Alzheimer’s Disease Neuroimaging Initiative
Green Robert C.
Kauwe John S. K.
Miller Justin B.
Ridge Perry G.
Saykin Andrew J.
Wadsworth Mark E.
Publication venue: 'Elsevier BV'
Publication date: 01/04/2018

INTRODUCTION: Mitochondrial genetics are an important but largely neglected area of research in Alzheimer's disease. A major impediment is the lack of data sets. METHODS: We used an innovative, rigorous approach, combining several existing tools with our own, to accurately assemble and call variants in 809 whole mitochondrial genomes. RESULTS: To help address this impediment, we prepared a data set that consists of 809 complete and annotated mitochondrial genomes with samples from the Alzheimer's Disease Neuroimaging Initiative. These whole mitochondrial genomes include rich phenotyping, such as clinical, fluid biomarker, and imaging data, all of which is available through the Alzheimer's Disease Neuroimaging Initiative website. Genomes are cleaned, annotated, and prepared for analysis. DISCUSSION: These data provide an important resource for investigating the impact of mitochondrial genetic variation on risk for Alzheimer's disease and other phenotypes that have been measured in the Alzheimer's Disease Neuroimaging Initiative samples.

IUPUIScholarWorks