Search CORE

9 research outputs found

Assessing Reproducibility of Inherited Variants Detected With Short-Read Whole Genome Sequencing

Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing the reproducibility of inherited variants detected with WGS, and the impact of each step in the process, is needed for understanding and improving the quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.

Aquila Digital Community (University of Southern Mississippi, USM)
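
The abstract above reports reproducibility by variant class: SNVs versus indels, with indels over 5 bp singled out. The Python below is a minimal sketch of such a comparison across triplicates. It assumes reproducibility is measured as the fraction of distinct calls that appear in every replicate. That definition is illustrative and not necessarily the one the study uses. `Variant`, `variant_class` and `reproducibility` are hypothetical names, and the call sets are toy data.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    """A biallelic call keyed by position and alleles (hypothetical minimal record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variant_class(v: Variant) -> str:
    """Label a call as SNV, indel up to 5 bp, indel over 5 bp, or other."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) == len(v.alt):
        return "other"  # multi-nucleotide substitutions are not binned here
    size = abs(len(v.ref) - len(v.alt))
    return "indel>5bp" if size > 5 else "indel<=5bp"


def reproducibility(replicates: list[set[Variant]]) -> dict[str, float]:
    """Per class: calls found in every replicate divided by calls found in any replicate."""
    union = set().union(*replicates)
    shared = set.intersection(*replicates)
    totals: dict[str, int] = defaultdict(int)
    in_all: dict[str, int] = defaultdict(int)
    for v in union:
        cls = variant_class(v)
        totals[cls] += 1
        if v in shared:
            in_all[cls] += 1
    return {cls: in_all[cls] / totals[cls] for cls in sorted(totals)}


# Toy triplicate call sets (made-up data, not from the study).
rep1 = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 50, "AT", "A"),
    Variant("chr2", 80, "G", "GTTTTTTT"),
}
rep2 = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 50, "AT", "A"),
}
rep3 = rep1 | {Variant("chr2", 55, "CA", "C")}

print(reproducibility([rep1, rep2, rep3]))
# {'SNV': 1.0, 'indel<=5bp': 0.5, 'indel>5bp': 0.0}
```

This sketch matches calls on exact (chrom, pos, ref, alt) keys. As a result, equivalent indels written in different representations count as distinct calls, which would understate indel reproducibility. A real comparison would normalize or haplotype-match calls first.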

Assessing reproducibility of inherited variants detected with short-read whole genome sequencing

Aquila Digital Community (University of Southern Mississippi, USM)

PubMed Central

Helsingin yliopiston digitaalinen arkisto

Data mining in bioinformatics using Weka

Author: Eibe Frank
Geoffrey Holmes
Ian H. Witten
Len Trigg
Mark Hall
Publication date: 01/01/2004

The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and supports data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.

CiteSeerX

Research Commons@Waikato

Weka: Practical Machine Learning Tools and Techniques with Java Implementations

Author: Eibe Frank
Geoffrey Holmes
Ian H. Witten
Len Trigg
Mark Hall
Sally Jo Cunningham
Publication venue: Morgan Kaufmann
Publication date: 01/01/1999

The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine learning and data mining algorithms. Weka is freely available on the World-Wide Web and accompanies a new text on data mining [1] which documents and fully explains all the algorithms it contains. Applications written using the Weka class libraries can be run on any computer with a Web browsing capability; this allows users to apply machine learning techniques to their own data regardless of computer platform. Tools are provided for pre-processing data, feeding it into a variety of learning schemes, and analyzing the resulting classifiers and their performance. An important resource for navigating through Weka is its on-line documentation, which is automatically generated from the source. The primary learning methods in Weka are classifiers, and they induce a rule set or decision tree that models the data. Weka als

CiteSeerX

Research Commons@Waikato

eweitz/ideogram: v1.45.1

Author: Aleksandr Zelenin
Aman Patel
Claudio Lorenzi
Eric Weitz
jimlund
Len Trigg
Mathieu Rouard
Matthew
NicoNekoru
ProjectProgramAMark
Rich Wandell
StantonMartin
Publication venue: Zenodo
Publication date: 16/01/2024

Fix URL code for tissue cache range fetch (#368)

ZENODO

Recommended from our members

Best practices for benchmarking germline small-variant calls in human genomes.

Author: Asimenos George
Boutros Paul C
Chapman Brad A
De La Vega Francisco M
Eberle Michael A
Fleharty Mark
Funke Birgit
Global Alliance for Genomics and Health Benchmarking Team
Gonzalez-Porta Mar
Krusche Peter
Lababidi Samir
Mason Christopher E
Moore Benjamin L
Salit Marc
Tezak Zivana
Trigg Len
Truty Rebecca
Zook Justin M
Publication venue: eScholarship, University of California
Publication date: 01/05/2019

Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.

eScholarship - University of California
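
The abstract above names standard performance metrics without listing them. In the usual formulation these are recall = TP/(TP+FN), precision = TP/(TP+FP), and the F-measure as their harmonic mean. The Python below is a minimal sketch of computing these metrics inside high-confidence regions and stratifying them by variant type. It assumes calls are already normalized so that exact matching suffices, which sidesteps the representation matching the framework addresses. With exact matching, the truth-side and query-side TP counts coincide, so a single TP count is used. `Call`, `in_regions`, `benchmark` and `stratified` are hypothetical names, and the data are made up.

```python
from typing import Callable, NamedTuple


class Call(NamedTuple):
    chrom: str
    pos: int  # 1-based, as in the VCF POS column
    ref: str
    alt: str


Regions = dict[str, list[tuple[int, int]]]  # BED-style 0-based, half-open intervals


def in_regions(call: Call, regions: Regions) -> bool:
    """A 1-based POS p lies in a BED interval [start, end) when start < p <= end."""
    return any(start < call.pos <= end for start, end in regions.get(call.chrom, []))


def benchmark(truth: set[Call], query: set[Call], regions: Regions) -> dict[str, float]:
    """Count TP/FP/FN by exact matching inside the regions and derive recall, precision, F1."""
    truth_hc = {c for c in truth if in_regions(c, regions)}
    query_hc = {c for c in query if in_regions(c, regions)}
    tp = len(truth_hc & query_hc)
    fn = len(truth_hc - query_hc)  # truth calls the query missed
    fp = len(query_hc - truth_hc)  # query calls absent from the truth set
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "recall": recall, "precision": precision, "F1": f1}


def is_snv(c: Call) -> bool:
    return len(c.ref) == 1 and len(c.alt) == 1


def stratified(truth: set[Call], query: set[Call], regions: Regions) -> dict[str, dict[str, float]]:
    """Report metrics separately for SNVs and for all other calls (indels in this toy data)."""
    strata: dict[str, Callable[[Call], bool]] = {"SNV": is_snv, "INDEL": lambda c: not is_snv(c)}
    return {
        name: benchmark({c for c in truth if keep(c)}, {c for c in query if keep(c)}, regions)
        for name, keep in strata.items()
    }


# Made-up truth set, query call set and high-confidence region.
regions: Regions = {"chr1": [(0, 1000)]}
truth = {Call("chr1", 100, "A", "G"), Call("chr1", 200, "C", "T"), Call("chr1", 300, "G", "A"),
         Call("chr1", 400, "AT", "A"), Call("chr1", 5000, "T", "C")}
query = {Call("chr1", 100, "A", "G"), Call("chr1", 200, "C", "T"), Call("chr1", 350, "G", "C"),
         Call("chr1", 400, "AT", "A"), Call("chr1", 5000, "T", "C")}

for name, metrics in stratified(truth, query, regions).items():
    print(name, metrics)
```

The call at chr1:5000 lies outside the high-confidence interval, so it is excluded from both sets. This mirrors the practice of reporting metrics only within high-confidence regions.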

Australia (Including Papua New Guinea)

Author: Adam Shoemaker
Adi Wimmer
Alan Gould
Anderson Ethel
Andrew Peek
Annette Stewart
Astley Thea
Beaver Bruce
Beth Watzke
Beverley Farmer
Brand Mona
Broderick Damien
Bruce Beaver
Carey Peter
Carol Hetherington
Carol Merli
Chris Wallace-Crabbe
Corris Peter
D.R. Burns
Dane Thwaites
David Watt
Davis Beatrice
Davis Jack
Day Marele
De Groen Alma
Dean Tuttle
Dobson Rosemary
Domahidy Andras
Drewe Robert
Ehrlich David
Elizabeth Jolley
Elizabeth Perkins
Ercole Velia
Farmer Beverley
Faye Christenberry
Foster David
Furphy Joseph
Gareth Griffiths
Garner Helen
Gellert Leon
Geoffrey Bingham
Geoffrey Dutton
Geoffrey Serle
Gillian Whitlock
Goldsworthy Peter
Gow Michael
Graham Rowlands
Gray Tom
Halligan Marion
Hanrahan Barbara
Hardy Frank
Harris Robert
Harry J.S.
Harwood Gwen
Hasluck Nicholas
Hazzard Shirley
Helen Daniel
Helen Garner
Howell John
Hughes A. McC.
Hume Fergus
Hungerford T.A.G.
Ian Syson
Ivor Indyk
Jennifer Strauss
Jones Gail
Jose Nicholas
Josie Fantasia
Katharine England
Kelleher Victor
Kevin Brophy
Kevin Hart
Klaus Neumann
Krauth Nigel
Lawrence Bourke
Lawson Henry
Len Fox
Lewis Julie
Lindsay Barrett
Llewellyn Kate
London Joan
Lyn Jacobs
Margaret Bradstock
Margot Luke
Martin A.E.
Martin Catherine
Masters Olga
McCullough Colleen
McNab Claire
Michael Tolley
Miller Alex
Morgan Sally
Mudrooroo
Murnane Gerald
Murray Les
Myron Lysenko
Narelle Shaw
Nick Mansfield
O'Grady John
Pearl Bowman
Peter Pierce
Robert Drewe
Robin Wallace-Crabbe
Rodriguez Judith
Rolls Eric
Rosemary Dobson
Rosemary Sorensen
Scott Margaret
Scott Rosie
Shearer Jill
Simon Ryan
Slessor Kenneth
Southall Ivan
Stead Christina
Stephanie Trigg
Strauss Jennifer
Sue Gillett
Susan Lever
Susan Martin
Sylvia Martin
Tasma
Terry Lane
Tranter John
Turner Ethel
Upfield Arthur
Veronica Brady
Walker Brenda
Walwicz Ania
White Patrick
Woolls William
Wright Judith
Publication venue: SAGE Publications

Crossref