9 research outputs found

REFBSS: Reference Based Similarity Search in Biological Network Databases

Author: Abul Osman
Soylev Arda
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology CIBCB (2015 : Honolulu, HI)

Biological networks, mostly abstracted as graphs, are key to many important activities inside the cell. Similarity-based analysis is one of the techniques for understanding the role of a query network. In that context, a database of biological networks is aligned with a query network, and the networks whose similarity score is above a predefined cutoff value are separated from those below it. Because the sub-graph isomorphism problem is NP-complete, nontrivial similarity score calculation is computationally too expensive. To this end, several methods have been proposed in the literature for an acceptable solution. Reference-based indexing is one of the popular solutions; it indexes the network database by extracting small networks as references to be aligned with the query network. Based on this strategy, we propose a novel model with methodological and heuristic improvements for fast approximate similarity search, all of which prove to be fast and accurate. We also have a high-performance implementation on Hadoop that achieved an 11.42× speedup on a Hadoop cluster with 18 cores on a sample KEGG network database.
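As a rough reading of the speedup figure in the abstract above, parallel efficiency is commonly taken as speedup divided by the number of cores. The sketch below assumes that definition; the function name is illustrative and does not come from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core (1.0 would mean ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: 11.42 speedup on 18 cores.
efficiency = parallel_efficiency(11.42, 18)
print(f"{efficiency:.3f}")  # prints 0.634
```

Under this definition, the reported run used roughly 63% of the ideal linear speedup.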

TOBB ETÜ Institutional Repository

Discovery of tandem and interspersed segmental duplications using high throughput sequencing

Author: Alkan Can
Amini Hajar
Hormozdiari Fereydoun
Le Thong
Soylev Arda
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

MOTIVATION: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of simpler variants. RESULTS: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, at 30× coverage, TARDIS achieved 96% sensitivity with only 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate of our approach for predicting tandem, direct and inverted interspersed segmental duplications on CHM1 (<5% for the top 50 predictions). AVAILABILITY AND IMPLEMENTATION: TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Hacettepe University Institutional Repository

eScholarship - University of California

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Discovery of tandem and interspersed segmental duplications using high-throughput sequencing

Author: Alkan Can
Amini Hajar
Hormozdiari Fereydoun
Minh Le Thong
Soylev Arda
Publication venue: Oxford University Press
Publication date: 01/10/2019

Motivation: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of simpler variants. Results: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, at 30× coverage, TARDIS achieved 96% sensitivity with only 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate of our approach for predicting tandem, direct and inverted interspersed segmental duplications on CHM1 (<5% for the top 50 predictions). Availability and implementation: TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/. Supplementary information: Supplementary data are available at Bioinformatics online. ISSN: 1367-4803; ISSN: 1460-205

Repository for Publications and Research Data

A comprehensive benchmarking of WGS-based deletion structural variant callers.

Author: Ayyala Ram
Castellanos Jacqueline
Chang Sei
Chhugani Karishma
Chikka Rahul
Comarova Zoia
Darfci-Maher Nicholas
Distler Margaret G
Eskin Eleazar
Flint Jonathan
Kim Minyoung
Littman Russell
Lu Angela
Mangul Serghei
Niehus Sebastian
Rajkumar Neha
Sarkar Aditya
Sarwal Varuni
Soylev Arda
Wesel Emily
Publication venue: eScholarship, University of California
Publication date: 18/07/2022

Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
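The precision and sensitivity rates mentioned in the abstract above can be illustrated with their standard definitions. This is a minimal sketch with hypothetical counts, not the study's evaluation code; how a call is matched to a gold-standard deletion is not specified here and is left out.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported calls that are in the gold standard."""
    total = tp + fp
    return tp / total if total else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of gold-standard variants that were reported."""
    total = tp + fn
    return tp / total if total else 0.0


# Hypothetical counts for one caller (not taken from the paper).
tp, fp, fn = 80, 20, 120
print(precision(tp, fp))    # 0.8
print(sensitivity(tp, fn))  # 0.4
```

The false discovery rate is 1 minus precision, so the hypothetical caller here would have an FDR of 0.2. A complete gold standard matters because false positives can only be counted when every true variant in the evaluated regions is known.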

eScholarship - University of California

A robust benchmark for detection of germline large deletions and insertions

Author: Alexander Noah
Alkan Can
Barrio Alvaro Martinez
Bashir Ali
Boutros Paul C
Carroll Andrew
Catalano Anthony P
Chaisson Mark JP
Chapman Lesley
Chen Ken
Church George
Davis Jennifer R
English Adam C
Fan Xian
Farrell John J
Fiddes Ian T
Garg Shilpa
Ghaffari Noushin
Hajirasouliha Iman
Hansen Nancy F
Huang Vincent
Jackman Shaun
Kaiser Michael D
Koren Sergey
Lee Joyce
Marschall Tobias
Mason Christopher E
Mills Ryan E
Mullikin James C
Oliver John S
Olson Nathan D
Phillippy Adam M
Ricketts Camir
Rodriguez Oscar L
Rosenfeld Jeffrey A
Rouette Alexandre
Sage Jay M
Sahraeian Sayed Mohammad E
Salit Marc
Schatz Michael C
Sedlazeck Fritz J
Sherry Stephen
Soylev Arda
Spies Noah
Tearle Rick
Wala Jeremiah
Wenger Aaron M
Xiao Chunlin
Zhou Weichen
Zook Justin M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2020

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed the first sequence-resolved benchmark set for identification of both false negative and false positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12745 isolated, sequence-resolved insertion (7281) and deletion (5464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5262 insertions and 4095 deletions supported by ≥1 diploid assembly. We demonstrate the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping

Crossref

PubMed Central

eScholarship - University of California

Author Correction: A robust benchmark for detection of germline large deletions and insertions.

Author: Alexander Noah
Alkan Can
Barrio Alvaro Martinez
Bashir Ali
Boutros Paul C
Carroll Andrew
Catalano Anthony P
Chaisson Mark JP
Chapman Lesley
Chen Ken
Church George
Davis Jennifer R
English Adam C
Fan Xian
Farrell John J
Fiddes Ian T
Garg Shilpa
Ghaffari Noushin
Hajirasouliha Iman
Hansen Nancy F
Huang Vincent
Jackman Shaun
Kaiser Michael D
Koren Sergey
Lee Joyce
Marschall Tobias
Mason Christopher E
Mills Ryan E
Mullikin James C
Oliver John S
Olson Nathan D
Phillippy Adam M
Ricketts Camir
Rodriguez Oscar L
Rosenfeld Jeffrey A
Rouette Alexandre
Sage Jay M
Sahraeian Sayed Mohammad E
Salit Marc
Schatz Michael C
Sedlazeck Fritz J
Sherry Stephen
Soylev Arda
Spies Noah
Tearle Rick
Wala Jeremiah
Wenger Aaron M
Xiao Chunlin
Zhou Weichen
Zook Justin M
Publication venue: eScholarship, University of California
Publication date: 01/11/2020

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

eScholarship - University of California

An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates

Author: Agustinho Daniel Paiva
Al Khleifat Ahmad
Albin Dreycey
Aliyev Elbay
Almabrazi Hakeem
Arslan Ahmed
Balaji Advait
Behera Sairam
Biederstedt Evan
Billingsley Kimberley
Busby Ben
C. Soto Daniela
Chaisson Mark
Chin Chen Shan
Dabbaghie Fawaz
Daw Joyjit
Dawood Moez
De Coster Wouter
Du Haowei
Dunn Christopher
English Adam
Esteban Rocio
Hefferon Timothy
J Sedlazeck Fritz
J. Treangen Todd
Jochum Michael
Jolly Angad
K Kesharwani Rupesh
Kalra Divya
Kille Bryce
Kronenberg Zev
L Cameron Daniel
Liao Chunxiao
Liu Yunxi
Lu Tsung Yu
M Havrilla James
M Khayat Michael
Mahmoud Medhat
Marin Maximillian
Mc Cartney Ann M.
Monlong Jean
Price Stephen
Rafael Gener Alejandro
Ren Jingwen
Sagayaradj Sagayamary
Sapoval Nicolae
Sinner Claude
Smolka Moritz
Soylev Arda
Subramaniyan Arun
Syed Najeeb
T. Dawson Eric
Tadimeti Neha
Tater Pamella
Vats Pankaj
Vaughn Justin
Walker Kimberly
Wang Gaojianyong
Zeng Qiandong
Zhang Shangzhe
Zhao Tingting
Zorman Barry
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2021

In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in Structural Variation, Pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.

Directory of Open Access Journals

King's Research Portal

University of St. Andrews - Pure