
117 research outputs found

The Intelligence in Developing Systems for Molecular Biology

Author: Sahinalp S. C.
Publication date: 01/01/2007

Simon Fraser University Institutional Repository

Combinatorial pattern matching

Author: Dogrusoz U.
Muthukrishnan S.
Sahinalp S. C.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

Cataloged from PDF version of article. 15th Annual Symposium, CPM 2004: Istanbul, Turkey, July 5-7, 2004: proceedings

Bilkent University Institutional Repository

Sparsification of RNA Structure Prediction Including Pseudoknots

Author: Mohl Mathias
Sahinalp S. C.
Salari Raheleh
Publication date: 01/01/2010

Background: Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to high running time and space consumption implied by the dynamic programming formulations of the problem. Results: In this paper, we introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally - which has a much more complex recursive structure than other problems to which sparsification has been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among those the most general method available (the Rivas-Eddy algorithm) and the fastest one (Reeder-Giegerich algorithm). In all algorithms the number of “candidate” substructures to be considered is reduced. Conclusions: Our experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation

Simon Fraser University Institutional Repository

mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications

Author: Alkan C.
Eichler E. E.
Hach F.
Hormozdiari F.
Sahinalp S. C.
Sarrafi I.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

Cataloged from PDF version of article. High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. As importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to reference genome while discounting the mismatches that occur at common SNP locations provided by db-SNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure and are not simple post-processing steps and thus are performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. 
In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net
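
The two multi-mapping modes named in the abstract above (best mapping loci per read, and reads with at most n loci within an error threshold) can be illustrated with a brute-force sketch. This is not the index-based method of mrsFAST-Ultra. The function names `find_loci`, `best_loci` and `reads_with_at_most`, the use of Hamming distance as the error measure, and the choice to skip unmapped reads are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_loci(read, reference, max_mismatches):
    """All start positions where the read matches within the error threshold."""
    k = len(read)
    return [
        i
        for i in range(len(reference) - k + 1)
        if hamming(read, reference[i:i + k]) <= max_mismatches
    ]


def best_loci(read, reference, max_mismatches):
    """Among the loci within the threshold, keep those with the fewest mismatches."""
    k = len(read)
    loci = find_loci(read, reference, max_mismatches)
    if not loci:
        return []
    fewest = min(hamming(read, reference[i:i + k]) for i in loci)
    return [i for i in loci if hamming(read, reference[i:i + k]) == fewest]


def reads_with_at_most(reads, reference, max_mismatches, n):
    """Map each read with between 1 and n loci to its list of loci.

    Assumption: reads with no locus at all are left out of the result.
    """
    result = {}
    for read in reads:
        loci = find_loci(read, reference, max_mismatches)
        if 0 < len(loci) <= n:
            result[read] = loci
    return result
```

For example, with the hypothetical reference `"ACGTACGTTT"`, the read `"ACGT"` maps exactly at positions 0 and 4, so it is reported when n is 2 but not when n is 1.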

CiteSeerX

Bilkent University Institutional Repository

PubMed Central

Optimal pooling for genome re-sequencing with ultra-high-throughput short-read technologies

Author: Bennett
F. Hormozdiari
I. Birol
I. Hajirasouliha
Krzywinski
Margulies
Pevzner
S. C. Sahinalp
Sanger
Publication venue: Oxford University Press

New generation sequencing technologies offer unique opportunities and challenges for re-sequencing studies. In this article, we focus on re-sequencing experiments using the Solexa technology, based on bacterial artificial chromosome (BAC) clones, and address an experimental design problem. In these specific experiments, approximate coordinates of the BACs on a reference genome are known, and fine-scale differences between the BAC sequences and the reference are of interest. The high-throughput characteristics of the sequencing technology make it possible to multiplex BAC sequencing experiments by pooling BACs for a cost-effective operation. However, the way BACs are pooled in such re-sequencing experiments has an effect on the downstream analysis of the generated data, mostly due to subsequences common to multiple BACs. The experimental design strategy we develop in this article offers combinatorial solutions based on approximation algorithms for the well-known max n-cut problem and the related max n-section problem on hypergraphs. Our algorithms, when applied to a number of sample cases, give more than a 2-fold performance improvement over random partitioning
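
The pooling problem in the abstract above can be pictured with a simplified sketch. It assumes an ordinary weighted graph rather than the hypergraphs used in the article: vertices are BACs, an edge weight measures how much sequence two BACs share, and a good pooling places heavily overlapping BACs in different pools (a large cut weight). The local search shown is a standard textbook heuristic swapped in for illustration, not the article's approximation algorithm, and it does not enforce equal pool sizes as max n-section would. All names (`cut_weight`, `random_pools`, `local_search`) are hypothetical.

```python
import random


def cut_weight(edges, pool_of):
    """Total weight of edges whose endpoints lie in different pools."""
    return sum(w for (u, v), w in edges.items() if pool_of[u] != pool_of[v])


def random_pools(vertices, n, rng):
    """Baseline: assign every BAC to one of n pools uniformly at random."""
    return {v: rng.randrange(n) for v in vertices}


def local_search(vertices, edges, n, pool_of):
    """Move single vertices between pools while the cut weight strictly grows.

    Assumes positive edge weights, so each move increases the cut and the
    loop terminates.
    """
    pool_of = dict(pool_of)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            # Weight of v's edges going into each pool.
            into = [0] * n
            for (a, b), w in edges.items():
                if a == v:
                    into[pool_of[b]] += w
                elif b == v:
                    into[pool_of[a]] += w
            best = min(range(n), key=lambda p: into[p])
            if into[best] < into[pool_of[v]]:
                pool_of[v] = best
                improved = True
    return pool_of


if __name__ == "__main__":
    bacs = ["A", "B", "C", "D"]
    shared = {("A", "B"): 5, ("C", "D"): 5, ("A", "C"): 1}
    start = random_pools(bacs, 2, random.Random(0))
    final = local_search(bacs, shared, 2, start)
    print(cut_weight(shared, start), cut_weight(shared, final))
```

At a local optimum of this search, every vertex has at most a 1/n share of its edge weight inside its own pool, so the cut weight is at least (1 - 1/n) of the total edge weight.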

Crossref

PubMed Central

Whole genome sequencing of Turkish genomes reveals functional private alleles and impact of genetic interactions with Europe, Asia and Africa

Author: Alkan C.
Bekpen C.
Bugra K.
Dal E.
Gokcumen O.
Güngör T.
Kavak P.
Sahinalp S.C.
Saygi C.
Somel M.
Ugurlu S.
Özören N.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Turkey is a crossroads of major population movements throughout history and has been a hotspot of cultural interactions. Several studies have investigated the complex population history of Turkey through a limited set of genetic markers. However, to date, there have been no studies to assess the genetic variation at the whole genome level using whole genome sequencing. Here, we present whole genome sequences of 16 Turkish individuals resequenced at high coverage (32×-48×). Results: We show that the genetic variation of the contemporary Turkish population clusters with South European populations, as expected, but also shows signatures of relatively recent contribution from ancestral East Asian populations. In addition, we document a significant enrichment of non-synonymous private alleles, consistent with recent observations in European populations. A number of variants associated with skin color and total cholesterol levels show frequency differentiation between the Turkish populations and European populations. Furthermore, we have analyzed the 17q21.31 inversion polymorphism region (MAPT locus) and found increased allele frequency of 31.25% for H1/H2 inversion polymorphism when compared to European populations that show about 25% of allele frequency. Conclusion: This study provides the first map of common genetic variation from 16 western Asian individuals and thus helps fill an important geographical gap in analyzing natural human variation and human migration. Our data will help develop population-specific experimental designs for studies investigating disease associations and demographic history in Turkey. © 2014 Alkan et al

Bilkent University Institutional Repository

Dissect: detection and characterization of novel structural alterations in transcribed sequences

Author: Brassesco
Brudno
Burge
B secke
C. C. Collins
Caudevilla
D. Yorukoglu
De Braekeleer
F. Hach
Frantz
Gingeras
Hach
Horiuchi
I. Birol
Kidd
L. Swanson
Labrador
Levin
McPherson
Miller
Minoche
Mott
Nacu
S. C. Sahinalp
Sboner
Slater
Takahashi
Publication venue: Oxford University Press
Publication date: 01/01/2012

Motivation: Computational identification of genomic structural variants via high-throughput sequencing is an important problem for which a number of highly sophisticated solutions have been recently developed. With the advent of high-throughput transcriptome sequencing (RNA-Seq), the problem of identifying structural alterations in the transcriptome is now attracting significant attention

DSpace@MIT

Crossref

PubMed Central

Structural variation and fusion detection using targeted sequencing data from circulating cell free DNA

Author: Adra Nabil
Asghari Hossein
Collins Colin C.
Gawroński Alexander R.
Hach Faraz
Koçkan Can
LeBihan Stephane
Lin Yen-Yi
McConeghy Brian
Orabi Baraa
Pili Roberto
Sahinalp S. Cenk
Publication venue: 'Oxford University Press (OUP)'
Publication date: 23/04/2019

MOTIVATION: Cancer is a complex disease that involves rapidly evolving cells, often forming multiple distinct clones. In order to effectively understand progression of a patient-specific tumor, one needs to comprehensively sample tumor DNA at multiple time points, ideally obtained through inexpensive and minimally invasive techniques. Current sequencing technologies make the 'liquid biopsy' possible, which involves sampling a patient's blood or urine and sequencing the circulating cell free DNA (cfDNA). A certain percentage of this DNA originates from the tumor, known as circulating tumor DNA (ctDNA). The ratio of ctDNA may be extremely low in the sample, and the ctDNA may originate from multiple tumors or clones. These factors present unique challenges for applying existing tools and workflows to the analysis of ctDNA, especially in the detection of structural variations which rely on sufficient read coverage to be detectable. RESULTS: Here we introduce SViCT, a structural variation (SV) detection tool designed to handle the challenges associated with cfDNA analysis. SViCT can detect breakpoints and sequences of various structural variations including deletions, insertions, inversions, duplications and translocations. SViCT extracts discordant read pairs, one-end anchors and soft-clipped/split reads, assembles them into contigs, and re-maps contig intervals to a reference genome using an efficient k-mer indexing approach. The intervals are then joined using a combination of graph and greedy algorithms to identify specific structural variant signatures. We assessed the performance of SViCT and compared it to state-of-the-art tools using simulated cfDNA datasets with properties matching those of real cfDNA samples. The positive predictive value and sensitivity of our tool were superior to all the tested tools and reasonable performance was maintained down to the lowest dilution of 0.01% tumor DNA in simulated datasets. 
Additionally, SViCT was able to detect all known SVs in two real cfDNA reference datasets (at 0.6-5% ctDNA) and predict a novel structural variant in a prostate cancer cohort

IUPUIScholarWorks

HIT'nDRIVE: patient-specific multidriver gene prioritization for precision oncology.

Author: Anderson Shawn
Collins Colin C
Dao Phuong
Haffari Gholamreza
Hodzic Ermin
Sahinalp S Cenk
Sauerwald Thomas
Shrestha Raunak
Vandin Fabio
Wang Kendric
Yeung Jake
Publication venue: Genome Res
Publication date: 01/01/2017

Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: "multihitting time," the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients' survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology

Crossref

Apollo (Cambridge)

Archivio istituzionale della ricerca - Università di Padova

smyRNA: A Novel Ab Initio ncRNA Gene Finder

Author: A Coventry
A Fontaine
C Dieterich
Cagri Aksay
D di Bernardo
DP Bartel
E Bonnet
E Rivas
Emre Karakoc
G Storz
IL Hofacker
IM Meyer
Iman Hajirasouliha
J Thompson
JS Pedersen
M Margulies
Peter J. Unrau
Raheleh Salari
RJ Carter
S Griffiths-Jones
S Washietl
S. Cenk Sahinalp
SR Eddy
Stefan Maas
Z Yao
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by means of establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs on genome sequences has proven to be a hard task; in particular past attempts for ab initio ncRNA search mostly failed with the exception of tools that can identify micro RNAs. Methodology/Principal Findings: We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences. Conclusions/Significance: Our method, once trained on a set of ncRNAs from a given species, can be applied to the genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository