41 research outputs found

FastRemap: A Tool for Quickly Remapping Reads between Genome Assemblies

Author: Alkan Can
Cali Damla Senol
Cavlak Meryem Banu
Firtina Can
Kim Jeremie S.
Mutlu Onur
Publication date: 04/09/2022

A genome read data set can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly-used CrossMap tool. With the explosion of available genomic data sets and references, high-performance remapping tools will be even more important for keeping up with the computational demands of genome assembly and analysis. We provide FastRemap, a fast and efficient tool for remapping reads between genome assemblies. FastRemap provides up to a 7.82× speedup (6.47×, on average) and uses as low as 61.7% (80.7%, on average) of the peak memory consumption compared to the state-of-the-art remapping tool, CrossMap. FastRemap is written in C++. The source code and user manual are freely available at: github.com/CMU-SAFARI/FastRemap. Docker image available at: https://hub.docker.com/r/alkanlab/fast. Also available in Bioconda at: https://anaconda.org/bioconda/fastremap-bio. Comment: FastRemap is open source and all scripts needed to replicate the results in this paper can be found at https://github.com/CMU-SAFARI/FastRema

arXiv.org e-Print Archive

GLANET: Genomic loci annotation and enrichment tool

Author: Firtina C.
Keleş S.
Otlu B.
Tastan O.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2017

Motivation: Genomic studies identify genomic loci representing genetic variations, transcription factor (TF) occupancy, or histone modification through next-generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them with known genomic and epigenomic annotations. Results: We present GLANET as a comprehensive annotation and enrichment analysis tool which implements a sampling-based enrichment test that accounts for GC content and/or mappability biases, jointly or separately. GLANET annotates and performs enrichment analysis on these loci with a rich library. We introduce and perform novel data-driven computational experiments for assessing the power and Type-I error of its enrichment procedure, which show that GLANET has attained high statistical power and a well-controlled Type-I error rate. As a key feature, users can easily extend its library with new gene sets and genomic intervals. Other key features include assessment of the impact of single nucleotide variants (SNPs) on TF binding sites and regulation-based pathway enrichment analysis. Availability and implementation: GLANET can be run using its GUI or on the command line. GLANET's source code is available at https://github.com/burcakotlu/GLANET. Tutorials are provided at https://glanet.readthedocs.org. © 2017 The Author

Bilkent University Institutional Repository

BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches

Author: Alkan Can
Alser Mohammed
Cali Damla Senol
Firtina Can
Ghiasi Nika Mansouri
Kanellopoulos Konstantinos
Kim Jeremie S.
Mutlu Onur
Park Jisung
Shahroodi Taha
Singh Gagandeep
Publication date: 20/07/2022

Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds, as conventional hashing methods assign distinct hash values to different seeds, including highly similar seeds. Finding only exact-matching seeds either 1) increases the use of costly sequence alignment or 2) limits sensitivity. We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds, called fuzzy seed matches, with a single lookup of their hash values. BLEND 1) utilizes a technique called SimHash, which can generate the same hash value for similar sets, and 2) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.6x-63.5x (on average 19.5x), has a lower memory footprint by 0.9x-9.7x (on average 3.6x), and finds higher quality overlaps leading to more accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.7x-3.7x (on average 1.7x) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND
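
To make the SimHash idea in this abstract concrete, here is a minimal Python sketch of the generic technique, not BLEND's implementation. It treats a seed as the set of its k-mers and takes a per-bit majority vote over the k-mer hashes, so sets that share most of their items tend to get hash values that differ in few bits. The helper names (`kmers`, `hash64`, `simhash`, `hamming`), the choice of BLAKE2b as the item hash, and the 64-bit width are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return all overlapping substrings of length k (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hash64(item):
    """Hash a string to a 64-bit integer (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(items, bits=64):
    """Generic SimHash: per-bit majority vote over the item hashes."""
    counts = [0] * bits
    for item in items:
        h = hash64(item)
        for b in range(bits):
            # Vote +1 if the bit is set in this item's hash, -1 otherwise.
            counts[b] += 1 if (h >> b) & 1 else -1
    value = 0
    for b in range(bits):
        if counts[b] > 0:
            value |= 1 << b
    return value


def hamming(a, b):
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    seed_a = "ACGTACGTTGCAAGCT"
    seed_b = "ACGTACGTTGCAAGCA"  # differs from seed_a in the last base
    ha = simhash(kmers(seed_a, 4))
    hb = simhash(kmers(seed_b, 4))
    print(f"{ha:016x} {hb:016x} differing bits: {hamming(ha, hb)}")
```

With 64 bits, similar sets usually produce close but not identical values. Collapsing them into a single lookup key, as the abstract describes, would need a further design choice (for example, using fewer bits), which this sketch does not attempt.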

arXiv.org e-Print Archive

Repository for Publications and Research Data

Lymphoma Predisposing Gene in an Extended Family: CD70 Signaling Defect.

Author: Altindirek Didem
Bay Sb
Erbilgin Y
Erol Fc
Firtina S
Kaya A
Kebudi R
Khodzhaev K
Kiykim A
Ng Oh
Ng Yy
Sayitoglu Müge
Zengin Fs
Publication date: 01/01/2020

İstanbul Üniversitesi Açık Erişim Sistemi

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

Author: Alser Mohammed
Cali Damla Senol
Cavlak Meryem Banu
Firtina Can
Kalsi Gurpreet S.
Kim Jeremie
Lindegger Joel
Luna Juan Gómez
Mutlu Onur
Pillai Kamlesh
Shahroodi Taha
Subramoney Sreenivas
Suresh Bharathwaj
Publication date: 21/10/2023

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, and 1.96x, respectively. Comment: Accepted to ACM TAC
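
As a rough illustration of how HMM probabilities yield a score for a sequence, the sketch below implements the forward pass, one building block of the Baum-Welch algorithm, for a small generic HMM in Python. It is not a profile HMM and has nothing to do with ApHMM's hardware design. The function name `forward`, the two-state model, and all probability values are assumptions chosen only for the example.

```python
def forward(obs, start, trans, emit):
    """Likelihood of an observation sequence under a discrete HMM.

    obs   : list of symbol indices
    start : start[s] is the probability of starting in state s
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[s][x] is the probability that state s emits symbol x
    """
    n_states = len(start)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Hypothetical two-state model over a two-symbol alphabet.
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    print(forward([0, 1, 0], start, trans, emit))
```

Baum-Welch pairs this forward pass with a matching backward pass and uses both to re-estimate the transition and emission probabilities. Real implementations also work in log space or rescale `alpha` to avoid underflow on long sequences.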

arXiv.org e-Print Archive

Mutational landscape of severe combined immunodeficiency patients from Turkey.

Author: Aydiner E
Baris S
Cagdas D
Camcioglu Y
Cekic S
Cipe F
Firtina S
Hatirnaz Ng
Kaya A
Kiykim A
Nepesov S
Ozbek U
Ozen A
Reisli I
Sayar Eh
Sayitoglu Müge
Simsek Ie
Torun Sh
Uygun D
Uygun V
Yin Ng
Yucel E
Publication date: 01/01/2020

İstanbul Üniversitesi Açık Erişim Sistemi

The Two Caenorhabditis elegans UDP-Glucose:Glycoprotein Glucosyltransferase Homologues Have Distinct Biological Functions

Author: A Denzel
A Hayashi
A Wright
Armando J. Parodi
AV Samuelson
BJ Park
C Mello
CG Parker
D Lee
Diane Bassham
F Fernandez
F Rauch
F Urano
FS Fernandez
GA Walker
H Coe
I Conte
JJ Caramelo
K Xu
K Yamamoto
L Timmons
Lucila I. Buzzi
M Calfon
M Guerin
ME Caruso
Olga A. Castro
P Meaden
S Brenner
S Fanchiotti
S Moreno
SE Trombetta
Sergio H. Simonetta
SM Arnold
W Lee
X Shen
Z Firtina
Publication venue: Public Library of Science
Publication date: 02/11/2011

The UDP-Glc:glycoprotein glucosyltransferase (UGGT) is the sensor of glycoprotein conformations in the glycoprotein folding quality control as it exclusively glucosylates glycoproteins not displaying their native conformations. Monoglucosylated glycoproteins thus formed may interact with the lectin-chaperones calnexin (CNX) and calreticulin (CRT). This interaction prevents premature exit of folding intermediates to the Golgi and enhances folding efficiency. Bioinformatic analysis showed that in C. elegans there are two open reading frames (F48E3.3 and F26H9.8, referred to here as uggt-1 and uggt-2, respectively) coding for UGGT homologues. Expression of both genes in Schizosaccharomyces pombe mutants devoid of UGGT activity showed that uggt-1 codes for an active UGGT protein (CeUGGT-1). On the other hand, uggt-2 coded for a protein (CeUGGT-2) apparently not displaying a canonical UGGT activity. This protein was essential for viability, although cnx/crt null worms were viable. We constructed transgenic worms carrying the uggt-1 promoter linked to the green fluorescent protein (GFP) coding sequence and found that CeUGGT-1 is expressed in cells of the nervous system. uggt-1 is upregulated under ER stress through the ire-1 arm of the unfolded protein response (UPR). Real-time PCR analysis showed that both uggt-1 and uggt-2 genes are expressed during the entire C. elegans life cycle. RNAi-mediated depletion of CeUGGT-1 but not of CeUGGT-2 resulted in a reduced lifespan and that of CeUGGT-1 and CeUGGT-2 in a developmental delay. We found that both CeUGGT-1 and CeUGGT-2 play a protective role under ER stress conditions, since 10 µg/ml tunicamycin arrested development at the L2/L3 stage of both uggt-1(RNAi) and uggt-2(RNAi) but not of control worms. Furthermore, we found that the role of CeUGGT-2 but not CeUGGT-1 is significant in relieving low ER stress levels in the absence of the ire-1 unfolded protein response signaling pathway. Our results indicate that both C. elegans UGGT homologues have distinct biological functions

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Plasma adhesion and inflammation markers in subjects with impaired and diabetic glucose tolerance

Author: Firtina S.
Konukoglu D.
Serin O.
Publication date: 06/03/2021

İstanbul Üniversitesi Açık Erişim Sistemi