
16 research outputs found

ReCoil - an algorithm for compression of extremely large datasets of DNA data

Author: Adam L Buchsbaum
Alok Aggarwal
Bin Ma
Christos Kozanitis
Daniel D Sommer
David Eppstein
M Waterman
Markus Fritz Hsi-Yang
P Ferragina
Paolo Ferragina
R Dementiev
Roman Dementiev
Scott Christley
Veli Mäkinen
Vladimir Yanovsky
W Timothy White
Wenyu Zhang
Xin Chen
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2011

The growing volume of generated DNA sequencing data makes the problem of its long-term storage increasingly important. In this work we present ReCoil, an I/O-efficient external-memory algorithm designed for compression of very large collections of short-read DNA data. Typically, each position of a DNA sequence is covered by multiple reads of a short-read dataset, and our algorithm makes use of the resulting redundancy to achieve a high compression rate.

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

EuroEXA - D2.6: Final ported application software

This document describes the software of the EuroEXA applications as ported to the single CRDB testbed, and it discusses the experience gained from the porting and optimization activities, which should be actively taken into account in future redesign and optimization. This document accompanies the ported application software, found in the EuroEXA private repository (https://github.com/euroexa). In particular, this document describes the status of the software for each of the EuroEXA applications, sketches the redesign and optimization strategy for each application, and discusses the issues and difficulties faced during the porting activities and the related lessons learned. A few preliminary evaluation results are presented; however, the full evaluation will be discussed in deliverable 2.8.

OA@INAF - Istituto Nazionale di Astrofisica

ELGAR—a European Laboratory for Gravitation and Atom-interferometric Research

Author: Abend Sven
Amaro Seoane Pau
Badaracco Francesca
Beaufils Q
Bertoldi Andrea
Bongs Kai
Bouyer P
Braxmaier Claus
Canuel B
Chaibi W
Christensen N
F. Sopuerta Carlos
Fitzek F
Flouris G
Gaaloul Naceur
Gaffet S
Garrido Alzar Carlos L.
Geiger Remi
Guellati-Khelifa S
Hammerer Klemens
Harms Jan
Hinderer J
Holynski Michael
Junca J
Katsanevas S
Klempt Carsten
Kozanitis Christos
Krutzik M
Landragin A
Leykauf B
Lien Y-H
Loriani Sina
Làzaro Roche I
Merlet S
Merzougui M
Nofrarias Miquel
Papadakos Panagiotis
Pereira dos Santos Franck
Peters Achim
Plexousakis Dimitris
Prevedelli Marco
Rasel E M
Rogister Y
Rosat S
Roura Albert
Sabulsky Dylan
Schkolnik Vladimir
Schlippert Dennis
Schubert C
Sidorenkov Leonid
Siemß Jan-Niclas
Sorrentino F
Struckmann C
Tino Guglielmo M.
Tsagkatakis Grigorios
Viceré Andrea
von Klitzing Wolf
Woerner L
Zou Xinhao
Publication venue: IOP Publishing
Publication date: 09/11/2019

Gravitational waves (GWs) were observed for the first time in 2015, one century after Einstein predicted their existence. There is now growing interest in extending the detection bandwidth to low frequency. The scientific potential of multi-frequency GW astronomy is enormous, as it would provide a more complete picture of cosmic events and mechanisms. This is a unique and entirely new opportunity for the future of astronomy, the success of which depends upon the decisions being made on existing and new infrastructures. The prospect of combining observations from the future space-based instrument LISA with third-generation ground-based detectors will open the way toward multi-band GW astronomy, but will leave the infrasound (0.1–10 Hz) band uncovered. GW detectors based on matter-wave interferometry promise to fill this sensitivity gap. We propose the European Laboratory for Gravitation and Atom-interferometric Research (ELGAR), an underground infrastructure based on the latest progress in atomic physics, to study space–time and gravitation with the primary goal of detecting GWs in the infrasound band. ELGAR will directly inherit from large research facilities now being built in Europe for the study of large-scale atom interferometry and will drive new pan-European synergies from top research centers developing quantum sensors. ELGAR will measure GW radiation in the infrasound band with a peak strain sensitivity of $3.3\times10^{-22}/\sqrt{\mathrm{Hz}}$ at 1.7 Hz. The antenna will have an impact on diverse fundamental and applied research fields beyond GW astronomy, including gravitation, general relativity, and geology. AB acknowledges support from the ANR (project EOSBECMR), IdEx Bordeaux - LAPHIA (project OE-TWR), the QuantERA ERA-NET (project TAIOL) and the Aquitaine Region (projects IASIG3D and USOFF). XZ thanks the China Scholarships Council (No. 201806010364) program for financial support. JJ thanks ‘Association Nationale de la Recherche et de la Technologie’ for financial support (No. 2018/1565). SvAb, NG, SL, EMR, DS, and CS gratefully acknowledge support by the German Space Agency (DLR) with funds provided by the Federal Ministry for Economic Affairs and Energy (BMWi) due to an enactment of the German Bundestag under Grants No. DLR 50WM1641 (PRIMUS-III), 50WM1952 (QUANTUS-V-Fallturm), and 50WP1700 (BECCAL), 50WM1861 (CAL), 50WM2060 (CARIOQA) as well as 50RK1957 (QGYRO). SvAb, NG, SL, EMR, DS, and CS gratefully acknowledge support by ‘Niedersächsisches Vorab’ through the ‘Quantum- and Nano-Metrology (QUANOMET)’ initiative within the project QT3, and through ‘Förderung von Wissenschaft und Technik in Forschung und Lehre’ for the initial funding of research in the new DLR-SI Institute, the CRC 1227 DQ-mat within the projects A05 and B07. DS gratefully acknowledges funding by the Federal Ministry of Education and Research (BMBF) through the funding program Photonics Research Germany under contract number 13N14875. RG acknowledges Ville de Paris (Emergence programme HSENS-MWGRAV), ANR (project PIMAI) and the Fundamental Physics and Gravitational Waves (PhyFOG) programme of Observatoire de Paris for support.
We also acknowledge networking support by the COST actions GWverse CA16104 and AtomQT CA16221 (Horizon 2020 Framework Programme of the European Union). The work was also supported by the German Space Agency (DLR) with funds provided by the Federal Ministry for Economic Affairs and Energy (BMWi) due to an enactment of the German Bundestag under Grant Nos. 50WM1556, 50WM1956 and 50WP1706 as well as through the DLR Institutes DLR-SI and DLR-QT. PA-S, MN, and CFS acknowledge support from contracts ESP2015-67234-P and ESP2017-90084-P from the Ministry of Economy and Business of Spain (MINECO), and from contract 2017-SGR-1469 from AGAUR (Catalan government). SvAb, NG, SL, EMR, DS, and CS gratefully acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC-2123 QuantumFrontiers - 390837967 (B2) and CRC1227 ‘DQ-mat’ within projects A05, B07 and B09. LAS thanks Sorbonne Universités (Emergence project LORINVACC) and Conseil Scientifique de l'Observatoire de Paris for funding. This work was realized with the financial support of the French State through the ‘Agence Nationale de la Recherche’ (ANR) in the frame of the ‘MRSEI’ program (Pre-ELGAR ANR-17-MRS5-0004-01) and the ‘Investissement d'Avenir’ program (Equipex MIGA: ANR-11-EQPX-0028, IdEx Bordeaux - LAPHIA: ANR-10-IDEX-03-02). Peer Reviewe

HAL AMU

University of Birmingham Research Portal

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

MPG.PuRe

Institute of Transport Research:Publications

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Urbino

DepositOnce

HAL-INSU

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

HAL-OBSPM

Institutionelles Repositorium der Leibniz Universität Hannover

univOAK

Open Access Repository of Ulm University

Compressing and Querying the Human Genome

Author: Kozanitis Christos A.
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

With high-throughput DNA sequencing costs dropping below $1000 for human genomes, data storage, retrieval and analysis are the major bottlenecks in biological studies. In order to address the large-data challenges in genomics, this thesis advocates: 1) a highly efficient read-level compression of the data, achieved through reference-based compression by a tool called SLIMGENE, and 2) a clean separation between evidence collection and inference in variant calling, achieved through our Genome Query Language (GQL), which allows for the rapid collection of evidence needed for calling variants. The first contribution, SLIMGENE, introduces a set of domain-specific lossless compression schemes that achieve over 40x compression of the ASCII representation of short reads, outperforming bzip2 by over 6x. Including quality values, we show 5x compression using less running time than bzip2. Secondly, given the discrepancy between the compression factor obtained with and without quality values, we initiate the study of using lossy transformations of the quality values. Specifically, we show that a lossy quality value quantization results in 14x compression but has minimal impact on downstream applications like SNP calling that use quality values. The second contribution, GQL, introduces a novel framework for querying large genomic datasets. We provide a number of cases to showcase the use of GQL for complex evidence collection, such as the evidence for large structural variations. Specifically, typical GQL queries can be written in 5-10 lines of code and search large datasets (100GB) in only a few minutes on a cheap desktop computer. We show that GQL is faster and more concise than writing equivalent queries in existing frameworks such as GATK. We show that existing callers can be sped up by an order of magnitude by using GQL to retrieve evidence. We also show how GQL output can be visualized using the UCSC browse

Ezid

eScholarship - University of California

Prognostic significance of atypical leukemic cell morphology in chronic lymphocytic leukemia.

Author: Christos Kozanitis (3571421)
David Patterson (188114)
Publication date: 01/01/2016

CHARLES UNIVERSITY IN PRAGUE Faculty of Pharmacy in Hradec Králové Department of Biological and Medical Sciences Study program: Health Care Bioanalytics Candidate: Bc. Nikola Fučíková Supervisor: doc. MUDr. Lukáš Smolej, Ph.D. Title of diploma thesis: Prognostic significance of atypical leukemic cell morphology in chronic lymphocytic leukemia. The aim of this thesis is to evaluate the prognostic significance of atypical cell morphology and smudge cells in patients with untreated chronic lymphocytic leukemia. We performed differential leukocyte counts and classified lymphocytes as typical or atypical in a cohort of 101 patients (median age, 66 years; males, 69%; Rai III/IV stages, 18%). For atypical CLL, we used the 15% threshold, and 59% of patients were classified as atypical CLL (aCLL). For smudge cells, we chose the 30% threshold, and 33% of patients were classified as smudge-cell positive. Patients in early clinical Rai stage (0) had a significantly higher number of smudge cells (p=0.04). We didn't find a significant association between aCLL or smudge cells and modern prognostic indicators. We didn't find a relationship between aCLL and the time to first-line therapy (p=0.394). However, patients with aCLL had a significantly shorter overall survival (p=0.0397). There was a trend toward shorter..

National Repository of Grey Literature

The Francis Crick Institute

FPGA based architecture for DNA sequence comparison and database search

Author: Apostolos Dollas
Christos Kozanitis
Euripides Sotiriades
Publication date: 01/01/2006

DNA sequence comparison is a computationally intensive problem, known widely since the competition for human DNA decryption. Database search for DNA sequence comparison is of great value to computational biologists. Several algorithms have been developed and implemented to solve this problem efficiently, but from a user base point of view the BLAST algorithm is the most widely used one. In this paper we present a new architecture for the BLAST algorithm. The new architecture was fully designed, placed and routed. The post place-and-route cycle-accurate simulation, accounting for the I/O, shows a better performance than a cluster of workstations running highly optimized code over identical datasets. The new architecture and detailed performance results are presented in this paper.

CiteSeerX

Crossref

Using Genome Query Language (GQL) to uncover genetic variation

Author: Andrew Heiberg
Christos Kozanitis
George Varghese

Motivation: With high-throughput DNA sequencing costs dropping below $1,000 for human genomes, data storage, retrieval, and analysis are the major bottlenecks in biological studies. In order to address the large-data challenges, we advocate a clean separation between the evidence collection and the inference in variant calling. We define and implement a Genome Query Language (GQL) that allows for the rapid collection of evidence needed for calling variants. Results: We provide a number of cases to showcase the use of GQL for complex evidence collection, such as the evidence for large structural variations. Specifically, typical GQL queries can be written in 5-10 lines of high-level code, and search large data sets (100GB) in minutes. We also demonstrate its complementarity with other variant calling tools. Popular variant calling tools can achieve one order of magnitude speed-up by using GQL to retrieve evidence. Finally, we show how GQL can be used to query and compare multiple data sets. By separating the evidence and inference for variant calling, it frees all variant detection tools from the data-intensive evidence collection and lets them focus on statistical inference. Availability: GQL can be downloaded fro

CiteSeerX

VineTalk: Simplifying Software Access and Sharing of FPGAs in Datacenters

Author: Angelos Bilas
Christi Symeonidou
Christoforos Kachris
Christos Kozanitis
Dimitrios Soudris
Ioannis Stamoulias
Manolis Pavlidakis
Nikolaos Chrysos
Stelios Mavridis
Publication date: 05/09/2017

FPGA-based accelerators are becoming first-class citizens in data centers. Adding FPGAs to data centers can lead to higher compute densities with improved energy efficiency for latency-critical workloads, such as financial applications. However, deployment of FPGAs in data centers is hindered, as both developers and cloud providers face difficulties. Application writers need to deal with FPGA interfacing as well as application logic and algorithms. On the other hand, cloud providers are reluctant to deploy FPGAs at large scale, because FPGAs cannot easily be shared, which results in reduced utilization and questionable ROI. In this paper, we introduce VineTalk, a framework that reduces the programming effort associated with FPGA-based accelerators and FPGA virtualization. We integrate VineTalk with the Xilinx SDAccel development framework and we map it to the Kintex UltraScale FPGA. Our preliminary evaluation with a use case of financial applications shows that VineTalk can offer effective FPGA sharing while introducing less than 4% overhead to application execution time.

Crossref

ZENODO