
Wear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket

Authors: A. Ben-Aroya
A. Kirsch
A.M. Frieze
D. Fotakis
E. Lehman
H.-S.P. Wong
J. Schmidt-Pruzan
L. Devroye
M. Dietzfelbinger
M. Karoński
P. Pavan
R. Bez
R. Pagh
S. Irani
Y. Arbitman
Y. Azar
Y.-H. Chang
Publication date: 01/01/2014

We study wear-leveling techniques for cuckoo hashing, showing that it is possible to achieve a memory wear bound of $\log\log n + O(1)$ after the insertion of $n$ items into a table of size $Cn$, for a suitable constant $C$, using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically, showing that in practice it significantly improves on the memory wear performance of classic cuckoo hashing and linear probing. Comment: 13 pages, 1 table, 7 figures; to appear at the 13th Symposium on Experimental Algorithms (SEA 2014).
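To make "memory wear" concrete, the sketch below is a plain two-choice cuckoo hash table that counts how many times each cell is written. It is a baseline for measuring wear, assuming wear means the number of writes to a cell; it does not implement the wear-minimizing method from the paper. The class name, the BLAKE2b-derived hash functions and the `max_kicks` limit are illustrative assumptions.

```python
import hashlib


class WearCountingCuckoo:
    """Classic two-choice cuckoo hashing with a per-cell write counter.

    Baseline sketch only (NOT the paper's wear-minimizing variant);
    "wear" is assumed to be the number of writes to a cell.
    """

    def __init__(self, size, max_kicks=500):
        self.size = size
        self.cells = [None] * size
        self.wear = [0] * size  # number of writes to each cell
        self.max_kicks = max_kicks

    def _slots(self, key):
        # Two hash functions derived from BLAKE2b with different prefixes
        # (an illustrative choice, not taken from the paper).
        out = []
        for i in (0, 1):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            out.append(int.from_bytes(digest, "big") % self.size)
        return out[0], out[1]

    def _write(self, idx, key):
        self.cells[idx] = key
        self.wear[idx] += 1

    def __contains__(self, key):
        a, b = self._slots(key)
        return self.cells[a] == key or self.cells[b] == key

    def insert(self, key):
        if key in self:
            return
        a, b = self._slots(key)
        for p in (a, b):
            if self.cells[p] is None:
                self._write(p, key)
                return
        # Both candidate cells are full: walk a cuckoo eviction path.
        item, pos = key, a
        for _ in range(self.max_kicks):
            evicted = self.cells[pos]
            self._write(pos, item)
            item = evicted
            x, y = self._slots(item)
            pos = y if pos == x else x  # the evicted item's other cell
            if self.cells[pos] is None:
                self._write(pos, item)
                return
        # A full implementation would rehash here; `item` is left homeless.
        raise RuntimeError("cuckoo path too long; rehash needed")
```

With a table of size $4n$ (that is, $C = 4$, an arbitrary choice), one can insert `range(n)` and inspect `max(table.wear)` to see the worst-case wear that a wear-leveling scheme would aim to reduce.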

arXiv.org e-Print Archive

Crossref

CloudTree: A Library to Extend Cloud Services for Trees

Authors: Ji Yanqing
Scholer Jesse
Tian Yun
Xu Bojian
Publication date: 30/04/2015

In this work, we propose a library that enables a cloud client to create and manage tree data structures on a cloud. As a proof of concept, we implement a new cloud service, CloudTree. With CloudTree, users can organize big data into tree data structures of their choice that are physically stored in a cloud. We use caching, prefetching, and aggregation techniques in the design and implementation of CloudTree to enhance performance. We have implemented Binary Search Trees (BST) and Prefix Trees as the current services in CloudTree and have benchmarked their performance using the Amazon Cloud. The ideas and techniques used in the design and implementation of the BST and prefix tree are generic and can thus also be applied to other types of trees, such as B-trees, and to other link-based data structures, such as linked lists and graphs. Preliminary experimental results show that CloudTree is useful and efficient for various big data applications.
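The caching idea can be illustrated with a small sketch: a binary search tree whose nodes live in a remote key-value store, accessed through a client-side LRU cache with write-through updates. This is not CloudTree's API; `SimulatedCloudStore` is a hypothetical in-memory stand-in for cloud storage that counts round trips, and the node layout `(key, left_id, right_id)` is an assumption made for illustration. Prefetching and aggregation are not modeled.

```python
from collections import OrderedDict


class SimulatedCloudStore:
    """Hypothetical stand-in for remote storage; counts round trips."""

    def __init__(self):
        self._data = {}
        self.gets = 0
        self.puts = 0

    def get(self, node_id):
        self.gets += 1
        return self._data[node_id]

    def put(self, node_id, node):
        self.puts += 1
        self._data[node_id] = node


class CachedRemoteBST:
    """BST with nodes (key, left_id, right_id) kept in a remote store,
    fronted by a client-side LRU cache with write-through updates."""

    def __init__(self, store, cache_size=64):
        self.store = store
        self.cache = OrderedDict()
        self.cache_size = cache_size
        self.root = None
        self._next_id = 0

    def _remember(self, node_id, node):
        self.cache[node_id] = node
        self.cache.move_to_end(node_id)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used

    def _load(self, node_id):
        if node_id in self.cache:
            self.cache.move_to_end(node_id)
            return self.cache[node_id]
        node = self.store.get(node_id)  # cache miss: one remote round trip
        self._remember(node_id, node)
        return node

    def _save(self, node_id, node):
        self.store.put(node_id, node)  # write-through keeps the store current
        self._remember(node_id, node)

    def _new_node(self, key):
        node_id = self._next_id
        self._next_id += 1
        self._save(node_id, (key, None, None))
        return node_id

    def insert(self, key):
        if self.root is None:
            self.root = self._new_node(key)
            return
        node_id = self.root
        while True:
            k, left, right = self._load(node_id)
            if key == k:
                return  # duplicates are ignored
            child = left if key < k else right
            if child is None:
                new_id = self._new_node(key)
                updated = (k, new_id, right) if key < k else (k, left, new_id)
                self._save(node_id, updated)
                return
            node_id = child

    def contains(self, key):
        node_id = self.root
        while node_id is not None:
            k, left, right = self._load(node_id)
            if key == k:
                return True
            node_id = left if key < k else right
        return False
```

Comparing `store.gets` for a large and a tiny `cache_size` shows how caching trades client memory for fewer remote round trips.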

arXiv.org e-Print Archive

Crossref


Comprehensive sequence-to-function mapping of cofactor-dependent RNA catalysis in the glmS ribozyme.

Authors: Andreasson Johan OL
Block Steven M
Greenleaf William J
Savinov Andrew
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

Massively parallel, quantitative measurements of biomolecular activity across sequence space can greatly expand our understanding of RNA sequence-function relationships. We report the development of an RNA-array assay to perform such measurements and its application to a model RNA: the core glmS ribozyme riboswitch, which performs a ligand-dependent self-cleavage reaction. We measure the cleavage rates for all possible single and double mutants of this ribozyme across a series of ligand concentrations, determining kcat and KM values for active variants. These systematic measurements suggest that evolutionary conservation in the consensus sequence is driven by maintenance of the cleavage rate. Analysis of double-mutant rates and associated mutational interactions produces a structural and functional mapping of the ribozyme sequence, revealing the catalytic consequences of specific tertiary interactions and allowing us to infer structural rearrangements that permit certain sequence variants to maintain activity.
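To show how kcat and KM can be extracted from rates measured across ligand concentrations, the sketch below assumes the standard Michaelis-Menten saturation form $k_{obs} = k_{cat}[L]/(K_M + [L])$ and fits it with the Hanes-Woolf linearization $[L]/k_{obs} = [L]/k_{cat} + K_M/k_{cat}$ by ordinary least squares. The abstract does not say which model or fitting procedure was used; in practice nonlinear least squares is usually preferred because linearizations distort measurement noise. The function name is hypothetical.

```python
def fit_michaelis_menten(concs, rates):
    """Estimate (kcat, KM) assuming k_obs = kcat * [L] / (KM + [L]).

    Uses the Hanes-Woolf linearization [L]/k_obs = [L]/kcat + KM/kcat
    and an ordinary least-squares line. Illustrative sketch only.
    """
    xs = list(concs)
    ys = [c / r for c, r in zip(xs, rates)]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two ligand concentrations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # equals 1 / kcat
    intercept = mean_y - slope * mean_x  # equals KM / kcat
    kcat = 1.0 / slope
    km = intercept * kcat
    return kcat, km
```

Concentrations and rates only need consistent units; KM is returned in the concentration unit and kcat in the rate unit.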

eScholarship - University of California

Whole-genome analysis of Fusarium graminearum insertional mutants identifies virulence associated genes and unmasks untagged chromosomal deletions

Authors: Hammond-Kosack K. E.
Hassani-Pak K.
King R.
Urban M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

BACKGROUND: Identifying the pathogen virulence genes required to cause disease is crucial to understanding the mechanisms underlying the pathogenic process. Plasmid insertion mutagenesis of fungal protoplasts is frequently used for this purpose in filamentous ascomycetes. After transformation, the mutant population is screened for loss of virulence to a specific plant or animal host. Identifying the insertion event has previously met with varying degrees of success, with outcomes ranging from a cleanly disrupted gene with minimal deletion of nucleotides at the insertion point to multiple-copy insertion events and large deletions of chromosomal regions. Extensive mutant collections currently exist in laboratories worldwide for which it was hitherto impossible to identify all the affected genes. RESULTS: We used a whole-genome sequencing (WGS) approach with Illumina HiSeq 2000 technology to investigate DNA tag insertion points and chromosomal deletion events in mutagenised, reduced-virulence F. graminearum isolates identified in disease tests on wheat (Triticum aestivum). We developed the FindInsertSeq workflow to localise the DNA tag insertions to the nucleotide level. The workflow was tested using four mutants showing evidence of single and multi-copy insertions in DNA blot analysis. FindInsertSeq was able to identify both single and multi-copy concatenation insertion sites. By comparing sequencing coverage, we observed unexpected molecular recombination events, such as large tagged and untagged chromosomal deletions and DNA amplification, in three of the analysed mutants. A random data sampling approach revealed the minimum genome coverage required to survey the F. graminearum genome for alterations. CONCLUSIONS: This study demonstrates that whole-genome re-sequencing to 22-fold genome coverage is an efficient tool to characterise single and multi-copy insertion mutants in the filamentous ascomycete Fusarium graminearum. In some cases insertion events are accompanied by large untagged chromosomal deletions, while in other cases a straightforward insertion event could be confirmed. The FindInsertSeq analysis workflow presented in this study enables researchers to efficiently characterise insertion and deletion mutants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1412-9) contains supplementary material, which is available to authorized users.
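Detecting large deletions "by comparing sequencing coverage" can be sketched as scanning per-base read depth for long stretches where coverage collapses. The function below is not part of FindInsertSeq; it is a hypothetical helper that assumes per-base depths have already been computed for one chromosome. The `threshold` and `min_len` defaults are arbitrary illustrative values.

```python
def low_coverage_runs(depth, threshold=1, min_len=100):
    """Return half-open (start, end) intervals where per-base depth stays
    below `threshold` for at least `min_len` consecutive positions."""
    runs = []
    start = None
    for i, d in enumerate(depth):
        if d < threshold:
            if start is None:
                start = i  # entering a low-coverage stretch
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(depth) - start >= min_len:
        runs.append((start, len(depth)))  # stretch extends to the end
    return runs
```

In a real analysis the threshold would be set relative to the genome-wide mean depth (here, roughly 22x), and candidate intervals would be compared against the wild-type or parental strain.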

Springer - Publisher Connector

PubMed Central

Rothamsted Repository

QPath: a method for querying pathways in a protein-protein interaction network

Authors: Ruppin Eytan
Segal Daniel
Sharan Roded
Shlomi Tomer
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Sequence comparison is one of the most prominent tools in biological research, and is instrumental in studying gene function and evolution. The rapid development of high-throughput technologies for measuring protein interactions calls for extending this fundamental operation to the level of pathways in protein networks. RESULTS: We present a comprehensive framework for protein network searches using pathway queries. Given a linear query pathway and a network of interest, our algorithm, QPath, efficiently searches the network for homologous pathways, allowing both insertions and deletions of proteins in the identified pathways. Matched pathways are automatically scored according to their variation from the query pathway in terms of the protein insertions and deletions they employ, the sequence similarity of their constituent proteins to the query proteins, and the reliability of their constituent interactions. We applied QPath to systematically infer protein pathways in fly using an extensive collection of 271 putative pathways from yeast. QPath identified 69 conserved pathways whose members were both functionally enriched and coherently expressed. The resulting pathways tended to preserve the function of the original query pathways, allowing us to derive a first annotated map of conserved protein pathways in fly. CONCLUSION: Pathway homology searches using QPath provide a powerful approach for identifying biologically significant pathways and inferring their function. The growing volume of protein-interaction data in public databases underscores the importance of our network querying framework for mining protein network data.
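The core of a pathway query can be illustrated with a much simpler dynamic program than QPath itself: find the highest-scoring walk $v_1 \dots v_k$ in the network such that each $v_i$ is similar to the $i$-th query protein and consecutive proteins interact. This sketch ignores insertions and deletions, does not weight interaction reliability, and does not prevent a protein from appearing twice; the published algorithm handles these aspects. The function name and the input formats are assumptions.

```python
def best_matching_walk(query, graph, sim):
    """Highest-scoring walk v1..vk in `graph` with v_i matched to query[i].

    query: list of query protein names, in pathway order
    graph: dict mapping each network protein to its interaction partners
    sim:   dict mapping (query_protein, network_protein) -> similarity score;
           absent pairs are treated as non-matching
    Returns (score, walk), or (None, []) if no walk matches.
    """
    if not query:
        return None, []
    # score[v]: best total score of a walk matching the query prefix, ending at v
    score = {v: sim[(query[0], v)] for v in graph if (query[0], v) in sim}
    back = [{}]  # back[i][v]: predecessor of v at query position i
    for q in query[1:]:
        new_score, prev = {}, {}
        for u, s in score.items():
            for v in graph.get(u, ()):
                if (q, v) not in sim:
                    continue
                cand = s + sim[(q, v)]
                if v not in new_score or cand > new_score[v]:
                    new_score[v], prev[v] = cand, u
        score = new_score
        back.append(prev)
    if not score:
        return None, []
    end = max(score, key=score.get)
    walk = [end]
    for prev in reversed(back[1:]):
        walk.append(prev[walk[-1]])
    walk.reverse()
    return score[end], walk
```

The running time is linear in the query length times the number of network edges, which is what makes a layered dynamic program attractive before adding gap handling.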

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing

Authors: André Gilles
AR Quinlan
Christopher W. Wheat
D Hamilton
DA Hahn
Emese Meglécz
F Saeed
IW Saunders
J. C. Dohm
Jean-François Martin
JM Aury
JS Reis-Filho
KJ Hoff
KM Wegner
M Lynch
M Margulies
MA Larkin
Maxime Galan
Nicolas Pech
P McCullagh
PJ Campbell
SF Altschul
SM Huse
Steve Hoffmann
Stéphanie Ferreira
Susan M Huse
Sverker Lundin
Thibaut Malausa
V Kunin
W Babik
XiaoGuang Zhou
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error correction are based on an initial analysis of experimental sequences from the older GS20 system by Huse et al. in 2007. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments. Results: We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% at certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables. Conclusions: The pattern identified here calls for the use of internal controls and error-correcting base callers, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish between low-frequency alleles and errors.
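A mean error rate and its positional distribution can be computed from read-versus-reference alignments along the lines of the sketch below. It is an illustration, not the authors' pipeline: it assumes pairwise alignments are already available as equal-length gapped strings, counts a mismatch, an insertion or a deletion as one error each, and indexes positions by alignment column, which only approximates position in the read.

```python
from collections import defaultdict


def error_profile(alignments):
    """Overall and per-column error rates from pairwise read/reference alignments.

    alignments: iterable of (read_aln, ref_aln) gapped strings of equal length,
    with '-' marking gaps. A mismatch, an insertion (gap in the reference) or a
    deletion (gap in the read) each count as one error at that column.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for read, ref in alignments:
        if len(read) != len(ref):
            raise ValueError("aligned strings must have equal length")
        for col, (r, f) in enumerate(zip(read, ref)):
            if r == "-" and f == "-":
                continue  # padding column, not an aligned base
            totals[col] += 1
            if r != f:
                errors[col] += 1
    total = sum(totals.values())
    overall = sum(errors.values()) / total if total else 0.0
    per_column = [errors[c] / totals[c] for c in sorted(totals)]
    return overall, per_column
```

Splitting the error counts further by error type or by homopolymer context would be the natural next step toward the factor analysis described in the abstract.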

Crossref

Springer - Publisher Connector

HAL AMU

Directory of Open Access Journals

Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome

Author: Pereira Vini
Publication venue: BioMed Central
Publication date: 29/09/2004

BACKGROUND: Genome evolution and size variation in multicellular organisms are profoundly influenced by the activity of retrotransposons. In higher eukaryotes with compact genomes, retrotransposons are found in lower copy numbers than in larger genomes, which could be due either to suppression of transposition or to elimination of insertions; they are also non-randomly distributed along the chromosomes. The evolutionary mechanisms constraining retrotransposon copy number and chromosomal distribution are still poorly understood. RESULTS: I investigated the evolutionary dynamics of long terminal repeat (LTR)-retrotransposons in the compact Arabidopsis thaliana genome, using an automated method for obtaining genome-wide age and physical distribution profiles for different groups of elements, and then comparing the distributions of young and old insertions. Elements of the Pseudoviridae family insert randomly along the chromosomes and have been recently active, but insertions tend to be lost from euchromatic regions, where they are less likely to fix, with a half-life estimated at approximately 470,000 years. In contrast, members of the Metaviridae (particularly Athila) preferentially target heterochromatin, and were more active in the past. CONCLUSION: Diverse evolutionary mechanisms have constrained both the copy number and chromosomal distribution of retrotransposons within a single genome. In A. thaliana, their non-random genomic distribution is due to both selection against insertions in euchromatin and preferential targeting of heterochromatin. Constant turnover of euchromatic insertions and a decline in activity for the elements that target heterochromatin have both limited the contribution of retrotransposon DNA to genome size expansion in A. thaliana.
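A half-life for insertion loss can be related to an age distribution with a simple model: if insertions arise at a constant rate and each is lost independently at rate $\lambda$, the number of surviving insertions of age $a$ decays as $N(a) = N_0 e^{-\lambda a}$, and the half-life is $T_{1/2} = \ln 2 / \lambda$. The sketch below fits $\ln N$ against age by least squares under exactly these assumptions. The abstract does not describe the estimation procedure, so this is not necessarily how the 470,000-year figure was obtained, and the function name is hypothetical.

```python
import math


def half_life_from_age_counts(ages, counts):
    """Estimate the half-life of insertions from surviving counts per age bin.

    Assumes a constant insertion rate and exponential loss, so that
    counts follow N(a) = N0 * exp(-lambda * a); fits ln(count) vs age.
    """
    xs = list(ages)
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    decay_rate = -slope  # lambda, per unit of age (e.g. per year)
    return math.log(2) / decay_rate
```

With real data, insertion ages would themselves be estimates (for example from divergence between the two LTRs of an element), which adds uncertainty not captured here.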

Springer - Publisher Connector

PubMed Central

A practical guide to molecular docking and homology modelling for medicinal chemists

Authors: Levonis Stephan M
Lohning Anna E
Schweiker Stephanie S
Williams-Noonan Billy
Publication venue: Bentham Science Publishers Ltd.
Publication date: 30/01/2017

Bond University Research Portal