Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
Motivation
The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for
compression and indexing of text data, but the cost of computing the BWT of
very large string collections has prevented these techniques from being widely
applied to the large sets of sequences often encountered as the outcome of DNA
sequencing experiments. In previous work, we presented a novel algorithm that
allows the BWT of human genome scale data to be computed on very moderate
hardware, thus enabling us to investigate the BWT as a tool for the compression
of such datasets.
Results
We first used simulated reads to explore how the level of compression depends
on the error rate, the length of the reads and the level of sampling of the
underlying genome, and to compare choices of second-stage compression
algorithm.
We demonstrate that compression may be greatly improved by a particular
reordering of the sequences in the collection and give a novel `implicit
sorting' strategy that enables these benefits to be realised without the
overhead of sorting the reads. With these techniques, a 45x coverage of real
human genome sequence data compresses losslessly to under 0.5 bits per base,
allowing the 135.3Gbp of sequence to fit into only 8.2Gbytes of space (trimming
a small proportion of low-quality bases from the reads improves the compression
still further).
This is more than 4 times smaller than the size achieved by a standard
BWT-based compressor (bzip2) on the untrimmed reads, but an important further
advantage of our approach is that it facilitates the building of compressed
full text indexes such as the FM-index on large-scale DNA sequence collections.
Comment: Version here is as submitted to Bioinformatics and is the same as the
previously archived version. This submission registers the fact that the
advanced access version is now available at
http://bioinformatics.oxfordjournals.org/content/early/2012/05/02/bioinformatics.bts173.abstract.
Bioinformatics should be considered the original place of publication of
this article; please cite accordingly.
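As a rough illustration of why the BWT helps with compression, the following Python sketch computes the transform by sorting all rotations of a short string, inverts it to confirm that the transform is lossless, and reproduces the bits-per-base figure quoted in the abstract above. It is a naive textbook construction, not the large-scale algorithm the authors describe. The function names, the `$` sentinel, and the reading of Gbp and Gbytes as units of 10^9 are assumptions made here.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic, demo only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The row that ends with the sentinel is the original text plus sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


def bits_per_base(n_bytes: float, n_bases: float) -> float:
    """Compressed size expressed in bits per sequenced base."""
    return 8.0 * n_bytes / n_bases


if __name__ == "__main__":
    print(bwt("banana"))                        # annb$aa
    print(inverse_bwt(bwt("GATTACA")))          # GATTACA
    # Assumes 8.2 Gbytes = 8.2e9 bytes and 135.3 Gbp = 135.3e9 bases.
    print(round(bits_per_base(8.2e9, 135.3e9), 3))
```

The transform tends to group identical symbols into runs when the input is repetitive, which is what a second-stage compressor then exploits.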
A Rapid Cloning Method Employing Orthogonal End Protection
We describe a novel in vitro cloning strategy that combines standard tools in molecular biology with a basic protecting group concept to create a versatile framework for the rapid and seamless assembly of modular DNA building blocks into functional open reading frames. Analogous to chemical synthesis strategies, our assembly design yields idempotent composite synthons amenable to iterative and recursive split-and-pool reaction cycles. As an example, we illustrate the simplicity, versatility and efficiency of the approach by constructing an open reading frame composed of tandem arrays of a human fibronectin type III (FNIII) domain and the von Willebrand Factor A2 domain (VWFA2), as well as chimeric (FNIII)n-VWFA2-(FNIII)n constructs. Although we primarily designed this strategy to accelerate assembly of repetitive constructs for single-molecule force spectroscopy, we anticipate that this approach is equally applicable to the reconstitution and modification of complex modular sequences including structural and functional analysis of multi-domain proteins, synthetic biology or the modular construction of episomal vectors.
Towards Communication-Efficient Quantum Oblivious Key Distribution
Oblivious Transfer, a fundamental problem in the field of secure multi-party
computation, is defined as follows: A database DB of N bits held by Bob is
queried by a user Alice who is interested in the bit DB_b in such a way that
(1) Alice learns DB_b and only DB_b and (2) Bob does not learn anything about
Alice's choice b. While solutions to this problem in the classical domain rely
largely on unproven computational complexity theoretic assumptions, it is also
known that perfect solutions that guarantee both database and user privacy are
impossible in the quantum domain. Jakobi et al. [Phys. Rev. A, 83(2), 022301,
Feb 2011] proposed a protocol for Oblivious Transfer using well known QKD
techniques to establish an Oblivious Key to solve this problem. Their solution
provided a good degree of database and user privacy (using physical principles
like impossibility of perfectly distinguishing non-orthogonal quantum states
and the impossibility of superluminal communication) while being loss-resistant
and implementable with commercial QKD devices (due to the use of SARG04).
However, their Quantum Oblivious Key Distribution (QOKD) protocol requires a
communication complexity of O(N log N). Since modern databases can be extremely
large, it is important to reduce this communication as much as possible. In
this paper, we first suggest a modification of their protocol wherein the
number of qubits that need to be exchanged is reduced to O(N). A subsequent
generalization reduces the quantum communication complexity even further in
such a way that only a few hundred qubits need to be transferred even for
very large databases.
Comment: 7 pages
Weak and strong electronic correlations in Fe superconductors
In this chapter the strength of electronic correlations in the normal phase
of Fe-superconductors is discussed. It will be shown that the agreement between
a wealth of experiments and DFT+DMFT or similar approaches supports a scenario
in which strongly-correlated and weakly-correlated electrons coexist in the
conduction bands of these materials. I will then reverse-engineer the realistic
calculations and justify this scenario in terms of simpler behaviors easily
interpreted through model results. All pieces come together to show that Hund's
coupling, besides being responsible for the electronic correlations even in
the absence of a strong Coulomb repulsion, is also the origin of a subtle emergent
behavior: orbital decoupling. Indeed, Hund's exchange decouples the charge
excitations in the different iron orbitals involved in the conduction bands,
thus causing an independent tuning of the degree of electronic correlation in
each one of them. The latter becomes sensitive almost only to the offset of the
orbital population from half-filling, where a Mott insulating state is
invariably realized at these interaction strengths. Depending on the difference
in orbital population, a different 'Mottness' affects each orbital, and is thus
reflected in the conduction bands and in the Fermi surfaces depending on the
orbital content.
Comment: Book Chapter
Practical private database queries based on a quantum key distribution protocol
Private queries allow a user Alice to learn an element of a database held by
a provider Bob without revealing which element she was interested in, while
limiting her information about the other elements. We propose to implement
private queries based on a quantum key distribution protocol, with changes only
in the classical post-processing of the key. This approach makes our scheme
both easy to implement and loss-tolerant. While unconditionally secure private
queries are known to be impossible, we argue that an interesting degree of
security can be achieved, relying on fundamental physical principles instead of
unverifiable security assumptions in order to protect both user and database.
We think that there is scope for such practical private queries to become
another remarkable application of quantum information in the footsteps of
quantum key distribution.
Comment: 7 pages, 2 figures, new and improved version, clarified claims,
expanded security discussion
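The abstract above does not spell out the classical post-processing, so the following Python sketch shows only one plausible final step, stated as an assumption: Bob holds a full key, Alice knows a single key bit at a position Bob does not know, and a publicly announced shift aligns that bit with the database element she wants. The function names and the use of random bits in place of a quantum-generated key are hypothetical.

```python
import secrets


def bob_encrypt(database, key, shift):
    """Bob XORs each database bit with the key cyclically shifted by `shift`."""
    n = len(database)
    return [database[i] ^ key[(i + shift) % n] for i in range(n)]


def alice_decrypt(ciphertext, wanted, known_bit):
    """Alice recovers one database bit using the single key bit she knows."""
    return ciphertext[wanted] ^ known_bit


if __name__ == "__main__":
    n = 16
    database = [secrets.randbelow(2) for _ in range(n)]
    key = [secrets.randbelow(2) for _ in range(n)]  # stand-in for the shared key

    known_pos = secrets.randbelow(n)   # position of the key bit Alice knows
    wanted = 5                         # index of the database bit she wants
    shift = (known_pos - wanted) % n   # announced publicly by Alice

    ciphertext = bob_encrypt(database, key, shift)
    assert alice_decrypt(ciphertext, wanted, key[known_pos]) == database[wanted]
```

Because Bob does not know which key position Alice holds, the shift alone does not tell him which element she is after; how well this holds in practice depends on the security analysis in the paper itself.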
Competition of crystal field splitting and Hund's rule coupling in two-orbital magnetic metal-insulator transitions
Competition of crystal field splitting and Hund's rule coupling in magnetic
metal-insulator transitions of the half-filled two-orbital Hubbard model is
investigated by multi-orbital slave-boson mean field theory. We show that with
increasing Coulomb correlation, the system first undergoes a transition from a
paramagnetic (PM) metal to a Néel antiferromagnetic (AFM) Mott
insulator, or a nonmagnetic orbital insulator, depending on the competition of
crystal field splitting and the Hund's rule coupling. The AFM Mott
insulator, PM metal and orbital insulating phases are not, partially and fully
orbitally polarized, respectively. For a small and a finite crystal
field, the orbital insulator is robust. Although the system is nonmagnetic, the
phase boundary of the orbital insulator transition obviously shifts to the
small regime after the magnetic correlations are taken into account. These
results demonstrate that large crystal field splitting favors the formation of
the orbital insulating phase, while large Hund's rule coupling tends to destroy
it, driving the low-spin to high-spin transition.
Comment: 4 pages, 4 figures
The Magic Number Problem for Subregular Language Families
We investigate the magic number problem, that is, the question whether there
exists a minimal n-state nondeterministic finite automaton (NFA) whose
equivalent minimal deterministic finite automaton (DFA) has alpha states, for
all n and alpha satisfying n <= alpha <= 2^n.
A number alpha not satisfying this condition is called a magic number (for n).
It was shown in [11] that no magic numbers exist for general regular languages,
while in [5] trivial and non-trivial magic numbers for unary regular languages
were identified. We obtain similar results for automata accepting subregular
languages like, for example, combinational languages, star-free, prefix-,
suffix-, and infix-closed languages, and prefix-, suffix-, and infix-free
languages, showing that there are only trivial magic numbers, when they exist.
For finite languages we obtain some partial results showing that certain
numbers are non-magic.
Comment: In Proceedings DCFS 2010, arXiv:1008.127
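To make the quantity alpha in the abstract above concrete, the following Python sketch determinizes an NFA with the subset construction and then counts the states of the minimal DFA by partition refinement. It is a minimal sketch under assumptions made here: the function names and the dictionary encoding of the transition relation are hypothetical, the DFA is taken to be complete (a reachable dead state is counted), and the code does not check that the input NFA is itself minimal.

```python
from itertools import chain


def minimal_dfa_size(alphabet, delta, start, accepting):
    """Number of states of the minimal complete DFA equivalent to an NFA.

    delta maps (state, symbol) to a set of successor states; missing
    entries mean no transition.
    """
    accepting = frozenset(accepting)
    start_set = frozenset([start])
    seen = {start_set}
    todo = [start_set]
    trans = {}
    # Subset construction restricted to reachable subsets.
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(
                chain.from_iterable(delta.get((q, a), ()) for q in subset)
            )
            trans[subset, a] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # Moore-style refinement: split blocks until the partition is stable.
    block = {s: int(bool(s & accepting)) for s in seen}
    while True:
        ids = {}
        refined = {}
        for s in seen:
            sig = (block[s],) + tuple(block[trans[s, a]] for a in alphabet)
            refined[s] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = refined


def nth_from_end_nfa(n):
    """(n+1)-state NFA over {a, b}: the n-th symbol from the end is 'a'."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for q in range(1, n):
        delta[q, "a"] = {q + 1}
        delta[q, "b"] = {q + 1}
    return "ab", delta, 0, {n}


if __name__ == "__main__":
    print(minimal_dfa_size(*nth_from_end_nfa(3)))  # 8
```

For this 4-state NFA the minimal DFA has 8 states, so alpha = 8 is reached for n = 4; the magic number problem asks which values in the allowed range can never be reached in this way.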
Experimental validation of 4D log file-based proton dose reconstruction for interplay assessment considering amplitude-sorted 4DCTs
Purpose
The unpredictable interplay between dynamic proton therapy delivery and target motion in the thorax can lead to severe dose distortions. A fraction-wise four-dimensional (4D) dose reconstruction workflow allows for the assessment of the applied dose after patient treatment while considering the actual beam delivery sequence extracted from machine log files, the recorded breathing pattern and the geometric information from a 4D computed tomography scan (4DCT). Such an algorithm capable of accounting for amplitude-sorted 4DCTs was implemented and its accuracy as well as its sensitivity to input parameter variations was experimentally evaluated.
Methods
An anthropomorphic thorax phantom with a movable insert containing a target surrogate and a radiochromic film was irradiated with a monoenergetic field for various 1D target motion forms (sin, sin(4)) and peak-to-peak amplitudes (5/10/15/20/30 mm). The measured characteristic film dose distributions were compared to the respective sections in the 4D reconstructed doses using a 2D gamma-analysis (3 mm, 3%); gamma-pass rates were derived for different dose grid resolutions (1 mm/3 mm) and deformable image registrations (DIR, automatic/manual) applied during the 4D dose reconstruction process. In an additional analysis, the sensitivity of reconstructed dose distributions against potential asynchronous timing of the motion and machine log files was investigated for both a monoenergetic field and more realistic 4D robustly optimized fields by artificially introduced offsets of +/- 1/5/25/50/250 ms. The resulting dose distributions with asynchronized log files were compared to those with synchronized log files by means of a 3D gamma-analysis (1 mm, 1%) and the evaluation of absolute dose differences.
Results
The induced characteristic interplay patterns on the films were well reproduced by the 4D dose reconstruction with 2D gamma-pass rates >= 95% for almost all cases with motion magnitude