Compressed Genotyping
Significant volumes of knowledge have been accumulated in recent years
linking subtle genetic variations to a wide variety of medical disorders from
Cystic Fibrosis to mental retardation. Nevertheless, there are still great
challenges in applying this knowledge routinely in the clinic, largely due to
the relatively tedious and expensive process of DNA sequencing. Since the
genetic polymorphisms that underlie these disorders are relatively rare in the
human population, the presence or absence of a disease-linked polymorphism can
be thought of as a sparse signal. Using methods and ideas from compressed
sensing and group testing, we have developed a cost-effective genotyping
protocol. In particular, we have adapted our scheme to a recently developed
class of high throughput DNA sequencing technologies, and assembled a
mathematical framework that has some important distinctions from 'traditional'
compressed sensing ideas in order to address different biological and technical
constraints.
Comment: Submitted to IEEE Transactions on Information Theory - Special Issue
on Molecular Biology and Neuroscience
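As a rough illustration of the group-testing idea behind treating disease-linked polymorphisms as a sparse signal, the sketch below pools samples non-adaptively and decodes with the simple COMP rule: any sample that appears in a negative pool is cleared, and everything left is flagged. It is a generic textbook decoder, not the sequencing-based protocol described above. The random pool design and the function names `design_pools`, `pool_outcomes` and `comp_decode` are hypothetical.

```python
import random

def design_pools(n_samples, n_pools, pools_per_sample, seed=0):
    """Hypothetical random non-adaptive design: each sample joins a fixed number of pools."""
    rng = random.Random(seed)
    pools = [set() for _ in range(n_pools)]
    for s in range(n_samples):
        for p in rng.sample(range(n_pools), pools_per_sample):
            pools[p].add(s)
    return pools

def pool_outcomes(pools, carriers):
    """A pool tests positive iff it contains at least one carrier (boolean OR)."""
    return [bool(pool & carriers) for pool in pools]

def comp_decode(pools, outcomes, n_samples):
    """COMP decoding: clear every sample seen in a negative pool; flag the rest."""
    candidates = set(range(n_samples))
    for pool, positive in zip(pools, outcomes):
        if not positive:
            candidates -= pool
    return candidates
```

Because every pool containing a true carrier tests positive, COMP never misses a carrier; the price is possible false positives when there are too few pools for the number of carriers.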
A single-photon sampling architecture for solid-state imaging
Advances in solid-state technology have enabled the development of silicon
photomultiplier sensor arrays capable of sensing individual photons. Combined
with high-frequency time-to-digital converters (TDCs), this technology opens up
the prospect of sensors capable of recording with high accuracy both the time
and location of each detected photon. Such a capability could lead to
significant improvements in imaging accuracy, especially for applications
operating with low photon fluxes such as LiDAR and positron emission
tomography.
The demands placed on on-chip readout circuitry impose stringent trade-offs
between fill factor and spatio-temporal resolution, causing many contemporary
designs to severely underutilize the technology's full potential. Concentrating
on the low photon flux setting, this paper leverages results from group testing
and proposes an architecture for a highly efficient readout of pixels using
only a small number of TDCs, thereby also reducing both cost and power
consumption. The design relies on a multiplexing technique based on binary
interconnection matrices. We provide optimized instances of these matrices for
various sensor parameters and give explicit upper and lower bounds on the
number of TDCs required to uniquely decode a given maximum number of
simultaneous photon arrivals.
To illustrate the strength of the proposed architecture, consider a typical
digitization result for a 120 × 120 photodiode sensor on a 30 µm × 30 µm pitch
with a 40 ps time resolution and an estimated fill factor of approximately 70%,
using only 161 TDCs. The design guarantees registration and unique recovery of up to
4 simultaneous photon arrivals using a fast decoding algorithm. In a series of
realistic simulations of scintillation events in clinical positron emission
tomography, the design was able to recover the spatio-temporal location of 98.6%
of all photons that caused pixel firings.
Comment: 24 pages, 3 figures, 5 tables
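To make the multiplexing idea concrete, the sketch below wires pixels to TDCs through a binary interconnection matrix and decodes by declaring a pixel fired only if all of its TDCs fired. The matrix is a textbook Kautz-Singleton construction chosen for illustration, not the optimized matrices from the paper. The helper names are hypothetical, and timing is ignored: all arrivals are assumed to fall in the same time bin.

```python
from itertools import product

def interconnect_columns(q, r):
    """Kautz-Singleton style matrix: q*q rows (TDCs), q**r columns (pixels).

    Assumes q is prime and r >= 2. Pixel j gets the j-th coefficient tuple of a
    polynomial p_j of degree < r over GF(q) and is wired to TDC x*q + p_j(x) for
    every x in GF(q). Distinct polynomials agree on at most r - 1 points, so the
    matrix is d-disjunct for d = (q - 1) // (r - 1).
    """
    columns = []
    for coeffs in product(range(q), repeat=r):
        wires = frozenset(
            x * q + sum(c * pow(x, e, q) for e, c in enumerate(coeffs)) % q
            for x in range(q)
        )
        columns.append(wires)
    return columns

def tdc_pattern(columns, fired):
    """TDCs that register a pulse: the OR of the columns of all fired pixels."""
    pattern = set()
    for j in fired:
        pattern |= columns[j]
    return pattern

def decode_pixels(columns, pattern):
    """Pixel j is declared fired iff all its TDCs fired; exact for at most d arrivals."""
    return {j for j, wires in enumerate(columns) if wires <= pattern}
```

With q = 7 and r = 3, 49 TDCs serve 343 pixels and any 3 simultaneous arrivals are recovered exactly, since (7 - 1) // (3 - 1) = 3. For comparison, the architecture above reports 161 TDCs for 14,400 pixels and up to 4 arrivals.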
Design of Geometric Molecular Bonds
An example of a nonspecific molecular bond is the affinity of any positive
charge for any negative charge (like-unlike), or of nonpolar material for
itself when in aqueous solution (like-like). This contrasts with specific bonds such
as the affinity of the DNA base A for T, but not for C, G, or another A. Recent
experimental breakthroughs in DNA nanotechnology demonstrate that a particular
nonspecific like-like bond ("blunt-end DNA stacking" that occurs between the
ends of any pair of DNA double-helices) can be used to create specific
"macrobonds" by careful geometric arrangement of many nonspecific blunt ends,
motivating the need for sets of macrobonds that are orthogonal: two macrobonds
not intended to bind should have relatively low binding strength, even when
misaligned.
To address this need, we introduce geometric orthogonal codes that abstractly
model the engineered DNA macrobonds as two-dimensional binary codewords. While
motivated by completely different applications, geometric orthogonal codes
share features with the optical orthogonal codes studied by Chung,
Salehi, and Wei. The main technical difference is the importance of 2D geometry
in defining codeword orthogonality.
Comment: Accepted to appear in IEEE Transactions on Molecular, Biological, and
Multi-Scale Communications
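One way to read "orthogonal, even when misaligned" is as a bound on 2D correlation: for every relative translation of two codewords, count how many 1-cells (blunt ends) coincide. The sketch below computes the maximum cross-correlation between two codewords and the maximum off-peak autocorrelation of a single codeword. It is a simplified model under stated assumptions (translations only, no rotation or mirroring, binding strength taken as the number of aligned ends), not the paper's exact definition; the function names are hypothetical.

```python
from collections import Counter

def correlation(a, b):
    """Map each translation (dr, dc) to the number of 1-cells of b that land on 1-cells of a.

    a, b: sets of (row, col) coordinates of the 1-cells of two 2D binary codewords.
    Assumption: only translations are considered.
    """
    return Counter((ra - rb, ca - cb) for ra, ca in a for rb, cb in b)

def max_cross(a, b):
    """Worst-case overlap between two different codewords over all shifts."""
    return max(correlation(a, b).values(), default=0)

def max_auto(a):
    """Worst-case overlap of a codeword with a misaligned (nonzero-shift) copy of itself."""
    counts = correlation(a, a)
    del counts[(0, 0)]
    return max(counts.values(), default=0)
```

A good code keeps `max_auto` and pairwise `max_cross` small relative to the codeword weight, so that only the intended, correctly aligned partner binds strongly.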
RLZAP: Relative Lempel-Ziv with Adaptive Pointers
Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of
genomes from individuals of the same species when fast random access is
desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a
reference genome is selected and then the other genomes are greedily parsed
into phrases exactly matching substrings of the reference. Deorowicz and
Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with
a mismatch character usually gives better compression because many of the
differences between individuals' genomes are single-nucleotide substitutions.
Ferrada et al. (SPIRE 2014) then pointed out that also using relative pointers
and run-length compressing them usually gives even better compression. In this
paper we generalize Ferrada et al.'s idea to also handle short insertions,
deletions and multi-character substitutions well. We show experimentally that our
generalization achieves better compression than Ferrada et al.'s implementation
with comparable random-access times.
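The greedy parse with a trailing mismatch character (the variant attributed above to Deorowicz and Grabowski) can be sketched as follows. This is a naive quadratic-time illustration assuming a plain string reference; it does not implement the relative or adaptive pointers of Ferrada et al. or RLZAP, and practical implementations find matches with a suffix array or a similar index. Function names are hypothetical.

```python
def longest_match(reference, target, i):
    """Naively find (pos, length) of the longest substring of reference equal to a prefix of target[i:]."""
    best_pos, best_len = 0, 0
    for p in range(len(reference)):
        l = 0
        while (p + l < len(reference) and i + l < len(target)
               and reference[p + l] == target[i + l]):
            l += 1
        if l > best_len:
            best_pos, best_len = p, l
    return best_pos, best_len

def rlz_parse(reference, target):
    """Greedy RLZ parse; each phrase is (pos, length, mismatch_char or None at the end)."""
    phrases, i = [], 0
    while i < len(target):
        pos, length = longest_match(reference, target, i)
        i += length
        mismatch = target[i] if i < len(target) else None
        phrases.append((pos, length, mismatch))
        i += 1  # skip past the explicit mismatch character
    return phrases

def rlz_decode(reference, phrases):
    """Rebuild the target by copying from the reference and appending mismatch characters."""
    out = []
    for pos, length, mismatch in phrases:
        out.append(reference[pos:pos + length])
        if mismatch is not None:
            out.append(mismatch)
    return "".join(out)
```

For reference `"ACGTACGTTT"` and target `"ACGTTCGTTT"`, which differ by a single substitution, the parse is `[(4, 5, 'C'), (6, 4, None)]`: the SNP is absorbed as one mismatch character instead of forcing an extra phrase.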
Recommended from our members
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested.
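A minimal way to code data concurrently is to split the input into fixed-size blocks and compress each block independently on a worker pool. The sketch below does this with Python's standard `zlib` and `concurrent.futures`; the block size, worker count and function names are arbitrary assumptions, and it is only one of the many schemes such a survey covers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def parallel_compress(data: bytes, block_size: int = 1 << 16, workers: int = 4) -> list:
    """Split data into blocks and compress each one as an independent zlib stream."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))

def parallel_decompress(chunks, workers: int = 4) -> bytes:
    """Decompress the independent streams concurrently and concatenate them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))
```

Because each block is a separate zlib stream, matches cannot span block boundaries, so the ratio is usually somewhat worse than compressing the whole input at once. That is the speed-versus-ratio trade-off in miniature; `pool.map` preserves block order, which keeps decompression straightforward.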
Partial DNA Assembly: A Rate-Distortion Perspective
Earlier formulations of the DNA assembly problem were all in the context of
perfect assembly; i.e., given a set of reads from a long genome sequence, is it
possible to perfectly reconstruct the original sequence? In practice, however,
it is very often the case that the read data is not sufficiently rich to permit
unambiguous reconstruction of the original sequence. While a natural
generalization of the perfect assembly formulation to these cases would be to
consider a rate-distortion framework, partial assemblies are usually
represented in terms of an assembly graph, making the definition of a
distortion measure challenging. In this work, we introduce a distortion
function for assembly graphs that can be understood as the logarithm of the
number of Eulerian cycles in the assembly graph, each of which corresponds to a
candidate assembly that could have generated the observed reads. We also
introduce an algorithm for the construction of an assembly graph and analyze
its performance on real genomes.
Comment: To be published at ISIT-2016. 11 pages, 10 figures
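For a connected assembly graph in which every node has equal in- and out-degree, the number of Eulerian circuits can be computed exactly with the BEST theorem, ec(G) = t_w(G) · ∏_v (deg(v) - 1)!, where t_w(G) is the number of spanning arborescences oriented towards any fixed node w, obtained from a Laplacian minor by the matrix-tree theorem. The sketch below takes the logarithm of that count as the distortion. It assumes parallel edges are distinguishable, so several circuits may spell the same sequence; the paper's distortion function may account for this differently, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, log

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def count_eulerian_circuits(edges):
    """BEST theorem for a list of directed (u, v) edges; parallel edges and self-loops allowed.

    Assumes the graph is connected and balanced (in-degree == out-degree everywhere).
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    laplacian = [[0] * n for _ in range(n)]  # L = D_out - A
    out_deg = [0] * n
    for u, v in edges:
        i, j = index[u], index[v]
        out_deg[i] += 1
        laplacian[i][i] += 1
        laplacian[i][j] -= 1
    minor = [row[1:] for row in laplacian[1:]]  # delete row/column of node w = nodes[0]
    count = int(determinant(minor))  # arborescences towards w
    for d in out_deg:
        count *= factorial(d - 1)
    return count

def assembly_distortion(edges):
    """Log of the number of Eulerian circuits, i.e. of candidate assemblies."""
    return log(count_eulerian_circuits(edges))
```

For the binary de Bruijn graph on the 2-mers 00, 01, 10, 11, the sketch counts 2 circuits, matching the two binary de Bruijn sequences of order 3, so the distortion is log 2; a single simple cycle gives one circuit and zero distortion.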