4,126 research outputs found
Generalised Pattern Matching Revisited
In the problem of
[STOC'94, Muthukrishnan and Palem], we are given a text of length over
an alphabet , a pattern of length over an alphabet
, and a matching relationship ,
and must return all substrings of that match (reporting) or the number
of mismatches between each substring of of length and (counting).
In this work, we improve over all previously known algorithms for this problem
for various parameters describing the input instance:
* being the maximum number of characters that match a fixed
character,
* being the number of pairs of matching characters,
* being the total number of disjoint intervals of characters
that match the characters of the pattern .
At the heart of our new deterministic upper bounds for and
lies a faster construction of superimposed codes, which solves
an open problem posed in [FOCS'97, Indyk] and can be of independent interest.
To conclude, we demonstrate first lower bounds for . We start by
showing that any deterministic or Monte Carlo algorithm for must
use time, and then proceed to show higher lower bounds
for combinatorial algorithms. These bounds show that our algorithms are almost
optimal, unless a radically new approach is developed
Superselectors: Efficient Constructions and Applications
We introduce a new combinatorial structure: the superselector. We show that
superselectors subsume several important combinatorial structures used in the
past few years to solve problems in group testing, compressed sensing,
multi-channel conflict resolution and data security. We prove close upper and
lower bounds on the size of superselectors and we provide efficient algorithms
for their constructions. Albeit our bounds are very general, when they are
instantiated on the combinatorial structures that are particular cases of
superselectors (e.g., (p,k,n)-selectors, (d,\ell)-list-disjunct matrices,
MUT_k(r)-families, FUT(k, a)-families, etc.) they match the best known bounds
in terms of size of the structures (the relevant parameter in the
applications). For appropriate values of parameters, our results also provide
the first efficient deterministic algorithms for the construction of such
structures
Noise-Resilient Group Testing: Limitations and Constructions
We study combinatorial group testing schemes for learning -sparse Boolean
vectors using highly unreliable disjunctive measurements. We consider an
adversarial noise model that only limits the number of false observations, and
show that any noise-resilient scheme in this model can only approximately
reconstruct the sparse vector. On the positive side, we take this barrier to
our advantage and show that approximate reconstruction (within a satisfactory
degree of approximation) allows us to break the information theoretic lower
bound of that is known for exact reconstruction of
-sparse vectors of length via non-adaptive measurements, by a
multiplicative factor .
Specifically, we give simple randomized constructions of non-adaptive
measurement schemes, with measurements, that allow efficient
reconstruction of -sparse vectors up to false positives even in the
presence of false positives and false negatives within the
measurement outcomes, for any constant . We show that, information
theoretically, none of these parameters can be substantially improved without
dramatically affecting the others. Furthermore, we obtain several explicit
constructions, in particular one matching the randomized trade-off but using measurements. We also obtain explicit constructions
that allow fast reconstruction in time \poly(m), which would be sublinear in
for sufficiently sparse vectors. The main tool used in our construction is
the list-decoding view of randomness condensers and extractors.Comment: Full version. A preliminary summary of this work appears (under the
same title) in proceedings of the 17th International Symposium on
Fundamentals of Computation Theory (FCT 2009
From patterned response dependency to structured covariate dependency: categorical-pattern-matching
Data generated from a system of interest typically consists of measurements
from an ensemble of subjects across multiple response and covariate features,
and is naturally represented by one response-matrix against one
covariate-matrix. Likely each of these two matrices simultaneously embraces
heterogeneous data types: continuous, discrete and categorical. Here a matrix
is used as a practical platform to ideally keep hidden dependency among/between
subjects and features intact on its lattice. Response and covariate dependency
is individually computed and expressed through mutliscale blocks via a newly
developed computing paradigm named Data Mechanics. We propose a categorical
pattern matching approach to establish causal linkages in a form of information
flows from patterned response dependency to structured covariate dependency.
The strength of an information flow is evaluated by applying the combinatorial
information theory. This unified platform for system knowledge discovery is
illustrated through five data sets. In each illustrative case, an information
flow is demonstrated as an organization of discovered knowledge loci via
emergent visible and readable heterogeneity. This unified approach
fundamentally resolves many long standing issues, including statistical
modeling, multiple response, renormalization and feature selections, in data
analysis, but without involving man-made structures and distribution
assumptions. The results reported here enhance the idea that linking patterns
of response dependency to structures of covariate dependency is the true
philosophical foundation underlying data-driven computing and learning in
sciences.Comment: 32 pages, 10 figures, 3 box picture
A single-photon sampling architecture for solid-state imaging
Advances in solid-state technology have enabled the development of silicon
photomultiplier sensor arrays capable of sensing individual photons. Combined
with high-frequency time-to-digital converters (TDCs), this technology opens up
the prospect of sensors capable of recording with high accuracy both the time
and location of each detected photon. Such a capability could lead to
significant improvements in imaging accuracy, especially for applications
operating with low photon fluxes such as LiDAR and positron emission
tomography.
The demands placed on on-chip readout circuitry imposes stringent trade-offs
between fill factor and spatio-temporal resolution, causing many contemporary
designs to severely underutilize the technology's full potential. Concentrating
on the low photon flux setting, this paper leverages results from group testing
and proposes an architecture for a highly efficient readout of pixels using
only a small number of TDCs, thereby also reducing both cost and power
consumption. The design relies on a multiplexing technique based on binary
interconnection matrices. We provide optimized instances of these matrices for
various sensor parameters and give explicit upper and lower bounds on the
number of TDCs required to uniquely decode a given maximum number of
simultaneous photon arrivals.
To illustrate the strength of the proposed architecture, we note a typical
digitization result of a 120x120 photodiode sensor on a 30um x 30um pitch with
a 40ps time resolution and an estimated fill factor of approximately 70%, using
only 161 TDCs. The design guarantees registration and unique recovery of up to
4 simultaneous photon arrivals using a fast decoding algorithm. In a series of
realistic simulations of scintillation events in clinical positron emission
tomography the design was able to recover the spatio-temporal location of 98.6%
of all photons that caused pixel firings.Comment: 24 pages, 3 figures, 5 table
- …