4,126 research outputs found

Generalised Pattern Matching Revisited

Author: Dudek Bartłomiej
Gawrychowski Paweł
Starikovskaya Tatiana
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Publication date: 01/01/2020
Field of study

In the problem of \texttt{Generalised Pattern Matching} (\texttt{GPM}) [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $\Sigma_T$, a pattern $P$ of length $m$ over an alphabet $\Sigma_P$, and a matching relationship $\subseteq \Sigma_T \times \Sigma_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance:

* $\mathcal{D}$ being the maximum number of characters that match a fixed character,
* $\mathcal{S}$ being the number of pairs of matching characters,
* $\mathcal{I}$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$.

At the heart of our new deterministic upper bounds for $\mathcal{D}$ and $\mathcal{S}$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate first lower bounds for \texttt{GPM}. We start by showing that any deterministic or Monte Carlo algorithm for \texttt{GPM} must use $\Omega(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
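To make the problem statement concrete, the following is a minimal sketch of the naive baseline for both variants (counting and reporting), running in time proportional to the text length times the pattern length. It is not the algorithm of the paper, which improves on this baseline. The function names and the representation of the matching relationship as a set of (text character, pattern character) pairs are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def gpm_count_mismatches(
    text: str, pattern: str, matches: Set[Tuple[str, str]]
) -> List[int]:
    """Counting variant: mismatches between the pattern and every
    length-m substring of the text (naive baseline, hypothetical helper)."""
    n, m = len(text), len(pattern)
    counts = []
    for start in range(n - m + 1):
        # A position mismatches when the (text char, pattern char) pair
        # is not in the matching relationship.
        mismatches = sum(
            1 for j in range(m) if (text[start + j], pattern[j]) not in matches
        )
        counts.append(mismatches)
    return counts


def gpm_report(text: str, pattern: str, matches: Set[Tuple[str, str]]) -> List[int]:
    """Reporting variant: start positions of substrings that match the pattern."""
    counts = gpm_count_mismatches(text, pattern, matches)
    return [i for i, c in enumerate(counts) if c == 0]


# Illustrative relation: text alphabet {a, b, c}, pattern alphabet {x, y}.
relation = {("a", "x"), ("b", "x"), ("c", "y")}
counts = gpm_count_mismatches("abcab", "xy", relation)
positions = gpm_report("abcab", "xy", relation)
```

In this toy instance the parameter $\mathcal{S}$ equals the size of `relation`, that is, 3 pairs of matching characters.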

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

Superselectors: Efficient Constructions and Applications

Author: A. Bonis De
A.G. D’yachkov
B.S. Chlebus
D. Eppstein
D.J. Balding
D.Z. Du
E. Porat
G. Cormode
J. Wolf
M. Mitzenmacher
N. Alon
N. Linial
P. Erdös
Piotr Indyk
R. Clifford
R. Kumar
S. Ganguly
T. Cover
T. Moran
V. Grebinsky
W.H. Kautz
Y. Cheng
Publication venue
Publication date: 01/01/2010

We introduce a new combinatorial structure: the superselector. We show that superselectors subsume several important combinatorial structures used in the past few years to solve problems in group testing, compressed sensing, multi-channel conflict resolution and data security. We prove close upper and lower bounds on the size of superselectors and we provide efficient algorithms for their constructions. Although our bounds are very general, when they are instantiated on the combinatorial structures that are particular cases of superselectors (e.g., $(p,k,n)$-selectors, $(d,\ell)$-list-disjunct matrices, $MUT_k(r)$-families, $FUT(k, a)$-families, etc.), they match the best known bounds in terms of size of the structures (the relevant parameter in the applications). For appropriate values of parameters, our results also provide the first efficient deterministic algorithms for the construction of such structures.

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Archivio della Ricerca - Università di Salerno

Noise-Resilient Group Testing: Limitations and Constructions

Author: A. Bonis De
A. Dyachkov
A. Macula
A. Ta-Shma
A.G. D’yachkov
B. Chlebus
D. Eppstein
D.-Z. Du
D.Z. Du
E. Knill
L. Trevisan
R. Raz
Ruszinkó
T. Berger
W. Kautz
Y. Cheng
Z. Füredi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We study combinatorial group testing schemes for learning $d$-sparse Boolean vectors using highly unreliable disjunctive measurements. We consider an adversarial noise model that only limits the number of false observations, and show that any noise-resilient scheme in this model can only approximately reconstruct the sparse vector. On the positive side, we take this barrier to our advantage and show that approximate reconstruction (within a satisfactory degree of approximation) allows us to break the information theoretic lower bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of $d$-sparse vectors of length $n$ via non-adaptive measurements, by a multiplicative factor $\tilde{\Omega}(d)$. Specifically, we give simple randomized constructions of non-adaptive measurement schemes, with $m=O(d \log n)$ measurements, that allow efficient reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the presence of $\delta m$ false positives and $O(m/d)$ false negatives within the measurement outcomes, for any constant $\delta < 1$. We show that, information theoretically, none of these parameters can be substantially improved without dramatically affecting the others. Furthermore, we obtain several explicit constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions that allow fast reconstruction in time $\mathrm{poly}(m)$, which would be sublinear in $n$ for sufficiently sparse vectors. The main tool used in our construction is the list-decoding view of randomness condensers and extractors. Comment: Full version. A preliminary summary of this work appears (under the same title) in proceedings of the 17th International Symposium on Fundamentals of Computation Theory (FCT 2009).
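As a rough illustration of non-adaptive group testing with disjunctive measurements, the sketch below covers only the noiseless case with a simple elimination decoder. It is not one of the constructions from the abstract, which handle adversarial noise and rely on condensers and extractors. The function names, the Bernoulli design with inclusion probability 1/d, and the seed are all assumptions chosen for illustration.

```python
import random
from typing import List, Set


def random_design(num_tests: int, n: int, d: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical random design: each item joins each test with probability 1/d."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < 1.0 / d else 0 for _ in range(n)]
        for _ in range(num_tests)
    ]


def measure(design: List[List[int]], support: Set[int]) -> List[int]:
    """Disjunctive measurement: a test is positive iff it contains a support item."""
    return [int(any(row[j] for j in support)) for row in design]


def decode(design: List[List[int]], outcomes: List[int]) -> Set[int]:
    """Elimination decoder: discard every item that appears in a negative test.

    Without noise the result always contains the true support, possibly
    together with some false positives.
    """
    n = len(design[0])
    ruled_out: Set[int] = set()
    for row, outcome in zip(design, outcomes):
        if outcome == 0:
            ruled_out.update(j for j in range(n) if row[j])
    return set(range(n)) - ruled_out


design = random_design(num_tests=20, n=30, d=3, seed=1)
true_support = {2, 7, 11}
candidates = decode(design, measure(design, true_support))
```

The decoder returns a superset of the sparse vector's support, which mirrors the notion of reconstruction up to a bounded number of false positives, although this sketch gives no guarantee on how many false positives remain.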

arXiv.org e-Print Archive

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

From patterned response dependency to structured covariate dependency: categorical-pattern-matching

Author: Fushing Hsieh
Hsieh Yin-Chen
Liu Shan-Yu
McCowan Brenda
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/05/2017

Data generated from a system of interest typically consists of measurements from an ensemble of subjects across multiple response and covariate features, and is naturally represented by one response-matrix against one covariate-matrix. Likely each of these two matrices simultaneously embraces heterogeneous data types: continuous, discrete and categorical. Here a matrix is used as a practical platform to ideally keep hidden dependency among/between subjects and features intact on its lattice. Response and covariate dependency is individually computed and expressed through multiscale blocks via a newly developed computing paradigm named Data Mechanics. We propose a categorical pattern matching approach to establish causal linkages in a form of information flows from patterned response dependency to structured covariate dependency. The strength of an information flow is evaluated by applying the combinatorial information theory. This unified platform for system knowledge discovery is illustrated through five data sets. In each illustrative case, an information flow is demonstrated as an organization of discovered knowledge loci via emergent visible and readable heterogeneity. This unified approach fundamentally resolves many long-standing issues, including statistical modeling, multiple response, renormalization and feature selections, in data analysis, but without involving man-made structures and distribution assumptions. The results reported here enhance the idea that linking patterns of response dependency to structures of covariate dependency is the true philosophical foundation underlying data-driven computing and learning in sciences. Comment: 32 pages, 10 figures, 3 box pictures

arXiv.org e-Print Archive

Directory of Open Access Journals

eScholarship - University of California

FigShare

A single-photon sampling architecture for solid-state imaging

Author: C. Levin
C. Sing-Long
Cheng
E. Candes
E. van den Berg
G. Chinn
P. D. Olcott
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2012

Advances in solid-state technology have enabled the development of silicon photomultiplier sensor arrays capable of sensing individual photons. Combined with high-frequency time-to-digital converters (TDCs), this technology opens up the prospect of sensors capable of recording with high accuracy both the time and location of each detected photon. Such a capability could lead to significant improvements in imaging accuracy, especially for applications operating with low photon fluxes such as LiDAR and positron emission tomography. The demands placed on on-chip readout circuitry impose stringent trade-offs between fill factor and spatio-temporal resolution, causing many contemporary designs to severely underutilize the technology's full potential. Concentrating on the low photon flux setting, this paper leverages results from group testing and proposes an architecture for a highly efficient readout of pixels using only a small number of TDCs, thereby also reducing both cost and power consumption. The design relies on a multiplexing technique based on binary interconnection matrices. We provide optimized instances of these matrices for various sensor parameters and give explicit upper and lower bounds on the number of TDCs required to uniquely decode a given maximum number of simultaneous photon arrivals. To illustrate the strength of the proposed architecture, we note a typical digitization result of a 120x120 photodiode sensor on a 30 µm x 30 µm pitch with a 40 ps time resolution and an estimated fill factor of approximately 70%, using only 161 TDCs. The design guarantees registration and unique recovery of up to 4 simultaneous photon arrivals using a fast decoding algorithm. In a series of realistic simulations of scintillation events in clinical positron emission tomography the design was able to recover the spatio-temporal location of 98.6% of all photons that caused pixel firings. Comment: 24 pages, 3 figures, 5 tables
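The unique-decoding requirement on a binary interconnection matrix can be checked by brute force on small instances, as in the sketch below. It assumes a simplified model in which rows are TDCs, columns are pixels, and a TDC fires whenever at least one of its connected pixels fires at the same instant. The function names are hypothetical, and the exhaustive search is only practical for tiny matrices; it is not the optimized construction or the fast decoder described in the abstract.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def signature(matrix: List[List[int]], pixels: Tuple[int, ...]) -> Tuple[int, ...]:
    """Which TDCs fire when the given pixels fire simultaneously (OR of columns)."""
    return tuple(int(any(row[p] for p in pixels)) for row in matrix)


def uniquely_decodable(matrix: List[List[int]], k: int) -> bool:
    """True iff every set of at most k pixels yields a distinct TDC firing pattern."""
    num_pixels = len(matrix[0])
    seen: Dict[Tuple[int, ...], Tuple[int, ...]] = {}
    for size in range(k + 1):
        for subset in combinations(range(num_pixels), size):
            sig = signature(matrix, subset)
            if sig in seen:
                # Two different pixel sets are indistinguishable at the TDCs.
                return False
            seen[sig] = subset
    return True


# Toy example: 2 TDCs, 3 pixels, pixel columns are (0,1), (1,0), (1,1).
toy = [
    [0, 1, 1],
    [1, 0, 1],
]
single_ok = uniquely_decodable(toy, 1)
double_ok = uniquely_decodable(toy, 2)
```

In the toy matrix a single arrival is always identifiable, but pixels 0 and 1 firing together produce the same pattern as pixel 2 alone, so two simultaneous arrivals cannot be decoded uniquely with only two TDCs.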

arXiv.org e-Print Archive

CiteSeerX

Crossref