Efficient text fingerprinting via Parikh mapping
Abstract: We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string, of length n, over a finite, ordered alphabet Σ, and S′ is a substring of S, then the fingerprint of S′ is the subset φ of Σ of precisely the symbols appearing in S′. In this paper we show efficient methods of answering various queries on fingerprint statistics. Our preprocessing is done in time O(n|Σ| log n log|Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ⊆Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n).
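To make the fingerprint definition concrete, here is a minimal brute-force sketch in Python. It is not the preprocessing scheme from the paper: it enumerates all substrings in quadratic time, and the function names `fingerprints` and `count_by_size` are hypothetical, chosen for illustration only.

```python
def fingerprints(s):
    """Return the set of distinct fingerprints over all non-empty substrings of s.

    A fingerprint is the set of symbols appearing in a substring.
    Brute force: O(n^2) substrings, not the paper's indexed method.
    """
    result = set()
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            # Extending the substring s[i..j] by one symbol can only grow its fingerprint.
            seen.add(s[j])
            result.add(frozenset(seen))
    return result


def count_by_size(s, k):
    """Naive version of query (1): number of distinct fingerprints of size k."""
    return sum(1 for f in fingerprints(s) if len(f) == k)


if __name__ == "__main__":
    for k in range(1, 4):
        print(k, count_by_size("abca", k))
```

For the string "abca" the distinct fingerprints are {a}, {b}, {c}, {a,b}, {b,c}, {a,c} and {a,b,c}, so the counts for k = 1, 2, 3 are 3, 3 and 1.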
Normal, Abby Normal, Prefix Normal
A prefix normal word is a binary word with the property that no substring has
more 1s than the prefix of the same length. This class of words is important in
the context of binary jumbled pattern matching. In this paper we present
results about the number of prefix normal words of length , showing
that for some and
. We introduce efficient
algorithms for testing the prefix normal property and a "mechanical algorithm"
for computing prefix normal forms. We also include games which can be played
with prefix normal words. In these games Alice wishes to stay normal but Bob
wants to drive her "abnormal" -- we discuss which parameter settings allow
Alice to succeed. Comment: Accepted at FUN '1
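The prefix normal property defined in this abstract can be checked directly from the definition. The following Python sketch is a naive quadratic-time test, not one of the efficient algorithms the paper introduces; the function name `is_prefix_normal` is an assumption made for illustration.

```python
def is_prefix_normal(w):
    """Check whether the binary word w (a string of '0'/'1') is prefix normal.

    w is prefix normal if no substring contains more 1s than the prefix
    of the same length. Naive O(n^2) check using prefix sums.
    """
    n = len(w)
    # ones[i] = number of 1s in w[:i]
    ones = [0]
    for ch in w:
        ones.append(ones[-1] + (ch == "1"))
    for length in range(1, n + 1):
        # Maximum number of 1s over all substrings of this length.
        max_ones = max(ones[i + length] - ones[i] for i in range(n - length + 1))
        if max_ones > ones[length]:
            return False
    return True


if __name__ == "__main__":
    for word in ["110100", "1011", "01"]:
        print(word, is_prefix_normal(word))
```

For example, "110100" passes the test, while "1011" fails because the substring "11" has two 1s but the prefix "10" has only one.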
Channel Charting: Locating Users within the Radio Environment using Channel State Information
We propose channel charting (CC), a novel framework in which a multi-antenna
network element learns a chart of the radio geometry in its surrounding area.
The channel chart captures the local spatial geometry of the area so that
points that are close in space will also be close in the channel chart and vice
versa. CC works in a fully unsupervised manner, i.e., learning is only based on
channel state information (CSI) that is passively collected at a single point
in space, but from multiple transmit locations in the area over time. The
method then extracts channel features that characterize large-scale fading
properties of the wireless channel. Finally, the channel charts are generated
with tools from dimensionality reduction, manifold learning, and deep neural
networks. The network element performing CC may be, for example, a
multi-antenna base-station in a cellular system and the charted area in the
served cell. Logical relationships related to the position and movement of a
transmitter, e.g., a user equipment (UE), in the cell can then be directly
deduced from comparing measured radio channel characteristics to the channel
chart. The unsupervised nature of CC enables a range of new applications in UE
localization, network planning, user scheduling, multipoint connectivity,
hand-over, cell search, user grouping, and other cognitive tasks that rely on
CSI and UE movement relative to the base-station, without the need of
information from global navigation satellite systems. Comment: To appear in IEEE Access
Algorithms for Jumbled Pattern Matching in Strings
The Parikh vector p(s) of a string s is defined as the vector of
multiplicities of the characters. Parikh vector q occurs in s if s has a
substring t with p(t)=q. We present two novel algorithms for searching for a
query q in a text s. One solves the decision problem over a binary text in
constant time, using a linear size index of the text. The second algorithm, for
a general finite alphabet, finds all occurrences of a given Parikh vector q and
has sub-linear expected time complexity; we present two variants, which both
use a linear size index of the text. Comment: 18 pages, 9 figures; article accepted for publication in the
International Journal of Foundations of Computer Science
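As an illustration of the definitions of a Parikh vector and its occurrences, the Python sketch below finds all occurrences of a query q with a plain sliding window. This is the standard linear-time baseline, not the index-based algorithms the abstract describes; the names `parikh` and `jumbled_occurrences` are hypothetical.

```python
from collections import Counter


def parikh(s):
    """Parikh vector of s: the multiplicity of each character."""
    return Counter(s)


def jumbled_occurrences(text, q):
    """Return all start positions i such that the substring of text starting
    at i, of length sum(q.values()), has Parikh vector q.

    Baseline sliding window over the text; no index is built.
    """
    m = sum(q.values())
    if m == 0 or m > len(text):
        return []
    # Keep only positive entries so the comparison with the window is exact.
    target = Counter({c: k for c, k in q.items() if k > 0})
    window = Counter(text[:m])
    hits = []
    for start in range(len(text) - m + 1):
        if start > 0:
            # Slide the window: drop the character leaving on the left...
            out_ch = text[start - 1]
            window[out_ch] -= 1
            if window[out_ch] == 0:
                del window[out_ch]
            # ...and add the character entering on the right.
            window[text[start + m - 1]] += 1
        if window == target:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(jumbled_occurrences("abcab", parikh("ba")))
```

With text "abcab" and q = p("ba"), the matching windows are "ab" at positions 0 and 3.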
GrOVe: Ownership Verification of Graph Neural Networks using Embeddings
Graph neural networks (GNNs) have emerged as a state-of-the-art approach to
model and draw inferences from large scale graph-structured data in various
application settings such as social networking. The primary goal of a GNN is to
learn an embedding for each graph node in a dataset that encodes both the node
features and the local graph structure around the node. Embeddings generated by
a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs
are prone to model extraction attacks. Model extraction attacks and defenses
have been explored extensively in other non-graph settings. While detecting or
preventing model extraction appears to be difficult, deterring it via
effective ownership verification techniques offers a potential defense. In
non-graph settings, fingerprinting models, or the data used to build them, has
been shown to be a promising approach toward ownership verification. We present
GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target
model and a suspect model, can reliably determine if the suspect model was
trained independently of the target model or if it is a surrogate of the target
model obtained via model extraction. We show that GrOVe can distinguish between
surrogate and independent models even when the independent model uses the same
training dataset and architecture as the original target model. Using six
benchmark datasets and three model architectures, we show that GrOVe consistently
achieves low false-positive and false-negative rates. We demonstrate that GrOVe is
robust against known fingerprint evasion techniques while remaining
computationally efficient. Comment: 11 pages, 5 figures