Pattern Matching in Multiple Streams

Author: A. Amir
D. Breslauer
F. Ergun
G.M. Landau
H. Karloff
K. Abrahamson
M. Ružić
R. Clifford
T.S. Jayram
Z. Galil
Publication date: 01/01/2012

We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally, we set out a number of open problems related to this new model for pattern matching. Comment: 13 pages, 1 figure
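The exact-matching space bound of O(m+s) words can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps in the classic Knuth-Morris-Pratt automaton, which gives amortized rather than worst-case constant time per symbol. The names `failure_table`, `MultiStreamMatcher`, and `feed` are hypothetical, chosen only for this illustration. The failure table is shared by all streams (O(m) words) and each stream keeps a single integer of state (O(s) words in total).

```python
from typing import Dict, Hashable, List


def failure_table(pattern: str) -> List[int]:
    """KMP failure function: fail[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class MultiStreamMatcher:
    """Illustrative only: one shared pattern, one integer of state per stream."""

    def __init__(self, pattern: str) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = failure_table(pattern)      # shared by all streams: O(m)
        self.state: Dict[Hashable, int] = {}    # matched-prefix length per stream: O(s)

    def feed(self, stream_id: Hashable, symbol: str) -> bool:
        """Append `symbol` to stream `stream_id`; return True if the
        pattern now ends at the newest symbol of that stream."""
        k = self.state.get(stream_id, 0)
        while k > 0 and symbol != self.pattern[k]:
            k = self.fail[k - 1]
        if symbol == self.pattern[k]:
            k += 1
        matched = k == len(self.pattern)
        if matched:
            k = self.fail[k - 1]  # fall back so overlapping matches are still found
        self.state[stream_id] = k
        return matched
```

Symbols from different streams may be interleaved in any order, since a call to `feed` touches only the state of the stream it names.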

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

Author: Golan Shay
Kopelowitz Tsvi
Porat Ely
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Publication date: 01/01/2018

Recently, there has been a growing focus on solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Sigma) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model.
We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.
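The split between read-only shared memory and per-stream local memory can be sketched as follows. This is only an illustration of the multi-stream model, not either of the paper's algorithms: it uses a textbook Aho-Corasick automaton, whose shared memory grows with the total length of the patterns rather than O(d log m), and it assumes non-empty patterns. The class names `SharedDictionary` and `MultiStreamDictionaryMatcher` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List


class SharedDictionary:
    """Aho-Corasick automaton, built once and then only read (shared memory)."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie edges per node
        self.fail: List[int] = [0]              # failure links
        self.out: List[List[str]] = [[]]        # patterns ending at each node
        for p in patterns:
            node = 0
            for ch in p:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(p)
        # Breadth-first pass: a node's failure link depends only on shallower nodes.
        queue = deque(self.goto[0].values())
        while queue:
            u = queue.popleft()
            for ch, v in self.goto[u].items():
                f = self.fail[u]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[v] = self.goto[f].get(ch, 0)
                self.out[v] = self.out[v] + self.out[self.fail[v]]
                queue.append(v)

    def step(self, node: int, ch: str) -> int:
        """Return the automaton state reached from `node` on character `ch`."""
        while node and ch not in self.goto[node]:
            node = self.fail[node]
        return self.goto[node].get(ch, 0)


class MultiStreamDictionaryMatcher:
    """Local stream memory: a single automaton state per stream."""

    def __init__(self, dictionary: SharedDictionary) -> None:
        self.dictionary = dictionary
        self.state: Dict[Hashable, int] = {}

    def feed(self, stream_id: Hashable, ch: str) -> List[str]:
        """Append `ch` to the given stream; return the patterns that end there."""
        node = self.dictionary.step(self.state.get(stream_id, 0), ch)
        self.state[stream_id] = node
        return self.dictionary.out[node]
```

Occurrences are reported as soon as the character that completes them arrives, and adding a stream costs one extra integer because the automaton itself is never modified after construction.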

Dagstuhl Research Online Publication Server

SNIF TOOL - Sniffing for Patterns in Continuous Streams

Author: MUKHERJI ABHISHEK
Publication venue: Digital WPI
Publication date: 11/02/2008

Recent technological advances in sensor networks and mobile devices give rise to new challenges in processing of live streams. In particular, time-series sequence matching, namely, the similarity matching of live streams against a set of predefined pattern sequence queries, is an important technology for a broad range of domains that include monitoring the spread of hazardous waste and administering network traffic. In this thesis, I use the time-critical application of monitoring fire growth in an intelligent building as my motivating example. Various measures and algorithms have been established in the current literature for similarity of static time-series data. Matching continuous data poses the following new challenges: 1) fluctuations in stream characteristics, 2) real-time requirements of the application, 3) limited system resources, and 4) noisy data. Thus the matching techniques proposed for static time-series are mostly not applicable for live stream matching. In this thesis, I propose a new generic framework, henceforth referred to as the n-Snippet Indices Framework (in short, SNIF), for discovering the similarity between a live stream and pattern sequences. The framework is composed of two key phases: (1) an off-line preprocessing phase, where the pattern sequences are processed offline and stored into an approximate 2-level index structure; and (2) an on-line live stream matching phase, where the streaming time-series (or the live stream) is matched on the fly against the indexed pattern sequences. I introduce the concept of n-Snippets for numeric data as the unit for matching. The insight is to match small snippets of the live stream against prefixes of the patterns and maintain them in succession. The longer the pattern prefixes identified as similar to the live stream, the stronger the confirmation of the match.
Thus, the live stream matching is performed in two levels of matching: bag matching for matching snippets and order checking for maintaining the lengths of the match. I propose four variations of matching algorithms that let the user choose between the two conflicting characteristics of result accuracy and response time. The effectiveness of SNIF in detecting patterns has been thoroughly tested through extensive experimental evaluations using the continuous query engine CAPE as the platform. The evaluations made use of real datasets from multiple domains, including fire monitoring, chlorine monitoring and sensor networks. Moreover, SNIF is demonstrated to be tolerant to noisy datasets.

DigitalCommons@WPI

Accelerating Event Stream Processing in On- and Offline Systems

Author: Körber Michael
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2021

Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. 
We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. To this end, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores.

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

The politics of college affordability : a multiple streams analysis of financial aid policymaking in Texas

Author: Drake Anna Peterson
Publication date: 30/01/2018

This multiple case study (Yin, 2014) applies Kingdon’s (1984) multiple streams approach (MSA) to the examination of how politics in Texas shapes state financial aid policymaking. I analyzed three bills considered by the Texas Legislature between 2011 and 2015, each affecting a different type of student aid: a need-based grant, forgivable loan, and tuition exemption. The bills that comprise my cases each sought to alter or eliminate one of these programs. I conducted 50 stakeholder interviews, collected 135 documentary sources, and used pattern matching (Yin, 2014) and constant comparison (Merriam, 2009) data analysis techniques. The purpose of my study was to understand how politics shaped the development and outcome of each bill, and consequently the financial assistance available to help Texas students afford higher education. Findings revealed ambiguity and political opportunism surrounding the issue of college affordability; the influence of conservative political culture, partisanship, electoral politics, leadership changes, and policy entrepreneurs; and the use and misuse of data to shape policy. In conclusion, I offer implications for policy and practice and present an adapted multiple streams model of college affordability policymaking. Educational Administration

Texas ScholarWorks

ARSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

Author: Cohen Michael
Govindarajan Krishna
Grossberg Stephen
Wyse Lonce
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/06/2003

Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the ARSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between the frequency-specific spectral representation of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new heuristic" of Bregman. Multiple simultaneously occurring spectral-pitch resonances can thereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations.
The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided. Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society of Engineering Education

Boston University Institutional Repository (OpenBU)

ARSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

Author: Grossberg Stephen
Govindarajan Krishna
Wyse Lonce
Cohen Michael
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/01/1997

Boston University Institutional Repository (OpenBU)