1,191 research outputs found
KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping [Extended Version]
The volume of time series data has exploded due to the popularity of new
applications, such as data center management and IoT. Subsequence matching is a
fundamental task in mining time series data. All index-based approaches only
consider raw subsequence matching (RSM) and do not support subsequence
normalization. UCR Suite can deal with normalized subsequence match problem
(NSM), but it needs to scan full time series. In this paper, we propose a novel
problem, named constrained normalized subsequence matching problem (cNSM),
which adds some constraints to NSM problem. The cNSM problem provides a knob to
flexibly control the degree of offset shifting and amplitude scaling, which
enables users to build the index to process the query. We propose a new index
structure, KV-index, and the matching algorithm, KV-match. With a single index,
our approach can support both RSM and cNSM problems under either ED or DTW
distance. KV-index is a key-value structure, which can be easily implemented on
local files or HBase tables. To support the query of arbitrary lengths, we
extend KV-match to KV-match, which utilizes multiple varied-length
indexes to process the query. We conduct extensive experiments on synthetic and
real-world datasets. The results verify the effectiveness and efficiency of our
approach.Comment: 13 page
Recommended from our members
Minimally supervised induction of morphology through bitexts
textA knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. There have been consequently many attempts to reduce this cost in the development of morphological systems through the development of unsupervised or minimally supervised algorithms and learning methods for acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language–the source language–to another language–the target. This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typlogical properties of German. The two main tasks, that of clustering and segmentation, are approached as sequential tasks with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, it attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.Linguistic
- …