Context for Ubiquitous Data Management
In response to the advance of ubiquitous computing technologies, we believe that for computer systems to be ubiquitous, they must be context-aware. In this paper, we address the impact of context-awareness on ubiquitous data management. To do this, we survey the different characteristics of context in order to develop a clear understanding of it, as well as of its implications and requirements for context-aware data management. References to recent research activities and applicable techniques are also provided.
Context-Based Autosuggest on Graph Data
Autosuggest is an important feature in any search application. Currently, most applications suggest a single term based only on how frequently it appears in the indexed documents or how often it is searched for. These approaches might not yield the most relevant suggestions, because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we use the keywords the user has chosen so far in the search text box as the context for autosuggesting the next, incomplete keyword. This approach exploits the relationships between entities in the graph data being searched and therefore provides more meaningful suggestions.
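As a rough illustration of the idea, the sketch below scores candidate completions by how many of the already-chosen keywords they are connected to in a toy adjacency-list graph. The GRAPH data and the suggest_next function are invented for illustration; the actual Smart Solr Suggester plugin works against entities indexed in Solr.

```python
from collections import Counter

# Toy entity graph as an adjacency list: each entity maps to related entities.
# Illustrative stand-in for the graph data the plugin would query in Solr.
GRAPH = {
    "san jose": ["state university", "city hall", "airport"],
    "san francisco": ["golden gate", "airport"],
    "state university": ["computer science", "library"],
}

def suggest_next(context_keywords, prefix, graph=GRAPH, k=5):
    """Rank completions of `prefix` by how many context keywords relate to them."""
    scores = Counter()
    for kw in context_keywords:
        for neighbor in graph.get(kw, []):
            if neighbor.startswith(prefix):
                scores[neighbor] += 1
    return [term for term, _ in scores.most_common(k)]

# With "san jose" already chosen, the prefix "a" suggests the related "airport"
# rather than a globally frequent but unrelated term.
print(suggest_next(["san jose"], "a"))  # ['airport']
```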
Simple data-driven context-sensitive lemmatization
Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven, context-sensitive approach to lemmatizing word forms in running text. We treat lemmatization as a classification task for machine learning and automatically induce the class labels. We achieve this by computing a Shortest Edit Script (SES) between the reversed input and output strings. A SES describes the transformations that must be applied to the input string (word form) in order to convert it to the output string (lemma). Our approach shows competitive performance on a range of typologically different languages.
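The label-induction step can be approximated in a few lines. The sketch below derives an edit script between the reversed word form and the reversed lemma, using Python's difflib as a stand-in for a true shortest-edit-script computation; forms with the same inflection pattern then collapse to the same class label, which is the property the classifier exploits.

```python
from difflib import SequenceMatcher

def shortest_edit_script(form, lemma):
    """Derive an edit script between the reversed form and reversed lemma.

    Reversing the strings anchors the script at the word ending, where
    inflectional morphology usually lives, so many word forms share one
    script. difflib approximates a true SES here.
    """
    a, b = form[::-1], lemma[::-1]
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            ops.append((tag, i1, a[i1:i2], b[j1:j2]))
    return tuple(ops)  # hashable, usable directly as a class label

# Identical inflection patterns yield identical scripts, hence one class:
print(shortest_edit_script("walked", "walk"))  # (('delete', 0, 'de', ''),)
print(shortest_edit_script("talked", "talk"))  # same script -> same label
```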
Fast Autocorrelated Context Models for Data Compression
A method is presented to automatically generate context models of data by calculating the data's autocorrelation function. The largest values of the autocorrelation function occur at the offsets, or lags, in the bitstream that tend to be the most highly correlated with any particular location. These offsets are ideal for use in predictive coding, such as prediction by partial matching (PPM) or context-mixing algorithms for data compression, making such algorithms more efficient and more general by reducing or eliminating the need for ad hoc models based on particular types of data. Instead of using the definition of the autocorrelation function, which considers the pairwise correlations of the data and requires O(n^2) time, the Wiener-Khinchin theorem is applied, obtaining the autocorrelation as the inverse fast Fourier transform of the data's power spectrum in O(n log n) time and making the technique practical for the compression of large data objects. The method is shown to produce the highest levels of performance obtained to date on a lossless image compression benchmark.
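A minimal sketch of the core step, assuming the input is a byte stream: compute the autocorrelation as the inverse FFT of the power spectrum (zero-padded to avoid circular wraparound) and return the most correlated lags as candidate context offsets. The function name and the top-k selection are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def top_autocorrelation_lags(data: bytes, k: int = 8):
    """Find the k lags with the strongest autocorrelation in O(n log n).

    By the Wiener-Khinchin theorem, the autocorrelation is the inverse
    FFT of the signal's power spectrum, avoiding the O(n^2) pairwise
    computation of the textbook definition.
    """
    x = np.frombuffer(data, dtype=np.uint8).astype(np.float64)
    x -= x.mean()                       # remove the DC component
    n = len(x)
    spectrum = np.fft.rfft(x, n=2 * n)  # zero-pad to avoid circular wraparound
    power = spectrum * np.conj(spectrum)
    acf = np.fft.irfft(power)[:n]       # autocorrelation at lags 0..n-1
    lags = np.argsort(acf[1:])[::-1][:k] + 1  # skip lag 0 (self-correlation)
    return lags.tolist()

# Data with period 4 should surface lag 4 and its multiples first; these
# lags would then seed the context models of a PPM or context-mixing coder.
print(top_autocorrelation_lags(bytes(range(4)) * 256))
```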
Distributed Online Big Data Classification Using Context Information
Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated, high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework in which data is gathered by distributed sources and processed by a heterogeneous set of distributed learners. The learners learn online, at run-time, how to classify the different data streams, either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending it to another learner for processing incurs additional costs, such as delays; doing so is therefore only beneficial if the gains from better classification exceed the costs. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem in which each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, ours is the first to provide analytic regret results characterizing the performance of the proposed algorithm.
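To make the bandit formulation concrete, here is a minimal single-learner sketch, not the paper's distributed algorithm: contexts in [0, 1] are partitioned into bins and a UCB1 index is kept per (bin, arm) pair, where an arm stands for either a local classifier or a peer learner whose communication cost is folded into its observed reward. The class name and the uniform-binning choice are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

class ContextualUCB:
    """Single-learner contextual bandit sketch using UCB1 per context bin."""

    def __init__(self, n_arms, n_bins=10):
        self.n_arms, self.n_bins = n_arms, n_bins
        self.counts = defaultdict(int)    # (bin, arm) -> number of pulls
        self.means = defaultdict(float)   # (bin, arm) -> running mean reward

    def _bin(self, context):
        """Map a context in [0, 1] to a discrete bin index."""
        return min(int(context * self.n_bins), self.n_bins - 1)

    def select(self, context):
        """Pick the arm with the highest UCB1 index for this context bin."""
        b = self._bin(context)
        total = sum(self.counts[(b, a)] for a in range(self.n_arms)) + 1

        def ucb(a):
            n = self.counts[(b, a)]
            if n == 0:
                return float("inf")  # try every arm once per bin
            return self.means[(b, a)] + math.sqrt(2 * math.log(total) / n)

        return max(range(self.n_arms), key=ucb)

    def update(self, context, arm, reward):
        """Incorporate the observed (cost-adjusted) reward for the pulled arm."""
        b = self._bin(context)
        self.counts[(b, arm)] += 1
        n = self.counts[(b, arm)]
        self.means[(b, arm)] += (reward - self.means[(b, arm)]) / n

# Usage: reward could be 1 for a correct classification, minus any
# communication cost incurred when the arm is a remote peer learner.
bandit = ContextualUCB(n_arms=3)
ctx = random.random()
arm = bandit.select(ctx)
bandit.update(ctx, arm, reward=1.0)
```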
Learning cover context-free grammars from structural data
We consider the problem of learning an unknown context-free grammar when the only knowledge available, and of interest to the learner, is about its structural descriptions with depth at most ℓ. The goal is to learn a cover context-free grammar (CCFG) with respect to ℓ, that is, a CFG whose structural descriptions with depth at most ℓ agree with those of the unknown CFG. We propose an algorithm that efficiently learns a CCFG using two types of queries: structural equivalence and structural membership. We show that the algorithm runs in time polynomial in the number of states of a minimal deterministic finite cover tree automaton (DCTA) with respect to ℓ. This number is often much smaller than the number of states of a minimum deterministic finite tree automaton for the structural descriptions of the unknown grammar.
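For concreteness, the sketch below pins down the two query types as an oracle interface; the Skeleton representation and the method names are assumptions, since the abstract does not fix an API.

```python
from abc import ABC, abstractmethod
from typing import Optional

# A "skeleton" stands for a structural description: a derivation tree whose
# internal nodes are unlabeled, so only the tree shape and leaves remain.
Skeleton = tuple

class StructuralOracle(ABC):
    """Illustrative query interface a cover-grammar learner could rely on."""

    @abstractmethod
    def structural_membership(self, skeleton: Skeleton) -> bool:
        """Is this skeleton a structural description (depth <= ell) of the
        unknown CFG?"""

    @abstractmethod
    def structural_equivalence(self, hypothesis_cfg) -> Optional[Skeleton]:
        """Return None if the hypothesis agrees with the unknown CFG on all
        structural descriptions of depth <= ell; otherwise return a
        counterexample skeleton on which the two grammars disagree."""
```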
