
    Context for Ubiquitous Data Management

    In response to advances in ubiquitous computing technologies, we argue that computer systems must be context-aware in order to be ubiquitous. In this paper, we address the impact of context-awareness on ubiquitous data management. To do this, we survey the different characteristics of context in order to develop a clear understanding of context, as well as its implications and requirements for context-aware data management. References to recent research activities and applicable techniques are also provided.

    CONTEXT-BASED AUTOSUGGEST ON GRAPH DATA

    Autosuggest is an important feature in any search application. Currently, most applications suggest only a single term, based on how frequently that term appears in the indexed documents or how often it is searched for. These approaches might not provide the most relevant suggestions, because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the search text box as the context for autosuggesting their next, incomplete keyword. This context-based approach uses the relationships between entities in the graph data that the user is searching and would therefore provide more meaningful suggestions.
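
    The idea of ranking completions by their relationship to already-chosen keywords can be sketched as follows. This is not the Smart Solr Suggester plugin and uses no Solr API; the adjacency-set graph, the `suggest` function, and the scoring rule (count of context keywords directly linked to a candidate) are all assumptions made for illustration.

```python
def suggest(graph, context, prefix, limit=5):
    """Rank entities matching `prefix` by how many context keywords they link to.

    graph   -- hypothetical adjacency map: entity -> set of related entities
    context -- keywords the user has already chosen
    prefix  -- the incomplete keyword currently being typed
    """
    prefix = prefix.lower()
    scored = []
    for entity, neighbours in graph.items():
        if entity in context or not entity.lower().startswith(prefix):
            continue
        # Assumed scoring rule: one point per context keyword adjacent to the entity.
        score = sum(1 for keyword in context if keyword in neighbours)
        scored.append((-score, entity))
    # Highest score first; ties fall back to alphabetical order.
    scored.sort()
    return [entity for _, entity in scored[:limit]]


# Toy graph, invented for this example.
GRAPH = {
    "java": {"spring", "jvm"},
    "javascript": {"react", "node"},
    "spring": {"java"},
    "jvm": {"java"},
    "react": {"javascript"},
    "node": {"javascript"},
}
```

    With this toy graph, the prefix `ja` ranks `javascript` first when `react` is already in the search box and `java` first when `spring` is, whereas a purely frequency-based suggester would return the same order in both cases.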

    Simple data-driven context-sensitive lemmatization

    Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizing word forms in running text. We treat lemmatization as a classification task for Machine Learning, and automatically induce class labels. We achieve this by computing a Shortest Edit Script (SES) between reversed input and output strings. An SES describes the transformations that have to be applied to the input string (word form) in order to convert it to the output string (lemma). Our approach shows competitive performance on a range of typologically different languages.
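
    A minimal sketch of inducing an SES class label from a (word form, lemma) pair and reusing it on another word. The abstract does not give the paper's exact script encoding, so the `(operation, position, character)` tuples, the LCS-based construction, and the function names below are assumptions. Reversing the strings means that suffix edits receive the same positions regardless of word length, which is presumably why the strings are reversed.

```python
def shortest_edit_script(src, dst):
    """Return a minimal tuple of insert/delete operations turning src into dst."""
    n, m = len(src), len(dst)
    # lcs[i][j] = length of the longest common subsequence of src[i:] and dst[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == dst[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    script, i, j = [], 0, 0
    while i < n or j < m:
        if i < n and j < m and src[i] == dst[j]:
            i, j = i + 1, j + 1                      # keep the shared character
        elif i < n and (j == m or lcs[i + 1][j] >= lcs[i][j + 1]):
            script.append(("D", i, src[i]))          # delete src[i]
            i += 1
        else:
            script.append(("I", i, dst[j]))          # insert dst[j] before src[i]
            j += 1
    return tuple(script)


def apply_script(src, script):
    """Apply an edit script to src; raise ValueError if it does not fit."""
    ops_at = {}
    for op, pos, ch in script:
        if pos > len(src):
            raise ValueError("script position outside the string")
        ops_at.setdefault(pos, []).append((op, ch))
    out = []
    for i in range(len(src) + 1):
        deleted = False
        for op, ch in ops_at.get(i, []):
            if op == "I":
                out.append(ch)
            elif i < len(src) and src[i] == ch:
                deleted = True
            else:
                raise ValueError("deletion does not match the string")
        if i < len(src) and not deleted:
            out.append(src[i])
    return "".join(out)


def induce_label(form, lemma):
    """Class label for a training pair: the SES between the reversed strings."""
    return shortest_edit_script(form[::-1], lemma[::-1])


def apply_label(form, label):
    """Lemmatize a word form by applying a previously induced label."""
    return apply_script(form[::-1], label)[::-1]
```

    For example, the label induced from `walked` and `walk` deletes the first two characters of the reversed form, so applying it to `talked` yields `talk`. In a full system a classifier would choose the label for each token from its context; that step is not shown here.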

    Fast Autocorrelated Context Models for Data Compression

    A method is presented to automatically generate context models of data by calculating the data's autocorrelation function. The largest values of the autocorrelation function occur at the offsets or lags in the bitstream which tend to be the most highly correlated to any particular location. These offsets are ideal for use in predictive coding, such as prediction by partial matching (PPM) or context-mixing algorithms for data compression, making such algorithms more efficient and more general by reducing or eliminating the need for ad-hoc models based on particular types of data. Instead of using the definition of the autocorrelation function, which considers the pairwise correlations of data and requires O(n^2) time, the Wiener-Khinchin theorem is applied, quickly obtaining the autocorrelation as the inverse Fast Fourier transform of the data's power spectrum in O(n log n) time, making the technique practical for the compression of large data objects. The method is shown to produce the highest levels of performance obtained to date on a lossless image compression benchmark.
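
    The Wiener-Khinchin shortcut can be illustrated with a small standard-library sketch. It computes the circular autocorrelation of a mean-centred sequence as the inverse FFT of its power spectrum and then picks the strongest lags. The radix-2 FFT, the power-of-two length restriction, the mean-centring, and the function names are assumptions for illustration; how the paper turns the lags into context models is not shown.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_autocorrelation(data):
    """Autocorrelation at every lag in O(n log n) via the power spectrum."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    mean = sum(data) / n
    centred = [value - mean for value in data]
    power = [abs(c) ** 2 for c in fft(centred)]
    # The inverse transform of the power spectrum is the autocorrelation;
    # dividing by n normalizes the unnormalized inverse FFT above.
    return [c.real / n for c in fft(power, inverse=True)]


def top_lags(data, k=3):
    """Return the k lags (1..n/2) with the largest autocorrelation."""
    acf = circular_autocorrelation(data)
    lags = range(1, len(data) // 2 + 1)
    return sorted(lags, key=lambda lag: acf[lag], reverse=True)[:k]
```

    On a bit sequence that repeats every 8 positions, the strongest lags are the multiples of 8, which are the offsets a predictive coder would most plausibly want to use as context.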

    Distributed Online Big Data Classification Using Context Information

    Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework in which data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners. These learners learn online, at run-time, how to classify the different data streams, either by using their locally available classification functions or by helping each other classify each other's data. Importantly, since the data is gathered at different locations, sending the data to another learner for processing incurs additional costs such as delays, and hence is only beneficial if the benefits obtained from a better classification exceed the costs. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem in which each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithm.

    Learning cover context-free grammars from structural data

    We consider the problem of learning an unknown context-free grammar when the only knowledge available and of interest to the learner is about its structural descriptions with depth at most \ell. The goal is to learn a cover context-free grammar (CCFG) with respect to \ell, that is, a CFG whose structural descriptions with depth at most \ell agree with those of the unknown CFG. We propose an algorithm, called LALA^\ell, that efficiently learns a CCFG using two types of queries: structural equivalence and structural membership. We show that LALA^\ell runs in time polynomial in the number of states of a minimal deterministic finite cover tree automaton (DCTA) with respect to \ell. This number is often much smaller than the number of states of a minimum deterministic finite tree automaton for the structural descriptions of the unknown grammar.