1,774 research outputs found

    Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

    Full text link
    We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data

    Personalization by Partial Evaluation.

    Get PDF
    The central contribution of this paper is to model personalization by the programmatic notion of partial evaluation.Partial evaluation is a technique used to automatically specialize programs, given incomplete information about their input.The methodology presented here models a collection of information resources as a program (which abstracts the underlying schema of organization and flow of information),partially evaluates the program with respect to user input,and recreates a personalized site from the specialized program.This enables a customizable methodology called PIPE that supports the automatic specialization of resources,without enumerating the interaction sequences beforehand .Issues relating to the scalability of PIPE,information integration,sessioniz-ling scenarios,and case studies are presented

    arules - A Computational Environment for Mining Association Rules and Frequent Item Sets

    Get PDF
    Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.

    Logic Programming Applications: What Are the Abstractions and Implementations?

    Full text link
    This article presents an overview of applications of logic programming, classifying them based on the abstractions and implementations of logic languages that support the applications. The three key abstractions are join, recursion, and constraint. Their essential implementations are for-loops, fixed points, and backtracking, respectively. The corresponding kinds of applications are database queries, inductive analysis, and combinatorial search, respectively. We also discuss language extensions and programming paradigms, summarize example application problems by application areas, and touch on example systems that support variants of the abstractions with different implementations

    A survey of temporal knowledge discovery paradigms and methods

    Get PDF
    With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining
    corecore