5,181 research outputs found

    Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

    Full text link
    We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data

    A Word Sense-Oriented User Interface for Interactive Multilingual Text Retrieval

    Get PDF
    In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. In order to enable users to access multilingual information, different problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. Therefore, we first give a brief introduction to word sense disambiguation, cross-language text retrieval and document categorization and finally describe recent achievements of our research towards an interactive multilingual retrieval system. We focus especially on the problem of browsing and navigation of the different word senses in one source and possibly several target languages. In the last part of the paper, we discuss the developed user interface and its functionalities in more detail

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Full text link
    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed

    Image mining: issues, frameworks and techniques

    Get PDF
    [Abstract]: Advances in image acquisition and storage technology have led to tremendous growth in significantly large and detailed image databases. These images, if analyzed, can reveal useful information to the human users. Image mining deals with the extraction of implicit knowledge, image data relationship, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, database, and artificial intelligence. Despite the development of many applications and algorithms in the individual research fields cited above, research in image mining is still in its infancy. In this paper, we will examine the research issues in image mining, current developments in image mining, particularly, image mining frameworks, state-of-the-art techniques and systems. We will also identify some future research directions for image mining at the end of this paper

    Time Pressure and System Delays in Information Search

    Get PDF
    We report preliminary results of the impact of time pres- sure and system delays on search behavior from a laboratory study with forty-three participants. To induce time pres- sure, we randomly assigned half of our study participants to a treatment condition where they were only allowed five minutes to search for each of four ad-hoc search topics. The other half of the participants were given no task time limits. For half of participants’ search tasks (n=2), five second de- lays were introduced after queries were submitted and SERP results were clicked. Results showed that participants in the time pressure condition queried at a significantly higher rate, viewed significantly fewer documents per query, had significantly shallower hover and view depths, and spent sig- nificantly less time examining documents and SERPs. We found few significant differences in search behavior for sys- tem delay or interaction effects between time pressure and system delay. These initial results show time pressure has a significant impact on search behavior and suggest the de- sign of search interfaces and features that support people who are searching under time pressure

    Image mining: trends and developments

    Get PDF
    [Abstract]: Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to the human users. Image mining deals with the extraction of implicit knowledge, image data relationship, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, database, and artificial intelligence. In this paper, we will examine the research issues in image mining, current developments in image mining, particularly, image mining frameworks, state-of-the-art techniques and systems. We will also identify some future research directions for image mining

    Web Search From a Bus

    Get PDF
    Opportunistic connections to the Internet from open wireless access points is now commonly possible in municipal areas. Vehicular networks can opportunistically connect to the internet for several seconds via open access points. In this paper, we adapt the interactive process of web search and retrieval to vehicular networks with intermittent Internet access. Our system, called Thedu has mobile nodes use an Internet proxy to collect search engine results and prefetch result pages. The mobiles nodes download the pre-fetched web pages from the proxy. Our contribution is a novel set of techniques to make aggressive but selective prefetching practical, resulting in a significantly greater number of relevant web results returned to mobile users. To evaluate our scheme, we deployed Thedu on DieselNet, our vehicular testbed of buses operating in a micro-urban area around Amherst, MA. Using a simulated workload, we find that users can expect four times as many useful responses to web search queries compared to not using Thedu. Moreover, the mean latency in receiving the first relevant response for a query is 2.7 minutes when deployed in a semi-urban region with a sparser distribution of APs compared to big cities
    • …
    corecore