5,181 research outputs found

Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

Author: Dalton Jeff
Li Zhenghua
Lin Jimmy
Mishne Gilad
Sharma Aneesh
Publication date: 27/10/2012

We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.

arXiv.org e-Print Archive

CiteSeerX

A Word Sense-Oriented User Interface for Interactive Multilingual Text Retrieval

Author: DeLuca Ernesto William
Nürnberger Andreas
Publication date: 18/04/2011

In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. In order to enable users to access multilingual information, different problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. Therefore, we first give a brief introduction to word sense disambiguation, cross-language text retrieval and document categorization and finally describe recent achievements of our research towards an interactive multilingual retrieval system. We focus especially on the problem of browsing and navigation of the different word senses in one source and possibly several target languages. In the last part of the paper, we discuss the developed user interface and its functionalities in more detail.

University of Hildesheim

Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Author: Alam Mansaf
Ali Syed Arshad
Khan Samiya
Liu Xiufeng
Publication date: 01/01/2019

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for an optimized solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has variable data characteristics, and thus the corresponding data fits into a different data model. This paper presents a feature and use case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

arXiv.org e-Print Archive

Online Research Database In Technology

Image mining: issues, frameworks and techniques

Author: Hsu Wynne
Lee Mong Li
Zhang Ji
Publication venue: 'University of Alberta'
Publication date: 01/01/2001

Advances in image acquisition and storage technology have led to tremendous growth in significantly large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. Despite the development of many applications and algorithms in the individual research fields cited above, research in image mining is still in its infancy. In this paper, we examine the research issues in image mining and current developments in the field, particularly image mining frameworks and state-of-the-art techniques and systems. We also identify some future research directions for image mining at the end of the paper.

University of Southern Queensland ePrints

Time Pressure and System Delays in Information Search

Author: Azzopardi Leif
Crescenzi Anita
Kelly Diane
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

We report preliminary results of the impact of time pressure and system delays on search behavior from a laboratory study with forty-three participants. To induce time pressure, we randomly assigned half of our study participants to a treatment condition where they were only allowed five minutes to search for each of four ad-hoc search topics. The other half of the participants were given no task time limits. For half of participants’ search tasks (n=2), five second delays were introduced after queries were submitted and SERP results were clicked. Results showed that participants in the time pressure condition queried at a significantly higher rate, viewed significantly fewer documents per query, had significantly shallower hover and view depths, and spent significantly less time examining documents and SERPs. We found few significant differences in search behavior for system delay or interaction effects between time pressure and system delay. These initial results show time pressure has a significant impact on search behavior and suggest the design of search interfaces and features that support people who are searching under time pressure.

Crossref

Enlighten

Image mining: trends and developments

Author: Hsu Wynne
Lee Mong Li
Zhang Ji
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2002

Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. In this paper, we examine the research issues in image mining and current developments in the field, particularly image mining frameworks and state-of-the-art techniques and systems. We also identify some future research directions for image mining.

University of Southern Queensland ePrints

Web Search From a Bus

Author: Balasubramanian Aruna
Croft Bruce
Levine Brian Neil
Venkataramani Arun
Zhou Yun
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2007

Opportunistic connections to the Internet from open wireless access points are now commonly possible in municipal areas. Vehicular networks can opportunistically connect to the Internet for several seconds via open access points. In this paper, we adapt the interactive process of web search and retrieval to vehicular networks with intermittent Internet access. Our system, called Thedu, has mobile nodes use an Internet proxy to collect search engine results and prefetch result pages. The mobile nodes download the prefetched web pages from the proxy. Our contribution is a novel set of techniques to make aggressive but selective prefetching practical, resulting in a significantly greater number of relevant web results returned to mobile users. To evaluate our scheme, we deployed Thedu on DieselNet, our vehicular testbed of buses operating in a micro-urban area around Amherst, MA. Using a simulated workload, we find that users can expect four times as many useful responses to web search queries compared to not using Thedu. Moreover, the mean latency in receiving the first relevant response for a query is 2.7 minutes when deployed in a semi-urban region with a sparser distribution of APs compared to big cities.

CiteSeerX

ScholarWorks@UMass Amherst