Hikester - the event management application

Author: Khatipov Rinat
Mazzara Manuel
Negimatzhanov Aydar
Rivera Victor
Zakirov Anvar
Zamaleev Ilgiz
Publication date: 19/01/2018

Today, social networks and services are among the most important parts of our everyday life. Most daily activities, such as communicating with friends, reading news or dating, are usually done using social networks. However, there are activities for which social networks do not yet provide adequate support. This paper focuses on event management and introduces "Hikester". The main objective of this service is to let users create any event they desire and invite other users. "Hikester" supports the creation and management of events such as attending football matches, quest rooms, shared train rides or museum visits in foreign countries. Here we discuss the project architecture as well as the detailed implementation of the system components: the recommender system, the spam recognition service and the parameters optimizer.

arXiv.org e-Print Archive

Crossref

Efficient Multi-way Theta-Join Processing Using MapReduce

Author: Chen Lei
Wang Min
Zhang Xiaofei
Publication date: 01/01/2012

Multi-way Theta-join queries are powerful in describing complex relations and are therefore widely employed in practice. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which has been shown to support OLAP applications over immense data volumes. In this work, we study the problem of efficiently processing multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although there have been some works using the (key,value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The substantial challenge lies in, given a number of processing units (that can run Map or Reduce tasks), mapping a multi-way Theta-join query to a number of MapReduce jobs and having them executed in a well-scheduled sequence, such that the total processing time span is minimized. Our solution mainly includes two parts: 1) cost metrics for both a single MapReduce job and a number of MapReduce jobs executed in a certain order; 2) the efficient execution of a chain-typed Theta-join with only one MapReduce job. Compared with the query evaluation strategy proposed in [23] and the widely adopted Pig Latin and Hive SQL solutions, our method achieves a significant improvement in join processing efficiency. Comment: VLDB201
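For readers unfamiliar with the term, a chain-typed Theta-join links relations pairwise with arbitrary comparison predicates rather than equality only. The sketch below is a naive, single-machine illustration in Python and is not the MapReduce method the abstract describes; the relations `R`, `S`, `T`, the helper `theta_join`, and both predicates are hypothetical, chosen only to show the shape of such a query.

```python
from itertools import product


def theta_join(left, right, predicate):
    """Naive nested-loop theta-join: keep every pair that satisfies predicate."""
    return [l + r for l, r in product(left, right) if predicate(l, r)]


# Hypothetical toy relations (tuples of integers).
R = [(1,), (5,)]          # R(a)
S = [(3, 10), (7, 2)]     # S(b, c)
T = [(4,), (9,)]          # T(d)

# Chain query: R.a < S.b  AND  S.c >= T.d
rs = theta_join(R, S, lambda r, s: r[0] < s[0])
# Rows of rs have the layout (a, b, c), so S.c sits at index 2.
rst = theta_join(rs, T, lambda row, t: row[2] >= t[0])

print(rst)
```

Evaluating the chain as two sequential joins materializes the intermediate result `rs`; the abstract's second contribution concerns executing such a chain within a single MapReduce job instead.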

arXiv.org e-Print Archive

University of Memphis Digital Commons

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

Author: Ailamaki A.
Boncz P. A.
Chen M.-S.
DeWitt D. J.
Lang H.
Li Y.
Liu B.
Lohman G. M.
Lu H.
Manegold S.
Ono K.
Schneider D. A.
Shatdal A.
Stillger M.
Zhang N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/07/2015

Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases. Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1
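As background for the hash-based equi-joins the abstract refers to, the following is a minimal Python sketch of the classic build-and-probe hash join. It is not the paper's cost model; the function `hash_join` and the toy relations `orders` and `items` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def hash_join(build, probe, build_key, probe_key):
    """In-memory hash equi-join: build a hash table on one input, probe with the other."""
    table = defaultdict(list)
    for row in build:                     # build phase
        table[build_key(row)].append(row)
    out = []
    for row in probe:                     # probe phase
        # .get avoids inserting empty buckets for keys with no match
        for match in table.get(probe_key(row), []):
            out.append(match + row)
    return out


# Hypothetical toy relations.
orders = [(1, "alice"), (2, "bob")]
items = [(1, "pen"), (1, "ink"), (3, "cap")]

joined = hash_join(orders, items, lambda o: o[0], lambda i: i[0])
print(joined)
```

In a multi-join plan, which input of each join is used to build the hash table and which is streamed through as the probe side is what distinguishes tree shapes such as left-deep and right-deep plans.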

arXiv.org e-Print Archive

Crossref