
    T²K²: The Twitter Top-K Keywords Benchmark

    Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are routinely used for various purposes and are often computed on-the-fly, so they must be computed efficiently. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present a top-k keywords benchmark, T²K², which features a real tweet dataset and queries with various complexities and selectivities. T²K² helps evaluate weighting schemes and database implementations in terms of computing performance. To illustrate T²K²'s relevance and genericity, we successfully performed tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
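    The TF-IDF weighting scheme named in this abstract can be sketched as follows. This is a minimal illustration of the general technique, not code from the benchmark; the function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(docs):
    """Weight each term of each document by tf * idf.

    `docs` is a list of tokenized documents (lists of terms).
    tf is the relative term frequency within the document;
    idf is log(N / df), with df the number of documents containing the term.
    """
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    n = len(docs)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

def top_k(doc_weights, k):
    """Return the k terms of one document with the highest weights."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]
```

    A term occurring in every document gets idf = log(1) = 0 and is thus never a top-k keyword, which is the intended discriminative behaviour of the scheme.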

    On Evaluating Commercial Cloud Services: A Systematic Review

    Background: Cloud Computing is booming in industry, with many competing providers and services; accordingly, evaluation of commercial Cloud services is necessary. However, the existing evaluation studies are relatively chaotic: there is tremendous confusion, and a gap between practice and theory, in Cloud services evaluation. Aim: To help relieve this chaos, this work aims to synthesize the existing evaluation implementations, to outline the state of the practice, and to identify research opportunities in Cloud services evaluation. Method: Based on a conceptual evaluation model comprising six steps, the Systematic Literature Review (SLR) method was employed to collect relevant evidence and investigate Cloud services evaluation step by step. Results: This SLR identified 82 relevant evaluation studies. The overall data collected from these studies essentially represent the current practical landscape of Cloud services evaluation, and can in turn be reused to facilitate future evaluation work. Conclusions: Evaluation of commercial Cloud services has become a worldwide research topic. Some findings of this SLR identify research gaps in Cloud services evaluation (e.g., the elasticity and security evaluation of commercial Cloud services could be a long-term challenge), while other findings suggest trends in applying commercial Cloud services (e.g., compared with PaaS, IaaS seems more suitable for customers and is particularly important in industry). This SLR study also confirms some previous experiences and reveals new Evidence-Based Software Engineering (EBSE) lessons.

    Metrics for Measuring Data Quality - Foundations for an Economic Oriented Management of Data Quality

    The article develops metrics for an economically oriented management of data quality, focusing on two data quality dimensions: consistency and timeliness. Several requirements for adequate metrics are stated (e.g., normalisation, cardinality, adaptivity, interpretability). The authors then discuss existing approaches for measuring data quality and illustrate their weaknesses. Based on these considerations, new metrics are developed for the dimensions consistency and timeliness. These metrics are applied in practice, and the results are illustrated in the case of a major German mobile services provider.
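    One common way to operationalise a timeliness metric is an exponential decay of an attribute value's quality with its age. This is a hedged sketch of that general idea, not necessarily the exact formulation the article derives; both parameter names are illustrative.

```python
import math

def timeliness(age, decline_rate):
    """Timeliness metric in [0, 1]: 1.0 for fresh data, decaying with age.

    `age` is the time elapsed since the value was recorded (e.g., in years);
    `decline_rate` is the assumed rate at which values of this attribute
    become outdated (e.g., fraction of customer addresses changing per year).
    """
    return math.exp(-decline_rate * age)
```

    A form like this meets the normalisation and interpretability requirements quoted in the abstract: the result lies in [0, 1] and can be read as the probability that a stored value is still up to date.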

    Implementation of a Segmented, Transactional Database Caching System

    Research on algorithms and concepts for memory-based data caching can help solve the performance bottleneck in current Database Management Systems. Problems such as data concurrency, persistent storage, and transaction management have limited most memory caches' capabilities, and it has also been difficult to develop a properly user-oriented and business-friendly way of implementing such a system. The research in this project focused on code implementation, abstract methodologies, and how best to prepare such an application for common business usage.
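    The abstract gives no implementation details, but as a rough illustration of what "transactional" caching can mean, here is a minimal in-memory cache in which writes are buffered per transaction and only become visible on commit. All names are hypothetical and the sketch ignores concurrency and persistence, two of the problems the project addresses.

```python
class TransactionalCache:
    """In-memory key-value cache with single-transaction commit/rollback."""

    def __init__(self):
        self._store = {}   # committed state
        self._txn = None   # pending writes, or None when no transaction is open

    def begin(self):
        self._txn = {}

    def put(self, key, value):
        # Inside a transaction, buffer the write; otherwise write through.
        (self._txn if self._txn is not None else self._store)[key] = value

    def get(self, key):
        # The open transaction sees its own uncommitted writes first.
        if self._txn is not None and key in self._txn:
            return self._txn[key]
        return self._store.get(key)

    def commit(self):
        self._store.update(self._txn)
        self._txn = None

    def rollback(self):
        self._txn = None   # discard buffered writes
```

    A segmented design, as in the title, would partition `_store` into independently locked segments so that concurrent transactions touching different segments do not block each other.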

    uFLIP: Understanding Flash IO Patterns

    Does the advent of flash devices constitute a radical change for secondary storage? How should database systems adapt to this new form of secondary storage? Before we can answer these questions, we need to fully understand the performance characteristics of flash devices. More specifically, we want to establish what kinds of IOs should be favored (or avoided) when designing algorithms and architectures for flash-based systems. In this paper, we focus on flash IO patterns, which capture relevant distributions of IOs in time and space, and our goal is to quantify their performance. We define uFLIP, a benchmark for measuring the response time of flash IO patterns. We also present a benchmarking methodology that takes into account the particular characteristics of flash devices. Finally, we present the results obtained by measuring eleven flash devices, and derive a set of design hints that should drive the development of flash-based systems on current devices.
    Comment: CIDR 200

    Mergers and acquisitions transactions strategies in diffusion-type financial systems in highly volatile global capital markets with nonlinearities

    M&A transactions represent a wide range of unique business optimization opportunities in corporate transformation deals, which are usually characterized by a high level of total risk. M&A transactions can be implemented successfully by taking into account the size of the investment, the purchase price, and the direction and type of the transaction, and by using modern comparable-transactions analysis and business valuation techniques in diffusion-type financial systems. We developed the MicroMA software program, with an embedded optimized near-real-time artificial intelligence algorithm, to create winning M&A strategies from the financial performance characteristics of the firms involved, and to estimate the probability that an M&A transaction completes successfully. We believe that the fluctuating number of M&A transactions over a given time period is quasi-periodic. Many factors can generate such quasi-periodic oscillations of the M&A transaction count in the time domain, for example stock market bubble effects. We researched the nonlinearities in these quasi-periodic oscillations in Matlab, including ideal, linear, quadratic, and exponential dependences. We found that the average of a sum of random numbers in the M&A transaction time series represents a time series with quasi-periodic systematic oscillations, which can be finely approximated by polynomials. We think that, in the course of implementing an M&A transaction, the ability of the companies to absorb the newly acquired knowledge and to create new innovative knowledge bases is a key predeterminant of M&A deal completion success, as in Switzerland.
    Comment: 160 pages, 9 figures, 37 tables