TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are routinely used for
various purposes and are often computed on the fly, so they must be computed
efficiently. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other hand.
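As a rough illustration of the kind of computation such a benchmark exercises, the sketch below scores the terms of a tiny corpus with TF-IDF and Okapi BM25 and extracts the top-k keywords. The corpus, constants, and function names are illustrative assumptions, not TK's actual schema or queries.

```python
import math
from collections import Counter

# Illustrative sketch: score terms with TF-IDF and Okapi BM25, then take the top-k.
# The documents and parameter values below are assumptions for demonstration only.
docs = [
    "flash storage benchmark response time",
    "keyword weighting benchmark for tweets",
    "tweets keyword extraction with bm25",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for d in tokenized for t in set(d))   # document frequency per term

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)
    return tf * math.log(N / df[term])

def bm25(term, doc, k1=1.2, b=0.75):                 # standard BM25 constants
    tf = doc.count(term)
    idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))

def top_k_keywords(doc, k, weight):
    scores = {t: weight(t, doc) for t in set(doc)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(top_k_keywords(tokenized[1], 3, tf_idf))
print(top_k_keywords(tokenized[1], 3, bm25))
```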
On Evaluating Commercial Cloud Services: A Systematic Review
Background: Cloud computing is booming in industry, with many
competing providers and services. Accordingly, evaluating commercial Cloud
services is necessary. However, the existing evaluation studies are relatively
chaotic: there is tremendous confusion and a wide gap between practice and
theory in Cloud services evaluation. Aim: To help relieve this chaos, this
work aims to synthesize the existing evaluation
implementations to outline the state-of-the-practice and also identify research
opportunities in Cloud services evaluation. Method: Based on a conceptual
evaluation model comprising six steps, the Systematic Literature Review (SLR)
method was employed to collect relevant evidence to investigate the Cloud
services evaluation step by step. Results: This SLR identified 82 relevant
evaluation studies. The overall data collected from these studies essentially
represent the current practical landscape of implementing Cloud services
evaluation, and in turn can be reused to facilitate future evaluation work.
Conclusions: Evaluation of commercial Cloud services has become a world-wide
research topic. Some of the findings of this SLR identify several research gaps
in the area of Cloud services evaluation (e.g., the Elasticity and Security
evaluation of commercial Cloud services could be a long-term challenge), while
some other findings suggest the trend of applying commercial Cloud services
(e.g., compared with PaaS, IaaS seems more suitable for customers and is
particularly important in industry). This SLR study itself also confirms some
previous experiences and reveals new Evidence-Based Software Engineering (EBSE)
lessons.
Metrics for Measuring Data Quality - Foundations for an Economic Oriented Management of Data Quality
The article develops metrics for an economically oriented management of data quality. The focus is on two data quality dimensions: consistency and timeliness. To derive adequate metrics, several requirements are stated (e.g., normalisation, cardinality, adaptivity, interpretability). The authors then discuss existing approaches for measuring data quality and illustrate their weaknesses. Based upon these considerations, new metrics are developed for the data quality dimensions consistency and timeliness. These metrics are applied in practice, and the results are illustrated in the case of a major German mobile services provider.
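A minimal sketch of what normalised, interpretable metrics of this kind could look like is given below. It assumes an exponential decline of currency with the age of an attribute value for timeliness and a rule-based share of consistent records for consistency; the decline rate, rules, and example data are illustrative assumptions, not taken from the article.

```python
import math

def timeliness(age_in_years: float, decline_rate: float) -> float:
    """Illustrative timeliness metric in [0, 1]: estimated probability that a value
    of the given age is still up to date, assuming values become outdated at an
    exponential rate (decline_rate = assumed share becoming outdated per year)."""
    return math.exp(-decline_rate * age_in_years)

def consistency(records, rules) -> float:
    """Illustrative consistency metric in [0, 1]: share of records that satisfy
    every consistency rule (each rule is a predicate over a record)."""
    ok = sum(1 for r in records if all(rule(r) for rule in rules))
    return ok / len(records) if records else 1.0

# Example: assume customer addresses become outdated at roughly 20% per year,
# and a simple rule requires a five-digit postal code.
print(timeliness(age_in_years=2.0, decline_rate=0.2))
print(consistency([{"zip": "80331"}, {"zip": "123"}],
                  [lambda r: len(r["zip"]) == 5 and r["zip"].isdigit()]))
```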
Implementation of a Segmented, Transactional Database Caching System
Research on algorithms and concepts regarding memory-based data caching can help solve the performance bottleneck in current Database Management Systems. Problems such as data concurrency, persistent storage, and transaction management have limited most memory caches' capabilities. It has also been difficult to develop a properly user-oriented and business-friendly way of implementing such a system. This project's research focused on code implementation, abstract methodologies, and how best to prepare such an application for common business usage.
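As a rough illustration of the general idea of a segmented, transactional in-memory cache, the sketch below hashes keys onto fixed segments, each guarded by its own lock, and applies a batch of writes atomically by locking the affected segments in a fixed order. The structure and names are assumptions for illustration, not the project's actual design.

```python
import threading

class SegmentedCache:
    """Illustrative segmented in-memory cache: keys are hashed onto fixed segments,
    each with its own lock, so unrelated keys can be accessed concurrently."""

    def __init__(self, num_segments: int = 16):
        self._segments = [{} for _ in range(num_segments)]
        self._locks = [threading.Lock() for _ in range(num_segments)]

    def _index(self, key) -> int:
        return hash(key) % len(self._segments)

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._segments[i].get(key)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._segments[i][key] = value

    def transaction(self, writes: dict):
        """Apply a batch of writes atomically: lock the affected segments in a
        fixed order (avoiding deadlock), install all values, then release."""
        indices = sorted({self._index(k) for k in writes})
        for i in indices:
            self._locks[i].acquire()
        try:
            for k, v in writes.items():
                self._segments[self._index(k)][k] = v
        finally:
            for i in indices:
                self._locks[i].release()

cache = SegmentedCache()
cache.transaction({"order:1": "shipped", "order:2": "pending"})
print(cache.get("order:1"))
```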
uFLIP: Understanding Flash IO Patterns
Does the advent of flash devices constitute a radical change for secondary
storage? How should database systems adapt to this new form of secondary
storage? Before we can answer these questions, we need to fully understand the
performance characteristics of flash devices. More specifically, we want to
establish what kind of IOs should be favored (or avoided) when designing
algorithms and architectures for flash-based systems. In this paper, we focus
on flash IO patterns, which capture the relevant distribution of IOs in time and
space, and our goal is to quantify their performance. We define uFLIP, a
benchmark for measuring the response time of flash IO patterns. We also present
a benchmarking methodology which takes into account the particular
characteristics of flash devices. Finally, we present the results obtained by
measuring eleven flash devices, and derive a set of design hints that should
drive the development of flash-based systems on current devices.
Comment: CIDR 200
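A hedged sketch of the kind of micro-measurement such a benchmark performs is shown below: timing two elementary IO patterns (sequential vs. random reads of fixed-size blocks) against a test file. A real flash benchmark would bypass the OS page cache and repeat runs; the file path, block size, and pattern here are assumptions, not uFLIP's actual pattern definitions.

```python
import os
import random
import time

BLOCK = 4096                      # assumed block size
PATH = "testfile.bin"             # assumed test file path

def prepare(num_blocks: int = 1024) -> int:
    # Create a test file of num_blocks fixed-size blocks.
    with open(PATH, "wb") as f:
        f.write(os.urandom(num_blocks * BLOCK))
    return num_blocks

def time_pattern(offsets) -> float:
    """Return the mean response time (seconds) of reading one block at each offset."""
    times = []
    with open(PATH, "rb") as f:
        for off in offsets:
            start = time.perf_counter()
            f.seek(off)
            f.read(BLOCK)
            times.append(time.perf_counter() - start)
    return sum(times) / len(times)

n = prepare()
sequential = [i * BLOCK for i in range(n)]
rand = random.sample(sequential, len(sequential))
print("sequential read:", time_pattern(sequential))
print("random read:   ", time_pattern(rand))
```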
Mergers and acquisitions transactions strategies in diffusion-type financial systems in highly volatile global capital markets with nonlinearities
M and A transactions represent a wide range of unique business optimization
opportunities in corporate transformation deals, which are usually
characterized by a high level of total risk. M and A transactions can be
successfully implemented by taking into account the size of the investment,
the purchase price, and the direction and type of the transaction, and by
using modern comparable-transactions analysis and business valuation
techniques in diffusion-type financial systems. We developed the MicroMA
software program, with an embedded, optimized, near-real-time artificial
intelligence algorithm, to create winning M and A strategies from the
financial performance characteristics of the involved firms and to estimate
the probability of M and A transaction completion success. We believe that
the fluctuation of the number of M and A transactions over a given time
period is quasi-periodic. We think that many factors can generate
quasi-periodic oscillations of the number of M and A transactions in the time
domain, for example stock market bubble effects. We studied the nonlinearities
in these quasi-periodic oscillations in Matlab, including ideal, linear,
quadratic, and exponential dependences. We found that the average of a sum of
random numbers in the M and A transactions time series yields a time series
with quasi-periodic systematic oscillations, which can be closely approximated
by polynomials. We think that, in the course of implementing an M and A
transaction, the companies' ability to absorb the newly acquired knowledge and
to create new innovative knowledge bases is a key predeterminant of M and A
deal completion success, as in Switzerland.
Comment: 160 pages, 9 figures, 37 table
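The claim that the smoothed transaction-count series admits a close polynomial approximation can be illustrated with a short sketch: fit a polynomial to a synthetic quasi-periodic count series and report the approximation error. The synthetic data, smoothing window, and polynomial degree are assumptions, not the paper's Matlab code or dataset.

```python
import numpy as np

# Illustrative sketch: approximate a synthetic quasi-periodic "transaction count"
# series with a polynomial. All data and parameters below are assumptions.
rng = np.random.default_rng(0)
t = np.arange(120)                                # 120 monthly observations (assumed)
counts = 50 + 10 * np.sin(2 * np.pi * t / 60) + rng.normal(0, 3, t.size)

# Smooth the noisy counts with a 12-month moving average, then fit a polynomial.
smoothed = np.convolve(counts, np.ones(12) / 12, mode="valid")
coeffs = np.polyfit(np.arange(smoothed.size), smoothed, deg=6)
approx = np.polyval(coeffs, np.arange(smoothed.size))

print("max absolute approximation error:", float(np.max(np.abs(approx - smoothed))))
```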