
    T²K²: The Twitter Top-K Keywords Benchmark

    Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are routinely used for various purposes and are often computed on-the-fly, so they must be computed efficiently. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present a top-k keywords benchmark, T²K², which features a real tweet dataset and queries with various complexities and selectivities. T²K² helps evaluate weighting schemes and database implementations in terms of computing performance. To illustrate T²K²'s relevance and genericity, we successfully performed tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
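    The TF-IDF weighting scheme named in this abstract can be sketched as follows. This is a minimal illustration of the general technique, not code from the benchmark; the function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(docs):
    """Weight each term of each document by tf * idf.

    `docs` is a list of tokenized documents (lists of terms).
    tf is the relative term frequency within the document;
    idf is log(N / df), with df the number of documents containing the term.
    """
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    n = len(docs)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

def top_k(doc_weights, k):
    """Return the k terms of one document with the highest weights."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]
```

    A term occurring in every document gets idf = log(1) = 0 and is thus never a top-k keyword, which is the intended discriminative behaviour of the scheme.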

    On Evaluating Commercial Cloud Services: A Systematic Review

    Background: Cloud Computing is booming in industry, with many competing providers and services; accordingly, evaluation of commercial Cloud services is necessary. However, the existing evaluation studies are relatively chaotic: there is tremendous confusion, and a gap between practice and theory, in Cloud services evaluation. Aim: To help relieve this chaos, this work aims to synthesize the existing evaluation implementations, to outline the state of the practice, and to identify research opportunities in Cloud services evaluation. Method: Based on a conceptual evaluation model comprising six steps, the Systematic Literature Review (SLR) method was employed to collect relevant evidence and investigate Cloud services evaluation step by step. Results: This SLR identified 82 relevant evaluation studies. The overall data collected from these studies essentially represent the current practical landscape of Cloud services evaluation, and can in turn be reused to facilitate future evaluation work. Conclusions: Evaluation of commercial Cloud services has become a worldwide research topic. Some findings of this SLR identify research gaps in Cloud services evaluation (e.g., the elasticity and security evaluation of commercial Cloud services could be a long-term challenge), while other findings suggest trends in applying commercial Cloud services (e.g., compared with PaaS, IaaS seems more suitable for customers and is particularly important in industry). This SLR study also confirms some previous experiences and reveals new Evidence-Based Software Engineering (EBSE) lessons.

    Metrics for Measuring Data Quality - Foundations for an Economic Oriented Management of Data Quality

    The article develops metrics for an economically oriented management of data quality, focusing on two data quality dimensions: consistency and timeliness. Several requirements for adequate metrics are stated (e.g., normalisation, cardinality, adaptivity, interpretability). The authors then discuss existing approaches for measuring data quality and illustrate their weaknesses. Based on these considerations, new metrics are developed for the dimensions consistency and timeliness. These metrics are applied in practice, and the results are illustrated in the case of a major German mobile services provider.
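    One common way to operationalise a timeliness metric is an exponential decay of an attribute value's quality with its age. This is a hedged sketch of that general idea, not necessarily the exact formulation the article derives; both parameter names are illustrative.

```python
import math

def timeliness(age, decline_rate):
    """Timeliness metric in [0, 1]: 1.0 for fresh data, decaying with age.

    `age` is the time elapsed since the value was recorded (e.g., in years);
    `decline_rate` is the assumed rate at which values of this attribute
    become outdated (e.g., fraction of customer addresses changing per year).
    """
    return math.exp(-decline_rate * age)
```

    A form like this meets the normalisation and interpretability requirements quoted in the abstract: the result lies in [0, 1] and can be read as the probability that a stored value is still up to date.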

    Implementation of a Segmented, Transactional Database Caching System

    Research on algorithms and concepts for memory-based data caching can help solve the performance bottleneck in current Database Management Systems. Problems such as data concurrency, persistent storage, and transaction management have limited most memory caches' capabilities, and it has also been difficult to develop a properly user-oriented and business-friendly way of implementing such a system. The research in this project focused on code implementation, abstract methodologies, and how best to prepare such an application for common business usage.
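    The abstract gives no implementation details, but as a rough illustration of what "transactional" caching can mean, here is a minimal in-memory cache in which writes are buffered per transaction and only become visible on commit. All names are hypothetical and the sketch ignores concurrency and persistence, two of the problems the project addresses.

```python
class TransactionalCache:
    """In-memory key-value cache with single-transaction commit/rollback."""

    def __init__(self):
        self._store = {}   # committed state
        self._txn = None   # pending writes, or None when no transaction is open

    def begin(self):
        self._txn = {}

    def put(self, key, value):
        # Inside a transaction, buffer the write; otherwise write through.
        (self._txn if self._txn is not None else self._store)[key] = value

    def get(self, key):
        # The open transaction sees its own uncommitted writes first.
        if self._txn is not None and key in self._txn:
            return self._txn[key]
        return self._store.get(key)

    def commit(self):
        self._store.update(self._txn)
        self._txn = None

    def rollback(self):
        self._txn = None   # discard buffered writes
```

    A segmented design, as in the title, would partition `_store` into independently locked segments so that concurrent transactions touching different segments do not block each other.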

    uFLIP: Understanding Flash IO Patterns

    Does the advent of flash devices constitute a radical change for secondary storage? How should database systems adapt to this new form of secondary storage? Before we can answer these questions, we need to fully understand the performance characteristics of flash devices. More specifically, we want to establish what kinds of IOs should be favored (or avoided) when designing algorithms and architectures for flash-based systems. In this paper, we focus on flash IO patterns, which capture relevant distributions of IOs in time and space, and our goal is to quantify their performance. We define uFLIP, a benchmark for measuring the response time of flash IO patterns. We also present a benchmarking methodology that takes into account the particular characteristics of flash devices. Finally, we present the results obtained by measuring eleven flash devices, and derive a set of design hints that should drive the development of flash-based systems on current devices.
    Comment: CIDR 200

    Mergers and acquisitions transactions strategies in diffusion-type financial systems in highly volatile global capital markets with nonlinearities

    M&A transactions represent a wide range of unique business optimization opportunities in corporate transformation deals, which are usually characterized by a high level of total risk. M&A transactions can be implemented successfully by taking into account the size of the investment, the purchase price, and the direction and type of the transaction, and by using modern comparable-transactions analysis and business valuation techniques in diffusion-type financial systems. We developed the MicroMA software program, with an embedded optimized near-real-time artificial intelligence algorithm, to create winning M&A strategies from the financial performance characteristics of the firms involved, and to estimate the probability that an M&A transaction completes successfully. We believe that the fluctuating number of M&A transactions over a given time period is quasi-periodic. Many factors can generate such quasi-periodic oscillations of the M&A transaction count in the time domain, for example stock market bubble effects. We researched the nonlinearities in these quasi-periodic oscillations in Matlab, including ideal, linear, quadratic, and exponential dependences. We found that the average of a sum of random numbers in the M&A transaction time series represents a time series with quasi-periodic systematic oscillations, which can be finely approximated by polynomials. We think that, in the course of implementing an M&A transaction, the ability of the companies to absorb the newly acquired knowledge and to create new innovative knowledge bases is a key predeterminant of M&A deal completion success, as in Switzerland.
    Comment: 160 pages, 9 figures, 37 tables