Creating a web-scale video collection for research

Author: Awad George M.
Foley Colum
Lanagan James
Over Paul
Smeaton Alan F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

This paper begins by considering a number of important design questions for a web-scale, widely available, multimedia test collection intended to support long-term scientific evaluation and comparison of content-based video analysis and exploitation systems. Such exploitation systems would include the kinds of functionality already explored within the annual TRECVid benchmarking activity, such as search, semantic concept detection, and automatic summarisation. We then report on our progress in creating such a multimedia collection, which we believe to be web scale and which will support a next generation of benchmarking activities for content-based video operations, and we report on how we intend to put this collection, the IACC.1 collection, to use.

Crossref

Irish Universities

DCU Online Research Access Service

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

Author: C Luo
DM Blei
J Leskovec
J-M Fourneau
LA Barroso
T Rabl
Z Jia
Publication date: 26/02/2014

Data generation is a key issue in big data benchmarking: it aims to generate application-specific data sets that meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different types (Variety) under controllable generation rates (Velocity) while keeping the important characteristics of raw data (Veracity). This gives rise to various new challenges in designing generators efficiently and successfully. To date, most existing techniques can only generate limited types of data and support specific big data systems such as Hadoop. Hence we develop a tool, called Big Data Generator Suite (BDGS), to efficiently generate scalable big data while employing data models derived from real data to preserve data veracity. The effectiveness of BDGS is demonstrated by developing six data generators covering three representative data types (structured, semi-structured and unstructured) and three data sources (text, graph, and table data).

arXiv.org e-Print Archive

Crossref

Benchmarking Distributed Stream Data Processing Systems

Author: Heiskanen Henri
Karimov Jeyhun
Katsifodimos Asterios
Markl Volker
Rabl Tilmann
Samarev Roman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/06/2019

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear lack of detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, which are the basic type of operations in stream analytics. For this benchmark, we design workloads based on real-life, industrial use-cases inspired by the online gaming industry. The contribution of our work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we carefully separate the system under test and the driver, in order to correctly represent the open world model of typical stream processing deployments; we can, therefore, measure system performance under realistic conditions. Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems. Our detailed evaluation highlights the individual characteristics and use-cases of each system. Comment: Published at ICDE 201

arXiv.org e-Print Archive

Crossref

Technical efficiency in electricity generation - the impact of smallness and isolation of island economies

Author: Domah Preetum
Publication venue: Faculty of Economics
Publication date: 16/06/2004

Technical efficiency in electricity generation - the impact of smallness and isolation of island economies

Apollo (Cambridge)

FishMark: A Linked Data Application Benchmark

Author: Alkiviadous S.
Bail S.
Concalves R. S.
Garilao Cristina
Parsia B.
van Harmelen M.
Workman D.
Publication venue: CEUR
Publication date: 01/01/2012

Abstract. FishBase is an important species data collection produced by the FishBase Information and Research Group Inc (FIN), a not-for-profit NGO with the aim of collecting comprehensive information (from the taxonomic to the ecological) about all the world’s finned fish species. FishBase is exposed as a MySQL-backed website (supporting a range of canned, although complex, queries) and serves over 33 million hits per month. FishDelish is a transformation of FishBase into Linked Data weighing in at 1.38 billion triples. We have ported a substantial number of FishBase SQL queries to FishDelish SPARQL queries, which form the basis of a new linked data application benchmark (using our derivative of the Berlin SPARQL Benchmark harness). We use this benchmarking framework to compare the performance of the native MySQL application, the Virtuoso RDF triple store, and the Quest OBDA system on a fishbase.org-like application.

OceanRep

CiteSeerX

The University of Manchester - Institutional Repository