Creating a web-scale video collection for research
This paper begins by considering a number of important design questions for a
web-scale, widely available, multimedia test collection intended to support
long-term scientific evaluation and comparison of content-based video analysis and
exploitation systems. Such exploitation systems would include the kinds of functionality
already explored within the annual TRECVid benchmarking activity such as search, semantic
concept detection, and automatic summarisation.
We then report on our progress in creating such a multimedia collection, the IACC.1 collection, which we believe to be web scale and which will support the next generation of benchmarking activities for content-based video operations, and we describe our plans for putting the collection to use.
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking
Data generation is a key issue in big data benchmarking that aims to generate
application-specific data sets to meet the 4V requirements of big data.
Specifically, big data generators need to generate scalable data (Volume) of
different types (Variety) under controllable generation rates (Velocity) while
keeping the important characteristics of raw data (Veracity). This gives rise
to various new challenges about how we design generators efficiently and
successfully. To date, most existing techniques can only generate limited types
of data and support specific big data systems such as Hadoop. Hence we develop
a tool, called Big Data Generator Suite (BDGS), to efficiently generate
scalable big data while employing data models derived from real data to
preserve data veracity. The effectiveness of BDGS is demonstrated by developing
six data generators covering three representative data types (structured,
semi-structured and unstructured) and three data sources (text, graph, and
table data).
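The Veracity requirement above — keeping the characteristics of raw data in synthetic output — can be illustrated with a minimal sketch: derive a unigram word model from a small seed of real text, then sample synthetic documents from it. The function names and seed strings below are invented for the sketch and are not part of the actual BDGS tool.

```python
import random

def build_model(seed_docs):
    """Derive a simple unigram model (word -> frequency) from real seed data."""
    freq = {}
    for doc in seed_docs:
        for w in doc.split():
            freq[w] = freq.get(w, 0) + 1
    words = list(freq)
    weights = [freq[w] for w in words]
    return words, weights

def generate_docs(model, n_docs, words_per_doc, rng=None):
    """Sample synthetic documents whose word distribution mirrors the seed data.

    Volume is controlled by n_docs; Veracity by sampling from the seed-derived
    weights rather than uniformly.
    """
    rng = rng or random.Random(42)
    words, weights = model
    return [" ".join(rng.choices(words, weights=weights, k=words_per_doc))
            for _ in range(n_docs)]

# Hypothetical seed corpus standing in for real application data.
seed = ["big data benchmarking needs realistic data",
        "data generators must scale with data volume"]
model = build_model(seed)
docs = generate_docs(model, n_docs=3, words_per_doc=6)
```

Scaling this idea up (richer models for graph and table data, parallel generation for Velocity) is where the real engineering effort described in the abstract lies.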
Benchmarking Distributed Stream Data Processing Systems
The need for scalable and efficient stream analysis has led to the
development of many open-source streaming data processing systems (SDPSs) with
highly diverging capabilities and performance characteristics. While first
initiatives try to compare the systems for simple workloads, there is a clear
gap of detailed analyses of the systems' performance characteristics. In this
paper, we propose a framework for benchmarking distributed stream processing
engines. We use our suite to evaluate the performance of three widely used
SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our
evaluation focuses in particular on measuring the throughput and latency of
windowed operations, which are the basic type of operations in stream
analytics. For this benchmark, we design workloads based on real-life,
industrial use-cases inspired by the online gaming industry. The contribution
of our work is threefold. First, we give a definition of latency and throughput
for stateful operators. Second, we carefully separate the system under test
from the driver, in order to correctly represent the open-world model of
typical stream processing deployments and thereby measure system performance
under realistic conditions. Third, we build the first benchmarking framework to
define and test the sustainable performance of streaming systems.
Our detailed evaluation highlights the individual characteristics and
use-cases of each system.
Comment: Published at ICDE 201
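The windowed operations the abstract centres on can be sketched minimally: assign events to tumbling event-time windows, and measure a result's latency against its window's close time rather than each event's arrival. The window size, timestamps, and function names below are illustrative assumptions, not the paper's actual definitions or harness.

```python
from collections import defaultdict

WINDOW_MS = 1000  # tumbling window size in ms (illustrative)

def assign_windows(events, window_ms=WINDOW_MS):
    """Assign (event_time_ms, key) pairs to tumbling windows and count per key."""
    counts = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        counts[ts // window_ms][key] += 1
    return counts

def event_time_latency(emit_ts_ms, window_index, window_ms=WINDOW_MS):
    """Latency of a windowed result: emission time minus the window's close time."""
    window_close = (window_index + 1) * window_ms
    return emit_ts_ms - window_close

events = [(100, "a"), (950, "b"), (1100, "a")]
counts = assign_windows(events)           # window 0 holds two events, window 1 holds one
lat = event_time_latency(emit_ts_ms=2250, window_index=1)  # result emitted 250 ms after close
```

In a real benchmark the driver producing `events` would run on separate hardware from the system under test, which is exactly the separation the abstract argues for.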
Technical efficiency in electricity generation - the impact of smallness and isolation of island economies
FishMark: A Linked Data Application Benchmark
Abstract. FishBase is an important species data collection produced by the FishBase Information and Research Group Inc (FIN), a not-for-profit NGO with the aim of collecting comprehensive information (from the taxonomic to the ecological) about all the world's finned fish species. FishBase is exposed as a MySQL-backed website (supporting a range of canned, although complex, queries) and serves over 33 million hits per month. FishDelish is a transformation of FishBase into Linked Data weighing in at 1.38 billion triples. We have ported a substantial number of FishBase SQL queries to FishDelish SPARQL queries, which form the basis of a new linked data application benchmark (using our derivative of the Berlin SPARQL Benchmark harness). We use this benchmarking framework to compare the performance of the native MySQL application, the Virtuoso RDF triple store, and the Quest OBDA system on a fishbase.org-like application.
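The cross-backend comparison described above can be sketched as a tiny timing harness that runs the same query mix against each backend and reports mean per-query time. The backend names, the stub callables, and the query strings below are placeholders, not FishMark's actual mix or endpoints.

```python
import time

def run_benchmark(backends, queries, repetitions=3):
    """Time each query against each backend; return mean ms per query.

    `backends` maps a name to a callable that executes one query string
    against that system (e.g. a MySQL cursor or a SPARQL endpoint client).
    """
    results = {}
    for name, execute in backends.items():
        timings = []
        for q in queries:
            start = time.perf_counter()
            for _ in range(repetitions):
                execute(q)
            timings.append((time.perf_counter() - start) * 1000 / repetitions)
        results[name] = sum(timings) / len(timings)
    return results

# Stub backends standing in for the MySQL application and an RDF store.
backends = {"mysql": lambda q: None, "virtuoso": lambda q: None}
mean_ms = run_benchmark(backends, ["SELECT 1", "SELECT 2"])
```

A real harness would also warm caches and randomise query order, as the Berlin SPARQL Benchmark family does, to avoid favouring any one backend.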