1,589 research outputs found
CERN Storage Systems for Large-Scale Wireless
The project aims at evaluating the use of CERN computing infrastructure for next generation sensor networks data analysis. The proposed system allows the simulation of a large-scale sensor array for traffic analysis, streaming data to CERN storage systems in an efficient way. The data are made available for offline and quasi-online analysis, enabling both long term planning and fast reaction on the environment
Prototyping a Web-Scale Multimedia Retrieval Service Using Spark
International audienceThe world has experienced phenomenal growth in data production and storage in recent years, much of which has taken the form of media files. At the same time, computing power has become abundant with multi-core machines, grids, and clouds. Yet it remains a challenge to harness the available power and move toward gracefully searching and retrieving from web-scale media collections. Several researchers have experimented with using automatically distributed computing frameworks, notably Hadoop and Spark, for processing multimedia material, but mostly using small collections on small computing clusters. In this article, we describe a prototype of a (near) web-scale throughput-oriented MM retrieval service using the Spark framework running on the AWS cloud service. We present retrieval results using up to 43 billion SIFT feature vectors from the public YFCC 100M collection, making this the largest high-dimensional feature vector collection reported in the literature. We also present a publicly available demonstration retrieval system, running on our own servers, where the implementation of the Spark pipelines can be observed in practice using standard image benchmarks, and downloaded for research purposes. Finally, we describe a method to evaluate retrieval quality of the ever-growing high-dimensional index of the prototype, without actually indexing a web-scale media collection
Hierarchical video surveillance architecture: a chassis for video big data analytics and exploration
There is increasing reliance on video surveillance systems for systematic derivation, analysis and interpretation of the data needed for predicting, planning, evaluating and implementing public safety. This is evident from the massive number of surveillance cameras deployed across public locations. For example, in July 2013, the British Security Industry Association (BSIA) reported that over 4 million CCTV cameras had been installed in Britain alone. The BSIA also reveal that only 1.5% of these are state owned. In this paper, we propose a framework that allows access to data from privately owned cameras, with the aim of increasing the efficiency and accuracy of public safety planning, security activities, and decision support systems that are based on video integrated surveillance systems. The accuracy of results obtained from government-owned public safety infrastructure would improve greatly if privately owned surveillance systems ‘expose’ relevant video-generated metadata events, such as triggered alerts and also permit query of a metadata repository. Subsequently, a police officer, for example, with an appropriate level of system permission can query unified video systems across a large geographical area such as a city or a country to predict the location of an interesting entity, such as a pedestrian or a vehicle. This becomes possible with our proposed novel hierarchical architecture, the Fused Video Surveillance Architecture (FVSA). At the high level, FVSA comprises of a hardware framework that is supported by a multi-layer abstraction software interface. It presents video surveillance systems as an adapted computational grid of intelligent services, which is integration-enabled to communicate with other compatible systems in the Internet of Things (IoT)
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter becomes a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, high accuracy supervisory signal and multilanguage
sentiment prediction while respecting every privacy request applicable. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, already before including tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
- …