12,401 research outputs found
ALOJA: A benchmarking and predictive platform for big data performance analysis
The main goals of the ALOJA research project from BSC-MSR, are to explore and automate the characterization of cost-effectivenessof Big Data deployments. The development of the project over its first year, has resulted in a open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about system's cost-performance1.
This article describes the evolution of the project's focus and research
lines from over a year of continuously benchmarking Hadoop under dif-
ferent configuration and deployments options, presents results, and dis
cusses the motivation both technical and market-based of such changes.
During this time, ALOJA's target has evolved from a previous low-level
profiling of Hadoop runtime, passing through extensive benchmarking
and evaluation of a large body of results via aggregation, to currently
leveraging Predictive Analytics (PA) techniques. Modeling benchmark
executions allow us to estimate the results of new or untested configu-
rations or hardware set-ups automatically, by learning techniques from
past observations saving in benchmarking time and costs.This work is partially supported the BSC-Microsoft Research Centre, the Span-
ish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).Peer ReviewedPostprint (author's final draft
An efficient time optimized scheme for progressive analytics in big data
Big data analytics is the key research subject for future data driven decision making applications. Due to the large amount of data, progressive analytics could provide an efficient way for querying big data clusters. Each cluster contains only a piece of the examined data. Continuous queries over these data sources require intelligent mechanisms to result the final outcome (query response) in the minimum time with the maximum performance. A Query Controller (QC) is responsible to manage continuous/sequential queries and return the final outcome to users or applications. In this paper, we propose a mechanism that can be adopted by the QC. The proposed mechanism is capable of managing partial results retrieved by a number of processors each one responsible for each cluster. Each processor executes a query over a specific cluster of data. Our mechanism adopts two sequential decision making models for handling the incoming partial results. The first model is based on a finite horizon time-optimized model and the second one is based on an infinite horizon optimally scheduled model. We provide mathematical formulations for solving the discussed problem and present simulation results. Through a large number of experiments, we reveal the advantages of the proposed models and give numerical results comparing them with a deterministic model. These results indicate that the proposed models can efficiently reduce the required time for returning the final outcome to the user/application while keeping the quality of the aggregated result at high levels
Image-based Geolocalization by Ground-to-2.5D Map Matching
We study the image-based geolocalization problem, aiming to localize
ground-view query images on cartographic maps. Current methods often utilize
cross-view localization techniques to match ground-view query images with 2D
maps. However, the performance of these methods is unsatisfactory due to
significant cross-view appearance differences. In this paper, we lift
cross-view matching to a 2.5D space, where heights of structures (e.g., trees
and buildings) provide geometric information to guide the cross-view matching.
We propose a new approach to learning representative embeddings from
multi-modal data. Specifically, we establish a projection relationship between
2.5D space and 2D aerial-view space. The projection is further used to combine
multi-modal features from the 2.5D and 2D maps using an effective
pixel-to-point fusion method. By encoding crucial geometric cues, our method
learns discriminative location embeddings for matching panoramic images and
maps. Additionally, we construct the first large-scale ground-to-2.5D map
geolocalization dataset to validate our method and facilitate future research.
Both single-image based and route based localization experiments are conducted
to test our method. Extensive experiments demonstrate that the proposed method
achieves significantly higher localization accuracy and faster convergence than
previous 2D map-based approaches
- …