The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications for processing vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations, which many follow-up research efforts have tackled since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that build on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover a set of systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that adopt some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.
Semantic Interaction in Web-based Retrieval Systems: Adopting Semantic Web Technologies and Social Networking Paradigms for Interacting with Semi-structured Web Data
Existing web retrieval models for exploring and interacting with web data take no account of semantic information, nor do they allow for new forms of interaction through meaningful interaction and navigation metaphors in 2D/3D. This thesis researches means for introducing a semantic dimension into the search and exploration of web content to enable a substantially better user experience. To this end, it adopts an inherently dynamic view that goes beyond single concepts and models from semantic information processing, information extraction, and human-machine interaction. Essential tasks for semantic interaction, namely semantic annotation, semantic mediation, and semantic human-computer interaction, were identified and elaborated for two general application scenarios in web retrieval: Web-based Question Answering in a knowledge-based dialogue system and semantic exploration of information spaces in 2D/3D.
Casual Information Visualization on Exploring Spatiotemporal Data
The goal of this thesis is to study how the diverse data on the Web that are familiar to everyone can be visualized, with special consideration of their spatial and temporal information. We introduce novel approaches and visualization techniques for different types of data content: interactively browsing large amounts of tags linked to geospace and time, navigating and locating spatiotemporal photos or videos in collections, and, especially, providing visual support for the exploration of diverse Web content on arbitrary webpages through augmented Web browsing.
Semantic Cross-View Matching
Matching cross-view images is challenging because the appearance and viewpoints are significantly different. While low-level features based on gradient orientations or filter responses can vary drastically with such changes in viewpoint, the semantic content of images remains largely invariant in this respect. Consequently, semantically labeled regions can be used for cross-view matching. In this paper, we explore this idea and propose an automatic method for detecting and representing the semantic information of an RGB image, with the goal of performing cross-view matching against a (non-RGB) geographic information system (GIS). The input to our system is a segmented image whose segments are assigned to semantic concepts such as traffic signs, lakes, roads, and foliage. We design a descriptor that robustly captures both the presence of semantic concepts and the spatial layout of those segments. Pairwise distances between the descriptors extracted from the GIS map and the query image are then used to generate a shortlist of the most promising locations with similar semantic concepts in a consistent spatial layout. An experimental evaluation with challenging query images and a large urban area shows promising results.
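To make the descriptor-and-shortlist pipeline more concrete, the following is a hypothetical Python sketch: per-cell concept histograms over a coarse grid stand in for the paper's descriptor, and Euclidean distance stands in for its matching score. The concept count, grid size, and metric are all assumptions rather than the authors' actual design.

```python
import numpy as np

N_CONCEPTS = 8   # e.g. road, building, foliage, water, traffic sign, ...
GRID = 4         # a 4x4 spatial grid captures coarse layout

def describe(label_map):
    """label_map: 2D integer array of per-pixel semantic concept IDs."""
    h, w = label_map.shape
    desc = np.zeros((GRID, GRID, N_CONCEPTS))
    for i in range(GRID):
        for j in range(GRID):
            cell = label_map[i * h // GRID:(i + 1) * h // GRID,
                             j * w // GRID:(j + 1) * w // GRID]
            # Normalized concept histogram encodes presence per cell;
            # the cell position encodes the spatial layout.
            counts = np.bincount(cell.ravel(), minlength=N_CONCEPTS)[:N_CONCEPTS]
            desc[i, j] = counts / max(cell.size, 1)
    return desc.ravel()

def shortlist(query_desc, gis_descs, k=5):
    # Pairwise distances -> indices of the k most promising GIS locations.
    dists = np.linalg.norm(gis_descs - query_desc, axis=1)
    return np.argsort(dists)[:k]

# Usage with synthetic label maps:
rng = np.random.default_rng(0)
query = describe(rng.integers(0, N_CONCEPTS, (64, 64)))
tiles = np.stack([describe(rng.integers(0, N_CONCEPTS, (64, 64)))
                  for _ in range(100)])
print(shortlist(query, tiles))
```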
Scalable exploration of 3D massive models
This thesis introduces scalable techniques that advance the state-of-the-art in massive model creation and exploration. Concerning model creation, we present methods for improving reality-based scene acquisition and processing, introducing an efficient
implementation of scalable out-of-core point clouds and a data-fusion approach for
creating detailed colored models from cluttered scene acquisitions. The core of this
thesis concerns enabling technology for the exploration of general large datasets.
Two novel solutions are introduced. The first is an adaptive out-of-core technique
exploiting the GPU rasterization pipeline and hardware occlusion queries in order
to create coherent batches of work for localized shader-based ray tracing kernels,
opening the door to out-of-core ray tracing with shadowing and global illumination.
The second is an aggressive compression method that exploits redundancy in large
models to compress data so that it fits, in fully renderable format, in GPU memory.
The method is targeted to voxelized representations of 3D scenes, which are widely
used to accelerate visibility queries on the GPU. Compression is achieved by merging
subtrees that are identical up to a similarity transform and by exploiting the skewed distribution of references to shared nodes to store child pointers using a variable-bitrate encoding. The capability and performance of all methods are evaluated on many very massive real-world scenes from several domains, including cultural heritage, engineering, and gaming.
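To make the subtree-merging idea concrete, here is a small hedged sketch in Python: hash-consing collapses identical subtrees of a voxel octree into shared nodes, turning the tree into a directed acyclic graph. The similarity transforms and variable-bitrate child pointers of the actual method are omitted, and all names are illustrative.

```python
def merge_subtrees(node, unique, out_nodes):
    """node: None (empty space), True (full leaf), or a list of 8 children.
    Returns a shared node id (or the leaf/empty marker unchanged)."""
    if node is None or node is True:
        return node
    # Recursively canonicalize children, then use the tuple of child ids
    # as a hash key: identical subtrees produce identical keys.
    key = tuple(merge_subtrees(c, unique, out_nodes) for c in node)
    if key not in unique:            # first occurrence of this subtree
        unique[key] = len(out_nodes)
        out_nodes.append(key)
    return unique[key]               # duplicates collapse to one shared id

# Two identical subtrees collapse into one stored node:
leaf8 = [True] * 8
root = [leaf8, leaf8, None, None, None, None, None, None]
unique, nodes = {}, []
root_id = merge_subtrees(root, unique, nodes)
print(len(nodes))   # 2 nodes instead of 3 (root plus one shared subtree)
```

Because references to popular shared nodes dominate, the real method can additionally squeeze child pointers with short variable-bitrate codes, which is where much of the reported compression comes from.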
Analyzing textual data by multiple word clouds
Word clouds are a good way to represent textual data with meta information. However, they are difficult to use for analyzing multiple data sources because individual clouds compare poorly with one another. The proposed RadCloud merges multiple standard clouds into a single one while retaining information about the origin of each word. Using separate clouds as well as this new visualization technique, we created MuWoC, a tool for textual data analysis. The resulting software was then evaluated with multiple data sources, ranging from encyclopedia articles and literature to custom CSV files.
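The origin-preserving merge at the heart of this approach can be sketched as follows; the function names and weighting are assumptions rather than RadCloud's actual implementation, but they show how per-source frequencies let each word in the merged cloud retain its provenance.

```python
from collections import Counter

def merge_clouds(sources):
    """sources: {source_name: text}. Returns {word: {source: count}},
    so the merged cloud still knows where every word came from."""
    merged = {}
    for name, text in sources.items():
        for word, count in Counter(text.lower().split()).items():
            merged.setdefault(word, {})[name] = count
    return merged

clouds = merge_clouds({
    "encyclopedia": "data cloud data analysis",
    "literature":   "cloud of words and data",
})
# 'data' occurs in both sources; its per-source split can drive where
# (and how large) it is drawn in the combined visualization.
print(clouds["data"])   # {'encyclopedia': 2, 'literature': 1}
```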