29 research outputs found
Dynamic Prefetching of Data Tiles for Interactive Visualization
In this paper, we present ForeCache, a general-purpose tool for exploratory browsing of large datasets. ForeCache utilizes a client-server architecture, where the user interacts with a lightweight client-side interface to browse datasets, and the data to be browsed is retrieved from a DBMS running on a back-end server. We assume a detail-on-demand browsing paradigm, and optimize the back-end support for this paradigm by inserting a separate middleware layer in front of the DBMS. To improve response times, the middleware layer fetches data ahead of the user as she explores a dataset. We consider two different mechanisms for prefetching: (a) learning what to fetch from the user's recent movements, and (b) using data characteristics (e.g., histograms) to find data similar to what the user has viewed in the past. We incorporate these mechanisms into a single prediction engine that adjusts its prediction strategies over time, based on changes in the user's behavior. We evaluated our prediction engine with a user study, and found that our dynamic prefetching strategy provides: (1) significant improvements in overall latency when compared with non-prefetching systems (430% improvement); and (2) substantial improvements in both prediction accuracy (25% improvement) and latency (88% improvement) relative to existing prefetching techniques.
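The adaptive prediction engine described in the abstract suggests a simple mechanism worth sketching: two predictors share a fixed prefetch budget, and each one's share is re-weighted by its recent hit rate. The sketch below is a loose illustration under that assumption, not ForeCache's actual code; the class name, predictor names, and parameters are all invented.

```python
from collections import deque

class HybridPrefetcher:
    """Toy prediction engine: splits a prefetch budget between two predictors."""

    def __init__(self, budget=4, window=10):
        self.budget = budget                              # tiles fetched ahead per user move
        self.hits = {"momentum": deque(maxlen=window),    # recent-movement predictor
                     "signature": deque(maxlen=window)}   # data-similarity predictor

    def _accuracy(self, name):
        h = self.hits[name]
        return sum(h) / len(h) if h else 0.5              # optimistic prior before feedback

    def allocate(self):
        """Split the budget between predictors in proportion to recent accuracy."""
        acc = {n: self._accuracy(n) for n in self.hits}
        total = sum(acc.values()) or 1.0
        return {n: round(self.budget * a / total) for n, a in acc.items()}

    def record(self, name, was_hit):
        """Feed back whether a predictor's prefetched tile was actually requested."""
        self.hits[name].append(1 if was_hit else 0)

engine = HybridPrefetcher(budget=4)
for _ in range(5):
    engine.record("momentum", True)     # movement-based guesses keep hitting
    engine.record("signature", False)   # similarity-based guesses keep missing
allocation = engine.allocate()          # budget shifts toward the momentum predictor
```

Here the momentum predictor stands in for learning from recent movements and the signature predictor for histogram-based similarity; in ForeCache itself the reallocation is driven by observed user behavior in the same spirit.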
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still
needs to be annotated manually, in an ad hoc manner, by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions.
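The second part of the vision, storing metadata within each dataset and querying it through the same access interface as the data, can be illustrated with a minimal toy. This is a hypothetical sketch, not the paper's proposed system; all names and the "meta:" key convention are invented.

```python
class SelfDescribingDataset:
    """Toy container: provenance metadata travels inside the dataset and is
    queried through the same interface as the data itself."""

    def __init__(self, data, **metadata):
        self._data = data
        self._meta = dict(metadata)   # e.g. producing simulation, lineage, units

    def __getitem__(self, key):
        # One access path for both: string keys prefixed "meta:" hit metadata.
        if isinstance(key, str) and key.startswith("meta:"):
            return self._meta[key[len("meta:"):]]
        return self._data[key]

ds = SelfDescribingDataset([0.1, 0.2, 0.3],
                           producer="plasma-sim-v2", units="keV")
value, units = ds[1], ds["meta:units"]   # data and metadata via one interface
```

Because metadata rides along with the data object, a metadata-oblivious consumer can still index the data normally, which is the "permeate metadata-oblivious processes" property the abstract argues for.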
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solutions to specific real-world problems, big data systems are no exception.
As far as the storage aspect of any big data system is concerned, the primary
facet is the storage infrastructure, and NoSQL appears to be the technology
that best fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
a feature and use-case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph, and wide-column. Moreover, a
feature analysis of 80 NoSQL solutions is provided, elaborating on the
criteria that a developer must consider when choosing among them.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage; their challenges and future prospects are
also discussed.
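To make the four data models concrete, the toy snippet below expresses one logical record in the style of three of them, with the graph model noted in a comment; the key names are made up and the named systems are only common examples of each model.

```python
# One logical "user" record in the style of three of the four data models.
# Keys and values are illustrative only.

# Document-oriented (e.g. MongoDB): a self-contained, nested document.
document = {"_id": "u42", "name": "Ada", "orders": [{"sku": "t-1", "qty": 2}]}

# Key-value (e.g. Redis): an opaque value addressed by one composite key.
key_value = {"user:u42": '{"name": "Ada", "orders": [["t-1", 2]]}'}

# Wide-column (e.g. Cassandra): row key -> column family -> columns.
wide_column = {"u42": {"profile": {"name": "Ada"}, "orders": {"t-1": 2}}}

# A graph model (e.g. Neo4j) would instead add explicit edges, for instance
# ("u42", "PLACED", "order-1"), making relationship traversal the primary query.

# The same fact is reachable in each model, but along different access paths:
qty_doc = document["orders"][0]["qty"]
qty_wide = wide_column["u42"]["orders"]["t-1"]
```

The access paths are the point of the comparison: a document store answers nested queries in one lookup, a key-value store pushes all interpretation to the application, and a wide-column store trades schema flexibility for partition-friendly row keys.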
Management of spatial data for visualization on mobile devices
Vector-based mapping is emerging as a preferred format in Location-Based
Services (LBS), because it can deliver an up-to-date and interactive map
visualization. The Progressive Transmission (PT) technique has been developed
to enable the efficient transmission of vector data over the internet by
delivering various incremental levels of detail (LoD). However, it is still
challenging to apply this technique in a mobile context due to many inherent
limitations of mobile devices, such as small screen size, slow processors and
limited memory. Taking account of these limitations, PT has been extended by
developing a framework of efficient data management for the visualization of
spatial data on mobile devices. A data generalization framework is proposed
and implemented in a software application. This application can significantly
reduce the volume of data for transmission and enable quick access to a
simplified version of the data while preserving appropriate visualization
quality. Using volunteered geographic information as a case study, the
framework shows flexibility in delivering up-to-date spatial information from
dynamic data sources.
Three models of PT are designed and implemented to transmit the additional
LoD refinements: a full-scale PT as an inverse of generalisation, a
view-dependent PT, and a heuristic optimised view-dependent PT. These models
are evaluated with user trials and application examples. The heuristic
optimised view-dependent PT has shown a significant enhancement over the
traditional PT in terms of bandwidth saving and smoothness of transitions.
A parallel data management strategy associated with three corresponding
algorithms has been developed to handle LoD spatial data on mobile clients.
This strategy enables map rendering to be performed in parallel with a
process which retrieves the data for the next map location the user will
require. A view-dependent approach has been integrated to monitor the volume
of each LoD for the visible area. The demonstration of a flexible rendering
style shows its potential use in visualizing dynamic geoprocessed data.
Future work may extend this
to integrate topological constraints and semantic constraints for enhancing the
vector map visualization.
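The view-dependent flavour of progressive transmission described above can be sketched as follows: send the coarsest LoD for the whole extent first, then stream finer refinements only for tiles intersecting the client's viewport. The tile layout, extents, and function names are assumptions for illustration, not the thesis's implementation.

```python
def visible(tile, viewport):
    """Axis-aligned overlap test; both arguments are (x0, y0, x1, y1) extents."""
    ax0, ay0, ax1, ay1 = tile
    bx0, by0, bx1, by1 = viewport
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def transmission_plan(tiles, viewport, max_lod):
    """Delivery order: LoD 0 everywhere first, then finer LoDs only where visible."""
    plan = [(t, 0) for t in tiles]          # coarse base map for the whole extent
    for lod in range(1, max_lod + 1):
        plan += [(t, lod) for t in tiles if visible(t, viewport)]
    return plan

tiles = [(0, 0, 1, 1), (1, 0, 2, 1), (2, 0, 3, 1)]
plan = transmission_plan(tiles, viewport=(0.5, 0.2, 1.5, 0.8), max_lod=2)
# Only the two tiles under the viewport receive LoD 1 and LoD 2 refinements.
```

A heuristic optimised variant, as evaluated in the thesis, would additionally reorder the refinements (for example, centre-of-viewport first) rather than streaming them tile by tile.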
Application of machine learning techniques to the management and optimization of tile caches for the acceleration of map services in spatial data infrastructures
The great proliferation in the use of map services over the Web has motivated
the need for increasingly scalable services. In response to this need, tiled
map services have emerged as a scalable alternative to traditional map
services, allowing caching mechanisms to operate or even allowing the service
to be provided from a collection of pre-generated images.
However, the storage requirements and set-up time of these services are often
prohibitive when the cartography to be served covers an extensive geographic
area at a large number of scales.
For this reason, these services are usually offered through partial caches
that contain only a subset of the cartography. To guarantee an acceptable
Quality of Service (QoS), suitable policies for maintaining and managing these
map caches are required: 1) strategies for the initial population, or seeding,
of the cache; 2) algorithms for dynamic loading in response to user requests;
3) cache replacement policies.
However, only a small number of such strategies are specific to map services.
Most of the strategies applied to these services are drawn from other domains,
such as traditional Web proxies, and do not take into account the spatial
component of the map objects they manage.
This thesis addresses this opportunity for improvement by designing new
algorithms, specific to this application domain, that optimize the performance
of map services. Given the large number of objects managed by these caches and
their heterogeneity in terms of layers, representation scales, and so on, an
effort has been made to keep the designed strategies automatic or
semi-automatic, requiring as little human intervention as possible.
Accordingly, two novel strategies have been proposed for the initial
population of a map cache. One of them uses a descriptive model built from the
service's logs of past requests. The other is based on a predictive model that
identifies the geographic phenomena driving user requests, parameterized
either through an OLS (Ordinary Least Squares) regression analysis or through
an intelligent system based on neural networks.
Likewise, important contributions have been made with regard to the
replacement strategies of these caches. On the one hand, an intelligent system
based on neural networks has been proposed that estimates future access
popularity from certain properties of the objects it manages: recency of
reference, frequency of reference, and the size of the referenced tile. On the
other hand, a strategy named Spatial-LFU has been proposed, a variant of the
Perfect-LFU strategy simplified by exploiting the spatial correlation between
requests.
(Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática)
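A hedged sketch of the Spatial-LFU idea: instead of tracking an exact per-tile frequency as Perfect-LFU does, score each cached tile by the request frequency accumulated over its spatial neighbourhood, exploiting the correlation between requests for adjacent tiles. The neighbourhood definition and the (zoom, x, y) tile keys are assumptions for illustration, not the thesis's exact formulation.

```python
from collections import Counter

freq = Counter()   # request count per tile key (zoom, x, y)

def record_request(tile):
    freq[tile] += 1

def neighbourhood_score(tile):
    """Frequency accumulated over the tile and its eight same-zoom neighbours."""
    z, x, y = tile
    return sum(freq[(z, x + dx, y + dy)]
               for dx in (-1, 0, 1) for dy in (-1, 0, 1))

def evict_candidate(cached_tiles):
    """Evict the cached tile whose spatial neighbourhood is least popular."""
    return min(cached_tiles, key=neighbourhood_score)

for t in [(3, 10, 10), (3, 10, 11), (3, 11, 10)]:   # a hot spatial cluster
    record_request(t)
record_request((3, 50, 50))                          # an isolated request

victim = evict_candidate([(3, 10, 10), (3, 50, 50)])
```

The tile inside the hot cluster survives even though its own count equals the isolated tile's, which is exactly the kind of simplification spatial correlation buys over pure per-object LFU.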
Issues on distributed caching of spatial data
The amount of digital information about places has grown rapidly to date. With the spread of mobile, internet-enabled devices, this information can now be accessed at any time and from anywhere. In the course of this development, numerous location-based applications and services have become popular: digital shopping assistants, tourist information services, and geo-social applications rank among the best-known representatives. Growing user numbers and rapidly growing data volumes pose serious challenges for providers of location-based information. The data provisioning process must be designed efficiently to enable cost-effective operation. Moreover, resources should be allocatable flexibly enough to compensate for load imbalances between system components. Finally, data providers must be able to scale processing capacity with rising and falling request load.
This work introduces a distributed cache for location-based data, in which replicas of the most frequently used data are held in the volatile memory of several independent servers. Our approach addresses the challenges facing providers of location-based information as follows. First, a caching strategy designed specifically for the access patterns of location-based applications increases overall efficiency, since a substantial share of the cached results of previous queries can be reused. In addition, our load-balancing methods, developed specifically for the geo context, compensate for dynamic load imbalances. Finally, our distributed protocols for adding and removing servers enable providers of location-based information to adapt processing capacity to rising or falling request load.
In this document we first examine the requirements of data provisioning in the context of location-based applications. We then discuss possible design patterns and derive an architecture for a distributed cache. In the course of this work, several concrete implementation variants were developed, which we present and compare here. Our evaluation demonstrates not only the basic feasibility but also the effectiveness of our caching approach for achieving scalability and availability in the context of providing location-based data.
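One generic way to realize the elastic add/remove-server behaviour the abstract describes is consistent hashing of tile keys onto a server ring, so that membership changes relocate only a small fraction of cached entries. This is a standard textbook sketch, not the dissertation's actual protocol; server names and the tile-key format are invented.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing of tile keys onto cache servers, with virtual nodes."""

    def __init__(self, servers, vnodes=64):
        # Each server owns many points ("virtual nodes") to smooth load.
        self.ring = sorted((self._h(f"{s}#{i}"), s)
                           for s in servers for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def lookup(self, tile_key):
        """Map a tile key to the first server clockwise on the ring."""
        i = bisect.bisect(self.keys, self._h(tile_key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
server = ring.lookup("tile/12/2048/1361")   # deterministic: same key, same server
```

When a server joins or leaves, only the keys falling in its arcs of the ring move, which is what makes capacity scaling with request load practical for a volatile in-memory cache.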