Search CORE

16 research outputs found

Growth of relational model: Interdependence and complementary to big data

Author: Prabhu Srikanth
Rao B. Dinesh
Shetty Sucharitha
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/04/2021
Field of study

A database management system is a constant application of science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements from its conventional structured database to the recent buzzword, bigdata. This paper aims to provide a complete model of a relational database that is still being widely used because of its well known ACID properties namely, atomicity, consistency, integrity and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by bigdata techniques. Towards addressing the reason for this in corporation, this paper qualitatively studied the advancements done over a while on the relational data model. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques like indexing, query processing and concurrency control methods are revealed. The paper provides vital insights to appraise the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of the hybrid transactional and analytical database management system

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Análise colaborativa de grandes conjuntos de séries temporais

Author: Duarte Eduardo Miguel Oliveira
Publication venue
Publication date: 14/01/2021
Field of study

The recent expansion of metrification on a daily basis has led to the production of massive quantities of data, and in many cases, these collected metrics are only useful for knowledge building when seen as a full sequence of data ordered by time, which constitutes a time series. To find and interpret meaningful behavioral patterns in time series, a multitude of analysis software tools have been developed. Many of the existing solutions use annotations to enable the curation of a knowledge base that is shared between a group of researchers over a network. However, these tools also lack appropriate mechanisms to handle a high number of concurrent requests and to properly store massive data sets and ontologies, as well as suitable representations for annotated data that are visually interpretable by humans and explorable by automated systems. The goal of the work presented in this dissertation is to iterate on existing time series analysis software and build a platform for the collaborative analysis of massive time series data sets, leveraging state-of-the-art technologies for querying, storing and displaying time series and annotations. A theoretical and domain-agnostic model was proposed to enable the implementation of a distributed, extensible, secure and high-performant architecture that handles various annotation proposals in simultaneous and avoids any data loss from overlapping contributions or unsanctioned changes. Analysts can share annotation projects with peers, restricting a set of collaborators to a smaller scope of analysis and to a limited catalog of annotation semantics. Annotations can express meaning not only over a segment of time, but also over a subset of the series that coexist in the same segment. A novel visual encoding for annotations is proposed, where annotations are rendered as arcs traced only over the affected series’ curves in order to reduce visual clutter. Moreover, the implementation of a full-stack prototype with a reactive web interface was described, directly following the proposed architectural and visualization model while applied to the HVAC domain. The performance of the prototype under different architectural approaches was benchmarked, and the interface was tested in its usability. Overall, the work described in this dissertation contributes with a more versatile, intuitive and scalable time series annotation platform that streamlines the knowledge-discovery workflow.A recente expansão de metrificação diária levou à produção de quantidades massivas de dados, e em muitos casos, estas métricas são úteis para a construção de conhecimento apenas quando vistas como uma sequência de dados ordenada por tempo, o que constitui uma série temporal. Para se encontrar padrões comportamentais significativos em séries temporais, uma grande variedade de software de análise foi desenvolvida. Muitas das soluções existentes utilizam anotações para permitir a curadoria de uma base de conhecimento que é compartilhada entre investigadores em rede. No entanto, estas ferramentas carecem de mecanismos apropriados para lidar com um elevado número de pedidos concorrentes e para armazenar conjuntos massivos de dados e ontologias, assim como também representações apropriadas para dados anotados que são visualmente interpretáveis por seres humanos e exploráveis por sistemas automatizados. O objetivo do trabalho apresentado nesta dissertação é iterar sobre o software de análise de séries temporais existente e construir uma plataforma para a análise colaborativa de grandes conjuntos de séries temporais, utilizando tecnologias estado-de-arte para pesquisar, armazenar e exibir séries temporais e anotações. Um modelo teórico e agnóstico quanto ao domínio foi proposto para permitir a implementação de uma arquitetura distribuída, extensível, segura e de alto desempenho que lida com várias propostas de anotação em simultâneo e evita quaisquer perdas de dados provenientes de contribuições sobrepostas ou alterações não-sancionadas. Os analistas podem compartilhar projetos de anotação com colegas, restringindo um conjunto de colaboradores a uma janela de análise mais pequena e a um catálogo limitado de semântica de anotação. As anotações podem exprimir significado não apenas sobre um intervalo de tempo, mas também sobre um subconjunto das séries que coexistem no mesmo intervalo. Uma nova codificação visual para anotações é proposta, onde as anotações são desenhadas como arcos traçados apenas sobre as curvas de séries afetadas de modo a reduzir o ruído visual. Para além disso, a implementação de um protótipo full-stack com uma interface reativa web foi descrita, seguindo diretamente o modelo de arquitetura e visualização proposto enquanto aplicado ao domínio AVAC. O desempenho do protótipo com diferentes decisões arquiteturais foi avaliado, e a interface foi testada quanto à sua usabilidade. Em geral, o trabalho descrito nesta dissertação contribui com uma abordagem mais versátil, intuitiva e escalável para uma plataforma de anotação sobre séries temporais que simplifica o fluxo de trabalho para a descoberta de conhecimento.Mestrado em Engenharia Informátic

Repositório Institucional da Universidade de Aveiro

A framework for multidimensional indexes on distributed and highly-available data stores

Author: Cugnasco Cesare
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
Field of study

Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of peta bytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks dealing with such kind of data is limiting possible applications and probably precluding many scientific breakthroughs. In this thesis, we describe three HPC scientific applications, ranging from molecular dynamics, neuroscience analysis, and physics simulations, where we experience first hand the limits of the existing technologies. Thanks to our experience, we define the desirable missing functionalities, and we focus on two features that when combined significantly improve the way scientific data is analyzed. On one side, scientific simulations generate complex datasets where multiple correlated characteristics describe each item. For instance, a particle might have a space position (x,y,z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items in the same space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and to organize similar data together. On the other side, approximate analytics has been often indicated as a smart and flexible way to explore large datasets in a short period. Approximate analytics includes a broad family of algorithms which aims to speed up analytical workloads by relaxing the precision of the results within a specific interval of confidence. For instance, if we want to know the average age in a group with 1-year precision, we can consider just a random fraction of all the people, thus reducing the amount of calculation. But if we also want less I/O operations, we need efficient data sampling, which means organizing data in a way that we do not need to scan the whole data set to generate a random sample of it. According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital missing feature not available in the current distributed data management solutions. This thesis aims to solve such a shortcoming and it provides novel scalable solutions. At first, we describe the existing data management alternatives; then we motivate our preference for NoSQL key-value databases. Secondly, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Thirdly, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves state of the art for approximate spatial queries on static and mostly read dataset. Later, we enhanced the data ingestion capability or our approach by introducing the AOTree, an algorithm that enables the query performance of the D8tree even for HPC write-intensive applications. We compared our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability. Finally, we describe Qbeast, the novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications providing a scalable and integrated solution for data analysis and management.La gestión de BigData con información espacial está considerada como una tendencia esencial en el futuro de las aplicaciones científicas y de negocio. De hecho, se generan cientos de petabytes de datos espaciales por año mediante instrumentos de investigación, dispositivos médicos y redes sociales. Sin embargo, tal y como muchos autores han señalado, la falta de entornos especializados en manejar este tipo de datos está limitando sus posibles aplicaciones y está impidiendo muchos avances científicos. En esta tesis, describimos 3 aplicaciones científicas HPC, que cubren los ámbitos de dinámica molecular, análisis neurocientífico y simulaciones físicas, donde hemos experimentado en primera mano las limitaciones de las tecnologías existentes. Gracias a nuestras experiencias, hemos podido definir qué funcionalidades serían deseables y no existen, y nos hemos centrado en dos características que, al combinarlas, mejoran significativamente la manera en la que se analizan los datos científicos. Por un lado, las simulaciones científicas generan conjuntos de datos complejos, en los que cada elemento es descrito por múltiples características correlacionadas. Por ejemplo, una partícula puede tener una posición espacial (x, y, z) en un momento dado (t). Si queremos encontrar todos los elementos dentro de la misma área y periodo, o bien recorremos y analizamos todo el conjunto de datos, o bien organizamos los datos de manera que se almacenen juntos todos los elementos que comparten área en un momento dado. Esta segunda opción se conoce como Indexación Multidimensional (IM) y usa diferentes técnicas para agrupar y organizar datos similares. Por otro lado, se suele señalar que las analíticas aproximadas son una manera inteligente y flexible de explorar grandes conjuntos de datos en poco tiempo. Este tipo de analíticas incluyen una amplia familia de algoritmos que acelera el tiempo de procesado, relajando la precisión de los resultados dentro de un determinado intervalo de confianza. Por ejemplo, si queremos saber la edad media de un grupo con precisión de un año, podemos considerar sólo un subconjunto aleatorio de todas las personas, reduciendo así la cantidad de cálculo. Pero si además queremos menos operaciones de entrada/salida, necesitamos un muestreo eficiente de datos, que implica organizar los datos de manera que no necesitemos recorrerlos todos para generar una muestra aleatoria. De acuerdo con nuestros análisis, la combinación de Indexación Multidimensional con Muestreo eficiente de datos (IMM) es una característica vital que no está disponible en las soluciones actuales de gestión distribuida de datos. Esta tesis pretende resolver esta limitación y proporciona unas soluciones novedosas que son escalables. En primer lugar, describimos las alternativas de gestión de datos que existen y motivamos nuestra preferencia por las bases de datos NoSQL basadas en clave-valor. En segundo lugar, proponemos un modelo analítico para estudiar la influencia que tienen los modelos de datos sobre la escalabilidad y el rendimiento de este tipo de bases de datos distribuidas. En tercer lugar, usamos el modelo analítico para diseñar dos novedosos algoritmos IMM: el D8tree y el AOTree. Nuestra primera solución, el D8tree, mejora el estado del arte actual para consultas espaciales aproximadas, cuando el conjunto de datos es estático y mayoritariamente de lectura. Después, mejoramos la capacidad de ingestión introduciendo el AOTree, un algoritmo que conserva el rendimiento del D8tree incluso para aplicaciones HPC intensivas en escritura. Hemos comparado nuestra solución con PostgreSQL y almacenamiento plano demostrando que nuestra propuesta mejora tanto el rendimiento como la escalabilidad. Finalmente, describimos Qbeast, el sistema que implementa los algoritmos D8tree y AOTree, e ilustramos cómo Qbeast simplifica el flujo de trabajo de los científicos ofreciendo una solución escalable e integraPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

A framework for multidimensional indexes on distributed and highly-available data stores

Author: Cugnasco Cesare
Publication venue: Universitat Politècnica de Catalunya
Publication date: 22/03/2019
Field of study

Tesis Doctorals en Xarxa

Understanding the Impact of Databases on the Energy Efficiency of Cloud Applications

Author: Bani Bechir
Publication venue
Publication date: 01/08/2016
Field of study

RÉSUMÉ Aujourd'hui, les applications infonuagiques sont utilisées dans toutes les industries ; de la finance, au commerce de détail, en passant par l'éducation, la communication, la manufacture, les services publics et les transports. Malgré leur popularité et leur large adoption, peu d'informations sont disponibles sur l'empreinte énergétique de ces applications et, en particulier, celle de leurs bases de données, qui constituent l'épine dorsale de ces applications infonuagiques. Pourtant, la réduction de la consommation d'énergie des applications est un objectif majeur pour la société et continuera de l'être à l'avenir. Deux familles de bases de données sont actuellement utilisées dans les applications infonuagiques: Les bases de données relationnelles et non-relationnelles. Aussi, nous examinons la consommation d'énergie des trois bases de données utilisées par les applications infonuagiques : MySQL, PostgreSQL et MongoDB, respectivement relationelle, relationelle, et non-relationelle. Nous réalisons une série d'expériences avec trois applications infonuagiques (une application multi-thread RESTful, DVD Store, et JPetStore). Nous étudions également l'impact des patrons infonuagiques sur la consommation d'énergie parce que les bases de données dans les applications infonuagiques sont souvent implémentées conjointement avec des patrons infonuagiques tels que le Local Database Proxy, le Local Sharding Based Router, ou la Priority Message Queue. Nous mesurons la consommation d'énergie en utilisant l'outil Power-API pour garder une trace de l'énergie consommée au niveau de processus par les variantes des applications infonuagiques. Cette estimation énergétique au niveau processus donne une précision plus exacte que d'une estimation au niveau d'un logiciel en général. En plus de cela, nous mesurons le temps de réponse de l'application infonuagique pour mettre en contraste le temps de réponse avec l'efficacité énergétique, afin que les développeurs soient conscients des compromis entre ces deux indicateurs de qualité lors de la sélection d'une base de données pour leur application. Nous rapportons que le choix des bases de données peut réduire la consommation d'énergie d'une application infonuagique quelque soit les trois types des patrons infonuagiques étudiés. Nous avons montré que la base de données MySQL est la moins consommatrice d'énergie, mais est la plus lente parmi les trois bases de données étudiées. PostgreSQL est la plus consommatrice d'énergie entre les trois bases de données, mais est plus rapide que MySQL, mais plus lente que MongoDB. MongoDB consomme plus d'énergie que MySQL, mais moins que PostgreSQL et est la plus rapide parmi les trois bases de données étudiées.----------ABSTRACT Cloud-based applications are used in about every industry; from financial, retail, education, and communication, to manufacturing, utilities, and transportation. Despite their popularity and wide adoption, little is still known about the energy footprint of these applications and, in particular, of their databases, which are the backbone of cloud-based applications. Reducing the energy consumption of applications is a major objective for society and will continue to be so in the near to far future. Two families of databases are currently used in cloud-based applications: relational and non-relational databases. Consequently, in this thesis, we study the energy consumption of three databases used by cloud-based applications: MySQL, PostgreSQL, and MongoDB, which are respectively relational, relational, and non-relational. We devise a series of experiments with three cloud-based applications (a RESTful multi-threaded application, DVD Store, and JPetStore). We also study the impact of cloud patterns on the energy consumption because databases in cloud-based applications are often implemented in conjunction with patterns like Local Database Proxy, Local Sharding-Based Router, and Priority Message Queue. We measure the energy consumption using the Power-API tool to keep track of the energy consumed at the process-level by the variants of the cloud-based applications. We measure the response time of the cloud-based application because we wanted to contrast response time with energy efficiency, so that developers are aware of the trade-offs between these two quality indicators when selecting a database for their application. We report that the choice of the databases can reduce the energy consumption of a cloud-based application regardless of the three cloud patterns that are implemented. We showed that MySQL database is the least energy consuming but is the slowest among the three databases. PostgreSQL is the most energy consuming among the three databases, but is faster than MySQL but slower than MongoDB. MongoDB consumes more energy than MySQL but less than PostgreSQL and is the fastest among the three databases

PolyPublie

Data warehousing technologies for large-scale and right-time data

Author: Xiufeng Liu
Publication venue: Department of Computer Science, Aalborg University
Publication date: 01/08/2012
Field of study

VBN

Conceptualização e desenvolvimento de uma framework de clustering

Author: Taboada Ricardo Filipe Fernandes
Publication venue
Publication date: 01/01/2017
Field of study

Com a proliferação de todo o tipo de serviços baseados em plataformas digitais, como por exemplo, o e-commerce o home banking ou mesmo as redes sociais, o conceito de sistemas distribuídos ganhou um novo folgo, e com ele, surgiram novas necessidades de se atingir altos níveis de disponibilidade para determinados sistemas de software. Este cenário obriga a que as infraestruturas tecnológicas atuais incluam várias réplicas desses mesmos sistemas, de forma a manter o serviço sempre disponível ainda que ocorra uma falha num ou noutro sistema. A maior parte dos sistemas atuais incluem duas camadas distintas, a camada aplicacional, onde corre a lógica de negócio, e a camada de persistência onde os dados são guardados de forma não volátil. Embora, normalmente, de forma simples se consigam replicar os aplicacionais desses sistemas, replicar as camadas de persistência revela-se a maior parte das vezes um desafio bem mais complexo. Esta dissertação apresenta um problema concreto de uma necessidade de aplicar replicação de dados num sistema distribuído que se encontra atualmente em ambiente de produção, de forma a poder garantir-se a disponibilidade do mesmo. Do estudo realizado sobre os principais conceitos de replicação de dados, assim como algumas frameworks de replicação a nível de middleware, e o problema em questão, foi possível conceptualizar e desenvolver uma nova framework de clustering ao nível do middleware que pode ser aplicada em sistemas aos quais se queira adicionar capacidade de clustering, independentemente do tipo de persistência com os quais os mesmos interagem.With the proliferation of all kinds of services based on digital platforms, as for example, the ecommerce, the home banking or even the social networks, the concept of distributed systems gained a new breadth, and with it, appeared new necessities to achieve higher levels of high availability in some specific software systems. This scenario forces the need of the actual technological infrastructures to include several replicas of those systems, in order to ensure the service availability, even in an advent of a failure in one or more systems. The majority of the actual systems include two distinct layers, the application layer, where the business logic runs, and the persistence layer, where the data is stored in a non-volatile way. Although, usually, is simple to apply replication to the application layer of those systems, applying replication on the persistence layers reveals itself most of the times a much more complex challenge. This master thesis presents a concrete problem of the necessity to apply data replication to a distributed system that is currently in a production environment, in order to ensure its availability. Through study performed both on the main concepts of data replication, as on some middleware based replication frameworks, and taking into the account the problem in hand, it was possible to conceptualize and develop a new middleware clustering framework that can be applied to systems to which is wanted to add clustering capabilities, regardless of the persistence type they interact with

Repositório Científico do Instituto Politécnico do Porto

Enhanced Living Environments

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2021
Field of study

This open access book was prepared as a Final Publication of the COST Action IC1303 “Algorithms, Architectures and Platforms for Enhanced Living Environments (AAPELE)”. The concept of Enhanced Living Environments (ELE) refers to the area of Ambient Assisted Living (AAL) that is more related with Information and Communication Technologies (ICT). Effective ELE solutions require appropriate ICT algorithms, architectures, platforms, and systems, having in view the advance of science and technology in this area and the development of new and innovative solutions that can provide improvements in the quality of life for people in their homes and can reduce the financial burden on the budgets of the healthcare providers. The aim of this book is to become a state-of-the-art reference, discussing progress made, as well as prompting future directions on theories, practices, standards, and strategies related to the ELE area. The book contains 12 chapters and can serve as a valuable reference for undergraduate students, post-graduate students, educators, faculty members, researchers, engineers, medical doctors, healthcare organizations, insurance companies, and research strategists working in this area

Directory of Open Access Books (DOAB)

A software architecture for electro-mobility services: a milestone for sustainable remote vehicle capabilities

Author: Poaka Poaka Vladivy
Publication venue
Publication date: 26/10/2022
Field of study

To face the tough competition, changing markets and technologies in automotive industry, automakers have to be highly innovative. In the previous decades, innovations were electronics and IT-driven, which increased exponentially the complexity of vehicle’s internal network. Furthermore, the growing expectations and preferences of customers oblige these manufacturers to adapt their business models and to also propose mobility-based services. One other hand, there is also an increasing pressure from regulators to significantly reduce the environmental footprint in transportation and mobility, down to zero in the foreseeable future. This dissertation investigates an architecture for communication and data exchange within a complex and heterogeneous ecosystem. This communication takes place between various third-party entities on one side, and between these entities and the infrastructure on the other. The proposed solution reduces considerably the complexity of vehicle communication and within the parties involved in the ODX life cycle. In such an heterogeneous environment, a particular attention is paid to the protection of confidential and private data. Confidential data here refers to the OEM’s know-how which is enclosed in vehicle projects. The data delivered by a car during a vehicle communication session might contain private data from customers. Our solution ensures that every entity of this ecosystem has access only to data it has the right to. We designed our solution to be non-technological-coupling so that it can be implemented in any platform to benefit from the best environment suited for each task. We also proposed a data model for vehicle projects, which improves query time during a vehicle diagnostic session. The scalability and the backwards compatibility were also taken into account during the design phase of our solution. We proposed the necessary algorithms and the workflow to perform an efficient vehicle diagnostic with considerably lower latency and substantially better complexity time and space than current solutions. To prove the practicality of our design, we presented a prototypical implementation of our design. Then, we analyzed the results of a series of tests we performed on several vehicle models and projects. We also evaluated the prototype against quality attributes in software engineering

Publikationsserver der Technischen Universität Clausthal