11 research outputs found

    Big Data Reference Architectures, a systematic literature review

    Today, we live in a world that produces data at an unprecedented rate. This significant amount of data has attracted wide attention, and many strive to harness the power of this new raw material. In the same direction, academics and practitioners have considered means through which they can incorporate data-driven functions and explore patterns that were otherwise unknown. This has led to the concept of Big Data, a field that deals with data sets too large and complex for traditional approaches to handle. Technical matters are fundamentally critical, but what is even more necessary is an architecture that supports the orchestration of Big Data systems: an image of the system providing a clear understanding of its different elements and their interdependencies. Reference architectures aid in defining the body of the system and its key components, relationships, behaviors, patterns, and limitations. This study provides an in-depth review of Big Data Reference Architectures by applying a systematic literature review. It demonstrates a synthesis of high-quality research to offer indications of new trends, and contributes to the body of knowledge on the principles of Reference Architectures, the current state of Big Data Reference Architectures, and their limitations.

    Data analytics for data variety

    Through the Internet, organization managers have gained access to massive amounts of data, and these data come in different formats: structured, semi-structured, and unstructured. This variety of data is essentially generated by social networks, but also by the Internet of Things, machines, and sensors, among others. While well-studied, mature, and validated techniques exist for structured data, the same is not true for semi-structured and unstructured data. This poster presents a set of data analysis techniques for semi-structured and unstructured data, drawing mainly on the literature of data analytics research conferences.

    Big Data Reference Architecture for e-Learning Analytical Systems

    Recent advancements in technology have produced big data, making it necessary for researchers to analyze the data in order to make it meaningful. Massive amounts of data are collected across social media sites, mobile communications, business environments, and institutions. The concept of big data was introduced in order to efficiently analyze this large quantity of raw data, and big data analytics is needed to provide techniques for analyzing it. This new concept is expected to help education in the near future by changing the way we approach the e-Learning process, encouraging interaction between learners and teachers, and allowing the fulfilment of the individual requirements and goals of learners. The learning environment generates massive knowledge by means of the various services provided in massive open online courses; such knowledge is produced via learning actor interactions. Data analytics can also be a valuable tool to help e-Learning organizations deliver better services to the public: it can provide important insights into consumer behavior and better predict demand for goods and services, thereby allowing for better resource management. These results motivate putting forward solutions for big data usage in the educational field. This research article unfolds a big data reference architecture (BiDRA) for e-Learning analytical systems to make a unified analysis of the massive data generated by learning actors; the reference architecture structures the processing of the massive data produced in a big data e-Learning system. Finally, BiDRA for e-Learning analytical systems was evaluated based on the quality attributes of maintainability, modularity, reusability, performance, and scalability.

    Exploring data analytics of data variety

    The Internet gives organization managers access to large amounts of data, and these data are presented in different formats, i.e., data variety: structured, semi-structured, and unstructured. This data variety is partly derived from social networks, but machines are also capable of sharing information among themselves, or even with people. The objective of this paper is to understand how to retrieve information from data analysis with data variety. An experiment was carried out based on a dataset with two distinct data types, images and comments on cars. Data analysis techniques were used, namely Natural Language Processing to identify patterns, and sentiment and emotion analysis. An image recognition technique was used to associate a car model with a category. Next, OLAP cubes and their visualization through dashboards were created. This paper concludes that it is possible to extract a set of relevant information, namely identifying which cars people like more or less, among other information.
    Funding: COMPETE POCI-01-0145-FEDER-007043 and FCT - Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/201
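
    The kind of pipeline this abstract describes (score free-text comments, then roll them up by car model as an OLAP cube would) can be sketched as follows. This is a minimal illustration only: the lexicon, car models, and comments are invented for the example, and the paper's actual pipeline used full NLP, sentiment and emotion analysis, and image recognition rather than this toy word-list scorer.

```python
from collections import defaultdict

# Toy sentiment lexicon (hypothetical; a real pipeline would use an NLP library).
POSITIVE = {"love", "great", "reliable", "comfortable"}
NEGATIVE = {"hate", "noisy", "expensive", "unreliable"}

def sentiment(comment):
    """Score a comment: +1 per positive word, -1 per negative word."""
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def aggregate_by_model(comments):
    """OLAP-style roll-up: total sentiment per car model."""
    totals = defaultdict(int)
    for model, text in comments:
        totals[model] += sentiment(text)
    return dict(totals)

comments = [
    ("Model A", "I love this car it is reliable"),
    ("Model A", "a bit noisy on the highway"),
    ("Model B", "expensive and unreliable"),
]
print(aggregate_by_model(comments))  # -> {'Model A': 1, 'Model B': -2}
```

    The roll-up dictionary plays the role of one cell layer of an OLAP cube (measure: sentiment, dimension: car model); adding further dimensions such as date or source would extend the grouping key.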

    A reference architecture for big data solutions introducing a model to perform predictive analytics using big data technology

    No full text

    Catálogo de arquitecturas de software y tácticas arquitectónicas para contextos de big data

    This document presents a catalog of software architectures and architectural tactics applicable in big data contexts. The first part describes the software architectures using a scheme that presents their usage scenarios and main components. From these descriptions, common architectural tactics applied in big data contexts were identified. A description of each architectural tactic and of its strategy for resolving the affected quality attributes is presented.

    Mapeo sistemático y evaluación de arquitecturas de software para contextos de big data

    Includes bibliography. Big data is information characterized by high volume, velocity, and variety, requiring specific analytical methods and technologies in order to be managed and transformed into added value for the user. The market for big data services has begun to grow steadily in recent years. However, its rapid growth brings several challenges for software engineering to overcome. Software architectures become relevant in this context, where traditional styles and patterns are not sufficient for software design and development. This thesis aims to explore the challenges and practices applied during the architectural design process in big data contexts. First, a systematic mapping of the literature was performed to identify and categorize software architecture proposals. The evaluation of these architectures is then deepened to identify, describe, and discuss the impact of a set of architectural tactics on the quality attributes characteristic of big data. It is concluded that there is a variety of industrial, theoretical, and reference software architecture proposals for big data. These proposals often differ in their layers and separation of responsibilities, which makes it difficult for the practitioner to design a solution adapted to their context of use. Moreover, the analysis of these architectures indicates the existence of complex requirements, similar to those found in distributed systems but at a larger scale, determined by the characteristics of high volume, variety, and velocity of data. These results show an opportunity to improve the architectural design process by adopting practices such as the use of architectural tactics to capture the design decisions specific to big data.

    Metadata-driven data integration

    Joint supervision: Universitat Politècnica de Catalunya and Université Libre de Bruxelles, IT4BI-DC programme for the joint Ph.D. degree in computer science.
    Data has an undoubtable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are recently witnessing a change represented by huge and heterogeneous amounts of data; indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out these data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of such shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin by proposing a software reference architecture for semantic-aware data-intensive systems. Such an architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as a formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of automatically resolving. Finally, the thesis focuses on the materialized integration activity and, to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as a contribution to the field of data integration in contemporary data-intensive ecosystems.
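
    The core idea of rewriting queries over a shared metadata repository can be sketched in miniature. This is an illustrative toy, not the thesis's actual graph formalism or algorithms: the metadata graph below is a plain dictionary of hypothetical global-to-local attribute mappings, and the "rewriting" simply translates a global projection into per-source projections.

```python
# Hypothetical shared metadata repository: each global (unified-view)
# attribute maps to edges pointing at (source, local attribute) pairs.
metadata_graph = {
    "customer_name": [("crm_db", "full_name"), ("web_logs", "user")],
    "purchase_total": [("sales_db", "amount")],
}

def rewrite(global_attrs):
    """Rewrite a projection over the unified view into per-source projections."""
    plan = {}
    for attr in global_attrs:
        for source, local in metadata_graph.get(attr, []):
            plan.setdefault(source, []).append(local)
    return plan

print(rewrite(["customer_name", "purchase_total"]))
# -> {'crm_db': ['full_name'], 'web_logs': ['user'], 'sales_db': ['amount']}
```

    Keeping the mappings in a shared repository rather than hard-coded in queries is what lets the integration survive schema and source evolution: when a source renames an attribute, only its edge in the metadata graph changes.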

    The Power of Exogenous Variables in Predicting West Nile Virus in South Carolina

    Despite the availability of medical data, environmental surveillance tools, and heightened public awareness, West Nile Virus (WNv) remains a global health hazard. Reliable methods for predicting WNv outbreaks remain elusive, and environmental health managers must take preventive actions without the benefit of simple predictive tools. The purpose of this ex post facto research was to examine the accuracy and timeliness of exogenous data in predicting outbreaks of WNv in South Carolina. Decision theory, the Cynefin construct, and systems theory provided the theoretical framework for this study, allowing the researcher to broaden traditional decision theory concepts with powerful system-level precepts. Using WNv as an example of decision making in complex environments, a statistical model for predicting the likelihood of the presence of WNv was developed through the exclusive use of exogenous explanatory variables (EEVs). The key research questions focused on whether EEVs alone can predict the likelihood of WNv presence with the statistical confidence needed to make timely preventive resource decisions. Results indicated that publicly accessible EEVs such as average temperature, average wind speed, and average population can be used to predict the presence of WNv in a South Carolina locality 30 days prior to an incident, although they did not accurately predict incident counts higher than four. The social implications of this research can be far-reaching: the ability to predict emerging infectious diseases (EIDs) for which there are no vaccines or cures can provide decision makers with the ability to take proactive measures to mitigate EID outbreaks.
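
    A model of the general shape described here (a probability of WNv presence driven only by exogenous variables) can be sketched as a logistic function of temperature, wind speed, and population. All weights, bias, and inputs below are synthetic illustrations, not the study's fitted model or data; the sketch only shows why such a model outputs a likelihood suitable for thresholded resource decisions.

```python
import math

def predict_presence(temp_f, wind_mph, population, weights, bias):
    """Logistic probability that WNv is present, from exogenous inputs only."""
    z = bias + weights[0] * temp_f + weights[1] * wind_mph + weights[2] * population
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: warmer, calmer, denser localities score higher.
weights = (0.08, -0.15, 0.00002)
bias = -7.0

warm_calm = predict_presence(85, 3, 50_000, weights, bias)   # high-risk profile
cool_windy = predict_presence(60, 15, 5_000, weights, bias)  # low-risk profile
print(round(warm_calm, 3), round(cool_windy, 3))
```

    A manager could act whenever the probability crosses a chosen threshold (say 0.5), which matches the study's framing of predicting presence rather than exact incident counts.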