145 research outputs found

    Towards a new hybrid approach for building document-oriented data warehouses

    Schemaless databases offer large storage capacity while guaranteeing high performance in data processing, unlike relational databases, which are rigid and have shown their limitations in managing large amounts of data. However, the absence of a well-defined schema and structure in Not only SQL (NoSQL) databases makes the use of data for decision analysis more complex and difficult. In this paper, we propose an original approach to building a document-oriented data warehouse from unstructured data. The new approach follows a hybrid paradigm that combines data analysis and user requirements analysis. The first, data-driven step exploits the fast and distributed processing of the Spark engine to generate a general schema for each collection in the database. The second, requirement-driven step consists of analyzing the semantics of the decisional requirements expressed in natural language and mapping them to the schemas of the collections. At the end of the process, a decisional schema is generated in JavaScript Object Notation (JSON) format and the data are loaded with the necessary transformations.
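    The data-driven step described in this abstract (generating a general schema for each collection) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python instead of Spark, and the function names `type_name` and `infer_schema`, the dotted-path convention, and the type labels are all assumptions made for illustration.

```python
from collections import defaultdict


def type_name(value):
    """Map a JSON-like Python value to a coarse type label (labels are hypothetical)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int, because bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def infer_schema(documents):
    """Merge the fields of all documents into one general schema.

    Returns a dict mapping each dotted field path to the sorted list of
    type labels observed for that path anywhere in the collection.
    """
    schema = defaultdict(set)

    def visit(doc, prefix):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            schema[path].add(type_name(value))
            if isinstance(value, dict):  # descend into nested documents
                visit(value, path)

    for doc in documents:
        visit(doc, "")
    return {path: sorted(types) for path, types in sorted(schema.items())}


if __name__ == "__main__":
    collection = [
        {"id": 1, "name": "a", "address": {"city": "X"}},
        {"id": "2", "tags": ["x"]},
    ]
    for path, types in infer_schema(collection).items():
        print(path, types)
```

    A field that appears with several types across documents (here `id`) keeps all of them, which is one simple way to expose the heterogeneity that a decisional schema would later have to resolve.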

    How to Optimize the Environmental Impact of Transformed NoSQL Schemas through a Multidimensional Cost Model?

    The complexity of database systems has increased significantly along with the continuous growth of data, resulting in NoSQL systems and forcing Information Systems (IS) architects to constantly adapt their data models (i.e., the structure of the information stored in the database) and to carefully choose the best option(s) for storing and managing data. In this context, we propose an automatic global approach for guiding the data model transformation process. This approach starts with the generation of all possible solutions. It then relies on a cost model that helps to compare the generated data models at the logical level in order to choose the best one for the given use case. This cost model integrates both data model and query costs. It also takes into consideration the environmental impact of a data model as well as its financial and time costs. This work presents, for the first time, a multidimensional cost model encompassing time, environmental and financial constraints, which compares data models and leads to the choice of the optimal one for a given use case. In addition, a simulation of data model transformation and cost computation has been developed based on our approach.
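    The idea of comparing candidate data models along time, environmental and financial dimensions can be sketched as follows. The abstract does not say how the three dimensions are combined, so the normalised weighted sum used here is purely an assumption, as are the names `ModelCost` and `best_model`, the units, and the sample figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCost:
    """Estimated costs of one candidate data model (fields and units are hypothetical)."""
    name: str
    time_s: float      # estimated query time
    energy_kwh: float  # stand-in for environmental impact
    money_eur: float   # estimated financial cost


def best_model(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the candidate with the lowest weighted, normalised cost.

    Each dimension is divided by its maximum over all candidates so that
    the three costs are comparable before the weights are applied.
    """
    if not candidates:
        raise ValueError("at least one candidate data model is required")
    dims = ("time_s", "energy_kwh", "money_eur")
    # Fall back to 1.0 when a dimension is zero for every candidate.
    maxima = [max(getattr(c, d) for c in candidates) or 1.0 for d in dims]

    def score(c):
        return sum(w * getattr(c, d) / m for w, d, m in zip(weights, dims, maxima))

    return min(candidates, key=score)


if __name__ == "__main__":
    candidates = [
        ModelCost("embedded", time_s=10, energy_kwh=2, money_eur=5),
        ModelCost("referenced", time_s=5, energy_kwh=4, money_eur=5),
        ModelCost("hybrid", time_s=6, energy_kwh=1, money_eur=4),
    ]
    print(best_model(candidates).name)
    print(best_model(candidates, weights=(1, 0, 0)).name)
```

    Changing the weights shifts the choice: with equal weights the sample data favours the hypothetical `hybrid` model, while weighting only time favours `referenced`.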

    Cloud migration of legacy applications


    ICSEA 2021: the sixteenth international conference on software engineering advances

    The Sixteenth International Conference on Software Engineering Advances (ICSEA 2021), held on October 3-7, 2021 in Barcelona, Spain, continued a series of events covering a broad spectrum of software-related topics. The conference covered fundamentals of designing, implementing, testing, validating and maintaining various kinds of software. The tracks treated the topics from theory to practice, in terms of methodologies, design, implementation, testing, use cases, tools, and lessons learnt. The conference topics covered classical and advanced methodologies, open source, agile software, as well as software deployment and software economics and education. The conference had the following tracks: Advances in fundamentals for software development; Advanced mechanisms for software development; Advanced design tools for developing software; Software engineering for service computing (SOA and Cloud); Advanced facilities for accessing software; Software performance; Software security, privacy, safeness; Advances in software testing; Specialized software advanced applications; Web Accessibility; Open source software; Agile and Lean approaches in software engineering; Software deployment and maintenance; Software engineering techniques, metrics, and formalisms; Software economics, adoption, and education; Business technology; Improving productivity in research on software engineering; Trends and achievements. Similar to the previous edition, this event continued to be very competitive in its selection process and very well perceived by the international software engineering community. As such, it is attracting excellent contributions and active participation from all over the world. We were very pleased to receive a large number of top-quality contributions. We take this opportunity to warmly thank all the members of the ICSEA 2021 technical program committee as well as the numerous reviewers. The creation of such a broad and high-quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and effort to contributing to ICSEA 2021. We truly believe that, thanks to all these efforts, the final conference program consists of top-quality contributions. This event could also not have been a reality without the support of many individuals, organizations and sponsors. We also gratefully thank the members of the ICSEA 2021 organizing committee for their help in handling the logistics and for their work in making this professional meeting a success. We hope that ICSEA 2021 was a successful international forum for the exchange of ideas and results between academia and industry and for promoting further progress in software engineering research.

    Relatório de Estágio - Solução de BI Roaming Data Science (RoaDS) em ambiente Vodafone [Internship Report - Roaming Data Science (RoaDS) BI Solution in a Vodafone Environment]

    A telecom company (Vodafone) needed to implement a Business Intelligence solution for roaming data drawn from a wide set of different data sources. Based on the data visualization provided by this solution, its key users with decision-making power can carry out business analysis and assess infrastructure and software expansion needs. This document presents the scientific papers produced during the various stages of building the solution (state of the art, architecture design, and implementation results). The Business Intelligence solution was designed and implemented with OLAP methodologies and technologies in a Data Warehouse composed of Data Marts arranged in a constellation; the visualization layer was custom made in JavaScript (VueJS). As a basis for the results, a questionnaire was created to be filled in by the key users of the solution. Based on this questionnaire, it was possible to ascertain that user acceptance was satisfactory. The proposed objectives for the implementation of the BI solution, with all its requirements, were achieved, with the infrastructure itself created from scratch in Kubernetes. This BI platform can be expanded using column-storage databases created specifically with OLAP workloads in mind, removing the need for an OLAP cube layer. Based on Machine Learning algorithms, the platform will be able to perform the predictions needed to make decisions about Vodafone's Roaming infrastructure.

    Extraction des modèles d'une base de données NoSQL orientée-documents basée sur une approche dirigée par les modèles [Extracting the models of a document-oriented NoSQL database using a model-driven approach]

    Nowadays, the digital transformation of companies, and more broadly of society, has led to an evolution from relational databases towards Big Data. Big Data stores make it possible to store not only large amounts of data but also different types and formats of data from heterogeneous sources. In addition, this data is often captured at very high frequency and must therefore be filtered and aggregated in real time to avoid unnecessary saturation of storage space. These characteristics have had an impact on the tools needed to store and manage data, and new data management systems have appeared: NoSQL systems, which are able to handle volume, variety and velocity. In the majority of NoSQL DBMSs, databases are schema-less, which means that the data model is not provided when a database is created. In other words, in a table, the names of the attributes are specified only when their values are inserted. This absence of a schema offers undeniable flexibility: it facilitates the evolution of the data model as the database is used, and it allows end users to add new information without resorting to the database administrator. On the other hand, this property introduces a lack of understanding and visibility into the organization of data in a NoSQL database. Without a data model, the user cannot know how the data is stored (under what name and what type) and linked in the database, yet this knowledge is essential for expressing queries. Indeed, to write queries, the user needs the database structure describing the names of the tables, the names of the attributes and their types, as well as the links between the objects; a data model contains all of these descriptions. Our work falls within this context; it concerns the development of the models needed to manipulate databases managed by schema-less NoSQL systems. There are two such models: the physical model, which describes the internal organization of the data and makes it possible to express queries, and the conceptual model, which abstracts away technical aspects and focuses on data semantics. The objective of this thesis is to propose a general approach for extracting the physical and conceptual models of a schema-less NoSQL database. We use the MDA architecture, an OMG standard for model-driven development. Starting from a schema-less NoSQL database, our MDA approach applies two successive automatic processes: the extraction and updating of the physical model, and the transformation of the physical model into a conceptual model. To verify the feasibility of our solution, we have developed a prototype made up of two modules. The first generates a physical data model from a schema-less NoSQL database and updates it as queries are executed against the database. The resulting physical model describes the internal organization of the data in the database and allows users to express queries. The second module transforms the extracted physical model into a conceptual data model, which abstracts away the technical aspects and makes the organization of the data easier to understand.

    Mogwaï: a Framework to Handle Complex Queries on Large Models

    While Model Driven Engineering is gaining more industrial interest, scalability issues when managing large models have become a major problem in current modeling frameworks. Scalable model persistence has been achieved by using NoSQL backends for model storage, but existing modeling framework APIs have not evolved accordingly, limiting NoSQL query performance benefits. In this paper we present Mogwaï, a scalable and efficient model query framework based on a direct translation of OCL queries to Gremlin, a query language supported by several NoSQL databases. Generated Gremlin expressions are computed inside the database itself, bypassing limitations of existing framework APIs and improving overall performance, as confirmed by our experimental results showing an improvement of execution time up to a factor of 20 and a reduction of the memory overhead up to a factor of 75 for large models.