    Data management and Data Pipelines: An empirical investigation in the embedded systems domain

    Context: Companies are increasingly collecting data from all possible sources to extract insights that help in data-driven decision-making. Increased data volume, variety, and velocity, together with the impact of poor-quality data on the development of data products, are leading companies to look for an improved data management approach that can accelerate the development of high-quality data products. Further, AI is being applied in a growing number of fields, and thus it is evolving into a horizontal technology. Consequently, AI components are increasingly being integrated into embedded systems along with electronics and software. We refer to these systems as AI-enhanced embedded systems. Given the strong dependence of AI on data, this expansion also creates a new space for applying data management techniques. Objective: The overall goal of this thesis is to empirically identify the data management challenges encountered during the development and maintenance of AI-enhanced embedded systems, propose an improved data management approach, and empirically validate the proposed approach. Method: To achieve this goal, we conducted the research in close collaboration with Software Center companies using a combination of empirical research methods: case studies, literature reviews, and action research. Results and conclusions: This research provides five main results. First, it identifies key data management challenges specific to Deep Learning models developed at embedded system companies. Second, it examines practices such as DataOps and data pipelines that help to address data management challenges. We observed that DataOps is the best data management practice that improves data quality and reduces the time to develop data products. The data pipeline is the critical component of DataOps that manages the data life cycle activities. The study also identifies the potential faults at each step of the data pipeline and the corresponding mitigation strategies. Finally, the data pipeline model is realized in a small data pipeline, and the percentage of data dumps saved through the implementation is calculated. Future work: As future work, we plan to realize the conceptual data pipeline model so that companies can build customized, robust data pipelines. We also plan to analyze the impact and value of data pipelines in cross-domain AI systems and data applications, and to develop an AI-based fault detection and mitigation system suitable for data pipelines.

    Data Management in Microservices: State of the Practice, Challenges, and Research Directions

    We have recently been witnessing an increased adoption of microservice architectures by the industry for achieving scalability by functional decomposition, fault-tolerance by deployment of small and independent services, and polyglot persistence by the adoption of different database technologies specific to the needs of each service. Despite the accelerating industrial adoption and the extensive research on microservices, there is a lack of thorough investigation into the state of the practice and the major challenges faced by practitioners with regard to data management. To bridge this gap, this paper presents a detailed investigation of data management in microservices. Our exploratory study is based on the following methodology: we conducted a systematic literature review of articles reporting the adoption of microservices in industry, where more than 300 articles were filtered down to 11 representative studies; we analyzed a set of 9 popular open-source microservice-based applications, selected out of more than 20 open-source projects; furthermore, to strengthen our evidence, we conducted an online survey that we then used to cross-validate the findings of the previous steps with the perceptions and experiences of over 120 practitioners and researchers. Through this process, we were able to categorize the state of practice and reveal several principled challenges that cannot be solved by software engineering practices, but rather need system-level support to alleviate the burden on practitioners. Based on these observations, we also identified a series of research directions to achieve this goal. Fundamentally, novel database systems and data management tools that support isolation for microservices, including fault isolation, performance isolation, data ownership, and independent schema evolution across microservices, must be built to address the needs of this growing architectural style.

    Perspectives On Data-Driven failure diagnosis : With a case study on failure diagnosis at an Payment Service Provider

    Data-driven failure diagnosis aims to extract relevant information from a dataset automatically. This paper proposes a data-driven model for classifying the transactions of a Payment Service Provider based on relevant shared characteristics, giving business users relevant insights into the data analyzed. The proposed solution aims to mimic processes applied in industrial organizations. However, the methods from these organizations discussed in this paper do not directly deal with the human component in information systems. Therefore, the proposed solution aims to offer the relevant error paths to help business users in their daily tasks while dealing with the human factor in IT systems. The built artifact follows these steps: • Categorization of variables following data mining techniques. • Assignment of importance to the variables affecting the transaction process using a predictive machine learning method. • Classification of transactions into groups with similar characteristics. The solution developed effectively and consistently classifies more than 90% of the faults in the database by grouping them into paths with shared characteristics and a relevant failure rate. The artifact does not depend on any predefined fault distribution and deals satisfactorily with highly correlated input variables. Therefore, the artifact has the potential to scale, provided that a data mining categorization of variables is performed beforehand, especially in companies that deal with rigid processes.

    Improving the testing of Profit Software's insurance policy database system

    Profit Software's Profit Life and Pension (PLP) is an investment insurance management system. This means that PLP handles investment insurances from the moment they are sold to when they eventually expire. For a system that handles money, it is important that it can be trusted. Therefore, testing is a required part of PLP's development. This thesis is an investigation into PLP's testing strategy. We analyse PLP's current testing strategy to find flaws and impediments. We then offer improvement suggestions for the identified problem areas and suggest additions that we found could be beneficial.

    Understanding the Issues, Their Causes and Solutions in Microservices Systems: An Empirical Study

    Many small to large organizations have adopted the Microservices Architecture (MSA) style to develop and deliver their core businesses. Despite the popularity of MSA in the software industry, there is limited evidence-based and thorough understanding of the types of issues (e.g., errors, faults, failures, and bugs) that microservices system developers experience, the causes of the issues, and the solutions as potential fixing strategies to address the issues. To address this gap, we conducted a mixed-methods empirical study that collected data from 2,641 issues from the issue tracking systems of 15 open-source microservices systems on GitHub, 15 interviews, and an online survey completed by 150 practitioners from 42 countries across 6 continents. Our analysis led to comprehensive taxonomies for the issues, causes, and solutions. The findings of this study indicate that Technical Debt, Continuous Integration and Delivery, Exception Handling, Service Execution and Communication, and Security are the most dominant issues in microservices systems. Furthermore, General Programming Errors, Missing Features and Artifacts, and Invalid Configuration and Communication are the main causes behind the issues. Finally, we found 177 types of solutions that can be applied to fix the identified issues. Based on our study results, we formulated future research directions that could help researchers and practitioners to engineer emergent and next-generation microservices systems. Comment: 35 pages, 5 images, 7 tables, manuscript submitted to a journal (2023)

    Information Collection Platform for Smart Nudging. A Microservice-Based Approach.

    This thesis explores the problem of integrating heterogeneous data sources into the Smart Nudge system. The Smart Nudge system produces personalised nudges that are contextually relevant to each user. The system relies on access to live data that can be constructed and presented in specific ways to influence users' behaviour towards an agreed-upon goal. The goal of the thesis is to ascertain the suitability of a microservice-based approach to designing the component that is responsible for integrating various data sources. A small prototype of two microservices provided a practical look at integrating real-world sources, namely a Norwegian weather service and a bus tracking service in Chicago. The proposed architecture is analysed using a set of requirements derived from a theoretical examination of the Smart Nudge system and a general theoretical look at decomposition techniques used to evaluate microservice architectures. Evaluating the prototype revealed that the Smart Nudge system is highly dependent on augmenting data sources with additional metadata to produce personalised nudges. The analysis indicates that a data-driven microservice-based architecture seems well suited to resolving some of the problems and requirements that are somewhat unique to the Smart Nudge system setting.