Data management and Data Pipelines: An empirical investigation in the embedded systems domain
Context: Companies are increasingly collecting data from all possible sources to extract insights that help in data-driven decision-making. Increased data volume, variety, and velocity, together with the impact of poor-quality data on the development of data products, are leading companies to look for an improved data management approach that can accelerate the development of high-quality data products. Further, AI is being applied in a growing number of fields, and thus it is evolving into a horizontal technology. Consequently, AI components are increasingly being integrated into embedded systems along with electronics and software. We refer to these systems as AI-enhanced embedded systems. Given the strong dependence of AI on data, this expansion also creates a new space for applying data management techniques. Objective: The overall goal of this thesis is to empirically identify the data management challenges encountered during the development and maintenance of AI-enhanced embedded systems, propose an improved data management approach, and empirically validate the proposed approach. Method: To achieve this goal, we conducted this research in close collaboration with Software Center companies using a combination of empirical research methods: case studies, literature reviews, and action research. Results and conclusions: This research provides five main results. First, it identifies key data management challenges specific to Deep Learning models developed at embedded system companies. Second, it examines practices such as DataOps and data pipelines that help to address data management challenges. We observed that DataOps is the best data management practice for improving data quality and reducing the time to develop data products. The data pipeline is the critical component of DataOps that manages the data life cycle activities. The study also identifies the potential faults at each step of the data pipeline and the corresponding mitigation strategies.
Finally, the data pipeline model is realized in a small data pipeline, and the percentage of data dumps saved through the implementation is calculated. Future work: We plan to realize the conceptual data pipeline model so that companies can build customized, robust data pipelines. We also plan to analyze the impact and value of data pipelines in cross-domain AI systems and data applications, and to develop an AI-based fault detection and mitigation system suitable for data pipelines.
Data Management in Microservices: State of the Practice, Challenges, and Research Directions
We have recently witnessed an increased adoption of microservice architectures by industry for achieving scalability through functional decomposition, fault tolerance through the deployment of small and independent services, and polyglot persistence through the adoption of different database technologies specific to the needs of each service. Despite the accelerating industrial adoption and the extensive research on microservices, there is a lack of thorough investigation into the state of the practice and the major challenges faced by practitioners with regard to data management. To bridge this gap, this paper presents a detailed investigation of data management in microservices. Our exploratory study is based on the following methodology: we conducted a systematic literature review of articles reporting the adoption of microservices in industry, where more than 300 articles were filtered down to 11 representative studies; we analyzed a set of 9 popular open-source microservice-based applications, selected out of more than 20 open-source projects; furthermore, to strengthen our evidence, we conducted an online survey that we then used to cross-validate the findings of the previous steps with the perceptions and experiences of over 120 practitioners and researchers. Through this process, we were able to categorize the state of the practice and reveal several principled challenges that cannot be solved by software engineering practices, but rather need system-level support to alleviate the burden on practitioners. Based on these observations, we also identified a series of research directions to achieve this goal. Fundamentally, novel database systems and data management tools that support isolation for microservices, including fault isolation, performance isolation, data ownership, and independent schema evolution across microservices, must be built to address the needs of this growing architectural style.
Perspectives on Data-Driven Failure Diagnosis: With a Case Study on Failure Diagnosis at a Payment Service Provider
Data-driven failure diagnosis aims to extract relevant information from a dataset automatically. This paper proposes a data-driven model that classifies the transactions of a Payment Service Provider based on relevant shared characteristics, giving business users relevant insights into the analyzed data.
The proposed solution aims to mimic processes applied in industrial organizations. However, the methods from these organizations discussed in this paper do not directly deal with the human component in information systems. Therefore, the proposed solution aims to offer the relevant error paths to help business users in their daily tasks while accounting for the human factor in IT systems. The built artifact follows these steps:
• Categorization of variables using data mining techniques.
• Assignment of importance to the variables affecting the transaction process using a predictive machine learning method.
• Classification of transactions in groups with similar characteristics.
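The last of these steps can be illustrated with a minimal sketch: transactions are grouped into "paths" by the values they share on a few categorical variables, and each path is ranked by its failure rate. It assumes the variables have already been categorized (step one) and that the grouping keys were picked by their importance (step two). The helper `failure_paths`, the fields `method`, `country` and `failed`, and the `min_count` threshold are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def failure_paths(transactions, keys, min_count=1):
    """Group transactions into paths of shared attribute values and rank them by failure rate.

    Hypothetical helper: `transactions` is a list of dicts whose categorical fields
    are assumed to be already categorized, plus a boolean "failed" flag.
    Returns (path, n_transactions, n_failures, failure_rate) tuples, highest rate first.
    """
    counts = defaultdict(int)
    failures = defaultdict(int)
    for tx in transactions:
        path = tuple(tx[k] for k in keys)  # the shared characteristics of this group
        counts[path] += 1
        if tx["failed"]:
            failures[path] += 1
    stats = [
        (path, n, failures[path], failures[path] / n)
        for path, n in counts.items()
        if n >= min_count  # drop groups too small to be meaningful
    ]
    stats.sort(key=lambda s: s[3], reverse=True)
    return stats


# Illustrative data only; the field names are made up for this sketch.
transactions = [
    {"method": "card", "country": "NL", "failed": True},
    {"method": "card", "country": "NL", "failed": False},
    {"method": "ideal", "country": "NL", "failed": False},
    {"method": "card", "country": "DE", "failed": True},
]
paths = failure_paths(transactions, keys=["method", "country"])
for path, n, fails, rate in paths:
    print(path, n, fails, f"{rate:.2f}")
```

Raising `min_count` trades coverage for reliability: small paths with a 100% failure rate are often noise, so a real artifact would likely require a minimum group size before reporting a path to business users.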
The developed solution effectively and consistently classifies more than 90% of the faults in the database by grouping them into paths with shared characteristics and a relevant failure rate. The artifact does not depend on any predefined fault distribution and deals satisfactorily with highly correlated input variables. Therefore, the artifact has scalable potential, provided that a data mining categorization of the variables is performed beforehand, especially in companies that deal with rigid processes.
Improving the testing of Profit Software's insurance policy database system
Profit Software's Profit Life and Pension (PLP) is an investment insurance management system. This means that PLP handles investment insurance policies from the moment they are sold until they eventually expire. For a system that handles money, it is important that it can be trusted. Therefore, testing is a required part of PLP's development.
This thesis is an investigation into PLP's testing strategy. In this thesis we analyse PLP's current testing strategy to find flaws and impediments. We then offer improvement suggestions for the identified problem areas and suggest additions that we found could be beneficial.
Understanding the Issues, Their Causes and Solutions in Microservices Systems: An Empirical Study
Many small to large organizations have adopted the Microservices Architecture (MSA) style to develop and deliver their core businesses. Despite the popularity of MSA in the software industry, there is limited evidence-based and thorough understanding of the types of issues (e.g., errors, faults, failures, and bugs) that microservices system developers experience, the causes of the issues, and the solutions that serve as potential fixing strategies. To address this gap, we conducted a mixed-methods empirical study that collected data from 2,641 issues from the issue tracking systems of 15 open-source microservices systems on GitHub, 15 interviews, and an online survey completed by 150 practitioners from 42 countries across 6 continents. Our analysis led to comprehensive taxonomies for the issues, causes, and solutions. The findings of this study indicate that Technical Debt, Continuous Integration and Delivery, Exception Handling, Service Execution and Communication, and Security are the most dominant issues in microservices systems. Furthermore, General Programming Errors, Missing Features and Artifacts, and Invalid Configuration and Communication are the main causes behind the issues. Finally, we found 177 types of solutions that can be applied to fix the identified issues. Based on our study results, we formulated future research directions that could help researchers and practitioners to engineer emergent and next-generation microservices systems.
Comment: 35 pages, 5 images, 7 tables, manuscript submitted to a journal (2023)
Information Collection Platform for Smart Nudging. A Microservice-Based Approach.
This thesis aims to explore the problem of integrating heterogeneous data sources into the Smart Nudge system. The Smart Nudge system produces personalised nudges that are contextually relevant to each user. The system relies on access to live data that can be constructed and presented in specific ways to influence users' behaviour towards an agreed-upon goal. The goal is to ascertain the suitability of a microservice-based approach to designing the component responsible for integrating the various data sources. A small prototype of two microservices provided a practical look at integrating real-world sources, namely a Norwegian weather service and a bus tracking service in Chicago. The proposed architecture is analysed using a set of requirements derived from a theoretical examination of the Smart Nudge system and a general theoretical look at decomposition techniques used to evaluate microservice architectures. Evaluating the prototype revealed that the Smart Nudge system is highly dependent on augmenting data sources with additional metadata to produce personalised nudges. The analysis indicates that a data-driven microservice-based architecture seems well suited to resolving some of the problems and requirements that are somewhat unique to the Smart Nudge system setting.
- …