Towards Platform Independent Database Modelling in Enterprise Systems
Enterprise software systems are prevalent in many organisations; they are typically data-intensive and manage customer, sales, or other important data. When an enterprise system needs to be modernised or migrated (e.g. to the cloud), it is necessary to understand the structure of this data and how it is used. We have developed a tool-supported approach to model database structure, query patterns, and growth patterns. Compared to existing work, our tool offers increased system support and extensibility, which is vital for use in industry. Standardisation and platform independence are ensured by producing models conforming to the Knowledge Discovery Metamodel and the Software Metrics Metamodel.
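The kind of structural information a database model of this sort captures can be illustrated with a minimal sketch (the function name and regex-based approach are ours, not the tool's): extracting a table-to-columns map from SQL DDL.

```python
import re

# Hypothetical sketch: extract a minimal structural model (table -> column names)
# from SQL DDL -- the kind of information a platform-independent database model
# conforming to the Knowledge Discovery Metamodel would capture.
def extract_structure(ddl: str) -> dict:
    model = {}
    for stmt in re.finditer(r"CREATE TABLE (\w+)\s*\((.*?)\);", ddl, re.S | re.I):
        table, body = stmt.group(1), stmt.group(2)
        # the first token of each column definition is the column name
        columns = [part.strip().split()[0] for part in body.split(",") if part.strip()]
        model[table] = columns
    return model

ddl = """
CREATE TABLE customers (id INT, name VARCHAR(100));
CREATE TABLE orders (id INT, customer_id INT, total DECIMAL);
"""
print(extract_structure(ddl))
# {'customers': ['id', 'name'], 'orders': ['id', 'customer_id', 'total']}
```

A real implementation would of course use a proper SQL parser rather than regular expressions; the sketch only shows the shape of the extracted model.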
Evaluating cloud database migration options using workload models
A key challenge in porting enterprise software systems to the cloud is the migration of their database. Choosing a cloud provider and service option (e.g., a database-as-a-service or a manually configured set of virtual machines) typically requires the estimation of the cost and migration duration for each considered option. Many organisations also require this information for budgeting and planning purposes. Existing cloud migration research focuses on the software components, and therefore does not address this need. We introduce a two-stage approach which accurately estimates the migration cost, migration duration, and cloud running costs of relational databases. The first stage of our approach obtains workload and structure models of the database to be migrated from database logs and the database schema. The second stage performs a discrete-event simulation using these models to obtain the cost and duration estimates. We implemented software tools that automate both stages of our approach. An extensive evaluation compares the estimates from our approach against results from real-world cloud database migrations.
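The second-stage idea can be sketched very roughly as follows. This is a toy simulation of our own devising (the chunk size, pricing, and event log are illustrative assumptions, not the paper's model): data is transferred in chunks, each completion is an event, and duration and running cost fall out of the simulated clock.

```python
import heapq

# Illustrative toy sketch (all parameters hypothetical): a tiny discrete-event
# simulation estimating transfer duration and running cost for a database migration.
def simulate_migration(db_size_gb, bandwidth_gb_per_h, cost_per_hour):
    clock = 0.0
    events = []            # min-heap of (time, description) completion events
    chunk = 10.0           # transfer the database in 10 GB chunks
    remaining = db_size_gb
    while remaining > 0:
        step = min(chunk, remaining)
        clock += step / bandwidth_gb_per_h   # advance simulated time
        remaining -= step
        heapq.heappush(events, (clock, f"{db_size_gb - remaining:.0f} GB transferred"))
    duration_h = clock
    return duration_h, duration_h * cost_per_hour

duration, cost = simulate_migration(db_size_gb=500, bandwidth_gb_per_h=100, cost_per_hour=2.0)
print(f"{duration:.1f} h, ${cost:.2f}")  # 5.0 h, $10.00
```

The real approach simulates far more detail (network variability, service pricing models, concurrent workload), but the structure, models in, cost and duration estimates out, is the same.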
Evaluating Cloud Migration Options for Relational Databases
Migrating the database layer remains a key challenge when moving a software system to a new cloud provider. The database is often very large, poorly documented, and used to store business-critical information. Most cloud providers offer a variety of services for hosting databases, and the most suitable choice depends on the database size, workload, performance requirements, cost, and future business plans. Current approaches do not support this decision-making process, leading to errors and inaccurate comparisons between database migration options. The heterogeneity of databases and clouds means organisations often have to develop their own ad-hoc process to compare the suitability of cloud services for their system. This is time-consuming, error-prone, and costly.
This thesis contributes to addressing these issues by introducing a three-phase methodology for evaluating cloud database migration options. The first phase defines the planning activities, such as considering downtime tolerance, existing infrastructure, and information sources. The second phase is a novel method for modelling the structure and the workload of the database being migrated. This addresses database heterogeneity by using a multi-dialect SQL grammar and annotated text-to-model transformations. The final phase consumes the models from the second phase and uses discrete-event simulation to predict migration cost, data transfer duration, and cloud running costs. This involved extending the existing CloudSim framework to simulate the data transfer to a new cloud database.
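A workload model of the kind the second phase produces can be approximated crudely from a query log. The sketch below is our own simplification (the regexes and function name are illustrative, and stand in for the multi-dialect grammar and model transformations): it counts per-table reads and writes.

```python
import re
from collections import Counter

# Hypothetical sketch: derive a coarse workload model (reads/writes per table)
# from a query log. A real implementation would parse the SQL with a
# multi-dialect grammar rather than pattern-match it.
READ = re.compile(r"FROM\s+(\w+)", re.I)
WRITE = re.compile(r"(?:INSERT INTO|UPDATE)\s+(\w+)", re.I)

def workload_model(log_lines):
    reads, writes = Counter(), Counter()
    for query in log_lines:
        reads.update(READ.findall(query))
        writes.update(WRITE.findall(query))
    return reads, writes

log = [
    "SELECT * FROM orders WHERE total > 10",
    "SELECT name FROM customers",
    "UPDATE orders SET total = 0 WHERE id = 7",
]
reads, writes = workload_model(log)
print(dict(reads), dict(writes))  # {'orders': 1, 'customers': 1} {'orders': 1}
```

Even this coarse read/write profile is the sort of input a simulation phase needs to weight tables by access frequency.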
An extensive evaluation was performed to assess the effectiveness of each phase of the methodology and of the tools developed to automate their main steps. The modelling phase was applied to 15 real-world systems and, compared to the leading approach, showed substantial improvements in performance, model completeness, extensibility, and SQL support. The complete methodology was applied to four migrations of two real-world systems. The results showed that the methodology provided significantly improved accuracy over existing approaches.
Comprehensible and Robust Knowledge Discovery from Small Datasets
Knowledge Discovery in Databases (KDD) aims to extract useful knowledge from data. The data may represent a series of measurements from a real-world process, or a set of input-output values of a simulation model. Two frequently conflicting requirements on the acquired knowledge are that it (1) summarises the data as accurately as possible and (2) is available in a readily understandable form. Decision trees and subgroup discovery methods deliver knowledge summaries in the form of hyperrectangles, which are considered easy to understand.
To demonstrate the importance of a comprehensible data summary, we investigate decentralised smart grid control ("Dezentrale intelligente Netzsteuerung"), a new system that implements demand response in power grids without major changes to the infrastructure. The conventional analysis of this system carried out so far was limited to considering identical participants and therefore did not reflect reality sufficiently well. We run many simulations with different input values and apply decision trees to the resulting data. With the resulting comprehensible data summaries, we were able to gain new insights into the behaviour of decentralised smart grid control. Decision trees make it possible to describe the system behaviour for all input combinations.
Sometimes, however, one is not interested in partitioning the entire input space, but rather in finding regions that lead to particular outputs (so-called subgroups). Existing subgroup discovery algorithms usually require large amounts of data to achieve stable and accurate output. The data collection process, however, is often costly. Our main contribution is improving subgroup discovery from datasets with few observations.
Subgroup discovery in simulated data is referred to as scenario discovery. A frequently used algorithm for scenario discovery is PRIM (Patient Rule Induction Method). We propose REDS (Rule Extraction for Discovering Scenarios), a new procedure for scenario discovery. For REDS, we first train an intermediate statistical model and use it to generate a large amount of new data for PRIM. We also describe the underlying statistical intuition. Experiments show that REDS performs much better than PRIM on its own: it reduces the number of required simulation runs by 75% on average.
With simulated data, one has perfect knowledge of the input distribution, a prerequisite of REDS. To make REDS applicable to real measurement data, we combined it with sampling from an estimated multivariate distribution of the data. We evaluated the resulting method experimentally in combination with different methods for generating data, for both PRIM and BestInterval, another representative subgroup discovery method. In most cases, our methodology increased the quality of the discovered subgroups.
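The core REDS loop can be illustrated in one dimension. The sketch below is a deliberately minimal reconstruction of the idea, not the authors' implementation: a handful of expensive "simulation" runs train a cheap surrogate (here a 1-nearest-neighbour model, our choice), the surrogate labels many fresh points, and a PRIM-style peeling pass shrinks a box toward a high-outcome region.

```python
import random

# Minimal 1-D sketch of the REDS idea (model choice and parameters are ours):
# 1) run a few expensive "simulations", 2) fit a cheap intermediate model,
# 3) have it label many new points, 4) peel a box (PRIM-style) toward
# regions with a high mean outcome.
random.seed(0)

def simulation(x):            # expensive model: interesting region is x > 0.7
    return 1 if x > 0.7 else 0

train_x = [random.random() for _ in range(20)]   # few real simulation runs
train_y = [simulation(x) for x in train_x]

def surrogate(x):             # 1-nearest-neighbour intermediate model
    return min(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[1]

# Step 3: generate abundant surrogate-labelled data for the peeling step.
points = [(x, surrogate(x)) for x in (random.random() for _ in range(2000))]

lo, hi = 0.0, 1.0             # Step 4: peel away the edge with fewer positives
while True:
    inside = [(x, y) for x, y in points if lo <= x <= hi]
    if sum(y for _, y in inside) / len(inside) > 0.95 or hi - lo < 0.05:
        break
    step = 0.05 * (hi - lo)
    left = [y for x, y in inside if x < lo + step]
    right = [y for x, y in inside if x > hi - step]
    if sum(left) <= sum(right):
        lo += step
    else:
        hi -= step

print(f"discovered subgroup: [{lo:.2f}, {hi:.2f}]")
```

With only 20 real runs, the peeling converges near the true boundary at 0.7; running the same peeling directly on the 20 points would be far less stable, which is the motivation for the intermediate model.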
ICSEA 2021: The Sixteenth International Conference on Software Engineering Advances
The Sixteenth International Conference on Software Engineering Advances (ICSEA 2021), held on October 3-7, 2021 in Barcelona, Spain, continued a series of events covering a broad spectrum of software-related topics.
The conference covered fundamentals on designing, implementing, testing, validating and maintaining various kinds of software. The tracks treated the topics from theory to practice, in terms of methodologies, design, implementation, testing, use cases, tools, and lessons learnt. The conference topics covered classical and advanced methodologies, open source, agile software, as well as software deployment and software economics and education.
The conference had the following tracks:
Advances in fundamentals for software development
Advanced mechanisms for software development
Advanced design tools for developing software
Software engineering for service computing (SOA and Cloud)
Advanced facilities for accessing software
Software performance
Software security, privacy, safeness
Advances in software testing
Specialized software advanced applications
Web Accessibility
Open source software
Agile and Lean approaches in software engineering
Software deployment and maintenance
Software engineering techniques, metrics, and formalisms
Software economics, adoption, and education
Business technology
Improving productivity in research on software engineering
Trends and achievements
Similar to the previous edition, this event continued to be very competitive in its selection process and was very well received by the international software engineering community. As such, it attracted excellent contributions and active participation from all over the world. We were very pleased to receive a large number of top-quality contributions.
We take this opportunity to warmly thank all the members of the ICSEA 2021 technical program committee, as well as the numerous reviewers. The creation of such a broad and high-quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and effort to contributing to ICSEA 2021. We truly believe that, thanks to all these efforts, the final conference program consisted of top-quality contributions.
This event could also not have become a reality without the support of many individuals, organizations, and sponsors. We gratefully thank the members of the ICSEA 2021 organizing committee for their help in handling the logistics and for their work in making this professional meeting a success.
We hope that ICSEA 2021 was a successful international forum for the exchange of ideas and results between academia and industry, and that it promoted further progress in software engineering research.
Constructing data marts from web sources using a graph common model
At a time when humans and devices are generating more information than ever, activities such as data mining and machine learning become crucial. These activities enable us to understand and interpret the information we have and to predict, or better prepare ourselves for, future events. However, activities such as data mining cannot be performed without a layer of data management to clean, integrate, process, and make available the necessary datasets. To that end, large and costly data flow processes such as Extract-Transform-Load are necessary to extract data from disparate information sources and generate ready-for-analysis datasets. These datasets are generally in the form of multi-dimensional cubes, from which different data views can be extracted for the purpose of different analyses. The process of creating a multi-dimensional cube from integrated data sources is a significant undertaking. In this research, we present a methodology to generate these cubes automatically, or in some cases near-automatically, requiring very little user interaction. A construct called a StarGraph acts as the canonical model for our system, into which imported data sources are transformed. An ontology-driven process controls the integration of StarGraph schemas, and simple OLAP-style functions generate the cubes or datasets. An extensive evaluation is carried out using a large number of agri-data sources, with user-defined case studies to identify sources for integration and the types of analyses required for the final data cubes.
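The final cube-generation step can be pictured with a toy roll-up. This is our own illustration (the row layout, dimension names, and `'*'` roll-up marker are assumptions, not the StarGraph system's representation): a cube maps each combination of dimension values, including rolled-up ones, to an aggregated measure.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sketch: a tiny OLAP-style roll-up producing a data cube
# (dimension tuple -> summed measure) from integrated rows. '*' marks a
# dimension that has been rolled up (aggregated away).
rows = [
    {"region": "west", "year": 2020, "yield_t": 4.0},
    {"region": "west", "year": 2021, "yield_t": 5.0},
    {"region": "east", "year": 2020, "yield_t": 3.0},
]

def cube(rows, dims, measure):
    out = defaultdict(float)
    for r in rows:
        # each row contributes to every aggregation level of the cube
        for mask in product([True, False], repeat=len(dims)):
            key = tuple(r[d] if keep else "*" for d, keep in zip(dims, mask))
            out[key] += r[measure]
    return dict(out)

c = cube(rows, ["region", "year"], "yield_t")
print(c[("west", "*")], c[("*", "*")])  # 9.0 12.0
```

Different "views" of the data then correspond to slicing this map at different roll-up levels.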
Discovering data lineage in data warehouses: methods and techniques for tracing the origins of data in a data warehouse
A data warehouse enables enterprise-wide analysis and reporting functionality that is usually used to support decision-making. A data warehousing system integrates data from different data sources. Typically, the data are extracted from different sources, then transformed several times and integrated, before they are finally stored in the central repository. The extraction and transformation processes vary widely, both in theory and between solution providers. Some are generic; others are tailored to users' transformation and reporting requirements through hand-coded solutions. Most research related to data integration focuses on this area, i.e., on the transformation of data. Since data in a data warehouse undergo various complex transformation processes, often at many different levels and in many stages, it is very important to be able to ensure the quality of the data that the data warehouse contains. The objective of this thesis is to study and compare existing approaches (methods and techniques) for tracing data lineage, and to propose a data lineage solution specific to a business enterprise data warehouse.