Towards Platform Independent Database Modelling in Enterprise Systems
Enterprise software systems are prevalent in many organisations; they are typically data-intensive and manage customer, sales, or other important data. When an enterprise system needs to be modernised or migrated (e.g. to the cloud), it is necessary to understand the structure of this data and how it is used. We have developed a tool-supported approach to model database structure, query patterns, and growth patterns. Compared to existing work, our tool offers increased system support and extensibility, which is vital for use in industry. Standardisation and platform independence are ensured by producing models conforming to the Knowledge Discovery Metamodel and the Software Metrics Metamodel.
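The kind of structural information a database model of this sort captures can be illustrated with a minimal sketch (the function name and regex-based approach are ours, not the tool's): extracting a table-to-columns map from SQL DDL.

```python
import re

# Hypothetical sketch: extract a minimal structural model (table -> column names)
# from SQL DDL -- the kind of information a platform-independent database model
# conforming to the Knowledge Discovery Metamodel would capture.
def extract_structure(ddl: str) -> dict:
    model = {}
    for stmt in re.finditer(r"CREATE TABLE (\w+)\s*\((.*?)\);", ddl, re.S | re.I):
        table, body = stmt.group(1), stmt.group(2)
        # the first token of each column definition is the column name
        columns = [part.strip().split()[0] for part in body.split(",") if part.strip()]
        model[table] = columns
    return model

ddl = """
CREATE TABLE customers (id INT, name VARCHAR(100));
CREATE TABLE orders (id INT, customer_id INT, total DECIMAL);
"""
print(extract_structure(ddl))
# {'customers': ['id', 'name'], 'orders': ['id', 'customer_id', 'total']}
```

A real implementation would of course use a proper SQL parser rather than regular expressions; the sketch only shows the shape of the extracted model.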
Evaluating cloud database migration options using workload models
A key challenge in porting enterprise software systems to the cloud is the migration of their database. Choosing a cloud provider and service option (e.g., a database-as-a-service or a manually configured set of virtual machines) typically requires the estimation of the cost and migration duration for each considered option. Many organisations also require this information for budgeting and planning purposes. Existing cloud migration research focuses on the software components, and therefore does not address this need. We introduce a two-stage approach which accurately estimates the migration cost, migration duration, and cloud running costs of relational databases. The first stage of our approach obtains workload and structure models of the database to be migrated from database logs and the database schema. The second stage performs a discrete-event simulation using these models to obtain the cost and duration estimates. We implemented software tools that automate both stages of our approach. An extensive evaluation compares the estimates from our approach against results from real-world cloud database migrations.
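The second-stage idea can be sketched very roughly as follows. This is a toy simulation of our own devising (the chunk size, pricing, and event log are illustrative assumptions, not the paper's model): data is transferred in chunks, each completion is an event, and duration and running cost fall out of the simulated clock.

```python
import heapq

# Illustrative toy sketch (all parameters hypothetical): a tiny discrete-event
# simulation estimating transfer duration and running cost for a database migration.
def simulate_migration(db_size_gb, bandwidth_gb_per_h, cost_per_hour):
    clock = 0.0
    events = []            # min-heap of (time, description) completion events
    chunk = 10.0           # transfer the database in 10 GB chunks
    remaining = db_size_gb
    while remaining > 0:
        step = min(chunk, remaining)
        clock += step / bandwidth_gb_per_h   # advance simulated time
        remaining -= step
        heapq.heappush(events, (clock, f"{db_size_gb - remaining:.0f} GB transferred"))
    duration_h = clock
    return duration_h, duration_h * cost_per_hour

duration, cost = simulate_migration(db_size_gb=500, bandwidth_gb_per_h=100, cost_per_hour=2.0)
print(f"{duration:.1f} h, ${cost:.2f}")  # 5.0 h, $10.00
```

The real approach simulates far more detail (network variability, service pricing models, concurrent workload), but the structure, models in, cost and duration estimates out, is the same.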
Evaluating Cloud Migration Options for Relational Databases
Migrating the database layer remains a key challenge when moving a software system to a new cloud provider. The database is often very large, poorly documented, and used to store business-critical information. Most cloud providers offer a variety of services for hosting databases, and the most suitable choice depends on the database size, workload, performance requirements, cost, and future business plans. Current approaches do not support this decision-making process, leading to errors and inaccurate comparisons between database migration options. The heterogeneity of databases and clouds means organisations often have to develop their own ad-hoc process to compare the suitability of cloud services for their system. This is time-consuming, error-prone, and costly.
This thesis contributes to addressing these issues by introducing a three-phase methodology for evaluating cloud database migration options. The first phase defines the planning activities, such as considering downtime tolerance, existing infrastructure, and information sources. The second phase is a novel method for modelling the structure and the workload of the database being migrated. This addresses database heterogeneity by using a multi-dialect SQL grammar and annotated text-to-model transformations. The final phase consumes the models from the second phase and uses discrete-event simulation to predict migration cost, data transfer duration, and cloud running costs. This involved extending the existing CloudSim framework to simulate the data transfer to a new cloud database.
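A workload model of the kind the second phase produces can be approximated crudely from a query log. The sketch below is our own simplification (the regexes and function name are illustrative, and stand in for the multi-dialect grammar and model transformations): it counts per-table reads and writes.

```python
import re
from collections import Counter

# Hypothetical sketch: derive a coarse workload model (reads/writes per table)
# from a query log. A real implementation would parse the SQL with a
# multi-dialect grammar rather than pattern-match it.
READ = re.compile(r"FROM\s+(\w+)", re.I)
WRITE = re.compile(r"(?:INSERT INTO|UPDATE)\s+(\w+)", re.I)

def workload_model(log_lines):
    reads, writes = Counter(), Counter()
    for query in log_lines:
        reads.update(READ.findall(query))
        writes.update(WRITE.findall(query))
    return reads, writes

log = [
    "SELECT * FROM orders WHERE total > 10",
    "SELECT name FROM customers",
    "UPDATE orders SET total = 0 WHERE id = 7",
]
reads, writes = workload_model(log)
print(dict(reads), dict(writes))  # {'orders': 1, 'customers': 1} {'orders': 1}
```

Even this coarse read/write profile is the sort of input a simulation phase needs to weight tables by access frequency.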
An extensive evaluation was performed to assess the effectiveness of each phase of the methodology and of the tools developed to automate their main steps. The modelling phase was applied to 15 real-world systems and, compared to the leading approach, showed substantial improvements in performance, model completeness, extensibility, and SQL support. The complete methodology was applied to four migrations of two real-world systems. The results showed that the methodology provided significantly improved accuracy over existing approaches.
Comprehensible and Robust Knowledge Discovery from Small Datasets
Knowledge Discovery in Databases (KDD) aims to extract useful knowledge from data. The data may represent a series of measurements from a real-world process, or a set of input-output values of a simulation model. Two frequently conflicting requirements on the acquired knowledge are that it (1) summarises the data as accurately as possible and (2) is available in a readily understandable form. Decision trees and subgroup discovery methods deliver knowledge summaries in the form of hyperrectangles, which are considered easy to understand.
To demonstrate the importance of a comprehensible data summary, we investigate decentralised smart grid control ("Dezentrale intelligente Netzsteuerung"), a new system that implements demand response in power grids without major changes to the infrastructure. The conventional analysis of this system carried out so far was limited to considering identical participants and therefore did not reflect reality sufficiently well. We run many simulations with different input values and apply decision trees to the resulting data. With the resulting comprehensible data summaries, we were able to gain new insights into the behaviour of decentralised smart grid control. Decision trees make it possible to describe the system behaviour for all input combinations.
Sometimes, however, one is not interested in partitioning the entire input space, but rather in finding regions that lead to particular outputs (so-called subgroups). Existing subgroup discovery algorithms usually require large amounts of data to achieve stable and accurate output. The data collection process, however, is often costly. Our main contribution is improving subgroup discovery from datasets with few observations.
Subgroup discovery in simulated data is referred to as scenario discovery. A frequently used algorithm for scenario discovery is PRIM (Patient Rule Induction Method). We propose REDS (Rule Extraction for Discovering Scenarios), a new procedure for scenario discovery. For REDS, we first train an intermediate statistical model and use it to generate a large amount of new data for PRIM. We also describe the underlying statistical intuition. Experiments show that REDS performs much better than PRIM on its own: it reduces the number of required simulation runs by 75% on average.
With simulated data, one has perfect knowledge of the input distribution, a prerequisite of REDS. To make REDS applicable to real measurement data, we combined it with sampling from an estimated multivariate distribution of the data. We evaluated the resulting method experimentally in combination with different methods for generating data, for both PRIM and BestInterval, another representative subgroup discovery method. In most cases, our methodology increased the quality of the discovered subgroups.
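The core REDS loop can be illustrated in one dimension. The sketch below is a deliberately minimal reconstruction of the idea, not the authors' implementation: a handful of expensive "simulation" runs train a cheap surrogate (here a 1-nearest-neighbour model, our choice), the surrogate labels many fresh points, and a PRIM-style peeling pass shrinks a box toward a high-outcome region.

```python
import random

# Minimal 1-D sketch of the REDS idea (model choice and parameters are ours):
# 1) run a few expensive "simulations", 2) fit a cheap intermediate model,
# 3) have it label many new points, 4) peel a box (PRIM-style) toward
# regions with a high mean outcome.
random.seed(0)

def simulation(x):            # expensive model: interesting region is x > 0.7
    return 1 if x > 0.7 else 0

train_x = [random.random() for _ in range(20)]   # few real simulation runs
train_y = [simulation(x) for x in train_x]

def surrogate(x):             # 1-nearest-neighbour intermediate model
    return min(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[1]

# Step 3: generate abundant surrogate-labelled data for the peeling step.
points = [(x, surrogate(x)) for x in (random.random() for _ in range(2000))]

lo, hi = 0.0, 1.0             # Step 4: peel away the edge with fewer positives
while True:
    inside = [(x, y) for x, y in points if lo <= x <= hi]
    if sum(y for _, y in inside) / len(inside) > 0.95 or hi - lo < 0.05:
        break
    step = 0.05 * (hi - lo)
    left = [y for x, y in inside if x < lo + step]
    right = [y for x, y in inside if x > hi - step]
    if sum(left) <= sum(right):
        lo += step
    else:
        hi -= step

print(f"discovered subgroup: [{lo:.2f}, {hi:.2f}]")
```

With only 20 real runs, the peeling converges near the true boundary at 0.7; running the same peeling directly on the 20 points would be far less stable, which is the motivation for the intermediate model.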
ICSEA 2021: The Sixteenth International Conference on Software Engineering Advances
The Sixteenth International Conference on Software Engineering Advances (ICSEA 2021), held on October 3-7, 2021 in Barcelona, Spain, continued a series of events covering a broad spectrum of software-related topics.
The conference covered fundamentals on designing, implementing, testing, validating and maintaining various kinds of software. The tracks treated the topics from theory to practice, in terms of methodologies, design, implementation, testing, use cases, tools, and lessons learnt. The conference topics covered classical and advanced methodologies, open source, agile software, as well as software deployment and software economics and education.
The conference had the following tracks:
Advances in fundamentals for software development
Advanced mechanisms for software development
Advanced design tools for developing software
Software engineering for service computing (SOA and Cloud)
Advanced facilities for accessing software
Software performance
Software security, privacy, safeness
Advances in software testing
Specialized software advanced applications
Web Accessibility
Open source software
Agile and Lean approaches in software engineering
Software deployment and maintenance
Software engineering techniques, metrics, and formalisms
Software economics, adoption, and education
Business technology
Improving productivity in research on software engineering
Trends and achievements
Similar to the previous edition, this event continued to be very competitive in its selection process and was very well received by the international software engineering community. As such, it attracted excellent contributions and active participation from all over the world. We were very pleased to receive a large number of top-quality contributions.
We take this opportunity to warmly thank all the members of the ICSEA 2021 technical program committee, as well as the numerous reviewers. The creation of such a broad and high-quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and effort to contributing to ICSEA 2021. We truly believe that, thanks to all these efforts, the final conference program consisted of top-quality contributions.
This event could also not have become a reality without the support of many individuals, organizations, and sponsors. We gratefully thank the members of the ICSEA 2021 organizing committee for their help in handling the logistics and for their work in making this professional meeting a success.
We hope that ICSEA 2021 was a successful international forum for the exchange of ideas and results between academia and industry, and that it promoted further progress in software engineering research.
Constructing data marts from web sources using a graph common model
At a time when humans and devices are generating more information than ever, activities such as data mining and machine learning become crucial. These activities enable us to understand and interpret the information we have and to predict, or better prepare ourselves for, future events. However, activities such as data mining cannot be performed without a layer of data management to clean, integrate, process, and make available the necessary datasets. To that end, large and costly data flow processes such as Extract-Transform-Load are necessary to extract data from disparate information sources and generate ready-for-analysis datasets. These datasets are generally in the form of multi-dimensional cubes, from which different data views can be extracted for the purpose of different analyses. The process of creating a multi-dimensional cube from integrated data sources is a significant undertaking. In this research, we present a methodology to generate these cubes automatically, or in some cases near-automatically, requiring very little user interaction. A construct called a StarGraph acts as the canonical model for our system, into which imported data sources are transformed. An ontology-driven process controls the integration of StarGraph schemas, and simple OLAP-style functions generate the cubes or datasets. An extensive evaluation is carried out using a large number of agri-data sources, with user-defined case studies to identify sources for integration and the types of analyses required for the final data cubes.
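The final cube-generation step can be pictured with a toy roll-up. This is our own illustration (the row layout, dimension names, and `'*'` roll-up marker are assumptions, not the StarGraph system's representation): a cube maps each combination of dimension values, including rolled-up ones, to an aggregated measure.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sketch: a tiny OLAP-style roll-up producing a data cube
# (dimension tuple -> summed measure) from integrated rows. '*' marks a
# dimension that has been rolled up (aggregated away).
rows = [
    {"region": "west", "year": 2020, "yield_t": 4.0},
    {"region": "west", "year": 2021, "yield_t": 5.0},
    {"region": "east", "year": 2020, "yield_t": 3.0},
]

def cube(rows, dims, measure):
    out = defaultdict(float)
    for r in rows:
        # each row contributes to every aggregation level of the cube
        for mask in product([True, False], repeat=len(dims)):
            key = tuple(r[d] if keep else "*" for d, keep in zip(dims, mask))
            out[key] += r[measure]
    return dict(out)

c = cube(rows, ["region", "year"], "yield_t")
print(c[("west", "*")], c[("*", "*")])  # 9.0 12.0
```

Different "views" of the data then correspond to slicing this map at different roll-up levels.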
Discovering data lineage in data warehouses: methods and techniques for tracing the origins of data in a data warehouse
A data warehouse enables enterprise-wide analysis and reporting functionality that is usually used to support decision-making. A data warehousing system integrates data from different data sources. Typically, the data are extracted from different sources, then transformed several times and integrated, before they are finally stored in the central repository. The extraction and transformation processes vary widely, both in theory and between solution providers. Some are generic; others are tailored to users' transformation and reporting requirements through hand-coded solutions. Most research related to data integration focuses on this area, i.e., on the transformation of data. Since data in a data warehouse undergo various complex transformation processes, often at many different levels and in many stages, it is very important to be able to ensure the quality of the data that the data warehouse contains. The objective of this thesis is to study and compare existing approaches (methods and techniques) for tracing data lineage, and to propose a data lineage solution specific to a business enterprise data warehouse.