194 research outputs found
Requirement-driven creation and deployment of multidimensional and ETL designs
We present our tool for assisting designers in the error-prone and time-consuming tasks carried out at the early stages of a data warehousing project. Our tool semi-automatically produces multidimensional (MD) and ETL conceptual designs from a given set of business requirements (such as SLAs) and data source descriptions. It then translates both the MD and ETL conceptual designs into physical designs so that they can be deployed on a DBMS and an ETL engine. In this paper, we describe the system architecture and present our demonstration proposal by means of an example.
DBMSs Should Talk Back Too
Natural language user interfaces to database systems have been studied for
several decades now. They have mainly focused on parsing and interpreting
natural language queries to express them in a formal database language. We
envision the reverse functionality, where the system would be able to take the
internal result of that translation, say in SQL form, translate it back into
natural language, and show it to the initiator of the query for verification.
Likewise, information extraction has received considerable attention in the
past ten years or so, identifying structured information in free text so that
it may then be stored appropriately and queried. Validation of the records
stored with a backward translation into text would again be very powerful.
Verification and validation of query and data input of a database system
correspond to just one example of the many important applications that would
benefit greatly from having mature techniques for translating such database
constructs into free-flowing text. The problem appears deceptively
simple, as there are no ambiguities or other complications in interpreting
internal database elements, so initially a straightforward translation appears
adequate. Reality teaches us quite the opposite, however, as the resulting text
should be expressive, i.e., accurate in capturing the underlying queries or
data, and effective, i.e., allowing fast and unique interpretation of them.
Achieving both of these qualities is very difficult and raises several
technical challenges that need to be addressed. In this paper, we first expose
the reader to several situations and applications that need translation into
natural language, thereby, motivating the problem. We then outline, by example,
the research problems that need to be solved, separately for data translations
and query translations.
Comment: CIDR 200
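The envisioned reverse translation can be illustrated with a toy sketch. The pattern-matching approach and the English template below are invented for illustration only; a real system would need far richer linguistic machinery than this:

```python
import re

def sql_to_english(query: str) -> str:
    """Render a simple SELECT ... FROM ... [WHERE ...] query as English.

    A toy illustration of the 'talk back' idea: the query shape and the
    wording template are assumptions made here, not any published system.
    """
    m = re.match(
        r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
        r"(?:\s+WHERE\s+(?P<cond>.+))?$",
        query.strip(), re.IGNORECASE,
    )
    if m is None:
        raise ValueError("unsupported query shape")
    text = f"Show the {m.group('cols')} of every row in table {m.group('table')}"
    if m.group("cond"):
        text += f" where {m.group('cond')}"
    return text + "."

print(sql_to_english("SELECT name, salary FROM employees WHERE salary > 50000"))
# Show the name, salary of every row in table employees where salary > 50000.
```

Even this trivial template shows the expressiveness/effectiveness tension the abstract raises: the output is accurate for the queries it covers, but natural-sounding phrasing for arbitrary joins or nested queries is much harder.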
Effectiveness of Economic Adjustment Programmes for Debt Crises Implemented in the Southern European Union Countries
This article addresses the effectiveness of the economic adjustment programmes for debt crises implemented in the southern European Union countries, a contemporary and much-disputed issue. All South-European countries that faced a debt crisis had already adopted the European single currency, the Euro. Our literature review depicts contemporary research on debt crises and their economic and social implications, either generally or, more relevantly to our work, specific to the South-European countries. Our research is based on a wide range of statistical indices, in an effort to appreciate the effectiveness of the economic adjustment programmes holistically. The countries addressed were Greece, Portugal, Spain and Cyprus. The applied statistical indices were grouped into six pillars considered essential to social prosperity: financial prosperity, employment, healthcare, education, governance and entrepreneurship. All data were eventually incorporated into a single index, the `Social Prosperity Index', in an attempt to attain a holistic view of the effectiveness of these programmes. This approach contrasts with the mainstream approach of purely financially oriented assessments. Portugal scores first in this appraisal, not only fully recovering but even improving social prosperity standards for its citizens, followed closely by Spain and Cyprus. Greece recorded the worst classification, although its index is recovering to pre-crisis levels. Our empirical results suggest that these programmes had a significant impact on the countries in which they were implemented. In solely financial terms, the programmes proved quite effective for all countries. However, their effectiveness is rather questionable if we take into consideration all pillars of social prosperity. The most problematic pillar is employment, which challenges governments and especially their citizens.
European and sovereign policies must urgently address employment problems, whereas economists are already talking about a `lost generation'.
Keywords: sovereign debt crisis, Euro, social prosperity, economic adjustment programmes, South Europe
 
Adversarial Learning in Real-World Fraud Detection: Challenges and Perspectives
The data economy relies on data-driven systems, and complex machine learning
applications are fueled by them. Unfortunately, however, machine learning
models are exposed to fraudulent activities and adversarial attacks, which
threaten their security and trustworthiness. In the last decade or so,
research interest in adversarial machine learning has grown significantly,
revealing how learning applications can be severely impacted by effective
attacks. Although early results of adversarial machine learning indicate the
huge potential of the approach in specific domains such as image processing,
there is still a gap in both the research literature and practice regarding how
to generalize adversarial techniques to other domains and applications. Fraud
detection is a critical defense mechanism for the data economy, as it is for other
applications as well, and it poses several challenges for machine learning. In
this work, we describe how attacks against fraud detection systems differ from
other applications of adversarial machine learning, and propose a number of
interesting directions to bridge this gap.
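As a minimal illustration of the kind of evasion attack alluded to above, the sketch below perturbs a flagged transaction just enough to flip a linear fraud scorer. The weights and feature values are invented assumptions for illustration; real detectors are far more complex and impose feasibility constraints on what an attacker can actually change:

```python
# Sketch of an evasion attack on a hypothetical linear fraud scorer:
# a transaction is fraudulent when score(x) = w.x + b > 0, and the
# attacker shifts x minimally along -w to drop the score below zero.

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def evade(w, b, x, margin=1e-3):
    """Return the minimally shifted copy of x whose score is just below 0."""
    s = score(w, b, x)
    if s <= 0:
        return list(x)  # already classified as legitimate
    norm_sq = sum(wi * wi for wi in w)
    step = (s + margin) / norm_sq
    return [xi - step * wi for wi, xi in zip(w, x)]

w, b = [0.8, 1.5, 0.2], -2.0   # invented weights: amount, velocity, hour
x = [3.0, 1.0, 0.5]            # flagged transaction (score = 2.0 > 0)
x_adv = evade(w, b, x)
print(score(w, b, x) > 0, score(w, b, x_adv) <= 0)  # True True
```

The gap the abstract identifies is visible even here: unlike pixels in an image, features such as transaction amount cannot be perturbed arbitrarily, which is one reason image-domain attack results do not transfer directly to fraud detection.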
GEM: requirement-driven generation of ETL and multidimensional conceptual designs
Technical Report
At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs, and translate them into an appropriate conceptual, multidimensional design. Typically, this task is performed manually, through a series of interviews involving two different parties: the business analysts and the technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesign until the business needs are satisfied. It is of great importance for the business of an enterprise to facilitate and automate such a process. The goal of our research is to provide designers with a semi-automatic means for producing conceptual multidimensional designs and also conceptual representations of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources with the business requirements, for validating and completing, if necessary, these requirements, producing a multidimensional design, and identifying the ETL operations needed. We present our method in terms of the TPC-DS benchmark and show its applicability and usefulness.
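The requirement-to-design step can be caricatured in a few lines. The phrase pattern and the naming conventions below are invented assumptions, not GEM's actual method, which works over richer requirement and source descriptions:

```python
# Toy sketch of mapping a business requirement to a star-schema outline:
# a phrase of the form "analyze <measure> by <dim> and <dim>" yields one
# fact table and one dimension per grouping attribute. Pattern and
# fact_/dim_ naming are illustrative assumptions.
import re

def requirement_to_star(req: str) -> dict:
    m = re.match(r"analyze\s+(\w+)\s+by\s+(.+)$", req.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError("unsupported requirement phrasing")
    measure = m.group(1)
    dims = [d.strip() for d in re.split(r",|\band\b", m.group(2)) if d.strip()]
    return {"fact": f"fact_{measure}", "measure": measure,
            "dimensions": [f"dim_{d}" for d in dims]}

print(requirement_to_star("analyze revenue by customer and month"))
# {'fact': 'fact_revenue', 'measure': 'revenue',
#  'dimensions': ['dim_customer', 'dim_month']}
```

The hard part that such a sketch omits, and that the paper addresses, is validating the derived design against the actual data sources and deriving the ETL operations needed to populate it.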
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associated XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems.
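The kind of decision-support operation such a benchmark exercises can be sketched as an aggregation over irregular XML. The document shape below is invented for illustration and is not XWeB's actual reference model; a real workload would be expressed in XQuery against the benchmark warehouse:

```python
# Rough illustration of an OLAP-style rollup over irregular XML data:
# total sales per region, tolerating records with missing elements.
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = """<sales>
  <sale><region>EU</region><amount>120.0</amount></sale>
  <sale><region>US</region><amount>80.5</amount></sale>
  <sale><region>EU</region><amount>40.0</amount></sale>
  <sale><region>US</region></sale>  <!-- irregular: no amount element -->
</sales>"""

totals = defaultdict(float)
for sale in ET.fromstring(doc).findall("sale"):
    amount = sale.findtext("amount")   # None when the element is absent
    if amount is not None:
        totals[sale.findtext("region")] += float(amount)

print(dict(totals))  # {'EU': 160.0, 'US': 80.5}
```

Handling such structural irregularity efficiently at warehouse scale is exactly the performance question XWeB is designed to measure.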
Quality measures for ETL processes: from goals to implementation
Extraction-transformation-loading (ETL) processes play an increasingly important role in the support of modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and there is a need for a more human-centric approach to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on the existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for the evaluation and analysis of alternative design decisions.
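How quantitative indicators support comparing design alternatives can be sketched in miniature. The characteristics, weights, and scores below are illustrative assumptions, not the paper's actual quality model:

```python
# Minimal sketch of a goal-level indicator built from normalized
# quality measures, used to compare two hypothetical ETL designs.
# Characteristic names and weights are invented for illustration.

def indicator(measures: dict, weights: dict) -> float:
    """Weighted score in [0, 1]; each measure is already normalized."""
    total = sum(weights.values())
    return sum(weights[k] * measures[k] for k in weights) / total

design_a = {"reliability": 0.9, "freshness": 0.6, "cost": 0.7}
design_b = {"reliability": 0.7, "freshness": 0.9, "cost": 0.8}
weights  = {"reliability": 3, "freshness": 2, "cost": 1}

for name, m in [("A", design_a), ("B", design_b)]:
    print(name, round(indicator(m, weights), 3))
# A 0.767
# B 0.783
```

Changing the weights flips which design wins, which is why the dependencies among quality characteristics captured by the paper's model matter for principled trade-off analysis.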