36,117 research outputs found
Discovering Dynamic Integrity Rules with a Rules-Based Tool for Data Quality Analyzing
Rules-based approaches to data quality often use business rules or integrity rules for data monitoring purposes. Integrity rules are constraints on data, derived from business rules and expressed in a formal form to allow computerization. One of the challenges of these approaches is rule discovery, which is usually performed manually by business experts or system analysts based on experience. In this paper, we present our rule-based approach to data quality analysis, in which we discuss a comprehensive method for discovering dynamic integrity rules.
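A minimal sketch of what a dynamic integrity rule can look like once formalized, assuming an illustrative order-status example (the status values and rule below are not from the paper): the rule constrains how a record may change over time, and a checker flags transitions that violate it.

```python
# Hypothetical dynamic integrity rule: an order's status may only move
# forward through this sequence, never backward.
STATUS_ORDER = ["created", "paid", "shipped", "delivered"]

def violates_status_rule(old_status: str, new_status: str) -> bool:
    """Return True if the transition breaks the monotonic-status rule."""
    return STATUS_ORDER.index(new_status) < STATUS_ORDER.index(old_status)

# Flag violating transitions observed in a change log.
transitions = [("created", "paid"), ("shipped", "paid"), ("paid", "shipped")]
violations = [t for t in transitions if violates_status_rule(*t)]
```

Once rules are in this executable form, a monitoring tool can evaluate them continuously against incoming data, which is the computerization the abstract refers to.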
Automatic extraction of knowledge from web documents
A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
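A toy sketch of the ontology-guided extraction idea, assuming illustrative slot names and simple patterns (the real Artequakt system uses full natural language tools, not regular expressions): the ontology defines which facts to look for, and the extractor fills those slots from free text.

```python
import re

# Hypothetical ontology slots for an artist, each paired with a simple
# textual pattern that can fill it (illustrative only).
ONTOLOGY_SLOTS = {
    "birth_year": re.compile(r"born in (\d{4})"),
    "birth_place": re.compile(r"born in \d{4} in ([A-Z][a-z]+)"),
}

def extract_facts(text: str) -> dict:
    """Fill ontology slots from a document, skipping slots with no match."""
    facts = {}
    for slot, pattern in ONTOLOGY_SLOTS.items():
        match = pattern.search(text)
        if match:
            facts[slot] = match.group(1)
    return facts

doc = "Rembrandt was born in 1606 in Leiden and painted hundreds of works."
facts = extract_facts(doc)
```

Filled slots like these are what a downstream generator can assemble into a tailored biography.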
A unified view of data-intensive flows in business intelligence systems: a survey
Data-intensive flows are central processes in today's business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today's research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.
Peer reviewed. Postprint (author's final draft).
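A minimal sketch of the two flow styles the survey contrasts, with all names illustrative: a batched ETL step that transforms and loads rows into a warehouse, and an on-demand flow that integrates fresh source rows at read time.

```python
def etl_batch(source_rows, warehouse):
    """Extract, transform (normalize names), and load into the warehouse."""
    for row in source_rows:
        warehouse.append({"name": row["name"].strip().title(),
                          "amount": row["amount"]})

def on_demand_view(warehouse, live_rows):
    """Integrate already-loaded warehouse data with fresh rows at query time."""
    return warehouse + [{"name": r["name"].title(), "amount": r["amount"]}
                        for r in live_rows]

warehouse = []
etl_batch([{"name": "  alice ", "amount": 10}], warehouse)
result = on_demand_view(warehouse, [{"name": "bob", "amount": 5}])
```

The point of the sketch is the split: the same transformation logic appears once in a scheduled batch path and once in a runtime path, and next generation BI systems need both to coexist.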
Who watches the watchers: Validating the ProB Validation Tool
Over the years, ProB has moved from a tool that complemented proving to a development environment that is now sometimes used instead of proving for applications such as exhaustive model checking or data validation. This has led to much more stringent requirements on the integrity of ProB. In this paper we present a summary of our validation efforts for ProB, in particular within the context of the norm EN 50128 and safety-critical applications in the railway domain.
Comment: In Proceedings F-IDE 2014, arXiv:1404.578
A Practical Approach to Protect IoT Devices against Attacks and Compile Security Incident Datasets
Open access article. The Internet of Things (IoT) introduced the opportunity of remotely manipulating home appliances (such as heating systems, ovens, blinds, etc.) using computers and mobile devices. This idea fascinated people and originated a boom of IoT devices, together with an increasing demand that was difficult to support. Many manufacturers quickly created hundreds of devices implementing functionalities but neglected some critical issues pertaining to device security. This oversight gave rise to the current situation, where thousands of devices remain unpatched, having many security issues that manufacturers cannot address after the devices have been produced and deployed. This article presents our novel research on protecting IoT devices using Berkeley Packet Filters (BPFs) and evaluates our findings with the aid of our Filter.tlk tool, which facilitates the development of BPF expressions that can be executed by GNU/Linux systems with a low impact on network packet throughput.
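An illustrative sketch of the allowlist idea behind BPF-based protection, assuming a hypothetical helper (the Filter.tlk tool mentioned in the article generates such expressions; the function and its inputs below are not from the article): compose a tcpdump/BPF-style filter expression that admits only the traffic an IoT device is expected to see.

```python
# Hypothetical helper: build a BPF filter expression from an allowlist of
# (protocol, port) pairs that an IoT device legitimately uses.
def build_allowlist_filter(allowed):
    """Compose a tcpdump/BPF-style expression, OR-ing one clause per pair."""
    clauses = [f"({proto} port {port})" for proto, port in allowed]
    return " or ".join(clauses)

# e.g. a device that only needs HTTPS and DNS:
expr = build_allowlist_filter([("tcp", 443), ("udp", 53)])
```

The resulting expression string can be handed to any BPF-capable consumer, such as tcpdump, which compiles it to a kernel-level filter; running the filter in the kernel is what keeps the packet-throughput impact low.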
Mining Techniques For Invariants In Cloud Computing
The increasing popularity of Software as a Service (SaaS) stresses the need for solutions to predict failures and avoid service interruptions, which invariably result in SLA violations and severe loss of revenue. A promising approach to continuously monitor the correct functioning of the system is to check the execution's conformance to a set of invariants, i.e., properties that must hold when the system is deemed to run correctly. This paper proposes a technique to spot true anomalies using various data mining techniques, such as clustering, association rules, and decision tree algorithms, which help find hidden and previously unknown information in the database. We assess the techniques in two applications of invariants, namely execution characterization and anomaly detection, using the metrics of coverage, recall, and precision. In this work, two real-world datasets have been used for detecting the anomalies: the publicly available Google datacenter dataset and a dataset of a commercial SaaS utility computing platform.
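A minimal sketch of invariant-based anomaly detection in the simplest form, a per-metric range invariant (the paper combines richer mining techniques such as clustering, association rules, and decision trees; the metric names and thresholds below are illustrative): learn bounds from executions deemed correct, then flag samples that fall outside them.

```python
def learn_invariants(samples):
    """Learn per-metric (min, max) range invariants from correct executions."""
    keys = samples[0].keys()
    return {k: (min(s[k] for s in samples), max(s[k] for s in samples))
            for k in keys}

def anomalies(invariants, samples):
    """Flag samples where any metric falls outside its learned range."""
    return [s for s in samples
            if any(not (lo <= s[k] <= hi)
                   for k, (lo, hi) in invariants.items())]

train = [{"cpu": 20, "mem": 40}, {"cpu": 35, "mem": 55}]
inv = learn_invariants(train)  # cpu in [20, 35], mem in [40, 55]
flagged = anomalies(inv, [{"cpu": 30, "mem": 50}, {"cpu": 90, "mem": 45}])
```

Coverage, recall, and precision then measure how well such learned invariants characterize correct executions and separate true anomalies from false alarms.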
Data Mining in Electronic Commerce
Modern business is rushing toward e-commerce. If the transition is done
properly, it enables better management, new services, lower transaction costs
and better customer relations. Success depends on skilled information
technologists, among whom are statisticians. This paper focuses on some of the
contributions that statisticians are making to help change the business world,
especially through the development and application of data mining methods. This
is a very large area, and the topics we cover are chosen to avoid overlap with
other papers in this special issue, as well as to respect the limitations of
our expertise. Inevitably, electronic commerce has raised and is raising fresh
research problems in a very wide range of statistical areas, and we try to
emphasize those challenges.
Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
- …