36,117 research outputs found

    Discovering Dynamic Integrity Rules with a Rules-Based Tool for Data Quality Analyzing

    Get PDF
    Rule-based approaches to data quality often use business rules or integrity rules for data monitoring. Integrity rules are constraints on data, derived from business rules and expressed in a formal form that allows computerization. One challenge of these approaches is rule discovery, which is usually performed manually by business experts or system analysts based on experience. In this paper, we present our rule-based approach to data quality analysis, in which we discuss a comprehensive method for discovering dynamic integrity rules
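
    As a rough illustration of what such integrity rules look like once formalized, the sketch below checks records against a small rule set in Python. The rules, field names, and data are hypothetical examples, not taken from the paper, which is concerned with discovering such rules rather than merely evaluating them.

```python
# Minimal sketch of rule-based data quality checking, assuming integrity
# rules are formalized as predicates over records. The rules and field
# names below are hypothetical illustrations, not the paper's own rules.
from dataclasses import dataclass
from typing import Callable

@dataclass
class IntegrityRule:
    name: str
    check: Callable[[dict], bool]  # True if the record satisfies the rule

RULES = [
    # Static rule: a field must fall in a valid range.
    IntegrityRule("age_in_range", lambda r: 0 <= r["age"] <= 120),
    # Business rule made formal: a shipped order must have been paid.
    IntegrityRule("shipped_implies_paid",
                  lambda r: r["status"] != "shipped" or r["paid"]),
]

def violations(records: list[dict]) -> list[tuple[str, dict]]:
    """Return (rule_name, record) pairs for every failed check."""
    return [(rule.name, rec) for rec in records
            for rule in RULES if not rule.check(rec)]

if __name__ == "__main__":
    data = [{"age": 34, "status": "shipped", "paid": True},
            {"age": -5, "status": "shipped", "paid": False}]
    for name, rec in violations(data):
        print(f"rule {name} violated by {rec}")
```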

    Automatic extraction of knowledge from web documents

    Get PDF
    A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents, drawn from multiple sources, in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper
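
    The following is a minimal sketch of the ontology-guided extraction idea: an ontology declares which slots to fill, and per-slot extractors pull candidate values from text. The toy ontology, regex patterns, and sample sentence are all hypothetical; Artequakt itself relies on full natural language tools rather than regular expressions.

```python
# Toy sketch of ontology-guided information extraction, in the spirit of
# Artequakt: the ontology names the slots of interest, and each slot has
# an extractor applied to raw text. Everything here is illustrative.
import re

ARTIST_ONTOLOGY = {
    "name":       r"^([A-Z][a-z]+ [A-Z][a-z]+)",
    "birth_year": r"born in (\d{4})",
    "birthplace": r"born in \d{4} in ([A-Z][a-zA-Z ]+?)(?:[,.]|$)",
}

def extract(text: str) -> dict:
    """Fill ontology slots from free text; missing slots stay None."""
    facts = {}
    for slot, pattern in ARTIST_ONTOLOGY.items():
        m = re.search(pattern, text)
        facts[slot] = m.group(1) if m else None
    return facts

doc = "Claude Monet was a painter born in 1840 in Paris."
print(extract(doc))
# {'name': 'Claude Monet', 'birth_year': '1840', 'birthplace': 'Paris'}
```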

    A unified view of data-intensive flows in business intelligence systems : a survey

    Get PDF
    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time, operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture for a system that manages the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that remain to be addressed and showing how current solutions can be applied to address them.
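
    To make the batched ETL side of this picture concrete, here is a minimal, self-contained sketch of an extract-transform-load flow in Python. The source data, transformations, and SQLite target are hypothetical stand-ins for real source connectors and a data warehouse.

```python
# Minimal sketch of a batched extract-transform-load (ETL) flow of the
# kind surveyed here; source, transformations, and target are toy
# stand-ins for real connectors and a DW.
import csv, io, sqlite3

RAW = "order_id,amount,country\n1, 19.90 ,de\n2,5.00,FR\n"

def extract(raw: str):
    # Extract: read records from the (simulated) source.
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    # Transform: cleanse and conform values into analysis-ready form.
    return [{"order_id": int(r["order_id"]),
             "amount": float(r["amount"].strip()),
             "country": r["country"].strip().upper()} for r in rows]

def load(rows, conn):
    # Load: populate the warehouse fact table.
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders"
                 "(order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)",
                     [(r["order_id"], r["amount"], r["country"]) for r in rows])

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM fact_orders").fetchall())
# [(1, 19.9, 'DE'), (2, 5.0, 'FR')]
```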

    Who watches the watchers: Validating the ProB Validation Tool

    Full text link
    Over the years, ProB has moved from a tool that complemented proving to a development environment that is now sometimes used instead of proving for applications such as exhaustive model checking or data validation. This has led to much more stringent requirements on the integrity of ProB. In this paper we present a summary of our validation efforts for ProB, in particular within the context of the norm EN 50128 and safety-critical applications in the railway domain. Comment: In Proceedings F-IDE 2014, arXiv:1404.578
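
    For readers unfamiliar with exhaustive model checking, the sketch below shows the generic idea on a toy transition system: explore every reachable state and report any state that violates an invariant. The counter model and its invariant are hypothetical; ProB itself operates on B models and is far more sophisticated.

```python
# Generic sketch of exhaustive explicit-state model checking, the kind of
# task ProB performs on B models: explore every reachable state by BFS
# and report any state violating an invariant. The toy counter model and
# its (false) invariant are hypothetical.
from collections import deque

def successors(state):
    # Toy model: a counter that may increment by 1 or 3, wrapping at 10.
    return [(state + d) % 10 for d in (1, 3)]

def invariant(state):
    return state != 7  # claim: state 7 is unreachable (it is not!)

def model_check(init):
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return f"invariant violated in state {s}"
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return f"invariant holds in all {len(seen)} reachable states"

print(model_check(0))  # invariant violated in state 7
```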

    A Practical Approach to Protect IoT Devices against Attacks and Compile Security Incident Datasets

    Get PDF
    The Internet of Things (IoT) introduced the opportunity to remotely manipulate home appliances (such as heating systems, ovens, and blinds) using computers and mobile devices. This idea fascinated people and set off a boom of IoT devices, together with increasing demand that was difficult to support. Many manufacturers rushed to create hundreds of devices implementing these functionalities, but neglected critical issues pertaining to device security. This oversight gave rise to the current situation, in which thousands of devices remain unpatched and carry security issues that manufacturers cannot address after the devices have been produced and deployed. This article presents our novel research on protecting IoT devices using Berkeley Packet Filters (BPFs) and evaluates our findings with the aid of our Filter.tlk tool, which facilitates the development of BPF expressions that can be executed by GNU/Linux systems with a low impact on network packet throughput
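
    As a hedged illustration of how BPF expressions restrict traffic, the sketch below sniffs only packets matching an allow-list filter written in standard tcpdump/libpcap syntax. It assumes the third-party scapy library and root privileges, and the policy shown (MQTT over TLS on port 8883 from the local subnet) is a made-up example, not output of Filter.tlk.

```python
# Minimal sketch of applying a BPF expression to observe only the traffic
# an IoT device is expected to see. Assumes the third-party `scapy`
# library and root privileges; the allow-list policy is hypothetical and
# is not produced by Filter.tlk.
from scapy.all import sniff

# BPF expression in standard tcpdump/libpcap syntax.
ALLOWED = "src net 192.168.1.0/24 and tcp and dst port 8883"

def report(pkt):
    print(pkt.summary())

# Capture (here: just log) packets matching the expected-traffic filter;
# a real deployment would use the complementary filter to flag the rest.
sniff(filter=ALLOWED, prn=report, count=10)
```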

    Mining Techniques For Invariants In Cloud Computing

    Get PDF
    The increasing popularity of Software as a Service (SaaS) stresses the need for solutions that predict failures and avoid service interruptions, which invariably result in SLA violations and severe loss of revenue. A promising approach to continuously monitoring the correct functioning of a system is to check that its execution conforms to a set of invariants, i.e., properties that must hold when the system is deemed to run correctly. This paper proposes a technique for spotting true anomalies using data mining techniques such as clustering, association rules, and decision tree algorithms, which help uncover hidden and previously unknown information in the data. We assess the technique in two invariant applications, namely execution characterization and anomaly detection, using the metrics of coverage, recall, and precision. Two real-world datasets are used in this work for detecting anomalies: the publicly available Google datacenter dataset and a dataset from a commercial SaaS utility computing platform
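
    The sketch below illustrates the invariant idea in its simplest form: learn per-metric range invariants from executions deemed correct, then flag runs that break them. It stands in for the paper's data mining techniques (clustering, association rules, decision trees); the metrics, thresholds, and data are hypothetical.

```python
# Rough sketch of invariant-based monitoring: mine simple range invariants
# from executions deemed correct, then flag runs that violate them. Real
# invariant mining uses clustering, association rules, or decision trees;
# the metric names, threshold k, and data below are hypothetical.
import statistics

def mine_invariants(baseline: list[dict], k: float = 3.0) -> dict:
    """Per metric, keep mean +/- k standard deviations as the invariant."""
    invariants = {}
    for metric in baseline[0]:
        values = [run[metric] for run in baseline]
        mu, sigma = statistics.mean(values), statistics.pstdev(values)
        invariants[metric] = (mu - k * sigma, mu + k * sigma)
    return invariants

def anomalies(run: dict, invariants: dict) -> list[str]:
    """Return the metrics of `run` that fall outside their invariant."""
    return [m for m, (lo, hi) in invariants.items()
            if not lo <= run[m] <= hi]

baseline = [{"latency_ms": 20 + i % 5, "errors": 0} for i in range(100)]
inv = mine_invariants(baseline)
print(anomalies({"latency_ms": 250, "errors": 7}, inv))
# ['latency_ms', 'errors']
```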

    Data Mining in Electronic Commerce

    Full text link
    Modern business is rushing toward e-commerce. If the transition is done properly, it enables better management, new services, lower transaction costs and better customer relations. Success depends on skilled information technologists, among whom are statisticians. This paper focuses on some of the contributions that statisticians are making to help change the business world, especially through the development and application of data mining methods. This is a very large area, and the topics we cover are chosen to avoid overlap with other papers in this special issue, as well as to respect the limitations of our expertise. Inevitably, electronic commerce has raised and is raising fresh research problems in a very wide range of statistical areas, and we try to emphasize those challenges. Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)