    A Framework for Enterprise Knowledge Discovery from Databases

    Knowledge discovery from large databases has become an emerging research topic and application area in recent years, primarily because of the successful introduction of large business information systems to enterprises in the electronic business era. However, translating subjects and problems from a managerial perspective into data mining tasks from an information technology perspective requires multidisciplinary domain knowledge. This paper proposes a practical framework for enterprise knowledge discovery in a systematic manner. The six-step framework employs the cause-and-effect diagram to model enterprise processes, a tasks-and-attributes correspondence diagram to define data mining tasks, and a multi-criteria method to assess the mined results in the form of association rules. This research also applied the proposed framework to a real case study of knowledge discovery from service records. The mining results have proven useful in product design and quality improvement, and the framework has demonstrated its applicability in guiding an enterprise to discover knowledge from historical data to tackle existing problems.
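The abstract above mentions assessing mined association rules. As a rough illustration only, the sketch below computes the three standard rule measures (support, confidence, lift) for one rule. The service-record items (`model_A`, `overheating`, and so on) and the function name `rule_metrics` are hypothetical and are not taken from the paper, whose multi-criteria assessment is not reproduced here.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    # Count records containing the antecedent, the consequent, and both.
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift


# Hypothetical service records: each record is a set of observed items.
records = [
    frozenset({"model_A", "overheating"}),
    frozenset({"model_A", "overheating"}),
    frozenset({"model_A", "noise"}),
    frozenset({"model_B", "noise"}),
]
support, confidence, lift = rule_metrics(
    records, frozenset({"model_A"}), frozenset({"overheating"})
)
```

A lift above 1 suggests the antecedent and consequent co-occur more often than they would if independent, which is one common criterion for ranking rules.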

    Impliance: A Next Generation Information Management Appliance

    ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements, namely the ideas of: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data, unstructured as well as structured, in a uniform way; (c) achieving scale-out by exploiting simple, massive parallel processing; and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises.
    Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007.
    3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA

    A Bayesian Approach to Identify Bitcoin Users

    Bitcoin is a digital currency and electronic payment system operating over a peer-to-peer network on the Internet. One of its most important properties is the high level of anonymity it provides for its users. The users are identified by their Bitcoin addresses, which are random strings in the public record of transactions, the blockchain. When a user initiates a Bitcoin transaction, the user's Bitcoin client program relays messages to other clients through the Bitcoin network. Monitoring the propagation of these messages and analyzing them carefully reveal hidden relations. In this paper, we develop a mathematical model using a probabilistic approach to link Bitcoin addresses and transactions to the originator IP address. To utilize our model, we carried out experiments by installing more than a hundred modified Bitcoin clients distributed in the network to observe as many messages as possible. During a two-month observation period, we were able to identify several thousand Bitcoin clients and bind their transactions to geographical locations.
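To make the idea of a probabilistic link between a transaction and its originator more concrete, here is a minimal Bayesian-update sketch. It is not the model from the paper: it assumes, purely for illustration, that each monitoring client reports the peer from which it first heard a transaction, and that the true originator is that first relayer with a fixed probability `p_first` (a hypothetical parameter). The IP addresses are documentation-range placeholders.

```python
from typing import Dict, List


def originator_posterior(
    first_relays: List[str],
    candidates: List[str],
    p_first: float = 0.6,
) -> Dict[str, float]:
    """Posterior over candidate originators given first-relay observations.

    Assumed likelihood (illustrative only): a monitor sees the true
    originator first with probability p_first, and otherwise sees any
    other candidate first with equal probability.
    """
    k = len(candidates)
    if k < 2:
        raise ValueError("need at least two candidate originators")
    # Start from a uniform prior over the candidates.
    posterior = {c: 1.0 / k for c in candidates}
    for observed in first_relays:
        for c in candidates:
            likelihood = p_first if c == observed else (1.0 - p_first) / (k - 1)
            posterior[c] *= likelihood
        # Renormalize after each observation to keep a proper distribution.
        total = sum(posterior.values())
        posterior = {c: v / total for c, v in posterior.items()}
    return posterior


candidates = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
observations = ["192.0.2.1", "192.0.2.1", "192.0.2.2"]
posterior = originator_posterior(observations, candidates)
most_likely = max(posterior, key=posterior.get)
```

With more independent monitors agreeing on the same first relayer, the posterior concentrates on that peer, which is the intuition behind deploying many observing clients.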

    Identifying candidate risk factors for prescription drug side effects using causal contrast set mining

    Big longitudinal observational databases present the opportunity to extract new knowledge in a cost-effective manner. Unfortunately, the ability of these databases to be used for causal inference is limited because the data are collected passively, which results in various forms of bias. In this paper we investigate a method that can overcome these limitations and determine causal contrast set rules efficiently from big data. In particular, we present a new methodology for identifying risk factors that increase a patient's likelihood of experiencing the known rare side effect of renal failure after ingesting aminosalicylates. The results show that the methodology was able to identify previously researched risk factors, such as being prescribed diuretics, and highlighted that patients with a higher than average risk of renal failure may be even more susceptible to experiencing it as a side effect after ingesting aminosalicylates.
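As a simplified illustration of contrasting two patient groups, the sketch below computes a plain risk ratio for a candidate risk factor. This is only the basic contrast, not the causal contrast set mining method of the paper, and it does not correct for the biases the abstract mentions. The counts and the name `risk_ratio` are hypothetical.

```python
def risk_ratio(
    events_with: int,
    total_with: int,
    events_without: int,
    total_without: int,
) -> float:
    """Ratio of outcome risk in patients with a factor to those without it."""
    if total_with <= 0 or total_without <= 0:
        raise ValueError("both groups must contain at least one patient")
    risk_with = events_with / total_with
    risk_without = events_without / total_without
    if risk_without == 0:
        # No events in the comparison group: the ratio is unbounded
        # (or undefined when the factor group also has no events).
        return float("inf") if risk_with > 0 else float("nan")
    return risk_with / risk_without


# Hypothetical counts: outcome events among drug-exposed patients who
# were (400 patients) or were not (1200 patients) also prescribed a
# candidate co-medication.
rr = risk_ratio(events_with=12, total_with=400,
                events_without=9, total_without=1200)
```

A ratio well above 1 would flag the factor as a candidate for closer study; on its own it says nothing about causation.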

    Application Areas of Data Mining in Indian Retail Banking Sector

    Banking systems collect huge amounts of data on a day-to-day basis, be it customer information, transaction details, risk profiles, credit card details, credit limit and collateral details, compliance and Anti-Money Laundering (AML) related information, trade finance data, or SWIFT and telex messages. Thousands of decisions are taken in a bank daily. These decisions include credit decisions, default decisions, relationship start-up, investment decisions, and AML and illegal-financing related decisions. One needs to depend on various reports and drill-down tools provided by the banking systems to arrive at these critical decisions. But this is a manual process, and it is error-prone and time-consuming due to the large volume of transactional and historical data. Interesting patterns and knowledge can be mined from this huge volume of data, which in turn can be used in this decision-making process. This article explores and reviews various data mining techniques that can be applied in banking areas. It provides an overview of data mining techniques and procedures. It also provides an insight into how these techniques can be used in banking areas to make the decision-making process easier and more productive.