18,183 research outputs found
A Framework for Enterprise Knowledge Discovery from Databases
Knowledge discovery from large databases has become an emerging research topic and application area in recent years primarily because of the successful introduction of large business information systems to enterprises in the electronic business era. However, transferring subjects/problems from managerial perspective to data mining tasks from information technology perspective requires multidisciplinary domain knowledge. This paper proposes a practical framework for enterprise knowledge discovery in a systematical manner. The six-step framework employs the cause-andeffect diagram to model enterprise processes, tasks and attributes corresponding diagram to define data mining tasks, and multi-criteria method to assess the mined results in the form of association rules. This research also applied the proposed framework to a real case study of knowledge discovery from service records. The mining results have been proven useful in product design and quality improvement and the framework has demonstrated its applicability of guiding an enterprise to discover knowledge from historical data to tackle existing problems
Impliance: A Next Generation Information Management Appliance
ably successful in building a large market and adapting to the changes of the
last three decades, its impact on the broader market of information management
is surprisingly limited. If we were to design an information management system
from scratch, based upon today's requirements and hardware capabilities, would
it look anything like today's database systems?" In this paper, we introduce
Impliance, a next-generation information management system consisting of
hardware and software components integrated to form an easy-to-administer
appliance that can store, retrieve, and analyze all types of structured,
semi-structured, and unstructured information. We first summarize the trends
that will shape information management for the foreseeable future. Those trends
imply three major requirements for Impliance: (1) to be able to store, manage,
and uniformly query all data, not just structured records; (2) to be able to
scale out as the volume of this data grows; and (3) to be simple and robust in
operation. We then describe four key ideas that are uniquely combined in
Impliance to address these requirements, namely the ideas of: (a) integrating
software and off-the-shelf hardware into a generic information appliance; (b)
automatically discovering, organizing, and managing all data - unstructured as
well as structured - in a uniform way; (c) achieving scale-out by exploiting
simple, massive parallel processing, and (d) virtualizing compute and storage
resources to unify, simplify, and streamline the management of Impliance.
Impliance is an ambitious, long-term effort to define simpler, more robust, and
more scalable information systems for tomorrow's enterprises.Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but, you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR) January
710, 2007, Asilomar, California, US
A Bayesian Approach to Identify Bitcoin Users
Bitcoin is a digital currency and electronic payment system operating over a
peer-to-peer network on the Internet. One of its most important properties is
the high level of anonymity it provides for its users. The users are identified
by their Bitcoin addresses, which are random strings in the public records of
transactions, the blockchain. When a user initiates a Bitcoin-transaction, his
Bitcoin client program relays messages to other clients through the Bitcoin
network. Monitoring the propagation of these messages and analyzing them
carefully reveal hidden relations. In this paper, we develop a mathematical
model using a probabilistic approach to link Bitcoin addresses and transactions
to the originator IP address. To utilize our model, we carried out experiments
by installing more than a hundred modified Bitcoin clients distributed in the
network to observe as many messages as possible. During a two month observation
period we were able to identify several thousand Bitcoin clients and bind their
transactions to geographical locations
Identifying candidate risk factors for prescription drug side effects using causal contrast set mining
Big longitudinal observational databases present the opportunity to extract new knowledge in a cost effective manner. Unfortunately, the ability of these databases to be used for causal inference is limited due to the passive way in which the data are collected resulting in various forms of bias. In this paper we investigate a method that can overcome these limitations and determine causal contrast set rules efficiently from big data. In particular, we present a new methodology for the purpose of identifying risk factors that increase a patients likelihood of experiencing the known rare side effect of renal failure after ingesting aminosalicylates. The results show that the methodology was able to identify previously researched risk factors such as being prescribed diuretics and highlighted that patients with a higher than average risk of renal failure may be even more susceptible to experiencing it as a side effect after ingesting aminosalicylates
Identifying candidate risk factors for prescription drug side effects using causal contrast set mining
Big longitudinal observational databases present the opportunity to extract new knowledge in a cost effective manner. Unfortunately, the ability of these databases to be used for causal inference is limited due to the passive way in which the data are collected resulting in various forms of bias. In this paper we investigate a method that can overcome these limitations and determine causal contrast set rules efficiently from big data. In particular, we present a new methodology for the purpose of identifying risk factors that increase a patients likelihood of experiencing the known rare side effect of renal failure after ingesting aminosalicylates. The results show that the methodology was able to identify previously researched risk factors such as being prescribed diuretics and highlighted that patients with a higher than average risk of renal failure may be even more susceptible to experiencing it as a side effect after ingesting aminosalicylates
Application Areas of Data Mining in Indian Retail Banking Sector
Banking systems collect huge amounts of data on day to day basis be it customer information transaction details risk profiles credit card details credit limit and collateral details compliance and Anti Money Laundering AML related information trade finance data SWIFT and telex messages Thousands of decisions are taken in a bank daily These decisions include credit decisions default decisions relationship start up investment decisions AML and Illegal financing related One needs to depend on various reports and drill down tools provided by the banking systems to arrive at these critical decisions But this is a manual process and is error prone and time consuming due to large volume of transactional and historical data Interesting patterns and knowledge can be mined from this huge volume of data that in turn can be used for this decision making process This article explores and reviews various data mining techniques that can be applied in banking areas It provides an overview of data mining techniques and procedures It also provides an insight into how these techniques can be used in banking areas to make the decision making process easier and productiv
- …