Combining Objects with Rules to Represent Aggregation Knowledge in Data Warehouse and OLAP Systems
Data warehouses are based on multidimensional modeling. Using On-Line Analytical Processing (OLAP) tools, decision makers navigate through and analyze multidimensional data. Typically, users need to analyze data at different aggregation levels (using roll-up and drill-down functions). Therefore, aggregation knowledge should be adequately represented in conceptual multidimensional models, and mapped in subsequent logical and physical models. However, current conceptual multidimensional models poorly represent aggregation knowledge, which (1) has a complex structure and dynamics and (2) is highly contextual. In order to account for the characteristics of this knowledge, we propose to represent it with objects (UML class diagrams) and rules in the Production Rule Representation (PRR) language. Static aggregation knowledge is represented in the class diagrams, while rules represent the dynamics (i.e. how aggregation may be performed depending on context). We present the class diagrams, and a typology and examples of associated rules. We argue that this representation of aggregation knowledge allows an early modeling of user requirements in a data warehouse project.
Keywords: Aggregation; Conceptual Multidimensional Model; Data Warehouse; On-line Analytical Processing (OLAP); Production Rule; UML
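The idea of context-dependent aggregation rules can be illustrated with a minimal sketch. This is not the paper's PRR syntax: the rule encoding, the measure names, and the contexts below are all hypothetical, chosen only to show how a condition on the analysis context can select the aggregation operator to apply.

```python
# Illustrative sketch (hypothetical rules, not the paper's actual PRR model):
# each production rule pairs a condition on the analysis context with the
# aggregation operator that is valid in that context.
rules = [
    # Semi-additive measure: stock levels must not be summed along time.
    (lambda ctx: ctx["measure"] == "stock_level" and ctx["dimension"] == "time",
     "AVG"),
    # Ratios are never additive; they must be recomputed from components.
    (lambda ctx: ctx["measure"] == "profit_margin", "RECOMPUTE"),
]

def aggregation_operator(ctx, default="SUM"):
    """Return the operator prescribed by the first matching rule."""
    for condition, operator in rules:
        if condition(ctx):
            return operator
    return default

print(aggregation_operator({"measure": "stock_level", "dimension": "time"}))  # AVG
print(aggregation_operator({"measure": "sales", "dimension": "store"}))       # SUM
```

A rule engine evaluating conditions in order, with a default operator for the fully additive case, mirrors the split the abstract describes between static structure (which measures and dimensions exist) and dynamics (which aggregations are legal in which context).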
Association rule mining based study for identification of clinical parameters akin to occurrence of brain tumor
The healthcare sector generates a large amount of information corresponding to diagnosis, disease identification, and treatment of an individual. Mining knowledge from clinical datasets to support scientific decision-making in the diagnosis and treatment of disease is therefore increasingly necessary. The aim of this study was to assess the applicability of knowledge discovery in a brain tumor data warehouse, applying data mining techniques to investigate clinical parameters that can be associated with the occurrence of brain tumor. In this study, a brain tumor warehouse was developed comprising clinical data for 550 patients. The Apriori association rule algorithm was applied to discover association rules among the clinical parameters. The rules discovered in the study suggest that high values of Creatinine, Blood Urea Nitrogen (BUN), SGOT, and SGPT are directly associated with tumor occurrence for patients in the primary stage, with at least 85% confidence and more than 50% support. A normalized regression model is proposed based on these parameters, along with Haemoglobin content, Alkaline Phosphatase, and Serum Bilirubin, for prediction of the occurrence of STATE (brain tumor) as 0 (absent) or 1 (present). The results indicate that the methodology followed will be of good value for the diagnostic procedure of brain tumor, especially when large data volumes are involved, and screening based on the discovered parameters would allow clinicians to detect tumors at an early stage of development.
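The support/confidence thresholds the study uses (at least 85% confidence, more than 50% support) can be sketched with a brute-force Apriori-style enumeration. The records below are a tiny illustrative stand-in, not the study's 550-patient dataset, and the flag names are assumptions.

```python
from itertools import combinations

# Toy records: each set lists the clinical flags that are "high" for one
# patient. These are illustrative only, not the paper's actual data.
records = [
    {"Creatinine", "BUN", "SGOT", "Tumor"},
    {"Creatinine", "BUN", "SGPT", "Tumor"},
    {"Creatinine", "BUN", "Tumor"},
    {"BUN", "SGOT"},
    {"Creatinine", "BUN", "SGOT", "Tumor"},
]

def support(itemset):
    """Fraction of records containing every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)

def rules(min_support=0.5, min_confidence=0.85):
    """Enumerate rules lhs -> rhs meeting the support/confidence thresholds."""
    items = sorted(set().union(*records))
    found = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo))
            if s < min_support:   # prune infrequent itemsets
                continue
            for i in range(1, k):
                for lhs in combinations(combo, i):
                    conf = s / support(set(lhs))
                    if conf >= min_confidence:
                        rhs = tuple(x for x in combo if x not in lhs)
                        found.append((lhs, rhs, s, conf))
    return found

for lhs, rhs, s, c in rules():
    print(lhs, "->", rhs, f"support={s:.2f} confidence={c:.2f}")
```

A real analysis at this scale would use a library implementation with candidate pruning rather than full enumeration; the sketch only shows how the two thresholds filter candidate rules.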
Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our experience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We emphasize the need for data collection at the application server layer (not the web server) in order to support logging of data and metadata that is essential to the discovery process. We describe the data transformation bridges required from the transaction processing systems and customer event streams (e.g., clickstreams) to the data warehouse. We detail the mining workbench, which needs to provide multiple views of the data through reporting, data mining algorithms, visualization, and OLAP. We conclude with a set of challenges.
Comment: KDD workshop: WebKDD 200
High-Level Object Oriented Genetic Programming in Logistic Warehouse Optimization
This work is focused on work-flow optimization in logistic warehouses and distribution centers. The main aim is to optimize process planning, scheduling, and dispatching. The problem has attracted considerable attention in recent years. It belongs to the NP-hard class of problems, so it is computationally very demanding to find an optimal solution. The main motivation for solving this problem is to fill the gap between the optimization methods developed by researchers in the academic world and the methods used in the business world. The core of the optimization algorithm is built on genetic programming driven by a context-free grammar.
The main contributions of the thesis are a) to propose a new optimization algorithm that respects the makespan, the resource utilization, and the congestion of warehouse aisles that may occur during task processing, b) to analyze historical operational data from a warehouse and to develop a set of benchmarks that could serve as reference baseline results for further research, and c) to try to outperform the baseline results set by a skilled and trained operational manager of one of the biggest warehouses in Central Europe.
Mining Bad Credit Card Accounts from OLAP and OLTP
Credit card companies classify accounts as good or bad based on historical data, where a bad account may default on payments in the near future. If an account is classified as a bad account, then further action can be taken to investigate the actual nature of the account and take preventive actions. In addition, marking an account as "good" when it is actually bad could lead to loss of revenue, and marking an account as "bad" when it is actually good could lead to loss of business. However, detecting bad credit card accounts in real time from Online Transaction Processing (OLTP) data is challenging due to the volume of data needed to be processed to compute the risk factor. We propose an approach which precomputes and maintains the risk probability of an account based on historical transactions data from offline data or data from a data warehouse. Furthermore, using the most recent OLTP transactional data, risk probability is calculated for the latest transaction and combined with the previously computed risk probability from the data warehouse. If the accumulated risk probability crosses a predefined threshold, then the account is treated as a bad account and is flagged for manual verification.
Comment: Conference proceedings of ICCDA, 201
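The combining step the abstract describes can be sketched as follows. The weighted blend and the specific weights and threshold here are assumptions for illustration, not the paper's actual model; the point is only that a cheap per-transaction score is merged with a precomputed warehouse score and compared against a threshold.

```python
# Hedged sketch: blend the precomputed historical risk (from the data
# warehouse) with the risk of the latest OLTP transaction, then flag the
# account for manual review if the blended risk crosses a threshold.
# The weighting scheme and threshold are illustrative assumptions.

def combine_risk(historical_risk, latest_risk, weight_recent=0.3):
    """Weighted blend of warehouse risk and the newest transaction's risk."""
    return (1 - weight_recent) * historical_risk + weight_recent * latest_risk

def flag_account(historical_risk, latest_risk, threshold=0.7):
    """Treat the account as bad (flag for manual verification) past threshold."""
    return combine_risk(historical_risk, latest_risk) >= threshold

print(flag_account(0.65, 0.95))  # a risky recent transaction tips the account over
print(flag_account(0.20, 0.90))  # a clean history keeps the blended risk below threshold
```

The design point is that only the lightweight blend runs in the real-time path, while the expensive aggregation over historical transactions is done offline, which is what makes the approach feasible on OLTP volumes.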
An Intelligent Data Mining System to Detect Health Care Fraud
The chapter begins with an overview of the types of healthcare fraud. Next, there is a brief discussion of issues with the current fraud detection approaches. The chapter then develops information technology based approaches and illustrates how these technologies can improve current practice. Finally, there is a summary of the major findings and the implications for healthcare practice
Building XML data warehouse based on frequent patterns in user queries
With the proliferation of XML-based data sources available across the Internet, it is increasingly important to provide users with a data warehouse of XML data sources to facilitate decision-making processes. Due to the extremely large amount of XML data available on the web, unguided warehousing of XML data turns out to be highly costly and usually cannot well accommodate the users' needs in XML data acquirement. In this paper, we propose an approach to materialize XML data warehouses based on frequent query patterns discovered from historical queries issued by users. The schemas of integrated XML documents in the warehouse are built using these frequent query patterns, represented as Frequent Query Pattern Trees (FreqQPTs). Using a hierarchical clustering technique, the integration approach in the data warehouse is flexible with respect to obtaining and maintaining XML documents. Experiments show that the overall processing of the same queries issued against the global schema becomes much more efficient by using the XML data warehouse built than by directly searching the multiple data sources.