
    Finding Temporal Patterns in Noisy Longitudinal Data: A Study in Diabetic Retinopathy

    This paper describes an approach to temporal pattern mining that uses user-defined temporal prototypes to specify the trends of interest. The temporal patterns are defined in terms of sequences of support values associated with identified frequent patterns. The prototypes are defined mathematically so that they can be mapped onto the temporal patterns. The focus for the advocated temporal pattern mining process is a large longitudinal patient database collected as part of a diabetic retinopathy screening programme. The data set is also of interest in itself: it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The diabetic retinopathy application, the data warehousing and cleaning process, and the frequent pattern mining procedure (together with the application of the prototype concept) are all described in the paper. An evaluation of the frequent pattern mining process is also presented.
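    The idea of a temporal pattern as a sequence of support values can be illustrated with a minimal sketch. The abstract does not give the paper's prototype definitions, so the three prototypes below ("constant", "increasing", "decreasing"), the `tolerance` parameter, and all function names are assumptions made for illustration only.

```python
from typing import FrozenSet, List, Sequence


def support(itemset: FrozenSet[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def support_sequence(itemset: FrozenSet[str],
                     episodes: Sequence[Sequence[FrozenSet[str]]]) -> List[float]:
    """One support value per time-ordered episode: the temporal pattern."""
    return [support(itemset, episode) for episode in episodes]


def match_prototype(values: Sequence[float], tolerance: float = 0.05) -> str:
    """Map a support sequence onto a hypothetical trend prototype.

    The prototypes here are illustrative assumptions, not the paper's own.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(abs(d) <= tolerance for d in diffs):
        return "constant"
    if all(d >= -tolerance for d in diffs):
        return "increasing"
    if all(d <= tolerance for d in diffs):
        return "decreasing"
    return "other"
```

    For example, a pattern whose support is 0.25, 0.5, and 0.75 over three successive episodes would be labelled "increasing" under these assumed definitions.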

    Data Mining in a Multidimensional Environment

    Data Mining and Data Warehousing are two hot topics in the database research area. Until recently, conventional data mining algorithms were primarily developed for a relational environment, but a data warehouse database is based on a multidimensional model. In our paper we use this model as the basis for a seamless integration of data mining, taking the discovery of association rules as the example. Furthermore, we propose this method as a user-guided technique because of the clear structure of both model and data. We present both the theoretical basis and efficient algorithms for data mining in the multidimensional data model. Our approach directly uses the requirements of dimensions, classifications and sparsity of the cube. Additionally, we give heuristics for optimizing the search for rules.
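    As a rough illustration of association rules over multidimensional data, the sketch below computes the standard support and confidence measures for a rule whose sides are dimension-value conditions. It is not the paper's algorithm: the fact-table layout, the dimension names, and the function `rule_metrics` are hypothetical.

```python
from typing import Dict, List, Tuple

Fact = Dict[str, str]  # one fact-table row: dimension name -> member value


def matches(fact: Fact, conditions: Dict[str, str]) -> bool:
    """True if the fact has the required value on every listed dimension."""
    return all(fact.get(dim) == value for dim, value in conditions.items())


def rule_metrics(facts: List[Fact],
                 antecedent: Dict[str, str],
                 consequent: Dict[str, str]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent."""
    n = len(facts)
    n_antecedent = sum(1 for f in facts if matches(f, antecedent))
    n_both = sum(1 for f in facts
                 if matches(f, antecedent) and matches(f, consequent))
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical facts with two dimensions.
facts = [
    {"region": "north", "category": "books"},
    {"region": "north", "category": "books"},
    {"region": "north", "category": "toys"},
    {"region": "south", "category": "books"},
]
support, confidence = rule_metrics(
    facts, {"region": "north"}, {"category": "books"})
```

    Here two of the four facts satisfy both sides of the rule and three satisfy the antecedent, so the support is 2/4 and the confidence is 2/3.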

    OMARS: The Framework of an Online Multi-Dimensional Association Rules Mining System

    Recently, the integration of data warehouses and data mining has been recognized as the primary platform for facilitating knowledge discovery. Effective data mining from data warehouses, however, needs exploratory data analysis. Users often need to investigate the warehouse data from various perspectives and analyze it at different levels of abstraction. To this end, comprehensive information processing and data analysis have to be systematically constructed around data warehouses, and an on-line mining environment should be provided. In this paper, we propose a system framework to facilitate on-line association rules mining, called OMARS, which is based on the idea of integrating OLAP service with our proposed OLAM cubes and auxiliary cubes. According to the concept of OLAM cubes, we define the OLAM lattice framework that exploits arbitrary hierarchies of dimensions to model all possible OLAM data cubes.
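    The notion of a lattice built from dimension hierarchies can be sketched with the general data-cube lattice, in which each cube picks one abstraction level per dimension. This is only an assumed analogue: the abstract does not define the OLAM lattice itself, and the hierarchies, the "ALL" level, and the function names below are hypothetical.

```python
from itertools import product
from typing import Dict, List

ALL = "ALL"  # assumed top level that aggregates a dimension away


def cuboid_lattice(hierarchies: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Enumerate every combination of one level per dimension.

    Each hierarchy lists its levels from finest to coarsest.
    """
    names = list(hierarchies)
    level_lists = [hierarchies[name] + [ALL] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


def is_rollup_of(coarse: Dict[str, str], fine: Dict[str, str],
                 hierarchies: Dict[str, List[str]]) -> bool:
    """True if `coarse` can be derived from `fine` by rolling up dimensions."""
    for name, levels in hierarchies.items():
        order = levels + [ALL]
        if order.index(coarse[name]) < order.index(fine[name]):
            return False
    return True


# Hypothetical hierarchies: 4 x 3 = 12 cubes once ALL is included.
hierarchies = {
    "time": ["day", "month", "year"],
    "product": ["item", "category"],
}
lattice = cuboid_lattice(hierarchies)
```

    The size of such a lattice is the product of (number of levels + 1) over all dimensions, which is why it grows quickly as hierarchies deepen.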

    Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

    The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of "selective revelation" and their confidentiality implications. Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Graph-based Modelling of Concurrent Sequential Patterns

    Structural relation patterns have been introduced recently to extend the search for complex patterns often hidden behind large sequences of data. This has motivated a novel approach to sequential patterns post-processing, and a corresponding data mining method was proposed for Concurrent Sequential Patterns (ConSP). This article refines the approach in the context of ConSP modelling, where a companion graph-based model is devised as an extension of previous work. Two new modelling methods are presented here together with a construction algorithm, to complete the transformation of concurrent sequential patterns to a ConSP-Graph representation. Customer orders data is used to demonstrate the effectiveness of ConSP mining, while synthetic sample data highlights the strength of the modelling technique, illuminating the theories developed.

    Sequential Patterns Post-processing for Structural Relation Patterns Mining

    Sequential patterns mining is an important data-mining technique used to identify frequently observed sequential occurrences of items across ordered transactions over time. It has been extensively studied in the literature, and a diversity of algorithms exists. However, more complex structural patterns are often hidden behind sequences. This article begins with the introduction of a model for the representation of sequential patterns, the Sequential Patterns Graph, which motivates the search for new structural relation patterns. An integrative framework for the discovery of these patterns, Postsequential Patterns Mining, is then described, which underpins the postprocessing of sequential patterns. A corresponding data-mining method based on sequential patterns postprocessing is proposed and shown to be effective in the search for concurrent patterns. From experiments conducted on three component algorithms, it is demonstrated that sequential patterns-based concurrent patterns mining provides an efficient method for structural knowledge discovery.
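    The basic notion of a sequential pattern's frequency can be sketched as follows, using the usual containment definition from the sequential pattern mining literature: a pattern is supported by a customer sequence if its itemsets appear, in order, as subsets of that sequence's transactions. This covers only the support computation, not the Sequential Patterns Graph or the postprocessing framework, and the function names and sample data are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if `pattern` is an ordered subsequence of `sequence`.

    Greedy matching: each pattern element is matched to the earliest
    remaining transaction that is a superset of it.
    """
    matched = 0
    for transaction in sequence:
        if matched < len(pattern) and pattern[matched] <= transaction:
            matched += 1
    return matched == len(pattern)


def sequence_support(database: List[List[Itemset]],
                     pattern: Sequence[Itemset]) -> float:
    """Fraction of customer sequences that contain the pattern."""
    if not database:
        return 0.0
    return sum(1 for seq in database if contains(seq, pattern)) / len(database)


# Hypothetical database of three customer sequences.
database = [
    [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})],
    [frozenset({"a", "b"}), frozenset({"c"})],
    [frozenset({"c"}), frozenset({"a"})],
]
pattern = [frozenset({"a"}), frozenset({"c"})]
pattern_support = sequence_support(database, pattern)
```

    The first two sequences contain "a" followed later by "c"; the third has them in the wrong order, so the pattern's support is 2/3.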