Search CORE

36 research outputs found

Efficient itemset generator discovery over a stream sliding window

Author: Chuancong Gao
Jianyong Wang
Publication date: 01/01/2009

Mining generator patterns has attracted considerable research interest in recent years. The main motivation for mining itemset generators is that, together with closed itemsets, they form equivalence classes, and they can be used to generate simple classification rules according to the MDL principle. In this paper, we devise an efficient algorithm called StreamGen to mine frequent itemset generators over a stream sliding window. We adopt a novel enumeration tree structure to keep track of the mined generators and of the border between generators and non-generators, and propose several optimization techniques to speed up the mining process. We further extend the algorithm to directly mine a set of high-quality classification rules over stream sliding windows while maintaining high performance. An extensive performance study shows that our algorithm outperforms other state-of-the-art algorithms that perform similar tasks in terms of both runtime and memory efficiency, and has high utility in terms of classification.
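To make the notion concrete: an itemset is a generator if no proper subset has the same support, and checking only the immediate subsets suffices because support is anti-monotone. The sketch below is a naive, recompute-from-scratch baseline over a count-based sliding window, not StreamGen itself; it does not use the paper's enumeration tree or its incremental maintenance of the generator border, and the names `SlidingWindow` and `frequent_generators` are hypothetical.

```python
from collections import deque


def frequent_generators(window, min_sup):
    """Naive level-wise miner: return {itemset: support} for frequent generators.

    An itemset X is a generator if every immediate subset X - {x} has strictly
    larger support. Generators are downward closed, so candidates are grown
    only from generators found at the previous level.
    """
    def support(itemset):
        return sum(1 for t in window if itemset <= t)

    items = sorted(set().union(*window))
    empty = frozenset()
    found = {empty: len(window)}  # the empty set is a generator by convention
    level = [empty]
    while level:
        next_level, seen = [], set()
        for gen in level:
            for item in items:
                if item in gen:
                    continue
                cand = gen | {item}
                if cand in seen:
                    continue
                seen.add(cand)
                sup = support(cand)
                if sup < min_sup:
                    continue
                # Every immediate subset must be a generator with larger support.
                if all(found.get(cand - {x}, -1) > sup for x in cand):
                    found[cand] = sup
                    next_level.append(cand)
        level = next_level
    return {g: s for g, s in found.items() if g}


class SlidingWindow:
    """Count-based sliding window that keeps only the latest `size` transactions."""

    def __init__(self, size):
        self.transactions = deque(maxlen=size)

    def add(self, transaction):
        # When the window is full, the oldest transaction expires automatically.
        self.transactions.append(frozenset(transaction))

    def generators(self, min_sup):
        return frequent_generators(self.transactions, min_sup)


win = SlidingWindow(size=3)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]:
    win.add(t)
print(win.generators(min_sup=2))
```

After the fourth transaction arrives, the first one has expired. Item c then occurs in every remaining transaction, so {c} has the same support as the empty set and is not a generator, while {a} and {b} are.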

CiteSeerX

Mining Time-Changing Data Streams

Author: Tao Yingying
Publication venue: 'University of Waterloo'
Publication date: 01/01/2011

Streaming data have gained considerable attention in the database and data mining communities because of the emergence of a class of applications that produce such data, such as financial marketing, sensor networks, internet IP monitoring, and telecommunications. Data streams have some unique characteristics not exhibited by traditional data: they are unbounded, fast-arriving, and time-changing. Traditional data mining techniques that make multiple passes over the data or that ignore distribution changes are not applicable to dynamic data streams. Mining data streams has been an active research area addressing the requirements of streaming applications. This thesis focuses on developing techniques for distribution change detection and for mining time-changing data streams. Two techniques are proposed that can detect distribution changes in generic data streams. An approach to one of the most popular stream mining tasks, frequent itemset mining, is also presented. All the proposed techniques are implemented and empirically studied. Experimental results show that the proposed techniques achieve promising performance in detecting changes and mining dynamic data streams.
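One generic way to detect a distribution change, given here only as an illustration and not as either of the thesis's two techniques, is to compare a fixed reference window with a sliding window of recent values using the two-sample Kolmogorov-Smirnov statistic D (the largest gap between their empirical CDFs). The class name `TwoWindowDetector`, the window size and the threshold on D are hypothetical; a principled detector would derive the threshold from a significance level.

```python
from collections import deque


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x so i/len(a) and j/len(b) are the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


class TwoWindowDetector:
    """Flags a change when the recent window drifts away from the reference."""

    def __init__(self, window_size, threshold):
        self.window_size = window_size
        self.threshold = threshold  # hypothetical cut-off on D
        self.reference = []
        self.current = deque(maxlen=window_size)

    def update(self, x):
        if len(self.reference) < self.window_size:
            self.reference.append(x)  # still filling the reference window
            return False
        self.current.append(x)
        if len(self.current) < self.window_size:
            return False
        if ks_statistic(self.reference, self.current) > self.threshold:
            # Adopt the recent data as the new reference distribution.
            self.reference = list(self.current)
            self.current.clear()
            return True
        return False


detector = TwoWindowDetector(window_size=5, threshold=0.5)
stream = [0] * 10 + [10] * 5
flags = [detector.update(x) for x in stream]
print(flags.index(True))  # 12: the third post-shift value pushes D to 0.6
```

Resetting the reference after each alarm lets the detector track successive changes instead of re-flagging the same shift.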

University of Waterloo's Institutional Repository

Mining semi-structured data, theoretical and experimental aspects of pattern evaluation

Author: Graaf E.H. de
Publication venue: Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University
Publication date: 29/10/2008

This dissertation investigates various ways of analysing semi-structured data, e.g. HTML files. HTML files have a structure, but where and how often text is made bold or italic varies from one HTML file to another. Different ways of counting the occurrences of a pattern (for example, all molecules in our dataset contain a certain set of atoms and bonds) were examined in order to find interesting patterns. Presenting the results properly to the user is also important. This dissertation covers the visual representation of the results of the analysis (mining) of semi-structured data, so that the user can find interesting patterns more easily. The conclusions are difficult to summarise briefly. However, it turns out that some patterns were more interesting when they occurred in quick succession, and others when they occurred, for example, weekly. To find even more interesting patterns, it is advisable to take this element of time into account. Furthermore, visualisations turn out to be necessary to present the large number of patterns effectively; for example, the user sees at a glance which molecule substructures occur. The research in this dissertation is important for data analysis. Consider, for example, the analysis of customer behaviour: it is interesting for companies to know that customers buy certain products, for example, every Monday. This is novel because we discover subgroups of products, but we weight subgroups with the right temporal properties more heavily than subgroups that simply occur. Visualising co-occurring molecule substructures can speed up their analysis, and this way of plotting is new.

Leiden University Scholarly Publications

Predictive trend mining for social network analysis

Author: Nohuddin Puteri

This thesis describes research work within the theme of trend mining as applied to social network data. Trend mining is a type of temporal data mining that provides insight into how information changes over time. In the context of this thesis, the focus is on how information contained in social networks changes with time. The work proposes a number of data-mining-based techniques that not only detect change but also support the analysis of change in social network data. To this end, a trend mining framework, the Predictive Trend Mining Framework (PTMF), is proposed as a vehicle for evaluating the ideas presented in this thesis. It is designed to support "end-to-end" social network trend mining and analysis. The work is divided into two elements: Frequent Pattern Trend Analysis (FPTA) and Prediction Modeling (PM). For evaluation purposes, three social network datasets were considered: Great Britain Cattle Movement, Deeside Insurance and Malaysian Armed Forces Logistic Cargo. The evaluation indicates that a sound mechanism has been established for identifying and analysing trends, and for using this trend knowledge for prediction.
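The core idea of frequent pattern trend analysis, tracking how a pattern's support evolves across time episodes, can be sketched as follows. This is a simplified illustration under stated assumptions, not the PTMF: each episode is assumed to be a list of transactions (sets of items), and a trend is labelled by the sign of a least-squares slope. The function names, the `tol` parameter and the example items are hypothetical.

```python
def support_trajectory(episodes, pattern):
    """Support of `pattern` in each episode (episode = list of item sets)."""
    p = frozenset(pattern)
    return [sum(1 for t in episode if p <= set(t)) for episode in episodes]


def classify_trend(trajectory, tol=1e-9):
    """Label a support trajectory by the sign of its least-squares slope."""
    n = len(trajectory)
    if n < 2:
        return "stable"
    mean_x = (n - 1) / 2
    mean_y = sum(trajectory) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(trajectory))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "stable"


# Hypothetical monthly episodes of network interactions, one item set per record.
episodes = [
    [{"farm_A", "market_B"}, {"farm_A"}],
    [{"farm_A", "market_B"}, {"farm_A", "market_B"}, {"market_B"}],
    [{"farm_A", "market_B"}] * 3,
]
traj = support_trajectory(episodes, {"farm_A", "market_B"})
print(traj, classify_trend(traj))  # [1, 2, 3] increasing
```

A slope is only one way to summarise a trajectory; clustering whole trajectories is another option when the shape of the trend matters.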

University of Liverpool Repository

Unsupervised learning for anomaly detection in Australian medical payment data

Author: Kemp James
Publication venue: UNSW, Sydney
Publication date: 01/01/2023

Fraudulent or wasteful medical insurance claims made by health care providers are costly for insurers. Typically, OECD healthcare organisations lose 3-8% of total expenditure due to fraud. As Australia’s universal public health insurer, Medicare Australia spends approximately A$34 billion per annum on the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme, so wasted spending of A$1–2.7 billion could be expected. However, fewer than 1% of claims to Medicare Australia are detected as fraudulent, below international benchmarks. Variation is common in medicine, and health conditions, along with their presentation and treatment, are heterogeneous by nature. Increasing volumes of data and rapidly changing patterns bring challenges that require novel solutions. Machine learning and data mining are becoming commonplace in this field, but no gold standard is yet available. In this project, requirements are developed for real-world application to compliance analytics at the Australian Government Department of Health and Aged Care (DoH), covering: unsupervised learning; problem generalisation; human interpretability; context discovery; and cost prediction. Three novel methods are presented that rank providers by potentially recoverable costs. These methods use association analysis, topic modelling, and sequential pattern mining to provide interpretable, expert-editable models of typical provider claims. Anomalous providers are identified through comparison with the typical models, using metrics based on the costs of excess or upgraded services. Domain knowledge is incorporated in a machine-friendly way in two of the methods through the use of the MBS as an ontology. Validation by subject-matter experts and comparison with existing techniques show that the methods perform well. The methods are implemented in a software framework that enables rapid prototyping and quality assurance. The code is implemented at the DoH, and further applications as decision-support systems are in progress. The developed requirements will apply to future work in this field.
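As a toy illustration of ranking providers by a cost-of-excess metric, the sketch below compares each provider's claimed service counts with a typical profile and prices the surplus. It is a hedged sketch only: the thesis derives its typical models from association analysis, topic modelling and sequential pattern mining, whereas here the "typical model" is assumed to be a plain per-item expected count, upgraded services are not modelled, and the item codes, fees and provider IDs are invented.

```python
def excess_cost(observed, expected, fee):
    """Cost of services claimed beyond the typical (expected) count per item."""
    return sum(
        max(0, count - expected.get(item, 0)) * fee[item]
        for item, count in observed.items()
    )


def rank_providers(claims, typical, fee):
    """Rank providers by potentially recoverable cost, highest first."""
    scores = {p: excess_cost(obs, typical, fee) for p, obs in claims.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical item codes, fees (A$) and per-patient service counts.
fee = {"item_A": 40.0, "item_B": 100.0, "item_C": 250.0}
typical = {"item_A": 2, "item_B": 1}
claims = {
    "P1": {"item_A": 2, "item_B": 1},
    "P2": {"item_A": 5, "item_B": 1},
    "P3": {"item_A": 2, "item_C": 1},
}
for provider, cost in rank_providers(claims, typical, fee):
    print(provider, cost)
```

Scoring in currency rather than as a generic anomaly score keeps the ranking aligned with the goal of prioritising potentially recoverable costs.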

UNSWorks

Space-Efficient String Mining under Frequency Constraints

Author: Fischer Johannes
Mäkinen Veli
Välimäki Niko
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2008

Let D1 and D2 be two databases (i.e., multisets) of d strings over an alphabet Σ, with overall length n. We study the problem of mining discriminative patterns between D1 and D2, e.g., patterns that are frequent in one database but not in the other, emerging patterns, or patterns satisfying other frequency-related constraints. Using the algorithmic framework of Hui (CPM 1992), one can solve several variants of this problem in optimal linear time with the aid of suffix trees or suffix arrays. This stands in sharp contrast to other pattern domains, such as itemsets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is O(n log n) bits, which is not optimal for |Σ| << n (in particular for constant |Σ|), as the databases themselves occupy only n log |Σ| bits. Because space is a more critical resource than time in many real-life applications, the aim of this article is to reduce the space at the cost of increased running time. In particular, we give a solution for the above problems that uses O(n log |Σ| + d log n) bits, while the time requirement increases from optimal linear time to O(n log n). Our new method is tested extensively on biologically relevant datasets and shown to be usable even on genome-scale data.
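For intuition about the problem itself (not the paper's space-efficient method), a naive baseline enumerates every substring of every string and counts, per database, how many strings contain it; a pattern is reported as discriminative if it is frequent in D1 but rare in D2. This baseline needs time and space quadratic in the string lengths, which is exactly what suffix-tree and suffix-array solutions, and the paper's compressed approach, avoid. The function names and the thresholds `min_freq1` and `max_freq2` are hypothetical.

```python
from collections import Counter


def doc_frequency(db):
    """Map each substring to the number of strings in `db` that contain it."""
    freq = Counter()
    for s in db:
        # A set, so each substring counts at most once per string.
        subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        freq.update(subs)
    return freq


def discriminative_patterns(d1, d2, min_freq1, max_freq2):
    """Substrings contained in >= min_freq1 strings of d1 and <= max_freq2 of d2."""
    f1, f2 = doc_frequency(d1), doc_frequency(d2)
    return sorted(p for p, c in f1.items() if c >= min_freq1 and f2.get(p, 0) <= max_freq2)


d1 = ["abab", "abb"]
d2 = ["bba", "ba"]
print(discriminative_patterns(d1, d2, min_freq1=2, max_freq2=0))  # ['ab']
```

Here "a" and "b" are frequent in D1 but also occur in D2, so only "ab" satisfies both constraints.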

Crossref

Helsingin yliopiston digitaalinen arkisto

Advances in knowledge discovery and data mining Part II

Author: CAO Tru
CHEUNG David Wai-Lok
HO Tu-Bao
LIM Ee Peng
MOTODA Hiroshi
ZHOU Zhi-Hua
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

Institutional Knowledge at Singapore Management University

HKU Scholars Hub

Unsupervised monitoring of an elderly person's activities of daily living using Kinect sensors and a power meter

Author: Pazhoumand-Dar Hossein
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2017

The need for greater independence amongst the growing population of elderly people has made the concept of “ageing in place” an important area of research. Remote home monitoring strategies help the elderly deal with the challenges involved in ageing in place and performing the activities of daily living (ADLs) independently. These monitoring approaches typically use several sensors, attached to the environment or the person, to acquire data about the ADLs of the monitored occupant. Many of the ADL monitoring approaches proposed for elderly people living alone have key drawbacks that need to be addressed. These include the need to label a training dataset of activities, to use wearable devices, or to equip the house with many sensors. These approaches are also unable to concurrently monitor physical ADLs, to detect emergency situations such as falls, and instrumental ADLs, to detect deviations from the daily routine; both falls and such deviations can indicate deteriorating health in the elderly. To address these drawbacks, this research investigated the feasibility of unsupervised monitoring of both physical and instrumental ADLs of elderly people living alone via inexpensive, minimally intrusive sensors. A hybrid framework was presented that combined two approaches for monitoring an elderly occupant’s physical and instrumental ADLs. Both approaches were trained on unlabelled sensor data from the occupant’s normal behaviours. The data related to physical ADLs were captured from Kinect sensors, and those related to instrumental ADLs were obtained using a combination of Kinect sensors and a power meter. Kinect sensors were employed in functional areas of the monitored environment to capture the occupant’s locations and the 3D structures of their physical activities. The power meter measured the power consumption of home electrical appliances (HEAs) at the electricity panel.
A novel unsupervised fuzzy approach was presented to monitor physical ADLs based on depth maps obtained from Kinect sensors. Epochs of activities associated with each monitored location were automatically identified, and the occupant’s behaviour patterns during each epoch were represented through combinations of fuzzy attributes. A novel membership function generation technique was presented to elicit membership functions for attributes by analysing the data distribution of attributes while excluding noise and outliers in the data. The occupant’s behaviour patterns during each epoch of activity were then classified into frequent and infrequent categories using a data mining technique. Fuzzy rules were learned to model frequent behaviour patterns. An alarm was raised when the occupant’s behaviour in new data was recognised as frequent with a longer-than-usual duration, or as infrequent with a duration exceeding a data-driven value. Another novel unsupervised fuzzy approach, for monitoring instrumental ADLs, took unlabelled training data from Kinect sensors and a power meter to model the key features of instrumental ADLs. Instrumental ADLs in the training dataset were identified by associating the occupant’s locations with specific power signatures on the power line. A set of fuzzy rules was then developed to model the frequency and regularity of the instrumental activities, tailored to the occupant. This set was subsequently used to monitor new data and to generate reports on deviations from normal behaviour patterns. As a proof of concept, the proposed monitoring approaches were evaluated using a dataset collected in a real-life setting. The evaluation showed that the proposed technique identified the epochs of activities more accurately than alternative techniques. The approach adopted for monitoring physical ADLs was found to improve elderly monitoring.
It generated fuzzy rules that could represent the person’s physical ADLs and exclude noise and outliers in the data more efficiently than alternative approaches. The performance of different membership function generation techniques was compared. The fuzzy rule set obtained from the output of the proposed technique could accurately classify more scenarios of normal and abnormal behaviours. The approach for monitoring instrumental ADLs was also found to reliably distinguish power signatures generated automatically by self-regulated devices from those generated as a result of an elderly person’s instrumental ADLs. The evaluations also showed the effectiveness of the approach in correctly identifying elderly people’s interactions with specific HEAs and in tracking simulated upward and downward deviations from normal behaviours. The fuzzy inference system in this approach was found to be robust with regard to errors in identifying instrumental ADLs, as it could effectively classify normal and abnormal behaviour patterns despite errors in the list of HEAs used.
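The alarm rule for physical ADLs can be read as a simple decision procedure. The sketch below uses crisp thresholds rather than the thesis's fuzzy rules, and assumes a hypothetical data-driven limit of the mean plus k sample standard deviations of the training durations observed for each frequent behaviour pattern; the function names, the pattern labels and the fixed limit for infrequent patterns are likewise hypothetical.

```python
from statistics import mean, stdev


def usual_duration_limit(durations, k=2.0):
    """Hypothetical data-driven limit: mean + k * sample standard deviation."""
    return mean(durations) + k * stdev(durations)


def should_alarm(pattern, duration, frequent_durations, infrequent_limit, k=2.0):
    """Alarm on a frequent pattern lasting longer than usual, or on an
    infrequent pattern whose duration exceeds a data-driven limit."""
    if pattern in frequent_durations:
        return duration > usual_duration_limit(frequent_durations[pattern], k)
    return duration > infrequent_limit


# Training durations (minutes) of frequent patterns, keyed by pattern@location.
frequent = {"sitting@sofa": [30, 40, 50]}
print(should_alarm("sitting@sofa", 65, frequent, infrequent_limit=5))  # True
print(should_alarm("lying@floor", 3, frequent, infrequent_limit=5))    # False
```

With these training values the usual-duration limit for sitting on the sofa is 40 + 2 × 10 = 60 minutes, so a 65-minute episode raises an alarm.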

Research Online @ ECU


Understanding conditional modes of actions in chemical-induced toxicity using rule models

Author: Mahmoud Samar
Publication venue: University of Cambridge
Publication date: 29/07/2019

It is estimated that 115 million animals are used in experimental testing each year. Hence, shifting efforts toward alternative methods for toxicity assessment is essential. However, the slow regulatory acceptance of new approaches stems from knowledge gaps in toxicity modes of action. In this thesis, I describe these challenges and the use of in vitro screening as an alternative to animal testing. I also discuss common data-based methods for deriving hypotheses about toxicity modes of action, and their limitations in capturing multiple biological perturbations. I applied novel data-based workflows, using rule models, to prioritize in vitro assays predictive of toxicity and to detect significant polypharmacology profiles. I explain how constraints were applied to rule-based models to inform meaningful mechanistic interpretation for two toxicity endpoints: rat hepatotoxicity and acute toxicity. I compared the assays selected by rules for predicting hepatotoxicity with the endpoints used in commercially available in vitro models. An overlap was observed, including cytochrome activity, mitochondrial toxicity and immunological responses. However, nuclear receptor activity, identified in rules, is not currently covered in commercial setups. I also demonstrate that endocrine disruption endpoints extrapolate better to in vivo toxicity when a set of specific conditions is met, such as physicochemical properties associated with good bioavailability. Next, I examined synergistic interactions between conditions in rules describing acute toxicity. I gained novel insights into how specific stressors potentiate the perturbation caused by known key events, such as acetylcholinesterase inhibition and neuro-signalling disruption. I show that examining polypharmacology profiles is particularly important at low bioactive potencies.
Further, the overall predictive performance of rules describing acute toxicity was tested against a benchmark Random Forest model in a conformal prediction framework. Irrespective of the data type used in training, the models were prone to bias related to compound promiscuity, whereby highly promiscuous compounds were more likely to be predicted as toxic. Overall, the studies conducted in this thesis provide novel insights into molecular mechanisms of toxicity, namely hepatotoxicity and acute toxicity, with regard to chemical properties and polypharmacology. This knowledge can be used to improve the utility and design of alternative methods for toxicity assessment, and hence accelerate their regulatory acceptance.

Islamic Development Bank Cambridge Trust Fun

Apollo (Cambridge)