Improving Efficiency of Incremental Mining by Trie Structure and Pre-Large Itemsets

Author: Hong Tzung-Pei
Hwang Dosam
Le Bac
Le Thien-Phuong
Vo Bay
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 04/02/2015

Incremental data mining has been discussed widely in recent years, as it has many practical applications, and various incremental mining algorithms have been proposed. Hong et al. proposed an efficient incremental mining algorithm for handling newly inserted transactions by using the concept of pre-large itemsets. The algorithm aimed to reduce the need to rescan the original database and also cut maintenance costs. Recently, Lin et al. proposed the Pre-FUFP algorithm to handle new transactions more efficiently and to make it easier to update the FP-tree. However, frequent itemsets must still be mined from the FP-tree using the FP-growth algorithm. In this paper, we propose the Pre-FUT algorithm (Fast-Update algorithm using the Trie data structure and the concept of pre-large itemsets), which not only builds and updates the trie structure when new transactions are inserted, but also makes it easy to mine all the frequent itemsets from the trie. Experimental results show the good performance of the proposed algorithm.
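As a rough illustration of the two ideas named in this abstract, the sketch below stores itemset counts in a trie and classifies each itemset as large, pre-large, or small. It is not the Pre-FUT algorithm: the abstract does not give its details, so the sketch assumes the usual pre-large formulation with a lower and an upper support threshold, and it naively enumerates all itemsets up to a small size. The names `ItemsetTrie`, `classify`, `max_size`, and the threshold values in the usage note are hypothetical.

```python
from itertools import combinations


class TrieNode:
    """One node per item; the path from the root spells out an itemset."""

    def __init__(self):
        self.children = {}
        self.count = 0  # number of transactions containing this itemset


class ItemsetTrie:
    """Naive itemset-count trie (illustrative only, not Pre-FUT itself)."""

    def __init__(self):
        self.root = TrieNode()
        self.n_transactions = 0

    def insert_transaction(self, transaction, max_size=3):
        # Incremental update: only the new transaction is touched,
        # the previously inserted transactions are never rescanned.
        self.n_transactions += 1
        items = sorted(set(transaction))
        for k in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, k):
                node = self.root
                for item in itemset:
                    node = node.children.setdefault(item, TrieNode())
                node.count += 1

    def support(self, itemset):
        # Relative support = count / number of transactions seen so far.
        node = self.root
        for item in sorted(set(itemset)):
            node = node.children.get(item)
            if node is None:
                return 0.0
        if self.n_transactions == 0:
            return 0.0
        return node.count / self.n_transactions


def classify(support, lower, upper):
    """Assumed pre-large scheme: two thresholds with lower <= upper."""
    if support >= upper:
        return "large"
    if support >= lower:
        return "pre-large"
    return "small"
```

For example, after inserting the transactions `["a", "b", "c"]`, `["a", "b"]`, `["a", "c"]`, and `["b"]`, the itemset `{a, b}` has support 0.5, which `classify(0.5, lower=0.3, upper=0.6)` reports as pre-large. Such an itemset is kept as a buffer because a few more insertions could push it over the upper threshold.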

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Stage-specific histone modification profiles reveal global transitions in the Xenopus embryonic epigenome.

Author: Arteaga-Salas Jose M.
David Robert
Imhof Axel
Mentele Edith
Nicetto Dario
Rupp Ralph A. W.
Schneider Tobias D.
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2011

Vertebrate embryos are derived from a transitory pool of pluripotent cells. By the process of embryonic induction, these precursor cells are assigned to specific fates and differentiation programs. Histone post-translational modifications are thought to play a key role in the establishment and maintenance of stable gene expression patterns underlying these processes. While histone modifications at the gene level are known to change during differentiation, very little is known about the quantitative fluctuations in bulk histone modifications during development. To investigate this issue, we analysed histones isolated from four different developmental stages of Xenopus laevis by mass spectrometry. In toto, we quantified 59 modification states on core histones H3 and H4 from blastula to tadpole stages. During this developmental period, we generally observed an increase in the unmodified states and a shift from histone modifications associated with transcriptional activity to transcriptionally repressive histone marks. We also compared these naturally occurring patterns with the histone modifications of murine ES cells, detecting large differences in the methylation patterns of histone H3 lysines 27 and 36 between pluripotent ES cells and pluripotent cells from Xenopus blastulae. By combining all detected modification transitions, we could cluster their patterns according to their embryonic origin, defining specific histone modification profiles (HMPs) for each developmental stage. To our knowledge, this data set represents the first compendium of covalent histone modifications and their quantitative flux during normogenesis in a vertebrate model organism. The HMPs indicate a stepwise maturation of the embryonic epigenome, which may be causal for the progressive restriction of cellular potency during development.

Directory of Open Access Journals

Open Access LMU

PubMed Central

Enhanced PL-WAP tree method for incremental mining of sequential patterns.

Author: Chen Min
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2003

Sequential pattern mining, applied as web usage mining, has been used to improve web site design, increase the volume of e-business, and provide marketing decision support. This thesis proposes the PL4UP and EPL4UP algorithms, which use the PLWAP tree structure to incrementally update sequential patterns. PL4UP does not scan the old database except when previously small 1-itemsets become large in the updated database, in which case it scans only those transactions in the old database that contain any of the small itemsets. EPL4UP rebuilds the old PLWAP tree once, using only the list of previously small itemsets, rather than scanning the entire old database twice as the original PLWAP does. PL4UP and EPL4UP first update the old frequent patterns on the small PLWAP tree built for only the incremented part of the database, then compare the newly added patterns generated from the small tree with the old frequent patterns to reduce the number of patterns to be checked on the old PLWAP tree. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .C47. Source: Masters Abstracts International, Volume: 42-03, page: 0959. Adviser: Christie Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2003.

Scholarship at UWindsor

Mining Partially-Ordered Sequential Rules Common to Multiple Sequences

Author: Cao L
Fournier-Viger P
Nkambou R
Tseng VS
Wu CW
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2015

© 2015 IEEE. Sequential rule mining is an important data mining problem with multiple applications. An important limitation of algorithms for mining sequential rules common to multiple sequences is that rules are very specific and therefore many similar rules may represent the same situation. This can cause three major problems: (1) similar rules can be rated quite differently, (2) rules may not be found because they are individually considered uninteresting, and (3) rules that are too specific are less likely to be used for making predictions. To address these issues, we explore the idea of mining "partially-ordered sequential rules" (POSR), a more general form of sequential rules such that items in the antecedent and the consequent of each rule are unordered. To mine POSR, we propose the RuleGrowth algorithm, which is efficient and easily extendable. In particular, we present an extension (TRuleGrowth) that accepts a sliding-window constraint to find rules occurring within a maximum amount of time. A performance study with four real-life datasets shows that RuleGrowth and TRuleGrowth have excellent performance and scalability compared to baseline algorithms and that the number of rules discovered can be several orders of magnitude smaller when the sliding-window constraint is applied. Furthermore, we also report results from a real application showing that POSR can provide a much higher prediction accuracy than regular sequential rules for sequence prediction.
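To make the notion of a partially-ordered sequential rule concrete, the sketch below checks whether a rule X ⇒ Y occurs in a sequence and computes its support and confidence over a set of sequences. It is not RuleGrowth, which the abstract does not describe in detail. It assumes a common reading of the definition: the rule occurs in a sequence if all items of X appear, in any order, before all items of Y; support is the fraction of sequences where the rule occurs; confidence divides that count by the number of sequences containing all of X. The function names are hypothetical.

```python
def rule_occurs(sequence, antecedent, consequent):
    """True if every item of `antecedent` appears (in any order) strictly
    before every item of `consequent` in `sequence` (a list of itemsets)."""
    seen = set()
    for k, itemset in enumerate(sequence):
        seen |= set(itemset)
        if antecedent <= seen:
            # Earliest position where X is complete; checking the remainder
            # from here is sufficient, since any later split sees fewer items.
            rest = set().union(*sequence[k + 1:])
            return consequent <= rest
    return False


def rule_stats(sequences, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    x, y = set(antecedent), set(consequent)
    n_rule = sum(rule_occurs(s, x, y) for s in sequences)
    n_x = sum(x <= set().union(*s) for s in sequences)
    support = n_rule / len(sequences) if sequences else 0.0
    confidence = n_rule / n_x if n_x else 0.0
    return support, confidence
```

With the four sequences `[{a},{b},{c}]`, `[{b},{a},{c}]`, `[{a},{c},{b}]`, and `[{c},{a,b}]`, the rule `{a, b} ⇒ {c}` occurs in the first two only, so both its support and its confidence are 0.5. A fully ordered rule "a, then b, then c" would match only the first sequence, which is the loss of generality the abstract describes.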

OPUS - University of Technology Sydney

Mining frequent sequential patterns in data streams using SSM-algorithm.

Author: Monwar Mostafa
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2005

Frequent sequential mining is the process of discovering frequent sequential patterns in data sequences, such as those found in web log access sequences. In data stream applications, data arrive at high rates in a continuous flow. Data stream mining is an online process, different from traditional mining. Traditional mining algorithms work on an entire static dataset in order to obtain results, while data stream mining algorithms work with continuously arriving data streams. With rapid change in technology, there are many applications that take data as continuous streams. Examples include stock tickers, network traffic measurements, click stream data, data feeds from sensor networks, and telecom call records. Mining frequent sequential patterns in data stream applications must contend with many challenges, such as limited memory for unlimited data, the inability of algorithms to scan the infinitely flowing original dataset more than once, and the need to deliver current and accurate results on demand. This thesis proposes the SSM-Algorithm (sequential stream mining-algorithm), which delivers frequent sequential patterns in data streams. The concept of this work came from the FP-Stream algorithm, which delivers time-sensitive frequent patterns. The proposed SSM-Algorithm outperforms the FP-Stream algorithm by the use of a hash-based and two efficient tree-based data structures. All incoming streams are handled dynamically to improve memory usage. SSM-Algorithm maintains frequent sequences incrementally and delivers the most current result on demand. The introduced algorithm can be deployed to analyze e-commerce data where the primary source of the data is click stream data. (Abstract shortened by UMI.) Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .M668. Source: Masters Abstracts International, Volume: 44-03, page: 1409. Thesis (M.Sc.)--University of Windsor (Canada), 2005.

Scholarship at UWindsor

The cognitive neuroscience of visual working memory

Author: Kaldy Zsuzsa
Sigala Natasha
Publication venue: Frontiers Media SA
Publication date: 01/01/2017

Visual working memory allows us to temporarily maintain and manipulate visual information in order to solve a task. The study of the brain mechanisms underlying this function began more than half a century ago, with Scoville and Milner’s (1957) seminal discoveries with amnesic patients. This timely collection of papers brings together diverse perspectives on the cognitive neuroscience of visual working memory from multiple fields that have traditionally been fairly disjointed: human neuroimaging, electrophysiological, behavioural and animal lesion studies, investigating both the developing and the adult brain.

Directory of Open Access Books (DOAB)

Sussex Research Online

A Planning Approach to Migrating Domain-specific Legacy Systems into Service Oriented Architecture

Author: Zhang Zhuo
Publication venue: Software Technology Research Laboratory
Publication date: 01/01/2012

The planning work prior to implementing an SOA migration project is very important for its success. Up to now, most of this kind of work has been done manually. An SOA migration planning approach based on intelligent information processing methods is proposed to semi-automate this manual work. This thesis investigates the principal research question: “How can we obtain SOA migration planning schemas (semi-) automatically instead of by traditional manual work in order to determine if legacy software systems should be migrated to SOA computation environment?” The controlled experiment research method has been adopted to direct the research throughout the thesis. Data mining methods are used to analyse SOA migration sources and migration targets. The mined information supplements traditional analysis results. Text similarity measurement methods are used to measure the matching relationship between migration sources and migration targets. This provides a quantitative analysis of matching relationships instead of the common qualitative analysis. Concretely, an association rule mining algorithm and a sequence pattern mining algorithm are proposed to analyse legacy assets and domain logics for establishing a Service model and a Component model. These two algorithms can mine all motifs with any min-support number without assuming any ordering. They are better than the existing algorithms for establishing Service models and Component models in SOA migration situations. Two matching strategies, based on the keyword level and the superficial semantic level, are described, which can calculate the degree of similarity between legacy components and domain services effectively. Two decision-making methods, based on a similarity matrix and on hybrid information, are investigated for creating SOA migration planning schemas. Finally, a simple evaluation method is described. Two case studies on migrating e-learning legacy systems to SOA have been explored. The results show that the proposed approach is promising and applicable. Therefore, SOA migration planning schemas can be created semi-automatically, instead of by traditional manual work, by using data mining and text similarity measurement methods.
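The keyword-level matching and similarity-matrix decision step mentioned in this abstract can be pictured with a small sketch. The abstract does not say which similarity measure the thesis uses, so Jaccard similarity over keyword sets is swapped in here purely as an assumption. All function names, the `threshold` parameter, and the component and service descriptions in the usage note are hypothetical.

```python
import re


def keywords(text):
    """Lower-cased alphanumeric tokens of a description, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two descriptions (assumed measure)."""
    ka, kb = keywords(a), keywords(b)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def similarity_matrix(components, services):
    """Rows are legacy components, columns are domain services."""
    return [[jaccard(c, s) for s in services] for c in components]


def best_matches(components, services, threshold=0.0):
    """Map each component to its most similar service, or None when no
    service scores above `threshold`."""
    matrix = similarity_matrix(components, services)
    result = {}
    for component, row in zip(components, matrix):
        if not row:
            result[component] = None
            continue
        j = max(range(len(row)), key=row.__getitem__)
        result[component] = services[j] if row[j] > threshold else None
    return result
```

For instance, the component description "course enrolment manager" and the service description "student course enrolment" share two of four distinct keywords, giving a similarity of 0.5. A semantic-level strategy would additionally need some way of relating words such as "generator" and "generation", which this keyword-only sketch treats as unrelated.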

De Montfort University Open Research Archive