
    A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences

    Order-preserving submatrices (OPSMs) have been applied as an important unsupervised learning model in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems. Unfortunately, because mining OPSMs is an NP-complete problem, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In particular, deep OPSMs, which correspond to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adapted to disclose all common subsequences (ACS) between every pair of row sequences, so that no deep OPSMs are missed. Then, an improved prefix-tree data structure was used to store and traverse the ACS, and the Apriori principle was employed to mine frequent sequential patterns efficiently. Finally, experiments were conducted on gene and synthetic datasets. The results demonstrated the effectiveness and efficiency of this method.
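The support test behind this kind of mining can be sketched in Python. This is a minimal illustration only, not the paper's ACS-based algorithm or its improved prefix tree. It assumes each row has distinct values, turns every row into the order of its columns by increasing value, counts a column pattern as supported by a row when the pattern appears as a subsequence of that order, and uses the Apriori principle (extending a pattern can never add supporting rows) to extend only patterns that meet a minimum support. The names `row_order`, `support`, and `mine` and the toy matrix are hypothetical.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch only: brute-force level-wise OPSM mining, not the ACS/prefix-tree method.
# Assumes the values within each row are distinct (ties are not handled specially).

def row_order(row: Sequence[float]) -> List[int]:
    """Return column indices sorted by increasing value in this row."""
    return sorted(range(len(row)), key=lambda j: row[j])

def is_subsequence(pattern: Sequence[int], seq: Sequence[int]) -> bool:
    """True if the items of pattern occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    # `col in it` advances the iterator, so later columns must appear after earlier ones.
    return all(col in it for col in pattern)

def support(matrix: List[List[float]], pattern: Sequence[int]) -> List[int]:
    """Indices of rows whose values strictly increase along the column pattern."""
    return [i for i, row in enumerate(matrix)
            if is_subsequence(pattern, row_order(row))]

def mine(matrix: List[List[float]], min_rows: int,
         max_len: int) -> List[Tuple[Tuple[int, ...], List[int]]]:
    """Level-wise search: a pattern is extended only if >= min_rows rows support it."""
    n_cols = len(matrix[0])
    frequent = []
    level = [(c,) for c in range(n_cols)]
    while level:
        next_level = []
        for pat in level:
            rows = support(matrix, pat)
            if len(rows) < min_rows:
                continue  # Apriori pruning: no extension of pat can gain supporting rows
            frequent.append((pat, rows))
            if len(pat) < max_len:
                next_level.extend(pat + (c,) for c in range(n_cols) if c not in pat)
        level = next_level
    return frequent

matrix = [
    [1.0, 5.0, 3.0, 9.0],  # column order 0, 2, 1, 3
    [2.0, 8.0, 4.0, 6.0],  # column order 0, 2, 3, 1
    [7.0, 1.0, 2.0, 3.0],  # column order 1, 2, 3, 0
]
print(support(matrix, [0, 2, 1]))  # [0, 1]
```

This exhaustive search is only practical for a handful of columns; avoiding that blow-up for long (deep) patterns is exactly what the abstract's ACS and prefix-tree machinery targets.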

    Mining Order-Preserving Submatrices from Data with Repeated Measurements


    Computational Aesthetics and Identification of Working Style

    Nowadays, an enormous number of companies use Process-Aware Information Systems to manage, perform, monitor, and analyze business processes based on process models. Moreover, as part of the monitoring stage, these software systems generate event logs, which represent the actual, a posteriori workflow and are analyzed using process mining techniques. In this work, as part of process mining, we introduce the concept of working style as a tool for comprehensive analysis of the nature of work. Business processes and the interdependencies between their constituents can be evaluated from the perspective of working style, which is represented by measures and patterns. We define a novel event log representation approach in which the log file is treated as an image. Additionally, we propose measure computation and pattern identification algorithms based on image analysis techniques combined with computational aesthetics. As a result, a web-based prototype application for working style evaluation has been built.
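To make the idea of treating a log file as an image concrete, here is a minimal Python sketch. The encoding is purely a hypothetical assumption and is not taken from this work: each trace becomes one pixel row, each event position one pixel column, the pixel value is an integer code for the activity, and 0 pads shorter traces. The function name `log_to_image` and the example activities are illustrative only; the working-style measures and patterns computed on such an image are not reproduced here.

```python
from typing import Dict, List

# Hypothetical encoding (an assumption, not the thesis's representation):
# one row per trace, one column per event position, value = activity code, 0 = padding.

def log_to_image(traces: List[List[str]]) -> List[List[int]]:
    codes: Dict[str, int] = {}
    for trace in traces:
        for activity in trace:
            # Assign codes 1, 2, 3, ... in order of first appearance.
            codes.setdefault(activity, len(codes) + 1)
    width = max((len(t) for t in traces), default=0)
    return [[codes[a] for a in t] + [0] * (width - len(t)) for t in traces]

log = [
    ["register", "check", "approve"],
    ["register", "reject"],
]
for pixel_row in log_to_image(log):
    print(pixel_row)
# [1, 2, 3]
# [1, 4, 0]
```

Once a log is a 2-D integer grid, standard image-analysis operations (region detection, symmetry or texture measures) can in principle be applied to it; which ones suit working-style analysis is the subject of the work itself.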

    New approaches for clustering high dimensional data

    Clustering is one of the most effective methods for analyzing datasets that contain a large number of objects with numerous attributes. Clustering seeks to identify groups, or clusters, of similar objects. In low-dimensional space, the similarity between objects is often evaluated by summing the differences across all of their attributes. High-dimensional data, however, may contain irrelevant attributes that mask the existence of clusters. Discovering groups of objects that are highly similar within some subset of relevant attributes therefore becomes an important but challenging task. My thesis focuses on various models and algorithms for this task. We first present a flexible clustering model, namely OP-Cluster (Order Preserving Cluster). Under this model, two objects are similar on a subset of attributes if the values of these two objects induce the same relative ordering of these attributes. The OPClustering algorithm has been shown to be useful for identifying co-regulated genes in gene expression data. We also propose a semi-supervised approach to discover biologically meaningful OP-Clusters by incorporating existing gene function classifications into the clustering process. This semi-supervised algorithm yields only OP-Clusters that are significantly enriched by genes from specific functional categories. Real datasets are often noisy. We propose a noise-tolerant clustering algorithm for mining frequently occurring itemsets, called approximate frequent itemsets (AFI). Both theoretical and experimental results demonstrate that our AFI mining algorithm has higher recoverability of real clusters than any other existing itemset mining approach. Pair-wise dissimilarities are often derived from the original data to reduce the complexity of high-dimensional data. Traditional clustering algorithms that take pair-wise dissimilarities as input often generate disjoint clusters.
It is well known that the classification model represented by disjoint clusters is inconsistent with many real classifications, such as gene function classifications. We develop a Poclustering algorithm, which generates overlapping clusters from pair-wise dissimilarities. We prove that by allowing overlapping clusters, Poclustering fully preserves the information of any dissimilarity matrix, while traditional partitioning algorithms may cause significant information loss.
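The OP-Cluster similarity condition can be written as a short Python check. This is a minimal sketch under stated assumptions, not the thesis's OPClustering algorithm: it only tests whether two given objects agree on the relative order of every pair of attributes in a chosen subset, treats equal values strictly (a tie in one object must be a tie in the other, with no noise tolerance), and does not search for clusters. The function names and the toy expression values are hypothetical.

```python
from itertools import combinations
from typing import Sequence

def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def same_relative_order(x: Sequence[float], y: Sequence[float],
                        attrs: Sequence[int]) -> bool:
    """True if x and y induce the same ordering of the attributes in attrs."""
    # Compare every attribute pair; exact ties must match (no noise tolerance).
    return all(sign(x[a] - x[b]) == sign(y[a] - y[b])
               for a, b in combinations(attrs, 2))

gene1 = [1.0, 4.0, 2.0, 8.0]
gene2 = [10.0, 40.0, 25.0, 30.0]
print(same_relative_order(gene1, gene2, [0, 1, 2]))     # True: both order 0 < 2 < 1
print(same_relative_order(gene1, gene2, [0, 1, 2, 3]))  # False: they disagree on 1 vs 3
```

The pairwise comparison is quadratic in the subset size but makes the tie rule explicit; comparing sorted attribute orders is cheaper but needs extra care when values are equal.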

    Biclustering Based on FCA and Partition Pattern Structures for Recommendation Systems

    This paper focuses on item recommendation for museum visitors within the framework of the European project CrossCult on cultural heritage. We present theoretical research on recommendation using biclustering. Our approach is based on biclustering using FCA and partition pattern structures. First, we recall a previous recommendation method based on constant-column biclusters. Then, we propose an alternative approach that incorporates order information and uses coherent-evolution-on-columns biclusters. This alternative approach shares some common features with sequential pattern mining. Finally, given a dataset of visitor trajectories, we indicate how these approaches can be used to build a collaborative recommendation strategy.
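The two bicluster types can be illustrated with simple membership checks in Python. This is a sketch under assumptions, not the FCA and partition-pattern-structure method of the paper. It assumes a hypothetical visitor-by-item matrix of numeric scores (for example ratings or visit positions) and reads a constant-column bicluster as one in which every selected item has a single value across the selected visitors. As our own order-based assumption, it reads a coherent-evolution-on-columns bicluster as one in which all selected visitors rank the selected items in the same order. Ties within a row are not handled specially, and at least one row must be selected.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def is_constant_column(m: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """Each selected column takes a single value across the selected rows."""
    return all(len({m[r][c] for r in rows}) == 1 for c in cols)

def is_order_coherent(m: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """All selected rows sort the selected columns into the same order (assumed reading)."""
    def order(r: int) -> List[int]:
        return sorted(cols, key=lambda c: m[r][c])
    first = order(rows[0])
    return all(order(r) == first for r in rows[1:])

scores = [
    [5.0, 3.0, 1.0],  # visitor 0
    [5.0, 3.0, 2.0],  # visitor 1
    [6.0, 4.0, 2.0],  # visitor 2
    [1.0, 3.0, 2.0],  # visitor 3
]
print(is_constant_column(scores, [0, 1], [0, 1]))       # True
print(is_constant_column(scores, [0, 1, 2], [0, 1]))    # False
print(is_order_coherent(scores, [0, 1, 2], [0, 1, 2]))  # True
print(is_order_coherent(scores, [0, 3], [0, 1, 2]))     # False
```

The constant-column check ignores order entirely, whereas the order-based check ignores the actual values; that contrast is what lets the second kind of bicluster carry sequence information of the sort found in visitor trajectories.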