
    Exploring Decomposition for Solving Pattern Mining Problems

    This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database using clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and the other an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both the clusters and the items shared between clusters. To boost the performance of CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results of the experimental evaluation show that CBPM reduces both runtime and memory usage. Moreover, CBPM with the approximate strategy achieves good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves speedups of up to 552× on a single GPU on big transaction databases.
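The two-step idea in this abstract, clustering the transactions first and then mining each cluster independently, can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's CBPM algorithm: the Jaccard-based greedy clustering, the thresholds, and the helper names (`cluster_transactions`, `mine_cluster`) are all assumptions.

```python
# Sketch of the CBPM idea: (1) group highly correlated transactions,
# (2) mine frequent itemsets inside each cluster independently.
from itertools import combinations

def jaccard(a, b):
    """Similarity between two transactions (sets of items)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_transactions(transactions, threshold=0.5):
    """Greedy clustering: attach each transaction to the first
    cluster whose representative is similar enough."""
    clusters = []
    for t in transactions:
        for c in clusters:
            if jaccard(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

def mine_cluster(cluster, min_support=2, max_size=2):
    """Count small itemsets within one cluster (the approximate
    strategy: items shared across clusters are ignored)."""
    counts = {}
    for t in cluster:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {s: n for s, n in counts.items() if n >= min_support}

transactions = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}]
clusters = cluster_transactions(transactions)
patterns = [mine_cluster(c) for c in clusters]
```

The exact strategy in the paper additionally accounts for items shared between clusters; this sketch shows only the per-cluster (approximate) mining step.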

    IMPLEMENTATION OF DENSITY-BASED CLUSTERING USING A GRAPHICS PROCESSING UNIT (GPU)

    Data is a useful source of information for human life. To make such data useful, a method is needed that can extract important information from it. One method for extracting information from a collection of data is known as Data Mining. Data Mining comprises a variety of techniques, one of which is Clustering. Clustering groups data with similar attributes into the same group according to certain rules. In this study, the clustering algorithm used is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN is a density-based clustering algorithm: it groups dense data into one cluster and assigns sparse data to other clusters. Clustering high-dimensional data requires hardware that can minimize computational cost. A GPU (Graphics Processing Unit) makes it possible to process high-dimensional data in a short time. Combining the GPU with DBSCAN can yield good algorithm performance with high accuracy and minimal computational cost. One way to apply the GPU to DBSCAN is to compute the distances between data points in parallel on the GPU. This computation saves an average of 1.035921875 seconds for data of dimension 15154 and 0.063893878 seconds for data of dimension 12600. In the performance evaluation, the GPU also achieves fairly good results compared to the serial algorithm.
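The parallelization step this abstract describes, computing all pairwise distances before running DBSCAN, can be sketched with NumPy broadcasting; the same vectorized expression runs unchanged on a GPU under drop-in array libraries such as CuPy. The data, `eps`, and `min_pts` values below are illustrative, not the thesis's settings.

```python
# All-pairs distance computation (the part offloaded to the GPU)
# followed by DBSCAN's first stage: identifying core points.
import numpy as np

def pairwise_distances(X):
    """All-pairs Euclidean distances in one vectorized step."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def core_points(X, eps=1.0, min_pts=3):
    """A point is 'core' if at least min_pts points (itself
    included) lie within eps of it."""
    D = pairwise_distances(X)
    return (D <= eps).sum(axis=1) >= min_pts

# One dense blob of three points plus a distant outlier.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
mask = core_points(X, eps=0.5, min_pts=3)
```

The full DBSCAN algorithm then expands clusters from the core points; only the distance stage shown here is what the thesis parallelizes.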

    Track Seeding and Labelling with Embedded-space Graph Neural Networks

    To address the unprecedented scale of HL-LHC data, the Exa.TrkX project is investigating a variety of machine learning approaches to particle track reconstruction. The most promising of these solutions, graph neural networks (GNN), process the event as a graph that connects track measurements (detector hits corresponding to nodes) with candidate line segments between the hits (corresponding to edges). Detector information can be associated with nodes and edges, enabling a GNN to propagate the embedded parameters around the graph and predict node-, edge- and graph-level observables. Previously, message-passing GNNs have shown success in predicting doublet likelihood, and here we report updates on the state-of-the-art architectures for this task. In addition, the Exa.TrkX project has investigated innovations in both graph construction and embedded representations, in an effort to achieve fully learned end-to-end track finding. Hence, we present a suite of extensions to the original model, with encouraging results for hitgraph classification. In addition, we explore increased performance by constructing graphs from learned representations which contain non-linear metric structure, allowing for efficient clustering and neighborhood queries of data points. We demonstrate how this framework fits in with both traditional clustering pipelines and GNN approaches. The embedded graphs feed into high-accuracy doublet and triplet classifiers, or can be used as an end-to-end track classifier by clustering in an embedded space. A set of post-processing methods improve performance with knowledge of the detector physics. Finally, we present numerical results on the TrackML particle tracking challenge dataset, where our framework shows favorable results in both seeding and track finding. (Proceedings submission to the Connecting the Dots Workshop 2020, 10 pages.)
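The graph-construction step described above can be sketched minimally: each hit, mapped into an embedding space, is connected to its nearest neighbors there, and the resulting edge list is what a GNN would then classify. The toy 2-D "embedding" and the value of `k` are illustrative; in the paper the embedding is learned by a neural network.

```python
# Build candidate edges by k-nearest-neighbor queries in the
# embedded space (brute force for clarity).
import math

def knn_edges(embeddings, k=2):
    """Connect every point to its k nearest neighbors, producing
    an undirected, deduplicated candidate edge list."""
    edges = set()
    for i, p in enumerate(embeddings):
        dists = sorted(
            (math.dist(p, q), j)
            for j, q in enumerate(embeddings) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

# Four 'hits' in a 2-D embedding: two well-separated near pairs.
emb = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
edges = knn_edges(emb, k=1)
```

At HL-LHC scale the brute-force search would be replaced by an approximate nearest-neighbor index, which is exactly why a learned metric space that supports efficient neighborhood queries matters.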

    Machine Learning for Identifying Group Trajectory Outliers

    Prior works on the trajectory outlier detection problem consider only individual outliers. However, in real-world scenarios, trajectory outliers often appear in groups, e.g., a group of bikes that deviates from the usual trajectory due to street maintenance in the context of intelligent transportation. The current paper considers the Group Trajectory Outlier (GTO) problem and proposes three algorithms. The first and second algorithms are extensions of the well-known DBSCAN and kNN algorithms, while the third one models the GTO problem as a feature selection problem. Furthermore, two enhancements of the proposed algorithms are introduced. The first is based on ensemble learning and computational intelligence, which allows the algorithms' outputs to be merged to potentially improve the final result. The second is a general high-performance computing framework for big trajectory databases, which we used for a GPU-based implementation. Experimental results on different real trajectory databases show the scalability of the proposed approaches.
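The group-outlier notion above can be sketched as follows: trajectories that individually deviate from the majority, and are also close to each other, form a group outlier. This is an illustrative reconstruction, not the paper's DBSCAN/kNN extensions; the distance function and both thresholds are assumptions.

```python
# Detect group trajectory outliers: flag individually deviating
# trajectories, then merge flagged ones that travel together.
import math

def traj_dist(a, b):
    """Mean pointwise distance between two equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def group_outliers(trajs, out_thresh=2.0, group_thresh=1.0):
    """Flag trajectories whose mean distance to all others exceeds
    out_thresh, then single-link flagged trajectories within
    group_thresh of each other."""
    n = len(trajs)
    outliers = [
        i for i in range(n)
        if sum(traj_dist(trajs[i], trajs[j]) for j in range(n) if j != i)
           / (n - 1) > out_thresh
    ]
    groups = []
    for i in outliers:
        for g in groups:
            if any(traj_dist(trajs[i], trajs[j]) <= group_thresh for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    return [g for g in groups if len(g) >= 2]  # a group needs >= 2 members

# Four trajectories along the usual route, two jointly deviating.
normal = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]] * 4
deviated = [[(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]] * 2
groups = group_outliers(normal + deviated)
```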

    Automatic segmentation of mitochondria in scanning electron microscopy images

    Many studies have shown that the shape of mitochondria indicates the occurrence of diseases. Scanning Electron Microscopy (SEM) makes it possible to obtain images of the internal structures of the cell and of mitochondria. Automatic segmentation of mitochondria contributes to the diagnosis of diseases by specialists. There are few studies on automatic segmentation of mitochondria in Serial Block-Face Scanning Electron Microscopy (SBFSEM) images. The SBFSEM imaging technique provides full automation, well-registered images, and less time and effort for data acquisition; it was therefore selected for this study. Recently, deep learning methods have been applied to image processing of SEM datasets. However, because they require huge datasets, much effort, and powerful computers for preparing training and testing data, an energy-based model is used in this study instead. The algorithms used in this thesis are primarily those developed by Taşel et al. for mitochondria segmentation in TEM images. The method includes preprocessing, ridge detection, energy mapping, curve fitting, snake-based shape extraction, validation, and post-processing steps. In this thesis, these algorithms are adapted and refined for SBFSEM images to obtain optimum performance. Evaluations are made using the Dice Similarity Coefficient (DSC), precision, recall, and F-score metrics.
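The evaluation metrics named in the abstract (DSC, precision, recall) can be computed directly on binary segmentation masks. The tiny masks below are illustrative; for binary masks the DSC coincides with the F-score.

```python
# Segmentation evaluation metrics over flat binary masks.
def dice(pred, truth):
    """DSC = 2|A intersect B| / (|A| + |B|)."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

def precision_recall(pred, truth):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    return tp / (tp + fp), tp / (tp + fn)

pred = [1, 1, 0, 0]   # predicted mitochondrion pixels
truth = [1, 0, 1, 0]  # ground-truth pixels
d = dice(pred, truth)
p, r = precision_recall(pred, truth)
```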

    Efficient data mining algorithms for time series and complex medical data


    When the decomposition meets the constraint satisfaction problem

    This paper explores the joint use of decomposition methods and parallel computing for solving constraint satisfaction problems and introduces a framework called Parallel Decomposition for Constraint Satisfaction Problems (PD-CSP). The main idea is that the set of constraints is first clustered using a decomposition algorithm in which highly correlated constraints are grouped together. Next, the produced clusters are searched in parallel, in a way that is friendly to parallel computing. In particular, for the first step, we propose adaptations of two well-known clustering algorithms (k-means and DBSCAN). For the second step, we develop a GPU-based approach to efficiently explore the clusters. The results of the extensive experimental evaluation show that PD-CSP provides competitive results in terms of accuracy and runtime.
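The decomposition step described above, grouping correlated constraints so the clusters can be searched independently, can be sketched by clustering constraints on the variables they share. The union-find grouping here is an illustrative stand-in for the paper's k-means/DBSCAN adaptations.

```python
# Cluster constraints that share variables, using union-find
# over the variables each constraint mentions.
def cluster_constraints(constraints):
    """constraints: list of sets of variable names.
    Returns lists of constraint indices, one per cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for vars_ in constraints:
        first = next(iter(vars_))
        for v in vars_:
            union(v, first)

    clusters = {}
    for i, vars_ in enumerate(constraints):
        clusters.setdefault(find(next(iter(vars_))), []).append(i)
    return sorted(clusters.values())

# Each constraint listed by the variables it touches:
# the first two share y, so they land in one cluster.
constraints = [{"x", "y"}, {"y", "z"}, {"a", "b"}]
clusters = cluster_constraints(constraints)
```

Each resulting cluster can then be handed to an independent (in the paper, GPU-resident) search worker.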

    Towards Automatic Digitalization of Railway Engineering Schematics

    Relay-based Railway Interlocking Systems (RRIS) carry out critical functions to control stations. Despite being based on old and hard-to-maintain electro-mechanical technology, RRIS are still pervasive. A powerful CAD modeling and analysis approach based on symbolic logic has recently been proposed to support the re-engineering of relay diagrams into more maintainable computer-based technologies. However, the legacy engineering drawings that need to be digitized consist of large, hand-drawn diagrams dating back several decades. Manually transforming such diagrams into the format of the CAD tool is labor-intensive and error-prone, effectively a bottleneck in the reverse-engineering process. In this paper, we tackle the problem of automatic digitalization of RRIS schematics into the corresponding CAD format with an integrative Artificial Intelligence approach. Deep learning-based methods, segment detection, and clustering techniques for the automated digitalization of engineering schematics are used to detect and classify the individual elements of the diagram. These elementary elements can then be aggregated into more complex objects by leveraging the domain ontology. First results on the method's ability to automatically reconstruct the engineering schematics are presented.
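The aggregation step described above, combining detected primitive elements into larger diagram objects, can be sketched as spatial grouping of detection centers. The coordinates and distance threshold are illustrative; the paper additionally uses the domain ontology to decide which groupings form valid objects.

```python
# Single-linkage grouping of detected primitives: detections closer
# than max_gap to any member of a group belong to the same object.
import math

def group_detections(centers, max_gap=1.0):
    """centers: list of (x, y) detection centers.
    Returns lists of detection indices, one per diagram object."""
    groups = []
    for i, c in enumerate(centers):
        for g in groups:
            if any(math.dist(c, centers[j]) <= max_gap for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

# Two relay symbols, each detected as two nearby primitives.
centers = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.5, 0.0)]
groups = group_detections(centers)
```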