Hybrid group anomaly detection for sequence data: application to trajectory data analytics
Many research areas depend on group anomaly detection, which can also help keep the data involved secure and private. To address gaps in the existing outlier detection literature, this paper proposes a novel hybrid framework for detecting group anomalies in sequence data. It presents two approaches to this problem. i) A hybrid data mining-based algorithm with three main phases: first, a clustering algorithm derives the micro-clusters; second, the kNN algorithm is applied to each micro-cluster to compute the candidate group outliers; third, a pattern mining framework is applied to these candidates as a pruning strategy to generate the groups of outliers. ii) A GPU-based approach, which exploits massively parallel GPU computing to reduce the runtime of the hybrid data mining-based algorithm. Extensive experiments on different sequence databases show the advantages of the proposed model. The results show the efficiency of the GPU approach, which reaches a speedup of 451 over the sequential approach. In addition, both approaches outperform the baseline methods for group detection.
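As a rough illustration of the first two phases described in this abstract, the sketch below derives micro-clusters and then scores each member by its mean distance to its k nearest neighbours inside the same micro-cluster. Everything specific here is an assumption made for illustration: the leader-style clustering, the 2-D point representation of sequences, the function names, and the `radius`, `k`, and `threshold` parameters do not come from the paper. The third phase (pattern mining as a pruning step) is left out because the abstract does not describe it in enough detail.

```python
import math

def micro_clusters(points, radius):
    """Leader clustering (assumed stand-in): a point joins the first
    leader within `radius`, otherwise it starts a new micro-cluster.
    Returns a list of clusters, each a list of point indices."""
    leaders, clusters = [], []
    for idx, p in enumerate(points):
        for c, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                clusters[c].append(idx)
                break
        else:
            leaders.append(p)
            clusters.append([idx])
    return clusters

def knn_score(points, members, idx, k):
    """Mean distance from point `idx` to its k nearest neighbours
    among the other members of its micro-cluster."""
    dists = sorted(math.dist(points[idx], points[j])
                   for j in members if j != idx)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0

def outlier_candidates(points, radius, k, threshold):
    """Indices whose kNN score inside their micro-cluster exceeds
    `threshold` (hypothetical cut-off for candidate outliers)."""
    candidates = []
    for members in micro_clusters(points, radius):
        for idx in members:
            if knn_score(points, members, idx, k) > threshold:
                candidates.append(idx)
    return candidates
```

In a full pipeline, the returned candidates would then be passed to a pattern mining step that keeps only those forming groups of outliers.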
SpecHD: Hyperdimensional Computing Framework for FPGA-based Mass Spectrometry Clustering
Mass spectrometry-based proteomics is a key enabler for personalized
healthcare, providing a deep dive into the complex protein compositions of
biological systems. This technology has vast applications in biotechnology and
biomedicine but faces significant computational bottlenecks. Current
methodologies often require multiple hours or even days to process extensive
datasets, particularly in the domain of spectral clustering. To tackle these
inefficiencies, we introduce SpecHD, a hyperdimensional computing (HDC)
framework supplemented by an FPGA-accelerated architecture with integrated
near-storage preprocessing. Utilizing streamlined binary operations in an HDC
environment, SpecHD capitalizes on the low-latency and parallel capabilities of
FPGAs. This approach markedly improves clustering speed and efficiency, serving
as a catalyst for real-time, high-throughput data analysis in future healthcare
applications. Our evaluations demonstrate that SpecHD not only maintains but
often surpasses existing clustering quality metrics while drastically cutting
computational time. Specifically, it can cluster a large-scale human proteome
dataset, comprising 25 million MS/MS spectra and 131 GB of MS data, in just 5
minutes. With an energy-efficiency gain exceeding 31x and a speedup factor that
ranges from 6x to 54x over existing state-of-the-art solutions, SpecHD emerges as
a promising solution for the rapid analysis of mass spectrometry data with
great implications for personalized healthcare.
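To make the "streamlined binary operations" concrete, here is a minimal sketch of a generic binary HDC encoding: each peak is bound by XOR of an ID hypervector and a level hypervector, the bound vectors are bundled by bitwise majority vote, and spectra are compared by Hamming distance. This is a textbook-style ID-level scheme, not necessarily the one SpecHD uses. The dimensionality `DIM`, the `(bin_index, level_index)` peak format, and all function names are assumptions, and the level hypervectors are drawn independently for simplicity.

```python
import random

DIM = 2048  # hypervector dimensionality (assumed value)

def make_item_memory(n_bins, n_levels, seed=0):
    """Random binary hypervectors, stored as Python ints, for
    m/z bins (IDs) and quantised intensity levels."""
    rng = random.Random(seed)
    ids = [rng.getrandbits(DIM) for _ in range(n_bins)]
    levels = [rng.getrandbits(DIM) for _ in range(n_levels)]
    return ids, levels

def encode(spectrum, ids, levels):
    """Encode a spectrum given as (bin_index, level_index) peaks:
    bind each peak with XOR, then bundle by per-bit majority vote."""
    counts = [0] * DIM
    for b, l in spectrum:
        bound = ids[b] ^ levels[l]
        for i in range(DIM):
            counts[i] += (bound >> i) & 1
    half = len(spectrum) / 2
    hv = 0
    for i, c in enumerate(counts):
        if c > half:
            hv |= 1 << i
    return hv

def hamming(a, b):
    """Number of differing bits between two hypervectors."""
    return bin(a ^ b).count("1")
```

Spectra that share most of their peaks end up with a smaller Hamming distance than unrelated spectra, so a distance threshold can drive the clustering step. XOR and bit counting are the kind of operations that map naturally onto parallel hardware.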
Exploring Decomposition for Solving Pattern Mining Problems
This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by using clustering to study the correlation between transactions in the transaction database. The set of transactions is first clustered so that highly correlated transactions are grouped together. Next, the relevant patterns are derived by applying a pattern mining algorithm to each cluster. We present two pattern mining algorithms, one based on an approximation strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both the clusters and the items shared between clusters. To boost the performance of CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results of the experimental evaluation show that CBPM reduces both runtime and memory usage. In addition, CBPM based on the approximate strategy provides good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves a significant speedup of up to 552× on a single GPU using big transaction databases.
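The cluster-then-mine idea behind the approximation-based strategy can be sketched as follows: transactions are grouped by similarity, frequent itemsets are mined inside each cluster independently, and the per-cluster results are merged. The Jaccard-based leader clustering, the brute-force itemset enumeration, and the names `cluster_transactions`, `frequent_itemsets`, and `cbpm_approx` are illustrative stand-ins, not the algorithms used in the article. Transactions are assumed to be non-empty frozensets.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two non-empty item sets."""
    return len(a & b) / len(a | b)

def cluster_transactions(transactions, min_sim):
    """Leader clustering (assumed stand-in): a transaction joins the
    first cluster whose leader is at least `min_sim` similar."""
    clusters = []
    for t in transactions:
        for c in clusters:
            if jaccard(t, c[0]) >= min_sim:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

def frequent_itemsets(transactions, min_count, max_size=3):
    """Brute-force miner: every itemset up to `max_size` items that
    occurs in at least `min_count` transactions, with its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_count:
                result[frozenset(cand)] = support
    return result

def cbpm_approx(transactions, min_sim, min_count):
    """Mine each cluster on its own and merge the supports. Itemsets
    split across clusters may be missed, hence 'approximate'."""
    patterns = {}
    for cluster in cluster_transactions(transactions, min_sim):
        for itemset, support in frequent_itemsets(cluster, min_count).items():
            patterns[itemset] = patterns.get(itemset, 0) + support
    return patterns
```

Because each cluster is mined independently, the per-cluster work is a natural unit for parallel execution. An exact variant would additionally have to account for items shared between clusters.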