Search CORE


Mining frequent biological sequences based on bitmap without candidate sequence generation

Author: Davis Darryl N.
Ren Jiadong
Wang Qian
Publication venue: 'Elsevier BV'
Publication date: 30/12/2015

Biological sequences carry much important genetic information about organisms. Furthermore, there is an inheritance law related to protein function and structure that is useful for applications such as disease prediction. Frequent sequence mining is a core technique for association rule discovery, but existing algorithms suffer from low efficiency or high error rates because biological sequences have more distinctive characteristics than general sequences. In this paper, an algorithm for mining Frequent Biological Sequences based on Bitmap, FBSB, is proposed. FBSB uses bitmaps as a simple data structure and transforms each row into a quicksort list (QS-list) for sequence growth. To meet the continuity and accuracy requirements of biological sequence mining, the sequences tested during FBSB's mining process are real ones rather than generated candidates, so all frequent sequences can be mined without errors. Compared with other algorithms, the experimental results show that FBSB achieves better performance in both run time and scalability.

Repository@Hull - Worktribe
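The abstract does not give enough detail to reproduce FBSB's QS-list, so the following is only a minimal Python sketch of the two ideas it highlights: bitmaps as the basic structure, and pattern growth from real sequences rather than generated candidates. It assumes contiguous patterns over single-character symbols, uses Python ints as per-sequence position bitmaps, and counts support as the number of sequences containing a pattern; the function names (position_bitmaps, next_symbols, mine_contiguous) are hypothetical.

```python
from collections import defaultdict


def position_bitmaps(seq):
    """Map each symbol to an int whose bit i is set when seq[i] is that symbol."""
    maps = defaultdict(int)
    for i, ch in enumerate(seq):
        maps[ch] |= 1 << i
    return maps


def next_symbols(seq, ends):
    """Symbols that actually follow an occurrence ending at a set bit of `ends`."""
    found = set()
    while ends:
        i = (ends & -ends).bit_length() - 1  # index of the lowest set bit
        if i + 1 < len(seq):
            found.add(seq[i + 1])
        ends &= ends - 1  # clear the lowest set bit
    return found


def mine_contiguous(sequences, min_support):
    """Return {pattern: support} for contiguous patterns occurring in at least
    `min_support` sequences (support counts sequences, not occurrences)."""
    bitmaps = [position_bitmaps(s) for s in sequences]
    result = {}

    def grow(pattern, ends):
        # ends: list of (sequence index, bitmap of positions where `pattern` ends)
        result[pattern] = len(ends)
        # Extensions come only from characters observed after real occurrences.
        symbols = set()
        for k, e in ends:
            symbols |= next_symbols(sequences[k], e)
        for c in sorted(symbols):
            # Shift end positions by one and keep those where symbol c sits.
            new_ends = [(k, (e << 1) & bitmaps[k].get(c, 0)) for k, e in ends]
            new_ends = [(k, e) for k, e in new_ends if e]
            if len(new_ends) >= min_support:
                grow(pattern + c, new_ends)

    for c in sorted({ch for s in sequences for ch in s}):
        ends = [(k, bm[c]) for k, bm in enumerate(bitmaps) if c in bm]
        if len(ends) >= min_support:
            grow(c, ends)
    return result
```

For example, mine_contiguous(["ACGT", "ACGA", "TACG"], 3) keeps A, C, G, AC, CG and ACG, each with support 3; every other pattern falls below the threshold. Growing by a shift-and-AND on the end-position bitmap keeps patterns contiguous, which matches the continuity requirement the abstract mentions.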

Principal Component Analysis for Fish Catch Pattern Analysis in Indonesia

Author: Kristiana T. (Titin)
Publication venue: 'PPPM STMIK Nusa Mandiri'
Publication date: 01/01/2013

Indonesia has a great variety of fish: more than 80 species are known to be caught in Indonesian waters. Determining which types of fish are caught requires analyzing patterns in catch data. Searching for patterns or associative relationships in large-scale data is closely related to data mining. Association analysis, or association rule mining, is a data mining technique for discovering associative rules among combinations of items. The association rule method involves two processes: generating frequent itemsets and mining association rules. Frequent itemset generation finds interconnected itemsets whose association is measured by support and confidence. The algorithm used to generate the frequent itemsets is the Apriori algorithm. Apriori has a weakness in feature extraction: the attributes used cause many rules to be formed. This research therefore applies Apriori based on principal component analysis to obtain more optimal rules. In the first experiment, using Apriori with Φ = 30, a minimum support of 80%, and a minimum confidence of 80%, 82 rules were formed. In the second experiment, using PCA-based Apriori with Φ = 30, a minimum support of 80%, and a minimum confidence of 80%, 12 rules were formed, all with a lift ratio of

Neliti

ejournal.nusamandiri.ac.id (STMIK Nusa Mandiri)
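The frequent itemset generation step described above can be illustrated with a textbook level-wise Apriori (join, prune, count). This is a generic Python sketch, not the paper's PCA-based variant; support is expressed as a fraction of transactions, and the parameter Φ from the abstract is not modelled because its meaning is not stated.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset itemset: support fraction}
    for every itemset whose support is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Count step: keep only candidates that meet the support threshold.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

For instance, on the transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} with min_support=0.6, it returns the three single items (support 0.8) and the three pairs (support 0.6); {a,b,c} is dropped at 0.4.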

Scaling pattern mining through non-overlapping variable partitioning

Author: Alexandre Leonardo
Costa Rafael S.
Henriques Rui
Publication venue
Publication date: 10/12/2022

Biclustering algorithms play a central role in the biotechnological and biomedical domains. The extracted knowledge supports the identification of putative regulatory modules, which are essential to understanding diseases, aiding therapy research, and advancing biological knowledge. However, given the NP-hard nature of the biclustering task, algorithms with optimality guarantees tend to scale poorly on high-dimensional data. To this end, we propose a pipeline for clustering-based vertical partitioning that takes into consideration both parallelization and cross-partition pattern merging needs. Given a specific type of pattern coherence, the clusters are built based on the likelihood that variables form those patterns. The patterns extracted from each cluster are then merged into a final set of closed patterns. This approach is evaluated using five published datasets. Results show that, for some of the tested data, execution times improve with statistical significance when variables are clustered based on the likelihood of forming specific types of patterns, as opposed to partitions based on dissimilarity or randomness. This work offers a starting point for studying the efficiency impact of vertical partitioning criteria along the different stages of pattern mining and biclustering algorithms. Availability: All the code is freely available at https://github.com/JupitersMight/pattern_merge under the MIT license.

arXiv.org e-Print Archive
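The final merging stage, in which per-cluster patterns are combined into a set of closed patterns, can be sketched for the simplest case of binary, itemset-style patterns on two disjoint column partitions. This is an illustrative Python sketch under that assumption, not the code in the pattern_merge repository; patterns are (rows, cols) pairs of frozensets, and merge_partitions and closed_only are hypothetical names.

```python
from itertools import product


def merge_partitions(patterns_a, patterns_b, min_rows):
    """Combine patterns mined on two disjoint column partitions.

    Each pattern is (rows, cols) with frozensets. Assumption: rows is the full
    set of rows supporting cols, so the rows supporting cols_a | cols_b are
    exactly rows_a & rows_b."""
    merged = set(patterns_a) | set(patterns_b)
    for (rows_a, cols_a), (rows_b, cols_b) in product(patterns_a, patterns_b):
        rows = rows_a & rows_b
        if len(rows) >= min_rows:
            merged.add((rows, cols_a | cols_b))
    return merged


def closed_only(patterns):
    """Keep (rows, cols) only if no other pattern has the same rows and strictly more cols."""
    return {
        (rows, cols)
        for rows, cols in patterns
        if not any(r == rows and cols < c for r, c in patterns)
    }
```

Because the partitions share no columns, the merged row set is a plain intersection and no rescan of the data is needed; in this simplified setting, that is what lets each partition be mined in parallel before merging.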

Logical Linked Data Compression

Author: A. Zhou
G.Ö. Özdogan
H. Lu
H. Zhang
J. Huang
J. Urbani
J. Völker
J.D. Fernández
L. Iannone
M. Meier
Q. Li
R. Pichler
Y. Guo
Publication venue: CORE Scholar
Publication date: 01/01/2013

Linked data has grown rapidly in recent years. With the continuing proliferation of structured data, RDF compression is becoming increasingly important. In this study, we introduce a novel lossless compression technique for RDF datasets, called Rule Based Compression (RB Compression), which compresses datasets by generating a set of new logical rules from the dataset and removing triples that can be inferred from these rules. Unlike other compression techniques, our approach not only takes advantage of syntactic verbosity and data redundancy but also utilizes semantic associations present in the RDF graph. Depending on the nature of the dataset, our system can prune more than 50% of the original triples without affecting data integrity.

Crossref

CORE
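The idea of removing triples that rules can re-derive can be illustrated with a deliberately simple rule form: if every subject with the (predicate, object) pair A also has pair B, then B is dropped for those subjects and restored on decompression. This is a hypothetical Python sketch, not the RB Compression rule language; triples are plain (subject, predicate, object) string tuples, and keeping antecedent and consequent pairs disjoint is an assumption made here so that a single decompression pass is lossless.

```python
from collections import defaultdict


def subjects_by_pair(triples):
    """Index: (predicate, object) -> set of subjects that have that pair."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, o)].add(s)
    return index


def mine_rules(triples, min_support=2):
    """Find rules A -> B over (predicate, object) pairs such that every subject
    with A also has B. Antecedent and consequent pairs are kept disjoint so a
    triple needed to trigger a rule is never itself removed."""
    index = subjects_by_pair(triples)
    rules, antecedents, consequents = [], set(), set()
    for a, subs_a in sorted(index.items()):
        if len(subs_a) < min_support or a in consequents:
            continue
        for b, subs_b in sorted(index.items()):
            if b != a and b not in antecedents and subs_a <= subs_b:
                rules.append((a, b))
                antecedents.add(a)
                consequents.add(b)
    return rules


def compress(triples, rules):
    """Drop every triple that some rule can re-derive."""
    index = subjects_by_pair(triples)
    inferable = {(s, p, o) for a, (p, o) in rules for s in index[a]}
    return set(triples) - inferable


def decompress(compressed, rules):
    """Re-apply the rules once to restore the removed triples."""
    index = subjects_by_pair(compressed)
    restored = {(s, p, o) for a, (p, o) in rules for s in index[a]}
    return set(compressed) | restored
```

For instance, if every Student in a small graph is also typed as Person, the rule (type, Student) -> (type, Person) lets the compressor drop those Person triples, and decompress restores the original set exactly.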

DMET-Miner: Efficient discovery of association rules from pharmacogenomic data

Author: Agapito Giuseppe
Cannataro Mario
Guzzi Pietro H.
Publication venue: Elsevier Inc.
Publication date: 31/08/2015

Microarray platforms enable the investigation of allelic variants that may be correlated with phenotypes. Among those, the Affymetrix DMET (Drug Metabolism Enzymes and Transporters) platform enables the simultaneous investigation of all the genes related to drug absorption, distribution, metabolism and excretion (ADME). Although recent studies have demonstrated the effectiveness of DMET data for studying drug response or toxicity in clinical studies, there is a lack of tools for the automatic analysis of DMET data. In previous work we developed DMET-Analyzer, a methodology and supporting platform that automates the statistical study of allelic variants and has been validated in several clinical studies. Although DMET-Analyzer can correlate a single variant for each probe (related to a portion of a gene) using the Fisher test, it cannot discover multiple associations among allelic variants, because its underlying statistical analysis strategy considers one variant at a time. To overcome these limitations, we propose a new analysis methodology for DMET data based on association rule mining, and an efficient implementation of this methodology, named DMET-Miner. DMET-Miner extends the DMET-Analyzer tool with data mining capabilities and correlates the presence of a set of allelic variants with the conditions of patients' samples by exploiting association rules. To cope with the large number of frequent itemsets generated in large clinical studies based on DMET data, DMET-Miner uses an efficient data structure and implements an optimized search strategy that reduces the search space and the execution time. Preliminary experiments on synthetic DMET datasets show that DMET-Miner outperforms off-the-shelf data mining suites such as the FP-Growth algorithms available in Weka and RapidMiner.
To demonstrate the biological relevance of the extracted association rules and the effectiveness of the proposed approach from a medical point of view, some preliminary studies on a real clinical dataset are currently under medical investigation.

Elsevier - Publisher Connector
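The single-variant strategy attributed to DMET-Analyzer amounts to testing each variant against a condition with a 2x2 contingency table. A minimal stdlib-only sketch of the two-sided Fisher exact test follows; the table layout (variant present/absent by condition yes/no) is an assumption for illustration, and this is not DMET-Analyzer's implementation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    e.g. rows = variant present/absent, columns = condition yes/no."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability of a table with x in the top-left cell
        # and the same row and column totals as the observed table.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance guards against floating-point rounding.
    return sum(
        p for p in (p_table(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
```

For the table [[8, 2], [1, 5]] the function returns 400/11440 ≈ 0.035. Running one such test per variant cannot capture combinations of variants, which is the gap the association-rule approach is meant to fill.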

Experimental Approach Based on Ensemble and Frequent Itemsets Mining for Image Spam Filtering

Author: Abdullah Azizi
Mat Ariff Nor Azman
Nasrudin Mohammad Faidzul
Publication venue: Journal of Telecommunication, Electronic and Computer Engineering (JTEC)
Publication date: 05/02/2018

Excessive amounts of image spam cause many problems for e-mail users. Since image spam is difficult to detect using conventional text-based spam approaches, various image processing techniques have been proposed. In this paper, we present an ensemble method using frequent itemset mining (FIM) for filtering image spam. Although FIM techniques are well established in data mining, they are not commonly used in ensemble methods. To obtain good filtering performance, the SIFT descriptor is used, since it is widely known as an effective image descriptor. K-means clustering is applied to the SIFT keypoints to produce a visual codebook. The bag-of-words (BOW) feature vector for each image is generated using a hard bag-of-features (HBOF) approach. FIM descriptors are obtained from the frequent itemsets of the BOW feature vectors. We combine BOW and FIM with three other feature selection methods, namely Information Gain (IG), Symmetrical Uncertainty (SU) and Chi Square (CS), with a Spatial Pyramid in an ensemble method. We performed experiments on the Dredze and SpamArchive datasets. The results show that our ensemble using frequent itemset mining significantly outperforms both the traditional BOW and a naive approach that combines all descriptors directly into a very large single input vector.

Universiti Teknikal Malaysia Melaka: UTeM Open Journal System
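The hard bag-of-features step and one possible way to turn BOW vectors into FIM descriptors are sketched below in plain Python. Assumptions: descriptors and codebook centroids are equal-length numeric tuples (real SIFT descriptors have 128 dimensions), each image's set of visual words present becomes a transaction, and a FIM descriptor is a binary vector marking which mined frequent itemsets the image contains. The paper's exact encoding may differ, and the function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def hard_bof_histogram(descriptors, codebook):
    """Hard assignment: each descriptor votes for its single nearest codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        hist[nearest] += 1
    return hist


def bow_to_transaction(hist):
    """Treat the visual words present in an image as one transaction for FIM."""
    return frozenset(k for k, count in enumerate(hist) if count > 0)


def fim_descriptor(transaction, frequent_itemsets):
    """Binary feature per mined itemset: 1 if the image contains all its words."""
    return [1 if itemset <= transaction else 0 for itemset in frequent_itemsets]
```

With a two-word codebook [(0, 0), (10, 10)] and descriptors [(1, 1), (9, 9), (0, 2)], the histogram is [2, 1], the transaction is {0, 1}, and against the itemsets {0}, {0, 1}, {2} the FIM descriptor is [1, 1, 0].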

Penguins Search Optimisation Algorithm for Association Rules Mining

Author: Abdelouahab Moussaoui
Peng Yeng Yin
Sohag Kabir
Youcef Djenouri
Youcef Gheraibia
Publication venue: 'Faculty of Electrical Engineering and Computing, Univ. of Zagreb'
Publication date: 01/01/2016

Association Rules Mining (ARM) is one of the most popular and well-known approaches for the decision-making process. Existing ARM algorithms are time-consuming and generate a very large number of association rules with a high degree of overlap. To deal with this issue, we propose a new ARM approach based on the penguins search optimisation algorithm (Pe-ARM for short). Moreover, an efficient measure is incorporated into the main process to evaluate the amount of overlap among the generated rules. The proposed approach also ensures good diversification over the whole solution space. To demonstrate its effectiveness, several experiments have been carried out on different datasets, particularly biological ones. The results reveal that the proposed approach outperforms well-known ARM algorithms in both execution time and solution quality.

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia
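The abstract does not define its overlap measure, so as a stand-in this sketch scores a rule set by the mean pairwise Jaccard similarity of the items each rule mentions (antecedent plus consequent). A lower score indicates a more diverse rule set. The measure and the function names are assumptions for illustration, not Pe-ARM's.

```python
from itertools import combinations


def rule_items(rule):
    """All items mentioned by a rule given as (antecedent, consequent)."""
    antecedent, consequent = rule
    return frozenset(antecedent) | frozenset(consequent)


def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(rules):
    """Average Jaccard similarity over all pairs of rules (0.0 if fewer than two)."""
    item_sets = [rule_items(r) for r in rules]
    pairs = list(combinations(item_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

For the rules a -> b, a -> c and d -> e, only the first pair shares an item (Jaccard 1/3), so the mean over the three pairs is 1/9.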

PRINCIPAL COMPONENT ANALYSIS FOR FISH CATCH PATTERN ANALYSIS IN INDONESIA

Author: Kristiana Titin
Publication venue: Lembaga Penelitian dan Pengabdian Pada Masyarakat
Publication date: 15/03/2013

Indonesia has a great variety of fish: more than 80 species are known to be caught in Indonesian waters. Determining which types of fish are caught requires analyzing patterns in catch data. Searching for patterns or associative relationships in large-scale data is closely related to data mining. Association analysis, or association rule mining, is a data mining technique for discovering associative rules among combinations of items. The association rule method involves two processes: generating frequent itemsets and mining association rules. Frequent itemset generation finds interconnected itemsets whose association is measured by support and confidence. The algorithm used to generate the frequent itemsets is the Apriori algorithm. Apriori has a weakness in feature extraction: the attributes used cause many rules to be formed. This research therefore applies Apriori based on principal component analysis to obtain more optimal rules. In the first experiment, using Apriori with Φ = 30, a minimum support of 80%, and a minimum confidence of 80%, 82 rules were formed. In the second experiment, using PCA-based Apriori with Φ = 30, a minimum support of 80%, and a minimum confidence of 80%, 12 rules were formed, all with a lift ratio of

ejournal.nusamandiri.ac.id (STMIK Nusa Mandiri)
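The second process named in the abstract, deriving association rules from frequent itemsets, together with the lift ratio it reports, can be sketched as follows. This is a generic Python sketch: supports are fractions, the input is assumed to contain every subset of each itemset (as Apriori output does), items must be sortable, and lift is computed as confidence divided by the consequent's support.

```python
from itertools import combinations


def rules_from_itemsets(supports, min_confidence):
    """supports: {frozenset itemset: support fraction}, closed under subsets.
    Returns a list of (antecedent, consequent, confidence, lift)."""
    rules = []
    for itemset, s_xy in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent X; Y = itemset - X.
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                x = frozenset(ante)
                y = itemset - x
                confidence = s_xy / supports[x]
                if confidence >= min_confidence:
                    lift = confidence / supports[y]
                    rules.append((x, y, confidence, lift))
    return rules
```

With supports a: 0.8, b: 0.5 and {a, b}: 0.4 and min_confidence=0.6, only b -> a survives (confidence 0.8, lift 1.0); a lift of 1 means the antecedent does not change how often the consequent occurs.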