An Efficient Genetic Algorithm for Discovering Diverse-Frequent Patterns
Exhaustive search on large datasets is infeasible for several reasons.
Recently developed techniques make pattern set mining feasible through a
general solver that supports heuristic search, but they have long execution
times and are limited to small datasets. In this paper, we investigate an
approach that uses a genetic algorithm to mine a diverse set of frequent
patterns. We propose a fast heuristic search algorithm that outperforms
state-of-the-art methods on a standard set of benchmarks and is capable of
producing satisfactory results within a short period of time. Our
proposed algorithm uses a relative encoding scheme for the patterns and an
effective twin removal technique to ensure diversity throughout the search.
Comment: 2015 International Conference on Electrical Engineering and
Information Communication Technology (ICEEICT)
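The twin removal idea mentioned in the abstract can be illustrated with a
minimal sketch. The abstract does not describe the relative encoding scheme or
the exact replacement policy, so this example assumes a plain 0/1 tuple per
pattern and replaces each duplicate with a fresh random individual; the
function name remove_twins and its parameters are hypothetical.

```python
import random


def remove_twins(population, n_items, rng):
    """Replace exact duplicates ("twins") with fresh random individuals.

    Assumption: each individual is a tuple of 0/1 flags over n_items items.
    This is an illustrative encoding, not the paper's relative encoding.
    """
    if len(population) > 2 ** n_items:
        raise ValueError("population is larger than the number of distinct individuals")
    seen = set()
    result = []
    for individual in population:
        candidate = tuple(individual)
        # Resample until the candidate differs from everything kept so far.
        while candidate in seen:
            candidate = tuple(rng.randint(0, 1) for _ in range(n_items))
        seen.add(candidate)
        result.append(candidate)
    return result


rng = random.Random(0)
population = [(1, 0, 1, 0), (1, 0, 1, 0), (0, 0, 0, 1), (1, 0, 1, 0)]
deduplicated = remove_twins(population, 4, rng)
print(len(set(deduplicated)) == len(deduplicated))
```

Running such a step once per generation keeps the population size fixed while
preventing identical individuals from crowding out the search.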
GreMuTRRR: A Novel Genetic Algorithm to Solve Distance Geometry Problem for Protein Structures
Nuclear Magnetic Resonance (NMR) Spectroscopy is a widely used technique to
predict the native structure of proteins. However, NMR machines are only able
to report approximate and partial distances between pairs of atoms. To build the
protein structure, one has to solve the Euclidean distance geometry problem
given the incomplete interval distance data produced by NMR machines. In this
paper, we propose a new genetic algorithm for solving the Euclidean distance
geometry problem for protein structure prediction given sparse NMR data. Our
genetic algorithm uses a greedy mutation operator to intensify the search, a
twin removal technique for diversification in the population and a random
restart method to recover from stagnation. On a standard set of benchmark
datasets, our algorithm significantly outperforms standard genetic algorithms.
Comment: Accepted for publication in the 8th International Conference on
Electrical and Computer Engineering (ICECE 2014)
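As a rough illustration of the interval distance geometry problem described in
the abstract, the sketch below scores a candidate set of 3D coordinates by how
far each known pairwise distance falls outside its allowed interval. The
abstract does not state the fitness function used by GreMuTRRR, so this
violation sum is an assumed, commonly used formulation, and the name
interval_violation is hypothetical.

```python
import math


def interval_violation(coords, bounds):
    """Sum of distances by which atom pairs violate their [lo, hi] intervals.

    coords: list of (x, y, z) positions, one per atom.
    bounds: dict mapping (i, j) index pairs to (lo, hi) distance intervals;
            pairs without data are simply absent (sparse input).
    A value of 0.0 means every known interval is satisfied.
    """
    total = 0.0
    for (i, j), (lo, hi) in bounds.items():
        d = math.dist(coords[i], coords[j])
        if d < lo:
            total += lo - d
        elif d > hi:
            total += d - hi
    return total


coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(interval_violation(coords, {(0, 1): (4.0, 6.0)}))
```

A genetic algorithm could minimise such a score, with mutation operators
moving atoms so as to reduce the violated intervals.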
DPCSpell: A Transformer-based Detector-Purificator-Corrector Framework for Spelling Error Correction of Bangla and Resource Scarce Indic Languages
Spelling error correction is the task of identifying and rectifying
misspelled words in texts. It is a promising and active research topic in
Natural Language Processing because of numerous applications in human language
understanding. The phonetically or visually similar yet semantically distinct
characters make it an arduous task in any language. Earlier efforts on spelling
error correction in Bangla and resource-scarce Indic languages focused on
rule-based, statistical, and machine learning-based methods, which we found
rather inefficient. In particular, machine learning-based approaches, which
exhibit superior performance to rule-based and statistical methods, are
ineffective as they correct each character regardless of its appropriateness.
In this work, we propose a novel detector-purificator-corrector framework based
on denoising transformers that addresses these issues. Moreover, we present a
method for large-scale corpus creation from scratch which in turn resolves the
resource limitation problem of any left-to-right scripted language. The
empirical results demonstrate the effectiveness of our approach, which
outperforms previous state-of-the-art methods by a significant margin for
Bangla spelling error correction. The models and corpus are publicly available
at https://tinyurl.com/DPCSpell.
Comment: 23 pages, 4 figures, and 7 tables
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class-imbalanced classification is a challenging research problem in data
mining and machine learning, as most real-life datasets are imbalanced in
nature. Existing learning algorithms maximise classification accuracy by
correctly classifying the majority class, but misclassify the minority class.
However, in real-life applications the minority class instances represent the
concept of greater interest than the majority class instances. Recently,
several techniques based on sampling methods (under-sampling of the majority
class and over-sampling of the minority class),
cost-sensitive learning methods, and ensemble learning have been used in the
literature for classifying imbalanced datasets. In this paper, we introduce a
new clustering-based under-sampling approach with boosting (AdaBoost)
algorithm, called CUSBoost, for effective imbalanced classification. The
proposed algorithm provides an alternative to RUSBoost (random under-sampling
with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost)
algorithms. We evaluated the performance of the CUSBoost algorithm against
state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and
SMOTEBoost on 13 imbalanced binary and multi-class datasets with various
imbalance ratios. The experimental results show that CUSBoost is a
promising and effective approach for dealing with highly imbalanced datasets.
Comment: CSITSS-201
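The clustering-based under-sampling step can be sketched as follows. The
abstract does not specify the clustering algorithm, the number of clusters, or
the sampling rate, so this example assumes the majority class has already been
clustered by some method and keeps a fixed fraction of each cluster; the name
cluster_undersample and the fraction parameter are hypothetical.

```python
import random
from collections import defaultdict


def cluster_undersample(majority, cluster_ids, fraction, rng):
    """Keep a random fraction of the majority class from every cluster.

    majority:    list of majority-class instances.
    cluster_ids: cluster label for each instance (assumed precomputed).
    fraction:    share of each cluster to keep, between 0 and 1.
    At least one instance per cluster is kept so no cluster disappears.
    """
    clusters = defaultdict(list)
    for instance, cid in zip(majority, cluster_ids):
        clusters[cid].append(instance)
    sampled = []
    for cid in sorted(clusters):
        members = clusters[cid]
        keep = max(1, round(fraction * len(members)))
        sampled.extend(rng.sample(members, keep))
    return sampled


rng = random.Random(0)
majority = list(range(10))
cluster_ids = [0] * 6 + [1] * 4
print(len(cluster_undersample(majority, cluster_ids, 0.5, rng)))
```

Compared with purely random under-sampling, drawing from every cluster keeps
each region of the majority class represented in the reduced training set that
is passed to the boosting rounds.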
FGPGA: An Efficient Genetic Approach for Producing Feasible Graph Partitions
Graph partitioning, a well-studied problem in parallel computing, has many
applications in diversified fields such as distributed computing, social
network analysis, data mining and many other domains. In this paper, we
introduce FGPGA, an efficient genetic approach for producing feasible graph
partitions. Our method takes into account the heterogeneity and capacity
constraints of the partitions to ensure balanced partitioning. Such an approach
has various applications in mobile cloud computing, including feasible
deployment of software applications on the more resourceful infrastructure in
the cloud instead of on the mobile handset. Our proposed approach is lightweight
and hence suitable for use in cloud architecture. We ensure feasibility of the
partitions generated by not allowing over-sized partitions to be generated
during the initialization and search. Tested on standard benchmark datasets,
our proposed method significantly outperforms the state-of-the-art methods in
terms of quality of partitions and feasibility of the solutions.
Comment: Accepted in the 1st International Conference on Networking Systems
and Security 2015 (NSysS 2015)
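The idea of never generating over-sized partitions can be illustrated with a
small initialization sketch. The abstract gives no details of FGPGA's
operators, so this example assumes each node has a weight, each partition has
its own capacity (the heterogeneity), and nodes are placed at random among the
partitions that still have room; the name feasible_random_assignment is
hypothetical.

```python
import random


def feasible_random_assignment(node_weights, capacities, rng):
    """Assign each node to a random partition with enough remaining capacity.

    Returns a list giving the partition index of every node. The result never
    exceeds any capacity; a ValueError is raised if a node fits nowhere.
    Note: this greedy placement can fail even when a feasible assignment
    exists, in which case a caller might simply retry.
    """
    remaining = list(capacities)
    assignment = []
    for weight in node_weights:
        fitting = [p for p, cap in enumerate(remaining) if cap >= weight]
        if not fitting:
            raise ValueError("no partition has room for this node")
        chosen = rng.choice(fitting)
        remaining[chosen] -= weight
        assignment.append(chosen)
    return assignment


rng = random.Random(0)
print(feasible_random_assignment([2, 1, 3, 1], [4, 4], rng))
```

Applying the same capacity check inside crossover and mutation would keep
every individual feasible throughout the search, so no repair step is needed.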