2,460 research outputs found
An ontology enhanced parallel SVM for scalable spam filter training
This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2013 Elsevier B.V.
Spam, in a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation that occurs when the training data is distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of the original sequential counterpart.
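The general idea of partitioned SVM training can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the toy data, and the choice to combine per-chunk models by averaging their weights are all assumptions made for illustration, and the ontology step is left out entirely. The "map" step trains one linear SVM per data chunk and the "reduce" step merges the resulting models.

```python
# Hypothetical sketch: train a linear SVM on each data chunk (map),
# then average the per-chunk models (reduce). Plain Python, no cluster.

def train_linear_svm(chunk, epochs=200, lr=0.1, lam=0.01):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    dim = len(chunk[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(chunk)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in chunk:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # this point violates the margin
                for i, xi in enumerate(x):
                    grad_w[i] -= y * xi / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def partition(data, parts):
    """Split the training data into contiguous chunks."""
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]


def reduce_models(models):
    """Combine chunk models by averaging weights and biases (an assumption)."""
    dim = len(models[0][0])
    w = [sum(m[0][i] for m in models) / len(models) for i in range(dim)]
    b = sum(m[1] for m in models) / len(models)
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data: (features, label in {+1, -1}).
data = [
    ((2.0, 2.0), 1), ((-2.0, -2.0), -1), ((3.0, 2.0), 1), ((-3.0, -2.0), -1),
    ((2.0, 3.0), 1), ((-2.0, -3.0), -1), ((3.0, 3.0), 1), ((-3.0, -3.0), -1),
]
models = [train_linear_svm(chunk) for chunk in partition(data, 2)]  # map
model = reduce_models(models)                                       # reduce
```

In a real deployment each call to `train_linear_svm` would run on a separate node, which is where the training-time saving comes from.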
On the use of algorithms to discover motifs in DNA sequences
Many approaches are currently devoted to finding
DNA motifs in nucleotide sequences. However, this task remains
challenging for specialists because of the difficulty
of deeply understanding gene regulatory mechanisms,
especially when analyzing binding sites in DNA. These sites, or
specific nucleotide sequences, are known to be responsible for
transcription processes. Thus, this work aims to provide an
updated overview of strategies developed to discover meaningful
motifs in DNA-related sequences and, in particular, their
attempts to find relevant binding sites. Of all existing
approaches, this work focuses on dictionary, ensemble, and
artificial intelligence-based algorithms, since they represent the
classical and the leading ones, respectively.
Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucia P07-TIC-02611
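As a rough illustration of the dictionary-style idea mentioned in this abstract, the sketch below builds a dictionary of every substring of length k (k-mer) and counts how many sequences contain each one. The function name and the example sequences are hypothetical, and the sketch uses exact matching only, whereas practical motif finders usually tolerate mismatches.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Count, for each k-mer, the number of sequences that contain it."""
    support = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted at most once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return support


# Made-up sequences for illustration only.
sequences = ["ACGTTGACA", "TTGACAGG", "CCTTGACA"]
support = kmer_support(sequences, 6)
best_kmer, best_count = support.most_common(1)[0]
```

Here the k-mer shared by the most sequences is treated as the candidate motif.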
Hybrid ACO and SVM algorithm for pattern classification
Ant Colony Optimization (ACO) is a metaheuristic algorithm that can be used to solve a variety of combinatorial optimization problems. A new direction for ACO is to optimize continuous and mixed (discrete and continuous) variables. Support Vector Machine (SVM) is a pattern classification approach originating from statistical approaches. However, SVM suffers from two main problems: feature subset selection and parameter tuning. Most approaches to tuning SVM parameters discretize the continuous values of the parameters, which has a negative effect on classification performance. This study presents four algorithms for tuning the SVM parameters and selecting the feature subset, which improved SVM classification accuracy with a smaller feature subset. This is achieved by performing the SVM parameter tuning and feature subset selection processes simultaneously. Hybridization algorithms between ACO and SVM techniques were proposed. The first two algorithms, ACOR-SVM and IACOR-SVM, tune the SVM parameters, while the second two algorithms, ACOMV-R-SVM and IACOMV-R-SVM, tune the SVM parameters and select the feature subset simultaneously. Ten benchmark datasets from the University of California, Irvine, were used in the experiments to validate the performance of the proposed algorithms. The proposed algorithms give better results than other approaches in terms of classification accuracy and size of the feature subset. The average classification accuracies for the ACOR-SVM, IACOR-SVM, ACOMV-R and IACOMV-R algorithms are 94.73%, 95.86%, 97.37% and 98.1% respectively. The average size of the feature subset is eight for the ACOR-SVM and IACOR-SVM algorithms and four for the ACOMV-R and IACOMV-R algorithms. This study contributes to a new direction for ACO that can deal with continuous and mixed variables.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
An IVR call performance classification system using computational intelligent techniques
Speech recognition adoption rate within Interactive Voice Response (IVR) systems is on
the increase. If implemented correctly, businesses experience an increase of IVR
utilization by customers, thus benefiting from reduced operational costs. However, it is
essential for businesses to evaluate the productivity, quality and call resolution
performance of these self-service applications. This research is concerned with the
development of business analytics for IVR applications that could assist contact centers
in evaluating these self-service IVR applications. A call classification system for a pay
beneficiary IVR application has been developed. The system comprises field and call
performance classification components. "Say account", "Say amount", "Select beneficiary"
and "Say confirmation" field classifiers were developed using Multi-Layer Perceptron
(MLP) Artificial Neural Network (ANN), Radial Basis Function (RBF) ANN, Fuzzy
Inference System (FIS) as well as Support Vector Machine (SVM). Call performance
classifiers were also developed using these computational intelligent techniques. Binary
and real coded Genetic Algorithm (GA) solutions were used to determine optimal MLP
and RBF ANN classifiers. These GA solutions produced accurate MLP and RBF ANN
classifiers. In order to increase the accuracy of the call performance RBF ANN classifier,
the classification threshold has been optimized. This process increased the classifier
accuracy by approximately eight percent. However, the field and call performance MLP
ANN classifiers were the most accurate ANN solutions. Polynomial and RBF SVM
kernel functions were most suited for field classifications. However, the linear SVM
kernel function is most accurate for call performance classification. When compared to
the ANN and SVM field classifiers, the FIS field classifiers did not perform well. The
FIS call performance classifier did outperform the RBF ANN call performance network.
Ensembles of MLP ANN, RBF ANN and SVM field classifiers were developed.
Ensembles of FIS, MLP ANN and SVM call performance classifiers were also
implemented. All the computational intelligent methods considered were compared in
relation to accuracy, sensitivity and specificity performance metrics. The MLP classifier
solution is most appropriate for "Say account" field classification. The ensemble of field
classifiers and the MLP classifier solution performed best in "Say amount" field
classification. The ensemble of field classifiers and the SVM classifier solution are most suited
to "Select beneficiary" and "Say confirmation" field classifications. However, the
ensemble of call performance classifiers is the preferred classification solution for call
performance.
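The three metrics named in this abstract, and the idea of tuning a classification threshold, can be sketched as follows. The scores, labels, and function names are hypothetical, and choosing the threshold that maximizes accuracy over the observed scores is an assumption; the abstract does not say how the threshold was optimized.

```python
def confusion_metrics(labels, predictions):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


def classify(scores, threshold):
    """Label a score as positive when it reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def best_threshold(scores, labels):
    """Pick the observed score that maximizes accuracy when used as threshold."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: confusion_metrics(labels, classify(scores, t))["accuracy"],
    )


# Made-up classifier outputs for illustration only.
scores = [0.1, 0.3, 0.6, 0.8]
labels = [0, 0, 1, 1]
metrics_at_07 = confusion_metrics(labels, classify(scores, 0.7))
chosen = best_threshold(scores, labels)
```

Reporting sensitivity and specificity alongside accuracy matters when one call outcome is much rarer than the other, since accuracy alone can hide poor performance on the rare class.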