2,460 research outputs found

An ontology enhanced parallel SVM for scalable spam filter training

Author: Bauer
Blanco
Blanzieri
Blei
Breiman
Cao
Caruana
Chawla
Colas
Cristianini
Dean
Do
Gansterer
Godwin Caruana
Graf
Hall
Huang
Kearns
Kim
Maozhen Li
Mei
Platt
Suykens
Taura
Vapnik
Wang
Woodsend
Yang Liu
Zanghirati
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013

This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2013 Elsevier B.V.

Spam, in a variety of shapes and forms, continues to inflict increasing damage. Varying approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation that occurs when the training data is distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of the original sequential counterpart.
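The split-train-combine idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a linear SVM trained with Pegasos-style subgradient descent on each partition (the "map" step) and simple averaging of the resulting weight vectors (the "reduce" step). The ontology-based augmentation is not shown. All function names and the toy data are hypothetical.

```python
import random

def train_linear_svm(samples, epochs=20, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (assumed trainer)."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)      # decaying step size
            shrink = 1.0 - 1.0 / t     # equals 1 - eta * lam (regularization)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * shrink for wi in w]
            if margin < 1:             # sample violates the margin
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def map_train(partitions):
    """'Map' step: one independent SVM per data partition (one per node)."""
    return [train_linear_svm(p, seed=i) for i, p in enumerate(partitions)]

def reduce_average(weight_vectors):
    """'Reduce' step: average the per-partition weights (an assumption)."""
    n = len(weight_vectors)
    return [sum(col) / n for col in zip(*weight_vectors)]

def parallel_svm(data, n_nodes=3):
    """Split the data round-robin, train per partition, then combine."""
    partitions = [data[i::n_nodes] for i in range(n_nodes)]
    return reduce_average(map_train(partitions))

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def make_toy_data():
    """Two well-separated clusters standing in for spam (+1) and ham (-1)."""
    offsets = (-0.5, 0.0, 0.5)
    data = []
    for dx in offsets:
        for dy in offsets:
            data.append(([2.0 + dx, 2.0 + dy], 1))
            data.append(([-2.0 + dx, -2.0 + dy], -1))
    return data
```

Calling `parallel_svm(make_toy_data(), n_nodes=3)` returns one combined weight vector. In a real MapReduce deployment the list comprehension in `map_train` would be replaced by tasks running on separate nodes. Each partition sees only part of the data, which is the source of the accuracy degradation the abstract mentions.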

Crossref

Brunel University Research Archive

On the use of algorithms to discover motifs in DNA sequences

Author: Martínez Ballesteros María del Mar
Martínez Álvarez Francisco
Riquelme Santos José Cristóbal
Rubio Escudero Cristina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

Many approaches are currently devoted to finding DNA motifs in nucleotide sequences. However, this task remains challenging for specialists because of the difficulty of understanding gene regulatory mechanisms in depth, especially when analyzing binding sites in DNA. These sites, or specific nucleotide sequences, are known to be responsible for transcription processes. Thus, this work aims to provide an updated overview of the strategies developed to discover meaningful motifs in DNA-related sequences and, in particular, of their attempts to find relevant binding sites. Of all existing approaches, this work focuses on dictionary, ensemble, and artificial intelligence-based algorithms since they represent the classical and the leading ones, respectively.

Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucia P07-TIC-02611

idUS. Depósito de Investigación Universidad de Sevilla

Hybrid ACO and SVM algorithm for pattern classification

Author: Alwan Hiba Basim
Publication date: 01/01/2013

Ant Colony Optimization (ACO) is a metaheuristic algorithm that can be used to solve a variety of combinatorial optimization problems. A new direction for ACO is to optimize continuous and mixed (discrete and continuous) variables. Support Vector Machine (SVM) is a pattern classification approach that originated from statistical approaches. However, SVM suffers from two main problems: feature subset selection and parameter tuning. Most approaches to tuning SVM parameters discretize the continuous values of the parameters, which negatively affects classification performance. This study presents four algorithms for tuning the SVM parameters and selecting the feature subset, which improved SVM classification accuracy with a smaller feature subset. This is achieved by performing the SVM parameter tuning and feature subset selection processes simultaneously. Hybrid algorithms combining ACO and SVM techniques were proposed. The first two algorithms, ACOR-SVM and IACOR-SVM, tune the SVM parameters, while the other two algorithms, ACOMV-R-SVM and IACOMV-R-SVM, tune the SVM parameters and select the feature subset simultaneously. Ten benchmark datasets from the University of California, Irvine, were used in the experiments to validate the performance of the proposed algorithms. The experimental results obtained from the proposed algorithms are better than those of other approaches in terms of classification accuracy and size of the feature subset. The average classification accuracies for the ACOR-SVM, IACOR-SVM, ACOMV-R and IACOMV-R algorithms are 94.73%, 95.86%, 97.37% and 98.1% respectively. The average size of the feature subset is eight for the ACOR-SVM and IACOR-SVM algorithms and four for the ACOMV-R and IACOMV-R algorithms. This study contributes to a new direction for ACO that can deal with continuous and mixed-variable optimization.

Universiti Utara Malaysia: UUM eTheses

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Evolutionary approaches to fuzzy modelling for classification

Author: Galea Michelle
Shen Qiang
Publication date: 01/01/2004

Aberystwyth Research Portal

An IVR call performance classification system using computational intelligent techniques

Author: Patel Pretesh Bhoola
Publication date: 16/09/2010

The adoption rate of speech recognition within Interactive Voice Response (IVR) systems is increasing. If it is implemented correctly, businesses experience an increase in IVR utilization by customers and thus benefit from reduced operational costs. However, it is essential for businesses to evaluate the productivity, quality and call resolution performance of these self-service applications. This research is concerned with the development of business analytics for IVR applications that could assist contact centers in evaluating these self-service IVR applications. A call classification system for a pay beneficiary IVR application has been developed. The system comprises field and call performance classification components. ‘Say account’, ‘Say amount’, ‘Select beneficiary’ and ‘Say confirmation’ field classifiers were developed using Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN), Radial Basis Function (RBF) ANN, Fuzzy Inference System (FIS) and Support Vector Machine (SVM) techniques. Call performance classifiers were also developed using these computational intelligent techniques. Binary and real coded Genetic Algorithm (GA) solutions were used to determine optimal MLP and RBF ANN classifiers. These GA solutions produced accurate MLP and RBF ANN classifiers. To increase the accuracy of the call performance RBF ANN classifier, the classification threshold was optimized. This process increased the classifier accuracy by approximately eight percent. However, the field and call performance MLP ANN classifiers were the most accurate ANN solutions. Polynomial and RBF SVM kernel functions were most suited for field classifications. However, the linear SVM kernel function was the most accurate for call performance classification. Compared with the ANN and SVM field classifiers, the FIS field classifiers did not perform well. The FIS call performance classifier did outperform the RBF ANN call performance network.
Ensembles of MLP ANN, RBF ANN and SVM field classifiers were developed. Ensembles of FIS, MLP ANN and SVM call performance classifiers were also implemented. All the computational intelligent methods considered were compared in terms of accuracy, sensitivity and specificity. The MLP classifier solution is most appropriate for ‘Say account’ field classification. The ensemble of field classifiers and the MLP classifier solutions performed best in ‘Say amount’ field classification. The ensemble of field classifiers and the SVM classifier solutions are most suited to ‘Select beneficiary’ and ‘Say confirmation’ field classifications. However, the ensemble of call performance classifiers is the preferred classification solution for call performance.
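The threshold optimization and the three comparison metrics named in this abstract can be sketched as follows. This is an illustrative example only, not the thesis's procedure: it assumes a binary classifier that outputs a score per call, and it picks the threshold with the highest accuracy from a hypothetical candidate list. The function names and the example scores are assumptions.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return accuracy, sensitivity, specificity

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest accuracy."""
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = metrics(*confusion_counts(scores, labels, t))[0]
        if acc > best_acc:  # the first best candidate wins ties
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical classifier scores and true labels for four calls.
scores = [0.1, 0.35, 0.4, 0.8]
labels = [0, 0, 1, 1]
default_metrics = metrics(*confusion_counts(scores, labels, 0.5))
tuned_threshold, tuned_accuracy = best_threshold(scores, labels, [0.3, 0.4, 0.5])
```

In this toy data the default threshold of 0.5 misses one positive call, while lowering the threshold to 0.4 classifies all four calls correctly. In practice the threshold should be chosen on validation data rather than on the data used to report accuracy.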

Wits Institutional Repository on DSPACE