
    An ontology enhanced parallel SVM for scalable spam filter training

    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2013 Elsevier B.V.
    Spam, in a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation that arises when the training data is distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of its sequential counterpart.
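The partition, train, and combine pattern described in this abstract can be illustrated with a minimal single-process sketch. Everything here is an assumption made for illustration and not the paper's algorithm: the local learner is a Pegasos-style linear SVM, the reduce step simply averages weight vectors, the data are synthetic two-class blobs, and the ontology-based augmentation is not modeled. All function names are hypothetical.

```python
import random

def make_blobs(n, seed):
    """Synthetic two-class data (hypothetical); the trailing 1.0 acts as a bias feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [rng.gauss(2.0 * y, 0.5), rng.gauss(2.0 * y, 0.5), 1.0]
        data.append((x, y))
    return data

def train_linear_svm(samples, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization shrink, then a hinge-loss step if the margin is violated.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def partition(samples, k):
    """Split the training set into k disjoint chunks (one per hypothetical node)."""
    return [samples[i::k] for i in range(k)]

def reduce_average(weight_vectors):
    """Reduce step (an assumption): average the locally trained weight vectors."""
    n = len(weight_vectors)
    return [sum(ws) / n for ws in zip(*weight_vectors)]

def parallel_train(samples, k):
    """Map: train one SVM per chunk. Reduce: merge the local models."""
    local_models = list(map(train_linear_svm, partition(samples, k)))
    return reduce_average(local_models)

def accuracy(w, samples):
    """Fraction of samples whose sign of w.x matches the label."""
    hits = 0
    for x, y in samples:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        hits += pred == y
    return hits / len(samples)
```

In a real MapReduce deployment the `map` call would run on separate nodes; here it runs sequentially so the sketch stays self-contained. For example, `accuracy(parallel_train(make_blobs(400, seed=1), 4), make_blobs(200, seed=2))` evaluates the merged model on held-out synthetic data.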

    On the use of algorithms to discover motifs in DNA sequences

    Many approaches are currently devoted to finding DNA motifs in nucleotide sequences. However, this task remains challenging for specialists because of the difficulty of fully understanding gene regulatory mechanisms, especially when analyzing binding sites in DNA. These sites, or specific nucleotide sequences, are known to be responsible for transcription processes. This work therefore aims to provide an updated overview of the strategies developed to discover meaningful motifs in DNA-related sequences and, in particular, their attempts to find relevant binding sites. Among all existing approaches, this work focuses on dictionary, ensemble, and artificial intelligence-based algorithms, since they represent the classical and the leading ones, respectively.
    Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucia P07-TIC-02611

    Hybrid ACO and SVM algorithm for pattern classification

    Ant Colony Optimization (ACO) is a metaheuristic algorithm that can be used to solve a variety of combinatorial optimization problems. A new direction for ACO is to optimize continuous and mixed (discrete and continuous) variables. Support Vector Machine (SVM) is a pattern classification approach that originated from statistical approaches. However, SVM suffers from two main problems: feature subset selection and parameter tuning. Most approaches to tuning SVM parameters discretize the continuous values of the parameters, which has a negative effect on classification performance. This study presents four algorithms for tuning the SVM parameters and selecting the feature subset, which improved SVM classification accuracy with a smaller feature subset. This is achieved by performing the SVM parameter tuning and feature subset selection processes simultaneously. Hybridization algorithms between ACO and SVM techniques were proposed. The first two algorithms, ACOR-SVM and IACOR-SVM, tune the SVM parameters, while the second two algorithms, ACOMV-R-SVM and IACOMV-R-SVM, tune the SVM parameters and select the feature subset simultaneously. Ten benchmark datasets from the University of California, Irvine, were used in the experiments to validate the performance of the proposed algorithms. The proposed algorithms yield better results than other approaches in terms of classification accuracy and size of the feature subset. The average classification accuracies for the ACOR-SVM, IACOR-SVM, ACOMV-R and IACOMV-R algorithms are 94.73%, 95.86%, 97.37% and 98.1% respectively. The average size of the feature subset is eight for the ACOR-SVM and IACOR-SVM algorithms and four for the ACOMV-R and IACOMV-R algorithms. This study contributes to a new direction for ACO that can deal with continuous and mixed variables.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    An IVR call performance classification system using computational intelligent techniques

    The adoption rate of speech recognition within Interactive Voice Response (IVR) systems is on the increase. If it is implemented correctly, businesses experience an increase in IVR utilization by customers and thus benefit from reduced operational costs. However, it is essential for businesses to evaluate the productivity, quality and call resolution performance of these self-service applications. This research is concerned with the development of business analytics for IVR applications that could assist contact centers in evaluating these self-service IVR applications. A call classification system for a pay beneficiary IVR application has been developed. The system comprises field and call performance classification components. 'Say account', 'Say amount', 'Select beneficiary' and 'Say confirmation' field classifiers were developed using Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN), Radial Basis Function (RBF) ANN, Fuzzy Inference System (FIS) and Support Vector Machine (SVM) techniques. Call performance classifiers were also developed using these computational intelligence techniques. Binary and real-coded Genetic Algorithm (GA) solutions were used to determine optimal MLP and RBF ANN classifiers. These GA solutions produced accurate MLP and RBF ANN classifiers. To increase the accuracy of the call performance RBF ANN classifier, the classification threshold was optimized. This process increased the classifier accuracy by approximately eight percent. However, the field and call performance MLP ANN classifiers were the most accurate ANN solutions. Polynomial and RBF SVM kernel functions were best suited to field classification, whereas the linear SVM kernel function was most accurate for call performance classification. Compared with the ANN and SVM field classifiers, the FIS field classifiers did not perform well. The FIS call performance classifier did, however, outperform the RBF ANN call performance network.
    Ensembles of MLP ANN, RBF ANN and SVM field classifiers were developed. Ensembles of FIS, MLP ANN and SVM call performance classifiers were also implemented. All the computational intelligence methods considered were compared in terms of accuracy, sensitivity and specificity. The MLP classifier is most appropriate for 'Say account' field classification. The ensemble of field classifiers and the MLP classifier performed best in 'Say amount' field classification. The ensemble of field classifiers and the SVM classifier are best suited to 'Select beneficiary' and 'Say confirmation' field classification. However, the ensemble of call performance classifiers is the preferred classification solution for call performance.
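The accuracy, sensitivity and specificity metrics named in this abstract, and the idea of optimizing a classification threshold, can be illustrated with a short sketch. This is an assumption-laden illustration and not the authors' procedure: the function names, the candidate thresholds, and the scores below are all hypothetical, and the threshold is chosen by a plain accuracy sweep.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive (label 1)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics(scores, labels, threshold):
    """Return (accuracy, sensitivity, specificity) at the given threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return accuracy, sensitivity, specificity

def best_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the highest accuracy (first one wins ties)."""
    return max(candidates, key=lambda t: metrics(scores, labels, t)[0])

# Hypothetical classifier outputs and ground-truth labels, for illustration only.
example_scores = [0.1, 0.3, 0.35, 0.8, 0.65, 0.9]
example_labels = [0, 0, 1, 1, 1, 1]
chosen = best_threshold(example_scores, example_labels, [0.2, 0.33, 0.5, 0.7])
```

In practice the threshold would be selected on validation data rather than on the data used for the final evaluation, so that the reported accuracy is not optimistic.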