
    An ontology enhanced parallel SVM for scalable spam filter training

    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2013 Elsevier B.V.

    Spam, under a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation incurred when distributing the training data among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of the original sequential counterpart.
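    The partition/train/combine scheme described above can be sketched as a map/reduce pair. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm: it trains a plain linear SVM (Pegasos-style hinge-loss SGD) on each data partition (the map step) and combines the per-node models by simple weight averaging (the reduce step). The ontology-based augmentation and the paper's actual model-combination strategy are not reproduced, and all function names are hypothetical.

```python
import random

def train_linear_svm(data, epochs=50, lam=0.01):
    """Hinge-loss SGD (Pegasos-style) for a linear SVM on one partition.
    data: list of (feature_vector, label) pairs with label in {+1, -1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Subgradient step: shrink w, then add eta*y*x if the margin is violated.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def map_phase(dataset, n_nodes):
    """Split the training set into per-node chunks and train one SVM on each."""
    chunks = [dataset[i::n_nodes] for i in range(n_nodes)]
    return [train_linear_svm(list(c)) for c in chunks]

def reduce_phase(models):
    """Combine the per-node models (here: simple weight averaging)."""
    dim = len(models[0])
    return [sum(m[d] for m in models) / len(models) for d in range(dim)]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

    Because each node only sees a fraction of the data, the map phase parallelizes the expensive training step; the trade-off is exactly the accuracy degradation the paper counteracts with ontology semantics.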

    Deep Learning in Cardiology

    The medical field is creating large amounts of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient at solving complicated medical tasks or at creating insights from big data. Deep learning has emerged as a more accurate and effective technology for a wide range of medical problems such as diagnosis, prediction and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus revealing hierarchical relationships and structures. In this review we survey deep learning application papers that use structured data, signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply to medicine in general, while proposing certain directions as the most viable for clinical use.

    Comment: 27 pages, 2 figures, 10 tables
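    The "layers that transform the data non-linearly" can be made concrete with a minimal forward pass: each layer re-represents the previous layer's output, which is what yields the hierarchical features the abstract mentions. This is a generic sketch with randomly initialized weights and hypothetical names, not a model from any surveyed paper.

```python
import math
import random

def dense_layer(x, weights, biases):
    """One fully connected layer followed by a non-linear activation (tanh)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass the input through each layer in turn; every layer builds a new,
    non-linear representation on top of the previous one."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

def random_network(sizes, seed=1):
    """Build random (untrained) layers for the given layer widths, e.g. [4, 6, 3]."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0, 0.5) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers
```

    Training (fitting the weights to, say, ECG signals or imaging features) is what the surveyed papers actually do; this sketch only shows the layered non-linear transformation itself.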

    Approximations for Throughput Maximization

    In this paper we study the classical problem of throughput maximization. In this problem we have a collection $J$ of $n$ jobs, each having a release time $r_j$, deadline $d_j$, and processing time $p_j$. They have to be scheduled non-preemptively on $m$ identical parallel machines. The goal is to find a schedule which maximizes the number of jobs scheduled entirely in their $[r_j,d_j]$ window. This problem has been studied extensively (even for the case of $m=1$). Several special cases of the problem remain open. Bar-Noy et al. [STOC1999] presented an algorithm with ratio $1-1/(1+1/m)^m$ for $m$ machines, which approaches $1-1/e$ as $m$ increases. For $m=1$, Chuzhoy-Ostrovsky-Rabani [FOCS2001] presented an algorithm with approximation ratio $1-\frac{1}{e}-\varepsilon$ (for any $\varepsilon>0$). Recently Im-Li-Moseley [IPCO2017] presented an algorithm with ratio $1-1/e-\varepsilon_0$ for some absolute constant $\varepsilon_0>0$ for any fixed $m$. They also presented an algorithm with ratio $1-O(\sqrt{\log m/m})-\varepsilon$ for general $m$, which approaches 1 as $m$ grows. The approximability of the problem for $m=O(1)$ remains a major open question. Even for the case of $m=1$ and $c=O(1)$ distinct processing times the problem is open (Sgall [ESA2012]). In this paper we study the case of $m=O(1)$ and show that if there are $c$ distinct processing times, i.e. the $p_j$'s come from a set of size $c$, then there is a $(1-\varepsilon)$-approximation that runs in time $O(n^{mc^7\varepsilon^{-6}}\log T)$, where $T$ is the largest deadline. Therefore, for constant $m$ and constant $c$ this yields a PTAS. Our algorithm is based on proving structural properties of a near-optimum solution that allow the use of a dynamic program with pruning.
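    To make the problem statement concrete, here is a brute-force solver for tiny instances. It is exponential-time and purely illustrative: it checks every subset of jobs and, for feasibility, every interleaving order with earliest-possible starts on each machine. The paper's $(1-\varepsilon)$-approximation via dynamic programming with pruning is far more involved; all names here are hypothetical.

```python
from itertools import combinations, permutations

def feasible(subset, m):
    """Can every job (r, d, p) in subset be run non-preemptively inside its
    [r, d] window on m machines?  For a fixed order, starting each job as
    early as possible on its machine is optimal, so trying all orders and
    machine assignments is exhaustive."""
    def place(order, free):
        if not order:
            return True
        (r, d, p), rest = order[0], order[1:]
        for k in range(len(free)):
            start = max(r, free[k])          # earliest possible start on machine k
            if start + p <= d:
                if place(rest, free[:k] + (start + p,) + free[k + 1:]):
                    return True
        return False
    return any(place(order, (0,) * m) for order in permutations(subset))

def max_throughput(jobs, m):
    """Largest number of jobs schedulable entirely inside their windows."""
    for size in range(len(jobs), -1, -1):
        if any(feasible(sub, m) for sub in combinations(jobs, size)):
            return size
    return 0
```

    For example, with jobs $(r,d,p) \in \{(0,2,2), (0,2,2), (0,4,2)\}$, one machine can fit two of the three jobs, while two machines fit all three.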

    A sparse multinomial probit model for classification

    A recent development in penalized probit modelling using a hierarchical Bayesian approach has led to a sparse binomial (two-class) probit classifier that can be trained via an EM algorithm. A key advantage of the formulation is that no tuning of hyperparameters relating to the penalty is needed, thus simplifying the model selection process. The resulting model demonstrates excellent classification performance and a high degree of sparsity when used as a kernel machine. It is, however, restricted to the binary classification problem and can only be used in the multinomial situation via a one-against-all or one-against-many strategy. To overcome this, we apply the idea to the multinomial probit model. This leads to a direct multi-classification approach and is shown to give a sparse solution with accuracy and sparsity comparable with the current state-of-the-art. Comparative numerical benchmark examples are used to demonstrate the method.
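    The multinomial probit decision rule behind the direct multi-class approach can be sketched via Monte Carlo: class $k$ wins when its latent Gaussian utility (class score plus independent standard-normal noise) is the largest. This illustrates only the probit likelihood, not the sparse hierarchical-Bayes EM training described in the abstract; the function name and signature are hypothetical.

```python
import random

def multinomial_probit_probs(scores, n_samples=20000, seed=0):
    """Monte Carlo class probabilities under a multinomial probit model.
    scores: one latent-mean score per class (e.g. w_k . x for each class k).
    Class k's utility is scores[k] + eps_k with eps_k ~ N(0, 1) independent;
    P(class = k) is the probability that utility k is the maximum."""
    rng = random.Random(seed)
    wins = [0] * len(scores)
    for _ in range(n_samples):
        utilities = [s + rng.gauss(0.0, 1.0) for s in scores]
        wins[utilities.index(max(utilities))] += 1
    return [w / n_samples for w in wins]
```

    Unlike a one-against-all reduction, all classes compete in a single latent-utility model, which is what makes the direct multinomial formulation possible.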