
    An ontology enhanced parallel SVM for scalable spam filter training

    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.
    Spam, under a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of its original sequential counterpart.
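
    The data-partitioning idea behind such a parallel SVM can be illustrated with a minimal single-machine sketch. It assumes a scikit-learn linear SVM and a local process pool standing in for the paper's MapReduce cluster, and it pools the support vectors from each partition for a final retraining pass; the ontology-based augmentation step is omitted, and the function names (train_partition, parallel_svm) are illustrative, not the authors' API.

```python
# Minimal sketch of data-partitioned ("map/reduce"-style) SVM training.
# scikit-learn and multiprocessing stand in for the paper's MapReduce cluster.
from multiprocessing import Pool

import numpy as np
from sklearn.svm import SVC


def train_partition(args):
    """Map step: train an SVM on one partition and return its support vectors."""
    X_part, y_part = args
    clf = SVC(kernel="linear").fit(X_part, y_part)
    return X_part[clf.support_], y_part[clf.support_]


def parallel_svm(X, y, n_workers=4):
    """Reduce step: retrain a single SVM on the pooled support vectors."""
    parts = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))
    with Pool(n_workers) as pool:
        results = pool.map(train_partition, parts)
    X_sv = np.vstack([r[0] for r in results])
    y_sv = np.concatenate([r[1] for r in results])
    return SVC(kernel="linear").fit(X_sv, y_sv)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy "spam / ham" labels
    model = parallel_svm(X, y)
    print(model.score(X, y))
```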

    Input/Output of Ab-initio Nuclear Structure Calculations for Improved Performance and Portability

    Many modern scientific applications rely on highly computation-intensive calculations. However, most applications do not pay as much attention to the role that input/output operations can play in improving performance and portability. Parallelizing input/output operations on large files can significantly improve the performance of parallel applications where sequential I/O is a bottleneck. A proper choice of I/O library also offers scope for making input/output operations portable across different architectures. Thus, the use of parallel I/O libraries for organizing the I/O of large data files offers great scope for improving the performance and portability of applications. In particular, sequential I/O has been identified as a bottleneck for the highly scalable MFDn (Many Fermion Dynamics for nuclear structure) code performing ab-initio nuclear structure calculations. We develop interfaces and parallel I/O procedures to use a well-known parallel I/O library in MFDn. As a result, we gain efficient I/O of large datasets along with portability and ease of use in downstream processing. Even in situations where the amount of data to be written is not huge, proper use of input/output operations can boost the performance of scientific applications. Application checkpointing offers substantial performance improvement and flexibility while requiring only a negligible amount of I/O to disk. Checkpointing saves and resumes application state in such a manner that in most cases the application is unaware that there has been an interruption to its execution. This preserves a large amount of previously completed work and allows application execution to continue. This small amount of I/O provides substantial time savings by offering restart/resume capability to applications. The need for checkpointing in the optimization code NEWUOA has been identified, and checkpoint/restart capability has been implemented in NEWUOA using simple file I/O.
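
    The checkpoint/restart pattern described above can be sketched in a few lines. This is a generic file-based example applied to a toy iterative solver, not the actual NEWUOA implementation; the file name and helper functions (save_state, load_state) are illustrative assumptions.

```python
# Minimal sketch of file-based checkpoint/restart for an iterative solver.
# State is written to disk periodically so an interrupted run can resume.
import os
import pickle

CHECKPOINT = "solver_state.pkl"


def save_state(iteration, x):
    # Write to a temp file first so a crash mid-write cannot corrupt the checkpoint.
    with open(CHECKPOINT + ".tmp", "wb") as f:
        pickle.dump({"iteration": iteration, "x": x}, f)
    os.replace(CHECKPOINT + ".tmp", CHECKPOINT)


def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"iteration": 0, "x": 0.0}


def run(max_iter=1000, every=50):
    state = load_state()                      # resume transparently if a checkpoint exists
    x, start = state["x"], state["iteration"]
    for i in range(start, max_iter):
        x = 0.5 * (x + 2.0 / (x + 1.0))       # stand-in for one optimizer step
        if (i + 1) % every == 0:
            save_state(i + 1, x)              # negligible I/O, large time savings on restart
    return x


if __name__ == "__main__":
    print(run())
```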

    ROBAST: Development of a ROOT-Based Ray-Tracing Library for Cosmic-Ray Telescopes and its Applications in the Cherenkov Telescope Array

    We have developed a non-sequential ray-tracing simulation library, ROOT-based simulator for ray tracing (ROBAST), which is intended for wide use in optical simulations of cosmic-ray (CR) and gamma-ray telescopes. The library is written in C++ and fully utilizes the geometry library of the ROOT framework. Despite the importance of optics simulations in CR experiments, no open-source software for ray-tracing simulations that could be widely used in the community has existed. To reduce the redundant effort of developing multiple ray-tracing simulators in different research groups, we have successfully used ROBAST for many years to perform optics simulations for the Cherenkov Telescope Array (CTA). Among the six proposed telescope designs for CTA, ROBAST is currently used for three: a Schwarzschild-Couder (SC) medium-sized telescope, one of the SC small-sized telescopes, and a large-sized telescope (LST). ROBAST is also used for the simulation and development of hexagonal light concentrators proposed for the LST focal plane. Making full use of the ROOT geometry library with additional ROBAST classes, we are able to build the complex optics geometries typically used in CR experiments and ground-based gamma-ray telescopes. We introduce ROBAST and its features developed for CR experiments, and show several successful applications for CTA.
    Comment: Accepted for publication in Astroparticle Physics. 11 pages, 10 figures, 4 tables
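
    The core operation such a ray-tracing library performs can be illustrated with a small sketch: propagate a ray to a mirror, reflect it, and record where it lands on the focal plane. This is a plain NumPy illustration for an ideal parabolic mirror and rays parallel to the optical axis, not the ROBAST/ROOT C++ API; the function names are assumptions made for the example.

```python
# Illustrative sketch of one ray-trace step: hit a parabolic mirror, reflect,
# and intersect the focal plane. Not ROBAST code; plain NumPy only.
import numpy as np


def reflect(direction, normal):
    """Specular reflection of a unit direction vector about a surface normal."""
    return direction - 2.0 * np.dot(direction, normal) * normal


def trace_to_focal_plane(origin, direction, focal_length):
    """Reflect a ray parallel to the optical axis off an ideal parabolic mirror
    z = r^2 / (4 f) and return its (x, y) position on the focal plane z = f."""
    x, y = origin[0], origin[1]
    z_hit = (x**2 + y**2) / (4.0 * focal_length)        # mirror sag at (x, y)
    # Upward-pointing surface normal of the paraboloid at the hit point.
    n = np.array([-x / (2.0 * focal_length), -y / (2.0 * focal_length), 1.0])
    n /= np.linalg.norm(n)
    d = reflect(direction / np.linalg.norm(direction), n)
    t = (focal_length - z_hit) / d[2]                    # distance to the focal plane
    hit = np.array([x, y, z_hit]) + t * d
    return hit[:2]


if __name__ == "__main__":
    # Rays parallel to the optical axis should converge near the focal point (0, 0).
    for r in (0.1, 0.5, 1.0):
        print(trace_to_focal_plane(np.array([r, 0.0, 10.0]),
                                   np.array([0.0, 0.0, -1.0]), focal_length=8.0))
```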

    A strategy for reducing turnaround time in design optimization using a distributed computer system

    There is a need to explore methods for reducing the lengthy computer turnaround or clock time associated with engineering design problems. Different strategies can be employed to reduce this turnaround time. One strategy is to run validated analysis software on a network of existing smaller computers so that portions of the computation can be done in parallel. This paper focuses on the implementation of this method using two types of problems. The first type is a traditional structural design optimization problem, which is characterized by simple data flow and a complicated analysis. The second type of problem uses an existing computer program designed to study multilevel optimization techniques. This problem is characterized by complicated data flow and a simple analysis. The paper shows that distributed computing can be a viable means of reducing computational turnaround time for engineering design problems that lend themselves to decomposition. Parallel computing can be accomplished with minimal cost in terms of hardware and software.
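
    The strategy described above amounts to farming out independent analysis runs for different design points so they execute concurrently. The sketch below illustrates the idea with a local process pool standing in for the paper's network of smaller computers; the analysis function and its toy objective are assumptions made for the example.

```python
# Minimal sketch of distributed evaluation of independent design analyses.
# A local process pool stands in for a network of smaller computers.
from concurrent.futures import ProcessPoolExecutor


def structural_analysis(design):
    """Stand-in for a validated analysis code; returns a scalar objective."""
    thickness, width = design
    return thickness * width + 1.0 / (thickness * width)   # toy mass + compliance trade-off


def evaluate_designs(designs, max_workers=4):
    """Evaluate all candidate designs concurrently and return their objectives."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(structural_analysis, designs))


if __name__ == "__main__":
    candidate_designs = [(t, w) for t in (0.5, 1.0, 1.5) for w in (1.0, 2.0)]
    for d, obj in zip(candidate_designs, evaluate_designs(candidate_designs)):
        print(d, obj)
```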

    Multi-Objective Big Data Optimization with jMetal and Spark

    Big Data Optimization is the term used to refer to optimization problems that have to manage very large amounts of data. In this paper, we focus on the parallelization of metaheuristics with the Apache Spark cluster computing system for solving multi-objective Big Data Optimization problems. Our purpose is to study the influence of accessing data stored in the Hadoop Distributed File System (HDFS) in each evaluation step of a metaheuristic and to provide a software tool to solve these kinds of problems. This tool combines the jMetal multi-objective optimization framework with Apache Spark. We have carried out experiments to measure the performance of the proposed parallel infrastructure in an environment based on virtual machines in a local cluster comprising up to 100 cores. We obtained interesting results for computational effort and propose guidelines for facing multi-objective Big Data Optimization problems.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
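
    The evaluation pattern described above, scoring each generation of a metaheuristic in parallel on the cluster, can be sketched as follows. The paper's tool combines jMetal (Java) with Spark; here PySpark and a toy ZDT1-like bi-objective function stand in for it, no HDFS access is shown, and the application name and function names are assumptions made for the example.

```python
# Illustrative PySpark sketch: evaluate one generation of a multi-objective
# metaheuristic in parallel on a Spark cluster.
import random

from pyspark import SparkContext


def evaluate(solution):
    """Toy bi-objective evaluation (ZDT1-like); a real problem would read its data here."""
    f1 = solution[0]
    g = 1.0 + 9.0 * sum(solution[1:]) / (len(solution) - 1)
    f2 = g * (1.0 - (f1 / g) ** 0.5)
    return solution, (f1, f2)


if __name__ == "__main__":
    sc = SparkContext(appName="parallel-moea-evaluation")
    population = [[random.random() for _ in range(30)] for _ in range(200)]
    # Map step of one generation: distribute the population and evaluate it in parallel.
    evaluated = sc.parallelize(population, numSlices=8).map(evaluate).collect()
    print(evaluated[0][1])
    sc.stop()
```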