
    Engineering Parallel String Sorting

    We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines.
    Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
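    The core trick behind LCP-merging can be shown with a minimal sketch. It assumes two sorted runs in memory, each with an LCP array (the length of the longest common prefix of every string with its predecessor). This is a sequential two-way merge in Python, not the paper's parallel multiway NUMA implementation, and the function names are hypothetical. When the two heads share prefixes of different lengths with the last string output, the head with the longer shared prefix is the smaller one, so no characters need to be read at all.

```python
def lcp_compare(a, b, h):
    """Compare a and b, which are known to agree on their first h characters.

    Returns (a <= b, lcp(a, b)), scanning characters only from position h onward.
    """
    n = min(len(a), len(b))
    while h < n and a[h] == b[h]:
        h += 1
    if h == n:  # one string is a prefix of the other: the shorter one is smaller
        return len(a) <= len(b), h
    return a[h] < b[h], h


def lcp_merge(s1, lcp1, s2, lcp2):
    """Merge two sorted string lists, using and producing LCP arrays.

    lcpX[i] is lcp(sX[i-1], sX[i]); entry 0 is ignored. Returns (merged, merged_lcp).
    """
    out, out_lcp = [], []
    i = j = 0
    ha = hb = 0  # LCP of each run's current head with the last string output
    while i < len(s1) and j < len(s2):
        if ha == hb:
            # Tie: compare characters, starting right after the shared prefix.
            a_first, h = lcp_compare(s1[i], s2[j], ha)
        else:
            # The head sharing more with the last output is smaller; lcp(heads) = min.
            a_first, h = ha > hb, min(ha, hb)
        if a_first:
            out.append(s1[i])
            out_lcp.append(ha)
            hb = h  # s2's head is now compared against the newly output s1[i]
            i += 1
            ha = lcp1[i] if i < len(s1) else 0
        else:
            out.append(s2[j])
            out_lcp.append(hb)
            ha = h
            j += 1
            hb = lcp2[j] if j < len(s2) else 0
    # Copy whichever run is left; its head's LCP with the last output is already known.
    while i < len(s1):
        out.append(s1[i])
        out_lcp.append(ha)
        i += 1
        ha = lcp1[i] if i < len(s1) else 0
    while j < len(s2):
        out.append(s2[j])
        out_lcp.append(hb)
        j += 1
        hb = lcp2[j] if j < len(s2) else 0
    return out, out_lcp


merged, lcps = lcp_merge(["apple", "banana", "cherry"], [0, 0, 0],
                         ["apricot", "blueberry", "cherries"], [0, 0, 0])
# merged == ['apple', 'apricot', 'banana', 'blueberry', 'cherries', 'cherry']
# lcps   == [0, 2, 0, 1, 0, 5]
```

    Because the merge also emits an LCP array, its output can be fed into the next merge level without recomputing prefixes.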

    Parallel String Sample Sort

    We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations.
    Comment: 34 pages, 7 figures and 12 tables
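    The abstract names string sample sort without spelling out its classification step, so the following is only a sequential Python sketch under our own assumptions. Splitters are drawn from a random sample of fixed-length key prefixes (8 characters, standing in for one machine word). Each string goes into an alternating "between splitters" or "equal to a splitter" bucket by binary search, and equal buckets advance the depth by one whole key. All names and parameters are hypothetical, and the parallel, cache-aware and branch-free parts of the real algorithm are not modeled.

```python
import bisect
import random

KEY_CHARS = 8  # hypothetical: characters per key, a stand-in for one 64-bit machine word


def string_sample_sort(strings, depth=0, num_splitters=7, threshold=16, rng=None):
    """Sort a list of strings that all share their first `depth` characters."""
    rng = rng or random.Random(42)
    if len(strings) <= threshold:
        return sorted(strings)  # small inputs: fall back to a library sort
    # Key = the next KEY_CHARS characters after the common prefix (shorter if the string ends).
    keys = [s[depth:depth + KEY_CHARS] for s in strings]
    # Sample keys, sort the distinct ones, and keep evenly spaced splitters.
    sample = sorted(set(rng.sample(keys, min(len(keys), 3 * num_splitters))))
    if len(sample) > num_splitters:
        sample = [sample[(i + 1) * len(sample) // (num_splitters + 1)]
                  for i in range(num_splitters)]
    splitters = sample
    # Bucket 2i: keys between splitters i-1 and i. Bucket 2i+1: keys equal to splitter i.
    buckets = [[] for _ in range(2 * len(splitters) + 1)]
    for s, k in zip(strings, keys):
        i = bisect.bisect_left(splitters, k)
        if i < len(splitters) and splitters[i] == k:
            buckets[2 * i + 1].append(s)
        else:
            buckets[2 * i].append(s)
    out = []
    for b, bucket in enumerate(buckets):
        if not bucket:
            continue
        if b % 2 == 0:
            # Every splitter occurs in the input, so this bucket is strictly smaller.
            out.extend(string_sample_sort(bucket, depth, num_splitters, threshold, rng))
        elif len(splitters[b // 2]) < KEY_CHARS:
            out.extend(bucket)  # strings ended inside this key: they are all identical
        else:
            # Same full key: the next KEY_CHARS characters decide the order.
            out.extend(string_sample_sort(bucket, depth + KEY_CHARS,
                                          num_splitters, threshold, rng))
    return out
```

    Comparing whole 8-character keys instead of single characters is what the word-level idea buys: one comparison can resolve up to eight characters at once, and the equal buckets let long shared prefixes be skipped a key at a time.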

    CharBot: A Simple and Effective Method for Evading DGA Classifiers

    Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple and effective, and it requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.

    Variable Point Sources in Sloan Digital Sky Survey Stripe 82. I. Project Description and Initial Catalog (0 h < R.A. < 4 h)

    We report the first results of a study of variable point sources identified using multi-color time-series photometry from Sloan Digital Sky Survey (SDSS) Stripe 82 over a span of nearly 10 years (1998-2007). We construct a light-curve catalog of 221,842 point sources in the R.A. 0-4 h half of Stripe 82, limited to r = 22.0, that have at least 10 detections in the ugriz bands and color errors of < 0.2 mag. These objects are then classified by color and by cross-matching them to existing SDSS catalogs of interesting objects. We use inhomogeneous ensemble differential photometry techniques to greatly improve our sensitivity to variability. Robust variable identification methods are used to extract 6520 variable candidates in this dataset, resulting in an overall variable fraction of ~2.9% at the level of 0.05 mag variability. A search for periodic variables results in the identification of 30 eclipsing/ellipsoidal binary candidates, 55 RR Lyrae, and 16 Delta Scuti variables. We also identify 2704 variable quasars matched to the SDSS Quasar catalog (Schneider et al. 2007), as well as an additional 2403 quasar candidates identified by their non-stellar colors and variability properties. Finally, a sample of 11,328 point sources that appear to be nonvariable at the limits of our sensitivity is also discussed. (Abridged.)
    Comment: 67 pages, 27 figures. Accepted for publication in ApJS. Catalog available at http://shrike.pha.jhu.edu/stripe82-variable
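    As a rough illustration of ensemble differential photometry, the sketch below fits the additive model m[s, e] ≈ M[s] + Z[e] (a mean magnitude per star plus a zero point per epoch) by alternating unweighted least squares, with NaN marking a star that was not measured at an epoch. We read "inhomogeneous" as "not every star appears in every exposure"; that reading, the unweighted fit and the function name are our assumptions, and the paper's weighting, outlier rejection and variability tests are not reproduced.

```python
import numpy as np


def ensemble_zero_points(mags, n_iter=200):
    """Fit mags[s, e] ~ M[s] + Z[e] by alternating least squares, ignoring NaNs.

    mags: (n_stars, n_epochs) instrumental magnitudes, NaN where a star was not measured.
    Assumes every star and every epoch has at least one measurement.
    Returns (M, Z) with the zero points Z normalized to mean 0.
    """
    Z = np.zeros(mags.shape[1])
    for _ in range(n_iter):
        M = np.nanmean(mags - Z[None, :], axis=1)  # best star means for fixed zero points
        Z = np.nanmean(mags - M[:, None], axis=0)  # best zero points for fixed star means
        Z -= Z.mean()  # the model only fixes M + Z, so pin the overall offset
    M = np.nanmean(mags - Z[None, :], axis=1)  # star means consistent with the final Z
    return M, Z
```

    Subtracting the fitted zero points, `mags - Z[None, :]`, removes effects shared by all stars in an exposure (such as changing transparency), so the scatter left in each light curve can then be tested for variability.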

    An evaluation of DGA classifiers

    Domain Generation Algorithms (DGAs) are a popular technique used by contemporary malware for command-and-control (C&C) purposes. Such malware uses DGAs to create a set of domain names that, when resolved, provide the information necessary to establish a link to a C&C server. Automated discovery of such domain names in real-time DNS traffic is critical for network security, as it allows one to detect infections and, in some cases, to take countermeasures that disrupt the communication and identify infected machines. Detecting the specific DGA malware family gives the administrator valuable information about the kind of infection and the steps that need to be taken. In this paper we compare and evaluate machine learning methods that classify domain names as benign or DGA, and label the latter according to their malware family. Unlike previous work, we select data for the test and training sets according to observation time and known seeds. This allows us to assess the robustness of the trained classifiers for detecting domains generated by the same families at a different time or when seeds change. Our study includes tree ensemble models based on human-engineered features and deep neural networks that learn features automatically from domain names. We find that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed than from time-invariant DGAs. In addition, when applying the trained classifiers to a day of real traffic, we find that many domain names are unjustifiably flagged as malicious, thereby revealing the shortcomings of relying on a standard whitelist for training a production-grade DGA detection system.
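    The train/test selection by observation time and by known seed can be sketched as two filters over labeled observations. The `Record` fields (domain, label, optional known seed, observation date) and both function names are hypothetical. Keeping records with an unknown seed in the training set during the seed split is our assumption, not necessarily the paper's protocol.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Record:
    domain: str
    label: str           # "benign" or a malware family name
    seed: Optional[str]  # DGA seed if known, else None (hypothetical field)
    observed: date       # when the domain was observed


def split_by_time(records: List[Record], cutoff: date) -> Tuple[List[Record], List[Record]]:
    """Train on everything observed before `cutoff`, test on everything from `cutoff` on."""
    train = [r for r in records if r.observed < cutoff]
    test = [r for r in records if r.observed >= cutoff]
    return train, test


def split_by_seed(records: List[Record], held_out: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Test only on domains generated from held-out seeds; unknown seeds stay in training."""
    test = [r for r in records if r.seed is not None and r.seed in held_out]
    train = [r for r in records if r.seed is None or r.seed not in held_out]
    return train, test
```

    Either split keeps the test set disjoint from training along the dimension being probed (later dates or unseen seeds), which a random shuffle would not guarantee.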

    Chromospherically Active Stars in the RAVE Survey. I. The Catalogue

    RAVE, the unbiased, magnitude-limited survey of southern-sky stars, contained 456,676 medium-resolution spectra at the time of our analysis. The spectra cover the Ca II infrared triplet (IRT) range, a known indicator of chromospheric activity. Our previous work (Matijević et al. 2012) classified all spectra using locally linear embedding and identified 53,347 cases with a suggested emission component in the calcium lines. Here we use a spectral subtraction technique to measure the properties of this emission. Instead of synthetic templates, we use the observed spectra of non-active stars, which bypasses the difficult computation of non-LTE line-core profiles and their dependence on stellar parameters. We derive the equivalent width of the excess emission for each calcium line over a 5 Å wide interval, as well as their sum EW_IRT, for ~44,000 candidate active dwarf stars with S/N > 20, regardless of the source of their emission flux. Of these, ~14,000 show a detectable chromospheric flux at a confidence level of at least 2σ. Our set of active stars vastly enlarges previously known samples. Atmospheric parameters, and in some cases radial velocities, of active stars derived by the automatic pipeline suffer from systematic shifts due to their shallower calcium lines. We re-estimate the effective temperature, metallicity and radial velocity of the candidate active stars. The overall distribution of activity levels is bimodal, with the first peak coinciding with non-active stars and the second with pre-main-sequence stars. The catalogue will be made publicly available with the next RAVE public data releases.
    Comment: 13 pages, 9 figures
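    The excess-emission measurement can be sketched as follows, assuming the observed and non-active template spectra are continuum-normalized and resampled onto a common wavelength grid in Å. Under that assumption, integrating (observed minus template) over a 5 Å window around each line gives an equivalent width in Å, positive for excess emission (our sign convention). The line wavelengths are approximate air values and the function names are hypothetical. Uncertainties and the 2σ detection test are not modeled.

```python
import numpy as np

# Approximate air wavelengths (Å) of the three Ca II infrared-triplet lines.
CA_IRT = (8498.02, 8542.09, 8662.14)


def excess_ew(wave, flux_obs, flux_tmpl, center, width=5.0):
    """Equivalent width (Å) of flux_obs above flux_tmpl in a `width`-Å window around `center`.

    Both spectra are assumed continuum-normalized and sampled on the same grid `wave`.
    """
    in_window = np.abs(wave - center) <= width / 2
    x = wave[in_window]
    diff = flux_obs[in_window] - flux_tmpl[in_window]  # residual emission after subtraction
    # Trapezoidal integration of the residual over the window.
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(x)))


def ew_irt(wave, flux_obs, flux_tmpl):
    """Sum of the excess equivalent widths of the three IRT lines (the EW_IRT quantity)."""
    return sum(excess_ew(wave, flux_obs, flux_tmpl, c) for c in CA_IRT)
```

    Subtracting an observed non-active spectrum leaves only the emission filling the line cores, so a plain integral of the residual is enough; no model of the line profile is needed.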