Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines. Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
Parallel String Sample Sort
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we propose string sample sort. The algorithm makes effective use of
the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Additionally, we parallelize variants of multikey
quicksort and radix sort that are also useful in certain situations. Comment: 34 pages, 7 figures and 12 tables
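For readers unfamiliar with multikey quicksort, the following is a minimal sequential sketch of the classic three-way radix quicksort idea. It is not the parallel variant described in the abstract, and the function and helper names are illustrative assumptions.

```python
def multikey_quicksort(strings):
    """Return a sorted copy of `strings` (sequential multikey quicksort sketch)."""
    items = list(strings)

    def char_at(s, depth):
        # -1 stands for "end of string", so shorter strings sort first.
        return ord(s[depth]) if depth < len(s) else -1

    def sort(lo, hi, depth):
        # Sort items[lo:hi]; all of them share their first `depth` characters.
        while hi - lo > 1:
            pivot = char_at(items[(lo + hi) // 2], depth)
            lt, i, gt = lo, lo, hi - 1
            # Three-way partition on the character at position `depth`.
            while i <= gt:
                c = char_at(items[i], depth)
                if c < pivot:
                    items[lt], items[i] = items[i], items[lt]
                    lt += 1
                    i += 1
                elif c > pivot:
                    items[i], items[gt] = items[gt], items[i]
                    gt -= 1
                else:
                    i += 1
            sort(lo, lt, depth)        # strings with a smaller character
            sort(gt + 1, hi, depth)    # strings with a larger character
            if pivot == -1:
                return                 # middle block: identical, fully consumed strings
            # Middle block shares one more character: continue one level deeper.
            lo, hi, depth = lt, gt + 1, depth + 1

    sort(0, len(items), 0)
    return items
```

Because the middle partition only advances the character depth, characters in a shared prefix are not re-compared, which is the property that makes this family of algorithms attractive for strings.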
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Domain generation algorithms (DGAs) are commonly leveraged by malware to
create lists of domain names which can be used for command and control (C&C)
purposes. Approaches based on machine learning have recently been developed to
automatically detect generated domain names in real-time. In this work, we
present a novel DGA called CharBot which is capable of producing large numbers
of unregistered domain names that are not detected by state-of-the-art
classifiers for real-time detection of DGAs, including the recently published
methods FANCI (a random forest based on human-engineered features) and LSTM.MI
(a deep learning approach). CharBot is very simple, effective and requires no
knowledge of the targeted DGA classifiers. We show that retraining the
classifiers on CharBot samples is not a viable defense strategy. We believe
these findings show that DGA classifiers are inherently vulnerable to
adversarial attacks if they rely only on the domain name string to make a
decision. Designing a robust DGA classifier may, therefore, necessitate the use
of additional information besides the domain name alone. To the best of our
knowledge, CharBot is the simplest and most efficient black-box adversarial
attack against DGA classifiers proposed to date.
Variable Point Sources in Sloan Digital Sky Survey Stripe 82. I. Project Description and Initial Catalog (0 h < R.A. < 4 h)
We report the first results of a study of variable point sources identified
using multi-color time-series photometry from Sloan Digital Sky Survey (SDSS)
Stripe 82 over a span of nearly 10 years (1998-2007). We construct a
light-curve catalog of 221,842 point sources in the R.A. 0-4 h half of Stripe
82, limited to r = 22.0, that have at least 10 detections in the ugriz bands
and color errors of < 0.2 mag. These objects are then classified by color and
by cross-matching them to existing SDSS catalogs of interesting objects. We use
inhomogeneous ensemble differential photometry techniques to greatly improve
our sensitivity to variability. Robust variable identification methods are used
to extract 6520 variable candidates in this dataset, resulting in an overall
variable fraction of ~2.9% at the level of 0.05 mag variability. A search for
periodic variables results in the identification of 30 eclipsing/ellipsoidal
binary candidates, 55 RR Lyrae, and 16 Delta Scuti variables. We also identify
2704 variable quasars matched to the SDSS Quasar catalog (Schneider et al.
2007), as well as an additional 2403 quasar candidates identified by their
non-stellar colors and variability properties. Finally, a sample of 11,328
point sources that appear to be nonvariable at the limits of our sensitivity is
also discussed. (Abridged.) Comment: 67 pages, 27 figures. Accepted for publication in ApJS. Catalog
available at http://shrike.pha.jhu.edu/stripe82-variable
An evaluation of DGA classifiers
Domain Generation Algorithms (DGAs) are a popular technique used by contemporary malware for command-and-control (C&C) purposes. Such malware utilizes DGAs to create a set of domain names that, when resolved, provide information necessary to establish a link to a C&C server. Automated discovery of such domain names in real-time DNS traffic is critical for network security because it makes it possible to detect infections and, in some cases, to take countermeasures that disrupt the communication and identify infected machines. Detection of the specific DGA malware family gives the administrator valuable information about the kind of infection and the steps that need to be taken. In this paper, we compare and evaluate machine learning methods that classify domain names as benign or DGA, and label the latter according to their malware family. Unlike previous work, we select data for test and training sets according to observation time and known seeds. This allows us to assess the robustness of the trained classifiers for detecting domains generated by the same families at a different time or when seeds change. Our study includes tree ensemble models based on human-engineered features and deep neural networks that learn features automatically from domain names. We find that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed than those from time-invariant DGAs. In addition, when applying the trained classifiers to a day of real traffic, we find that many domain names are unjustifiably flagged as malicious, thereby revealing the shortcomings of relying on a standard whitelist for training a production-grade DGA detection system.
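The idea of selecting training and test data by observation time and by seed can be sketched as follows. This is only an illustration under assumed names: the `Sample` record, its fields, and both split functions are hypothetical and are not taken from the paper.

```python
from datetime import date
from typing import NamedTuple


class Sample(NamedTuple):
    # Hypothetical record layout, assumed for illustration only.
    domain: str
    family: str    # "benign" or a malware family label
    seed: str      # identifier of the DGA seed, empty for benign domains
    seen_on: date  # day the domain was observed


def split_by_time(samples, cutoff):
    """Train on samples observed before `cutoff`, test on the rest."""
    train = [s for s in samples if s.seen_on < cutoff]
    test = [s for s in samples if s.seen_on >= cutoff]
    return train, test


def split_by_seed(samples, held_out_seeds):
    """Keep every sample generated from a held-out seed out of training."""
    train = [s for s in samples if s.seed not in held_out_seeds]
    test = [s for s in samples if s.seed in held_out_seeds]
    return train, test
```

Splitting this way, rather than shuffling at random, keeps the test set from containing domains produced under the same time window or seed as the training set, which is what a robustness assessment of this kind requires.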
Chromospherically Active Stars in the RAVE Survey. I. The Catalogue
RAVE, the unbiased magnitude-limited survey of southern-sky stars,
contained 456,676 medium-resolution spectra at the time of our analysis.
The spectra cover the Ca II IRT range, which is a known indicator of chromospheric
activity. Our previous work (Matijevič et al. 2012) classified all spectra
using locally linear embedding. It identified 53,347 cases with a suggested
emission component in calcium lines. Here we use a spectral subtraction
technique to measure the properties of this emission. Synthetic templates are
replaced by the observed spectra of non-active stars to bypass the difficult
computations of non-LTE profiles of the line cores and stellar parameter
dependence. We derive both the equivalent width of the excess emission for each
calcium line on a 5 Å wide interval and their sum, EW_IRT, for ~44,000
candidate active dwarf stars with S/N > 20, irrespective of the source of
their emission flux. Of these, ~14,000 show a detectable chromospheric flux
at a confidence level of at least 2σ. Our set of active stars vastly
enlarges previously known samples. Atmospheric parameters and, in some cases,
radial velocities of active stars derived by the automatic pipeline suffer from
systematic shifts due to their shallower calcium lines. We re-estimate the
effective temperature, metallicity and radial velocities for candidate active
stars. The overall distribution of activity levels shows a bimodal shape, with
the first peak coinciding with non-active stars and the second with the
pre-main-sequence cases. The catalogue will be publicly available with the next
RAVE public data releases. Comment: 13 pages, 9 figures