Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy
Data collection for scientific applications is increasing exponentially and
is forecasted to soon reach peta- and exabyte scales. Applications which
process and analyze scientific data must be scalable and focus on execution
performance to keep pace. In the field of radio astronomy, in addition to
increasingly large datasets, tasks such as the identification of transient
radio signals from extrasolar sources are computationally expensive. We present
a scalable approach to radio pulsar detection written in Scala that
parallelizes candidate identification to take advantage of in-memory task
processing using Apache Spark on a YARN distributed system. Furthermore, we
introduce a novel automated multiclass supervised machine learning technique
that we combine with feature selection to reduce the time required for
candidate classification. Experimental testing on a Beowulf cluster with 15
data nodes shows that the parallel implementation of the identification
algorithm offers a speedup of up to 5X that of a similar multithreaded
implementation. Further, we show that the combination of automated multiclass
classification and feature selection speeds up the execution performance of the
RandomForest machine learning algorithm by an average of 54% with less than a
2% average reduction in the algorithm's ability to correctly classify pulsars.
The generalizability of these results is demonstrated by using two real-world
radio astronomy data sets. Comment: In Proceedings of the 47th International Conference on Parallel
Processing (ICPP 2018). ACM, New York, NY, USA, Article 11, 11 pages.
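The abstract above describes parallelizing candidate identification across a cluster. As a rough illustration only, the sketch below splits a series into independent chunks, scans each chunk for samples above a threshold, and merges the hits. It is not the authors' Scala/Spark implementation: the thread pool stands in for distributed partitions, and the function names, the threshold rule, and the sample values are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def find_candidates(chunk, threshold):
    """Return (index, value) pairs in one chunk whose value exceeds the threshold."""
    offset, values = chunk
    return [(offset + i, v) for i, v in enumerate(values) if v > threshold]


def parallel_candidates(series, chunk_size, threshold, workers=4):
    """Scan fixed-size chunks of the series concurrently and merge the hits."""
    # Keep each chunk's start offset so hits can be reported with global indices.
    chunks = [
        (start, series[start:start + chunk_size])
        for start in range(0, len(series), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so the merged list stays sorted by index.
        per_chunk = pool.map(lambda c: find_candidates(c, threshold), chunks)
        return [hit for hits in per_chunk for hit in hits]


# Hypothetical signal-to-noise samples containing two bright events.
snr = [1.0, 0.8, 7.5, 1.1, 0.9, 1.2, 9.3, 1.0]
candidates = parallel_candidates(snr, chunk_size=3, threshold=5.0)
```

Because each chunk is processed independently, the same structure maps naturally onto a partitioned dataset, which is the property the paper relies on for scaling.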
Verifying baselines for crisis event information classification on Twitter
Social media are rich information sources during and in the aftermath of crisis events such as earthquakes and terrorist attacks. Despite myriad challenges, with the right tools, significant insight can be gained which can assist emergency responders and related applications. However, most extant approaches are not directly comparable, using bespoke definitions, models, datasets and even evaluation metrics. Furthermore, it is rare that code, trained models, or exhaustive parametrisation details are made openly available. Thus, even confirmation of self-reported performance is problematic; authoritatively determining the state of the art (SOTA) is essentially impossible. Consequently, to begin addressing such endemic ambiguity, this paper seeks to make three contributions: 1) the replication and results confirmation of a leading (and generalisable) technique; 2) testing straightforward modifications of the technique likely to improve performance; and 3) the extension of the technique to a novel and complementary type of crisis-relevant information to demonstrate its generalisability.
MS-ADS: multistage spectrogram image-based anomaly detection system for IoT security.
The Internet-of-Things (IoT) architecture has gained tremendous popularity over the last decade, resulting in an exponential increase in the number of connected devices and the volume of data processed in IoT networks. Since IoT devices collect a massive amount of sensitive information that is exchanged over the traditional internet, security has become a prime concern as network anomalies occur more frequently. A network-based anomaly detection system can provide a much-needed, efficient security solution for the IoT network by detecting anomalies at the network entry points through constant traffic monitoring. Despite enormous efforts by researchers, these detection systems still suffer from low detection accuracy and generate high false alarm and false-negative rates when classifying network traffic. To this end, this paper proposes an efficient Multistage Spectrogram image-based network Anomaly Detection System (MS-ADS) using a deep convolutional neural network that utilizes a short-time Fourier transform to transform flow features into spectrogram images. The results demonstrate that the proposed method achieves a high detection accuracy of 99.98% and reduces the false alarm rate to 0.006% in classifying network traffic. The proposed scheme also improves the prediction of anomaly instances by 0.75% to 4.82% compared with the benchmark methodologies, demonstrating its efficiency for IoT networks. To minimize the computational and training cost of the model re-training phase, the proposed solution demonstrates that only 40,500 network flows from the dataset suffice to achieve a detection accuracy of 99.5%.
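The abstract above mentions using a short-time Fourier transform to turn flow features into spectrogram images. The following is a minimal sketch of that general transform only, using a Hann window and a direct DFT in pure Python. The frame length, hop size, window choice, and the synthetic input sequence are assumptions made for illustration; the abstract does not state the paper's actual parameters or how features are arranged before the transform.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Return a magnitude spectrogram: one row per frame, one column per frequency bin."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame, keeping the non-negative frequency bins.
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Hypothetical feature sequence: a sinusoid completing 2 cycles every 8 samples.
flow_feature = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spectrogram = stft_magnitude(flow_feature)
```

Each row of `spectrogram` can be read as one column of pixels in a time-frequency image; here the energy concentrates in bin 2 of every frame, matching the input frequency. A production pipeline would typically use an FFT library instead of the direct DFT shown here.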
Efficient Image Gallery Representations at Scale Through Multi-Task Learning
Image galleries provide a rich source of diverse information about a product
which can be leveraged across many recommendation and retrieval applications.
We study the problem of building a universal image gallery encoder through
a multi-task learning (MTL) approach and demonstrate that it is indeed a
practical way to achieve generalizability of learned representations to new
downstream tasks. Additionally, we analyze the relative predictive performance
of MTL-trained solutions against optimal and substantially more expensive
solutions, and find signals that MTL can be a useful mechanism to address
sparsity in low-resource binary tasks. Comment: Proceedings of the 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval.