
    The Whole is Greater than the Sum of the Parts: Optimizing the Joint Science Return from LSST, Euclid and WFIRST

    The focus of this report is on the opportunities enabled by the combination of LSST, Euclid and WFIRST, the optical surveys that will be an essential part of the next decade's astronomy. The sum of these surveys has the potential to be significantly greater than the contributions of the individual parts. As is detailed in this report, the combination of these surveys should give us multi-wavelength high-resolution images of galaxies and broadband data covering much of the stellar energy spectrum. These stellar and galactic data have the potential of yielding new insights into topics ranging from the formation history of the Milky Way to the mass of the neutrino. However, enabling the astronomy community to fully exploit this multi-instrument data set is a challenging technical task: for much of the science, we will need to combine the photometry across multiple wavelengths with varying spectral and spatial resolution. We identify some of the key science enabled by the combined surveys and the key technical challenges in achieving the synergies. Comment: Whitepaper developed at June 2014 U. Penn Workshop; 28 pages, 3 figures

    Catalog of quasars from the Kilo-Degree Survey Data Release 3

    We present a catalog of quasars selected from broad-band photometric ugri data of the Kilo-Degree Survey Data Release 3 (KiDS DR3). The QSOs are identified by a random forest (RF) supervised machine learning model, trained on SDSS DR14 spectroscopic data. We first cleaned the input KiDS data of entries with excessively noisy, missing, or otherwise problematic measurements. Applying a feature importance analysis, we then tuned the algorithm and identified in the KiDS multiband catalog the 17 most useful features for the classification, namely magnitudes, colors, magnitude ratios, and the stellarity index. We used the t-SNE algorithm to map the multi-dimensional photometric data onto 2D planes and compare the coverage of the training and inference sets. We limited the inference set to r<22 to avoid extrapolation beyond the feature space covered by training, as the SDSS spectroscopic sample is considerably shallower than KiDS. This gives 3.4 million objects in the final inference sample, from which the random forest identified 190,000 quasar candidates. Accuracy of 97%, purity of 91%, and completeness of 87%, as derived from a test set extracted from SDSS and not used in the training, are confirmed by comparison with external spectroscopic and photometric QSO catalogs overlapping with the KiDS footprint. The robustness of our results is strengthened by number counts of the quasar candidates in the r band, as well as by their mid-infrared colors available from WISE. An analysis of parallaxes and proper motions of our QSO candidates also found in Gaia DR2 suggests that a probability cut of p(QSO)>0.8 is optimal for purity, whereas p(QSO)>0.7 is preferable for better completeness.
    Our study presents the first comprehensive quasar selection from deep high-quality KiDS data and will serve as the basis for versatile studies of the QSO population detected by this survey. Comment: Data available from the KiDS website at http://kids.strw.leidenuniv.nl/DR3/quasarcatalog.php and the source code from https://github.com/snakoneczny/kids-quasar
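
The trade-off between the two probability cuts quoted in the abstract above can be illustrated with a minimal sketch. Purity is the fraction of selected candidates that are true quasars, and completeness is the fraction of true quasars that are selected. The function name `purity_completeness` and the toy numbers are assumptions made for illustration; they are not taken from the catalog or its source code.

```python
def purity_completeness(probs, is_quasar, threshold):
    """Return (purity, completeness) for candidates with p(QSO) > threshold."""
    # Truth labels of the objects that pass the probability cut.
    selected = [truth for p, truth in zip(probs, is_quasar) if p > threshold]
    true_pos = sum(selected)
    total_true = sum(is_quasar)
    purity = true_pos / len(selected) if selected else float("nan")
    completeness = true_pos / total_true if total_true else float("nan")
    return purity, completeness


# Toy data, invented for illustration only (not catalog values).
probs = [0.95, 0.85, 0.75, 0.72, 0.40, 0.10]
is_quasar = [True, True, True, False, False, False]

for cut in (0.7, 0.8):
    purity, completeness = purity_completeness(probs, is_quasar, cut)
    print(f"p(QSO) > {cut}: purity={purity:.2f}, completeness={completeness:.2f}")
```

On this toy sample the stricter cut raises purity and lowers completeness, which is the general direction of the trade-off the abstract describes.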

    Reduction of False Positives in Intrusion Detection Based on Extreme Learning Machine with Situation Awareness

    Protecting computer networks from intrusions is more important than ever for our privacy, economy, and national security. Seemingly a month does not pass without news of a major data breach involving sensitive personal identity, financial, medical, trade secret, or national security data. Democratic processes can now potentially be compromised through breaches of electronic voting systems. As more devices, including medical machines, automobiles, and control systems for critical infrastructure, become networked, human life is also more at risk from cyber-attacks. Research into Intrusion Detection Systems (IDSs) began several decades ago, and IDSs are still a mainstay of computer and network protection and continue to evolve. However, detecting previously unseen, or zero-day, threats is still an elusive goal. Many commercial IDS deployments still use misuse detection based on known threat signatures. In academic research, systems utilizing anomaly detection have shown great promise in detecting previously unseen threats, but their success has been limited, in large part by the excessive number of false positives that they produce. This research demonstrates that false positives can be better minimized, while maintaining detection accuracy, by combining Extreme Learning Machine (ELM) and Hidden Markov Models (HMM) as classifiers within the context of a situation awareness framework. This research was performed using the University of New South Wales - Network Based 2015 (UNSW-NB15) data set, which is more representative of contemporary cyber-attack and normal network traffic than older data sets typically used in IDS research. It is shown that this approach provides better results than either HMM or ELM alone, with a lower False Positive Rate (FPR) than other comparable approaches that also used the UNSW-NB15 data set.

    Hidden Markov Model Based Intrusion Alert Prediction

    Intrusion detection is only a starting step in securing IT infrastructure. Prediction of intrusions is the next step to provide an active defense against incoming attacks. Most existing intrusion prediction methods focus on predicting either the intrusion type or the intrusion category. Also, most of them are built on domain knowledge and specific scenario knowledge. This thesis proposes an alert prediction framework which provides more detailed information than just the intrusion type or category to initiate possible defensive measures. The proposed algorithm is based on a hidden Markov model and does not depend on specific domain knowledge. Instead, it depends on a training process. Hence the proposed algorithm is adaptable to different conditions. Also, it is based on prediction of the next alert cluster, which contains source IP address, destination IP range, alert type and alert category. Hence, prediction of the next alert cluster provides more information about future strategies of the attacker. Experiments on a public data set produced over 2500 alert predictions. The proposed alert prediction framework achieved accuracy of 81% and 77% for single-step and five-step predictions, respectively, for prediction of the next alert cluster. It also achieved prediction accuracy of 95% and 92% for single-step and five-step predictions, respectively, for prediction of the next alert category. The proposed methods achieved a 5% improvement in prediction accuracy for alert category over a variable-length Markov based alert prediction method, while providing more information for a possible defense.
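
As a rough illustration of single-step prediction with a hidden Markov model, the sketch below filters the hidden state with the standard forward algorithm and then returns the distribution over the next observation. It is not the thesis's algorithm: the function name `predict_next_observation`, the integer encoding of alert clusters as observation symbols, and all probabilities are assumptions made for this example.

```python
def predict_next_observation(init, trans, emit, observations):
    """Return P(next observation | observations so far) for a discrete HMM.

    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][k]  : P(observation = k | state = i)
    Assumes every observed symbol has non-zero probability under the model.
    """
    n_states = len(init)
    n_symbols = len(emit[0])

    # Forward pass, normalised at every step to avoid numerical underflow.
    belief = [init[i] * emit[i][observations[0]] for i in range(n_states)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for obs in observations[1:]:
        predicted = [
            sum(belief[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        belief = [predicted[j] * emit[j][obs] for j in range(n_states)]
        total = sum(belief)
        belief = [b / total for b in belief]

    # Propagate one step ahead, then marginalise over the hidden state.
    next_state = [
        sum(belief[i] * trans[i][j] for i in range(n_states))
        for j in range(n_states)
    ]
    return [
        sum(next_state[j] * emit[j][k] for j in range(n_states))
        for k in range(n_symbols)
    ]


# Hypothetical two-state model with two alert-cluster symbols (0 and 1).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]
distribution = predict_next_observation(init, trans, emit, [0, 0, 1])
predicted_cluster = max(range(len(distribution)), key=distribution.__getitem__)
print(distribution, predicted_cluster)
```

In a trained system the three parameter tables would come from a fitting procedure such as Baum-Welch rather than being written by hand.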

    Survey of Vector Database Management Systems

    There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search for a staggering half century and more. Driving this shift from algorithms to systems are new data-intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist for addressing these needs; however, there is no comprehensive survey that thoroughly reviews these techniques and systems. We start by identifying five main obstacles to vector data management, namely vagueness of semantic similarity, large size of vectors, high cost of similarity comparison, lack of natural partitioning that can be used for indexing, and difficulty of efficiently answering hybrid queries that require both attributes and vectors. Overcoming these obstacles has led to new approaches to query processing, storage and indexing, and query optimization and execution. For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning based on randomization, learning partitioning, and navigable partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, and hardware accelerated execution. These techniques lead to a variety of VDBMSs across a spectrum of design and runtime characteristics, including native systems specialized for vectors and extended systems that incorporate vector capabilities into existing systems. We then discuss benchmarks, and finally we outline research challenges and point the direction for future work. Comment: 25 pages
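
A hybrid query, as described in the abstract above, combines an attribute condition with a vector similarity ranking. The sketch below shows one simple strategy, pre-filtering on the attribute and then scanning the survivors by brute force using cosine similarity. It is an illustration under stated assumptions, not a technique attributed to any particular system in the survey: the names `hybrid_top_k` and `cosine_similarity`, the record layout, and the data are all hypothetical, and real systems would typically use an index instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_top_k(items, query, k, predicate):
    """Keep items that satisfy the attribute predicate, then rank by similarity."""
    candidates = [item for item in items if predicate(item)]
    candidates.sort(
        key=lambda item: cosine_similarity(item["vector"], query),
        reverse=True,
    )
    return candidates[:k]


# Hypothetical records: an attribute ("lang") plus an embedding ("vector").
items = [
    {"id": "a", "lang": "en", "vector": [1.0, 0.0]},
    {"id": "b", "lang": "en", "vector": [0.6, 0.8]},
    {"id": "c", "lang": "de", "vector": [0.9, 0.1]},
]
result = hybrid_top_k(
    items, query=[1.0, 0.0], k=2, predicate=lambda item: item["lang"] == "en"
)
print([item["id"] for item in result])
```

Filtering before the similarity scan keeps the result exact but does no less work when the predicate is unselective, which is one reason hybrid queries are listed among the obstacles.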

    Report by the ESA-ESO Working Group on Fundamental Cosmology

    ESO and ESA agreed to establish a number of Working Groups to explore possible synergies between these two major European astronomical institutions. This Working Group's mandate was to concentrate on fundamental questions in cosmology, and the scope for tackling these in Europe over the next ~15 years. One major resulting recommendation concerns the provision of new generations of imaging survey, where the image quality and near-IR sensitivity that can be attained only in space are naturally matched by ground-based imaging and spectroscopy to yield massive datasets with well-understood photometric redshifts (photo-z's). Such information is essential for a range of new cosmological tests using gravitational lensing, large-scale structure, clusters of galaxies, and supernovae. Great scope in future cosmology also exists for ELT studies of the intergalactic medium and space-based studies of the CMB and gravitational waves; here the synergy is less direct, but these areas will remain of the highest mutual interest to the agencies. All these recommended facilities will produce vast datasets of general applicability, which will have a tremendous impact on broad areas of astronomy. Comment: ESA-ESO Working Groups Report No. 3, 125 pages, 28 figures. A PDF version including the cover is available from http://www.stecf.org/coordination/esa_eso/cosmology/report_cover.pdf and a printed version (A5 booklet) is available in limited numbers from the Space Telescope-European Coordinating Facility (ST-ECF): [email protected]

    Deep Learning-Based Intrusion Detection Methods for Computer Networks and Privacy-Preserving Authentication Method for Vehicular Ad Hoc Networks

    The incidence of computer network intrusions has significantly increased over the last decade, partially attributed to a thriving underground cyber-crime economy and the widespread availability of advanced tools for launching such attacks. To counter these attacks, researchers in both academia and industry have turned to machine learning (ML) techniques to develop Intrusion Detection Systems (IDSes) for computer networks. However, many of the datasets used to train ML classifiers for detecting intrusions are not balanced, with some classes having fewer samples than others. This can result in ML classifiers producing suboptimal results. In this dissertation, we address this issue and present better ML-based solutions for intrusion detection. Our contributions in this direction can be summarized as follows: Balancing Data Using Synthetic Data to detect intrusions in Computer Networks: In the past, researchers addressed the issue of imbalanced data in datasets by using over-sampling and under-sampling techniques. In this study, we go beyond such traditional methods and utilize a synthetic data generation method called Conditional Generative Adversarial Network (CTGAN) to balance the datasets and investigate its impact on the performance of widely used ML classifiers. To the best of our knowledge, no one else has used CTGAN to generate synthetic samples for balancing intrusion detection datasets. We use two widely used publicly available datasets, conduct extensive experiments, and show that ML classifiers trained on these datasets balanced with synthetic samples generated by CTGAN have higher prediction accuracy and Matthews Correlation Coefficient (MCC) scores than those trained on imbalanced datasets by 8% and 13%, respectively.
Deep Learning approach for intrusion detection using focal loss function: To overcome the data imbalance problem for intrusion detection, we leverage a specialized loss function, called focal loss, that automatically down-weights easy examples and focuses on the hard negatives by facilitating dynamically scaled-gradient updates for training ML models effectively. We implement our approach using two well-known Deep Learning (DL) neural network architectures. Compared to training DL models using the cross-entropy loss function, our approach (training DL models using the focal loss function) improved accuracy, precision, F1 score, and MCC score by 24%, 39%, 39%, and 60%, respectively. Efficient Deep Learning approach to detect Intrusions using Few-shot Learning: To address the issue of imbalanced datasets and develop a highly effective IDS, we utilize the concept of few-shot learning. We present a Few-Shot and Self-Supervised learning framework, called FS3, for detecting intrusions in IoT networks. FS3 works in three phases. Our approach involves first pretraining an encoder on a large-scale external dataset in a self-supervised manner. We then employ few-shot learning (FSL), which seeks to replicate the encoder's ability to learn new patterns from only a few training examples. During the encoder training using a small number of samples, we train them contrastively, utilizing the triplet loss function. The third phase introduces a novel K-Nearest neighbor algorithm that subsamples the majority class instances to further reduce imbalance and improve overall performance. Our proposed framework FS3, utilizing only 20% of labeled data, outperforms fully supervised state-of-the-art models by up to 42.39% and 43.95% with respect to the metrics precision and F1 score, respectively. The rapid evolution of the automotive industry and advancements in wireless communication technologies will result in the widespread deployment of Vehicular ad hoc networks (VANETs).
However, despite the network's potential to enable intelligent and autonomous driving, it also introduces various attack vectors that can jeopardize its security. In this dissertation, we present an efficient privacy-preserving authenticated message dissemination scheme in VANETs. Conditional Privacy-preserving Authentication and Message Dissemination Scheme using Timestamp based Pseudonyms: To authenticate a message sent by a vehicle using its pseudonym, a certificate of the pseudonym signed by the central authority is generally utilized. If a vehicle is found to be malicious, certificates associated with all the pseudonyms assigned to it must be revoked. Certificate revocation lists (CRLs) should be shared with all entities that will be corresponding with the vehicle. As each vehicle has a large pool of pseudonyms allocated to it, the CRL can quickly grow in size as the number of revoked vehicles increases. This results in high storage overheads for storing the CRL, and significant authentication overheads, as the receivers must check their CRL for each message received to verify its pseudonym. To address this issue, we present a timestamp-based pseudonym allocation scheme that reduces the storage overhead and authentication overhead by streamlining the CRL management process.
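
The focal loss mentioned in the abstract above is commonly written, for a single binary example, as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the probability the model assigns to the true class. The sketch below is a minimal scalar version of that general formula, not the dissertation's implementation: the function names and the value gamma = 2 are illustrative assumptions.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=None):
    """Focal loss for one example.

    p     : predicted probability of the positive class, strictly in (0, 1)
    y     : true label, 0 or 1
    gamma : focusing parameter; gamma = 0 gives plain cross-entropy
    alpha : optional class weight for the positive class (1 - alpha for negatives)
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    weight = 1.0
    if alpha is not None:
        weight = alpha if y == 1 else 1.0 - alpha
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Binary cross-entropy, expressed as focal loss with gamma = 0."""
    return binary_focal_loss(p, y, gamma=0.0)


# An easy example (p_t = 0.9) is scaled down far more than a hard one (p_t = 0.1).
easy_ratio = binary_focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = binary_focal_loss(0.1, 1) / cross_entropy(0.1, 1)
print(easy_ratio, hard_ratio)
```

The ratios equal the modulating factor (1 - p_t)^gamma, which is how the loss shifts training effort toward examples the model currently gets wrong.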