
    Ensemble Methods for Anomaly Detection

    Anomaly detection has applications in many areas, such as intrusion detection, fraud detection, and medical diagnosis. Most current techniques are specialized for detecting one type of anomaly, and work well only in specific domains and when the data satisfies specific assumptions. We address this problem by proposing ensemble anomaly detection techniques that perform well across many applications, with four major contributions: using bootstrapping to better detect anomalies on multiple subsamples, sequential application of diverse detection algorithms, a novel adaptive sampling and learning algorithm in which the anomalies are iteratively examined, and improved random forest algorithms for detecting anomalies in streaming data. We design and evaluate multiple ensemble strategies using score normalization, rank aggregation and majority voting to combine the results from six well-known base algorithms. We propose a bootstrapping algorithm in which anomalies are evaluated from multiple subsets of the data. Results show that our independent ensemble performs better than the base algorithms, and that bootstrapping achieves competitive quality and faster runtime compared with existing work. We develop new sequential ensemble algorithms in which the second algorithm performs anomaly detection based on the first algorithm's outputs; the best results are obtained by combining algorithms that are substantially different. We propose a novel adaptive sampling algorithm which uses the scores output by the base algorithm to identify hard-to-detect examples, and iteratively resamples more points from such examples in a completely unsupervised context. On streaming datasets, we analyze the impact of the parameters used in random trees, and propose new algorithms that work well with high-dimensional data, improving performance without increasing the number of trees or their heights. We show that further improvements can be obtained with an Evolutionary Algorithm.
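    The three combination strategies named above can be illustrated with a minimal Python sketch. It assumes each base detector returns one score per data point, with higher meaning more anomalous, and shows generic versions of each rule: z-score normalization followed by averaging, mean-rank aggregation, and top-k majority voting. The function names and the `top_k` parameter are hypothetical; the abstract does not say which normalization or voting rule the authors actually used.

```python
from statistics import mean, pstdev
from typing import List, Sequence

# Assumption: every detector scores the same points in the same order,
# and a higher score means "more anomalous".


def zscore(scores: Sequence[float]) -> List[float]:
    """Standardize one detector's scores so detectors on different scales are comparable."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine_by_normalized_mean(all_scores: Sequence[Sequence[float]]) -> List[float]:
    """Score normalization: average the z-scored outputs of every detector, point by point."""
    normalized = [zscore(s) for s in all_scores]
    return [mean(col) for col in zip(*normalized)]


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 = most anomalous (highest score); ties are broken by point index."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def combine_by_mean_rank(all_scores: Sequence[Sequence[float]]) -> List[float]:
    """Rank aggregation: average each point's rank across detectors (lower = more anomalous)."""
    all_ranks = [ranks(s) for s in all_scores]
    return [mean(col) for col in zip(*all_ranks)]


def combine_by_majority_vote(all_scores: Sequence[Sequence[float]], top_k: int) -> List[int]:
    """Majority voting: each detector flags its top_k points; keep points flagged by more than half."""
    votes = [0] * len(all_scores[0])
    for scores in all_scores:
        for i, r in enumerate(ranks(scores)):
            if r <= top_k:
                votes[i] += 1
    return [i for i, v in enumerate(votes) if 2 * v > len(all_scores)]


if __name__ == "__main__":
    # Three hypothetical detectors scoring the same five points on different scales.
    detectors = [
        [0.1, 0.2, 0.9, 0.15, 0.3],
        [1.0, 2.0, 8.0, 1.5, 7.0],
        [5.0, 4.0, 9.0, 3.0, 2.0],
    ]
    print(combine_by_normalized_mean(detectors))
    print(combine_by_mean_rank(detectors))
    print(combine_by_majority_vote(detectors, top_k=2))
```

    In this toy input, point 2 is ranked first by every detector, so it comes out on top under all three rules; point 4, which two of the three detectors place in their top two, also survives the majority vote. Normalizing or ranking before combining matters because the raw scores live on different scales.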

    Developing a disturbance index and extreme land surface temperature in the western United States


    Applying Machine Learning to Cyber Security

    Intrusion Detection Systems (IDSs) are nowadays an important part of system security. In recent years many methods have been proposed to implement this kind of security measure against cyber attacks, including methods based on Machine Learning and Data Mining. In this work we discuss in detail the family of anomaly-based IDSs, which are able to detect previously unseen attacks, paying particular attention to adherence to the FAIR principles. These principles include the Accessibility and the Reusability of software. Moreover, since the purpose of this work is to assess the state of the art, we selected three approaches according to their reproducibility and compared their performance in a common experimental setting. Lastly, a real-world use case was analyzed, resulting in the proposal of an unsupervised ML model for pre-processing and analyzing web server logs. The proposed solution uses clustering and outlier detection techniques to detect attacks in an unsupervised way.

    Quantifying the impact of BOReal forest fires on Tropospheric oxidants over the Atlantic using Aircraft and Satellites (BORTAS) experiment: design, execution and science overview

    We describe the design and execution of the BORTAS (Quantifying the impact of BOReal forest fires on Tropospheric oxidants over the Atlantic using Aircraft and Satellites) experiment, whose overarching objective is to understand the chemical aging of air masses that contain the emission products of seasonal boreal wildfires, and how these air masses subsequently affect downwind atmospheric composition. The central focus of the experiment was a two-week deployment of the UK BAe-146-301 Atmospheric Research Aircraft (ARA) over eastern Canada, based out of Halifax, Nova Scotia. The ARA deployment originally planned for July 2010 was postponed by 12 months because of UK-based flights related to the dispersal of material emitted by the Eyjafjallajökull volcano; the associated ground-based and sonde measurements over Canada and the Azores nevertheless went ahead and constituted phase A of the experiment. Phase B of BORTAS, in July 2011, involved the same atmospheric measurements but also included the ARA, special satellite observations and a more comprehensive ground-based measurement suite. The high-frequency aircraft data provided a comprehensive chemical snapshot of pyrogenic plumes from wildfires, corresponding to photochemical (and physical) ages ranging from 45 sr 10 days, largely by virtue of widespread fires over Northwestern Ontario. Airborne measurements reported a large number of emitted gases, including semi-volatile species, some of which have not been previously reported in pyrogenic plumes; the corresponding emission ratios for common gases agree with previous work. Analysis of the NOy data shows evidence of net ozone production in pyrogenic plumes, controlled by aerosol abundance, which increases as a function of photochemical age. The coordinated ground-based and sonde data provided detailed but spatially limited information that put the aircraft data into the context of the longer burning season in the boundary layer. Ground-based measurements of particulate matter smaller than 2.5 μm (PM2.5) over Halifax show that forest fires can, on an episodic basis, contribute substantially to total surface PM2.5.

    A New Satellite-Based Methodology for Continental-Scale Disturbance Detection

    The timing, location, and magnitude of major disturbance events are currently major uncertainties in the global carbon cycle. Accurate information on the location, spatial extent, and duration of disturbance at the continental scale is needed to evaluate the ecosystem impacts of land cover changes due to wildfire, insect epidemics, flooding, climate change, and human-triggered land use. This paper describes an algorithm developed to serve as an automated, economical, systematic disturbance detection index for global application, using Moderate Resolution Imaging Spectroradiometer (MODIS)/Aqua Land Surface Temperature (LST) and Terra/MODIS Enhanced Vegetation Index (EVI) data from 2003 to 2004. The algorithm is based on the consistent radiometric relationship between LST and EVI computed on a pixel-by-pixel basis. We used annual maximum composite LST data to detect fundamental changes in land–surface energy partitioning, while avoiding the high natural variability associated with tracking LST at daily, weekly, or seasonal time frames. Potential disturbance events detected by our algorithm were verified by demonstrating their close association with independently confirmed, well-documented historical wildfire events throughout the study domain. We also examined the response of the disturbance index to irrigation by comparing a heavily irrigated poplar tree farm to the adjacent semiarid vegetation. Anomalous disturbance results were further examined by association with precipitation variability across areas of the study domain known for large interannual vegetation variability. The results show that our algorithm detects the location and spatial extent of wildfire with precision, is sensitive to the incremental process of recovery of disturbed landscapes, and is strongly sensitive to irrigation. Disturbance detection in areas with high interannual variability of precipitation will benefit from a multiyear data set to better separate natural variability from true disturbance.
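    A minimal per-pixel sketch of a ratio-based disturbance index, in Python. It assumes the index is the current year's ratio of annual maximum LST to annual maximum EVI, divided by the mean of that ratio over earlier baseline years, so that values near 1 indicate no change. The abstract states only that the algorithm rests on the relationship between LST and EVI computed pixel by pixel, so this exact formula, the kelvin units, and the hypothetical `tolerance` threshold in `is_disturbed` are assumptions rather than the paper's definition.

```python
from statistics import mean
from typing import Sequence, Tuple

# Assumption: LST is an annual maximum in kelvin and EVI is the annual maximum EVI
# for the same pixel; both come from some upstream compositing step not shown here.


def lst_evi_ratio(lst_max: float, evi_max: float) -> float:
    """Ratio of annual maximum LST to annual maximum EVI for one pixel."""
    if evi_max <= 0:
        raise ValueError("annual maximum EVI must be positive to form the ratio")
    return lst_max / evi_max


def disturbance_index(
    history: Sequence[Tuple[float, float]], current_lst: float, current_evi: float
) -> float:
    """Assumed form: current-year LST/EVI ratio divided by the mean ratio of baseline years."""
    baseline = mean(lst_evi_ratio(lst, evi) for lst, evi in history)
    return lst_evi_ratio(current_lst, current_evi) / baseline


def is_disturbed(di: float, tolerance: float = 0.25) -> bool:
    """Hypothetical rule: flag a pixel whose index departs from 1 by more than `tolerance`."""
    return abs(di - 1.0) > tolerance


if __name__ == "__main__":
    # Two hypothetical baseline years (LST max in K, EVI max) for one pixel.
    baseline_years = [(310.0, 0.5), (312.0, 0.52)]
    burned = disturbance_index(baseline_years, 320.0, 0.25)   # hotter surface, less vegetation
    stable = disturbance_index(baseline_years, 311.0, 0.51)   # close to the baseline years
    print(round(burned, 3), is_disturbed(burned))
    print(round(stable, 3), is_disturbed(stable))
```

    Under this formulation a wildfire, which raises surface temperature and removes vegetation, pushes the ratio well above its baseline, while irrigation, which cools and greens the surface, pushes it below. Using annual maxima rather than daily values mirrors the abstract's point about avoiding short-term LST variability.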

    From Intrusion Detection to Attacker Attribution: A Comprehensive Survey of Unsupervised Methods

    Over the last five years there has been an increase in the frequency and diversity of network attacks, as more and more organisations admit to compromises on a daily basis. Many misuse- and anomaly-based Intrusion Detection Systems (IDSs) that rely on signatures, supervised or statistical methods have been proposed in the literature, but their trustworthiness is debatable. Moreover, as this work uncovers, current IDSs are based on obsolete attack classes that do not reflect current attack trends. For these reasons, this paper provides a comprehensive overview of unsupervised and hybrid methods for intrusion detection, discussing their potential in the domain. We also present and highlight the importance of the feature engineering techniques that have been proposed for intrusion detection. Furthermore, we argue that current IDSs should evolve from simple detection to correlation and attribution. We describe how IDS data could be used, with advanced data analytics techniques, to reconstruct and correlate attacks and to identify attackers. Finally, we show how the present IDS attack classes can be extended to match modern attacks, and propose three new classes concerning outgoing network communication.

    Non-Invasive Ambient Intelligence in Real Life: Dealing with Noisy Patterns to Help Older People

    This paper aims to contribute to the field of ambient intelligence from the perspective of real environments, where noise levels in datasets are significant, by showing how machine learning techniques can contribute to knowledge creation by promoting software sensors. The created knowledge can be used to develop features that help deal with problems related to minimally labelled datasets. A case study is presented and analysed, aiming to infer high-level rules that can help anticipate abnormal activities, and the potential benefits of integrating these technologies are discussed in this context. The contribution also analyses the use of the models for transferring knowledge when different sensors with different settings contribute to the noise levels. Finally, based on the authors' experience, a framework for creating valuable, aggregated knowledge is proposed.
    This research was partially funded by Fundación Tecnalia Research & Innovation, and J.O.-M. also wants to recognise the support obtained from the EU RFCS program through project number 793505 ‘4.0 Lean system integrating workers and processes (WISEST)’ and from the grant PRX18/00036 given by the Spanish Secretaría de Estado de Universidades, Investigación, Desarrollo e Innovación del Ministerio de Ciencia, Innovación y Universidades.