3,616 research outputs found

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

    ISBIS 2016: Meeting on Statistics in Business and Industry

    This book includes the abstracts of the talks presented at the 2016 International Symposium on Business and Industrial Statistics, held in Barcelona, June 8-10, 2016, and hosted at the Universitat Politècnica de Catalunya - Barcelona TECH by the Department of Statistics and Operations Research. The meeting took place in the ETSEIB Building (Escola Tècnica Superior d'Enginyeria Industrial) at Avda Diagonal 647. The meeting organizers celebrated the continued success of the ISBIS and ENBIS societies, and the meeting drew together the international community of statisticians, both academics and industry professionals, who share the goal of making statistics the foundation for decision making in business and related applications. The Scientific Program Committee comprised: David Banks, Duke University; Amílcar Oliveira, DCeT - Universidade Aberta and CEAUL; Teresa A. Oliveira, DCeT - Universidade Aberta and CEAUL; Nalini Ravishankar, University of Connecticut; Xavier Tort Martorell, Universitat Politècnica de Catalunya, Barcelona TECH; Martina Vandebroek, KU Leuven; Vincenzo Esposito Vinzi, ESSEC Business Schoo

    Exploring anomalies in time


    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting has been organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research

    Politische Maschinen: Maschinelles Lernen für das Verständnis von sozialen Maschinen

    This thesis investigates human-algorithm interactions in sociotechnological ecosystems. Specifically, it applies machine learning and statistical methods to uncover political dimensions of algorithmic influence in social media platforms and automated decision-making systems. Based on the results, the study discusses the legal, political and ethical consequences of algorithmic implementations.

    Monitoring the UK’s wild mammals: A new grammar for citizen science engagement and ecology

    Anthropogenic activities have imperilled not just global ecosystems, but also the ecosystem services they provide, which are crucial for human livelihoods. To understand these changes, there is a need for effective monitoring over large spatial and temporal scales. This thesis builds on two proposed solutions. First, citizen science – defined here as the involvement of non-professionals in scientific enquiry – allows the crowdsourcing of data collection and classification to expand monitoring in ways that are logistically infeasible for ecologists alone. Second, motion-sensing camera traps can reduce the labour needed for monitoring since they can be deployed for long periods and provide continuous, relatively unbiased observations. In this thesis, I describe MammalWeb, a citizen science project in north-east England where I enlisted the aid of the local community in wild mammal monitoring. Motivated by the current unevenness of survey effort and data for mammals in Great Britain, MammalWeb involves citizen scientists in both the collection and classification of camera trap images, a novel combination. This is a multidisciplinary project. I begin, in Chapter 2, with a detailed reflection on the organisation of the MammalWeb citizen science project and approaches to evaluating its performance. I observe that the majority of contributions came from a small subset of citizen scientists. In Chapter 3, I develop an economical approach to deriving consensus classifications from the aggregated input of multiple users, which is a crucial part of many citizen science projects. This is followed in Chapter 4 by a case study of a partnership I initiated between MammalWeb and the local Belmont Community School, where we empowered a group of secondary school students not only to aid in collecting data for MammalWeb, but also to design and deliver ecological outreach to their community. This is now the template for a wider network of school partnerships we are pursuing. Chapter 5 examines common concerns around estimating species occupancy from camera trap data, including post-hoc discretisation of observations and effects of missing data. I also develop a resampling method to account for uncertain detections, a common issue when crowdsourcing data classifications. I show that, through resampling, the estimated parameters from occupancy models are robust against high uncertainty in the underlying detections. Lastly, Chapter 6 discusses how my work on MammalWeb has laid the foundation for a wider citizen science camera trapping network in the United Kingdom, as well as avenues for future work. Importantly, I show that MammalWeb citizen scientists have been empowered to be more than “mobile sensors” and act as independent researchers who have initiated ecological studies elsewhere.