
    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
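    As a concrete illustration of the simplest kind of statistical technique such a survey covers (not a method proposed in this paper), the sketch below flags univariate outliers using a modified z-score based on the median absolute deviation; the data, function name and cutoff are assumptions made for the example.

```python
import numpy as np

def mad_outliers(x, threshold=3.5):
    """Return indices whose modified z-score (based on the median absolute
    deviation) exceeds the threshold. Illustrative only; the 3.5 cutoff is a
    common rule of thumb, not a value taken from the survey."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    modified_z = 0.6745 * np.abs(x - median) / mad
    return np.where(modified_z > threshold)[0]

# A single faulty instrument reading among otherwise normal measurements.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 42.0, 10.2]
print(mad_outliers(readings))  # -> [5]
```

    The median-based score is used here rather than a plain z-score because a single large outlier can inflate the mean and standard deviation enough to mask itself.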

    Exploring variability in medical imaging

    Although recent successes of deep learning and novel machine learning techniques have improved the performance of classification and (anomaly) detection in computer vision problems, the application of these methods in medical imaging pipelines remains a very challenging task. One of the main reasons for this is the amount of variability that is encountered and encapsulated in human anatomy and subsequently reflected in medical images. This fundamental factor impacts most stages in modern medical image processing pipelines. The variability of human anatomy makes it virtually impossible to build large datasets for each disease with labels and annotations for fully supervised machine learning. An efficient way to cope with this is to learn only from normal samples, which are much easier to collect. A case study of such an automatic anomaly detection system based on normative learning is presented in this work. We present a framework for detecting fetal cardiac anomalies during ultrasound screening using generative models trained only on normal/healthy subjects. However, despite the significant improvement in automatic abnormality detection systems, clinical routine continues to rely exclusively on the contribution of overburdened medical experts to diagnose and localise abnormalities. Integrating human expert knowledge into the medical image processing pipeline entails uncertainty that is mainly correlated with inter-observer variability. From the perspective of building an automated medical imaging system, it is still an open issue to what extent this kind of variability and the resulting uncertainty are introduced during the training of a model and how they affect the final performance of the task. Consequently, it is very important to explore the effect of inter-observer variability both on the reliable estimation of model uncertainty and on the model's performance in a specific machine learning task. A thorough investigation of this issue is presented in this work by leveraging automated estimates of machine learning model uncertainty, inter-observer variability and segmentation task performance in lung CT scan images. Finally, an overview of existing anomaly detection methods in medical imaging is presented. This state-of-the-art survey includes both conventional pattern recognition methods and deep learning based methods, and is one of the first literature surveys attempted in this specific research area.
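    To make the normative-learning idea concrete, here is a minimal sketch, not the authors' ultrasound framework: a model is fitted only to assumed-normal feature vectors (a PCA subspace stands in for the generative model), and test samples are flagged when their reconstruction error exceeds a threshold estimated from normal data. The data, dimensions and threshold are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for "normal" feature vectors: low-rank structure plus noise.
latent = rng.normal(size=(500, 8))
mixing = rng.normal(size=(8, 32))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 32))

# Fit the "normative" model on normal samples only.
model = PCA(n_components=8).fit(normal)

def reconstruction_error(x):
    """Squared error per sample after projecting onto the learned normal subspace."""
    recon = model.inverse_transform(model.transform(x))
    return ((x - recon) ** 2).sum(axis=1)

# Decision threshold taken from the error distribution of the normal data itself.
threshold = np.quantile(reconstruction_error(normal), 0.95)

# Samples that break the learned structure produce much larger errors.
anomalous = rng.normal(size=(10, 32))
print((reconstruction_error(anomalous) > threshold).all())  # -> True
```

    The same train-on-normal, score-by-reconstruction-error pattern applies when the model is a variational autoencoder or another generative model; only the error measure and threshold estimation change.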

    Anomaly detection prototype for log-based predictive maintenance at INFN-CNAF tier-1

    The evolution of HEP can no longer be separated from that of the computational resources needed to perform analyses. Each year the LHC produces dozens of petabytes of data (e.g. collision data, particle simulations, metadata) that require orchestrated computing resources for storage, computational power and high-throughput networks to connect centres. As a consequence of the LHC upgrade, the luminosity of the experiment will increase by a factor of 10 over its original design value, entailing a non-negligible technical challenge for computing centres: a corresponding rise in the amount of data produced and processed by the experiment is expected. With this in mind, the HEP Software Foundation took action and released a roadmap document describing the actions needed to prepare the computational infrastructure to support the upgrade. As part of this collective effort, involving all computing centres of the Grid, INFN-CNAF has set up a preliminary study towards the development of an AI-driven maintenance paradigm. As a contribution to this preparatory study, this master's thesis presents an original software prototype developed to handle the task of identifying critical activity time windows of a specific service (StoRM). Moreover, the prototype explores the viability of content extraction via text processing techniques, applying such strategies to log messages belonging to anomalous time windows.
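    As a rough sketch of the general approach described above (not the INFN-CNAF prototype itself), the example below buckets log records into one-minute windows, flags windows with an unusually high count of error-level messages as critical, and uses TF-IDF as a stand-in for content extraction over the messages in those windows. The log lines, window length and threshold are invented for illustration.

```python
from collections import Counter
from datetime import datetime
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented parsed log records (timestamp, severity, message); not real StoRM logs.
logs = [
    (datetime(2024, 1, 1, 12, 0, 5),  "INFO",  "transfer completed"),
    (datetime(2024, 1, 1, 12, 0, 9),  "ERROR", "storm backend timeout on srm request"),
    (datetime(2024, 1, 1, 12, 0, 42), "ERROR", "storm backend timeout on srm request"),
    (datetime(2024, 1, 1, 12, 1, 3),  "ERROR", "connection refused by storage pool"),
    (datetime(2024, 1, 1, 12, 5, 0),  "INFO",  "transfer completed"),
]

ERROR_THRESHOLD = 2  # windows with at least this many errors are flagged as critical

# Bucket log lines into one-minute windows and count error-level messages.
error_counts = Counter()
window_messages = {}
for ts, level, msg in logs:
    window = ts.replace(second=0, microsecond=0)
    if level == "ERROR":
        error_counts[window] += 1
        window_messages.setdefault(window, []).append(msg)

critical = [w for w, n in error_counts.items() if n >= ERROR_THRESHOLD]
print("critical windows:", critical)

# Rough content extraction: rank the terms appearing in critical-window messages.
if critical:
    corpus = [" ".join(window_messages[w]) for w in critical]
    vectoriser = TfidfVectorizer(stop_words="english")
    scores = vectoriser.fit_transform(corpus).toarray().sum(axis=0)
    ranked = sorted(zip(vectoriser.get_feature_names_out(), scores), key=lambda t: -t[1])
    print("top terms:", [term for term, _ in ranked[:5]])
```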