
    Randomized outlier detection with trees

    Isolation forest (IF) is a popular outlier detection algorithm that isolates outlier observations from regular observations by building multiple random isolation trees. The average number of comparisons required to isolate a given observation can then be used as a measure of its outlierness. Multiple extensions of this approach have been proposed in the literature, including the extended isolation forest (EIF) and SCiForest. However, we find a lack of theoretical explanation of why IF, EIF, and SCiForest offer such good practical performance. In this paper, we present a theoretical framework that views these approaches from a distributional viewpoint. Using this viewpoint, we show that isolation-based approaches first accurately approximate the data distribution and then approximate the coefficients of its mixture components via the average path length. Using this framework, we derive the generalized isolation forest (GIF), which also trains random isolation trees but combines them in a way that moves beyond the average path length: GIF splits the data into multiple sub-spaces by sampling random splits, as the original IF variants do, and directly estimates the mixture coefficients of a mixture distribution to score the outlierness of entire regions of data. In an extensive evaluation, we compare GIF with 18 state-of-the-art outlier detection methods on 14 different datasets. We show that GIF outperforms three competing tree-based methods and achieves competitive performance with nearest-neighbor approaches while having a lower runtime. Last, we highlight a use-case study that uses GIF to detect transaction fraud in financial data.
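    The average-path-length mechanism that GIF generalises can be sketched in a few lines. The following is a minimal illustration of classic IF-style scoring (uniform axis-parallel splits, score 2^(-E[h(x)]/c(n))), not the GIF method itself; the function names and the depth limit are my own choices for the sketch.

    ```python
    import numpy as np

    def c(n):
        # Average path length of an unsuccessful BST search; the IF normaliser.
        if n <= 1:
            return 0.0
        return 2.0 * (np.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

    def path_length(x, X, depth=0, limit=8, rng=None):
        # Depth at which x is isolated by uniformly random axis-parallel splits;
        # truncated trees are extended by c(|X|), as in the original IF.
        rng = rng if rng is not None else np.random.default_rng(0)
        if depth >= limit or len(X) <= 1:
            return depth + c(len(X))
        q = rng.integers(X.shape[1])
        lo, hi = X[:, q].min(), X[:, q].max()
        if lo == hi:
            return depth + c(len(X))
        p = rng.uniform(lo, hi)
        side = X[:, q] < p
        X_next = X[side] if x[q] < p else X[~side]
        return path_length(x, X_next, depth + 1, limit, rng)

    def if_score(x, X, n_trees=50):
        # Anomaly score in (0, 1]; values near 1 indicate outliers.
        rngs = [np.random.default_rng(t) for t in range(n_trees)]
        h = np.mean([path_length(x, X, rng=r) for r in rngs])
        return 2.0 ** (-h / c(len(X)))
    ```

    A point far from the data mass is isolated in few splits, so its average path length is short and its score approaches 1; GIF replaces this path-length aggregation with direct estimation of mixture coefficients over regions.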

    Systematic construction of anomaly detection benchmarks from real data

    Research in anomaly detection suffers from a lack of realistic and publicly-available problem sets. This paper discusses what properties such problem sets should possess. It then introduces a methodology for transforming existing classification data sets into ground-truthed benchmark data sets for anomaly detection. The methodology produces data sets that vary along three important dimensions: (a) point difficulty, (b) relative frequency of anomalies, and (c) clusteredness. We apply our generated datasets to benchmark several popular anomaly detection algorithms under a range of different conditions.
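    As a rough illustration of one of the three dimensions above, relative frequency of anomalies, the sketch below turns a labelled classification set into a ground-truthed anomaly benchmark by subsampling one class as the anomaly class. The paper's full methodology also controls point difficulty and clusteredness, which this sketch does not; the function name and rate parameter are my own.

    ```python
    import numpy as np

    def make_anomaly_benchmark(X, y, anomaly_class, anomaly_rate=0.05, seed=0):
        # Keep all points outside `anomaly_class` as normals and subsample the
        # chosen class so anomalies make up roughly `anomaly_rate` of the result.
        rng = np.random.default_rng(seed)
        normal = X[y != anomaly_class]
        candidates = X[y == anomaly_class]
        n_anom = max(1, int(anomaly_rate * len(normal) / (1 - anomaly_rate)))
        n_anom = min(n_anom, len(candidates))
        picked = candidates[rng.choice(len(candidates), n_anom, replace=False)]
        data = np.vstack([normal, picked])
        labels = np.r_[np.zeros(len(normal), int), np.ones(n_anom, int)]
        return data, labels
    ```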

    OptIForest: Optimal Isolation Forest for Anomaly Detection

    Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency; e.g., iForest is often employed as a state-of-the-art detector for real deployment. While the majority of isolation forests use the binary structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, there is no theoretical work answering the fundamentally and practically important question of the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Based on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, incorporating clustering-based learning to hash, which enables more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmarking datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than the state of the art, including deep learning based methods. Comment: This paper has been accepted by the International Joint Conference on Artificial Intelligence (IJCAI-23).
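    To see why the branching factor involves a trade-off at all, consider a perfectly balanced b-ary tree: a larger b shortens the tree but makes each split do more work. The toy cost model below (b - 1 comparisons per b-way split along a root-to-leaf path) is an assumption for illustration only, not the paper's isolation-efficiency theory.

    ```python
    import math

    def isolation_depth(n, b):
        # Depth needed for a perfectly balanced b-ary tree to isolate n points.
        return math.ceil(math.log(n, b))

    def toy_efficiency(n, b):
        # Assumed toy metric (not the paper's): isolation gained per unit of
        # splitting work along a root-to-leaf path of the balanced b-ary tree.
        return math.log(n) / ((b - 1) * isolation_depth(n, b))
    ```

    Under this toy model, raising b from 2 shortens the tree (log_b n shrinks) while the per-level cost grows linearly, so an intermediate branching factor balances the two; the paper formalises and answers this question rigorously.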

    Anomaly detection in wireless mesh lighting networks


    An efficient framework for mining outlying aspects

    In the era of big data, an immense volume of information is being continuously generated, and it is common to encounter errors or anomalies within datasets. These anomalies can arise from system malfunctions or human errors, resulting in data points that deviate from expected patterns or values. Anomaly detection algorithms have been developed to identify such anomalies effectively. However, these algorithms often fall short in explaining why a particular data point is considered an anomaly: they cannot identify the specific feature subset(s) in which a data point significantly differs from the majority of the data. To address this limitation, researchers have recently turned their attention to a new research area called outlying aspect mining. This area focuses on discovering feature subset(s), known as aspects or subspaces, in which anomalous data points exhibit significant deviations from the remaining data. Outlying aspect mining algorithms aim to provide a more detailed understanding of the characteristics that make a data point anomalous. Although outlying aspect mining is an emerging area of research, only a few studies have been published so far. One of the key challenges in this field is scaling these algorithms to large datasets, characterised by either a large data size or high dimensionality. Many existing outlying aspect mining algorithms are not well suited to such datasets, as they exhaustively enumerate all possible subspaces and use density- or distance-based anomaly scores to rank them. As a result, most of these algorithms struggle with datasets of more than 20 dimensions. Addressing this scalability issue and developing efficient algorithms for outlying aspect mining in large datasets remains an active area of research.
The ability to identify and understand the specific feature subsets contributing to anomalies in big data holds great potential for various applications, including fraud detection, network intrusion detection, and anomaly-based decision support systems. Existing outlying aspect mining methods suffer from three main problems. First, their scoring measures often rely on distance or density calculations, which are biased with respect to dimensionality: as the dimensionality of a subspace increases, density tends to decrease, making it difficult to accurately assess the outlyingness of data points within specific subspaces. Second, distance- and density-based measures require computing pairwise distances, which makes them computationally expensive, especially for large-scale datasets containing millions of data points. Moreover, existing work uses Z-Score normalisation to make density-based scoring measures dimensionally unbiased, which adds further computational overhead to already expensive measures. Last, existing outlying aspect mining algorithms use brute-force methods to search subspaces. Tackling this efficiency issue is essential because, when the dimensionality of the data is high, the number of candidate subspaces grows exponentially, quickly exceeding available computational resources. This research project aims to solve these challenges by developing efficient and effective methods for mining outlying aspects in high-dimensional and large datasets. I have explored and designed different scoring measures to quantify the outlyingness of a given data point in each subspace. The effectiveness and efficiency of the proposed measures have been verified with extensive experiments on synthetic and real-world datasets.
To overcome the first problem, this thesis identifies and analyses the conditions under which Z-Score-normalised scoring measures fail to find the most outlying aspects, and proposes two approaches, HMass and sGrid++. Both measures are dimensionally unbiased in their raw form, which means they do not require any additional normalisation. sGrid++ is a simpler version of sGrid that is not only efficient and effective but also dimensionally unbiased; it does not require Z-Score normalisation. HMass is a simple but effective and efficient histogram-based solution for ranking the outlying aspects of a given query in each subspace. In addition to detecting anomalies, HMass provides explanations of why the points are anomalous. Neither sGrid++ nor HMass requires pairwise calculations like distance- or density-based measures; both are therefore computationally faster, which solves the second issue of existing work. The effectiveness and efficiency of sGrid++ and HMass are evaluated using synthetic and real-world datasets. In addition, I present an application of outlying aspect mining in the cybersecurity domain. To tackle the third problem, this thesis proposes an efficient and effective outlying aspect mining framework named OIMiner (for Outlying-Inlying Aspect Miner). It introduces a new scoring measure for computing the outlying degree, called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which not only detects outliers but also explains why a selected point is an outlier. SiNNE is a dimensionally unbiased measure in its raw form, meaning the scores it produces can be compared directly across subspaces of different dimensionality; it therefore requires no normalisation to make the score unbiased.
Our experimental results on synthetic and publicly available real-world datasets revealed that (i) SiNNE produces better or at least comparable results to existing scores, and (ii) it improves the run time of the existing beam-search-based outlying aspect mining algorithm by at least two orders of magnitude. SiNNE allows the existing outlying aspect mining algorithm to run on datasets with hundreds of thousands of instances and thousands of dimensions, which was not possible before.
Doctor of Philosophy
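The brute-force subspace search that the thesis identifies as its third bottleneck can be sketched as follows. The grid-mass score here is a naive stand-in of my own, not the exact HMass or SiNNE definition, and note that raw cell mass still shrinks as dimensionality grows, which is precisely the bias the proposed measures are designed to remove; this sketch only illustrates the exhaustive search structure.

```python
import numpy as np
from itertools import combinations

def cell_mass(X, q, dims, bins=8):
    # Fraction of points sharing the query q's grid cell within subspace `dims`;
    # lower mass means q sits in a sparser region of that subspace.
    in_cell = np.ones(len(X), dtype=bool)
    for d in dims:
        lo, hi = X[:, d].min(), X[:, d].max()
        edges = np.linspace(lo, hi, bins + 1)
        cell = np.clip(np.searchsorted(edges, q[d], side="right") - 1, 0, bins - 1)
        idx = np.clip(np.searchsorted(edges, X[:, d], side="right") - 1, 0, bins - 1)
        in_cell &= idx == cell
    return in_cell.mean()

def most_outlying_aspect(X, q, max_dim=2, bins=8):
    # Exhaustively enumerate all subspaces up to `max_dim` dimensions and
    # return the one where q's cell mass is lowest: the brute-force search
    # whose exponential cost motivates the thesis's efficient alternatives.
    best, best_mass = None, 1.0
    for k in range(1, max_dim + 1):
        for dims in combinations(range(X.shape[1]), k):
            m = cell_mass(X, q, dims, bins)
            if m < best_mass:
                best, best_mass = dims, m
    return best, best_mass
```

With d features, the loop visits sum over k of C(d, k) subspaces, which grows exponentially in max_dim; that combinatorial blow-up, plus the cost of each score evaluation, is what the proposed measures and search framework address.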