17 research outputs found

    Emotion recognition from speech using representation learning in extreme learning machines

    We propose the use of an Extreme Learning Machine initialised as an auto-encoder for emotion recognition from speech. This method is evaluated on three different speech corpora, namely EMO-DB, eNTERFACE and SmartKom. We compare our approach against state-of-the-art recognition rates achieved by Support Vector Machines (SVMs) and a deep learning approach based on Generalised Discriminant Analysis (GerDA). We improve the recognition rate compared to SVMs by 3%-14% on all three corpora and compared to GerDA by 8%-13% on two of the three corpora.
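
    The core idea fits in a few lines: an ELM is first trained to reconstruct its own input, and the resulting decoder weights initialise the hidden layer of the downstream classifier. Below is a minimal NumPy sketch; the sigmoid activation, ridge term, and all shapes are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of an ELM auto-encoder initialisation (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)

def elm_autoencoder_weights(X, n_hidden, ridge=1e-3):
    """Learn input weights by training an ELM to reconstruct its input X."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))  # random projection
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden activations
    # Ridge-regularised least squares: solve H @ B ~= X for output weights B.
    B = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ X)
    return B.T  # decoder weights, transposed, reused as learned input weights

def elm_classify(X_train, y_onehot, X_test, W_in, ridge=1e-3):
    """Standard ELM classifier whose input weights come from the auto-encoder."""
    H_train = 1.0 / (1.0 + np.exp(-(X_train @ W_in)))
    n_hidden = H_train.shape[1]
    B = np.linalg.solve(H_train.T @ H_train + ridge * np.eye(n_hidden),
                        H_train.T @ y_onehot)
    H_test = 1.0 / (1.0 + np.exp(-(X_test @ W_in)))
    return (H_test @ B).argmax(axis=1)
```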

    Evaluation of deep learning training strategies for the classification of bone marrow cell images

    Background and Objective: The classification of bone marrow (BM) cells by light microscopy is an important cornerstone of hematological diagnosis, performed thousands of times a day by highly trained specialists in laboratories worldwide. As the manual evaluation of blood or BM smears is very time-consuming and prone to inter-observer variation, new reliable automated systems are needed.
    Methods: We aim to improve the automatic classification performance of hematological cell types. To this end, we evaluate four state-of-the-art Convolutional Neural Network (CNN) architectures on a dataset of 171,374 microscopic cytological single-cell images obtained from BM smears of 945 patients diagnosed with a variety of hematological diseases. We further evaluate the effect of in-domain vs. out-of-domain pre-training, and assess whether class activation maps provide human-interpretable explanations for the models' predictions.
    Results: The best-performing pre-trained model (Regnet_y_32gf) yields mean precision, recall, and F1 scores of 0.787±0.060, 0.755±0.061, and 0.762±0.050, respectively. This is a 53.5% improvement in precision and a 7.3% improvement in recall over previous results with CNNs (ResNeXt-50) trained from scratch. The out-of-domain pre-training apparently yields general feature extractors/filters that apply very well to the BM cell classification use case. The class activation maps on cell types with characteristic morphological features were found to be consistent with the explanations of a human domain expert. For example, the Auer rods in the cytoplasm were the predictive cellular feature for correctly classified images of faggot cells.
    Conclusions: Our study provides data that can help hematology laboratories choose the optimal training strategy for blood cell classification deep learning models to improve computer-assisted blood and bone marrow cell identification. It also highlights the need for more specific training data, i.e., images of difficult-to-classify classes, including cells labeled with disease information.
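
    As a rough illustration of the out-of-domain pre-training strategy, the sketch below loads an ImageNet-pre-trained regnet_y_32gf from torchvision and replaces its classification head for fine-tuning on single-cell images. The number of classes and the optimiser settings are assumptions for illustration, not the paper's setup.

```python
# Hedged sketch: ImageNet pre-training followed by fine-tuning on cell images.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 21  # assumption: one output per annotated BM cell type

model = models.regnet_y_32gf(weights="DEFAULT")            # ImageNet weights
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)    # new classifier head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step; `images` is a (N, 3, H, W) float tensor."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```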

    Robust drone detection and classification from radio frequency signals using convolutional neural networks

    As the number of unmanned aerial vehicles (UAVs) in the sky increases, safety issues have become more pressing. In this paper, we compare the performance of convolutional neural networks (CNNs) for the detection and classification of UAVs based on their radio frequency (RF) signals, using either 1D in-phase and quadrature (IQ) data or 2D spectrogram data as input. We focus on the robustness of the models to low signal-to-noise ratios (SNRs), as this is the most relevant aspect for a real-world application. Within an input type, either IQ or spectrogram, we found no significant difference in performance between models, even as model complexity increased. We did, however, find an advantage in favor of the 2D spectrogram representation of the data. While there is essentially no performance difference at SNRs ≥ 0 dB, we observed a 100% improvement in balanced accuracy at −12 dB, i.e., 0.842 on the spectrogram data compared to 0.413 on the IQ data for the VGG11 model. Together with an easy-to-use benchmark dataset, our findings can be used to develop better models for robust UAV detection systems.
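
    The 2D input path can be sketched as follows: raw IQ samples are converted to a log-magnitude spectrogram, which is then fed to a CNN. The sample rate, FFT length, binary labels, and the deliberately small stand-in network below are assumptions, not the paper's exact setup.

```python
# Hedged sketch of the IQ-to-spectrogram input pipeline for a CNN classifier.
import numpy as np
from scipy import signal
import torch
import torch.nn as nn

def iq_to_spectrogram(iq, fs=20e6, nperseg=256):
    """iq: complex array of raw I/Q samples -> (freq, time) log-power image."""
    _, _, Sxx = signal.spectrogram(iq, fs=fs, nperseg=nperseg,
                                   return_onesided=False)
    return np.log10(Sxx + 1e-12).astype(np.float32)

cnn = nn.Sequential(            # deliberately small stand-in for e.g. VGG11
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 2),  # assumed 2 classes: drone present/absent
)

iq = (np.random.randn(2**16) + 1j * np.random.randn(2**16)).astype(np.complex64)
spec = torch.from_numpy(iq_to_spectrogram(iq))[None, None]  # (N, C, F, T)
logits = cnn(spec)
```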

    Bias, awareness, and ignorance in deep-learning-based face recognition

    Face Recognition (FR) is increasingly influencing our lives: we use it to unlock our phones; the police use it to identify suspects. Two main concerns are associated with this spread of facial recognition: (1) these systems are typically less accurate for marginalized groups, which can be described as "bias", and (2) they enable increased surveillance. Our paper is concerned with the first issue. Specifically, we explore an intuitive technique for reducing this bias, namely "blinding" models to sensitive features such as gender or race, and show why this cannot be equated with reducing bias. Even when not designed for this task, facial recognition models can deduce sensitive features, such as gender or race, from pictures of faces, simply because they are trained to determine the "similarity" of pictures. This means that people with similar skin tones, similar hair length, etc. will be seen as similar by facial recognition models. When confronted with biased decision-making by humans, one approach taken in job application screening is to "blind" the human decision-makers to sensitive attributes such as gender and race by not showing them pictures of the applicants. Based on a similar idea, one might expect that if facial recognition models were less aware of these sensitive features, the difference in accuracy between groups would decrease. We evaluate this assumption, which has already penetrated the scientific literature as a valid de-biasing method, by measuring how "aware" models are of sensitive features and correlating this with differences in accuracy. In particular, we blind pre-trained models to make them less aware of sensitive attributes. We find that awareness and accuracy do not positively correlate, i.e., that bias ≠ awareness. In fact, blinding barely affects accuracy in our experiments. The seemingly simple solution of decreasing bias in facial recognition by reducing awareness of sensitive features thus does not work in practice: trying to ignore sensitive attributes is not a viable route to less biased FR.
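
    One way to make "awareness" measurable can be sketched with a linear probe: train a classifier to predict the sensitive attribute from the FR model's embeddings, where probe accuracy near chance would indicate a "blind" model. The placeholder data below and the probing setup are assumptions; the paper's exact protocol may differ.

```python
# Hedged sketch of an awareness probe on face-embedding vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 512))  # stand-in FR model embeddings
attribute = rng.integers(0, 2, 1000)           # stand-in sensitive attribute

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, attribute, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
awareness = probe.score(X_te, y_te)  # accuracy near 0.5 would suggest "blindness"
print(f"probe accuracy (awareness proxy): {awareness:.2f}")
```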

    Animal detection and species classification on Swiss camera trap images using AI

    Motion-triggered camera traps are essential for the monitoring and management of wildlife. As of today in Switzerland, a large number of pictures are processed (annotated and classified) manually. We study the use of available detection and classification models to (semi-)automate this process. Two main aspects were investigated: 1) the feasibility of a local application by non-experts (with Microsoft's MegaDetector model), and 2) model performance, quantified on several labelled datasets of varying quality and content. Our results show highly accurate (sensitive and specific), reliable, and fast inference, which allows all non-animal images to be pre-discarded automatically. Furthermore, the MegaDetector turns out to be both user-friendly and highly performant and thus an ideal tool for Swiss wildlife experts and stakeholders. Incentives (educational and financial) are required to promote knowledge transfer to this field.
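
    The pre-discarding step can be illustrated with MegaDetector's published batch-output JSON format, in which category "1" denotes an animal detection. The output file name and the confidence threshold below are assumptions to be adapted per dataset.

```python
# Hedged sketch: pre-discard non-animal images from a MegaDetector output file.
import json

THRESHOLD = 0.2  # assumed minimum confidence for keeping an image

with open("megadetector_output.json") as f:  # hypothetical output file name
    results = json.load(f)

keep, discard = [], []
for image in results["images"]:
    detections = image.get("detections") or []
    has_animal = any(d["category"] == "1" and d["conf"] >= THRESHOLD
                     for d in detections)
    (keep if has_animal else discard).append(image["file"])

print(f"{len(keep)} images with animals, {len(discard)} pre-discarded")
```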

    Increase of skin temperature prior to parturition in mares

    Prediction of impending foaling is highly desirable, as early intervention may improve mare and foal outcomes. However, monitoring the peripartum mare is a time-consuming challenge for breeders, and many foaling prediction systems have limitations. "Heating up" of the mare is used empirically by breeders as a sign of upcoming parturition. The purpose of this study was to investigate whether an increase in skin temperature shortly before parturition is detectable and to determine whether such physiological changes could be an additional valuable parameter for predicting foaling. To this end, 56 foalings of 14 Warmblood mares, 5 Arabian mares, 27 Thoroughbred mares, and 2 mares of other breeds were analyzed in this two-year study. Eight mares were monitored in both years. Mares were between 4 and 22 years old (average: 10 ± 5.5 years) and the mean pregnancy length was 342 days (±9 days), resulting in 14 births from primiparous mares and 42 from multiparous mares. For monitoring the periparturient mares, the Piavet® system (Piavita AG, Zurich, Switzerland) was attached daily, after the mares had returned from the field, between 4:00 and 6:00 p.m. and collected the next morning between 6:30 and 7:30 a.m., until the time of foaling. Nocturnal rhythms of the skin temperature, with the highest values at the start of measurements and a nadir at 6:00 a.m., were observed. On the foaling night, we found a rise in skin temperature starting on average around 90 min prepartum. Skin temperatures recorded at 50 min before parturition, and at each 5 min time point until rupture of the allantochorion, were significantly higher (p < 0.05) than the mean temperatures measured at the same time in the 5 nights before parturition, reaching a difference of approximately 0.5 °C. There was a significant effect of parity (p = 0.04) on skin temperature during the last hours before foaling: primiparous mares showed a higher mean temperature than uni- or pluriparous mares from as early as 180 min before parturition. In conclusion, our study shows an increase in skin temperature in most mares within 90 min before birth. Using new biomechanical and digital technologies, this finding could provide an additional parameter for the detection of impending parturition. However, skin temperature cannot be used as the sole predictive diagnostic of impending parturition in the absence of other parameters.
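
    The comparison described above can be sketched in a few lines: contrast the foaling night's skin temperature with the mean of the five preceding nights at each 5-min time point. The file name and column names are hypothetical; the 0.5 °C threshold follows the reported temperature difference.

```python
# Illustrative sketch of the night-over-baseline temperature comparison.
import pandas as pd

# Hypothetical wide-format file: one row per 5-min time point of the night,
# one column per night, values = skin temperature in degrees Celsius.
df = pd.read_csv("mare_skin_temperature.csv", index_col="time_of_day")

baseline = df[["night_1", "night_2", "night_3", "night_4", "night_5"]].mean(axis=1)
delta = df["foaling_night"] - baseline  # positive = warmer than baseline

# Flag time points at least 0.5 degrees above baseline; per the study, these
# should cluster in the ~90 min before birth.
elevated = delta[delta >= 0.5]
print(elevated)
```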

    Causality detection in complex time-dependent systems exemplified in financial time series

    Published as part of the KTI project Sales Forecasting.

    Radiomics approach to quantify shape irregularity from crowd-based qualitative assessment of intracranial aneurysms

    The morphological assessment of anatomical structures is clinically relevant, but often falls short of quantitative or standardised criteria. Whilst human observers are able to assess morphological characteristics qualitatively, the development of robust shape features remains challenging. In this study, we employ psychometric and radiomic methods to develop quantitative models of the perceived irregularity of intracranial aneurysms (IAs). First, we collect morphological characteristics (e.g., irregularity, asymmetry) on imaging-derived data and aggregate them using rank-based analysis. Second, we compute regression models relating quantitative shape features to the aggregated qualitative ratings (ordinal or binary). We apply our method for quantifying perceived shape irregularity to a dataset of 134 IAs using a pool of 179 different shape indices. Ratings given by 39 participants show good agreement with the aggregated ratings (Spearman rank correlation ρ_Sp = 0.84). The best-performing regression model based on quantitative shape features predicts the perceived irregularity with R² = 0.84 ± 0.05.
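
    The second modelling step can be sketched as a cross-validated regression from shape indices to aggregated ratings, evaluated with Spearman's rank correlation. The ridge regressor and placeholder data below are assumptions; the paper selects among models over its pool of 179 shape indices.

```python
# Hedged sketch: regress aggregated irregularity ratings on shape features.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((134, 179))  # stand-in: 179 shape indices for 134 IAs
y = rng.random(134)                  # stand-in: aggregated irregularity ratings

# Cross-validated predictions from a simple ridge model, then rank agreement.
pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=5)
rho, _ = spearmanr(y, pred)
print(f"Spearman rank correlation between ratings and predictions: {rho:.2f}")
```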

    Annotation and classification of changes of involvement in group conversation

    Detecting involvement in a conversation is important for assessing the degree to which humans are participating in either a human-human or human-computer interaction. In particular, detecting changes in a group's involvement in a multi-party interaction is of interest for distinguishing different constellations within the group itself. This information can further be used in situations where technical support of meetings is favoured, for instance, focusing a camera, switching microphones, etc. Moreover, it could help to improve the performance of technical systems applied in human-machine interaction. In this paper, we concentrate on video material from the Table Talk corpus. We introduce a way of annotating and classifying changes of involvement and discuss the reliability of the annotation. Further, we present classification results based on video features using Multi-Layer Networks.
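
    A hedged sketch of such a classifier: a multi-layer perceptron over per-segment video features predicting whether the group's involvement changed. The feature dimensionality and labels below are placeholders, not the Table Talk corpus format.

```python
# Illustrative sketch: MLP classification of involvement changes.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))  # stand-in per-segment video features
y = rng.integers(0, 2, 500)         # stand-in label: 1 = involvement changed

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```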