
    Automation of cleaning and ensembles for outliers detection in questionnaire data

    This article focuses on the automatic detection of corrupted or inappropriate responses in questionnaire data using unsupervised outlier detection. Questionnaire surveys are often used in psychology research to collect self-report data, and their preprocessing takes considerable manual effort. Unlike numerical data, where distance-based outliers prevail, the records in questionnaires have to be assessed from various, largely unrelated perspectives. We identify the most frequent types of errors in questionnaires. For each of them, we suggest a different outlier detection method that ranks the records using normalized scores. Considering the similarity between pairs of outlier scores (some are highly uncorrelated), we propose an ensemble based on the union of outliers detected by the different methods. Our outlier detection framework consists of some well-known algorithms, but we also propose novel approaches addressing the typical issues of questionnaires. The selected methods are based on distance, entropy, and probability. The experimental section describes the process of assembling the methods and selecting their parameters for the final model, which detects significant outliers in the real-world HBSC dataset.
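
    The union-based ensemble over normalized scores can be illustrated with a short sketch. Below is a minimal, hypothetical Python example that min-max normalizes per-detector outlier scores and flags the union of records that any detector considers anomalous; the detector names, scores, and threshold are illustrative, not taken from the paper.

```python
# A minimal sketch of a union-based outlier ensemble, assuming hypothetical
# detectors and an illustrative threshold (none of these come from the paper).
import numpy as np

def min_max_normalize(scores):
    """Rescale raw outlier scores to [0, 1] so detectors are comparable."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

def union_ensemble(score_lists, threshold=0.9):
    """Flag a record if ANY detector's normalized score exceeds the threshold.

    The union is useful when scores from different detectors are largely
    uncorrelated: each detector catches a different type of error.
    """
    flagged = set()
    for scores in score_lists:
        norm = min_max_normalize(scores)
        flagged.update(np.flatnonzero(norm > threshold).tolist())
    return sorted(flagged)

# Toy example with two detectors scoring 6 questionnaire records.
distance_scores = [0.1, 0.2, 5.0, 0.3, 0.2, 0.1]   # e.g. a distance-based method
entropy_scores  = [0.5, 0.4, 0.5, 0.6, 3.0, 0.5]   # e.g. an entropy-based method
print(union_ensemble([distance_scores, entropy_scores]))  # -> [2, 4]
```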

    Hybrid genetic algorithm for clustering IC topographies of EEGs

    Clustering of independent component (IC) topographies of electroencephalograms (EEG) is an effective way to find brain-generated IC processes associated with a population of interest, particularly in cases where event-related potential features are not available. This paper proposes a novel algorithm for the clustering of these IC topographies and compares its results with the most commonly used clustering algorithms. In this study, 32-electrode EEG signals were recorded at a sampling rate of 500 Hz for 48 participants. The EEG signals were pre-processed and IC topographies computed using the AMICA algorithm. The algorithm implements a hybrid approach in which genetic algorithms are used to compute more accurate versions of the centroids and the final clusters after a pre-clustering phase based on spectral clustering. The algorithm automatically selects the optimum number of clusters using a fitness function that involves local density along with compactness and separation criteria. Specific internal validation metrics, adapted to the use of the absolute correlation coefficient as the similarity measure, are defined for the benchmarking process. Results assessed across different ICA decompositions and groups of subjects show that the proposed clustering algorithm significantly outperforms the (baseline) clustering algorithms provided by the software EEGLAB, including CORRMAP. Funding for open access charge: Universidad de Málaga / CBUA. This work was supported by projects PGC2018-098813-B-C32 (Spanish "Ministerio de Ciencia, Innovación y Universidades"), UMA20-FEDERJA-086 (Consejería de Economía y Conocimiento, Junta de Andalucía), project P18-RT-1624, and by European Regional Development Funds (ERDF). We also thank the Leeduca research group and the Junta de Andalucía for the data supplied and their support.
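
    The pre-clustering stage with absolute correlation as the similarity measure can be sketched in a few lines. The Python example below builds an affinity matrix from |Pearson correlation| between vectorized topographies and runs spectral clustering on it; the toy data, the fixed cluster count, and the omission of the genetic-algorithm refinement are all simplifications, not the paper's actual procedure.

```python
# A minimal sketch of the spectral pre-clustering step, assuming the absolute
# Pearson correlation as the similarity between IC topographies; the GA
# refinement and automatic cluster-count selection are not reproduced here.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
topographies = rng.standard_normal((20, 32))  # 20 ICs x 32 electrodes (toy data)

# Similarity = |correlation|, so a scalp map and its sign-flipped copy are
# treated as identical, which matters for ICA components.
affinity = np.abs(np.corrcoef(topographies))

labels = SpectralClustering(
    n_clusters=4, affinity="precomputed", random_state=0
).fit_predict(affinity)
print(labels)  # cluster assignment for each IC topography
```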

    Semi-automated techniques for the retrieval of dermatological condition in color skin images

    Dermatologists base the diagnosis of skin disease on visual assessment of the skin, so a correct diagnosis is highly dependent on the observer's experience and visual perception. Moreover, the human visual system lacks accuracy, reproducibility, and quantification in the way it gathers information from an image, so there is a great need for computer-aided diagnosis. We propose a content-based image retrieval (CBIR) system to aid in the diagnosis of skin disease. First, after examining the skin images, pre-processing is performed. Second, we examine the visual features of the skin diseases classified in the database and select color, texture, and shape for characterizing each disease. Third, feature extraction techniques for each visual feature are investigated. Fourth, similarity measures based on the extracted features are discussed. Last, after discussing single-feature performance, a scheme for combining distance metrics is explored. The experimental data set is divided into two parts: a developmental data set used as an image library and an unlabeled independent test data set. Two sets of experiments are performed, in which the input image to the retrieval algorithm comes either from the developmental data set or from the independent test data set. The results are the top five candidates for the input query image, that is, five labeled images from the image library; results are reported separately for the two data sets. Two evaluation methods are applied: the standard precision-vs-recall method and a self-developed scoring method, with results given for each class of disease. Among all visual features, we found that color played the dominant role in distinguishing different types of skin disease, and the class with the best feature consistency achieved the best retrieval accuracy. For future research we recommend further work on the image collection protocol, color balancing, combining the feature metrics, improving texture characterization, and incorporating semantic assistance in the retrieval process.
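
    A distance-metric combination for top-5 retrieval could look like the sketch below: per-feature distances are normalized and combined by a weighted sum, and the five nearest library images are returned. The weights, feature dimensions, and random data are placeholders, not the thesis's actual scheme.

```python
# A minimal sketch of distance-metric combination for top-5 CBIR, assuming a
# simple weighted sum of normalized per-feature distances (weights and feature
# extractors here are illustrative, not the thesis's actual configuration).
import numpy as np

def combined_distance(query_feats, db_feats, weights):
    """Weighted sum of normalized per-feature distances (color/texture/shape)."""
    total = np.zeros(len(next(iter(db_feats.values()))))
    for name, w in weights.items():
        d = np.linalg.norm(db_feats[name] - query_feats[name], axis=1)
        total += w * d / (d.max() or 1.0)  # normalize so features are comparable
    return total

rng = np.random.default_rng(1)
db = {"color": rng.random((100, 64)),      # e.g. color histograms
      "texture": rng.random((100, 16)),    # e.g. texture descriptors
      "shape": rng.random((100, 8))}       # e.g. shape descriptors
query = {k: v[0] for k, v in db.items()}   # query identical to library image 0

dist = combined_distance(query, db, {"color": 0.6, "texture": 0.3, "shape": 0.1})
top5 = np.argsort(dist)[:5]                # five best-matching library images
print(top5)                                # index 0 (the exact match) ranks first
```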

    Exploring variability in medical imaging

    Although recent successes of deep learning and novel machine learning techniques have improved the performance of classification and (anomaly) detection in computer vision problems, applying these methods in the medical imaging pipeline remains a very challenging task. One of the main reasons for this is the amount of variability that is encountered and encapsulated in human anatomy and subsequently reflected in medical images. This fundamental factor impacts most stages of modern medical imaging processing pipelines. The variability of human anatomy makes it virtually impossible to build large datasets for each disease with labels and annotations for fully supervised machine learning. An efficient way to cope with this is to learn only from normal samples, since such data are much easier to collect. A case study of such an automatic anomaly detection system based on normative learning is presented in this work: a framework for detecting fetal cardiac anomalies during ultrasound screening using generative models trained only on normal/healthy subjects. However, despite significant improvements in automatic abnormality detection systems, clinical routine continues to rely exclusively on overburdened medical experts to diagnose and localise abnormalities. Integrating human expert knowledge into the medical imaging processing pipeline entails uncertainty, which is mainly correlated with inter-observer variability. From the perspective of building an automated medical imaging system, it remains an open issue to what extent this kind of variability and the resulting uncertainty are introduced during the training of a model, and how they affect the final performance of the task. It is therefore very important to explore the effect of inter-observer variability both on the reliable estimation of a model's uncertainty and on the model's performance in a specific machine learning task. A thorough investigation of this issue is presented in this work by leveraging automated estimates of machine learning model uncertainty, inter-observer variability, and segmentation task performance in lung CT scans. Finally, an overview of existing anomaly detection methods in medical imaging is presented. This state-of-the-art survey includes both conventional pattern recognition methods and deep learning based methods, and is one of the first literature surveys in this specific research area.
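
    The normative-learning idea (fit a model of "normal" only, then score deviations) can be sketched without the full generative model. The Python example below uses PCA reconstruction error as a stand-in for the paper's generative approach: the model is fitted only on normal samples, and the anomaly threshold is calibrated from those same normals. The data, feature dimensions, and 99th-percentile threshold are illustrative assumptions.

```python
# A minimal sketch of normative anomaly detection, using PCA reconstruction
# error as a stand-in for a generative model trained only on healthy subjects;
# the toy data and the percentile threshold are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
normal_train = rng.standard_normal((500, 30))    # healthy scans (feature vectors)
model = PCA(n_components=5).fit(normal_train)    # learn the "normal" subspace

def anomaly_score(x):
    """Reconstruction error: large when x falls outside the normal manifold."""
    recon = model.inverse_transform(model.transform(x))
    return np.linalg.norm(x - recon, axis=1)

# Calibrate the decision threshold using normal data only.
threshold = np.percentile(anomaly_score(normal_train), 99)

test = np.vstack([rng.standard_normal((1, 30)),        # normal-like case
                  rng.standard_normal((1, 30)) * 5])   # far-from-normal case
print(anomaly_score(test) > threshold)  # the scaled case should be flagged
```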

    Advances in Robotics, Automation and Control

    The book presents an excellent overview of recent developments in the different areas of robotics, automation and control. Through its 24 chapters, it presents topics related to control and robot design, and it introduces new mathematical tools and techniques devoted to improving system modeling and control. An important point is the use of rational agents and heuristic techniques to cope with the computational complexity required for controlling complex systems. The book also covers navigation and vision algorithms, automatic handwriting comprehension, and speech recognition systems that will be included in the next generation of productive systems.

    Biometric Systems

    Because of the accelerating progress in biometrics research and the latest nation-state threats to security, this book's publication is not only timely but also much needed. This volume contains seventeen peer-reviewed chapters reporting the state of the art in biometrics research: security issues, signature verification, fingerprint identification, wrist vascular biometrics, ear detection, face detection and identification (including a new survey of face recognition), person re-identification, electrocardiogram (ECG) recognition, and several multi-modal systems. This book will be a valuable resource for graduate students, engineers, and researchers interested in understanding and investigating this important field of study.

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has long aimed to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to sets of key terms, each containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we found that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer similarity scores of up to 80% and 90%, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
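
    The two comparison styles named above (set overlap vs. embedding similarity) can be contrasted in a short sketch. The Python example below computes the Jaccard index over two toy key-term sets and a cosine similarity over mean term vectors; the random vectors stand in for the SpaCy/FastText embeddings used in the study, which are not loaded here.

```python
# A minimal sketch contrasting Jaccard overlap of key-term sets with a cosine
# similarity over mean term vectors; random vectors are placeholders for the
# SpaCy/FastText embedding lookups used in the actual study.
import numpy as np

def jaccard(a, b):
    """|A intersect B| / |A union B| over two sets of key terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

code_terms = {"parser", "token", "cache", "index"}
docs_terms = {"parser", "token", "configuration", "index"}
print(jaccard(code_terms, docs_terms))  # 3 shared / 5 total = 0.6

# With embeddings, each term set is reduced to a mean vector and compared by
# cosine, so near-synonyms can still count as similar concepts.
rng = np.random.default_rng(3)
emb = {t: rng.standard_normal(50) for t in code_terms | docs_terms}
code_vec = np.mean([emb[t] for t in code_terms], axis=0)
docs_vec = np.mean([emb[t] for t in docs_terms], axis=0)
print(cosine(code_vec, docs_vec))
```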