
    Automation of cleaning and ensembles for outliers detection in questionnaire data

    This article focuses on the automatic detection of corrupted or inappropriate responses in questionnaire data using unsupervised outlier detection. Questionnaire surveys are often used in psychology research to collect self-report data, and their preprocessing takes considerable manual effort. Unlike numerical data, where distance-based outliers prevail, the records in questionnaires have to be assessed from various, largely unrelated perspectives. We identify the most frequent types of errors in questionnaires. For each of them, we suggest a different outlier detection method that ranks the records using normalized scores. Considering the similarity between pairs of outlier scores (some are highly uncorrelated), we propose an ensemble based on the union of outliers detected by the different methods. Our outlier detection framework consists of some well-known algorithms, but we also propose novel approaches addressing the typical issues of questionnaires. The selected methods are based on distance, entropy, and probability. The experimental section describes the process of assembling the methods and selecting their parameters for the final model, which detects significant outliers in the real-world HBSC dataset.
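
    The union-based ensemble over normalized scores can be illustrated with a short sketch. Below is a minimal, hypothetical Python example that min-max normalizes per-detector outlier scores and flags the union of records that any detector considers anomalous; the detector names, scores, and threshold are illustrative, not taken from the paper.

```python
# A minimal sketch of a union-based outlier ensemble, assuming hypothetical
# detectors and an illustrative threshold (none of these come from the paper).
import numpy as np

def min_max_normalize(scores):
    """Rescale raw outlier scores to [0, 1] so detectors are comparable."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

def union_ensemble(score_lists, threshold=0.9):
    """Flag a record if ANY detector's normalized score exceeds the threshold.

    The union is useful when scores from different detectors are largely
    uncorrelated: each detector catches a different type of error.
    """
    flagged = set()
    for scores in score_lists:
        norm = min_max_normalize(scores)
        flagged.update(np.flatnonzero(norm > threshold).tolist())
    return sorted(flagged)

# Toy example with two detectors scoring 6 questionnaire records.
distance_scores = [0.1, 0.2, 5.0, 0.3, 0.2, 0.1]   # e.g. a distance-based method
entropy_scores  = [0.5, 0.4, 0.5, 0.6, 3.0, 0.5]   # e.g. an entropy-based method
print(union_ensemble([distance_scores, entropy_scores]))  # -> [2, 4]
```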

    Hybrid genetic algorithm for clustering IC topographies of EEGs

    Clustering of independent component (IC) topographies of electroencephalograms (EEG) is an effective way to find brain-generated IC processes associated with a population of interest, particularly in cases where event-related potential features are not available. This paper proposes a novel algorithm for the clustering of these IC topographies and compares its results with the most commonly used clustering algorithms. In this study, 32-electrode EEG signals were recorded at a sampling rate of 500 Hz for 48 participants. The EEG signals were pre-processed and IC topographies computed using the AMICA algorithm. The algorithm implements a hybrid approach in which genetic algorithms are used to compute more accurate versions of the centroids and the final clusters after a pre-clustering phase based on spectral clustering. The algorithm automatically selects the optimum number of clusters using a fitness function that involves local density along with compactness and separation criteria. Specific internal validation metrics, adapted to the use of the absolute correlation coefficient as the similarity measure, are defined for the benchmarking process. Results assessed across different ICA decompositions and groups of subjects show that the proposed clustering algorithm significantly outperforms the (baseline) clustering algorithms provided by the software EEGLAB, including CORRMAP. Funding for open access charge: Universidad de Málaga / CBUA. This work was supported by projects PGC2018-098813-B-C32 (Spanish "Ministerio de Ciencia, Innovación y Universidades"), UMA20-FEDERJA-086 (Consejería de Economía y Conocimiento, Junta de Andalucía), project P18-RT-1624, and by European Regional Development Funds (ERDF). We also thank the Leeduca research group and the Junta de Andalucía for the data supplied and their support.
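
    The pre-clustering stage with absolute correlation as the similarity measure can be sketched in a few lines. The Python example below builds an affinity matrix from |Pearson correlation| between vectorized topographies and runs spectral clustering on it; the toy data, the fixed cluster count, and the omission of the genetic-algorithm refinement are all simplifications, not the paper's actual procedure.

```python
# A minimal sketch of the spectral pre-clustering step, assuming the absolute
# Pearson correlation as the similarity between IC topographies; the GA
# refinement and automatic cluster-count selection are not reproduced here.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
topographies = rng.standard_normal((20, 32))  # 20 ICs x 32 electrodes (toy data)

# Similarity = |correlation|, so a scalp map and its sign-flipped copy are
# treated as identical, which matters for ICA components.
affinity = np.abs(np.corrcoef(topographies))

labels = SpectralClustering(
    n_clusters=4, affinity="precomputed", random_state=0
).fit_predict(affinity)
print(labels)  # cluster assignment for each IC topography
```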

    Semi-automated techniques for the retrieval of dermatological condition in color skin images

    Dermatologists base the diagnosis of skin disease on visual assessment of the skin, so a correct diagnosis is highly dependent on the observer's experience and visual perception. Moreover, the human visual system lacks accuracy, reproducibility, and quantification in the way it gathers information from an image, so there is a great need for computer-aided diagnosis. We propose a content-based image retrieval (CBIR) system to aid in the diagnosis of skin disease. First, after examining the skin images, pre-processing is performed. Second, we examine the visual features of the skin diseases classified in the database and select color, texture, and shape for characterizing each disease. Third, feature extraction techniques for each visual feature are investigated. Fourth, similarity measures based on the extracted features are discussed. Last, after discussing single-feature performance, a scheme for combining distance metrics is explored. The experimental data set is divided into two parts: a developmental data set used as an image library and an unlabeled independent test data set. Two sets of experiments are performed, in which the input image to the retrieval algorithm comes either from the developmental data set or from the independent test data set. The results are the top five candidates for the input query image, that is, five labeled images from the image library; results are reported separately for the two data sets. Two evaluation methods are applied: the standard precision-vs-recall method and a self-developed scoring method, with results given for each class of disease. Among all visual features, we found that color played the dominant role in distinguishing different types of skin disease, and the class with the best feature consistency achieved the best retrieval accuracy. For future research we recommend further work on the image collection protocol, color balancing, combining the feature metrics, improving texture characterization, and incorporating semantic assistance in the retrieval process.
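
    A distance-metric combination for top-5 retrieval could look like the sketch below: per-feature distances are normalized and combined by a weighted sum, and the five nearest library images are returned. The weights, feature dimensions, and random data are placeholders, not the thesis's actual scheme.

```python
# A minimal sketch of distance-metric combination for top-5 CBIR, assuming a
# simple weighted sum of normalized per-feature distances (weights and feature
# extractors here are illustrative, not the thesis's actual configuration).
import numpy as np

def combined_distance(query_feats, db_feats, weights):
    """Weighted sum of normalized per-feature distances (color/texture/shape)."""
    total = np.zeros(len(next(iter(db_feats.values()))))
    for name, w in weights.items():
        d = np.linalg.norm(db_feats[name] - query_feats[name], axis=1)
        total += w * d / (d.max() or 1.0)  # normalize so features are comparable
    return total

rng = np.random.default_rng(1)
db = {"color": rng.random((100, 64)),      # e.g. color histograms
      "texture": rng.random((100, 16)),    # e.g. texture descriptors
      "shape": rng.random((100, 8))}       # e.g. shape descriptors
query = {k: v[0] for k, v in db.items()}   # query identical to library image 0

dist = combined_distance(query, db, {"color": 0.6, "texture": 0.3, "shape": 0.1})
top5 = np.argsort(dist)[:5]                # five best-matching library images
print(top5)                                # index 0 (the exact match) ranks first
```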

    Exploring variability in medical imaging

    Although recent successes of deep learning and novel machine learning techniques have improved the performance of classification and (anomaly) detection in computer vision problems, applying these methods in the medical imaging pipeline remains a very challenging task. One of the main reasons for this is the amount of variability that is encountered and encapsulated in human anatomy and subsequently reflected in medical images. This fundamental factor impacts most stages of modern medical imaging processing pipelines. The variability of human anatomy makes it virtually impossible to build large datasets for each disease with labels and annotations for fully supervised machine learning. An efficient way to cope with this is to learn only from normal samples, since such data are much easier to collect. A case study of such an automatic anomaly detection system based on normative learning is presented in this work: a framework for detecting fetal cardiac anomalies during ultrasound screening using generative models trained only on normal/healthy subjects. However, despite significant improvements in automatic abnormality detection systems, clinical routine continues to rely exclusively on overburdened medical experts to diagnose and localise abnormalities. Integrating human expert knowledge into the medical imaging processing pipeline entails uncertainty, which is mainly correlated with inter-observer variability. From the perspective of building an automated medical imaging system, it remains an open issue to what extent this kind of variability and the resulting uncertainty are introduced during the training of a model, and how they affect the final performance of the task. It is therefore very important to explore the effect of inter-observer variability both on the reliable estimation of a model's uncertainty and on the model's performance in a specific machine learning task. A thorough investigation of this issue is presented in this work by leveraging automated estimates of machine learning model uncertainty, inter-observer variability, and segmentation task performance in lung CT scans. Finally, an overview of existing anomaly detection methods in medical imaging is presented. This state-of-the-art survey includes both conventional pattern recognition methods and deep learning based methods, and is one of the first literature surveys in this specific research area.
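
    The normative-learning idea (fit a model of "normal" only, then score deviations) can be sketched without the full generative model. The Python example below uses PCA reconstruction error as a stand-in for the paper's generative approach: the model is fitted only on normal samples, and the anomaly threshold is calibrated from those same normals. The data, feature dimensions, and 99th-percentile threshold are illustrative assumptions.

```python
# A minimal sketch of normative anomaly detection, using PCA reconstruction
# error as a stand-in for a generative model trained only on healthy subjects;
# the toy data and the percentile threshold are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
normal_train = rng.standard_normal((500, 30))    # healthy scans (feature vectors)
model = PCA(n_components=5).fit(normal_train)    # learn the "normal" subspace

def anomaly_score(x):
    """Reconstruction error: large when x falls outside the normal manifold."""
    recon = model.inverse_transform(model.transform(x))
    return np.linalg.norm(x - recon, axis=1)

# Calibrate the decision threshold using normal data only.
threshold = np.percentile(anomaly_score(normal_train), 99)

test = np.vstack([rng.standard_normal((1, 30)),        # normal-like case
                  rng.standard_normal((1, 30)) * 5])   # far-from-normal case
print(anomaly_score(test) > threshold)  # the scaled case should be flagged
```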

    Advances in Robotics, Automation and Control

    The book presents an excellent overview of recent developments in the different areas of robotics, automation and control. Through its 24 chapters, it presents topics related to control and robot design, and it introduces new mathematical tools and techniques devoted to improving system modeling and control. An important point is the use of rational agents and heuristic techniques to cope with the computational complexity required for controlling complex systems. The book also covers navigation and vision algorithms, automatic handwriting comprehension, and speech recognition systems that will be included in the next generation of productive systems.

    Biometric Systems

    Because of the accelerating progress in biometrics research and the latest nation-state threats to security, this book's publication is not only timely but also much needed. This volume contains seventeen peer-reviewed chapters reporting the state of the art in biometrics research: security issues, signature verification, fingerprint identification, wrist vascular biometrics, ear detection, face detection and identification (including a new survey of face recognition), person re-identification, electrocardiogram (ECG) recognition, and several multi-modal systems. This book will be a valuable resource for graduate students, engineers, and researchers interested in understanding and investigating this important field of study.

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has long aimed to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to sets of key terms, each containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we found that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer similarity scores of up to 80% and 90%, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
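
    The two comparison styles named above (set overlap vs. embedding similarity) can be contrasted in a short sketch. The Python example below computes the Jaccard index over two toy key-term sets and a cosine similarity over mean term vectors; the random vectors stand in for the SpaCy/FastText embeddings used in the study, which are not loaded here.

```python
# A minimal sketch contrasting Jaccard overlap of key-term sets with a cosine
# similarity over mean term vectors; random vectors are placeholders for the
# SpaCy/FastText embedding lookups used in the actual study.
import numpy as np

def jaccard(a, b):
    """|A intersect B| / |A union B| over two sets of key terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

code_terms = {"parser", "token", "cache", "index"}
docs_terms = {"parser", "token", "configuration", "index"}
print(jaccard(code_terms, docs_terms))  # 3 shared / 5 total = 0.6

# With embeddings, each term set is reduced to a mean vector and compared by
# cosine, so near-synonyms can still count as similar concepts.
rng = np.random.default_rng(3)
emb = {t: rng.standard_normal(50) for t in code_terms | docs_terms}
code_vec = np.mean([emb[t] for t in code_terms], axis=0)
docs_vec = np.mean([emb[t] for t in docs_terms], axis=0)
print(cosine(code_vec, docs_vec))
```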