    An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images

    This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated as a trade-off among several between-cluster measures used in well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm to cluster nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm subimages are extracted. Several classifiers (multi-layer perceptron, Support Vector Machine (SVM), and a Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated on the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves recognition accuracies of 96.72% and 96.67% using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.
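The abstract contrasts Fuzzy C-means, which scores a partition only by within-cluster scatter, with SDM, which also rewards between-cluster scatter. The exact SDM formula is not given here, so the following is a minimal sketch under that assumption: it computes the classical within- and between-cluster scatter (the traces of S_W and S_B) for a hard labelling of pixel colour vectors and combines them in a Fisher-style ratio that a search procedure such as a Genetic Algorithm could maximise. The function names, the ratio, and the pixel values are illustrative, not the authors' measure or data.

```python
from statistics import fmean


def scatter_scores(points, labels):
    """Return (within, between) scatter, i.e. the traces of S_W and S_B.

    within  = sum over clusters of squared distances from members to their centroid
    between = sum over clusters of n_k * squared distance from centroid to global mean
    """
    dim = len(points[0])
    global_mean = [fmean(p[d] for p in points) for d in range(dim)]
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    within = between = 0.0
    for members in clusters.values():
        centroid = [fmean(p[d] for p in members) for d in range(dim)]
        within += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
        between += len(members) * sum(
            (centroid[d] - global_mean[d]) ** 2 for d in range(dim)
        )
    return within, between


def separation_ratio(points, labels, eps=1e-12):
    """Illustrative fitness (not SDM): large when clusters are compact and far apart."""
    within, between = scatter_scores(points, labels)
    return between / (within + eps)


# Hypothetical RGB pixels forming three colour groups (two pixels each).
pixels = [(200, 50, 120), (205, 55, 118), (120, 90, 160),
          (118, 88, 165), (240, 235, 230), (238, 236, 232)]
good = [0, 0, 1, 1, 2, 2]
bad = [0, 1, 2, 0, 1, 2]
print(separation_ratio(pixels, good) > separation_ratio(pixels, bad))  # True
```

For a hard labelling, within + between always equals the total scatter of the data, so this ratio ranks partitions exactly as within-cluster scatter alone would; the extra discriminative power the abstract attributes to SDM therefore has to come from its other ingredients, such as its combination of several between-cluster measures.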

    Hematological image analysis for acute lymphoblastic leukemia detection and classification

    Microscopic analysis of peripheral blood smears is a critical step in the detection of leukemia. However, this type of light microscopic assessment is time-consuming, inherently subjective, and governed by hematopathologists' clinical acumen and experience. To circumvent such problems, an efficient computer-aided methodology for quantitative analysis of peripheral blood samples needs to be developed. In this thesis, efforts are therefore made to devise methodologies for automated detection and subclassification of Acute Lymphoblastic Leukemia (ALL) using image processing and machine learning methods. The choice of an appropriate segmentation scheme plays a vital role in the automated disease recognition process. Accordingly, novel schemes have been proposed to segment normal mature lymphocyte and malignant lymphoblast images into their constituent morphological regions. To make the proposed schemes viable from a practical, real-time standpoint, the segmentation problem is addressed in both supervised and unsupervised frameworks. The proposed methods are based on neural networks, feature space clustering, and Markov random field modeling, where the segmentation problem is formulated as a pixel classification, pixel clustering, and pixel labeling problem, respectively. A comprehensive validation analysis is presented to evaluate the performance of four proposed lymphocyte image segmentation schemes against manual segmentation results provided by a panel of hematopathologists. It is observed that the morphological components of normal and malignant lymphocytes differ significantly. To automatically recognize lymphoblasts and detect ALL in peripheral blood samples, an efficient methodology is proposed. Morphological, textural, and color features are extracted from the segmented nucleus and cytoplasm regions of the lymphocyte images.
An ensemble of three classifiers, denoted EOC3, achieves the highest classification accuracy, 94.73%, compared with its individual members. The subclassification of ALL based on French–American–British (FAB) and World Health Organization (WHO) criteria is essential for prognosis and treatment planning. Accordingly, two independent methodologies are proposed for the automated classification of malignant lymphocyte (lymphoblast) images based on morphology and phenotype. These methods include lymphoblast image segmentation, nucleus and cytoplasm feature extraction, and efficient classification.
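The abstract does not say how EOC3 combines its three members. A common choice is plurality (majority) voting, sketched below as an assumption; the three member "classifiers" are hypothetical threshold rules on made-up features (nucleus-to-cell area ratio, nucleus area in pixels, a texture score) standing in for trained models.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> str:
    """Return the label predicted by most members.

    Ties are broken in favour of the label predicted first, because
    Counter.most_common keeps first-encountered order among equal counts.
    """
    votes = Counter(clf(features) for clf in classifiers)
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical members; thresholds are illustrative, not learned from ALL data.
def by_nc_ratio(f):
    return "lymphoblast" if f[0] > 0.7 else "lymphocyte"


def by_nucleus_area(f):
    return "lymphoblast" if f[1] > 900 else "lymphocyte"


def by_texture(f):
    return "lymphoblast" if f[2] > 0.5 else "lymphocyte"


members = [by_nc_ratio, by_nucleus_area, by_texture]
print(majority_vote(members, (0.8, 1000, 0.2)))  # lymphoblast (2 of 3 votes)
```

With an odd number of members and two classes, as here, a tie cannot occur.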

    Biomedical Image Segmentation Based on Multiple Image Features

    An intelligent decision support system for acute lymphoblastic leukaemia detection

    The morphological analysis of blood smear slides by haematologists or haematopathologists is one of the diagnostic procedures available to evaluate the presence of acute leukaemia. This operation is a complex and costly process, and it often lacks standardized accuracy owing to a variety of factors, including insufficient expertise and operator fatigue. To overcome these barriers, this research proposes an intelligent decision support system for the automatic detection of acute lymphoblastic leukaemia (ALL) from microscopic blood smear images. The work has four key stages. (1) First, a modified marker-controlled watershed algorithm integrated with morphological operations is proposed to segment the membrane in lymphocyte and lymphoblast cell images. The aim of this stage is to isolate the lymphocyte/lymphoblast cell membrane from touching and overlapping red blood cells, platelets, and artefacts in the microscopic peripheral blood smear sub-images. (2) Second, a novel clustering algorithm with a stimulating discriminant measure (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytic cell membranes. The SDM measures are used in conjunction with a Genetic Algorithm to cluster nucleus, cytoplasm, and background regions. (3) Third, a total of eighty features consisting of shape, texture, and colour information are extracted from the nucleus and cytoplasm of the identified lymphocyte/lymphoblast images. (4) Finally, the proposed feature optimisation algorithm, a variant of Bare-Bones Particle Swarm Optimisation (BBPSO), is presented to identify the most significant discriminative characteristics of the nucleus and cytoplasm segmented by the SDM-based clustering algorithm.
The proposed BBPSO variant incorporates Cuckoo Search, the Dragonfly Algorithm, BBPSO, local and global random walk operations of uniform combination, and Lévy flights to diversify the search and mitigate the premature convergence problem of the conventional BBPSO. In addition, it employs subswarm concepts, self-adaptive parameters, and convergence degree monitoring mechanisms to enable fast convergence. The optimal feature subsets identified by the proposed algorithm are subsequently used for ALL detection and classification. The proposed system achieves the highest classification accuracy of 96.04% and significantly outperforms related meta-heuristic search methods and related research on ALL detection.
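For readers unfamiliar with BBPSO, the canonical bare-bones update drops particle velocities and samples each new coordinate from a Gaussian centred midway between the particle's personal best and the global best, with standard deviation equal to their distance. The sketch below implements only that canonical update on a continuous test function; it is not the authors' variant, and the Cuckoo Search, Dragonfly, Lévy-flight, subswarm, and self-adaptive components described above are omitted. Applying it to feature selection would additionally require mapping positions to feature subsets (for example by thresholding), which is an assumption not shown here.

```python
import random


def bbpso(objective, dim, n_particles=20, iters=200, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` with canonical bare-bones PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pbest = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest_val = [objective(p) for p in pbest]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            # No velocity: the sample depends only on pbest and gbest, so the
            # current position need not be stored. Clip samples to the bounds.
            candidate = [
                min(hi, max(lo, rng.gauss((pbest[i][d] + gbest[d]) / 2,
                                          abs(pbest[i][d] - gbest[d]))))
                for d in range(dim)
            ]
            value = objective(candidate)
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = candidate, value
                if value < gbest_val:
                    gbest, gbest_val = candidate[:], value
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


best, value = bbpso(sphere, dim=3)
print(value)  # small; the exact value depends on the seed
```

Because the spread is |pbest - gbest|, the global-best particle samples with zero spread, and the swarm stalls once all personal bests coincide; this is the premature-convergence issue the proposed variant is designed to mitigate.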

    Methodology for automatic classification of atypical lymphoid cells from peripheral blood cell images

    Morphological analysis is the starting point for the diagnostic approach to more than 80% of hematological diseases. However, morphological differentiation among the different types of abnormal lymphoid cells in peripheral blood is a difficult task that requires considerable experience and skill. No objective values exist to define cytological variables, which sometimes leads to doubts about the correct cell classification in daily hospital routine. Existing automated systems can preclassify normal blood cells automatically but fail to recognize abnormal lymphoid cells. The general objective of this thesis is to develop a complete methodology to automatically recognize images of normal and reactive lymphocytes and several types of neoplastic lymphoid cells circulating in peripheral blood in some mature B-cell neoplasms, using digital image processing methods. This objective follows two directions: (1) drawing on an engineering and mathematical background, transversal methodologies and software tools are developed; and (2) with a view toward clinical laboratory diagnosis, a system prototype is built and validated whose input is a set of pathological cell images from individual patients, acquired with a microscope and digital camera, and whose output is the automatic classification into one of the pathology groups included in the system. This thesis is the evolution of several works, starting with a discrimination between normal lymphocytes and two types of neoplastic lymphoid cells and ending with the design of a system for the automatic recognition of normal and reactive lymphocytes and five types of neoplastic lymphoid cells. All this work involves the development of a robust segmentation methodology using color clustering, which is able to separate three regions of interest: the cell, the nucleus, and the peripheral zone around the cell. A complete lymphoid cell description is developed by extracting features related to size, shape, texture, and color.
To reduce the complexity of the process, feature selection is performed using information theory. Several classifiers are then implemented to automatically recognize the different types of lymphoid cells. The best classification results are achieved using support vector machines with a radial basis function kernel. The methodology developed, which combines medical, engineering, and mathematical backgrounds, is the first step toward designing a practical hematological diagnosis support tool in the near future.
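The thesis performs feature selection "using information theory" without naming the criterion here. One simple information-theoretic filter, sketched below under that assumption, ranks each already discretised feature by its mutual information with the class label, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))]. The binning step and the feature names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))) written with counts: (c/n) * log2(c*n / (nx*ny))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def rank_features(columns, labels):
    """Return feature names sorted from most to least informative about `labels`."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = ["normal", "normal", "neoplastic", "neoplastic"]
features = {
    "nucleus_size_bin": [0, 0, 1, 1],  # hypothetical: perfectly tracks the label
    "texture_bin": [0, 1, 0, 1],       # hypothetical: independent of the label
}
print(rank_features(features, labels))  # ['nucleus_size_bin', 'texture_bin']
```

Ranking by mutual information alone ignores redundancy between features; criteria such as mRMR add a redundancy penalty on top of this relevance term.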

    A Colour Wheel to Rule them All: Analysing Colour & Geometry in Medical Microscopy

    Personalized medicine is a rapidly growing field in healthcare that aims to customize medical treatments and preventive measures based on each patient’s unique characteristics, such as their genes, environment, and lifestyle factors. This approach acknowledges that people with the same medical condition may respond differently to therapies and seeks to optimize patient outcomes while minimizing the risk of adverse effects. To achieve these goals, personalized medicine relies on advanced technologies, such as genomics, proteomics, metabolomics, and medical imaging. Digital histopathology, a crucial aspect of medical imaging, provides clinicians with valuable insights into tissue structure and function at the cellular and molecular levels. By analyzing small tissue samples obtained through minimally invasive techniques, such as biopsy or aspirate, doctors can gather extensive data to evaluate potential diagnoses and clinical decisions. However, digital analysis of histology images presents unique challenges, including the loss of 3D information and stain variability, which is further complicated by sample variability. Limited access to data exacerbates these challenges, making it difficult to develop accurate computational models for research and clinical use in digital histology. Deep learning (DL) algorithms have shown significant potential for improving the accuracy of Computer-Aided Diagnosis (CAD) and personalized treatment models, particularly in medical microscopy. However, factors such as limited generalizability, lack of interpretability, and bias sometimes hinder their clinical impact. Furthermore, the inherent variability of histology images complicates the development of robust DL methods. Thus, this thesis focuses on developing new tools to address these issues.
Our essential objective is to create transparent, accessible, and efficient methods based on classical principles from various disciplines, including histology, medical imaging, mathematics, and art, to tackle microscopy image registration and colour analysis successfully. These methods can contribute significantly to the advancement of personalized medicine, particularly in studying the tumour microenvironment for diagnosis and therapy research. First, we introduce a novel automatic method for colour analysis and non-rigid histology registration, enabling the study of morphological heterogeneity in tumour biopsies. This method achieves accurate registration of tissue cuts, drastically reducing landmark distances and achieving excellent border overlap. Second, we introduce ABANICCO, a novel colour analysis method that combines geometric analysis, colour theory, fuzzy colour spaces, and multi-label systems for automatically classifying pixels into a set of conventional colour categories. ABANICCO outperforms benchmark methods in accuracy and simplicity. It is computationally straightforward, making it useful in scenarios involving changing objects, limited data, unclear boundaries, or when users lack prior knowledge of the image or colour theory. Moreover, results can be adapted to each particular task. Third, we apply the acquired knowledge to create a novel pipeline of rigid histology registration and ABANICCO colour analysis for the in-depth study of triple-negative breast cancer biopsies. The resulting heterogeneity map and tumour score provide valuable insights into the composition and behaviour of the tumour, informing clinical decision-making and guiding treatment strategies. Finally, we consolidate the developed ideas into an efficient pipeline for tissue reconstruction and multi-modality data integration on Tuberculosis infection data.
This enables accurate analysis of element distribution to better understand the interactions between bacteria, host cells, and the immune system during the course of infection. The methods proposed in this thesis represent a transparent approach to computational pathology, addressing the needs of medical microscopy registration and colour analysis while bridging the gap between clinical practice and computational research. Moreover, our contributions can help develop and train better, more robust DL methods.

In an era in which personalized medicine is revolutionizing healthcare, it is increasingly important to tailor treatments and preventive measures to each patient's genetic makeup, environment, and lifestyle. By employing advanced technologies such as genomics, proteomics, metabolomics, and medical imaging, personalized medicine strives to streamline treatment to improve outcomes and reduce side effects. Medical microscopy, a crucial aspect of personalized medicine, allows physicians to collect and analyze large amounts of data from small tissue samples. This is especially relevant in oncology, where cancer therapies can be optimized according to the specific tissue appearance of each tumour. Computational pathology, a subfield of computer vision, seeks to create algorithms for the digital analysis of biopsies. However, before a computer can analyze medical microscopy images, several steps must be followed to obtain images of the samples. The first stage consists of collecting and preparing a tissue sample from the patient. So that it can be easily observed under the microscope, the sample is cut into ultrathin sections. However, this delicate procedure is not without difficulties.
The fragile tissue can become distorted, torn, or perforated, compromising the overall integrity of the sample. Once the tissue is properly prepared, it is usually treated with characteristically coloured stains. These stains highlight different cell and tissue types with specific colours, making it easier for medical professionals to identify particular features. However, this improved visualization comes at a high cost. At times, stains can hinder computer analysis of the images by mixing improperly, bleeding into the background, or altering the contrast between different elements. The final step in the process is digitizing the sample. High-resolution images of the tissue are acquired at different magnifications, enabling computer analysis. This stage also has its obstacles. Factors such as incorrect camera calibration or inadequate lighting conditions can distort or blur the images. In addition, the resulting whole-slide images are considerably large, further complicating the analysis. Overall, while the preparation, staining, and digitization of medical microscopy samples are fundamental to digital analysis, each of these steps can introduce additional challenges that must be addressed to ensure accurate analysis. Furthermore, converting an entire tissue volume into a few stained sections drastically reduces the available 3D information and introduces considerable uncertainty. Deep learning (DL) solutions hold great promise for personalized medicine, but their clinical impact is sometimes hindered by factors such as limited generalizability, overfitting, opacity, and lack of interpretability, as well as ethical concerns and, in some cases, private incentives.
Moreover, the variability of histological images complicates the development of robust DL methods. To overcome these challenges, this thesis presents a series of highly robust and interpretable methods, based on classical principles from histology, medical imaging, mathematics, and art, for aligning microscopy sections and analyzing their colours. Our first contribution is ABANICCO, an innovative colour analysis method that provides objective, unsupervised colour segmentation and allows subsequent refinement through easy-to-use tools. ABANICCO has been shown to be more accurate and efficient than existing colour classification and segmentation methods, and it even excels at detecting and segmenting whole objects. ABANICCO can be applied to microscopy images to detect stained areas for biopsy quantification, a crucial aspect of cancer research. The second contribution is an automatic, unsupervised tissue segmentation method that identifies and removes background and artefacts from microscopy images, thereby improving the performance of more sophisticated image analysis techniques. This method is robust to diverse images, stains, and acquisition protocols, and it requires no training. The third contribution is the development of novel methods for effectively registering histopathological images, striking the right balance between accurate registration and preservation of local morphology depending on the intended application. As a fourth contribution, the three aforementioned methods are combined into efficient procedures for the complete integration of volumetric data, producing highly interpretable visualizations of all the information present in consecutive tissue biopsy sections.
This data integration can have a major impact on the diagnosis and treatment of various diseases, particularly breast cancer, by enabling early detection, accurate clinical testing, effective treatment selection, and improved communication and engagement with patients. Finally, we apply our findings to multimodal data integration and tissue reconstruction for the accurate analysis of chemical element distribution in tuberculosis, shedding light on the complex interactions between bacteria, host cells, and the immune system during tuberculous infection. This method also addresses problems such as acquisition damage, which is typical of many imaging modalities. In summary, this thesis shows the application of classical computer vision methods to medical microscopy registration and colour analysis to address the unique challenges of this field, with an emphasis on the effective and accessible visualization of complex data. We aim to continue refining our work with extensive technical validation and improved data analysis. The methods presented in this thesis are characterized by their clarity, accessibility, effective data visualization, objectivity, and transparency. These characteristics make them well suited to building robust bridges between artificial intelligence researchers and clinicians, thereby advancing computational pathology in medical practice and research.

Doctoral Programme in Biomedical Science and Technology, Universidad Carlos III de Madrid. Chair: María Jesús Ledesma Carbayo. Secretary: Gonzalo Ricardo Ríos Muñoz. Committee member: Estíbaliz Gómez de Marisca
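ABANICCO is described as classifying pixels into conventional colour categories through geometric analysis of a colour wheel. As a rough illustration of the underlying idea only, the sketch below assigns a pixel to a named category by the angular sector its hue falls in, with separate handling for achromatic pixels. The sector boundaries and the saturation/value thresholds are hypothetical fixed values chosen for readability; ABANICCO derives its own boundaries and uses fuzzy colour spaces, which this sketch does not.

```python
import colorsys

# Hypothetical hue sectors in degrees, [start, end); NOT ABANICCO's boundaries.
HUE_SECTORS = [
    (0, 20, "red"), (20, 45, "orange"), (45, 70, "yellow"),
    (70, 160, "green"), (160, 200, "cyan"), (200, 260, "blue"),
    (260, 300, "purple"), (300, 345, "pink"), (345, 360, "red"),
]


def colour_category(r, g, b, dark_max=0.15, sat_min=0.15, light_min=0.85):
    """Name the colour of an 8-bit RGB pixel by hue sector (achromatic pixels first)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < dark_max:
        return "black"
    if s < sat_min:
        return "white" if v > light_min else "grey"
    angle = h * 360  # colorsys returns h in [0, 1)
    for start, end, name in HUE_SECTORS:
        if start <= angle < end:
            return name
    return "red"  # fallback only; the sectors above cover [0, 360)


for pixel in [(128, 0, 255), (255, 105, 180), (245, 245, 245)]:
    print(pixel, colour_category(*pixel))  # purple, pink, white
```

Treating achromatic pixels before looking at hue matters because hue is meaningless (colorsys reports 0) when saturation is zero.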

    A Comprehensive Overview of Computational Nuclei Segmentation Methods in Digital Pathology

    In the cancer diagnosis pipeline, digital pathology plays an instrumental role in the identification, staging, and grading of malignant areas on biopsy tissue specimens. High-resolution histology images are subject to high variance in appearance, originating either from the acquisition devices or from the H&E staining process. Nuclei segmentation is an important task, as it separates nuclei from the background tissue and yields the topology, size, and count of nuclei, which are determinant factors for cancer detection. Yet it is a fairly time-consuming task for pathologists, with reportedly high subjectivity. Computer Aided Diagnosis (CAD) tools empowered by modern Artificial Intelligence (AI) models enable the automation of nuclei segmentation, which can reduce both the subjectivity of the analysis and reading time. This paper provides an extensive review, beginning with earlier works that use traditional image processing techniques and reaching up to modern approaches following the Deep Learning (DL) paradigm. Our review also focuses on the weak supervision aspect of the problem, motivated by the fact that annotated data are scarce. At the end, the advantages of different models and types of supervision are thoroughly discussed. Furthermore, we try to envision the directions future research will take so as to minimize the need for labeled data while maintaining high performance. Future methods should emphasize efficient and explainable models with a transparent underlying process so that physicians can trust their output.
Comment: 47 pages, 27 figures, 9 tables
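As an example of the traditional image processing end of this spectrum, the sketch below thresholds a grayscale image with Otsu's method and counts 4-connected dark components, yielding a nucleus count and per-nucleus sizes. It assumes nuclei appear darker than the background (as in a hematoxylin-dominated grayscale image) and uses a toy 5x6 image; real pipelines add smoothing, hole filling, and splitting of touching nuclei, all omitted here.

```python
from collections import deque


def otsu_threshold(pixels):
    """Return t in 0..255 maximising between-class variance (class 0 is pixels <= t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def nucleus_sizes(image):
    """Sizes of 4-connected components of pixels at or below Otsu's threshold."""
    t = otsu_threshold([p for row in image for p in row])
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > t or seen[r][c]:
                continue
            # Breadth-first flood fill of one dark component.
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] <= t):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


# Toy grayscale image: two dark "nuclei" (4 px and 2 px) on a bright background.
image = [
    [220, 220, 220, 220, 220, 220],
    [220, 40, 45, 220, 220, 220],
    [220, 50, 42, 220, 60, 220],
    [220, 220, 220, 220, 55, 220],
    [220, 220, 220, 220, 220, 220],
]
sizes = nucleus_sizes(image)
print(len(sizes), sizes)  # 2 [4, 2]
```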

    Computer aided diagnosis algorithms for digital microscopy

    Automatic analysis and information extraction from an image is still a highly challenging research problem in computer vision, which attempts to describe image content with computational and mathematical techniques. Moreover, the information extracted from the image should be meaningful and as discriminative as possible, since it will be used to categorize the image content according to the problem being analysed. In the medical imaging domain this issue is even more pressing, because many important decisions that affect patient care depend on the usefulness of the information extracted from the image. Managing medical images is even more complicated, not only because of the importance of the problem but also because a fair amount of prior medical knowledge is needed to represent as data the visual information to which pathologists refer. Today, medical decisions that impact patient care rely on the results of laboratory tests to a greater extent than ever before, owing to the marked expansion in the number and complexity of the tests offered. These developments promise to improve patient care, but the greater the number and complexity of the tests, the greater the possibility of misapplying and misinterpreting them, leading to inappropriate diagnoses and therapies. Moreover, as the number of tests grows, so does the amount of data to be analysed, forcing pathologists to devote much time to analysing the tests rather than to patient care and the prescription of the right therapy, especially considering that most of the tests performed are routine check-ups and most of the analysed samples come from healthy patients. A quantitative evaluation of medical images is therefore essential, not only to overcome uncertainty and subjectivity but also to greatly reduce the amount of data and the time required for the analysis.
In the last few years, many computer-assisted diagnosis systems have been developed that attempt to mimic pathologists by extracting features from the images. Image analysis involves complex algorithms to identify and characterize cells or tissues using image pattern recognition technology. This thesis addresses the main problems associated with digital microscopy analysis in histology and haematology diagnosis by developing algorithms that extract useful information from different digital images and are able to distinguish the different biological structures they contain. The proposed methods aim not only to improve the accuracy of the analysis and reduce analysis time when used as the sole means of diagnosis, but also to serve as intermediate tools for reducing the number of samples the pathologist must analyse directly, or as double-check systems to verify the results of the automated equipment used today.
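As one concrete, deliberately simple instance of the image features such CAD systems extract to characterize cells, the sketch below computes area, perimeter, and circularity 4πA/P² from a binary segmentation mask. The mask format (a list of 0/1 rows) is an assumption. The perimeter is counted as exposed pixel edges, which overestimates diagonal boundaries, so even compact digital shapes score below 1 (a square gives π/4).

```python
import math


def shape_features(mask):
    """Area, pixel-edge perimeter, and circularity of the foreground in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Each side of a foreground pixel that touches background or the
            # image border contributes one unit of perimeter.
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                    perimeter += 1
    circularity = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area": area, "perimeter": perimeter, "circularity": circularity}


# Hypothetical 3x3 square "cell" inside a 5x5 mask.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
print(shape_features(mask))  # area 9, perimeter 12, circularity pi/4 (about 0.785)
```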

    Structures in High-Dimensional Data: Intrinsic Dimension and Cluster Analysis

    With today's improved measurement and data storage technologies, it has become common to collect data in search of hypotheses rather than to test hypotheses, that is, to do exploratory data analysis. Finding patterns and structures in data is the main goal. This thesis deals with two kinds of structures that can convey relationships between different parts of data in a high-dimensional space: manifolds and clusters. In a way they are opposites of each other: a manifold structure shows that it is plausible to connect two distant points through the manifold, whereas a clustering shows that it is plausible to separate two nearby points by assigning them to different clusters. But clusters and manifolds can also coincide: each cluster can be a manifold of its own. The first paper in this thesis concerns one specific aspect of a manifold structure, namely its dimension, also called the intrinsic dimension of the data. A novel estimator of intrinsic dimension, taking advantage of "the curse of dimensionality", is proposed and evaluated. It is shown to have, in general, less bias than estimators from the literature and can therefore better distinguish manifolds with different dimensions. The second and third papers in this thesis concern cluster analysis of data generated by flow cytometry, a high-throughput single-cell measurement technology. In this area, clustering is performed routinely by manual assignment of data in two-dimensional plots to identify cell populations.
This is a tedious and subjective task, especially since the data often have four, eight, twelve, or even more dimensions, and analysts need to decide which two dimensions to look at together and in which order. In the second paper of the thesis, a new pipeline for automated cell population identification is proposed that can process multiple flow cytometry samples in parallel using a hierarchical model that shares information between the clusterings of the samples, thus making corresponding clusters in different samples similar while allowing for variation in cluster location and shape. In the third and final paper of the thesis, statistical tests for unimodality are investigated as a tool for quality control of automated cell population identification algorithms. It is shown that the different tests have different interpretations of unimodality and thus accept different kinds of clusters as sufficiently close to unimodal.
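The abstract does not describe the proposed estimator, so the sketch below instead shows a different, previously published estimator of the same quantity, TwoNN, to illustrate what an intrinsic-dimension estimator computes. It uses only the ratio of each point's second- to first-nearest-neighbour distance; under a locally uniform density that ratio follows a Pareto law whose exponent is the intrinsic dimension, giving the maximum-likelihood estimate d = N / Σ log(r2/r1). The brute-force search is O(N²), and the embedded-plane data are hypothetical.

```python
import math
import random


def two_nn_dimension(points):
    """TwoNN-style estimate: N / sum(log(r2 / r1)) over first/second neighbour distances."""
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if r1 > 0:  # skip exact duplicates, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


rng = random.Random(0)
# Hypothetical data: a 2-D plane linearly embedded in 5-D space.
plane = [(u, v, u + v, u - v, 0.5 * u)
         for u, v in ((rng.random(), rng.random()) for _ in range(400))]
print(round(two_nn_dimension(plane), 2))  # close to 2; exact value depends on the sample
```

The estimate depends on the embedding only through distances, so the ambient dimension (5 here) does not enter; what matters is how neighbour counts grow with radius on the data itself.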