269 research outputs found

    Image analysis for the study of chromatin distribution in cell nuclei with application to cervical cancer screening

    Get PDF

    Machine Learning for Instance Segmentation

    Get PDF
    Volumetric Electron Microscopy images can be used for connectomics, the study of brain connectivity at the cellular level. A prerequisite for this inquiry is the automatic identification of neural cells, which requires machine learning algorithms and in particular efficient image segmentation algorithms. In this thesis, we develop new algorithms for this task. In the first part we provide, for the first time in this field, a method for training a neural network to predict optimal input data for a watershed algorithm. We demonstrate its superior performance compared to other segmentation methods of its category. In the second part, we develop an efficient watershed-based algorithm for weighted graph partitioning, the \emph{Mutex Watershed}, which uses negative edge-weights for the first time. We show that it is intimately related to the multicut and has a cutting edge performance on a connectomics challenge. Our algorithm is currently used by the leaders of two connectomics challenges. Finally, motivated by inpainting neural networks, we create a method to learn the graph weights without any supervision

    Thermodynamic and Structural Phase Behavior of Colloidal and Nanoparticle Systems.

    Full text link
    We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source. Common thermodynamic ensembles can be run in two dimensional or three dimensional triclinic boxes. We developed an efficient scheme for hard particle pressure measurement based on volume perturbation. We demonstrate the effectiveness of low order virial coefficients in describing the compressibility factor of fluids of hard polyhedra. The second virial coefficient is obtained analytically from particle asphericity and can be used to define an effective sphere with similar low-density behavior. Higher-order virial coefficients --- efficiently calculated with Mayer Sampling Monte Carlo --- are used to define an exponential approximant that exhibits the best known semi-analytic characterization of hard polyhedron fluid state functions. We present a general method for the exact calculation of convex polyhedron diffraction form factors that is more easily applied to common shape data structures than the techniques typically presented in literature. A proof of concept user interface illustrates how a researcher might investigate the role of particle form factor in the diffraction patterns of different particles in known structures. We present a square-triangle dodecagonal quasicrystal (DQC) in a binary mixture of nanocrystals (NCs). We demonstrate how the decoration of the square and triangle tiles naturally gives rise to partial matching rules via symmetry breaking in layers perpendicular to the dodecagonal axis. We analyze the geometry of the experimental tiling and, following the ``cut and project'' theory, lift the square and triangle tiling pattern to four dimensional space to perform phason analysis historically applied only in simulation and atomic systems. Hard particle models are unsuccessful at explaining the stability of the binary nanoparticle super lattice. However, with a simple isotropic soft particle model, we are able to demonstrate seeded growth of the experimental structure in simulation. These simulations indicate that the most important stabilizing properties of the short range structure are the size ratio of the particles and an A--B particle attraction.PhDMaterials Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/120906/1/eirrgang_1.pd

    Automated Segmentation of Large 3D Images of Nervous Systems Using a Higher-order Graphical Model

    Get PDF
    This thesis presents a new mathematical model for segmenting volume images. The model is an energy function defined on the state space of all possibilities to remove or preserve splitting faces from an initial over-segmentation of the 3D image into supervoxels. It decomposes into potential functions that are learned automatically from a small amount of empirical training data. The learning is based on features of the distribution of gray values in the volume image and on features of the geometry and topology of the supervoxel segmentation. To be able to extract these features from large 3D images that consist of several billion voxels, a new algorithm is presented that constructs a suitable representation of the geometry and topology of volume segmentations in a block-wise fashion, in log-linear runtime (in the number of voxels) and in parallel, using only a prescribed amount of memory. At the core of this thesis is the optimization problem of finding, for a learned energy function, a segmentation with minimal energy. This optimization problem is difficult because the energy function consists of 3rd and 4th order potential functions that are not submodular. For sufficiently small problems with 10,000 degrees of freedom, it can be solved to global optimality using Mixed Integer Linear Programming. For larger models with 10,000,000 degrees of freedom, an approximate optimizer is proposed and compared to state-of-the-art alternatives. Using these new techniques and a unified data structure for multi-variate data and functions, a complete processing chain for segmenting large volume images, from the restoration of the raw volume image to the visualization of the final segmentation, has been implemented in C++. Results are shown for an application in neuroscience, namely the segmentation of a part of the inner plexiform layer of rabbit retina in a volume image of 2048 x 1792 x 2048 voxels that was acquired by means of Serial Block Face Scanning Electron Microscopy (Denk and Horstmann, 2004) with a resolution of 22nm x 22nm x 30nm. The quality of the automated segmentation as well as the improvement over a simpler model that does not take geometric context into account, are confirmed by a quantitative comparison with the gold standard

    Uma abordagem de agrupamento baseada na técnica de divisão e conquista e floresta de caminhos ótimos

    Get PDF
    Orientador: Alexandre Xavier FalcãoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: O agrupamento de dados é um dos principais desafios em problemas de Ciência de Dados. Apesar do seu progresso científico em quase um século de existência, algoritmos de agrupamento ainda falham na identificação de grupos (clusters) naturalmente relacionados com a semântica do problema. Ademais, os avanços das tecnologias de aquisição, comunicação, e armazenamento de dados acrescentam desafios cruciais com o aumento considerável de dados, os quais não são tratados pela maioria das técnicas. Essas questões são endereçadas neste trabalho através da proposta de uma abordagem de divisão e conquista para uma técnica de agrupamento única em encontrar um grupo por domo da função de densidade de probabilidade dos dados --- o algoritmo de agrupamento por floresta de caminhos ótimos (OPF - Optimum-Path Forest). Nesta técnica, amostras são interpretadas como nós de um grafo cujos arcos conectam os kk-vizinhos mais próximos no espaço de características. Os nós são ponderados pela sua densidade de probabilidade e um mapa de conexidade é maximizado de modo que cada máximo da função densidade de probabilidade se torna a raiz de uma árvore de caminhos ótimos (grupo). O melhor valor de kk é estimado por otimização em um intervalo de valores dependente da aplicação. O problema com este método é que um número alto de amostras torna o algoritmo inviável, devido ao espaço de memória necessário para armazenar o grafo e o tempo computacional para encontrar o melhor valor de kk. Visto que as soluções existentes levam a resultados ineficazes, este trabalho revisita o problema através da proposta de uma abordagem de divisão e conquista com dois níveis. No primeiro nível, o conjunto de dados é dividido em subconjuntos (blocos) menores e as amostras pertencentes a cada bloco são agrupadas pelo algoritmo OPF. Em seguida, as amostras representativas de cada grupo (mais especificamente as raízes da floresta de caminhos ótimos) são levadas ao segundo nível, onde elas são agrupadas novamente. Finalmente, os rótulos de grupo obtidos no segundo nível são transferidos para todas as amostras do conjunto de dados através de seus representantes do primeiro nível. Nesta abordagem, todas as amostras, ou pelo menos muitas delas, podem ser usadas no processo de aprendizado não supervisionado, sem afetar a eficácia do agrupamento e, portanto, o procedimento é menos susceptível a perda de informação relevante ao agrupamento. Os resultados mostram agrupamentos satisfatórios em dois cenários, segmentação de imagem e agrupamento de dados arbitrários, tendo como base a comparação com abordagens populares. No primeiro cenário, a abordagem proposta atinge os melhores resultados em todas as bases de imagem testadas. No segundo cenário, os resultados são similares aos obtidos por uma versão otimizada do método original de agrupamento por floresta de caminhos ótimosAbstract: Data clustering is one of the main challenges when solving Data Science problems. Despite its progress over almost one century of research, clustering algorithms still fail in identifying groups naturally related to the semantics of the problem. Moreover, the advances in data acquisition, communication, and storage technologies add crucial challenges with a considerable data increase, which are not handled by most techniques. We address these issues by proposing a divide-and-conquer approach to a clustering technique, which is unique in finding one group per dome of the probability density function of the data --- the Optimum-Path Forest (OPF) clustering algorithm. In the OPF-clustering technique, samples are taken as nodes of a graph whose arcs connect the kk-nearest neighbors in the feature space. The nodes are weighted by their probability density values and a connectivity map is maximized such that each maximum of the probability density function becomes the root of an optimum-path tree (cluster). The best value of kk is estimated by optimization within an application-specific interval of values. The problem with this method is that a high number of samples makes the algorithm prohibitive, due to the required memory space to store the graph and the computational time to obtain the clusters for the best value of kk. Since the existing solutions lead to ineffective results, we decided to revisit the problem by proposing a two-level divide-and-conquer approach. At the first level, the dataset is divided into smaller subsets (blocks) and the samples belonging to each block are grouped by the OPF algorithm. Then, the representative samples (more specifically the roots of the optimum-path forest) are taken to a second level where they are clustered again. Finally, the group labels obtained in the second level are transferred to all samples of the dataset through their representatives of the first level. With this approach, we can use all samples, or at least many samples, in the unsupervised learning process without affecting the grouping performance and, therefore, the procedure is less likely to lose relevant grouping information. We show that our proposal can obtain satisfactory results in two scenarios, image segmentation and the general data clustering problem, in comparison with some popular baselines. In the first scenario, our technique achieves better results than the others in all tested image databases. In the second scenario, it obtains outcomes similar to an optimized version of the traditional OPF-clustering algorithmMestradoCiência da ComputaçãoMestre em Ciência da ComputaçãoCAPE

    Efficient Algorithms for Large-Scale Image Analysis

    Get PDF
    This work develops highly efficient algorithms for analyzing large images. Applications include object-based change detection and screening. The algorithms are 10-100 times as fast as existing software, sometimes even outperforming FGPA/GPU hardware, because they are designed to suit the computer architecture. This thesis describes the implementation details and the underlying algorithm engineering methodology, so that both may also be applied to other applications

    Designing a scalable dynamic load -balancing algorithm for pipelined single program multiple data applications on a non-dedicated heterogeneous network of workstations

    Get PDF
    Dynamic load balancing strategies have been shown to be the most critical part of an efficient implementation of various applications on large distributed computing systems. The need for dynamic load balancing strategies increases when the underlying hardware is a non-dedicated heterogeneous network of workstations (HNOW). This research focuses on the single program multiple data (SPMD) programming model as it has been extensively used in parallel programming for its simplicity and scalability in terms of computational power and memory size.;This dissertation formally defines and addresses the problem of designing a scalable dynamic load-balancing algorithm for pipelined SPMD applications on non-dedicated HNOW. During this process, the HNOW parameters, SPMD application characteristics, and load-balancing performance parameters are identified.;The dissertation presents a taxonomy that categorizes general load balancing algorithms and a methodology that facilitates creating new algorithms that can harness the HNOW computing power and still preserve the scalability of the SPMD application.;The dissertation devises a new algorithm, DLAH (Dynamic Load-balancing Algorithm for HNOW). DLAH is based on a modified diffusion technique, which incorporates the HNOW parameters. Analytical performance bound for the worst-case scenario of the diffusion technique has been derived.;The dissertation develops and utilizes an HNOW simulation model to conduct extensive simulations. These simulations were used to validate DLAH and compare its performance to related dynamic algorithms. The simulations results show that DLAH algorithm is scalable and performs well for both homogeneous and heterogeneous networks. Detailed sensitivity analysis was conducted to study the effects of key parameters on performance

    Learning-based Segmentation for Connectomics

    Get PDF
    Recent advances in electron microscopy techniques make it possible to acquire highresolution, isotropic volume images of neural circuitry. In connectomics, neuroscientists seek to obtain the circuit diagram involving all neurons and synapses in such a volume image. Mapping neuron connectivity requires tracing each and every neural process through terabytes of image data. Due to the size and complexity of these volume images, fully automated analysis methods are desperately needed. In this thesis, I consider automated, machine learning-based neurite segmentation approaches based on a simultaneous merge decision of adjacent supervoxels. - Given a learned likelihood of merging adjacent supervoxels, Chapter 4 adapts a probabilistic graphical model which ensures that merge decisions are consistent and the surfaces of final segments are closed. This model can be posed as a multicut optimization problem and is solved with the cutting-plane method. In order to scale to large datasets, a fast search for (and good choice of) violated cycle constraints is crucial. Quantitative experiments show that the proposed closed-surface regularization significantly improves segmentation performance. - In Chapter 5, I investigate whether the edge weights of the previous model can be chosen to minimize the loss with respect to non-local segmentation quality measures (e.g. Rand Index). Suitable w are obtained from a structured learning approach. In the Structured Support Vector Machine formulation, a novel fast enumeration scheme is used to find the most violated constraint. Quantitative experiments show that structured learning can improve upon unstructured methods. Furthermore, I introduce a new approximate, hierarchical and blockwise optimization approach for large-scale multicut segmentation. Using this method, high-quality approximate solutions for large problem instances are found quickly. - Chapter 6 introduces another novel approximate scheme for multicut segmentation -- Cut, Glue&Cut -- which is based on the move-making paradigm. First, the graph is recursively partitioned into small regions (cut phase). Then, for any two adjacent regions, alternative cuts of these two regions define possible moves (glue&cut phase). The proposed algorithm finds segmentations that are { as measured by a loss function { as close to the ground-truth as the global optimum found by exact solvers, while being significantly faster than existing methods. - In order to jointly label resulting segments as well as to label the boundaries between segments, Chapter 7 proposes the Asymmetric Multi-way Cut model, a variant of Multi-way Cut. In this new model, within-class cuts are allowed for some labels, while being forbidden for other labels. Qualitative experiments show when such a formulation can be beneficial. In particular, an application to joint neurite and cell organelle labeling in EM volume images is discussed. - Custom software tools that can cope with the large data volumes common in the field of connectomics are a prerequisite for the implementation and evaluation of novel segmentation techniques. Chapter 3 presents version 1.0 of ilastik, a joint effort of multiple researchers. I have co-written its volume viewing component, volumina. ilastik provides an interactive pixel classification work ow on largerthan-RAM datasets as well as a semi-automated segmentation module useful for acquiring gold standard segmentations. Furthermore, I describe new software for dealing with hierarchies of cell complexes as well as for blockwise image processing operations on large datasets. The different segmentation methods presented in this thesis provide a promising direction towards reaching the required reliability as well as the required data throughput necessary for connectomics applications
    • …
    corecore