529 research outputs found

    Unsupervised Machine Learning Algorithms to Characterize Single-Cell Heterogeneity and Perturbation Response

    Get PDF
    Recent advances in microfluidic technologies facilitate the measurement of gene expression, DNA accessibility, protein content, or genomic mutations at unprecedented scale. The challenges imposed by the scale of these datasets are further exacerbated by non-linearity in molecular effects, complex interdependencies between features, and a lack of understanding of both data generating processes and sources of technical and biological noise. As a result, analysis of modern single-cell data requires the development of specialized computational tools. One solution to these problems is the use of manifold learning, a sub-field of unsupervised machine learning that seeks to model data geometry using a simplifying assumption that the underlying system is continuous and locally Euclidean. In this dissertation, I show how manifold learning is naturally suited for single-cell analysis and introduce three related algorithms for characterization of single-cell heterogeneity and perturbation response. I first describe Vertex Frequency Clustering, an algorithm that identifies groups of cells with similar responses to an experiment perturbation by analyzing the spectral representation of condition labels expressed as signals over a cell similarity graph. Next, I introduce MELD, an algorithm that expands on these ideas to estimate the density of each experimental sample over the graph to quantify the effect of an experimental perturbation at single cell resolution. Finally, I describe a neural network for archetypal analysis that represents the data as continuously distributed between a set of extrema. Each of these algorithms are demonstrated on a combination of real and synthetic datasets and are benchmarked against state-of-the-art algorithms

    Complexity in Developmental Systems: Toward an Integrated Understanding of Organ Formation

    Get PDF
    During animal development, embryonic cells assemble into intricately structured organs by working together in organized groups capable of implementing tightly coordinated collective behaviors, including patterning, morphogenesis and migration. Although many of the molecular components and basic mechanisms underlying such collective phenomena are known, the complexity emerging from their interplay still represents a major challenge for developmental biology. Here, we first clarify the nature of this challenge and outline three key strategies for addressing it: precision perturbation, synthetic developmental biology, and data-driven inference. We then present the results of our effort to develop a set of tools rooted in two of these strategies and to apply them to uncover new mechanisms and principles underlying the coordination of collective cell behaviors during organogenesis, using the zebrafish posterior lateral line primordium as a model system. To enable precision perturbation of migration and morphogenesis, we sought to adapt optogenetic tools to control chemokine and actin signaling. This endeavor proved far from trivial and we were ultimately unable to derive functional optogenetic constructs. However, our work toward this goal led to a useful new way of perturbing cortical contractility, which in turn revealed a potential role for cell surface tension in lateral line organogenesis. Independently, we hypothesized that the lateral line primordium might employ plithotaxis to coordinate organ formation with collective migration. We tested this hypothesis using a novel optical tool that allows targeted arrest of cell migration, finding that contrary to previous assumptions plithotaxis does not substantially contribute to primordium guidance. Finally, we developed a computational framework for automated single-cell segmentation, latent feature extraction and quantitative analysis of cellular architecture. We identified the key factors defining shape heterogeneity across primordium cells and went on to use this shape space as a reference for mapping the results of multiple experiments into a quantitative atlas of primordium cell architecture. We also propose a number of data-driven approaches to help bridge the gap from big data to mechanistic models. Overall, this study presents several conceptual and methodological advances toward an integrated understanding of complex multi-cellular systems

    Development of statistical methodologies applied to anthropometric data oriented towards the ergonomic design of products

    Get PDF
    Ergonomics is the scientific discipline that studies the interactions between human beings and the elements of a system and presents multiple applications in areas such as clothing and footwear design or both working and household environments. In each of these sectors, knowing the anthropometric dimensions of the current target population is fundamental to ensure that products suit as well as possible most of the users who make up the population. Anthropometry refers to the study of the measurements and dimensions of the human body and it is considered a very important branch of Ergonomics because its considerable influence on the ergonomic design of products. Human body measurements have usually been taken using rules, calipers or measuring tapes. These procedures are simple and cheap to carry out. However, they have one major drawback: the body measurements obtained and consequently, the human shape information, is imprecise and inaccurate. Furthermore, they always require interaction with real subjects, which increases the measure time and data collecting. The development of new three-dimensional (3D) scanning techniques has represented a huge step forward in the way of obtaining anthropometric data. This technology allows 3D images of human shape to be captured and at the same time, generates highly detailed and reproducible anthropometric measurements. The great potential of these new scanning systems for the digitalization of human body has contributed to promoting new anthropometric studies in several countries, such as United Kingdom, Australia, Germany, France or USA, in order to acquire accurate anthropometric data of their current population. In this context, in 2006 the Spanish Ministry of Health commissioned a 3D anthropometric survey of the Spanish female population, following the agreement signed by the Ministry itself with the Spanish associations and companies of manufacturing, distribution, fashion design and knitted sectors. A sample of 10415 Spanish females from 12 to 70 years old, randomly selected from the official Postcode Address File, was measured. The two main objectives of this study, which was conducted by the Biomechanics Institute of Valencia, were the following: on the one hand, to characterize the shape and body dimensions of the current Spanish women population to develop a standard sizing system that could be used by all clothing designers. On the other hand, to promote a healthy image of beauty through the representation of suited mannequins. In order to tackle both objectives, Statistics plays an essential role. Thus, the statistical methodologies presented in this PhD work have been applied to the database obtained from the Spanish anthropometric study. Clothing sizing systems classify the population into homogeneous groups (size groups) based on some key anthropometric dimensions. All members of the same group are similar in body shape and size, so they can wear the same garment. In addition, members of different groups are very different with respect to their body dimensions. An efficient and optimal sizing system aims at accommodating as large a percentage of the population as possible, in the optimum number of size groups that better describes the shape variability of the population. Besides, the garment fit for the accommodated individuals must be as good as possible. A very valuable reference related to sizing systems is the book Sizing in clothing: Developing effective sizing systems for ready-to-wear clothing, by Susan Ashdown. Each clothing size is defined from a person whose body measurements are located toward the central value for each of the dimensions considered in the analysis. The central person, which is considered as the size representative (the size prototype), becomes the basic pattern from which the clothing line in the same size is designed. Clustering is the statistical tool that divides a set of individuals in groups (clusters), in such a way that subjects of the same cluster are more similar to each other than to those in other groups. In addition, clustering defines each group by means of a representative individual. Therefore, it arises in a natural way the idea of using clustering to try to define an efficient sizing system. Specifically, four of the methodologies presented in this PhD thesis aimed at segmenting the population into optimal sizes, use different clustering methods. The first one, called trimowa, has been published in Expert Systems with Applications. It is based on using an especially defined distance to examine differences between women regarding their body measurements. The second and third ones (called biclustAnthropom and TDDclust, respectively) will soon be submitted in the same paper. BiclustAnthropom adapts to the field of Anthropometry a clustering method addressed in the specific case of gene expression data. Moreover, TDDclust uses the concept of statistical depth for grouping according to the most central (deep) observation in each size. As mentioned, current sizing systems are based on using an appropriate set of anthropometric dimensions, so clustering is carried out in the Euclidean space. In the three previous proposals, we have always worked in this way. Instead, in the fourth and last approach, called kmeansProcrustes, a clustering procedure is proposed for grouping taking into account the women shape, which is represented by a set of anatomical markers (landmarks). For this purpose, the statistical shape analysis will be fundamental. This contribution has been submitted for publication. A sizing system is intended to cover the so-called standard population, discarding the individuals with extreme sizes (both large and small). In mathematical language, these individuals can be considered outliers. An outlier is an observation point that is distant from other observations. In our case, a person with extreme anthopometric measurements would be considered as a statistical outlier. Clothing companies usually design garments for the standard sizes so that their market share is optimal. Nevertheless, with their foreign expansion, a lot of brands are spreading their collection and they already have a special sizes section. In last years, Internet shopping has been an alternative for consumers with extreme sizes looking for clothes that follow trends. The custom-made fabrication is other possibility with the advantage of making garments according to the customers' preferences. The four aforementioned methodologies (trimowa, biclustAnthropom, TDDclust and kmeansProcrustes) have been adapted to only accommodate the standard population. Once a particular garment has been designed, the assessing and analysis of fit is performed using one or more fit models. The fit model represents the body dimensions selected by each company to define the proportional relationships needed to achieve the fit the company has determined. The definition of an efficient sizing system relies heavily on the accuracy and representativeness of the fit models regarding the population to which it is addressed. In this PhD work, a statistical approach is proposed to identify representative fit models. It is based on another clustering method originally developed for grouping gene expression data. This method, called hipamAnthropom, has been published in Decision Support Systems. From well-defined fit models and prototypes, representative and accurate mannequins of the population can be made. Unlike clothing design, where representative cases correspond with central individuals, in the design of working and household environments, the variability of human shape is described by extreme individuals, which are those that have the largest or smallest values (or extreme combinations) in the dimensions involved in the study. This is often referred to as the accommodation problem. A very interesting reference in this area is the book entitled Guidelines for Using Anthropometric Data in Product Design, published by The Human Factors and Ergonomics Society. The idea behind this way of proceeding is that if a product fits extreme observations, it will also fit the others (less extreme). To that end, in this PhD thesis we propose two methodological contributions based on the statistical archetypal analysis. An archetype in Statistics is an extreme individual that is obtained as a convex combination of other subjects of the sample. The first of these methodologies has been published in Computers and Industrial Engineering, whereas the second one has been submitted for publication. The outline of this PhD report is as follows: Chapter 1 reviews the state of the art of Ergonomics and Anthropometry and introduces the anthropometric survey of the Spanish female population. Chapter 2 presents the trimowa, biclustAnthropom and hipamAnthropom methodologies. In Chapter 3 the kmeansProcrustes proposal is detailed. The TDDclust methodology is explained in Chapter 4. Chapter 5 presents the two methodologies related to the archetypal analysis. Since all these contributions have been programmed in the statistical software R, Chapter 6 presents the Anthropometry R package, that brings together all the algorithms associated with each approach. In this way, from Chapter 2 to Chapter 6 all the methodologies and results included in this PhD thesis are presented. At last, Chapter 7 provides the most important conclusions

    An image-based data-driven analysis of cellular architecture in a developing tissue

    Full text link
    Quantitative microscopy is becoming increasingly crucial in efforts to disentangle the complexity of organogenesis, yet adoption of the potent new toolbox provided by modern data science has been slow, primarily because it is often not directly applicable to developmental imaging data. We tackle this issue with a newly developed algorithm that uses point cloud-based morphometry to unpack the rich information encoded in 3D image data into a straightforward numerical representation. This enabled us to employ data science tools, including machine learning, to analyze and integrate cell morphology, intracellular organization, gene expression and annotated contextual knowledge. We apply these techniques to construct and explore a quantitative atlas of cellular architecture for the zebrafish posterior lateral line primordium, an experimentally tractable model of complex self-organized organogenesis. In doing so, we are able to retrieve both previously established and novel biologically relevant patterns, demonstrating the potential of our data-driven approach

    Cell Type-specific Analysis of Human Interactome and Transcriptome

    Get PDF
    Cells are the fundamental building block of complex tissues in higher-order organisms. These cells take different forms and shapes to perform a broad range of functions. What makes a cell uniquely eligible to perform a task, however, is not well-understood; neither is the defining characteristic that groups similar cells together to constitute a cell type. Even for known cell types, underlying pathways that mediate cell type-specific functionality are not readily available. These functions, in turn, contribute to cell type-specific susceptibility in various disorders
    corecore