    A new definition of mixing and segregation: Three dimensions of a key process variable

    Although a number of definitions of mixing have been proposed in the literature, no single definition accurately and clearly describes the full range of problems in the field of industrial mixing. An alternative approach is proposed which defines segregation as being composed of three separate dimensions. The first dimension is the intensity of segregation, quantified by the normalized concentration variance (CoV); the second dimension is the scale of segregation, or clustering; and the last dimension is the exposure, or the potential to reduce segregation. The first dimension focuses on the instantaneous concentration variance; the second on the instantaneous length scales in the mixing field; and the third on the driving force for change, i.e. the mixing time scale, or the instantaneous rate of reduction in segregation. With these three dimensions in hand, it is possible to speak more clearly about what is meant by the control of segregation in industrial mixing processes. In this paper, the three dimensions of segregation are presented and defined in the context of previous definitions of mixing, and then applied to a range of industrial mixing problems to test their accuracy and robustness.
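    As a concrete illustration of the first dimension, the intensity of segregation can be computed directly from concentration samples: CoV is conventionally the concentration standard deviation normalized by the mean. Below is a minimal sketch in Python; the function name and the toy probe readings are illustrative, not from the paper.

```python
import numpy as np

def intensity_of_segregation(c):
    """Coefficient of variation (CoV) of a concentration field:
    standard deviation normalized by the mean. CoV tends to 0 as
    the mixture approaches uniformity."""
    c = np.asarray(c, dtype=float)
    return c.std() / c.mean()

# Hypothetical probe readings from a poorly mixed and a well mixed tank
poorly_mixed = np.array([0.02, 0.35, 0.10, 0.48, 0.05])
well_mixed = np.array([0.19, 0.21, 0.20, 0.22, 0.18])
print(intensity_of_segregation(poorly_mixed))  # large CoV: strong segregation
print(intensity_of_segregation(well_mixed))    # small CoV: nearly uniform
```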

    Statistical learning of random probability measures

    The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves, typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii), we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data groups, and we propose a novel vector autoregressive model to study the growth curves of Singaporean children. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit circle.
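    For measures on the real line, the 2-Wasserstein geometry of point (iii) has a convenient closed form: the distance between two distributions equals the L2 distance between their quantile functions, which is the embedding that projected methods can exploit. A minimal sketch, with illustrative function name and grid size:

```python
import numpy as np

def wasserstein2_1d(x, y, n_grid=1000):
    """2-Wasserstein distance between two samples on the real line,
    computed as the L2 distance between empirical quantile functions
    (this identity holds only in one dimension)."""
    u = (np.arange(n_grid) + 0.5) / n_grid  # quantile levels in (0, 1)
    qx = np.quantile(x, u)
    qy = np.quantile(y, u)
    return np.sqrt(np.mean((qx - qy) ** 2))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=5000)
b = rng.normal(2.0, 1.5, size=5000)
# For Gaussians, W2 = sqrt((m1 - m2)^2 + (s1 - s2)^2) = sqrt(4.25) ~ 2.06
print(wasserstein2_1d(a, b))
```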

    An application of machine learning to statistical physics: from the phases of quantum control to satisfiability problems

    This dissertation presents a study of machine learning methods with a focus on applications to statistical and condensed matter physics, in particular the problems of quantum state preparation, spin glasses, and constraint satisfiability. We start by introducing the core principles of machine learning, such as overfitting, the bias-variance tradeoff, and the disciplines of supervised, unsupervised, and reinforcement learning. This discussion is set in the context of recent applications of machine learning to statistical physics and condensed matter physics. We then present the problem of quantum state preparation and show how reinforcement learning, along with stochastic optimization methods, can be applied to identify and define phases of quantum control. Reminiscent of condensed matter physics, the underlying phases of quantum control are identified via a set of order parameters and further detailed in terms of their universal implications for optimal quantum control. In particular, casting the optimal quantum control problem as an optimization problem, we show that it exhibits a generic glassy phase and establish a connection with the fields of spin-glass physics and constraint satisfiability problems. We then demonstrate how unsupervised learning methods can be used to obtain important information about the complexity of the phases described. We end by presenting a novel clustering framework, termed HAL (hierarchical agglomerative learning), which exploits out-of-sample accuracy estimates of machine learning classifiers to perform robust clustering of high-dimensional data. We show applications of HAL to various clustering problems.
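    To make the "optimal control as optimization" framing concrete, below is a toy sketch of bang-bang quantum state preparation for a single qubit, optimized by stochastic local search over protocols. The Hamiltonian, field strengths, and protocol length are illustrative choices, not the exact setup of the dissertation; in such landscapes, glassiness shows up as many near-optimal protocols separated by large barriers.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli X
sz = np.array([[1, 0], [0, -1]], dtype=complex)  # Pauli Z

def fidelity(protocol, psi0, psi_target, dt=0.05):
    """Evolve psi0 under H(h) = -sz - h*sx with a piecewise-constant
    bang-bang field h(t) in {-4, +4}; return |<target|psi(T)>|^2."""
    psi = psi0.copy()
    for h in protocol:
        psi = expm(-1j * dt * (-sz - h * sx)) @ psi
    return abs(psi_target.conj() @ psi) ** 2

def ground_state(h):
    """Ground state of H(h), used as initial and target state."""
    _, vecs = np.linalg.eigh(-sz - h * sx)
    return vecs[:, 0]

psi0, psi_t = ground_state(-2.0), ground_state(2.0)
rng = np.random.default_rng(1)
protocol = rng.choice([-4.0, 4.0], size=60)
best = fidelity(protocol, psi0, psi_t)
for _ in range(2000):                # stochastic local search:
    i = rng.integers(len(protocol))  # flip one bang at a time and
    protocol[i] *= -1                # keep the flip if fidelity improves
    f = fidelity(protocol, psi0, psi_t)
    if f >= best:
        best = f
    else:
        protocol[i] *= -1            # revert the flip
print(best)
```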

    Multiscale Molecular Dynamics Simulations of Histidine Kinase Activity

    Two-component systems (TCS), consisting of a sensor histidine kinase (HK) and a response regulator protein, are key building blocks of bacterial signal transduction. The ability of bacteria to respond appropriately to a wide variety of chemical and physical stimuli is crucial to their survival. It is therefore not surprising that TCS are among the most intensively studied bacterial protein systems. Sensor histidine kinases are typically membrane-integrated, homodimeric proteins composed of several domains. Stimulus perception at the sensor domain of the HK triggers a series of large-scale conformational transitions across the domains. While the structural properties of different HKs can vary, they all contain a catalytic ATP-binding (CA) core and dimerization histidine phosphotransfer (DHp) domains. During the signaling cascade, the core adopts an asymmetric conformation in which the kinase is active in one protomer and inactive in the other. This allows the ATP in one of the CA domains to transfer its γ-phosphate group to the histidine of the DHp domain. This phosphoryl group is subsequently passed on to the response regulator protein, which initiates an appropriate cellular response. In this thesis, I investigate the conformational dynamics of the kinase core using molecular dynamics (MD) simulations. The work focuses on two different HKs: WalK and CpxA. Because of the size of the systems and the biological time scales involved, the relevant conformational transitions cannot be computed with classical MD simulations. To circumvent this problem, I use a structure-based model with pairwise harmonic potentials. This approximation makes it possible to study the transition between the inactive and the active state at a much lower computational cost. After exploring the system with this simplified model, I use enhanced sampling methods with atomistic models to gain more detailed insight into the dynamics. The results of this work suggest that the behavior of the individual subdomains of the kinase core is tightly coupled.
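    As a sketch of the kind of structure-based model mentioned above: every residue pair in contact in the reference structure is connected by a harmonic spring whose rest length is the native distance, so the reference conformation is an energy minimum by construction. The cutoff, spring constant, and function name are illustrative; the model used in the thesis may be parameterized differently.

```python
import numpy as np

def harmonic_network_energy(coords, ref_coords, k_spring=1.0, cutoff=10.0):
    """Structure-based model with pairwise harmonic potentials: every
    residue pair closer than `cutoff` in the reference structure is a
    spring with the reference distance as rest length."""
    d_ref = np.linalg.norm(ref_coords[:, None] - ref_coords[None, :], axis=-1)
    i, j = np.where(np.triu(d_ref < cutoff, k=1))  # native contact pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=-1)
    return 0.5 * k_spring * np.sum((d - d_ref[i, j]) ** 2)

rng = np.random.default_rng(0)
ref = 5.0 * rng.normal(size=(50, 3))                 # toy C-alpha coordinates
print(harmonic_network_energy(ref, ref))             # 0.0 at the reference
print(harmonic_network_energy(ref + 0.1 * rng.normal(size=ref.shape), ref))
```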

    Discovering structure without labels

    The scarcity of labels combined with an abundance of data makes unsupervised learning more attractive than ever. Without annotations, inductive biases must guide the identification of the most salient structure in the data. This thesis contributes to two aspects of unsupervised learning: clustering and dimensionality reduction. The thesis falls into two parts. In the first part, we introduce Mod Shift, a clustering method for point data that uses a distance-based notion of attraction and repulsion to determine the number of clusters and the assignment of points to clusters. It iteratively moves points towards crisp clusters like Mean Shift, but also has close ties to the Multicut problem via its loss function. As a result, it connects signed graph partitioning to clustering in Euclidean space. The second part treats dimensionality reduction and, in particular, the prominent neighbor embedding methods UMAP and t-SNE. We analyze the details of UMAP's implementation and find its actual loss function, which differs drastically from the one usually stated. This discrepancy allows us to explain some typical artifacts in UMAP plots, such as the dataset-size-dependent tendency to produce overly crisp substructures. Contrary to existing belief, we find that UMAP's high-dimensional similarities are not critical to its success. Based on UMAP's actual loss, we describe its precise connection to the other state-of-the-art visualization method, t-SNE. The key insight is a new, exact relation between two contrastive loss functions: negative sampling, employed by UMAP, and noise-contrastive estimation, which has been used to approximate t-SNE. As a result, we explain that UMAP embeddings appear more compact than t-SNE plots due to increased attraction between neighbors. Varying the attraction strength further, we obtain a spectrum of neighbor embedding methods, encompassing both UMAP-like and t-SNE-like versions as special cases. Moving from more attraction to more repulsion shifts the focus of the embedding from continuous, global structure to more discrete, local structure of the data. Finally, we emphasize the link between contrastive neighbor embeddings and self-supervised contrastive learning. We show that different flavors of contrastive losses can work for both with only a few noise samples.
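    A minimal sketch of the negative-sampling objective discussed above, written for a single edge of the neighborhood graph and using the Cauchy kernel q = 1/(1 + d^2), i.e. UMAP with a = b = 1. The names and toy data are illustrative; reweighting attraction against the number of negatives is one way to move along the attraction-repulsion spectrum described in the text.

```python
import numpy as np

def neg_sampling_loss(emb, i, j, negatives, eps=1e-12):
    """Negative-sampling loss for one edge (i, j) of the kNN graph:
    attraction towards the neighbor j, repulsion from a handful of
    uniformly sampled negatives. Similarity is the Cauchy kernel."""
    def q(a, b):
        d2 = np.sum((emb[a] - emb[b]) ** 2)
        return 1.0 / (1.0 + d2)
    attraction = -np.log(q(i, j) + eps)
    repulsion = -sum(np.log(1.0 - q(i, k) + eps)
                     for k in negatives if k != i)
    return attraction + repulsion

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 2))        # toy 2-D embedding coordinates
negs = rng.integers(0, 100, size=5)    # m = 5 sampled negatives
print(neg_sampling_loss(emb, 0, 1, negs))
```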

    A discrete choice modeling framework for pedestrian walking behavior with application to human tracking in video sequences

    Intelligent Transportation Systems (ITS) have triggered important research activities in the context of behavioral dynamics. Several new models and simulators for driving and travel behaviors, along with new integrated systems to manage various elements of ITS, have been proposed in the past decades. In this context, less attention has been given to pedestrian modeling and simulation. In 2001, the first international conference on Pedestrian and Evacuation Dynamics took place in Duisburg, Germany, showing the recent, growing interest in pedestrian simulation and modeling in the scientific community. The ability to predict the movements of pedestrians is indeed valuable in many contexts. Architects are interested in understanding how individuals move within buildings in order to derive optimality criteria for space design. Transport engineers face the problem of integrating transportation facilities, with particular emphasis on safety issues for pedestrians. Recent tragic events have increased the interest in automatic video surveillance systems able to monitor pedestrian flows in public spaces and raise alarms when abnormal behaviors occur. In this spirit, it is important to define mathematical models based on specific (and context-dependent) behavioral assumptions, tested by means of proper statistical methods. Data collection for pedestrian dynamics is particularly difficult, and few models presented in the literature have been calibrated and validated on real datasets. Pedestrian behavior can be modeled at various scales. This work addresses the problem of modeling pedestrian walking behavior, interpreting the walking process as a sequence of choices over time. People are assumed to be rational decision makers: they choose their next position in the surrounding space as a function of their kinematic characteristics, reacting to the presence of other individuals. We choose a mathematical framework based on discrete choice analysis, which provides a set of well-founded econometric tools to model disaggregate phenomena. The pedestrian model is applied in a computer vision application, namely the detection and tracking of pedestrians in video sequences. A methodology to integrate behavioral and image-based information is proposed. The result of this approach is a dynamic detection of the individuals in the video sequence. We do not make a clear cut between detection and tracking, which are instead thought of as interoperating procedures that generate a set of hypothetical pedestrian trajectories, evaluated with the proposed model by exploiting both dynamic and behavioral information. The main advantage of this methodology is that the standard target detection/recognition step is bypassed, reducing the complexity of the system and yielding a considerable saving in computational time. On the other hand, the price paid for this simple initialization procedure is an overestimation of the number of targets. To reduce the bias in the estimated number of targets, a comparative study of different clustering-based approaches is proposed.
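    As a sketch of the discrete choice framework, a multinomial logit model assigns each candidate next position a probability proportional to the exponential of its systematic utility. The features and coefficients below are hypothetical placeholders, not the specification estimated in this work.

```python
import numpy as np

def choice_probabilities(features, beta):
    """Multinomial logit over a pedestrian's candidate next positions.
    Each row of `features` describes one spatial alternative; `beta`
    holds the (here hypothetical) taste coefficients."""
    v = features @ beta          # systematic utilities
    v = v - v.max()              # shift for numerical stability
    p = np.exp(v)
    return p / p.sum()

# Toy choice set of 3 alternatives; columns: angular deviation from the
# desired direction (rad), speed change (m/s), local crowd density
features = np.array([[0.0,  0.0, 0.8],
                     [0.4,  0.1, 0.2],
                     [0.8, -0.2, 0.1]])
beta = np.array([-1.5, -0.5, -2.0])
print(choice_probabilities(features, beta))  # probabilities sum to 1
```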