
    The Frequency and Mass-Ratio Distribution of Binaries in Clusters I: Description of the method and application to M67

    We present a new method for probabilistic generative modelling of stellar colour-magnitude diagrams (CMDs) to infer the frequency of binary stars and their mass-ratio distribution. The method invokes a mixture model to account for overlapping populations of single stars, binaries and outliers in the CMD. We apply the model to Gaia observations of the old open cluster M67 and find a frequency f_B(q > 0.5) = 0.258 ± 0.019 for binary stars with mass ratio greater than 0.5. The mass-ratio distribution function rises towards higher mass ratios for q > 0.3. Comment: 9 pages, 6 figures, accepted by MNRAS
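    The abstract does not spell out the likelihood; a minimal sketch of a three-component CMD mixture of this kind, in our own illustrative notation rather than the paper's, is:

        p(c_i, G_i \mid \theta) = (1 - f_B - f_O)\, p_\mathrm{single}(c_i, G_i \mid \theta)
                                + f_B\, p_\mathrm{binary}(c_i, G_i \mid \theta)
                                + f_O\, p_\mathrm{outlier}(c_i, G_i)

    Here (c_i, G_i) are the colour and magnitude of star i, and f_B, f_O are the binary and outlier fractions; the reported f_B(q > 0.5) then follows by integrating the inferred mass-ratio distribution over q > 0.5.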

    Data compression and computational efficiency

    In this thesis we seek to make advances towards the goal of effective learned compression. This entails using machine learning models as the core constituents of compression algorithms, rather than hand-crafted components. To that end, we first describe a new method for lossless compression. This method allows a class of existing machine learning models, latent variable models, to be turned into lossless compressors, so that future advances in latent variable modelling can be leveraged for lossless compression. We demonstrate a proof-of-concept of this method on image compression. Further, we show that it can scale to very large models and to image compression problems that closely resemble the real-world use cases we seek to tackle. Using the above compression method relies on executing a latent variable model. Since these models can be large and slow to run, we consider how to mitigate their computational costs. We show that by implementing much of each model with binary-precision parameters, rather than floating-point precision, we can still achieve reasonable modelling performance while requiring a fraction of the storage space and execution time. Lastly, we consider how learned compression can be applied to 3D scene data, a medium that is increasingly prevalent and can require a significant amount of space. A recently developed class of machine learning models, scene representation functions, has demonstrated good results in modelling such 3D scene data. We show that by compressing these representation functions themselves, we can achieve good scene reconstruction with a very small model size.
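    The abstract does not name the coding scheme, but bits-back coding is a standard route from a latent variable model to a lossless compressor; a sketch of its codelength, assuming an approximate posterior q(z|x) and generative model p(x|z)p(z) (our notation, not necessarily the thesis's exact construction), is:

        L(x) = -\log p(x \mid z) - \log p(z) + \log q(z \mid x), \qquad z \sim q(z \mid x)
        \mathbb{E}_{q(z \mid x)}[L(x)] = -\mathrm{ELBO}(x) \ge -\log p(x)

    Under this reading, any improvement in a model's ELBO translates directly into a shorter expected codelength, which is why advances in latent variable modelling carry over to lossless compression.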

    Leveraging 2D data to learn textured 3D mesh generation

    Numerous methods have been proposed for probabilistic generative modelling of 3D objects. However, none of these can produce textured objects, which renders them of limited use for practical tasks. In this work, we present the first generative model of textured 3D meshes. Training such a model would traditionally require a large dataset of textured meshes, but existing mesh datasets lack detailed textures. We instead propose a new training methodology that allows learning from collections of 2D images without any 3D information. To do so, we train our model to explain a distribution of images by modelling each image as a 3D foreground object placed in front of a 2D background. The model thus learns to generate meshes that, when rendered, produce images similar to those in its training set. A well-known problem when generating meshes with deep networks is the emergence of self-intersections, which are problematic for many use cases. As a second contribution, we therefore introduce a new generation process for 3D meshes that guarantees no self-intersections arise, based on the physical intuition that faces should push one another out of the way as they move. We conduct extensive experiments on our approach, reporting quantitative and qualitative results on both synthetic data and natural images. These show that our method successfully learns to generate plausible and diverse textured 3D samples for five challenging object classes.
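    A minimal sketch of the foreground-over-background image formation described above, in our own notation (mesh M, texture T, camera pose π, background B; not necessarily the authors' exact parameterisation):

        I \approx A(M, \pi) \odot R(M, T, \pi) + (1 - A(M, \pi)) \odot B

    where R renders the textured mesh, A is its foreground (alpha) mask and ⊙ is pixelwise multiplication; training then encourages sampled (M, T, B, π) to reproduce the distribution of training images.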

    Zero resource speech synthesis using transcripts derived from perceptual acoustic units

    Zero-resource speech synthesis is the task of building vocabulary-independent speech synthesis systems, where transcriptions are not available for the training data. It is therefore necessary to convert the training data into a sequence of fundamental acoustic units that can be used for synthesis at test time. This paper attempts to discover and model perceptual acoustic units consisting of steady-state and transient regions in speech. The transients roughly correspond to CV and VC units, while the steady-state regions correspond to sonorants and fricatives. The speech signal is first preprocessed by segmenting it into CVC-like units using a short-term energy-like contour. These CVC segments are clustered using a connected-components-based graph clustering technique. The clustered CVC segments are initialised such that the onsets (CV) and decays (VC) correspond to transients, and the rhymes correspond to steady states. Following this initialisation, the units are allowed to reorganise over the continuous speech into a final set of acoustic units (AUs) in an HMM-GMM framework. The AU sequences thus obtained are used to train synthesis models. The performance of the proposed approach is evaluated on the ZeroSpeech 2019 challenge database. Subjective and objective scores show that reasonably good-quality synthesis with low-bit-rate encoding can be achieved using the proposed AUs.
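    A minimal sketch of the energy-based preprocessing step, assuming 16 kHz audio with 25 ms frames and a 10 ms hop (our own parameter choices, not the paper's), with energy valleys taken as candidate boundaries between CVC-like units:

        import numpy as np
        from scipy.signal import argrelmin

        def short_term_energy(x, frame_len=400, hop=160):
            # Sum of squared samples per frame (25 ms frames, 10 ms hop at 16 kHz)
            n_frames = 1 + (len(x) - frame_len) // hop
            return np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                             for i in range(n_frames)])

        def segment_boundaries(x, smooth=5):
            e = short_term_energy(np.asarray(x, dtype=float))
            # Moving-average smoothing suppresses spurious local minima
            e = np.convolve(e, np.ones(smooth) / smooth, mode="same")
            # Local minima of the smoothed contour approximate unit boundaries
            return argrelmin(e)[0]

    The returned indices are frame positions; multiplying by the hop size recovers sample positions in the waveform.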

    Applications of generative probabilistic models for information recovery in 1H NMR metabolomics

    Metabolomics is a well-established approach for investigating the metabolic state of an organism, usually conducted via high-throughput methods and focused on the quantification and identification of small molecules. A popular analytical technique used in metabolomics is 1H NMR spectroscopy. The data obtained in NMR experiments contain a wealth of information on the metabolites in a sample and their chemical structure. To help uncover this information and find patterns in the data, statistical and machine learning methods must be applied. The work presented in this thesis demonstrates applications of probabilistic generative modelling, with particular focus on Latent Dirichlet Allocation (LDA), as a tool for information recovery in 1H NMR data sets obtained in metabolomics research. LDA is an example of a topic model. The model is based on a generative process that can be thought of as the source of the data. Topics are latent variables that select co-occurring metabolites in a sample; in turn, NMR spectra can be represented in the latent variable space. We present applications of LDA in three scenarios. (1) Simulating NMR spectra: such spectra demonstrate that LDA is a valid model for NMR data and also provide synthetic data for evaluating statistical models. (2) Unsupervised learning with LDA to uncover patterns in NMR data: using synthetic and real NMR data, with knowledge of key biomarkers from a prior study, we conclude that LDA successfully recovers useful topics. (3) Supervised learning with supervised LDA (SLDA), and combined latent variable models with ElasticNet regression: here we investigate NMR data from the Multi-Ethnic Study of Atherosclerosis (MESA), which is paired with clinical variables such as BMI, to examine whether topics can be informative about clinical outcomes.
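    As a rough illustration of treating binned spectra as LDA "documents", here is a minimal sketch using scikit-learn; the binning into pseudo-counts and all parameter values are our own assumptions, not the thesis's setup:

        import numpy as np
        from sklearn.decomposition import LatentDirichletAllocation

        rng = np.random.default_rng(0)
        spectra = rng.random((50, 200))              # placeholder for binned 1H NMR intensities
        counts = np.rint(spectra * 100).astype(int)  # LDA expects count-like data

        lda = LatentDirichletAllocation(n_components=10, random_state=0)
        theta = lda.fit_transform(counts)  # per-spectrum topic proportions
        beta = lda.components_             # per-topic weights over NMR bins

    Rows of theta place each spectrum in the latent topic space, and rows of beta indicate which NMR bins, and hence which co-occurring metabolite resonances, each topic selects.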

    Wedgelet Enhanced Appearance Models
