5 research outputs found

    A Learning-Based EM Clustering for Circular Data with Unknown Number of Clusters

    Clustering is a method for analyzing grouped data. Circular data arise in various applications, such as wind directions and the departure directions of migrating birds or animals. The expectation-maximization (EM) algorithm on mixtures of von Mises distributions is widely used for clustering circular data. In general, however, the EM algorithm is sensitive to initialization, is not robust to outliers, and requires the number of clusters to be given a priori. In this paper, we consider a learning-based scheme for EM and then propose a learning-based EM algorithm on mixtures of von Mises distributions for clustering grouped circular data. The proposed clustering method requires no initialization, is robust to outliers, and automatically finds the number of clusters. Several numerical and real data sets are used to compare the proposed algorithm with existing methods. Experimental results and comparisons demonstrate the effectiveness and superiority of the proposed learning-based EM algorithm.
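For reference, the standard EM baseline that the abstract contrasts against can be sketched as below. This is not the authors' learning-based algorithm: it assumes a fixed number of clusters K and user-supplied initial means, which are exactly the requirements the paper aims to remove. Each component has density exp(kappa cos(theta - mu)) / (2 pi I0(kappa)); the concentration update uses the closed-form approximation kappa ~ R(2 - R^2)/(1 - R^2) from the mean resultant length R instead of solving the exact likelihood equation. All function names are hypothetical.

```python
import math
import random


def bessel_i0(x, max_terms=500):
    """Modified Bessel function I0(x) via its power series sum_k ((x/2)^k / k!)^2."""
    total, term = 1.0, 1.0
    half_sq = (x / 2.0) ** 2
    for k in range(1, max_terms):
        term *= half_sq / (k * k)
        total += term
        if term < 1e-16 * total:
            break
    return total


def estimate_kappa(r_bar, max_kappa=200.0):
    """Approximate kappa from mean resultant length R (closed-form approximation, 2-D case)."""
    r_bar = min(max(r_bar, 1e-8), 0.999999)
    return min(r_bar * (2.0 - r_bar ** 2) / (1.0 - r_bar ** 2), max_kappa)


def em_von_mises(data, init_mus, n_iter=100):
    """Standard EM for a von Mises mixture; K = len(init_mus) is fixed in advance."""
    k = len(init_mus)
    weights = [1.0 / k] * k
    mus = list(init_mus)
    kappas = [1.0] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each angle
        norms = [2.0 * math.pi * bessel_i0(kp) for kp in kappas]
        resp = []
        for x in data:
            p = [w * math.exp(kp * math.cos(x - m)) / z
                 for w, m, kp, z in zip(weights, mus, kappas, norms)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: weighted circular mean direction and concentration per component
        for j in range(k):
            n_j = max(sum(r[j] for r in resp), 1e-12)
            c = sum(r[j] * math.cos(x) for r, x in zip(resp, data))
            s = sum(r[j] * math.sin(x) for r, x in zip(resp, data))
            weights[j] = n_j / len(data)
            mus[j] = math.atan2(s, c) % (2.0 * math.pi)
            kappas[j] = estimate_kappa(math.hypot(c, s) / n_j)
    return weights, mus, kappas
```

Because K and the initial means are fixed inputs here, a poor choice of either can steer the iterations to a poor local optimum, which is the limitation the learning-based variant is designed to avoid.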

    New Foundation in the Sciences: Physics without sweeping infinities under the rug

    It is widely known at the frontiers of physics that the practice of “sweeping under the rug” has been the norm rather than the exception. In other words, leading paradigms have a strong tendency to be hailed as the only game in town. For example, renormalization theory was hailed as the cure for the problem of infinities in QED. Consider the following account of Richard Feynman's view: “What the three Nobel Prize winners did, in the words of Feynman, was to get rid of the infinities in the calculations. The infinities are still there, but now they can be skirted around . . . We have designed a method for sweeping them under the rug.” [1] Paul Dirac wrote in a similar vein: “Hence most physicists are very satisfied with the situation. They say: Quantum electrodynamics is a good theory, and we do not have to worry about it any more. I must say that I am very dissatisfied with the situation, because this so-called good theory does involve neglecting infinities which appear in its equations, neglecting them in an arbitrary way. This is just not sensible mathematics. Sensible mathematics involves neglecting a quantity when it turns out to be small—not neglecting it just because it is infinitely great and you do not want it!” [2] Similarly, dark matter and dark energy were elevated as plausible ways to resolve the crisis in the prevailing Big Bang cosmology. That is why we chose the theme New Foundations in the Sciences: to emphasize the need for a new set of approaches in the sciences, be it physics, cosmology, consciousness, or other fields.

    The 8th International Conference on Time Series and Forecasting

    The aim of ITISE 2022 is to create a friendly environment that could lead to the establishment or strengthening of scientific collaborations and exchanges among attendees. Therefore, ITISE 2022 is soliciting high-quality original research papers (including significant works-in-progress) on any aspect of time series analysis and forecasting, in order to motivate the generation and use of new knowledge, computational techniques, and methods for forecasting in a wide range of fields.

    Improving the census of open clusters in the Milky Way with data from Gaia

    For over a century, open clusters have been a key tool for understanding stellar and galactic evolution. Now, thanks to groundbreaking new astrometric and photometric data from the European Space Agency's Gaia satellite, it is possible to study open clusters with unprecedented accuracy and precision. In this thesis, I develop and apply new methodologies to improve the census of open clusters with data from Gaia. I focus on modern, efficient, and statistically rigorous techniques, aiming to maximise the reliability and usefulness of the open cluster census despite the many challenges of working with Gaia's billion-star dataset. First, I conducted a comparative study of clustering algorithms for blindly retrieving open clusters from Gaia data, and found that a previously untrialled algorithm, HDBSCAN, is the most sensitive for open cluster recovery. Next, I applied this methodology to Gaia DR3 data to create the largest homogeneous catalogue of open clusters to date, recovering a total of 7167 clusters, 2387 of which are candidate new objects. I developed an approximate Bayesian neural network for classifying the reliability of the colour-magnitude diagrams of the clusters in the census, and used a modification of this network to infer parameters such as the age and extinction of these clusters. Finally, since many of the objects in my catalogue appeared more compatible with moving groups, I measured accurate masses, Jacobi radii, and velocity dispersions for these clusters, creating the largest catalogue of these parameters for open clusters to date. Using these parameters, I showed that no more than 5619 of the clusters in my catalogue are compatible with bound open clusters. I used my mass estimates to derive an approximate completeness estimate for the Gaia DR3 open cluster census, finding that the approximate 100% completeness limit depends strongly on cluster mass.
The results of this thesis show that it is possible to reliably create a catalogue of open clusters with a single blind search, in addition to measuring parameters for these objects. The methods developed in this thesis will be applicable to future data releases from Gaia and other sources.
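The abstract mentions Jacobi radii without stating how they are computed. A common approximation for a cluster on a near-circular orbit in the Galactic disc is r_J = (G M / (4 A (A - B)))^(1/3), where A and B are the Oort constants. The sketch below evaluates that formula; the default Oort constants are assumed, illustrative solar-neighbourhood values, and the thesis may well use a different Galactic potential or different constants.

```python
import math

G = 4.30091e-3  # gravitational constant in pc (km/s)^2 / M_sun


def jacobi_radius_pc(mass_msun, oort_a=15.3, oort_b=-11.9):
    """Jacobi (tidal) radius in pc: r_J = (G M / (4 A (A - B)))^(1/3).

    oort_a, oort_b: ASSUMED Oort constants in km/s/kpc (illustrative values only).
    """
    a = oort_a / 1000.0  # convert km/s/kpc -> km/s/pc
    b = oort_b / 1000.0
    return (G * mass_msun / (4.0 * a * (a - b))) ** (1.0 / 3.0)
```

With these assumed constants, a 1000 solar-mass cluster gets a Jacobi radius of roughly 14 pc, and r_J grows only as M^(1/3), so an eightfold increase in mass doubles the radius.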

    Bayesian computation in astronomy: novel methods for parallel and gradient-free inference

    Get PDF
    The goal of this thesis is twofold: to introduce the fundamentals of Bayesian inference and computation, focusing on astronomical and cosmological applications, and to present recent advances in probabilistic computational methods developed by the author that aim to facilitate Bayesian data analysis for the next generation of astronomical observations and theoretical models. The first part of this thesis familiarises the reader with the notion of probability and its relevance for science through the prism of Bayesian reasoning, introducing the key constituents of the theory and discussing its best practices. The second part includes a pedagogical introduction to the principles of Bayesian computation, motivated by the geometric characteristics of probability distributions and followed by a detailed exposition of various methods including Markov chain Monte Carlo (MCMC), Sequential Monte Carlo (SMC), and Nested Sampling (NS). Finally, the third part presents two novel computational methods and their respective software implementations. The first is Ensemble Slice Sampling (ESS), a new class of MCMC algorithms that extends the applicability of the standard Slice Sampler by adaptively tuning its only hyperparameter and using an ensemble of parallel walkers to efficiently handle strong correlations between parameters. The parallel, black-box, and gradient-free nature of the method makes it ideal for use with the computationally expensive and non-differentiable models often encountered in astronomy. ESS is implemented in Python in zeus, a well-tested, open-source software package specifically designed to tackle the computational challenges posed by modern astronomical and cosmological analyses. In particular, the code requires minimal, if any, hand-tuning of hyperparameters, its performance is insensitive to linear correlations, and it can scale up to thousands of CPUs without any extra effort.
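The core ensemble idea can be sketched as follows: each walker is updated by one-dimensional slice sampling (stepping out, then shrinkage) along a direction given by the difference of two walkers drawn from the complementary half of the ensemble, so strongly correlated targets are explored along their natural axes. This is a minimal sketch, not the zeus implementation. It assumes the "differential" direction move (the abstract does not specify the move), holds the scale mu fixed instead of tuning it adaptively, and uses a hypothetical correlated bivariate Gaussian target; all names are illustrative.

```python
import math
import random


def log_prob(x, rho=0.9):
    """Hypothetical target: standard bivariate Gaussian with correlation rho."""
    a, b = x
    return -(a * a - 2.0 * rho * a * b + b * b) / (2.0 * (1.0 - rho * rho))


def slice_along(x, d, log_p, mu, rng, max_steps=1000):
    """One slice-sampling update of point x along direction d (stepping out + shrinkage)."""
    def at(t):
        return [xi + t * di for xi, di in zip(x, d)]

    log_y = log_p(x) - rng.expovariate(1.0)  # log height of the slice
    left = -mu * rng.random()
    right = left + mu
    steps = 0
    while log_p(at(left)) > log_y and steps < max_steps:  # step out to the left
        left -= mu
        steps += 1
    steps = 0
    while log_p(at(right)) > log_y and steps < max_steps:  # step out to the right
        right += mu
        steps += 1
    while True:  # shrink the bracket until a point inside the slice is found
        t = rng.uniform(left, right)
        x_new = at(t)
        if log_p(x_new) >= log_y:
            return x_new
        if t < 0:
            left = t
        else:
            right = t


def ensemble_slice_sampling(log_p, walkers, n_steps, mu=1.0, seed=0):
    """Update each half of the ensemble using directions drawn from the other half."""
    rng = random.Random(seed)
    walkers = [list(w) for w in walkers]
    half = len(walkers) // 2
    first, second = range(half), range(half, len(walkers))
    chain = []
    for _ in range(n_steps):
        for active, other in ((first, second), (second, first)):
            for i in active:
                j, k = rng.sample(list(other), 2)
                d = [a - b for a, b in zip(walkers[j], walkers[k])]
                walkers[i] = slice_along(walkers[i], d, log_p, mu, rng)
        chain.append([list(w) for w in walkers])
    return chain
```

Because the direction for a walker depends only on the complementary half, which is frozen while the active half moves, each update leaves the target invariant. It also needs no gradients and only log-density evaluations, matching the black-box setting described above.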
The second contribution is Preconditioned Monte Carlo (PMC), a novel Monte Carlo method for Bayesian inference that enables effective sampling of probability distributions with non-trivial geometry. PMC uses a Normalising Flow (NF) to decorrelate the parameters of the distribution and then samples from the preconditioned target distribution using an adaptive SMC scheme. Through its Python implementation, pocoMC, PMC achieves excellent sampling performance, including accurate estimation of the model evidence, for highly correlated, non-Gaussian, and multimodal target distributions. Finally, the code is directly parallelisable and exhibits linear scaling up to thousands of CPUs.
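The adaptive SMC stage that PMC builds on can be sketched as a tempered sequence from prior to posterior, accumulating the evidence along the way. This minimal version omits the normalising flow entirely (PMC would run these steps on the flow-preconditioned parameters), uses a plain random-walk Metropolis mutation, and picks each new inverse temperature by bisection so the effective sample size stays at half the particle count. The one-dimensional Gaussian prior and likelihood are hypothetical, chosen because the evidence has a closed form, log Z = log N(1 | 0, 26), for checking. None of this reflects the pocoMC API.

```python
import math
import random


def log_prior(x):
    # Hypothetical prior: x ~ N(0, 5^2)
    return -0.5 * (x / 5.0) ** 2 - math.log(5.0 * math.sqrt(2.0 * math.pi))


def log_like(x):
    # Hypothetical likelihood: one observation y = 1 with y | x ~ N(x, 1)
    return -0.5 * (1.0 - x) ** 2 - 0.5 * math.log(2.0 * math.pi)


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def ess_fraction(logl, dbeta):
    """Effective sample size fraction of the incremental weights exp(dbeta * logL)."""
    lw = [dbeta * l for l in logl]
    return math.exp(2.0 * logsumexp(lw) - logsumexp([2.0 * w for w in lw])) / len(lw)


def next_beta(logl, beta, target=0.5):
    """Bisect for the next inverse temperature keeping the ESS fraction near `target`."""
    if ess_fraction(logl, 1.0 - beta) >= target:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ess_fraction(logl, mid - beta) < target:
            hi = mid
        else:
            lo = mid
    return lo


def smc_evidence(n=2000, n_mcmc=10, seed=0):
    """Tempered SMC from prior (beta = 0) to posterior (beta = 1); returns (log Z, particles)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 5.0) for _ in range(n)]  # exact draws from the prior
    beta, log_z = 0.0, 0.0
    while beta < 1.0:
        logl = [log_like(x) for x in xs]
        new_beta = next_beta(logl, beta)
        lw = [(new_beta - beta) * l for l in logl]
        log_z += logsumexp(lw) - math.log(n)  # incremental evidence factor
        m = max(lw)
        xs = rng.choices(xs, weights=[math.exp(v - m) for v in lw], k=n)  # resample
        beta = new_beta
        # Random-walk Metropolis moves targeting prior(x) * likelihood(x)^beta,
        # with the proposal scale set from the current particle spread
        mean = sum(xs) / n
        step = 2.38 * math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
        for i in range(n):
            x = xs[i]
            lp = log_prior(x) + beta * log_like(x)
            for _ in range(n_mcmc):
                y = x + rng.gauss(0.0, step)
                ly = log_prior(y) + beta * log_like(y)
                if rng.random() < math.exp(min(0.0, ly - lp)):
                    x, lp = y, ly
            xs[i] = x
    return log_z, xs
```

In PMC the random-walk moves would act in the flow's latent space, where the target is closer to an isotropic Gaussian, which is why a simple isotropic proposal can then work well on correlated targets.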