    The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

    Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning, and the resulting methods have been used by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex data sets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other. Comment: 24 pages, 8 figures, 1 table. Accepted for publication in Advances in Astronomy, special issue "Robotic Astronomy".

    Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces

    This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with the corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a composite classifier using both cepstral and phase space features is developed. Results indicate that although the accuracy of the phase space approach by itself is still below that of baseline cepstral methods, a combined approach is capable of increasing speaker-independent phoneme accuracy.
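For readers unfamiliar with reconstructed phase spaces, here is a minimal sketch of the standard time-delay embedding, using the two parameters the abstract mentions (dimension and lag). It is an illustration, not the paper's implementation: the statistical models fitted to the embedded points are omitted, and the function name `delay_embed` is hypothetical.

```python
from typing import List, Sequence


def delay_embed(x: Sequence[float], dim: int, lag: int) -> List[List[float]]:
    """Time-delay embedding of a scalar signal x.

    Row n of the result is [x[n], x[n + lag], ..., x[n + (dim - 1) * lag]],
    i.e. one point of the reconstructed phase space.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    # Number of complete delay vectors that fit inside the signal.
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal is too short for this dim/lag pair")
    return [[x[n + k * lag] for k in range(dim)] for n in range(n_points)]


if __name__ == "__main__":
    # A toy "waveform"; real use would pass speech samples.
    samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    for point in delay_embed(samples, dim=3, lag=2):
        print(point)  # [0.0, 2.0, 4.0] then [1.0, 3.0, 5.0]
```

Choosing `dim` and `lag` is exactly the estimation problem the paper studies; the function only applies whatever values are chosen.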

    Bibliography

    Analysis of Biochemical Reaction Networks using Tropical and Polyhedral Geometry Methods

    The field of systems biology attempts to understand various biological functions and processes as emergent properties of the underlying biochemical network model. The area of computational systems biology deals with computational methods for computing such properties. In this context, the thesis primarily discusses novel computational methods to compute these emergent properties and to identify the essential features of complex network models. The methods described in the thesis are based on computer algebra techniques, namely tropical geometry and extreme currents. Tropical geometry is based on the idea of dominance among the monomials appearing in a system of differential equations, which are often used to describe the dynamics of the network model. In such differential-equation-based models, tropical geometry is used to identify metastable regimes, defined as low-dimensional regions of the phase space near which the dynamics is much slower than in the rest of the phase space. The application of these regimes to model reduction and symbolic dynamics is demonstrated on network models obtained from the public database BioModels. Extreme currents are the limiting edges of the convex polyhedra describing the admissible fluxes in biochemical networks, and they are helpful for decomposing a biochemical network into a set of irreducible pathways. These pathways are shown to be associated with given clinical outcomes, thereby providing some mechanistic insight into the clinical phenotypes. As with tropical geometry, the method based on extreme currents is evaluated on network models derived from the public database KEGG. In this way, the thesis attempts to explain the emergent properties of network models by determining extreme currents or metastable regimes. Additionally, their applicability to real-world network models is discussed.
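The monomial-dominance idea can be sketched as a small test, assuming a common formulation of tropical equilibration: each rate constant and concentration is written as a power of a small parameter eps (orders `gamma` and `a[i]`), the order of a monomial is its resulting exponent of eps, and an equation is "tropically equilibrated" when the lowest order is attained by at least two monomials of opposite sign. Metastable regimes are then sought among order vectors `a` for which every equation passes this test. The monomial encoding and function names are hypothetical, not the thesis's own code.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

# A signed monomial (sign, gamma, alpha) stands for
#   sign * kbar * eps**gamma * prod_i x_i**alpha[i],
# with each concentration x_i assumed to scale as eps**a[i].
Monomial = Tuple[int, Fraction, Tuple[int, ...]]


def order(m: Monomial, a: Sequence[Fraction]) -> Fraction:
    """Exponent of eps for monomial m when x_i ~ eps**a[i]."""
    _sign, gamma, alpha = m
    return gamma + sum(ai * ei for ai, ei in zip(a, alpha))


def is_tropically_equilibrated(rhs: List[Monomial], a: Sequence[Fraction]) -> bool:
    """True if the dominant (lowest-order) monomials include both signs."""
    orders = [order(m, a) for m in rhs]
    lowest = min(orders)  # as eps -> 0 the smallest exponent dominates
    dominant_signs = {m[0] for m, o in zip(rhs, orders) if o == lowest}
    # Both signs present means the minimum is attained at least twice
    # by terms that can compensate each other.
    return dominant_signs == {1, -1}


if __name__ == "__main__":
    # dx/dt = k1 - k2 * x with k1 ~ eps**1 and k2 ~ eps**0 (toy example).
    rhs = [(1, Fraction(1), (0,)), (-1, Fraction(0), (1,))]
    print(is_tropically_equilibrated(rhs, [Fraction(1)]))  # True
    print(is_tropically_equilibrated(rhs, [Fraction(2)]))  # False
```

Exact `Fraction` arithmetic is used so that ties between orders, which are what the test looks for, are not lost to floating-point rounding.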

    Physics‐constrained non‐Gaussian probabilistic learning on manifolds

    An extension of the probabilistic learning on manifolds (PLoM), recently introduced by the authors, is presented: in addition to the initial data set given for performing the probabilistic learning, constraints are given, which correspond to statistics of experiments or of physical models. We consider a non-Gaussian random vector whose unknown probability distribution has to satisfy constraints. The method consists in constructing a generator using the PLoM and the classical Kullback-Leibler minimum cross-entropy principle. The resulting optimization problem is reformulated using Lagrange multipliers associated with the constraints. The optimal values of the Lagrange multipliers are computed using an efficient iterative algorithm. At each iteration, the Markov chain Monte Carlo algorithm developed for the PLoM is used, which consists in solving an Itô stochastic differential equation projected on a diffusion-maps basis. The method and the algorithm are efficient and allow the construction of probabilistic models for high-dimensional problems from small initial data sets, with an arbitrary number of specified constraints. The first application is simple enough to be easily reproduced. The second concerns a stochastic elliptic boundary value problem in high dimension.
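The minimum cross-entropy step with a Lagrange multiplier can be illustrated in a much simplified setting. This sketch assumes the generator has already produced a fixed set of realizations and replaces the paper's MCMC-based iteration with plain reweighting of those realizations: minimizing the Kullback-Leibler divergence to the empirical measure under one scalar constraint E[h(X)] = target gives weights proportional to exp(-lam * h(x_i)), and the multiplier lam is found by Newton's method on the convex dual. The function names and the single-constraint restriction are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def tilted_weights(h: Sequence[float], lam: float) -> List[float]:
    """Weights proportional to exp(-lam * h_i), normalized to sum to 1."""
    exponents = [-lam * hi for hi in h]
    shift = max(exponents)  # subtract the max to avoid overflow in exp
    raw = [math.exp(e - shift) for e in exponents]
    total = sum(raw)
    return [r / total for r in raw]


def solve_multiplier(
    h: Sequence[float], target: float, tol: float = 1e-12, max_iter: int = 100
) -> Tuple[float, List[float]]:
    """Find lam so that the reweighted mean of h equals target.

    Newton's method on the convex dual lam * target + log(sum_i exp(-lam * h_i)),
    whose derivative is target - E_lam[h] and second derivative Var_lam[h].
    """
    if not min(h) < target < max(h):
        raise ValueError("target must lie strictly between min(h) and max(h)")
    lam = 0.0  # lam = 0 gives back the uniform (unconstrained) weights
    for _ in range(max_iter):
        w = tilted_weights(h, lam)
        mean = sum(wi * hi for wi, hi in zip(w, h))
        var = sum(wi * (hi - mean) ** 2 for wi, hi in zip(w, h))
        grad = target - mean
        if abs(grad) < tol:
            break
        lam -= grad / var  # Newton step on the dual
    return lam, tilted_weights(h, lam)


if __name__ == "__main__":
    # Hypothetical values h(x_i) for realizations produced by some generator.
    h_values = [0.0, 1.0, 2.0, 3.0]
    lam, w = solve_multiplier(h_values, target=1.0)
    print(lam, w)
```

With several constraints, lam becomes a vector and the Newton step uses the weighted covariance matrix of the constraint functions instead of a scalar variance.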

    Probabilistic Learning on Manifolds

    This paper presents mathematical results in support of the methodology of the probabilistic learning on manifolds (PLoM) recently introduced by the authors, which has been used with success for analyzing complex engineering systems. The PLoM considers a given initial dataset consisting of a small number of points in a Euclidean space, which are interpreted as independent realizations of a vector-valued random variable whose non-Gaussian probability measure is unknown but is, a priori, concentrated in an unknown subset of the Euclidean space. The objective is to construct a learned dataset consisting of additional realizations that allow the evaluation of converged statistics. A transport of the probability measure estimated with the initial dataset is done through a linear transformation constructed using a reduced-order diffusion-maps basis. In this paper, it is proven that this transported measure is a marginal distribution of the invariant measure of a reduced-order Itô stochastic differential equation that corresponds to a dissipative Hamiltonian dynamical system. This construction allows the concentration of the probability measure to be preserved. This property is shown by analyzing a distance between the random matrix constructed with the PLoM and the matrix representing the initial dataset, as a function of the dimension of the basis. It is further proven that this distance has a minimum for a dimension of the reduced-order diffusion-maps basis that is strictly smaller than the number of points in the initial dataset. Finally, a brief numerical application illustrates the mathematical results. Comment: 41 pages, 4 figures.
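A diffusion-maps basis of the kind the abstract refers to can be sketched in a few lines of NumPy. This assumes a Gaussian kernel exp(-|x - x'|^2 / (4 epsilon)), the Markov matrix P = D^{-1} K with D the diagonal matrix of row sums, and basis vectors scaled by their eigenvalues raised to a power kappa; epsilon, m and kappa are user-chosen, and the function is an illustration rather than the authors' code.

```python
import numpy as np


def diffusion_maps_basis(
    points: np.ndarray, epsilon: float, m: int, kappa: int = 1
) -> np.ndarray:
    """Return an (N, m) matrix whose columns are diffusion-maps basis vectors.

    points: array of shape (N, n), one realization per row.
    epsilon: kernel smoothing parameter; m: number of vectors kept;
    kappa: power applied to the eigenvalues when scaling each vector.
    """
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    K = np.exp(-sq_dist / (4.0 * epsilon))
    d = K.sum(axis=1)  # row sums: the diagonal of D
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P = D^{-1} K.
    S = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    eigvals, phi = np.linalg.eigh(S)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1]  # reorder: largest eigenvalues first
    eigvals, phi = eigvals[idx], phi[:, idx]
    # If S phi = lam * phi, then psi = D^{-1/2} phi satisfies P psi = lam * psi.
    psi = d_inv_sqrt[:, None] * phi
    return psi[:, :m] * eigvals[:m] ** kappa


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(20, 2))  # 20 synthetic points in R^2
    basis = diffusion_maps_basis(data, epsilon=1.0, m=3)
    print(basis.shape)  # (20, 3)
```

The symmetric matrix S is diagonalized instead of P because `numpy.linalg.eigh` handles symmetric matrices stably; the first column comes out constant (eigenvalue 1 of a Markov matrix), which is a quick sanity check.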