
    Robust and Efficient Data Clustering with Signal Processing on Graphs

    Data is pervasive in today's world and has been for quite some time. With the increasing volume of data to process, there is a need for techniques that are faster than the existing ones and at least as accurate. In particular, the last decade saw the rapid growth of social networks and ubiquitous sensing (through smartphones and the Internet of Things). These phenomena, together with progress in bioinformatics and traffic monitoring, pushed forward the research on graph analysis and called for more efficient techniques. Clustering is an important field of machine learning because it is an unsupervised technique (i.e., one does not need ground truth about the data to start learning). With it, one can extract meaningful patterns from large data sources without requiring an expert to annotate a portion of the data, which can be very costly. However, the clustering techniques designed so far all tend to be computationally demanding and have trouble scaling with the size of today's problems. The emergence of Graph Signal Processing (GSP), which attempts to apply traditional signal processing techniques to graphs instead of time, provided additional tools for efficient graph analysis. By considering the clustering assignment as a signal lying on the nodes of the graph, one may now apply the tools of GSP to the improvement of graph clustering and, more generally, data clustering at large. In this thesis, we present several techniques that use some of the latest developments of GSP to improve the scalability of clustering, while aiming for an accuracy close to that of Spectral Clustering, a famous graph clustering technique that has a solid mathematical intuition. On the one hand, we explore the benefits of random signal filtering, from both a practical and a theoretical standpoint, for the determination of the eigenvectors of the graph Laplacian. In practice, this approach requires the design of polynomial approximations of the step function, for which we provide an accelerated heuristic. We used this line of work to reduce the complexity of dynamic graph clustering, the problem of defining a partition of a graph that evolves in time, at each snapshot. We also used it to propose a fast method for determining the subspace spanned by the first eigenvectors of any symmetric matrix. This subspace is useful for clustering, as it is used in Spectral Clustering, but it goes beyond that since it also serves in graph visualization (with Laplacian Eigenmaps) and data mining (with Principal Components Projection). On the other hand, we were inspired by the latest works on graph filter localization to propose an extremely fast clustering technique. We tried to perform clustering using only graph filtering, combining the results to obtain a partition of the nodes. These different contributions are complemented by experiments on both synthetic datasets and real-world problems. Since we think that research should be shared in order to progress, all the experiments made in this thesis are publicly available on my personal GitHub account

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Computer Science for Continuous Data: Survey, Vision, Theory, and Practice of a Computer Analysis System

    Building on George Boole's work, Logic provides a rigorous foundation for the powerful tools in Computer Science that underlie today's ubiquitous processing of discrete data, such as strings or graphs. Concerning continuous data, Alan Turing had already applied "his" machines to formalize and study the processing of real numbers: an aspect of his oeuvre that we transform from theory to practice. The present essay surveys the state of the art and envisions the future of Computer Science for continuous data: natively, beyond brute-force discretization, based on, guided by, and extending classical discrete Computer Science, as a bridge between Pure and Applied Mathematics

    Handbook of Mathematical Geosciences

    This Open Access handbook, published at the IAMG's 50th anniversary, presents a compilation of invited path-breaking research contributions by award-winning geoscientists who have been instrumental in shaping the IAMG. It contains 45 chapters that are categorized broadly into five parts: (i) theory, (ii) general applications, (iii) exploration and resource estimation, (iv) reviews, and (v) reminiscences, covering related topics such as mathematical geosciences, mathematical morphology, geostatistics, fractals and multifractals, spatial statistics, multipoint geostatistics, compositional data analysis, informatics, geocomputation, numerical methods, and chaos theory in the geosciences

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘Big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data