7 research outputs found

    Flexible estimation of temporal point processes and graphs

    Get PDF
    Handling complex data types with spatial structures, temporal dependencies, or discrete values, is generally a challenge in statistics and machine learning. In the recent years, there has been an increasing need of methodological and theoretical work to analyse non-standard data types, for instance, data collected on protein structures, genes interactions, social networks or physical sensors. In this thesis, I will propose a methodology and provide theoretical guarantees for analysing two general types of discrete data emerging from interactive phenomena, namely temporal point processes and graphs. On the one hand, temporal point processes are stochastic processes used to model event data, i.e., data that comes as discrete points in time or space where some phenomenon occurs. Some of the most successful applications of these discrete processes include online messages, financial transactions, earthquake strikes, and neuronal spikes. The popularity of these processes notably comes from their ability to model unobserved interactions and dependencies between temporally and spatially distant events. However, statistical methods for point processes generally rely on estimating a latent, unobserved, stochastic intensity process. In this context, designing flexible models and consistent estimation methods is often a challenging task. On the other hand, graphs are structures made of nodes (or agents) and edges (or links), where an edge represents an interaction or relationship between two nodes. Graphs are ubiquitous to model real-world social, transport, and mobility networks, where edges can correspond to virtual exchanges, physical connections between places, or migrations across geographical areas. Besides, graphs are used to represent correlations and lead-lag relationships between time series, and local dependence between random objects. Graphs are typical examples of non-Euclidean data, where adequate distance measures, similarity functions, and generative models need to be formalised. In the deep learning community, graphs have become particularly popular within the field of geometric deep learning. Structure and dependence can both be modelled by temporal point processes and graphs, although predominantly, the former act on the temporal domain while the latter conceptualise spatial interactions. Nonetheless, some statistical models combine graphs and point processes in order to account for both spatial and temporal dependencies. For instance, temporal point processes have been used to model the birth times of edges and nodes in temporal graphs. Moreover, some multivariate point processes models have a latent graph parameter governing the pairwise causal relationships between the components of the process. In this thesis, I will notably study such a model, called the Hawkes model, as well as graphs evolving in time. This thesis aims at designing inference methods that provide flexibility in the contexts of temporal point processes and graphs. This manuscript is presented in an integrated format, with four main chapters and two appendices. Chapters 2 and 3 are dedicated to the study of Bayesian nonparametric inference methods in the generalised Hawkes point process model. While Chapter 2 provides theoretical guarantees for existing methods, Chapter 3 also proposes, analyses, and evaluates a novel variational Bayes methodology. The other main chapters introduce and study model-free inference approaches for two estimation problems on graphs, namely spectral methods for the signed graph clustering problem in Chapter 4, and a deep learning algorithm for the network change point detection task on temporal graphs in Chapter 5. Additionally, Chapter 1 provides an introduction and background preliminaries on point processes and graphs. Chapter 6 concludes this thesis with a summary and critical thinking on the works in this manuscript, and proposals for future research. Finally, the appendices contain two supplementary papers. The first one, in Appendix A, initiated after the COVID-19 outbreak in March 2020, is an application of a discrete-time Hawkes model to COVID-related deaths counts during the first wave of the pandemic. The second work, in Appendix B, was conducted during an internship at Amazon Research in 2021, and proposes an explainability method for anomaly detection models acting on multivariate time series

    Multivariate Statistical Machine Learning Methods for Genomic Prediction

    Get PDF
    This book is open access under a CC BY 4.0 license This open access book brings together the latest genome base prediction models currently being used by statisticians, breeders and data scientists. It provides an accessible way to understand the theory behind each statistical learning tool, the required pre-processing, the basics of model building, how to train statistical learning methods, the basic R scripts needed to implement each statistical learning tool, and the output of each tool. To do so, for each tool the book provides background theory, some elements of the R statistical software for its implementation, the conceptual underpinnings, and at least two illustrative examples with data from real-world genomic selection experiments. Lastly, worked-out examples help readers check their own comprehension. The book will greatly appeal to readers in plant (and animal) breeding, geneticists and statisticians, as it provides in a very accessible way the necessary theory, the appropriate R code, and illustrative examples for a complete understanding of each statistical learning tool. In addition, it weighs the advantages and disadvantages of each tool

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Get PDF

    Classification Modeling for Malaysian Blooming Flower Images Using Neural Networks

    Get PDF
    Image processing is a rapidly growing research area of computer science and remains as a challenging problem within the computer vision fields. For the classification of flower images, the problem is mainly due to the huge similarities in terms of colour and texture. The appearance of the image itself such as variation of lights due to different lighting condition, shadow effect on the object’s surface, size, shape, rotation and position, background clutter, states of blooming or budding may affect the utilized classification techniques. This study aims to develop a classification model for Malaysian blooming flowers using neural network with the back propagation algorithms. The flower image is extracted through Region of Interest (ROI) in which texture and colour are emphasized in this study. In this research, a total of 960 images were extracted from 16 types of flowers. Each ROI was represented by three colour attributes (Hue, Saturation, and Value) and four textures attribute (Contrast, Correlation, Energy and Homogeneity). In training and testing phases, experiments were carried out to observe the classification performance of Neural Networks with duplication of difficult pattern to learn (referred to as DOUBLE) as this could possibly explain as to why some flower images were difficult to learn by classifiers. Results show that the overall performance of Neural Network with DOUBLE is 96.3% while actual data set is 68.3%, and the accuracy obtained from Logistic Regression with actual data set is 60.5%. The Decision Tree classification results indicate that the highest performance obtained by Chi-Squared Automatic Interaction Detection(CHAID) and Exhaustive CHAID (EX-CHAID) is merely 42% with DOUBLE. The findings from this study indicate that Neural Network with DOUBLE data set produces highest performance compared to Logistic Regression and Decision Tree. Therefore, NN has been potential in building Malaysian blooming flower model. Future studies can be focused on increasing the sample size and ROI thus may lead to a higher percentage of accuracy. Nevertheless, the developed flower model can be used as part of the Malaysian Blooming Flower recognition system in the future where the colours and texture are needed in the flower identification process

    Multivariate Statistical Machine Learning Methods for Genomic Prediction

    Get PDF
    This book is open access under a CC BY 4.0 license This open access book brings together the latest genome base prediction models currently being used by statisticians, breeders and data scientists. It provides an accessible way to understand the theory behind each statistical learning tool, the required pre-processing, the basics of model building, how to train statistical learning methods, the basic R scripts needed to implement each statistical learning tool, and the output of each tool. To do so, for each tool the book provides background theory, some elements of the R statistical software for its implementation, the conceptual underpinnings, and at least two illustrative examples with data from real-world genomic selection experiments. Lastly, worked-out examples help readers check their own comprehension. The book will greatly appeal to readers in plant (and animal) breeding, geneticists and statisticians, as it provides in a very accessible way the necessary theory, the appropriate R code, and illustrative examples for a complete understanding of each statistical learning tool. In addition, it weighs the advantages and disadvantages of each tool
    corecore