5 research outputs found

    A Bayesian Perspective of Statistical Machine Learning for Big Data

    Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers discover important features of input data sets that are often very large in size. The task of feature discovery from data is essentially the meaning of the keyword 'learning' in SML. Theoretical justifications for the effectiveness of SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings justified by statistical inference methods are together termed statistical learning theory. This paper provides a review of SML from a Bayesian decision theoretic point of view, where we argue that many SML techniques are closely connected to making inference using the so-called Bayesian paradigm. We discuss many important SML techniques such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of very large data sets where these are often employed. We present a dictionary which maps the key concepts of SML between Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets and discuss many practical implementation issues. The review is thus especially targeted at statisticians and computer scientists who aspire to understand and apply SML to moderately large to big data sets. Comment: 26 pages, 3 figures, review paper
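
    As a quick reminder of the Bayesian decision theoretic paradigm this abstract refers to, the two relations below sketch it in generic notation (the symbols here are standard textbook notation, not taken from the paper itself):

```latex
% Bayesian paradigm: the posterior combines prior and likelihood
\[
  p(\theta \mid \mathcal{D}) \;=\;
  \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}
       {\int p(\mathcal{D} \mid \theta')\, p(\theta')\, d\theta'}
\]
% Bayesian decision theory: choose the action minimizing posterior expected loss
\[
  a^{*} \;=\; \arg\min_{a}\, \mathbb{E}_{\theta \mid \mathcal{D}}\!\big[ L(\theta, a) \big]
        \;=\; \arg\min_{a} \int L(\theta, a)\, p(\theta \mid \mathcal{D})\, d\theta
\]
```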

    Applying Bayesian Growth Modeling In Machine Learning For Longitudinal Data

    There has been increasing interest in the use of Bayesian growth modeling in a machine learning environment to answer questions about patterns of change in trends of social and human behavior in longitudinal data. It is well understood that machine learning works best with "big data," because large sample sizes give machines a better opportunity to "learn" the pattern/structure of the data from a training set in order to predict performance on an unseen testing set. Unfortunately, not all researchers have access to large samples, and there is a lack of methodological research addressing the utility of machine learning with longitudinal data based on small sample sizes. Additionally, there is limited methodological research on the moderating effect that priors have on other data conditions. Therefore, the purpose of the current study was to understand: (a) the interactive relationship between priors and sample sizes in longitudinal predictive modeling, (b) the interactive relationship between priors and the number of waves of data, and (c) the interactive relationship between priors and the proportion of cases in the two levels of a dichotomous time-invariant predictor for Bayesian growth modeling in a machine learning environment. A Monte Carlo simulation was adopted to assess these aspects, and data were generated based on alumni donation data from a university in the mid-Atlantic region, with model parameters set to mimic "real life" data as closely as possible. Results show that although all main and interaction effects were statistically significant, only the main effects of sample size and waves of data, and the interaction between waves of data and sample size, showed meaningful effect sizes. Additionally, under the prior conditions examined in the study, informative priors did not show higher prediction accuracy than non-informative priors; this indifference between informative and non-informative priors is associated with model complexity and with the competition between strongly informative and weakly informative priors. This study is one of the first known studies to examine Bayesian estimation in the context of machine learning. The results suggest that capitalizing on the advantages offered jointly by these two modeling approaches shows promise. Although much is still unknown about the conditions under which a combination of Bayesian modeling and machine learning affects prediction accuracy, the current dissertation provides a first step in that direction.
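
    The abstract describes a Monte Carlo design that crosses priors with sample size, number of waves, and the split of a dichotomous time-invariant predictor. The sketch below shows one way such longitudinal data could be generated for a single simulation cell, assuming a linear growth model with random intercepts and slopes; the function name, parameter values, and effect sizes are illustrative placeholders, not the dissertation's actual specification.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_growth_data(n_subjects=100, n_waves=4, p_group=0.5):
    """Simulate longitudinal data from a linear growth model with random
    intercepts/slopes and a dichotomous time-invariant predictor.
    All parameter values are hypothetical."""
    # Fixed effects (illustrative values)
    beta_intercept, beta_slope, beta_group = 2.0, 0.5, 0.3
    # Random-effect standard deviations and residual SD (illustrative values)
    sd_int, sd_slope, sd_resid = 1.0, 0.25, 0.5

    group = rng.binomial(1, p_group, size=n_subjects)      # time-invariant predictor
    b_int = rng.normal(0.0, sd_int, size=n_subjects)       # subject-specific intercept deviations
    b_slope = rng.normal(0.0, sd_slope, size=n_subjects)   # subject-specific slope deviations

    rows = []
    for i in range(n_subjects):
        for t in range(n_waves):
            mu = (beta_intercept + b_int[i]) + (beta_slope + b_slope[i]) * t + beta_group * group[i]
            rows.append((i, t, group[i], mu + rng.normal(0.0, sd_resid)))
    return np.array(rows)  # columns: subject id, wave, group, outcome

# One simulated cell of the Monte Carlo design: small sample, few waves
data = simulate_growth_data(n_subjects=50, n_waves=3)
print(data.shape)  # (150, 4)
```

    In a full study, such a generator would be run repeatedly across the crossed conditions (sample size, waves, group proportion), with the resulting data sets fit under informative and non-informative priors and out-of-sample prediction accuracy compared.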

    Advances in Molecular Simulation

    Molecular simulations are commonly used in physics, chemistry, biology, materials science, engineering, and even medicine. This book presents a wide range of molecular simulation methods and their applications in various fields. It reflects the power of molecular simulation as an effective research tool. We hope that the presented results can provide an impetus for further fruitful studies.