264,669 research outputs found

    A Note on Combining Machine Learning with Statistical Modeling for Financial Data Analysis

    This note revisits the ideas of the so-called semiparametric methods, which we consider very useful when applying machine learning in insurance. To this end, we first recall the essence of semiparametrics, such as the mixing of global and local estimation and the combining of explicit modeling with purely data-adaptive inference. Then, we discuss stepwise approaches with different ways of integrating machine learning. Furthermore, for the modeling of prior knowledge, we introduce classes of distribution families for financial data. The proposed procedures are illustrated with data on stock returns for five companies of the Spanish value-weighted index IBEX35. The authors thank the Institute and Faculty of Actuaries in the U.K. for funding their research through the grant “Minimizing Longevity and Investment Risk while Optimizing Future Pension Plans” and the Spanish Ministerio de Economía y Competitividad, Project ECO2016-76203-C2-1-P, for partial support of this work.

    Distributed Parameter Estimation via Pseudo-likelihood

    Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on combining local estimators defined by pseudo-likelihood components, encompassing a number of combination methods, and provide both theoretical and experimental analysis. We show that simple linear combination or max-voting methods, when combined with second-order information, are statistically competitive with more advanced and costly joint optimization. Our algorithms have many attractive properties including low communication and computational cost and "any-time" behavior. Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

    Combining statistical and machine learning methods to explore German students’ attitudes towards ICT in PISA

    In our age of big data and growing computational power, versatility in data analysis is important. This study presents a flexible way to combine statistics and machine learning for data analysis of a large-scale educational survey. The authors used statistical and machine learning methods to explore German students’ attitudes towards information and communication technology (ICT) in relation to mathematical and scientific literacy measured by the Programme for International Student Assessment (PISA) in 2015 and 2018. Implementations of the random forest (RF) algorithm were applied to impute missing data and to predict students’ proficiency levels in mathematics and science. Hierarchical linear models (HLM) were built to explore relationships between attitudes towards ICT and mathematical and scientific literacy with the focus on the nested structure of the data. ICT autonomy was an important variable in RF models, and associations between this attitude and literacy scores in HLM were significant and positive, while for other ICT attitudes the associations were negative (ICT in social interaction) or non-significant (ICT competence and ICT interest). The need for further research on ICT autonomy is discussed, and benefits of combining statistical and machine learning approaches are outlined.

    Positive Definite Kernels in Machine Learning

    This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions $\{k(x,\cdot), x\in\mathcal{X}\}$ associated with a kernel $k$ defined on a space $\mathcal{X}$. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces and discuss the idea of combining several kernels to improve the performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data-types such as images, graphs or speech segments. Comment: draft. corrected a typo in figure

    Bayesian networks to explain the effect of label information on product perception

    Interdisciplinary approaches in food research require new methods in data analysis that are able to deal with complexity and facilitate communication among model users. Four parallel full factorial within-subject designs were performed to examine the relative contribution to consumer product evaluation of intrinsic product properties and information given on packaging. Detailed experimental designs and results obtained from analyses of variance were published [1]. The data was analyzed again with the machine learning modelling technique Bayesian networks. The objective of the current paper is to explain basic features of this technique and its advantages over the standard statistical approach regarding handling of complexity and communication of results. With analysis of variance, visualization and interpretation of main effects and interaction effects become difficult in complex systems. The Bayesian network model offers the possibility to formally incorporate (domain) expert knowledge. By combining empirical data with the pre-defined network structure, new relationships can be learned, thus generating an update of current knowledge. Probabilistic inference in Bayesian networks allows instant and global use of the model; its graphical representation makes it easy to visualize and communicate the results. Making use of the most of data from one single experiment, as well as combining data of independent experiments makes Bayesian networks for analysing these and similarly complex and rich data set

    Game Plan: What AI can do for Football, and What Football can do for AI

    The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players’ and coordinated teams’ behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).

    An SHM approach using machine learning and statistical indicators extracted from raw dynamic measurements

    Structural Health Monitoring using raw dynamic measurements is the subject of several studies aimed at identifying structural modifications or, more specifically, focused on damage assessment. Traditional damage detection methods associate structural modal deviations with damage. Nevertheless, the process used to determine modal characteristics can influence the results of such methods, which could lead to additional uncertainties. Thus, techniques combining machine learning and statistical analysis applied directly to raw measurements are being discussed in recent research. The purpose of this paper is to investigate statistical indicators, little explored in damage identification methods, to characterize acceleration measurements directly in the time domain. Hence, the present work compares two machine learning algorithms to identify structural changes using statistics obtained from raw dynamic data. The algorithms are based on Artificial Neural Networks and Support Vector Machines. They are initially evaluated through numerical simulations using a simply supported beam model. Then, they are assessed through experimental tests performed on a laboratory beam structure and an actual railway bridge in France. For all cases, different damage scenarios were considered. The obtained results encourage the development of computational tools using statistical indicators of acceleration measurements for structural alteration assessment.

    Machine Learning Models for Educational Platforms

    Scaling up education online and onlife presents numerous key challenges, such as hard-to-manage classes, overwhelming content alternatives, and academic dishonesty during remote interaction. However, thanks to the wider availability of learning-related data and increasingly higher performance computing, Artificial Intelligence has the potential to turn such challenges into an unparalleled opportunity. One of its sub-fields, namely Machine Learning, is enabling machines to receive data and learn for themselves, without being programmed with rules. Bringing this intelligent support to education at large scale has a number of advantages, such as avoiding manual error-prone tasks and reducing the chance of learner misconduct. Planning, collecting, developing, and predicting become essential steps to make this concrete in real-world education. This thesis deals with the design, implementation, and evaluation of Machine Learning models in the context of online educational platforms deployed at large scale. Constructing and assessing the performance of intelligent models is a crucial step towards increasing reliability and convenience of such an educational medium. The contributions result in large data sets and high-performing models that capitalize on Natural Language Processing, Human Behavior Mining, and Machine Perception. The model decisions aim to support stakeholders over the instructional pipeline, specifically on content categorization, content recommendation, learners’ identity verification, and learners’ sentiment analysis. Past research in this field often relied on statistical processes that are hardly applicable at large scale. Through our studies, we explore opportunities and challenges introduced by Machine Learning for the above goals, a relevant and timely topic in the literature. Supported by extensive experiments, our work reveals a clear opportunity in combining human and machine sensing for researchers interested in online education.
    Our findings illustrate the feasibility of designing and assessing Machine Learning models for categorization, recommendation, authentication, and sentiment prediction in this research area. Our results provide guidelines on model motivation, data collection, model design, and analysis techniques concerning the above application scenarios. Researchers can use our findings to improve data collection on educational platforms, to reduce bias in data and models, to increase model effectiveness, and to increase the reliability of their models, among other uses. We expect that this thesis can further support the adoption of Machine Learning models in educational platforms, strengthening the role of data as a precious asset. The thesis outputs are publicly available at https://www.mirkomarras.com