
    Algorithms for estimating the parameters of factorisation machines

    Since their introduction in 2010, factorisation machines have become a popular prediction technique among machine learning practitioners, who have applied the method successfully in data science competitions such as those hosted on Kaggle and the KDD Cup. Despite these successes, factorisation machines are seldom considered as a modelling technique in business, partly because large companies prefer tried-and-tested software for model implementation. Popular modelling techniques for prediction problems, such as generalised linear models, neural networks, and classification and regression trees, are available in commercial software such as SAS, which is widely used by banks and by insurance, pharmaceutical and telecommunication companies. To popularise the use of factorisation machines in business, we implement algorithms for fitting factorisation machines in SAS. These algorithms minimise two loss functions, namely the weighted sum of squared errors and the weighted sum of absolute deviations, using coordinate descent and nonlinear programming procedures. The routines are tested for accuracy and efficiency in a simulation study, and the predictive power of factorisation machines is then illustrated by analysing two data sets.
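    The model such routines fit is the standard second-order factorisation machine. As a point of reference, here is a minimal NumPy sketch of that model and of the two loss functions named in the abstract; it is illustrative only (the paper's implementation is in SAS), and the function names are assumptions made for the example.

```python
# Illustrative NumPy sketch of a second-order factorisation machine and the
# two loss functions mentioned in the abstract. The paper's routines are in
# SAS; these function names are assumptions made for the example.
import numpy as np

def fm_predict(X, w0, w, V):
    """X: (n_samples, n_features) design matrix, w0: bias, w: (n_features,)
    linear weights, V: (n_features, k) latent factors. Uses the standard
    O(n*k) reformulation of the pairwise interaction term."""
    linear = w0 + X @ w
    XV = X @ V
    # sum_{i<j} <v_i, v_j> x_i x_j  =  0.5 * sum_f [ (XV)_f^2 - ((X^2)(V^2))_f ]
    interactions = 0.5 * np.sum(XV ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return linear + interactions

def weighted_sse(y, y_hat, weights):
    """Weighted sum of squared errors (the first loss fitted in the paper)."""
    return np.sum(weights * (y - y_hat) ** 2)

def weighted_sad(y, y_hat, weights):
    """Weighted sum of absolute deviations (the second loss fitted in the paper)."""
    return np.sum(weights * np.abs(y - y_hat))
```

    Plugging `fm_predict` into either loss gives the objective that a coordinate descent or nonlinear programming routine would minimise over `w0`, `w` and `V`.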

    Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives

    With the dramatic advances in deep learning technology, machine learning research is focusing on improving the interpretability of model predictions as well as prediction performance, in both basic and applied research. While deep learning models achieve much higher prediction performance than conventional machine learning models, their specific prediction process remains difficult to interpret or explain. This is known as the black-box problem of machine learning models and is recognized as particularly important in a wide range of fields, including manufacturing, commerce, robotics, and other industries where the use of such technology has become commonplace, as well as in medicine, where mistakes are not tolerated.

    Focusing on natural language processing tasks, we consider interpretability as the presentation of each input word's contribution to a prediction in a recurrent neural network. In interpreting predictions from deep learning models, much work has been done on visualizing importance, based mainly on attention weights and on gradients with respect to the inference results. However, it has become clear in recent years that these attention-based and gradient-based techniques have non-negligible problems. First, attention weights learn which parts of the input to focus on, but depending on the task or problem setting their relationship to gradient-based importance may be strong or weak; the two are not always strongly related. Furthermore, it is often unclear how to integrate the two interpretations. From another perspective, several aspects remain unclear regarding how the effects of attention mechanisms carry over to real-world problems with large datasets, and what the properties and characteristics of those effects are. This dissertation discusses both basic and applied research on how attention mechanisms improve the performance and interpretability of machine learning models.

    From the basic research perspective, we proposed a new learning method that focuses on the vulnerability of the attention mechanism to perturbations and that contributes significantly to both prediction performance and interpretability. Deep learning models are known to respond to small perturbations that humans cannot perceive and may exhibit unintended behaviors and predictions. Attention mechanisms used to interpret predictions are no exception. This is a serious problem because current deep learning models rely heavily on this mechanism. We focused on training techniques that use adversarial perturbations, i.e., perturbations designed to deceive the attention mechanism. We demonstrated that such adversarial training makes the perturbation-sensitive attention mechanism robust and enables the presentation of highly interpretable predictive evidence. By further extending the proposed technique to semi-supervised learning, we achieved a general-purpose learning model with a more robust and interpretable attention mechanism.

    From the applied research perspective, we investigated how effective the deep learning models with attention mechanisms validated in the basic research are in real-world applications. Since deep learning models with attention mechanisms have mainly been evaluated on basic tasks in natural language processing and computer vision, their performance when used as core components of applications and services has often been unclear. We confirm the effectiveness of the proposed framework with an attention mechanism by focusing on real-world applications, particularly in computational advertising, where the amount of data is large and the interpretation of predictions is necessary. The proposed frameworks are new attempts to support operations by predicting the properties of digital advertisements with high serving effectiveness, and their effectiveness has been confirmed using large-scale ad-serving data.

    In light of the above, the research summarized in this dissertation focuses on the attention mechanism, which has attracted much interest in recent years, and discusses its potential from both a basic research perspective, improving prediction performance and interpretability, and an applied research perspective, evaluating it on real-world applications with large datasets beyond the laboratory environment. The dissertation concludes with a summary of the implications of these findings for subsequent research and future prospects in the field.

    Doctoral dissertation (Doctor of Engineering), Hosei University.
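    As a rough, hedged illustration of the adversarial-training idea summarized above (not the dissertation's actual method), the sketch below perturbs the attention scores of a toy additive-attention classifier in the direction of the loss gradient and adds the resulting adversarial loss to the clean loss. All class and function names, and the FGSM-style perturbation, are illustrative assumptions.

```python
# Illustrative sketch only: adversarial perturbation of attention scores in a
# toy additive-attention text classifier (PyTorch). Not the dissertation's
# implementation; all names and hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAttentionClassifier(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.score = nn.Linear(emb_dim, 1)      # additive attention scores
        self.out = nn.Linear(emb_dim, n_classes)

    def forward(self, tokens, score_perturb=None):
        h = self.emb(tokens)                            # (batch, seq, emb)
        scores = self.score(h).squeeze(-1)              # (batch, seq)
        if score_perturb is not None:                   # inject adversarial noise
            scores = scores + score_perturb
        attn = F.softmax(scores, dim=-1)                # attention weights
        ctx = torch.einsum("bs,bse->be", attn, h)       # attention-weighted context
        return self.out(ctx), scores

def attention_adversarial_loss(model, tokens, labels, eps=0.1):
    """Clean loss plus loss under an FGSM-style perturbation of the
    attention scores, taken in the direction of the loss gradient."""
    logits, scores = model(tokens)
    clean = F.cross_entropy(logits, labels)
    grad, = torch.autograd.grad(clean, scores, retain_graph=True)
    perturb = eps * grad.sign().detach()
    adv_logits, _ = model(tokens, score_perturb=perturb)
    return clean + F.cross_entropy(adv_logits, labels)

# Toy usage: one training step on random data.
model = ToyAttentionClassifier()
tokens = torch.randint(0, 1000, (8, 20))
labels = torch.randint(0, 2, (8,))
loss = attention_adversarial_loss(model, tokens, labels)
loss.backward()
```

    The combined objective trains the model to keep its predictions stable when the attention scores are pushed in their most damaging direction, which is the intuition behind making the attention mechanism both more robust and more interpretable.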