66,673 research outputs found
Multiple Imputation Ensembles (MIE) for dealing with missing data
Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches: multiple imputation and ensemble methods and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperform others, particularly as missing data increases
Hybrid Collaborative Filtering with Autoencoders
Collaborative Filtering aims at exploiting the feedback of users to provide
personalised recommendations. Such algorithms look for latent variables in a
large sparse matrix of ratings. They can be enhanced by adding side information
to tackle the well-known cold start problem. While Neu-ral Networks have
tremendous success in image and speech recognition, they have received less
attention in Collaborative Filtering. This is all the more surprising that
Neural Networks are able to discover latent variables in large and
heterogeneous datasets. In this paper, we introduce a Collaborative Filtering
Neural network architecture aka CFN which computes a non-linear Matrix
Factorization from sparse rating inputs and side information. We show
experimentally on the MovieLens and Douban dataset that CFN outper-forms the
state of the art and benefits from side information. We provide an
implementation of the algorithm as a reusable plugin for Torch, a popular
Neural Network framework
Recommended from our members
Big Data in the Oil and Gas Industry: A Promising Courtship
The energy industry remains one of the highest money-producing and investment industries in the world. The United States’ own economic stability depends greatly on the stability of oil and gas prices. Various factors affect the amount of money that will continue to be invested in producing oil. A main disadvantage to the oil and gas industry is its lack of technological adaptation. This weakens the industry because the surest measures are not currently being taken to produce oil in optimally efficient, safe, and cost-effective ways. Big data has gained global recognition as an opportunity to gather large volumes of information in real-time and translate data sets into actionable insights. In a low commodity price environment, saving time, reducing costs, and improving safety are crucial outcomes that can be realized using machine learning in oil and gas operations. Big data provides the opportunity to use unsupervised learning. For example, with this approach, engineers can predict oil wells’ optimal barrels of production given the completion data in a specific area. However, a caveat to utilizing big data in the oil and gas industry is that there simply is neither enough physical data nor data velocity in the industry to be properly referred to as “big data.” Big data, as it develops, will nonetheless significantly change the energy business in the future, as it already has in various other industries.Petroleum and Geosystems Engineerin
- …