6,907 research outputs found
Combination of Diverse Ranking Models for Personalized Expedia Hotel Searches
The ICDM Challenge 2013 is to apply machine learning to the problem of hotel
ranking, aiming to maximize purchases according to given hotel characteristics,
location attractiveness of hotels, user's aggregated purchase history and
competitive online travel agency information for each potential hotel choice.
This paper describes the solution of team "binghsu & MLRush & BrickMover". We
conduct simple feature engineering, and each team member trains different models
individually. Afterwards, we use a listwise ensemble method to combine each
model's output. Besides describing the effective models and features, we discuss
the lessons learned while using deep learning in this competition.
Comment: 6 pages, 3 figures
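The listwise combination step can be sketched as a simple rank-averaging ensemble. This is a minimal illustration of the idea, not the team's actual method (which the abstract does not specify); all function names and scores below are made up.

```python
def rank_normalize(scores):
    """Map raw model scores to ranks scaled to [0, 1]; higher score -> higher rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    denom = max(len(scores) - 1, 1)
    ranks = [0.0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r / denom
    return ranks

def ensemble_rank(model_scores):
    """Average the rank-normalized scores of several models for one query."""
    normalized = [rank_normalize(s) for s in model_scores]
    n_items = len(normalized[0])
    return [sum(m[i] for m in normalized) / len(normalized) for i in range(n_items)]

# Three hypothetical models scoring four candidate hotels for one search:
combined = ensemble_rank([
    [0.9, 0.1, 0.5, 0.3],   # model 1
    [10.0, 2.0, 7.0, 1.0],  # model 2 (different score scale; ranks make them comparable)
    [0.8, 0.1, 0.9, 0.4],   # model 3
])
best_hotel = max(range(len(combined)), key=lambda i: combined[i])
```

Rank normalization sidesteps the problem that each model's raw scores live on a different scale, which is one reason listwise rank aggregation is a common ensemble choice in ranking competitions.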
Regression and Learning to Rank Aggregation for User Engagement Evaluation
User engagement refers to the amount of interaction an instance (e.g., a tweet,
news article, or forum post) receives. Ranking the items in social media websites
based on the amount of user participation in them can be used in different
applications, such as recommender systems. In this paper, we consider a tweet
containing a rating for a movie as an instance and focus on ranking the
instances of each user based on their engagement, i.e., the total number of
retweets and favorites each instance will gain.
For this task, we define several features which can be extracted from the
meta-data of each tweet. The features are partitioned into three categories:
user-based, movie-based, and tweet-based. We show that in order to obtain good
results, features from all categories should be considered. We exploit
regression and learning to rank methods to rank the tweets and propose to
aggregate the results of regression and learning to rank methods to achieve
better performance. We have run our experiments on an extended version of the
MovieTweetings dataset provided by the ACM RecSys Challenge 2014. The results show
that the learning to rank approach outperforms most of the regression models and
that the combination improves the performance significantly.
Comment: In Proceedings of the 2014 ACM Recommender Systems Challenge,
RecSysChallenge '14
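One simple way to aggregate a regression model's output with a learning-to-rank model's ordering, in the spirit of the combination above, is a Borda count over the two induced rankings. The paper's actual aggregation scheme may differ; the tweet ids, scores, and orderings below are illustrative.

```python
def borda_points(ranking):
    """ranking: item ids ordered best-first -> {item: Borda points}."""
    n = len(ranking)
    return {item: n - pos for pos, item in enumerate(ranking)}

def aggregate(reg_scores, ltr_ranking):
    """Combine a regression model's scores with a learning-to-rank ordering."""
    reg_ranking = sorted(reg_scores, key=reg_scores.get, reverse=True)
    points = borda_points(reg_ranking)
    for item, p in borda_points(ltr_ranking).items():
        points[item] = points.get(item, 0) + p
    return sorted(points, key=points.get, reverse=True)

# Hypothetical predicted engagement (retweets + favorites) for four tweets:
reg = {"t1": 12.0, "t2": 3.5, "t3": 8.1, "t4": 6.0}
# Hypothetical ordering produced by a learning-to-rank model:
ltr = ["t3", "t4", "t1", "t2"]
final_ranking = aggregate(reg, ltr)
```

Borda-style aggregation only uses the orderings, so it combines models whose raw outputs (regression values vs. ranking scores) are not directly comparable.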
A proposal project for a blind image quality assessment by learning distortions from the full reference image quality assessments
This short paper presents a prospective plan to build a no-reference image
quality assessment. Its main goal is to deliver both the objective score and
the distortion map for a given distorted image without knowledge of its
reference image.
Comment: International Workshop on Quality of Multimedia Experience, 2012,
Melbourne, Australia
Some variations on Ensembled Random Survival Forest with application to Cancer Research
In this paper we describe a novel implementation of AdaBoost for prediction
of the survival function. We consider different variations of the algorithm and
compare them based on system run time and root mean square error. Our
construction accommodates both right-censored data and competing-risks data. We
use different datasets to illustrate the performance of the algorithms.
Comment: 16 pages; 10 figures
Scalable Multilabel Prediction via Randomized Methods
Modeling the dependence between outputs is a fundamental challenge in
multilabel classification. In this work we show that a generic regularized
nonlinearity mapping independent predictions to joint predictions is sufficient
to achieve state-of-the-art performance on a variety of benchmark problems.
Crucially, we compute the joint predictions without ever obtaining any
independent predictions, while incorporating low-rank and smoothness
regularization. We achieve this by leveraging randomized algorithms for matrix
decomposition and kernel approximation. Furthermore, our techniques are
applicable to the multiclass setting. We apply our method to a variety of
multiclass and multilabel data sets, obtaining state-of-the-art results.
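The randomized matrix machinery the authors leverage can be illustrated with a basic randomized SVD (random range finder followed by a small exact SVD). This is a generic sketch of the technique, not the paper's implementation; the function name, oversampling parameter, and data are illustrative.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """Low-rank SVD via a randomized range finder (Halko-style sketch)."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis approximating range(A)
    B = Q.T @ A                          # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))  # exactly rank 5
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
```

The expensive full decomposition is replaced by a QR of a tall-thin product and an SVD of a small matrix, which is how randomized methods make low-rank structure affordable at scale.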
Non-uniform Feature Sampling for Decision Tree Ensembles
We study the effectiveness of non-uniform randomized feature selection in
decision tree classification. We experimentally evaluate two feature selection
methodologies, based on information extracted from the provided dataset:
\emph{leverage scores-based} and \emph{norm-based} feature selection.
Experimental evaluation of the proposed feature selection techniques indicates
that such approaches might be more effective than naive uniform feature
selection while having performance comparable to the random forest
algorithm [3].
Comment: 7 pages, 7 figures, 1 table
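One plausible reading of the leverage-scores-based scheme is to compute each feature's leverage with respect to the top-k right singular subspace and sample features proportionally. A sketch under that assumption (the paper's exact definition may differ; function names and data are illustrative):

```python
import numpy as np

def column_leverage_scores(X, k):
    """Leverage score of each feature w.r.t. the top-k right singular subspace of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sum(Vt[:k] ** 2, axis=0)   # per-feature squared mass in V_k; sums to k

def sample_features(X, k, n_samples, seed=0):
    """Sample feature indices without replacement, proportionally to leverage."""
    rng = np.random.default_rng(seed)
    scores = column_leverage_scores(X, k)
    return rng.choice(X.shape[1], size=n_samples, replace=False, p=scores / scores.sum())

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))
X[:, 3] *= 10.0                          # feature 3 now dominates the spectrum
idx = sample_features(X, k=2, n_samples=5)
```

Features that carry more of the data's spectral mass are sampled more often, which is the intuition behind preferring leverage-based over uniform sampling when growing tree ensembles.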
Comparing various regression methods on ensemble strategies in differential evolution
Differential evolution offers a multitude of strategies for generating new
trial solutions. Unfortunately, the best strategy is not known in advance;
moreover, it usually depends on the problem being solved. This paper suggests
applying various regression methods (random forest, extremely randomized trees,
gradient boosting, decision trees, and a generalized linear model) to ensemble
strategies in the differential evolution algorithm by predicting the best
differential evolution strategy during the run. In preliminary experiments on a
suite of five well-known benchmark functions from the literature, the random
forest regression method substantially outperformed the other regression
methods.
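The strategy-prediction idea can be sketched as follows, using an ordinary least-squares model as a stand-in for the paper's regressors (random forest, extremely randomized trees, etc.). The strategy names, the single "generation" feature, and the improvement numbers are all illustrative assumptions.

```python
import numpy as np

class StrategyPredictor:
    """Predict expected improvement per DE strategy from run history; pick the best."""

    def __init__(self, strategies):
        self.history = {s: ([], []) for s in strategies}   # (feature rows, improvements)

    def record(self, strategy, features, improvement):
        rows, targets = self.history[strategy]
        rows.append([1.0, *features])                      # intercept + features
        targets.append(improvement)

    def best_strategy(self, features):
        predictions = {}
        for s, (rows, targets) in self.history.items():
            if len(targets) < 2:
                continue                                   # not enough data to fit
            w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
            predictions[s] = float(np.array([1.0, *features]) @ w)
        return max(predictions, key=predictions.get)

sp = StrategyPredictor(["rand/1/bin", "best/1/bin"])
# Pretend "best/1/bin" keeps improving as the run progresses (feature = generation):
for g in range(10):
    sp.record("rand/1/bin", [g], 1.0 / (g + 1))
    sp.record("best/1/bin", [g], 0.2 + 0.05 * g)
choice = sp.best_strategy([10])
```

Swapping the least-squares fit for a random forest regressor would give the variant the paper reports as strongest, without changing the surrounding bookkeeping.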
Predicting the Behavior of the Supreme Court of the United States: A General Approach
Building upon developments in theoretical and applied machine learning, as
well as the efforts of various scholars including Guimera and Sales-Pardo
(2011), Ruger et al. (2004), and Martin et al. (2004), we construct a model
designed to predict the voting behavior of the Supreme Court of the United
States. Using the extremely randomized tree method first proposed in Geurts, et
al. (2006), a method similar to the random forest approach developed in Breiman
(2001), as well as novel feature engineering, we predict more than sixty years
of decisions by the Supreme Court of the United States (1953-2013). Using only
data available prior to the date of decision, our model correctly identifies
69.7% of the Court's overall affirm and reverse decisions and correctly
forecasts 70.9% of the votes of individual justices across 7,700 cases and more
than 68,000 justice votes. Our performance is consistent with the general level
of prediction offered by prior scholars. However, our model is distinctive as
it is the first robust, generalized, and fully predictive model of Supreme
Court voting behavior offered to date. Our model predicts six decades of
behavior of thirty Justices appointed by thirteen Presidents. With a more sound
methodological foundation, our results represent a major advance for the
science of quantitative legal prediction and portend a range of other potential
applications, such as those described in Katz (2013).
Comment: 17 pages, 6 figures; source available at
https://github.com/mjbommar/scotus-predic
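The extremely randomized tree method mentioned above differs from random forests mainly in its split rule: cut-points are drawn at random rather than optimized, and the best of K random (feature, cut-point) candidates is kept. A minimal sketch of that rule, with toy data and illustrative names:

```python
import random

def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)

def best_random_split(X, y, n_candidates=3, seed=0):
    """Pick the best of K fully random (feature, cut-point) candidates."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        f = rng.randrange(len(X[0]))
        lo = min(row[f] for row in X)
        hi = max(row[f] for row in X)
        cut = rng.uniform(lo, hi)        # random cut-point, never optimized
        left = [y[i] for i, row in enumerate(X) if row[f] <= cut]
        right = [y[i] for i, row in enumerate(X) if row[f] > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f, cut)
    return best

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 4.0], [0.9, 5.5], [1.0, 4.2]]
y = [0, 0, 1, 1]
score, feature, cut = best_random_split(X, y)
```

The extra randomization trades a little per-tree accuracy for lower variance across the ensemble, which is why it can match or beat optimized-split forests on noisy feature sets like case metadata.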
Consistency of Random Survival Forests
We prove uniform consistency of Random Survival Forests (RSF), a newly
introduced forest ensemble learner for analysis of right-censored survival
data. Consistency is proven under general splitting rules, bootstrapping, and
random selection of variables--that is, under true implementation of the
methodology. A key assumption made is that all variables are factors. Although
this assumes that the feature space has finite cardinality, in practice the
space can be extremely large--indeed, current computational procedures do not
properly deal with this setting. An indirect consequence of this work is the
introduction of new computational methodology for dealing with factors with an
unlimited number of labels.
Comment: Submitted to the Electronic Journal of Statistics
(http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
Randomized Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a powerful tool for data mining.
However, the emergence of `big data' has severely challenged our ability to
compute this fundamental decomposition using deterministic algorithms. This
paper presents a randomized hierarchical alternating least squares (HALS)
algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative
input data, a more efficient nonnegative decomposition can be computed. Our
algorithm scales to big data applications while attaining a near-optimal
factorization. The proposed algorithm is evaluated using synthetic and real
world data and shows substantial speedups compared to deterministic HALS.
Comment: This is an extended and revised version of the paper which appeared
in JPR
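As context for the reported speedups, the deterministic HALS updates the paper accelerates look like the following sketch; the randomized variant applies updates of this style to a smaller matrix derived from the nonnegative input via random projection. Function names, iteration counts, and data here are illustrative, not the paper's implementation.

```python
import numpy as np

def hals_nmf(A, k, n_iter=500, seed=0):
    """Deterministic HALS: cyclic nonnegative updates of W's columns and H's rows."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-10                           # guard against division by zero
    for _ in range(n_iter):
        HHt, AHt = H @ H.T, A @ H.T
        for j in range(k):                # update each column of W in turn
            W[:, j] = np.maximum(
                W[:, j] + (AHt[:, j] - W @ HHt[:, j]) / (HHt[j, j] + eps), 0.0)
        WtW, WtA = W.T @ W, W.T @ A
        for j in range(k):                # update each row of H in turn
            H[j, :] = np.maximum(
                H[j, :] + (WtA[j, :] - WtW[j, :] @ H) / (WtW[j, j] + eps), 0.0)
    return W, H

rng = np.random.default_rng(3)
A = rng.random((20, 3)) @ rng.random((3, 15))   # exactly rank-3 nonnegative data
W, H = hals_nmf(A, k=3)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Each inner step is a closed-form nonnegative least-squares update for one factor column or row, so the per-iteration cost is dominated by the matrix products, which is exactly what compressing the input shrinks.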