417 research outputs found

    Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs

    Crowdsourcing platforms are now extensively used for conducting subjective pairwise comparison studies. In this setting, a pairwise comparison dataset is typically gathered via random sampling, either with or without replacement. In this paper, we use tools from random graph theory to analyze these two random sampling methods for the HodgeRank estimator. Using the Fiedler value of the graph as a measurement for estimator stability (informativeness), we provide a new estimate of the Fiedler value for these two random graph models. In the asymptotic limit as the number of vertices tends to infinity, we prove the validity of the estimate. Based on our findings, for a small number of items to be compared, we recommend a two-stage sampling strategy where a greedy sampling method is used initially and random sampling without replacement is used in the second stage. When a large number of items is to be compared, we recommend random sampling with replacement as this is computationally inexpensive and trivially parallelizable. Experiments on synthetic and real-world datasets support our analysis.
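
As a rough illustration of the quantity this abstract discusses, the sketch below samples comparison pairs with and without replacement and computes the Fiedler value (second-smallest Laplacian eigenvalue) of the resulting comparison graph. The function names, the sizes `n` and `m`, and the convention of counting repeated pairs as edge multiplicities are assumptions made for this example, not details taken from the paper.

```python
import itertools
import numpy as np

def fiedler_value(n, pairs):
    """Second-smallest eigenvalue of the Laplacian of the comparison (multi)graph.

    Repeated pairs are counted with multiplicity (an assumption for this sketch).
    """
    lap = np.zeros((n, n))
    for i, j in pairs:
        lap[i, i] += 1
        lap[j, j] += 1
        lap[i, j] -= 1
        lap[j, i] -= 1
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(lap)[1]

def sample_pairs(n, m, replace, rng):
    """Draw m item pairs uniformly at random, with or without replacement."""
    all_pairs = list(itertools.combinations(range(n), 2))
    idx = rng.choice(len(all_pairs), size=m, replace=replace)
    return [all_pairs[k] for k in idx]

rng = np.random.default_rng(0)
n, m = 16, 60  # hypothetical problem size: 16 items, 60 comparisons
with_replacement = fiedler_value(n, sample_pairs(n, m, True, rng))
without_replacement = fiedler_value(n, sample_pairs(n, m, False, rng))
print(with_replacement, without_replacement)
```

A Fiedler value of zero means the comparison graph is disconnected, in which case a global ranking is not identifiable from the sampled pairs.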

    HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation

    Recently, crowdsourcing has emerged as an effective paradigm for human-powered large-scale problem solving in various domains. However, a task requester usually has a limited budget, so it is desirable to have a policy that allocates the budget wisely to achieve better quality. In this paper, we study the principle of information maximization for active sampling strategies in the framework of HodgeRank, an approach based on the Hodge decomposition of pairwise ranking data with multiple workers. The principle yields two scenarios of active sampling: Fisher information maximization, which leads to unsupervised sampling based on a sequential maximization of graph algebraic connectivity without considering labels; and Bayesian information maximization, which selects samples with the largest information gain from prior to posterior and thus gives a supervised sampling scheme involving the labels collected. Experiments show that the proposed methods boost sampling efficiency compared to traditional sampling schemes and are thus valuable to practical crowdsourcing experiments. Comment: Accepted by AAAI201
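
The unsupervised scenario described above, sequentially choosing the pair that maximizes algebraic connectivity, can be sketched with a naive brute-force search. This is only an illustrative stand-in under stated assumptions: the helper names are hypothetical, every candidate pair is tried with a full eigendecomposition, and the paper's actual selection rule and any efficiency tricks are not reproduced here.

```python
import itertools
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian of a multigraph on n vertices."""
    lap = np.zeros((n, n))
    for i, j in edges:
        lap[i, i] += 1
        lap[j, j] += 1
        lap[i, j] -= 1
        lap[j, i] -= 1
    return lap

def algebraic_connectivity(lap):
    """Second-smallest Laplacian eigenvalue (the Fiedler value)."""
    return np.linalg.eigvalsh(lap)[1]

def greedy_next_pair(n, edges):
    """Return the pair whose addition gives the largest algebraic connectivity."""
    base = laplacian(n, edges)
    best_pair, best_val = None, -np.inf
    for i, j in itertools.combinations(range(n), 2):
        trial = base.copy()
        trial[i, i] += 1
        trial[j, j] += 1
        trial[i, j] -= 1
        trial[j, i] -= 1
        val = algebraic_connectivity(trial)
        if val > best_val + 1e-12:  # small tolerance so ties keep the first pair
            best_pair, best_val = (i, j), val
    return best_pair, best_val

# Hypothetical example: items 0-1-2 already compared along a path, item 3 unseen.
pair, value = greedy_next_pair(4, [(0, 1), (1, 2)])
print(pair, value)
```

Because the rule looks only at the graph and never at the collected labels, it matches the "unsupervised" description in the abstract; a supervised variant would additionally need a model of the labels.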

    The relationships between PM2.5 and meteorological factors in China: Seasonal and regional variations

    The interactions between PM2.5 and meteorological factors play a crucial role in air pollution analysis. However, previous studies of the relationships between PM2.5 concentration and meteorological conditions have been mainly confined to a single city or district, and the correlation over the whole of China remains unclear. Whether or not spatial and seasonal variations exist deserves further research. In this study, the relationships between PM2.5 concentration and meteorological factors were investigated in 74 major cities in China for a continuous period of 22 months from February 2013 to November 2014, at season, year, city, and regional scales, and the spatial and seasonal variations were analyzed. The meteorological factors were relative humidity (RH), temperature (TEM), wind speed (WS), and surface pressure (PS). We found that spatial and seasonal variations of their relationships with PM2.5 do exist. Spatially, RH is positively correlated with PM2.5 concentration in North China and Urumqi, but the relationship turns negative in other areas of China. WS is negatively correlated with PM2.5 everywhere except for Hainan Island. PS has a strong positive relationship with PM2.5 concentration in Northeast China and Mid-south China, and in other areas the correlation is weak. Seasonally, the positive correlation between PM2.5 concentration and RH is stronger in winter and spring. TEM has a negative relationship with PM2.5 in autumn and the opposite in winter. PS is more positively correlated with PM2.5 in autumn than in other seasons. Our study investigated the relationships between PM2.5 and meteorological factors in terms of spatial and seasonal variations, and the conclusions about these relationships are more comprehensive and precise than before. Comment: 3 tables, 13 figure

    Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size

    Learning representations from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most of the existing methods are batch methods based mainly on convex optimization, for example the projected gradient descent method. However, they are generally time-consuming because the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) it is SVD-free via dropping convexity, with good scalability through the use of a stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) it chooses the step size adaptively via a new stabilized Barzilai-Borwein (SBB) method, since the original version for convex problems might fail for the stochastic non-convex optimization problem considered. Moreover, we show that the proposed algorithm converges to a stationary point at a rate $\mathcal{O}(\frac{1}{T})$ in our setting, where $T$ is the total number of iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm in comparison with state-of-the-art methods; in particular, it achieves much lower computational cost with good prediction performance. Comment: 11 pages, 3 figures, 2 tables, accepted by AAAI201
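
To make the combination of SVRG with a Barzilai-Borwein-style step size more concrete, here is a minimal sketch on a toy least-squares problem. Several things are assumptions for illustration only: the function name `svrg_sbb`, the toy convex objective (the paper targets a non-convex ordinal embedding objective), the default hyperparameters, and the specific stabilization used here, which takes the absolute value of the curvature term and adds `eps * ||s||^2` to the denominator. The abstract does not state the exact SBB formula, so this should not be read as the authors' method.

```python
import numpy as np

def svrg_sbb(grad_i, x0, n, epochs=20, m=50, eta0=0.01, eps=1e-3, seed=0):
    """SVRG with a stabilized BB-style step size (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x_tilde = x0.copy()
    prev_x, prev_g = None, None
    eta = eta0
    for _ in range(epochs):
        # Full gradient at the snapshot point
        full_g = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        if prev_x is not None:
            s = x_tilde - prev_x   # change in snapshot
            y = full_g - prev_g    # change in full gradient
            # Assumed stabilization: |s.y| plus eps * ||s||^2 keeps the
            # denominator positive even when the curvature estimate is not.
            denom = m * (abs(s @ y) + eps * (s @ s))
            if denom > 0:
                eta = (s @ s) / denom
        prev_x, prev_g = x_tilde.copy(), full_g
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient
            v = grad_i(x, i) - grad_i(x_tilde, i) + full_g
            x = x - eta * v
        x_tilde = x
    return x_tilde

# Toy problem (hypothetical data): noiseless least squares.
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 5))
x_true = data_rng.standard_normal(5)
b = A @ x_true

def grad_i(x, i):
    """Gradient of 0.5 * (A[i] @ x - b[i])**2 with respect to x."""
    return (A[i] @ x - b[i]) * A[i]

def loss(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

x0 = np.zeros(5)
x_hat = svrg_sbb(grad_i, x0, n=len(b))
print(loss(x0), loss(x_hat))
```

The update contains no SVD or projection step, which is the scalability point the abstract makes; only the first epoch relies on the hand-picked `eta0`, after which the step size is set from the snapshot differences.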