14 research outputs found

    When Sheep Shop: Measuring Herding Effects in Product Ratings with Natural Experiments

    As online shopping becomes ever more prevalent, customers rely increasingly on product-rating websites to make purchase decisions. The reliability of online ratings, however, is potentially compromised by the so-called herding effect: when rating a product, customers may be biased to follow other customers' previous ratings of the same product. This is problematic because haphazard early ratings can skew long-term customer perception. The study of herding poses methodological challenges. In particular, observational studies are impeded by the lack of counterfactuals: simply correlating early with subsequent ratings is insufficient, because we cannot know what the subsequent ratings would have looked like had the first ratings been different. The methodology introduced here exploits a setting that comes close to an experiment, although it is purely observational: a natural experiment. Our key methodological device consists in studying the same product on two separate rating sites, focusing on products that received a high first rating on one site and a low first rating on the other. This largely controls for confounds such as a product's inherent quality, advertising, and producer identity, and lets us isolate the effect of the first rating on subsequent ratings. In a case study, we focus on beers as products and jointly study two beer-rating sites, but our method applies to any pair of sites across which products can be matched. We find clear evidence of herding in beer ratings. For instance, if a beer receives a very high first rating, its second rating is on average half a standard deviation higher than when the identical beer receives a very low first rating. Moreover, herding effects tend to last a long time and are noticeable even after 20 or more ratings. Our results have important implications for the design of better rating systems.
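
    The matched-pairs idea can be made concrete with a short sketch. The snippet below assumes two pandas DataFrames of ratings, one per site, whose products have already been matched under a common identifier, with hypothetical columns product_id, timestamp and rating; the thresholds for "very high" and "very low" first ratings are illustrative, not the paper's.

```python
import pandas as pd

def first_two_ratings(df):
    """First and second rating of each product, in chronological order."""
    df = df.sort_values("timestamp")
    out = {}
    for pid, grp in df.groupby("product_id"):
        ratings = grp["rating"].tolist()
        if len(ratings) >= 2:
            out[pid] = {"first": ratings[0], "second": ratings[1]}
    return pd.DataFrame.from_dict(out, orient="index")

def herding_effect(site_a, site_b, high=4.5, low=2.0):
    """Average shift of the second rating (in pooled std. dev. units) for
    products rated very high first on site A and very low first on site B."""
    a = first_two_ratings(site_a)
    b = first_two_ratings(site_b)
    pairs = a.join(b, lsuffix="_a", rsuffix="_b", how="inner")

    # "Treatment": high first rating on site A, low first rating on site B.
    treated = pairs[(pairs["first_a"] >= high) & (pairs["first_b"] <= low)]

    # Within-product difference of the second ratings, standardized by the
    # pooled standard deviation of all ratings on both sites.
    diff = treated["second_a"] - treated["second_b"]
    sd = pd.concat([site_a["rating"], site_b["rating"]]).std()
    return diff.mean() / sd
```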

    A macroscopic loading model for dynamic, anisotropic and congested pedestrian flows: Implementation, calibration and case study analysis

    Pedestrian facilities are increasingly congested, and understanding how to model pedestrian flows is important to ensure safety and comfort. Many models have already been developed, such as the social force model and PedCTM. The problem is that existing models either allow for population heterogeneity or are computationally fast, but rarely both. The goal of this project is to implement a new model that combines these two characteristics. The new model presented here is fast due to the use of a fundamental diagram. Its anisotropy comes from a new formulation of a fundamental diagram based on the literature, the stream-based fundamental diagram (SbFD). This project gives a summary of the model, presents the new fundamental diagram, and compares it to the state of the practice, Weidmann (1992). The parameters of the model and of the fundamental diagrams are calibrated using a simulated annealing algorithm. Finally, a case study analysis on two sets of experimental data is carried out to compare the performance of the fundamental diagrams.
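
    For reference, the state-of-the-practice fundamental diagram the project compares against can be sketched as follows. The parameter values are the commonly cited Weidmann (1992) ones (free-flow speed 1.34 m/s, jam density 5.4 ped/m², gamma 1.913); in the project they, like the SbFD parameters, would be re-estimated by the simulated annealing calibration rather than fixed as below.

```python
import numpy as np

def weidmann_speed(density, v_free=1.34, gamma=1.913, k_jam=5.4):
    """Walking speed [m/s] as a function of density [ped/m^2] (Weidmann form)."""
    density = np.clip(density, 1e-6, k_jam)   # avoid division by zero, cap at jam density
    return v_free * (1.0 - np.exp(-gamma * (1.0 / density - 1.0 / k_jam)))
```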

    DATGAN: Integrating expert knowledge into deep learning for population synthesis

    Agent-based simulations and activity-based models used to analyse nationwide transport networks require detailed synthetic populations. These applications are becoming increasingly complex and thus require more precise synthetic data. However, standard statistical techniques such as Iterative Proportional Fitting (IPF) or Gibbs sampling fail to provide data of a sufficiently high standard; for example, they fail to generate rare combinations of attributes, known as sampling zeros in the literature. Researchers have thus been investigating new deep learning techniques such as Generative Adversarial Networks (GANs) for population synthesis. These methods have already shown great success in other fields. However, one fundamental limitation is that GANs are data-driven techniques, so it is not possible to integrate expert knowledge into the data generation process. This can lead to the following issues: a lack of representativity in the generated data, the introduction of bias, and the possibility of overfitting the sample's noise. To address these limitations, we present the Directed Acyclic Tabular GAN (DATGAN) to integrate expert knowledge into deep learning models for synthetic populations. This approach allows the interactions between variables to be specified explicitly using a Directed Acyclic Graph (DAG). The DAG is then converted into a network of modified Long Short-Term Memory (LSTM) cells. Two types of multi-input LSTM cells have been developed to allow such a structure in the generator. The DATGAN is then tested on the Chicago travel survey dataset. We show that our model outperforms state-of-the-art methods on machine learning efficacy and statistical metrics.
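
    To make the role of the DAG concrete, the sketch below shows how a modeller might declare the variable interactions; the generator then instantiates one (multi-input) LSTM cell per node, fed by the outputs of its parents and processed in topological order. The variables and edges are hypothetical examples, not the specification used for the Chicago dataset.

```python
import networkx as nx

# Expert knowledge: a DAG over the survey variables (hypothetical example).
dag = nx.DiGraph()
dag.add_edges_from([
    ("age", "license"),            # age influences driving-licence ownership
    ("age", "employment"),
    ("employment", "income"),
    ("income", "car_ownership"),
    ("license", "car_ownership"),
    ("car_ownership", "mode_choice"),
])
assert nx.is_directed_acyclic_graph(dag)

# The generator wires one LSTM cell per variable, following a topological sort.
for var in nx.topological_sort(dag):
    parents = list(dag.predecessors(var))
    print(f"{var}: LSTM cell with inputs from {parents or ['noise only']}")
```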

    Stochastic Optimization with Adaptive Batch Size: Discrete Choice Models as a Case Study

    The 2.5 quintillion bytes of data created every day bring new opportunities, but also new and stimulating challenges, to the discrete choice community. Opportunities, because more and larger data sets will undoubtedly become available in the future; challenges, because insights can only be discovered if models can be estimated, which is not straightforward on such large datasets. In this paper, inspired by good practice and the intensive use of stochastic gradient methods in the ML field, we introduce the Window Moving Average - Adaptive Batch Size (WMA-ABS) algorithm, which improves the efficiency of stochastic second-order methods. We present preliminary results indicating that our algorithm outperforms standard second-order methods, especially on large datasets. This constitutes a first step towards showing that stochastic algorithms can finally find their place in the optimization of Discrete Choice Models.
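
    One plausible reading of the window-moving-average rule is sketched below: the optimizer monitors a short moving window of objective values and enlarges the batch whenever the windowed improvement stalls, so that early iterations are cheap and later iterations increasingly precise. The window length, tolerance, and growth factor are illustrative, not the values used in the paper.

```python
from collections import deque

class AdaptiveBatchSize:
    """Window-moving-average batch-size controller (illustrative sketch)."""

    def __init__(self, n_obs, init_frac=0.1, window=5, tol=1e-3, factor=2.0):
        self.n_obs = n_obs
        self.batch = max(1, int(init_frac * n_obs))
        self.window = deque(maxlen=window)
        self.tol, self.factor = tol, factor

    def update(self, objective):
        """Call once per iteration with the current mini-batch objective value;
        returns the batch size to use for the next iteration."""
        self.window.append(objective)
        full = len(self.window) == self.window.maxlen
        stalled = full and abs(self.window[0] - self.window[-1]) < self.tol
        if stalled and self.batch < self.n_obs:
            self.batch = min(int(self.factor * self.batch), self.n_obs)
            self.window.clear()   # restart the moving window at the new batch size
        return self.batch
```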

    Estimation of discrete choice models with hybrid stochastic adaptive batch size algorithms

    The emergence of Big Data has enabled new research perspectives in the discrete choice community. While the techniques to estimate Machine Learning models on massive amounts of data are well established, they have not yet been fully explored for the estimation of statistical Discrete Choice Models based on the random utility framework. In this article, we provide new ways of dealing with large datasets in the context of Discrete Choice Models. We achieve this by proposing new efficient stochastic optimization algorithms and extensively testing them alongside existing approaches. We develop these algorithms based on three main contributions: the use of a stochastic Hessian, the modification of the batch size, and a change of optimization algorithm depending on the batch size. A comprehensive experimental comparison of fifteen optimization algorithms is conducted across ten benchmark Discrete Choice Model cases. The results indicate that the HAMABS algorithm, a hybrid adaptive batch size stochastic method, is the best-performing algorithm across the optimization benchmarks. This algorithm speeds up optimization by a factor of 23 on the largest model compared to existing algorithms used in practice. The integration of the new algorithms in Discrete Choice Model estimation software will significantly reduce the time required for model estimation and therefore enable researchers and practitioners to explore new approaches for the specification of choice models.
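
    The three contributions can be illustrated with a highly simplified sketch: (quasi-)Newton steps use a Hessian computed on the current batch, the batch grows adaptively when progress stalls, and convergence is only declared once the batch covers the full dataset. The growth and switching rules below are illustrative simplifications, not the exact HAMABS logic; grad and hess are assumed to be callables returning the gradient and Hessian of the negative log-likelihood on a subset of the data, and data is assumed to be a NumPy array of observations.

```python
import numpy as np

def hybrid_newton(grad, hess, beta, data, init_batch=1000,
                  growth=2.0, max_iter=200, tol=1e-6):
    """Minimize a negative log-likelihood with batch-based Newton steps."""
    n = len(data)
    batch = min(init_batch, n)
    rng = np.random.default_rng(0)
    for _ in range(max_iter):
        idx = rng.choice(n, size=batch, replace=False) if batch < n else np.arange(n)
        g = grad(beta, data[idx])
        H = hess(beta, data[idx])                 # stochastic Hessian on the batch
        step = np.linalg.solve(H, g)              # Newton direction
        beta = beta - step
        if batch == n and np.linalg.norm(g) < tol:
            return beta                           # converged on the full dataset
        if batch < n and np.linalg.norm(step) < 1e-3:
            batch = min(int(growth * batch), n)   # progress stalls: enlarge the batch
    return beta
```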

    An Overtaking Decision Algorithm for Networked Intelligent Vehicles Based on Cooperative Perception

    This paper presents an overtaking decision algorithm for networked intelligent vehicles. The algorithm is based on a cooperative tracking and sensor fusion algorithm that we previously developed. The ego vehicle is equipped with lane-keeping and lane-changing capabilities, as well as a forward-looking lidar sensor. The lidar data are fed to the tracking module, which detects other vehicles, such as the vehicle to be overtaken (the leading vehicle) and the oncoming traffic. Based on the estimated distances to the leading and oncoming vehicles and their speeds, a risk is calculated and a corresponding overtaking decision is made. We compare the performance of the overtaking algorithm between the case in which the ego vehicle relies only on its own lidar sensor and the case in which it also fuses object estimates received from the leading car, which is likewise equipped with a forward-looking lidar. Systematic evaluations are performed in Webots, a calibrated high-fidelity simulator.
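
    A minimal sketch of the kind of gap-acceptance check behind such an overtaking decision is shown below: given estimated distances and speeds of the leading and oncoming vehicles (from the lidar tracker, possibly fused with estimates received from the leading car), compare the time needed to complete the manoeuvre with the time until the oncoming vehicle arrives. The manoeuvre-length model and safety margin are illustrative assumptions, not the paper's risk formulation.

```python
def overtaking_is_safe(d_lead, v_lead, d_oncoming, v_oncoming, v_ego,
                       overtake_length=30.0, safety_margin=2.0):
    """Return True if the estimated gap is large enough to overtake.
    Distances in metres, speeds in m/s, times in seconds."""
    rel_speed = v_ego - v_lead
    if rel_speed <= 0:
        return False                              # cannot close the gap on the leader
    t_overtake = (d_lead + overtake_length) / rel_speed
    closing = v_ego + v_oncoming
    t_oncoming = d_oncoming / closing if closing > 0 else float("inf")
    return t_oncoming > t_overtake + safety_margin
```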

    A dynamic network loading model for anisotropic and congested pedestrian flows

    A macroscopic loading model for multi-directional, time-varying and congested pedestrian flows is proposed in this paper. Walkable space is represented by a network of streams that are each associated with an area in which they interact. To describe this interaction, a stream-based pedestrian fundamental diagram is used that relates density and walking speed in multi-directional flow. The proposed model is applied to two different case studies. The explicit modeling of anisotropy in walking speed is shown to significantly improve the ability of the model to reproduce empirically observed walking time distributions. Moreover, the obtained model parametrization is in excellent agreement with the literature.
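
    One way to express the anisotropy described above is to let the speed of each stream depend on a weighted effective density, in which counter- and cross-flow densities count more heavily than same-direction density. The sketch below reuses the Weidmann functional form with illustrative weights; it is not the paper's exact stream-based specification.

```python
import numpy as np

def stream_speed(k_same, k_counter, k_cross,
                 v_free=1.34, gamma=1.913, k_jam=5.4,
                 w_counter=1.5, w_cross=1.2):
    """Speed [m/s] of one stream given same-, counter- and cross-flow
    densities [ped/m^2], via a weighted effective density."""
    k_eff = np.clip(k_same + w_counter * k_counter + w_cross * k_cross,
                    1e-6, k_jam)
    return v_free * (1.0 - np.exp(-gamma * (1.0 / k_eff - 1.0 / k_jam)))
```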

    Bridging the gap between model-driven and data-driven methods in the era of Big Data

    Data-driven and model-driven methodologies can be regarded as competing fields, since they tackle similar problems such as prediction. However, the two fields can learn from each other. Data-driven methodologies have been developed to exploit advanced methods based on Big Data technologies, whereas model-driven methodologies concentrate on mathematical models grounded in theory and expert knowledge to allow for interpretability and control. Through three main contributions, this thesis aims to bridge the gap between these two fields by applying the strengths of each to the other.

    Discrete Choice Models (DCMs) have shown tremendous success in many fields, such as transportation. However, they have not evolved to exploit the growing amount of available data. Machine Learning (ML) researchers, on the other hand, have developed optimization algorithms to efficiently estimate complex models on large datasets. Faster estimation of DCMs on larger datasets would similarly improve the efficiency of modelers and enable new research directions. We therefore take inspiration from the large body of existing research on efficient parameter estimation with extensive data and large numbers of parameters in deep learning and apply it to DCMs. The first chapter of this thesis introduces the HAMABS algorithm, which combines three fundamental principles to enable faster parameter estimation of DCMs (a 20x speedup compared to standard estimation) without compromising the precision of the parameter estimates.

    Collecting large amounts of data can be cumbersome and costly, even in the era of Big Data. ML researchers in computer vision, for example, have developed generative deep learning models to augment datasets. DCM researchers face similar issues with tabular data, e.g. travel surveys. In addition, if the collection process is not performed correctly, these datasets can contain bias, lack consistency, or be unrepresentative of the actual population. The second chapter of this thesis introduces the DATGAN, a Generative Adversarial Network (GAN) that integrates expert knowledge to control the generation process. This new architecture allows modelers to generate controlled and representative synthetic data, outperforming similar state-of-the-art generative models.

    Finally, researchers are increasingly developing fully disaggregate agent-based simulation models, which use detailed synthetic populations to generate aggregate passenger flows. However, detailed disaggregate socioeconomic data is usually expensive to collect and heavily restricted in terms of access and usage. As a result, synthetic populations are typically either drawn randomly from aggregate-level control totals, limiting their quality, or tightly controlled, limiting their application and usefulness. To address this, the third chapter extends the DATGAN methodology to generate highly detailed and consistent synthetic populations from small sample data. First, ciDATGAN learns to generate the variables in a small but highly detailed dataset, e.g. a household travel survey. It then completes a large dataset with few variables, e.g. census microdata, by generating the previously learned variables. The results show that this methodology can correct for bias and may enable the transfer of synthetic populations to new areas and contexts.
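
    The completion step of the third contribution can be illustrated with a deliberately simple stand-in. Below, a hot-deck (matched-donor) draw plays the role of the trained generative model, purely to show the data flow from a small detailed survey to a large dataset with few variables; the column names are hypothetical, and the real ciDATGAN would generate, not copy, the detailed variables.

```python
import numpy as np
import pandas as pd

shared_cols = ["age", "gender", "household_size"]   # present in both datasets
detail_cols = ["mode_choice", "n_trips"]            # present only in the survey

def complete_population(census, survey, rng=np.random.default_rng(0)):
    """For each census record, attach detailed variables drawn from a survey
    record with the same shared attributes (random donor within the group)."""
    grouped = survey.groupby(shared_cols)
    rows = []
    for _, rec in census.iterrows():
        key = tuple(rec[shared_cols])
        pool = grouped.get_group(key) if key in grouped.groups else survey
        donor = pool.sample(1, random_state=int(rng.integers(1 << 31))).iloc[0]
        rows.append(donor[detail_cols])
    detailed = pd.DataFrame(rows).reset_index(drop=True)
    return pd.concat([census.reset_index(drop=True), detailed], axis=1)
```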