57,397 research outputs found

    Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

    Get PDF
    Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19

    A Cost-based Optimizer for Gradient Descent Optimization

    Full text link
    As the use of machine learning (ML) permeates into diverse application domains, there is an urgent need to support a declarative framework for ML. Ideally, a user will specify an ML task in a high-level and easy-to-use language and the framework will invoke the appropriate algorithms and system configurations to execute it. An important observation towards designing such a framework is that many ML tasks can be expressed as mathematical optimization problems, which take a specific form. Furthermore, these optimization problems can be efficiently solved using variations of the gradient descent (GD) algorithm. Thus, to decouple a user specification of an ML task from its execution, a key component is a GD optimizer. We propose a cost-based GD optimizer that selects the best GD plan for a given ML task. To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge. Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also allows for optimizations that achieve orders of magnitude performance speed-up.Comment: Accepted at SIGMOD 201

    Event tracking for real-time unaware sensitivity analysis (EventTracker)

    Get PDF
    This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.This paper introduces a platform for online Sensitivity Analysis (SA) that is applicable in large scale real-time data acquisition (DAQ) systems. Here we use the term real-time in the context of a system that has to respond to externally generated input stimuli within a finite and specified period. Complex industrial systems such as manufacturing, healthcare, transport, and finance require high quality information on which to base timely responses to events occurring in their volatile environments. The motivation for the proposed EventTracker platform is the assumption that modern industrial systems are able to capture data in real-time and have the necessary technological flexibility to adjust to changing system requirements. The flexibility to adapt can only be assured if data is succinctly interpreted and translated into corrective actions in a timely manner. An important factor that facilitates data interpretation and information modelling is an appreciation of the affect system inputs have on each output at the time of occurrence. Many existing sensitivity analysis methods appear to hamper efficient and timely analysis due to a reliance on historical data, or sluggishness in providing a timely solution that would be of use in real-time applications. This inefficiency is further compounded by computational limitations and the complexity of some existing models. In dealing with real-time event driven systems, the underpinning logic of the proposed method is based on the assumption that in the vast majority of cases changes in input variables will trigger events. Every single or combination of events could subsequently result in a change to the system state. The proposed event tracking sensitivity analysis method describes variables and the system state as a collection of events. The higher the numeric occurrence of an input variable at the trigger level during an event monitoring interval, the greater is its impact on the final analysis of the system state. Experiments were designed to compare the proposed event tracking sensitivity analysis method with a comparable method (that of Entropy). An improvement of 10% in computational efficiency without loss in accuracy was observed. The comparison also showed that the time taken to perform the sensitivity analysis was 0.5% of that required when using the comparable Entropy based method.EPSR

    Bayesian Fused Lasso regression for dynamic binary networks

    Full text link
    We propose a multinomial logistic regression model for link prediction in a time series of directed binary networks. To account for the dynamic nature of the data we employ a dynamic model for the model parameters that is strongly connected with the fused lasso penalty. In addition to promoting sparseness, this prior allows us to explore the presence of change points in the structure of the network. We introduce fast computational algorithms for estimation and prediction using both optimization and Bayesian approaches. The performance of the model is illustrated using simulated data and data from a financial trading network in the NYMEX natural gas futures market. Supplementary material containing the trading network data set and code to implement the algorithms is available online
    • 

    corecore