
    Estimation methods for ranking recent information

    Causal Inference with Ranking Data: Application to Blame Attribution in Police Violence and Ballot Order Effects in Ranked-Choice Voting

    While rankings are at the heart of social science research, little is known about how to analyze ranking data in experimental studies. This paper introduces a potential outcomes framework to perform causal inference when outcome data are ranking data. It clarifies the structure and multi-dimensionality of ranking data, introduces causal estimands tailored to ranked outcomes, and develops methods for estimation and inference. Furthermore, it extends the framework to partially ranked data by building on principal stratification. I show that partial rankings can be treated as a selection problem and propose nonparametric sharp bounds for the treatment effects. Using these methods, I reanalyze a recent study on blame attribution in the Stephon Clark shooting, finding that people's responses to officer-involved shootings are robust to contextual information about police brutality and reform. I also apply the methods to an experimental design for quantifying ballot order effects in ranked-choice voting.
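    As one concrete illustration of a causal estimand for ranked outcomes, the sketch below estimates a simple pairwise quantity: the change, under treatment, in the probability that one item is ranked above another. The data, function name, and the specific estimand are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pairwise_rank_effect(rankings, treated, item_j, item_k):
    """Difference-in-means estimate of a pairwise ranking estimand:
    P(item_j ranked above item_k | treated) - P(same | control).

    rankings: (n, m) array, rankings[i, item] = rank of `item` for unit i (1 = best).
    treated:  boolean array of length n.
    """
    rankings = np.asarray(rankings)
    treated = np.asarray(treated, dtype=bool)
    above = (rankings[:, item_j] < rankings[:, item_k]).astype(float)
    return above[treated].mean() - above[~treated].mean()

# Toy example: 4 units rank 3 items; the first two units are treated.
rankings = np.array([[1, 2, 3],
                     [1, 3, 2],
                     [2, 1, 3],
                     [3, 1, 2]])
treated = np.array([True, True, False, False])
print(pairwise_rank_effect(rankings, treated, item_j=0, item_k=1))  # 1.0 - 0.0 = 1.0
```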

    Diagnostic Evaluation of Policy-Gradient-Based Ranking

    Learning-to-rank has been intensively studied and has shown significant value in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, most studies armed with these techniques focus on showing how effective a new method is; comprehensive comparisons between techniques and in-depth analyses of their deficiencies are often overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Using widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLR-WEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether reinforcement learning or adversarial learning is used, the failures are largely attributable to gradient estimation based on sampled rankings, which diverge significantly from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the more severely policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradients.
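    To make the policy-gradient setup concrete, the following is a minimal sketch of REINFORCE-style optimization of a Plackett-Luce ranking policy, where the gradient is estimated from sampled rankings scored by NDCG. It is a toy illustration of the general recipe the paper analyzes, with made-up labels and hyperparameters, not a reproduction of any method or benchmark result from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ranking(scores):
    """Sample a ranking from a Plackett-Luce policy defined by `scores`."""
    remaining = list(range(len(scores)))
    ranking = []
    for _ in range(len(scores)):
        logits = scores[remaining]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        pick = rng.choice(len(remaining), p=p)
        ranking.append(remaining.pop(pick))
    return ranking

def dcg(labels_in_rank_order):
    gains = 2.0 ** np.asarray(labels_in_rank_order) - 1
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    return float((gains * discounts).sum())

def log_prob_grad(scores, ranking):
    """Gradient of the log Plackett-Luce probability of `ranking` w.r.t. scores."""
    grad = np.zeros_like(scores)
    remaining = list(ranking)
    for doc in ranking:
        logits = scores[remaining]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        grad[doc] += 1.0
        grad[remaining] -= p
        remaining.remove(doc)
    return grad

# Toy query: 5 documents with graded relevance labels; `scores` play the model's role.
labels = np.array([3.0, 0.0, 1.0, 0.0, 2.0])
scores = np.zeros(5)
ideal = dcg(np.sort(labels)[::-1])

baseline = 0.0
for step in range(500):
    ranking = sample_ranking(scores)
    reward = dcg(labels[ranking]) / ideal                       # NDCG of the sampled ranking
    scores += 0.1 * (reward - baseline) * log_prob_grad(scores, ranking)  # REINFORCE update
    baseline = 0.9 * baseline + 0.1 * reward                    # running-mean baseline

print(np.argsort(-scores))  # high-relevance docs should drift toward the front
```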

    A Unified and Optimal Multiple Testing Framework based on rho-values

    Multiple testing is an important research direction that has gained major attention in recent years. Currently, most multiple testing procedures are designed with p-values or local false discovery rate (Lfdr) statistics. However, p-values obtained by applying the probability integral transform to some well-known test statistics often do not incorporate information from the alternatives, resulting in suboptimal procedures. On the other hand, Lfdr-based procedures can be asymptotically optimal, but their guarantee of false discovery rate (FDR) control relies on consistent estimation of the Lfdr, which is often difficult in practice, especially when incorporating side information is desirable. In this article, we propose a novel and flexibly constructed class of statistics, called rho-values, which combines the merits of both p-values and Lfdr statistics while offering advantages over methods based on either type alone. Specifically, it unifies the two frameworks and operates in two steps: ranking and thresholding. The ranking produced by rho-values mimics that produced by Lfdr statistics, and the strategy for choosing the threshold is similar to that of p-value-based procedures. Therefore, the proposed framework guarantees FDR control under weak assumptions; it maintains the integrity of the structural information encoded by the summary statistics and the auxiliary covariates and hence can be asymptotically optimal. We demonstrate the efficacy of the new framework through extensive simulations and two data applications.
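    The two-step recipe described above can be pictured with a generic rank-then-threshold procedure: rank hypotheses by a statistic and reject the largest prefix whose estimated FDR stays at or below the target level. The sketch below uses the cumulative mean of Lfdr-like statistics as the FDR estimate purely for illustration; it is not the rho-value construction from the article, and the toy statistics are made up.

```python
import numpy as np

def rank_and_threshold(stat, alpha=0.05):
    """Generic rank-then-threshold multiple testing.

    Rank hypotheses by `stat` (smaller = stronger evidence) and reject the
    largest prefix whose running mean of ranked statistics, used here as an
    Lfdr-style FDR estimate, stays at or below `alpha`.
    Returns a boolean rejection mask.
    """
    stat = np.asarray(stat, dtype=float)
    order = np.argsort(stat)
    running_mean = np.cumsum(stat[order]) / np.arange(1, len(stat) + 1)
    below = np.where(running_mean <= alpha)[0]
    k = below.max() + 1 if below.size else 0
    reject = np.zeros(len(stat), dtype=bool)
    reject[order[:k]] = True
    return reject

# Toy example: 8 hypothetical statistics on the [0, 1] scale.
stats = np.array([0.001, 0.02, 0.9, 0.03, 0.7, 0.004, 0.5, 0.08])
print(rank_and_threshold(stats, alpha=0.05))
```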

    Geospatial and statistical methods to model intracity truck crashes

    In recent years, there has been a renewed interest in statistical ranking criteria to identify hot spots on road networks. These criteria potentially represent high crash risk zones for further engineering evaluation and safety improvement. Many studies have also focused on the development of crash estimation models to quantify the safety effects of geometric, traffic, and environmental factors on the expected number of total, fatal, injury, and/or property damage crashes at specific locations. However, freight safety, and truck safety in particular, has received little attention. Trucks and long-combination vehicles (LCVs), which carry approximately 70% of freight, have significant potential to trigger crashes on roads, often severe ones. Truck transportation is therefore attracting increasing attention due to its effect on safety and operational performance as well as rapid industrial growth. Most past research on truck safety focused on intercity or Interstate truck trips; intracity truck safety has rarely been studied. The major research objectives of this dissertation are: 1) to develop a geospatial method to identify high truck crash zones, 2) to evaluate the use of different ranking methods for prioritization and allocation of resources, 3) to investigate the relations between intracity truck crash occurrences and various predictor variables (on- and off-network characteristics) to provide greater insight into crash occurrence and effective countermeasures, and 4) to develop truck crash prediction models. The prioritization of high truck crash zones was performed by identifying truck crash hot spots and ranking them based on several parameters. Geospatial and statistical methods were deployed to understand the relationships between truck crashes and geometric road conditions, land use, demographic, and socio-economic characteristics. Truck crash estimation models were then developed using selected on- and off-network characteristics data, and several goodness-of-fit statistics were computed to assess the suitability of these models. The geospatial methods and the development of truck crash estimation models are illustrated using data for the city of Charlotte, North Carolina, for the year 2008. It was found that on- and off-network, socio-economic, and demographic characteristics within 0.5-mile proximity have a strong influence on truck crash occurrence. The findings are expected to provide information and methods for identifying truck crash zones and for estimating the likelihood of truck crashes arising from intracity trips in relation to a region's on- and off-network characteristics. Furthermore, this research is expected to aid significantly in selecting meaningful countermeasures to improve the safety of road users.
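    Crash counts of the kind described above are commonly modeled with negative binomial regression. The sketch below fits such a model with statsmodels on a small synthetic zone-level dataset and ranks zones by predicted crash frequency; the predictor names, coefficients, and data are hypothetical and only illustrate a typical modeling and prioritization workflow, not the dissertation's actual models or results.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical zone-level data: truck crash counts plus on- and off-network
# predictors (column names and values are illustrative, not from the dissertation).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "truck_aadt": rng.lognormal(7, 0.5, n),          # truck traffic volume
    "pct_industrial": rng.uniform(0, 60, n),         # land-use share (%)
    "intersection_density": rng.uniform(1, 20, n),   # intersections per square mile
})
mu = np.exp(-6 + 0.0006 * df.truck_aadt + 0.02 * df.pct_industrial
            + 0.05 * df.intersection_density)
df["crashes"] = rng.negative_binomial(n=2, p=2 / (2 + mu))

X = sm.add_constant(df[["truck_aadt", "pct_industrial", "intersection_density"]])
fit = sm.GLM(df["crashes"], X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(fit.summary())                                  # coefficients and fit diagnostics
print("AIC:", fit.aic, "Deviance:", fit.deviance)     # goodness-of-fit statistics

# Rank zones by model-predicted crash frequency as one simple prioritization.
df["predicted"] = fit.predict(X)
print(df.sort_values("predicted", ascending=False).head())
```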

    RankME: Reliable Human Ratings for Natural Language Generation

    Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments. We show that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods. In addition, we show that it is possible to evaluate NLG systems according to multiple, distinct criteria, which is important for error analysis. Finally, we demonstrate that RankME, in combination with Bayesian estimation of system quality, is a cost-effective alternative for ranking multiple NLG systems. Comment: Accepted to NAACL 2018 (The 2018 Conference of the North American Chapter of the Association for Computational Linguistics).
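    As a rough illustration of combining ratings with Bayesian estimation of system quality, the sketch below computes a conjugate-normal posterior over each system's quality from magnitude-estimation ratings and ranks systems by posterior mean. The model, rating scale, prior values, and data are simplified assumptions, not the paper's estimation procedure.

```python
import numpy as np

def posterior_quality(ratings, prior_mean=50.0, prior_var=100.0, noise_var=225.0):
    """Posterior over a system's quality under a conjugate normal model:
    ratings ~ N(quality, noise_var), quality ~ N(prior_mean, prior_var).
    Returns the posterior mean and standard deviation."""
    ratings = np.asarray(ratings, dtype=float)
    n = len(ratings)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + ratings.sum() / noise_var)
    return post_mean, np.sqrt(post_var)

# Hypothetical magnitude-estimation ratings (0-100 scale) for three NLG systems.
systems = {
    "sys_A": [72, 80, 65, 78, 70],
    "sys_B": [55, 60, 58, 52, 61],
    "sys_C": [75, 68, 82, 77, 73],
}
posteriors = {name: posterior_quality(r) for name, r in systems.items()}
for name, (mean, sd) in sorted(posteriors.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: posterior mean {mean:.1f} +/- {sd:.1f}")
```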

    Modeling Temporal Evidence from External Collections

    Newsworthy events are broadcast through multiple mediums and prompt crowds to produce comments on social media. In this paper, we propose to leverage these behavioral dynamics to estimate the most relevant time periods for an event (i.e., query). Recent advances have shown how to improve the estimation of the temporal relevance of such topics. Our approach builds on two major novelties. First, we mine temporal evidence from hundreds of external sources into topic-based external collections to improve the robustness of detecting relevant time periods. Second, we propose a formal retrieval model that generalizes the use of the temporal dimension across different aspects of the retrieval process. In particular, we show that temporal evidence from external collections can be used to (i) infer a topic's temporal relevance, (ii) select query expansion terms, and (iii) re-rank the final results for improved precision. Experiments with TREC Microblog collections show that the proposed time-aware retrieval model makes effective and extensive use of the temporal dimension to improve search results over the most recent temporal models. Interestingly, we observe a strong correlation between precision and the temporal distribution of retrieved and relevant documents. Comment: To appear in WSDM 201
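    One simple way to picture the use of external temporal evidence is a re-ranking step that combines each document's retrieval score with a temporal prior estimated from the timestamps of externally collected documents on the topic. The sketch below is a toy version of that idea with hypothetical data, smoothing, and mixing weights; it is not the paper's retrieval model.

```python
import math
from collections import Counter

def temporal_prior(external_timestamps, bin_days=1):
    """Estimate P(time | topic) from timestamps (in days) of documents about the
    topic gathered from external collections; a simple add-one-smoothed histogram."""
    counts = Counter(int(t // bin_days) for t in external_timestamps)
    total = sum(counts.values())
    n_bins = len(counts)
    return lambda t: (counts.get(int(t // bin_days), 0) + 1) / (total + n_bins + 1)

def rerank(results, prior, weight=0.5):
    """Combine the original retrieval score with the log temporal prior.
    `results` is a list of (doc_id, score, timestamp_in_days) tuples."""
    rescored = [(doc, (1 - weight) * score + weight * math.log(prior(ts)), ts)
                for doc, score, ts in results]
    return sorted(rescored, key=lambda r: -r[1])

# Hypothetical data: the external evidence peaks around day 120.
external = [118, 119, 120, 120, 121, 121, 122, 150]
results = [("d1", 2.1, 45), ("d2", 1.9, 120), ("d3", 2.0, 121)]
for doc, score, ts in rerank(results, temporal_prior(external)):
    print(doc, round(score, 3), ts)
```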

    Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks

    How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks, including question answering and semantic search. In this paper, we present GENI, a method for estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs, or lack the flexibility needed to model the complex relationship between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges involved in predicting node importance in KGs. Our method performs an aggregation of importance scores, instead of aggregating node embeddings, via a predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art. Comment: KDD 2019 Research Track. 11 pages. Changelog: Type 3 font removed, and minor updates made in the Appendix (v2).
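    The core idea of aggregating importance scores with predicate-aware attention can be pictured in a few lines: each node's score is updated with an attention-weighted average of its neighbors' scores, where the attention depends on the edge predicate. The code below is a simplified, non-learned illustration of that aggregation step with a made-up toy graph and fixed predicate weights; it is not GENI's actual architecture.

```python
import numpy as np

def aggregate_scores(scores, edges, predicate_weight):
    """One simplified score-aggregation step: each node's importance score is
    mixed with an attention-weighted average of its in-neighbors' scores, with
    attention determined by the edge predicate.

    edges: list of (src, dst, predicate) triples.
    predicate_weight: maps a predicate to a scalar (fixed here for illustration).
    """
    new_scores = np.array(scores, dtype=float)
    for node in range(len(scores)):
        incoming = [(src, pred) for src, dst, pred in edges if dst == node]
        if not incoming:
            continue
        logits = np.array([predicate_weight[pred] for _, pred in incoming])
        attn = np.exp(logits - logits.max())
        attn /= attn.sum()
        neighbor_scores = np.array([scores[src] for src, _ in incoming])
        new_scores[node] = 0.5 * scores[node] + 0.5 * float(attn @ neighbor_scores)
    return new_scores

# Toy KG: 4 nodes, two predicates; initial scores stand in for an upstream estimate.
edges = [(0, 1, "directed_by"), (2, 1, "acted_in"), (3, 1, "acted_in"), (1, 0, "directed_by")]
scores = np.array([5.0, 1.0, 3.0, 2.0])
predicate_weight = {"directed_by": 1.0, "acted_in": 0.2}
print(aggregate_scores(scores, edges, predicate_weight))
```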