
    Footballonomics: The Anatomy of American Football; Evidence from 7 years of NFL game data

    Do NFL teams make rational decisions? What factors affect the probability of winning a game in the NFL? How can a team come back from a demoralizing interception? In this study we begin by examining the hypothesis of rational coaching, that is, that coaching decisions are always rational with respect to maximizing the expected points scored. We reject this hypothesis by analyzing the decisions made over the past 7 NFL seasons for two particular plays: (i) the Point(s) After Touchdown (PAT) and (ii) fourth-down decisions. Having rejected the rational coaching hypothesis, we move on to examine how the detailed game data collected can inform game-day decisions. While NFL team personnel certainly have intuitions about which factors are crucial for winning a game, in this work we take a data-driven approach and provide quantifiable evidence using a large dataset of NFL games for the 7-year period between 2009 and 2015. In particular, we use a logistic regression model to identify the impact and the corresponding statistical significance of factors such as possession time, number of penalty yards, and the balance between passing and rushing offense. Our results clearly imply that avoiding turnovers is the best strategy for winning a game, but turnovers can be overcome by keeping the offense on the field for more time. Finally, we combine our descriptive model with a statistical bootstrap to provide a prediction engine for upcoming NFL games. Our evaluations indicate that even by considering only a small number of (straightforward) factors, we can achieve very good prediction accuracy. In particular, the average accuracy during the 2014 and 2015 seasons is approximately 63%.
This performance is comparable to more complicated state-of-the-art prediction systems, while it outperforms expert analysts 60% of the time.
    Comment: Working study. The paper has been presented at the Machine Learning and Data Mining for Sports Analytics 2016 workshop and accepted at PLOS ONE.
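The modeling step can be sketched in a few lines. This is an illustrative toy, not the paper's fitted model: the synthetic data, coefficients, and feature names (turnover margin, possession-time edge) are invented here to mirror the factors the abstract mentions.

```python
# Hedged sketch: logistic regression via plain gradient descent on synthetic
# game summaries. Only two of the paper's factors are mimicked here.
import math, random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic games: the win probability is driven by turnover margin and
# possession-time edge (assumed generating coefficients, for illustration).
games = []
for _ in range(2000):
    turnovers = random.gauss(0, 1.5)    # turnover margin (ours minus theirs)
    possession = random.gauss(0, 5.0)   # possession-time edge, in minutes
    p_win = sigmoid(1.2 * turnovers + 0.15 * possession)
    games.append(((turnovers, possession), 1 if random.random() < p_win else 0))

# Full-batch gradient descent on the log-loss.
w, b, lr = [0.0, 0.0], 0.0, 0.05
for _ in range(300):
    gw, gb = [0.0, 0.0], 0.0
    for (x, y) in games:
        err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    n = len(games)
    w[0] -= lr * gw[0] / n
    w[1] -= lr * gw[1] / n
    b -= lr * gb / n

# Both learned coefficients come out positive: avoiding turnovers and
# holding the ball longer both raise the modeled win probability.
print(w)
```

The fitted coefficients (and their significance, which the paper assesses) quantify exactly the kind of "which factors matter" question the abstract poses.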

    SamBaTen: Sampling-based Batch Incremental Tensor Decomposition

    Tensor decompositions are invaluable tools for analyzing multimodal datasets. In many real-world scenarios such datasets are far from static; on the contrary, they tend to grow over time. For instance, in an online social network setting, as we observe new interactions over time, our dataset gets updated in its "time" mode. How can we maintain a valid and accurate tensor decomposition of such a dynamically evolving multimodal dataset, without having to re-compute the entire decomposition after every single update? In this paper we introduce SamBaTen, a Sampling-based Batch Incremental Tensor Decomposition algorithm, which incrementally maintains the decomposition given new updates to the tensor dataset. SamBaTen is able to scale to datasets that the state of the art in incremental tensor decomposition is unable to operate on, thanks to its ability to effectively summarize the existing tensor and the incoming updates, and to perform all computations in the reduced summary space. We extensively evaluate SamBaTen using synthetic and real datasets. Indicatively, SamBaTen achieves accuracy comparable to state-of-the-art incremental and non-incremental techniques, while being 25-30 times faster. Furthermore, SamBaTen scales to very large, sparse and dense, dynamically evolving tensors of dimensions up to 100K x 100K x 100K, where state-of-the-art incremental approaches were not able to operate.
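The incremental setting can be made concrete with a minimal NumPy sketch. This is not SamBaTen itself — the sampling and summary-space machinery that gives the speedup is omitted — just the baseline incremental CP update it accelerates: when a new frontal slice arrives along the growing "time" mode, only one least-squares solve for a new row of the time factor is needed, not a full re-decomposition.

```python
# Hedged sketch: incremental CP update for a third-order tensor whose
# third ("time") mode grows. A and B are the fixed non-temporal factors.
import numpy as np

rng = np.random.default_rng(0)
I, J, R = 20, 15, 3

# Existing decomposition: factors A (I x R), B (J x R), C_old (10 x R).
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C_old = rng.standard_normal((10, R))

# Khatri-Rao product B (.) A: row (j*I + i) holds B[j] * A[i] elementwise,
# matching the column-major vectorisation of an I x J slice.
KR = np.einsum('jr,ir->jir', B, A).reshape(J * I, R)

# A new slice arrives, generated here from a known time-factor row c_true.
c_true = rng.standard_normal(R)
X_new = (A * c_true) @ B.T          # X_new[i,j] = sum_r A[i,r] c[r] B[j,r]

# Incremental step: solve KR @ c = vec(X_new) for the new row of C.
c_est, *_ = np.linalg.lstsq(KR, X_new.flatten(order='F'), rcond=None)
C = np.vstack([C_old, c_est])

print(np.allclose(c_est, c_true))   # noiseless case: recovered exactly
```

SamBaTen's contribution is making this kind of update tractable at 100K-scale by performing it in a sampled summary space rather than on the full Khatri-Rao system above.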

    Ensemble Node Embeddings using Tensor Decomposition: A Case-Study on DeepWalk

    Node embeddings have attracted increasing attention in recent years. In this context, we propose a new ensemble node embedding approach, called TenSemble2Vec: we first generate multiple embeddings using existing techniques, and then take them as multiview input to the state-of-the-art tensor decomposition model PARAFAC2 to learn shared lower-dimensional representations of the nodes. Contrary to other embedding methods, TenSemble2Vec takes advantage of the complementary information from different methods, or from the same method with different hyper-parameters, which bypasses the challenge of choosing among models. Extensive tests using real-world data validate the efficiency of the proposed method.
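The data layout of the ensemble step can be sketched as follows. The paper's actual reduction is PARAFAC2, which natively handles views of differing dimensionality; the concatenate-then-SVD step below is a deliberate simplification substituted purely to illustrate "many views in, one shared node representation out". The view sizes and dimensions are invented.

```python
# Hedged sketch of the ensemble-embedding pipeline shape. NOT PARAFAC2:
# a plain SVD stands in for the shared-factor extraction, for illustration.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, shared_dim = 30, 4

# Pretend these came from DeepWalk runs with different hyper-parameters:
# each view is a (nodes x d_k) embedding matrix, and d_k may differ per view
# (the varying-dimension case is exactly what PARAFAC2 accommodates).
views = [rng.standard_normal((n_nodes, d)) for d in (8, 16, 12)]

# Concatenate views along the feature axis and keep the top singular
# directions as a shared lower-dimensional node representation.
stacked = np.hstack(views)                     # (nodes, sum of d_k)
U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
shared = U[:, :shared_dim] * S[:shared_dim]    # (nodes, shared_dim)

print(shared.shape)
```

The point of the ensemble is that each view contributes complementary signal, so the shared representation can be more robust than any single embedding run.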

    REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums

    How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offerings of malicious services and products, (c) hacking information for performing malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges of this domain: (a) the difficulty of specifying the "topics" of interest efficiently, and (b) the unstructured and informal nature of the text. We propose REST, a systematic methodology to: (a) identify threads of interest based on a, possibly incomplete, bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads, and classes into appropriate embedding spaces and establish relevance and similarity there. We evaluate our method on real data from three security forums with a total of 164K posts and 21K threads. First, REST is robust to the initial keyword selection: it can extend the user-provided keyword set and thus recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods, exhibiting an accuracy between 63.3% and 76.9%. We see our approach as a first step toward harnessing the wealth of information in online forums in a user-friendly way, since the user can loosely specify her keywords of interest.
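The "project, then compare" step at the heart of the approach can be sketched with toy data. Everything below is invented for illustration — REST learns its word, thread, and class representations from the forum data, whereas here a few hand-made 3-dimensional vectors stand in — but the mechanism is the same: a thread is a weighted average of its word vectors, and it is assigned to the nearest class vector by cosine similarity.

```python
# Hedged sketch: weighted-average thread embedding + cosine-similarity
# classification. Toy hand-made vectors, not REST's learned embeddings.
import math

# Toy word vectors (3-d for readability).
word_vec = {
    "ddos":     [0.9, 0.1, 0.0],
    "attack":   [0.8, 0.2, 0.1],
    "selling":  [0.1, 0.9, 0.0],
    "exploit":  [0.2, 0.8, 0.3],
    "tutorial": [0.1, 0.2, 0.9],
}

# Toy class vectors for (a) alerts, (b) services for sale, (c) hacking how-tos.
class_vec = {
    "alert":    [1.0, 0.0, 0.0],
    "for_sale": [0.0, 1.0, 0.0],
    "hacking":  [0.0, 0.0, 1.0],
}

def embed(words, weights=None):
    """Weighted average of word vectors (uniform weights by default)."""
    weights = weights or {w: 1.0 for w in words}
    total = sum(weights[w] for w in words)
    out = [0.0, 0.0, 0.0]
    for w in words:
        for k in range(3):
            out[k] += weights[w] * word_vec[w][k] / total
    return out

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def classify(words):
    t = embed(words)
    return max(class_vec, key=lambda c: cosine(t, class_vec[c]))

print(classify(["ddos", "attack"]))        # -> alert
print(classify(["selling", "exploit"]))    # -> for_sale
```

Because the thread representation averages over all of its words, a partially specified or noisy keyword set still lands near the right class, which is the robustness property the evaluation highlights.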