Cross-State Substitution: Estimating the Effect of the 2003 Illinois Gaming Tax Restructuring on Indiana Riverboat Gaming Volume in the Chicagoland Region

Author: Ahlgren Mike
Singh Dipendra
Publication venue: ScholarWorks@UMass Amherst
Publication date: 07/01/2011

This paper analyzes the effect of the 2003 Illinois gaming tax increase on Indiana riverboat gaming demand. The four riverboats located in Indiana’s northwest corner are examined. Slot machine coin-in from January 2000 to December 2006 is chosen to represent gaming demand. Multiple regression analysis is used to model the tax increase and to account for seasonality in the data. The findings reveal that a segment of Indiana riverboat operators experienced an increase in gaming demand when the tax increase took effect. The findings suggest that legislators should acknowledge and evaluate the negative economic pressures that tax increases place on their own state’s commercial gaming operators and recognize the benefits that such increases bring to the gaming industry in competing states.
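
The regression design described in this abstract (a tax-change indicator plus seasonal controls) can be illustrated with a small sketch. Everything below is an assumption for illustration only: the data are synthetic and noise-free, the month-42 start of the tax regime, the effect size, and the seasonal pattern are made up, and the helper names (`solve`, `ols`, `design_row`) are hypothetical. It is not the authors' model or data.

```python
# Sketch: OLS with a tax-regime dummy and 11 month-of-year dummies,
# fitted on SYNTHETIC monthly data (84 months, matching Jan 2000 - Dec 2006).

def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def design_row(t, tax_start):
    """Intercept, tax dummy, and month dummies (January is the baseline)."""
    month = t % 12
    tax = 1.0 if t >= tax_start else 0.0
    return [1.0, tax] + [1.0 if month == m else 0.0 for m in range(1, 12)]

N_MONTHS = 84          # January 2000 to December 2006
TAX_START = 42         # ASSUMED index of the first month under the new tax
TRUE_EFFECT = 8.0      # made-up effect used to generate the synthetic series
SEASONAL = [0, -3, 2, 1, 4, 6, 8, 7, 3, 1, -1, 5]  # made-up monthly pattern

X = [design_row(t, TAX_START) for t in range(N_MONTHS)]
y = [100.0 + TRUE_EFFECT * row[1] + SEASONAL[t % 12] for t, row in enumerate(X)]

beta = ols(X, y)
print("estimated tax effect:", round(beta[1], 6))
```

Because the synthetic series contains no noise, the coefficient on the tax dummy recovers the made-up effect; with real coin-in data the estimate would carry sampling error and would need the usual diagnostics.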

ScholarWorks@UMass Amherst

Book Reviews

Author: Hinchy Mike
Moran Dominic
Perry Neil
Simmons Phil
Singh Satbir

Teaching/Communication/Extension/Profession

Research Papers in Economics

Interpretation of Natural Language Rules in Conversational Machine Reading

Author: Bartolo Max
Bouchard Guillaume
Lewis Patrick
Riedel Sebastian
Rocktäschel Tim
Saeidi Marzieh
Sheldon Mike
Singh Sameer
Publication date: 01/01/2018

Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader's background knowledge. One example is the task of interpreting regulations to answer "Can I...?" or "Do I have to...?" questions such as "I am working in Canada. Do I have to carry on paying UK National Insurance?" after reading a UK government website about this topic. This task requires both the interpretation of rules and the application of background knowledge. It is further complicated due to the fact that, in practice, most questions are underspecified, and a human assistant will regularly have to ask clarification questions such as "How long have you been working abroad?" when the answer cannot be directly derived from the question and text. In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 32k task instances based on real-world rules and crowd-generated questions and scenarios. We analyse the challenges of this task and assess its difficulty by evaluating the performance of rule-based and machine-learning baselines. We observe promising results when no background knowledge is necessary, and substantial room for improvement whenever background knowledge is needed. Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

UCL Discovery

Synote: development of a Web-based tool for synchronized annotations

Author: Gilbert Lester
Kajaba Jiri
Khoja Shakeel
Li Yunjia
Millard David
Singh Priyanka
Wald Mike
Wills Gary
Publication date: 01/12/2011

This paper discusses the development of a Web-based media annotation application named Synote, which addresses the important issue that while the whole of a multimedia resource on the Web can be easily bookmarked, searched, linked to and tagged, it is still difficult to search or associate notes or other resources with a certain part of a resource. Synote supports the creation of synchronized notes, bookmarks, tags, links, images and text captions. It is a freely available application that enables any user to make annotations in and search annotations to any fragment of a continuous multimedia resource in the most used browsers and operating systems. In the implementation, Synote categorizes different media resources and synchronizes them via a timeline. The presentation of synchronized resources makes full use of Web 2.0 AJAX technology to enrich interoperability for the user experience. Positive evaluation results about the performance, efficiency and effectiveness of Synote were returned when using it with students and teachers for a number of undergraduate courses.

Southampton (e-Prints Soton)

Crossref

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Author: Cohan Arman
D'Arcy Mike
Downey Doug
Feldman Sergey
Singh Amanpreet
Publication date: 23/11/2022

Learned representations of scientific documents can serve as valuable input features for downstream tasks, without the need for further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks. In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations. It includes 25 challenging and realistic tasks, 11 of which are new, across four formats: classification, regression, ranking and search. We then use the benchmark to study and improve the generalization ability of scientific document representation models. We show how state-of-the-art models struggle to generalize across task formats, and that simple multi-task training fails to improve them. However, a new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance. We experiment with task-format-specific control codes and adapters in a multi-task setting and find that they outperform the existing single-embedding state-of-the-art by up to 1.5 points absolute. Comment: 21 pages, 2 figures, 9 tables. For associated code, see https://github.com/allenai/scirepeva

arXiv.org e-Print Archive

Two-step hyperparameter optimization method: Accelerating hyperparameter search by using a fraction of a training dataset

Author: Ma Po-Lun
Pritchard Mike
Silva Sam
Singh Balwinder
Yu Sungduk
Publication date: 07/09/2023

Hyperparameter optimization (HPO) is an important step in machine learning (ML) model development, but common practices are archaic, relying primarily on manual or grid searches. This is partly because adopting advanced HPO algorithms introduces added complexity to the workflow, leading to longer computation times. This poses a notable challenge to ML applications, as suboptimal hyperparameter selections curtail the potential of ML model performance, ultimately obstructing the full exploitation of ML techniques. In this article, we present a two-step HPO method as a strategic solution to curbing computational demands and wait times, gleaned from practical experiences in applied ML parameterization work. The initial phase involves a preliminary evaluation of hyperparameters on a small subset of the training dataset, followed by a re-evaluation of the top-performing candidate models after retraining them on the entire training dataset. This two-step HPO method is universally applicable across HPO search algorithms, and we argue that it offers attractive efficiency gains. As a case study, we present our recent application of the two-step HPO method to the development of neural network emulators for aerosol activation. Although our primary use case is a data-rich limit with many millions of samples, we also find that using up to 0.0025% of the data (a few thousand samples) in the initial step is sufficient to find optimal hyperparameter configurations from much more extensive sampling, achieving up to 135-times speedup. The benefits of this method materialize through an assessment of hyperparameters and model performance, revealing the minimal model complexity required to achieve the best performance. The assortment of top-performing models harvested from the HPO process allows us to choose a high-performing model with a low inference cost for efficient use in global climate models (GCMs).
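
The two-step procedure summarized in this abstract (screen candidates on a small subset, then retrain only the best few on the full training set) might look roughly like the sketch below. The model is a toy one-dimensional ridge regression chosen so the example runs with the standard library only; the data are synthetic, and `two_step_hpo`, the 5% subset fraction, and `top_k=2` are all illustrative assumptions rather than the authors' code or settings.

```python
import random

def fit_ridge_1d(train, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)

def val_mse(w, val):
    """Mean squared error of the fitted slope on held-out data."""
    return sum((y - w * x) ** 2 for x, y in val) / len(val)

def two_step_hpo(train, val, candidates, subset_fraction, top_k, seed=0):
    """Step 1: rank all candidates using a small training subset.
    Step 2: retrain only the top_k candidates on the full training set."""
    rng = random.Random(seed)
    n_small = max(2, int(len(train) * subset_fraction))
    small = rng.sample(train, n_small)

    # Step 1: cheap screening of every hyperparameter value on the subset.
    ranked = sorted(candidates, key=lambda lam: val_mse(fit_ridge_1d(small, lam), val))
    shortlist = ranked[:top_k]

    # Step 2: full-data retraining, restricted to the shortlisted candidates.
    full_scores = {lam: val_mse(fit_ridge_1d(train, lam), val) for lam in shortlist}
    best = min(full_scores, key=full_scores.get)
    return best, full_scores

# Synthetic data: y = 3x + noise (made up for this example).
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
data = [(x, 3.0 * x + rng.gauss(0.0, 0.5)) for x in xs]
train, val = data[:1500], data[1500:]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, full_scores = two_step_hpo(train, val, candidates, subset_fraction=0.05, top_k=2)
print("best lambda:", best_lam, "full-data validation MSE:", full_scores)
```

The saving comes from step 2 touching only `top_k` candidates; the screening step here uses a plain sort over a fixed grid, but any search algorithm could produce the candidate list.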

arXiv.org e-Print Archive

Questions Are All You Need to Train a Dense Passage Retriever

Author: Lewis Mike
Pineau Joelle
Sachan Devendra Singh
Yogatama Dani
Zaheer Manzil
Zettlemoyer Luke
Publication date: 01/01/2023

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses. Comment: Accepted to TACL, pre MIT Press publication versio
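
The two stages named in this abstract, retrieving documents for a question and then scoring how well each document reconstructs the question, can be sketched with toy numbers. This is only an illustration under stated assumptions: the vectors are made up, `recon_log_likelihood` is a hypothetical stand-in for a pre-trained language model's log p(question | document), and turning the two score lists into a KL-divergence training signal is one plausible choice assumed here, not something the abstract itself specifies.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(q_vec, doc_vecs, k):
    """Stage 1: rank documents by dot-product similarity to the question."""
    scored = sorted(((dot(q_vec, d), i) for i, d in enumerate(doc_vecs)), reverse=True)
    top = scored[:k]
    return [i for _, i in top], [s for s, _ in top]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same items."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def reconstruction_loss(q_vec, doc_vecs, recon_log_likelihood, k):
    """Stage 2: compare the retriever's distribution over the retrieved
    documents with a distribution derived from question-reconstruction scores."""
    ids, scores = retrieve_top_k(q_vec, doc_vecs, k)
    retriever_dist = softmax(scores)
    target_dist = softmax([recon_log_likelihood(i) for i in ids])
    return kl_divergence(target_dist, retriever_dist)

# Toy embeddings and made-up reconstruction log-likelihoods (all hypothetical).
question = [1.0, 0.2]
documents = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
fake_log_likelihoods = [-1.0, -3.0, -0.5, -6.0]

loss = reconstruction_loss(question, documents, lambda i: fake_log_likelihoods[i], k=3)
print("loss:", loss)
```

In a real system the encoders would be updated to reduce this loss; the sketch only computes the quantity once for fixed vectors.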

arXiv.org e-Print Archive

Directory of Open Access Journals

Synthetic Dataset Generation for Adversarial Machine Learning Research

Author: Busho Colin
Cornelius Cory
Liu Xiruo
Martin Jason
Paul Anindya
Singh Shibani
Tan Mike
Publication date: 21/07/2022

Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical systems, we propose approximating the real world through simulation. In this paper we describe our synthetic dataset generation tool that enables scalable collection of such a synthetic dataset with realistic adversarial examples. We use the CARLA simulator to collect such a dataset and demonstrate simulated attacks that undergo the same environmental transforms and processing as real-world images. Our tools have been used to collect datasets to help evaluate the efficacy of adversarial examples, and can be found at https://github.com/carla-simulator/carla/pull/4992.

arXiv.org e-Print Archive