3,525 research outputs found

    Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

    Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores for input responses, using a new dataset of human response scores. We show that the ADEM model's predictions correlate significantly with human judgements at both the utterance and system levels, and at a level much higher than word-overlap metrics such as BLEU. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation. Comment: ACL 201
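
The abstract's central claim is about how well an automatic metric's scores correlate with human judgements. As a minimal sketch of how such a comparison can be computed, the snippet below implements Pearson correlation in plain Python. The function name `pearson` and the toy score lists are assumptions made for illustration; they are not taken from the paper, and the abstract does not say which correlation statistic the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Toy numbers, invented for illustration only (not results from the paper).
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_scores = [1.5, 1.9, 3.2, 3.8, 5.1]
correlation = pearson(human_scores, metric_scores)
```

A value near 1 would mean the metric ranks responses much as the human annotators do; a value near 0 would mean it carries little information about human judgement.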

    Tracking Data Acquisition System (TDAS) for the 1990's. Volume 6: TDAS navigation system architecture

    One-way range and Doppler methods for providing user orbit and time determination are examined. Three alternatives are discussed: forward link beacon tracking, with on-board processing of independent navigation signals broadcast continuously by TDAS spacecraft; forward link scheduled tracking, with on-board processing of navigation data received during scheduled TDAS forward link service intervals; and return link scheduled tracking, with ground-based processing of user-generated navigation data during scheduled TDAS return link service intervals. A system-level definition and requirements assessment for each alternative, an evaluation of potential navigation performance, and a comparison with TDAS mission model requirements are included. TDAS satellite tracking is also addressed for two alternatives: BRTS and VLBI tracking.

    Neuroconductor: an R platform for medical imaging analysis

    Neuroconductor (https://neuroconductor.org) is an open-source platform for rapid testing and dissemination of reproducible computational imaging software. The goals of the project are to: (i) provide a centralized repository of R software dedicated to image analysis, (ii) disseminate software updates quickly, (iii) train a large, diverse community of scientists using detailed tutorials and short courses, (iv) increase software quality via automatic and manual quality controls, and (v) promote reproducibility of image data analysis. Based on the programming language R (https://www.r-project.org/), Neuroconductor starts with 51 inter-operable packages that cover multiple areas of imaging including visualization, data processing and storage, and statistical inference. Neuroconductor accepts new R package submissions, which are subject to a formal review and continuous automated testing. We provide a description of the purpose of Neuroconductor and the user and developer experience.

    Manipulating Attributes of Natural Scenes via Hallucination

    In this study, we explore building a two-stage framework that enables users to directly manipulate high-level attributes of a natural scene. The key to our approach is a deep generative network which can hallucinate images of a scene as if they were taken in a different season (e.g. during winter), weather condition (e.g. on a cloudy day) or time of day (e.g. at sunset). Once the scene is hallucinated with the given attributes, the corresponding look is transferred to the input image while keeping the semantic details intact, giving a photo-realistic manipulation result. Because the proposed framework hallucinates what the scene will look like, it does not require a reference style image, as is common in most appearance or style transfer approaches. Moreover, it allows a given scene to be manipulated simultaneously according to a diverse set of transient attributes within a single model, eliminating the need to train a separate network for each translation task. Our comprehensive set of qualitative and quantitative results demonstrates the effectiveness of our approach against the competing methods. Comment: Accepted for publication in ACM Transactions on Graphic

    Learning visual contexts for image annotation from Flickr groups

    GCKP84-general chemical kinetics code for gas-phase flow and batch processes including heat transfer effects

    A general chemical kinetics code is described for complex, homogeneous ideal gas reactions in any chemical system. The main features of the GCKP84 code are flexibility, convenience, and speed of computation for many different reaction conditions. The code, which replaces the previously published GCKP code, numerically solves the differential equations for complex reactions in a batch system or in one-dimensional inviscid flow. It also numerically solves the nonlinear algebraic equations describing the well-stirred reactor. A new state-of-the-art numerical integration method is used for greatly increased speed in handling systems of stiff differential equations. The theory and the computer program, including details of input preparation and a guide to using the code, are given.
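
To illustrate why stiff kinetics equations call for a specialised integrator, the sketch below integrates a single fast first-order decay, dy/dt = -k*y, with explicit Euler and with backward (implicit) Euler at the same step size. The abstract does not name the integration method used in GCKP84, so backward Euler is an assumption chosen only because it is the simplest implicit scheme; the rate constant, step size, and function names are likewise hypothetical.

```python
def explicit_euler(y0, k, h, steps):
    """Explicit Euler for dy/dt = -k*y: y_next = y + h * (-k * y)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-k * y)
    return y


def backward_euler(y0, k, h, steps):
    """Backward Euler for dy/dt = -k*y.

    The implicit update y_next = y + h * (-k * y_next) can be solved
    in closed form for this linear problem: y_next = y / (1 + k*h).
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + k * h)
    return y


# Hypothetical fast reaction: k*h = 10, far outside explicit Euler's
# stability limit (k*h < 2) for this equation.
k, h, steps = 1000.0, 0.01, 10
unstable = explicit_euler(1.0, k, h, steps)   # magnitude grows each step
stable = backward_euler(1.0, k, h, steps)     # decays toward zero
```

With these values the explicit result grows in magnitude at every step even though the true solution decays, while the implicit result stays bounded. Explicit schemes need very small steps on stiff systems for this reason, which makes implicit or otherwise stiffness-aware integrators much faster in practice.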

    Towards Engineering Reliable Keystroke Biometrics Systems

    In this thesis, we argue that most of the work in the literature on behavioural-based biometric systems using AI and machine learning is immature and unreliable. Our analysis and experimental results show that designing reliable behavioural-based biometric systems requires a systematic and complicated process. We first discuss the limitations of existing work and of conventional machine learning methods. We use the biometric zoos theory to demonstrate the challenge of designing reliable behavioural-based biometric systems. Then, we outline the common problems in engineering reliable biometric systems. In particular, we focus on the need for novelty detection machine learning models and adaptive machine learning algorithms. We provide a systematic approach to designing and building reliable behavioural-based biometric systems. In our study, we apply the proposed approach to keystroke dynamics. Keystroke dynamics is a behavioural-based biometric that identifies individuals by measuring their unique typing behaviours on physical or soft keyboards. Our study shows that it is possible to design reliable behavioural-based biometrics and address the gaps in the literature.
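
The abstract stresses novelty detection: the system sees only the genuine user's typing at enrolment and must flag anything that looks different. A minimal sketch of that idea follows, using a z-score threshold on a single timing feature. The class name, the use of key hold times in milliseconds, and the threshold of 3 standard deviations are all assumptions for illustration; the thesis's actual models are not described in the abstract.

```python
import statistics


class KeystrokeNoveltyDetector:
    """Toy one-class detector trained only on the genuine user's timings."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # allowed distance from the mean, in stdevs
        self.mean = None
        self.stdev = None

    def enroll(self, hold_times_ms):
        """Learn the genuine user's typical key hold time."""
        if len(hold_times_ms) < 2:
            raise ValueError("need at least two enrolment samples")
        self.mean = statistics.mean(hold_times_ms)
        self.stdev = statistics.stdev(hold_times_ms)
        if self.stdev == 0:
            raise ValueError("enrolment samples must vary")

    def is_novel(self, hold_time_ms):
        """Return True when a sample lies too far from the enrolled profile."""
        if self.mean is None:
            raise RuntimeError("call enroll() first")
        z_score = abs(hold_time_ms - self.mean) / self.stdev
        return z_score > self.threshold


# Hypothetical enrolment data, invented for illustration.
detector = KeystrokeNoveltyDetector()
detector.enroll([100, 105, 98, 102, 101])
```

Calling `detector.is_novel(101)` accepts a sample close to the enrolled profile, while `detector.is_novel(250)` flags one far outside it. A practical system would use many features per keystroke and update the profile over time, which is where the adaptive algorithms mentioned in the abstract come in.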

    AntispamLab - A Tool for Realistic Evaluation of Email Spam Filters

    The existing tools for testing spam filters evaluate a filter instance by simply feeding it a stream of emails, possibly also providing feedback to the filter about the correctness of the detection. In such a scenario the evaluated filter is disconnected from the network of email servers, filters, and users, which makes the approach inappropriate for testing the many filters that exploit information about spam bulkiness, users' actions, and social relations among the users. The corresponding evaluation results might be wrong, because the information normally used by the filter is missing, incomplete, or inappropriate. In this paper we present a tool for testing spam filters in a very realistic scenario. Our tool consists of a set of Python scripts for Unix/Linux environments. The tool takes as inputs the filter to be tested and an affordable set of interconnected machines (e.g., PlanetLab machines, or locally created virtual machines). When started from a central place, the tool uses the provided machines to build a network of real email servers, installs instances of the filter, deploys and runs simulated email users and spammers, and computes the detection result statistics. Email servers are implemented using Postfix, a standard Linux email server. Only per-email-server filters are currently supported; testing per-email-client filters would require additional tool development. The size of the created emailing network is constrained only by the number of available PlanetLab or virtual machines. The run time is much shorter than the simulated system time, due to a time scaling mechanism. Testing a new filter is as simple as installing one copy of it in a real emailing network, which unifies the jobs of developing, testing, and prototyping a new filter. As a usage example, we test the SpamAssassin filter.
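
The abstract mentions a time scaling mechanism that lets the run time be much shorter than the simulated system time, but gives no details. One simple way such a mechanism could work is a clock that multiplies elapsed wall-clock time by a constant factor. The sketch below is a hypothetical illustration of that idea, not AntispamLab's actual implementation; the class name `ScaledClock` and its methods are invented here.

```python
import time


class ScaledClock:
    """Maps wall-clock time to simulated time using a constant scale factor."""

    def __init__(self, scale, now=time.monotonic):
        if scale <= 0:
            raise ValueError("scale must be positive")
        self._scale = scale      # simulated seconds per real second
        self._now = now          # injectable time source, useful for testing
        self._start = now()

    def simulated_elapsed(self):
        """Simulated seconds that have passed since the clock was created."""
        return (self._now() - self._start) * self._scale

    def real_delay(self, simulated_seconds):
        """Real seconds to wait so that `simulated_seconds` appear to pass."""
        return simulated_seconds / self._scale


# With a scale of 60, one simulated hour takes one real minute.
clock = ScaledClock(scale=60)
wait_seconds = clock.real_delay(3600)
```

A simulated user that sends one email per simulated hour would then sleep for `wait_seconds` of real time between sends, so a long simulated period can be replayed in a much shorter run.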

    NASA Lewis Research Center Futuring Workshop

    On October 21 and 22, 1986, the Futures Group ran a two-day Futuring Workshop on the premises of NASA Lewis Research Center. The workshop had four main goals: to acquaint participants with the general history of technology forecasting; to familiarize participants with the range of forecasting methodologies; to acquaint participants with the range of applicability, strengths, and limitations of each method; and to offer participants some hands-on experience by working through both judgmental and quantitative case studies. Among the topics addressed during this workshop were: information sources; judgmental techniques; quantitative techniques; merger of judgment with quantitative measurement; data collection methods; and dealing with uncertainty.