64,925 research outputs found

    A Shared Task on Bandit Learning for Machine Translation

    We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On each of a sequence of rounds, a machine translation system is required to propose a translation for an input, and receives a real-valued estimate of the quality of the proposed translation for learning. This paper describes the shared task's learning and evaluation setup, using services hosted on Amazon Web Services (AWS), the data and evaluation metrics, and the results of various machine translation architectures and learning protocols. Comment: Conference on Machine Translation (WMT) 201

    The STEP project: societal and political engagement of young people in environmental issues

    Decisions on environmental topics taken today are going to have long-term consequences that will affect future generations. Young people will have to live with the consequences of these decisions and undertake special responsibilities. Moreover, as tomorrow's decision makers, they themselves should learn how to negotiate and debate issues before final decisions are made. Therefore, any participation they can have in environmental decision-making processes will prove essential in developing a sustainable future for the community. However, recent data indicate that the young distance themselves from community affairs, mainly because the procedures involved are 'wooden', politicians' discourse alienates the young, and the whole experience is too formalized for them. Authorities are aware of this fact and try to establish communication channels to ensure transparency and use a language that speaks to new generations of citizens. This is where the STEP project comes in. STEP (www.step4youth.eu) is a digital platform (web/mobile) enabling youth Societal and Political e-Participation in decision-making procedures concerning environmental issues. STEP is enhanced with web/social media mining, gamification, machine translation, and visualisation features. Six pilots in real contexts are being organised for the deployment of the STEP solution in four European countries: Italy, Spain, Greece, and Turkey. Pilots are implemented with the direct participation of one regional authority, four municipalities, and one association of municipalities, and include decision-making procedures on significant environmental questions.

    Model-Based Mitigation of Availability Risks

    The assessment and mitigation of risks related to the availability of the IT infrastructure is becoming increasingly important in modern organizations. Unfortunately, present standards for Risk Assessment and Mitigation show limitations when evaluating and mitigating availability risks. This is because they do not fully consider the dependencies between the constituents of an IT infrastructure that are paramount in large enterprises. These dependencies make the technical problem of assessing availability issues very challenging. In this paper we define a method and a tool for carrying out a Risk Mitigation activity which allows one to assess the global impact of a set of risks and to choose the best set of countermeasures to cope with them. To this end, a tool is necessary due to the high complexity of the assessment problem. Our approach can be integrated in present Risk Management methodologies (e.g. COBIT) to provide a more precise Risk Mitigation activity. We substantiate the viability of this approach by showing that most of the input required by the tool is available as part of a standard business continuity plan, and/or by performing a common tool-assisted Risk Management

    Fighting Authorship Linkability with Crowdsourcing

    Massive amounts of contributed content -- including traditional literature, blogs, music, videos, reviews and tweets -- are available on the Internet today, with authors numbering in many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that is being used as a foundation of many trendy community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized/topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy. In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community-based reviewing. We first attempt to harness the global power of crowdsourcing by engaging random strangers into the process of re-writing reviews. As our empirical results (obtained from Amazon Mechanical Turk) clearly demonstrate, crowdsourcing yields impressively sensible reviews that reflect sufficiently different stylometric characteristics such that prior stylometric linkability techniques become largely ineffective. We also consider using machine translation to automatically re-write reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. Finally, we explore the combination of crowdsourcing and machine translation and report on the results

    An Investigation on Text-Based Cross-Language Picture Retrieval Effectiveness through the Analysis of User Queries

    Purpose: This paper describes a study of the queries generated from a user experiment for cross-language information retrieval (CLIR) from a historic image archive. Italian-speaking users generated 618 queries for a set of known-item search tasks. The queries generated by users' interaction with the system have been analysed and the results used to suggest recommendations for the future development of cross-language retrieval systems for digital image libraries. Methodology: A controlled lab-based user study was carried out using a prototype Italian-English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known-item search task. Users' interactions with the system were recorded and queries were analysed manually, both quantitatively and qualitatively. Findings: Results highlight the diversity in requests for similar visual content and the weaknesses of Machine Translation for query translation. Through the manual translation of queries we show the benefits of using high-quality translation resources. The results show the individual characteristics of users whilst performing known-item searches and the overlap obtained between query terms and structured image captions, highlighting the use of users' search terms for objects within the foreground of an image. Limitations and Implications: This research looks in-depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repositories. Value: The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross-language information access services. However, developing effective systems requires studying users' search behaviours, particularly in digital image libraries. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross-language image retrieval system design.

    Automatic Quality Estimation for ASR System Combination

    Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, highly depends on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and it is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%