
    Digital libraries and minority languages

    Digital libraries have a pivotal role to play in the preservation and maintenance of international cultures in general and minority languages in particular. This paper outlines a software tool for building digital libraries that is well adapted for creating and distributing local information collections in minority languages, and describes some contexts in which it is used. The system can make multilingual documents available in structured collections and allows them to be accessed via multilingual interfaces. It is issued under a free open-source licence, which encourages participatory design of the software, and an end-user interface allows community-based localization of the various language interfaces, of which there are many.

    SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

    We present SpeakingFaces as a publicly available large-scale dataset developed to support multimodal machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human-computer interaction (HCI), biometric authentication, recognition systems, domain transfer, and speech recognition. SpeakingFaces comprises well-aligned high-resolution thermal and visual spectra image streams of fully-framed faces synchronized with audio recordings of each subject speaking approximately 100 imperative phrases. Data were collected from 142 subjects, yielding over 13,000 instances of synchronized data (~3.8 TB). For technical validation, we demonstrate two baseline examples. The first baseline shows classification by gender, utilizing different combinations of the three data streams in both clean and noisy environments. The second example consists of thermal-to-visual facial image translation, as an instance of domain transfer. Comment: 6 pages, 4 figures, 3 tables.

    Exploiting source similarity for SMT using context-informed features

    In this paper, we introduce context-informed features in a log-linear phrase-based SMT framework; these features enable us to exploit source similarity in addition to the target similarity modeled by the language model. We present a memory-based classification framework that enables the estimation of these features while avoiding sparseness problems. We evaluate the performance of our approach on Italian-to-English and Chinese-to-English translation tasks using a state-of-the-art phrase-based SMT system, and report significant improvements for both BLEU and NIST scores when adding the context-informed features.

    Feature-based and Model-based Semantics for English, French and German Verb Phrases

    This paper considers the relative merits of using features and formal event models to characterise the semantics of English, French and German verb phrases, and considers the application of such semantics in machine translation. The feature-based approach represents the semantics in terms of feature systems, which have been widely used in computational linguistics for representing complex syntactic structures. The paper shows how a simple intuitive semantics of verb phrases may be encoded as a feature system, and how this can be used to support modular construction of automatic translation systems through feature look-up tables. This is illustrated by automated translation of English into either French or German. The paper continues to formalise the feature-based approach via a model-based, Montague semantics, which extends previous work on the semantics of English verb phrases. In so doing, repercussions of and to this framework in conducting a contrastive semantic study are considered. The model-based approach also promises to provide support for a more sophisticated approach to translation through logical proof; the paper indicates further work required for the fulfilment of this promise.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries; and (iv) evaluation of NLP systems.

    Observing Users - Designing clarity a case study on the user-centred design of a cross-language information retrieval system

    This paper presents a case study of the development of an interface to a novel and complex form of document retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. A study involving users (with such searching needs) from the start of the design process is described, covering initial examination of user needs and tasks; preliminary design and testing of interface components; building, testing, and further refining an interface; and finally conducting usability tests of the system. Lessons are learned at every stage of the process, leading to a much more informed view of how such an interface should be built.

    An Investigation on Text-Based Cross-Language Picture Retrieval Effectiveness through the Analysis of User Queries

    Purpose: This paper describes a study of the queries generated from a user experiment for cross-language information retrieval (CLIR) from a historic image archive. Italian-speaking users generated 618 queries for a set of known-item search tasks. The queries generated by users’ interaction with the system have been analysed and the results used to suggest recommendations for the future development of cross-language retrieval systems for digital image libraries. Methodology: A controlled lab-based user study was carried out using a prototype Italian-English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known-item search task. Users’ interactions with the system were recorded and queries were analysed manually, both quantitatively and qualitatively. Findings: Results highlight the diversity in requests for similar visual content and the weaknesses of Machine Translation for query translation. Through the manual translation of queries we show the benefits of using high-quality translation resources. The results show the individual characteristics of users whilst performing known-item searches and the overlap obtained between query terms and structured image captions, highlighting the use of users’ search terms for objects within the foreground of an image. Limitations and Implications: This research looks in depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repositories. Value: The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross-language information access services. However, developing effective systems requires studying users’ search behaviours, particularly in digital image libraries. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross-language image retrieval system design.