260 research outputs found

    Transitive probabilistic CLIR models.

    Get PDF
    Transitive translation could be a useful technique to enlarge the number of supported language pairs for a cross-language information retrieval (CLIR) system in a cost-effective manner. The paper describes several setups for transitive translation based on probabilistic translation models. The transitive CLIR models were evaluated on the CLEF test collection and yielded a retrieval effectiveness\ud up to 83% of monolingual performance, which is significantly better than a baseline using the synonym operator

    Developing a distributed electronic health-record store for India

    Get PDF
    The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India

    Development of Multilingual Resource Management Mechanisms for Libraries

    Get PDF
    Multilingual is one of the important concept in any library. This study is create on the basis of global recommendations and local requirement for each and every libraries. Select the multilingual components for setting up the multilingual cluster in different libraries to each user. Development of multilingual environment for accessing and retrieving the library resources among the users as well as library professionals. Now, the methodology of integration of Google Indic Transliteration for libraries have follow the five steps such as (i) selection of transliteration tools for libraries (ii) comparison of tools for libraries (iii) integration Methods in Koha for libraries (iv) Development of Google indic transliteration in Koha for users (v) testing for libraries (vi) results for libraries. Development of multilingual framework for libraries is also an important task in integrated library system and in this section have follow the some important steps such as (i) Bengali Language Installation in Koha for libraries (ii) Settings Multilingual System Preferences in Koha for libraries (iii) Translate the Modules for libraries (iv) Bengali Interface in Koha for libraries. Apart from these it has also shows the Bengali data entry process in Koha for libraries such as Data Entry through Ibus Avro Phonetics for libraries and Data Entry through Virtual Keyboard for libraries. Development of Multilingual Digital Resource Management for libraries by using the DSpace and Greenstone. Management of multilingual for libraries in different areas such as federated searching (VuFind Multilingual Discovery tool ; Multilingual Retrieval in OAI-PMH tool ; Multilingual Data Import through Z39.50 Server ). Multilingual bibliographic data edit through MarcEditor for the better management of integrated library management system. It has also create and editing the content by using the content management system tool for efficient and effective retrieval of multilingual digital content resources among the users

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Full text link
    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects

    Adaptation of machine translation for multilingual information retrieval in the medical domain

    Get PDF
    Objective. We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve eectiveness of cross-lingual IR. Methods and Data. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech–English, German–English, and French–English. MT quality is evaluated on data sets created within the Khresmoi project and IR eectiveness is tested on the CLEF eHealth 2013 data sets. Results. The search query translation results achieved in our experiments are outstanding – our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech–English, from 23.03 to 40.82 for German–English, and from 32.67 to 40.82 for French–English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French–English. For Czech–English and German–English, the increased MT quality does not lead to better IR results. Conclusions. Most of the MT techniques employed in our experiments improve MT of medical search queries. Especially the intelligent training data selection proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance – better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions

    A study of the use of natural language processing for conversational agents

    Get PDF
    Language is a mark of humanity and conscience, with the conversation (or dialogue) as one of the most fundamental manners of communication that we learn as children. Therefore one way to make a computer more attractive for interaction with users is through the use of natural language. Among the systems with some degree of language capabilities developed, the Eliza chatterbot is probably the first with a focus on dialogue. In order to make the interaction more interesting and useful to the user there are other approaches besides chatterbots, like conversational agents. These agents generally have, to some degree, properties like: a body (with cognitive states, including beliefs, desires and intentions or objectives); an interactive incorporation in the real or virtual world (including perception of events, communication, ability to manipulate the world and communicate with others); and behavior similar to a human (including affective abilities). This type of agents has been called by several terms, including animated agents or embedded conversational agents (ECA). A dialogue system has six basic components. (1) The speech recognition component is responsible for translating the user’s speech into text. (2) The Natural Language Understanding component produces a semantic representation suitable for dialogues, usually using grammars and ontologies. (3) The Task Manager chooses the concepts to be expressed to the user. (4) The Natural Language Generation component defines how to express these concepts in words. (5) The dialog manager controls the structure of the dialogue. (6) The synthesizer is responsible for translating the agents answer into speech. However, there is no consensus about the necessary resources for developing conversational agents and the difficulties involved (especially in resource-poor languages). This work focuses on the influence of natural language components (dialogue understander and manager) and analyses, in particular the use of parsing systems as part of developing conversational agents with more flexible language capabilities. This work analyses what kind of parsing resources contributes to conversational agents and discusses how to develop them targeting Portuguese, which is a resource-poor language. To do so we analyze approaches to the understanding of natural language, and identify parsing approaches that offer good performance, based on which we develop a prototype to evaluate the impact of using a parser in a conversational agent.linguagem é uma marca da humanidade e da consciência, sendo a conversação (ou diálogo) uma das maneiras de comunicacão mais fundamentais que aprendemos quando crianças. Por isso uma forma de fazer um computador mais atrativo para interação com usuários é usando linguagem natural. Dos sistemas com algum grau de capacidade de linguagem desenvolvidos, o chatterbot Eliza é, provavelmente, o primeiro sistema com foco em diálogo. Com o objetivo de tornar a interação mais interessante e útil para o usuário há outras aplicações alem de chatterbots, como agentes conversacionais. Estes agentes geralmente possuem, em algum grau, propriedades como: corpo (com estados cognitivos, incluindo crenças, desejos e intenções ou objetivos); incorporação interativa no mundo real ou virtual (incluindo percepções de eventos, comunicação, habilidade de manipular o mundo e comunicar com outros agentes); e comportamento similar ao humano (incluindo habilidades afetivas). Este tipo de agente tem sido chamado de diversos nomes como agentes animados ou agentes conversacionais incorporados. Um sistema de diálogo possui seis componentes básicos. (1) O componente de reconhecimento de fala que é responsável por traduzir a fala do usuário em texto. (2) O componente de entendimento de linguagem natural que produz uma representação semântica adequada para diálogos, normalmente utilizando gramáticas e ontologias. (3) O gerenciador de tarefa que escolhe os conceitos a serem expressos ao usuário. (4) O componente de geração de linguagem natural que define como expressar estes conceitos em palavras. (5) O gerenciador de diálogo controla a estrutura do diálogo. (6) O sintetizador de voz é responsável por traduzir a resposta do agente em fala. No entanto, não há consenso sobre os recursos necessários para desenvolver agentes conversacionais e a dificuldade envolvida nisso (especialmente em línguas com poucos recursos disponíveis). Este trabalho foca na influência dos componentes de linguagem natural (entendimento e gerência de diálogo) e analisa em especial o uso de sistemas de análise sintática (parser) como parte do desenvolvimento de agentes conversacionais com habilidades de linguagem mais flexível. Este trabalho analisa quais os recursos do analisador sintático contribuem para agentes conversacionais e aborda como os desenvolver, tendo como língua alvo o português (uma língua com poucos recursos disponíveis). Para isto, analisamos as abordagens de entendimento de linguagem natural e identificamos as abordagens de análise sintática que oferecem um bom desempenho. Baseados nesta análise, desenvolvemos um protótipo para avaliar o impacto do uso de analisador sintático em um agente conversacional

    Machine Translation of Medical Texts in the Khresmoi Project

    Get PDF
    The WMT 2014 Medical Translation Task poses an interesting challenge for Machine Translation (MT). In the standard translation task, the end application is the translation itself. In this task, the MT system is considered a part of a larger system for cross-lingual information retrieval (IR)

    The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

    Get PDF
    Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Science, on 12-13 February 2009
    corecore