7,579 research outputs found

Searching and organizing images across languages

Author: Clough P.
Sanderson M.
Shou X.M.
Publication date: 01/01/2005

With the continual growth of users on the Web from a wide range of countries, supporting such users in their search of cultural heritage collections will grow in importance. In the next few years, the growth areas of Internet users will come from the Indian sub-continent and China. Consequently, if holders of cultural heritage collections wish their content to be viewable by the full range of users coming to the Internet, the range of languages that they need to support will have to grow. This paper will present recent work conducted at the University of Sheffield (and now being implemented in BRICKS) on how to use automatic translation to provide search and organisation facilities for a historical image search engine. The system allows users to search for images in seven different languages, providing means for the user to examine translated image captions and browse retrieved images organised by categories written in their native language

White Rose Research Online

Computerization of African languages-French dictionaries

Author: Enguehard Chantal
Mangeot Mathieu
Publication date: 22/05/2014

This paper reports work done during the DiLAF project. The work consists of converting 5 bilingual African language-French dictionaries, originally in Word format, into XML following the LMF model. The languages processed are Bambara, Hausa, Kanuri, Tamajaq and Songhai-zarma, which are still considered under-resourced with respect to Natural Language Processing tools. Once converted, the dictionaries are available online on the Jibiki platform for lookup and modification. The DiLAF project is first presented. A description of each dictionary follows. Then, the conversion methodology from .doc format to XML files is presented. A section on the use of Unicode follows. Then, each step of the conversion into XML and LMF is detailed. The last part presents the Jibiki lexical resources management platform used for the project. Comment: 8 page

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

HAL Université de Savoie

Corpus planning for Irish – dictionaries and terminology

Author: Nic Pháidín Caoilfhionn
Publication venue: Cois Life
Publication date: 01/01/2008

A description of the evolution and current situation of corpus planning for Irish, which includes dictionaries, terminology and corpora

Irish Universities

DCU Online Research Access Service

Applying digital content management to support localisation

Author: Jones Gareth J.F.
Lawless Séamus
O'Connor Alexander
Wade Vincent
Zhou Dong
Publication venue: Localisation Research Centre
Publication date: 01/10/2009

The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is expanding interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) is seeking to develop technologies to support advanced personalised access and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review key features of these technologies and introduce early ideas for how these technologies can support localisation and localised content before concluding with some impressions of future directions in DCM

Irish Universities

DCU Online Research Access Service

Faclair na Gàidhlig and Corpas na Gàidhlig: New Approaches Make Sense

Author: O Maolalaigh Roibeard
Pike Lorna
Publication date: 01/01/2013

For minority languages in the twenty-first century, increasingly overshadowed by their global counterparts, language maintenance and revitalisation are of paramount importance. Closely linked to these issues is the question of corpus planning. This essay will focus on two projects in Scottish Gaelic which will play a major part in preserving and maintaining the language by providing it with high quality lexicographical and research resources: Faclair na Gàidhlig and Corpas na Gàidhlig respectively; the essay concludes with a brief case study on Gaelic numerals which illustrates how Corpas na Gàidhlig can powerfully enhance our understanding of Gaelic

Enlighten

Domain-specific query translation for multilingual access to digital libraries

Author: Fantino Fabio
Fuller Marguerite
Jones Gareth J.F.
Newman Eamonn
Zhang Ying
Publication date: 15/06/2009

Accurate high-coverage translation is a vital component of reliable cross language information access (CLIR) systems. This is particularly true of access to archives such as Digital Libraries which are often specific to certain domains. While general machine translation (MT) has been shown to be effective for CLIR tasks in information retrieval evaluation workshops, it is not well suited to specialized tasks where domain specific translations are required. We demonstrate that effective query translation in the domain of cultural heritage (CH) can be achieved by augmenting a standard MT system with domain-specific phrase dictionaries automatically mined from the online Wikipedia. Experiments using our hybrid translation system with sample query logs from users of CH websites demonstrate a large improvement in the accuracy of domain specific phrase detection and translation
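The hybrid idea in this abstract (domain phrase dictionary first, general MT for the rest) can be illustrated with a minimal sketch. This is not the authors' system: the function name `translate_query`, the longest-match strategy, the `max_len` limit, and the toy `fallback_mt` callable are all assumptions made for illustration, and the dictionary entries below are invented examples rather than mined data.

```python
from typing import Callable, Dict, List


def translate_query(query: str,
                    phrase_dict: Dict[str, str],
                    fallback_mt: Callable[[str], str],
                    max_len: int = 4) -> str:
    """Translate a query, preferring domain phrase-dictionary entries.

    Hypothetical sketch: phrases are detected by greedy longest match
    (up to max_len tokens); everything else goes to fallback_mt.
    """
    tokens = query.lower().split()
    out: List[str] = []
    pending: List[str] = []  # tokens not covered by any dictionary phrase

    def flush() -> None:
        # Send accumulated non-dictionary tokens to the general MT system.
        if pending:
            out.append(fallback_mt(" ".join(pending)))
            pending.clear()

    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate phrase first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in phrase_dict:
                match_len = n
                break
        if match_len:
            flush()
            out.append(phrase_dict[" ".join(tokens[i:i + match_len])])
            i += match_len
        else:
            pending.append(tokens[i])
            i += 1
    flush()
    return " ".join(out)


if __name__ == "__main__":
    # Invented dictionary entry and a stand-in "MT system" that only marks its input.
    demo_dict = {"stained glass": "vitrail"}
    print(translate_query("Stained glass windows in Paris", demo_dict,
                          lambda s: "<" + s + ">"))
```

In a real setting the stand-in callable would wrap an actual MT service, and the dictionary would come from whatever mining procedure is used; neither is specified here.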

Irish Universities

DCU Online Research Access Service

General guidelines for designing bilingual low cost digital library services suitable for special library users in developing countries and the Arabic speaking world

Author: Elaiess Ramadan
Publication date: 01/10/2009

The world is witnessing a considerable transformation from print-based formats to electronic-based formats thanks to advanced computing technology, which has a profound impact on the dissemination of nearly all previous formats of publications into digital formats on computer networks. Text, still and moving images, sound tracks, music, and almost all known formats can be stored and retrieved on computer magnetic disk. Over the last two decades, a number of special libraries and information centres in the Arab world have introduced electronic resources into their library services. Very few have implemented automated and integrated systems. Despite the importance of designing digital libraries not merely for access to or retrieval of information but rather for the provision of electronic services, hardly any special library has started the design of digital library services. Managers of special libraries and information centres in developing countries in general, and in the Arab world in particular, should start building their local digital libraries, as the benefits of establishing such electronic services are considerable and well known, both for the expansion of research activities and for delivering services that satisfy the needs of targeted end-users. The aim of this paper is to provide general guidelines for the design of a low-cost special digital library providing the services most frequently required by various categories of special library users in developing countries. The paper also aims to illustrate strategies and methods that can be adopted for building such projects. Given the importance of low cost as a basic design principle, the use of today's ICTs and freely available open-source software is the right path for accomplishing this goal. The paper describes the phases and stages required for building such projects from scratch. It also aims to highlight the barriers and obstacles facing Arabic content and how such problems could be overcome

University of Strathclyde Institutional Repository

Cross-lingual Distillation for Text Classification

Author: Xu Ruochen
Yang Yiming
Publication date: 01/01/2017

Cross-lingual text classification (CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. This paper presents a novel approach to CLTC that builds on model distillation, which adapts and extends a framework originally proposed for model compression. Using soft probabilistic predictions for the documents in a label-rich language as the (induced) supervisory labels in a parallel corpus of documents, we train classifiers successfully for new languages in which labeled training data are not available. An adversarial feature adaptation technique is also applied during the model training to reduce distribution mismatch. We conducted experiments on two benchmark CLTC datasets, treating English as the source language and German, French, Japanese and Chinese as the unlabeled target languages. The proposed approach achieved better or comparable performance relative to other state-of-the-art methods. Comment: Accepted at ACL 2017; Code available at https://github.com/xrc10/cross-distil
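The core distillation step described in this abstract, where a source-language classifier's soft predictions serve as labels for the target-language side of a parallel corpus, can be sketched as a loss function. This is a minimal illustration, not the authors' released code: the function names, the temperature value, and the use of plain Python lists of logits are assumptions, and the adversarial feature adaptation step is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Log of the temperature-scaled softmax, computed without taking log(0)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(teacher_logits: Sequence[Sequence[float]],
                      student_logits: Sequence[Sequence[float]],
                      temperature: float = 2.0) -> float:
    """Mean cross-entropy between teacher soft labels and student predictions.

    Row i of teacher_logits is assumed to score a source-language document and
    row i of student_logits its parallel target-language document.
    The temperature of 2.0 is an arbitrary illustrative default.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        soft_labels = softmax(t_row, temperature)
        student_log_probs = log_softmax(s_row, temperature)
        total -= sum(p * lq for p, lq in zip(soft_labels, student_log_probs))
    return total / len(teacher_logits)


if __name__ == "__main__":
    teacher = [[2.0, 0.0], [0.5, 1.5]]
    agreeing_student = [[2.0, 0.0], [0.5, 1.5]]
    disagreeing_student = [[0.0, 2.0], [1.5, 0.5]]
    print(distillation_loss(teacher, agreeing_student))
    print(distillation_loss(teacher, disagreeing_student))
```

Minimising this loss over the student's parameters pushes its predictions on target-language documents toward the teacher's predictions on the aligned source-language documents, which is why no target-language labels are needed in this setup.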

arXiv.org e-Print Archive

Crossref