3 research outputs found

    Intercomprehension in Retrieval: User Perspectives on Six Related Scarce Resource Languages

    The majority of web content is published in languages not accessible to many potential users, who may only be able to read and understand their local languages. Prior research has focused on using translation to provide users with information written in other languages, yet there are still many languages with little or no such resources. In this paper, we propose the use of intercomprehension, a form of communication in which speakers of two different languages each communicate in their own language, relying mainly on similarities between the languages. Accordingly, we conducted a user study to explore user interaction behaviour in a retrieval environment where intercomprehension is expected; to investigate the usefulness of search results, which assumes intelligibility and relevance; and to investigate affective episodes associated with intercomprehension in retrieval through retrospection. Although intercomprehension may come with a cost of understanding unfamiliar languages, users' preferred ranking of results in related languages incorporates intelligibility, which assumes intercomprehension. Our findings also suggest that intercomprehension is useful in retrieval for related languages: users are able to identify relevant documents and complete search tasks by applying intercomprehension. However, the negative emotions or frustration associated with intercomprehension suggest that this type of interaction should be reserved for extreme cases where few or no relevant documents are available for the query.

    Ranking by Language Similarity for Resource Scarce Southern Bantu Languages

    Resource Scarce Languages (RSLs) lack sufficient resources to use Cross-Lingual Information Retrieval (CLIR) techniques and tools such as machine translation. Consequently, searching using RSLs is frustrating and usually ends in an unsuccessful, struggling search. In such search tasks, search engines return low-quality results; relevant documents are either few and poorly ranked or non-existent. Previous work has shown that alternative relevant results written in similar languages, including dialects, neighbouring and genetically related languages, can assist multilingual RSL speakers to complete their search tasks. To improve the quality of search results in this context, we propose the re-ranking of documents based on the similarity between the language of the document and the language of the query. Accordingly, we created a dataset of four Southern Bantu languages that includes documents, topics, topical relevance and intelligibility features, and document utility annotations. To understand the intelligibility dimension of the studied languages, we conducted online intelligibility test experiments and used the data for feature selection and intelligibility prediction. We performed re-ranking of search results using offline evaluation, exploring Learning To Rank (LTR). Our results show that integrating topical relevance and intelligibility in ranking slightly improves retrieval effectiveness. Further, results on intelligibility prediction show that classification of intelligibility is feasible with fair accuracy.
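
    The idea of re-ranking by combining topical relevance with the intelligibility of the document language can be illustrated with a minimal sketch. This is not the authors' method: the abstract describes Learning To Rank, whereas the sketch below swaps in a plain linear interpolation for illustration. The language codes, the intelligibility values, the `Doc` type, the `rerank` function and the `weight` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    lang: str          # language the document is written in
    relevance: float   # topical relevance score in [0, 1]


# Hypothetical intelligibility of a document language (second key)
# for a reader of the query language (first key). Values are invented.
INTELLIGIBILITY = {
    ("lang_a", "lang_a"): 1.0,
    ("lang_a", "lang_b"): 0.7,
    ("lang_a", "lang_c"): 0.3,
}


def rerank(docs, query_lang, weight=0.5):
    """Sort docs by a weighted mix of relevance and intelligibility.

    weight=0 ranks by topical relevance only; weight=1 ranks by
    intelligibility only. Unknown language pairs score 0.0.
    """
    def score(doc):
        intel = INTELLIGIBILITY.get((query_lang, doc.lang), 0.0)
        return (1 - weight) * doc.relevance + weight * intel

    return sorted(docs, key=score, reverse=True)


docs = [
    Doc("d1", "lang_c", 0.9),
    Doc("d2", "lang_b", 0.8),
    Doc("d3", "lang_a", 0.4),
]
ranked = rerank(docs, "lang_a")
print([d.doc_id for d in ranked])
```

    With `weight=0.5`, a highly relevant document in a poorly understood language (`d1`) drops below less relevant documents in languages the user can read more easily.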

    Search between Chinese and Japanese text collections

    For NTCIR Workshop 6, UC Berkeley participated in Phase 1 of the bilingual task of the CLIR track. Our focus was on Japanese topic searches against the Chinese News Document Collection and on Chinese topic searches retrieving from the Japanese News Document Collection. We performed search experiments that segmented Chinese search topics and used them directly as if they were Japanese topics, and vice versa. We also utilized Machine Translation (MT) software between Japanese and Chinese, with English as a pivot language. While Chinese search without translation against Japanese documents performed creditably for title-only runs, the reverse (Japanese topic search of Chinese documents without translation) performed poorly. We are investigating the reasons.