Search CORE

15,036 research outputs found

MultiLingMine 2016: Modeling, Learning and Mining for Cross/Multilinguality. In: Advances in Information Retrieval

Author: Ienco Dino
Roche Mathieu
Romeo Salvatore
Rosso Paolo
Tagarelli Andrea
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-30671-1 83The increasing availability of text information coded in many different languages poses new challenges to modern information retrieval and mining systems in order to discover and exchange knowledge at a larger world-wide scale. The 1st International Workshop on Modeling, Learning and Mining for Cross/Multilinguality (dubbed MultiLingMine 2016) provides a venue to discuss research advances in cross-/multilingual related topics, focusing on new multidisciplinary research questions that have not been deeply investigated so far (e.g., in CLEF and related events relevant to CLIR). This includes theoretical and experimental on-going works about novel representation models, learning algorithms, and knowledge-based methodologies for emerging trends and applications, such as, e.g., cross-view cross-/multilingual information retrieval and document mining, (knowledge-based) translation-independent cross-/multilingual corpora, applications in social network contexts, and more.Ienco, D.; Roche, M.; Romeo, S.; Rosso, P.; Tagarelli, A. (2016). MultiLingMine 2016: Modeling, Learning and Mining for Cross/Multilinguality. In: Advances in Information Retrieval. En Advances in Information Retrieval. Springer Verlag (Germany). 869-873. doi:10.1007/978-3-319-30671-1_83S869873Bandyopadhyay, S., Poibeau, T., Saggion, H., Yangarber, R.: Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization (MMIES). ACL (2008)Chiarcos, C., McCrae J.P., Montiel, E., Simov, K., Branco, A., Calzolari, N., Osenova, P., Slavcheva, M., Vertan, C.: Proceedings of the 3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and NLP (LDL) (2014)McCrae, J.P., Vulcu, G.: CEUR Proceedings of the 4th Workshop on the Multilingual Semantic Web (MSW4), vol. 1532 (2015)Moens, M.-F., Vulié, I.: Multilingual probabilistic topic modeling and its applications in web mining and search. In: Proceedings of the 7th ACM WSDM Conference (2014)Steichen, B., Ferro, N., Lewis, D., Chi, E.E.: Proceedings of the International Workshop on Multilingual Web Access (MWA) (2015)The CLEF Initiative. http://www.clef-initiative.eu

RiuNet

Agritrop

Introduction to the special issue on cross-language algorithms and applications

Author: Bangalore Srinivas
Lambert Patrik
Montiel-Ponsoda Elena
Màrquez Lluís
Ruiz Costa-Jussà Marta
Publication venue
Publication date: 01/01/2016
Field of study

With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Natural language processing

Author: Adams
Amsler
Bangalore
Barker
Benoît
Bian
Bondale
Carrick
Ceric
Chandrasekar
Chang
Charniak
Chen
Chowdhury
Chowdhury
Costantino
Cowie
Craven
Craven
Craven
Dogru
Evans
Feldman
Fernandez
Gaizauskas
Glasgow
Haas
Hayes
Hayes
Hedlund
Herath
Ide
Isahara
Jelinek
Jeong
Jurafsky
Kazakov
Kehler
Khoo
Kim
King
Lange
Lee
Lehmam
Lehtokangas
Lewis
Liddy
Liddy
Lovis
Ma
Magnini
Mani
Manning
Marquez
Martinez
Martinez
McMurchie
Meyer
Mihalcea
Mock
Moens
Morin
Narita
Nerbonne
Oard
Ogura
Oudet
Owei
Paris
Pasero
Pedersen
Perez-Carballo
Petreley
Pirkola
Poesio
Rosenfield
Roux
Say
Scarlett
Schenker
Silber
Smeaton
Smeaton
Smith
Sokol
Song
Sparck Jones
Staab
Stock
Tolle
Trybula
Tsuda
Vickery
Waldrop
Warner
Weigard
Wilks
Wong
Yang
Yang
Zadrozny
Zweigenbaum
Publication venue: 'Wiley'
Publication date: 01/01/2003
Field of study

Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

Crossref

University of Strathclyde Institutional Repository

OPUS - University of Technology Sydney

Multilingual interactive experiments with Flickr

Author: Clough Paul
Gonzalo Julio
Karlgren Jussi
Publication venue
Publication date: 01/01/2006
Field of study

This paper presents a proposal for iCLEF 2006, the interactive track of the CLEF cross-language evaluation campaign. In the past, iCLEF has addressed applications such as information retrieval and question answering. However, for 2006 the focus has turned to text-based image retrieval from Flickr. We describe Flickr, the challenges this kind of collection presents to cross-language researchers, and suggest initial iCLEF tasks

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

Author: Jaggi Martin
Josifoski Martin
Paskov Hristo S.
Paskov Ivan S.
West Robert
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/02/2019
Field of study

There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.Comment: In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

An Investigation on Text-Based Cross-Language Picture Retrieval Effectiveness through the Analysis of User Queries

Author: Clough Paul
Petrelli Daniela
Publication venue: 'Emerald'
Publication date: 01/01/2012
Field of study

Purpose: This paper describes a study of the queries generated from a user experiment for cross-language information retrieval (CLIR) from a historic image archive. Italian speaking users generated 618 queries for a set of known-item search tasks. The queries generated by user’s interaction with the system have been analysed and the results used to suggest recommendations for the future development of cross-language retrieval systems for digital image libraries. Methodology: A controlled lab-based user study was carried out using a prototype Italian-English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known-item search task. User’s interactions with the system were recorded and queries were analysed manually quantitatively and qualitatively. Findings: Results highlight the diversity in requests for similar visual content and the weaknesses of Machine Translation for query translation. Through the manual translation of queries we show the benefits of using high-quality translation resources. The results show the individual characteristics of user’s whilst performing known-item searches and the overlap obtained between query terms and structured image captions, highlighting the use of user’s search terms for objects within the foreground of an image. Limitations and Implications: This research looks in-depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repository. Value: The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross-language information access services. However, to develop effective systems requires studying user’s search behaviours, particularly in digital image libraries. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross-language image retrieval system design.</p

Sheffield Hallam University Research Archive

Cross-language Text Classification with Convolutional Neural Networks From Scratch

Author: Enweiji M. Z. (Musbah)
Glybovets А. (Аndrey)
Lehinevych T. (Taras)
Publication venue: Scientific Route OÜ
Publication date: 01/01/2017
Field of study

Cross language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training classification model for each individual language. The novel approach by using Convolutional Neural Networks for multilingual language classification is proposed in this article. It learns representation of knowledge gained from languages. Moreover, current method works for new individual language, which was not used in training. The results of empirical study on large dataset of 21 languages demonstrate robustness and competitiveness of the presented approach

Neliti

EUREKA: Physics and Engineering

DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval

Author: Goeuriot Lorraine
Jones Gareth J.F.
Kelly Liadh
Leveling Johannes
Publication venue
Publication date: 09/11/2012
Field of study

This paper describes the first participation of DCU in the TREC Medical Records Track (TRECMed). We performed some initial experiments on the 2011 TRECMed data based on the BM25 retrieval model. Surprisingly, we found that the standard BM25 model with default parameters, performs comparable to the best automatic runs submitted to TRECMed 2011 and would have resulted in rank four out of 29 participating groups. We expected that some form of domain adaptation would increase performance. However, results on the 2011 data proved otherwise: concept-based query expansion decreased performance, and filtering and reranking by term proximity also decreased performance slightly. We submitted four runs based on the BM25 retrieval model to TRECMed 2012 using standard BM25, standard query expansion, result filtering, and concept-based query expansion. Official results for 2012 confirm that domain-specific knowledge does not increase performance compared to the BM25 baseline as applied by us

Irish Universities

DCU Online Research Access Service

Lessons learned in multilingual grounded language learning

Author: Alishahi Afra
Chrupała Grzegorz
Côté Marc-Alexandre
Elliott Desmond
Kádár Ákos
Publication venue
Publication date: 01/01/2018
Field of study

Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language. Here, we investigate in detail which conditions affect the performance of this type of grounded language learning model. We show that multilingual training improves over bilingual training, and that low-resource languages benefit from training with higher-resource languages. We demonstrate that a multilingual model can be trained equally well on either translations or comparable sentence pairs, and that annotating the same set of images in multiple language enables further improvements via an additional caption-caption ranking objective.Comment: CoNLL 201

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Tilburg University Repository