18,526 research outputs found
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication since
current models for information retrieval (IR) are still based on a
bag-of-words. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost.Comment: 37 page
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
Overview of Translation Techniques in Cross-Language Question Answering during the last decade
The development of the Semantic Web requires great economic and human effort. Consequently, it is very useful to create mechanisms and tools that facilitate its expansion. From the standpoint of Information Retrieval, access to the contents of the Semantic Web can be favored by the use of natural language, as it is much simpler and faster for the user to engage in his habitual form of expression
Overview of Translation Techniques in Cross-Language Question Answering during the last decade
The development of the Semantic Web requires great economic and human effort. Consequently, it is very useful to create mechanisms and tools that facilitate its expansion. From the standpoint of Information Retrieval, access to the contents of the Semantic Web can be favored by the use of natural language, as it is much simpler and faster for the user to engage in his habitual form of expression
Special Libraries, January 1966
Volume 57, Issue 1https://scholarworks.sjsu.edu/sla_sl_1966/1000/thumbnail.jp
Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models
Neural text ranking models have witnessed significant advancement and are
increasingly being deployed in practice. Unfortunately, they also inherit
adversarial vulnerabilities of general neural models, which have been detected
but remain underexplored by prior studies. Moreover, the inherit adversarial
vulnerabilities might be leveraged by blackhat SEO to defeat better-protected
search engines. In this study, we propose an imitation adversarial attack on
black-box neural passage ranking models. We first show that the target passage
ranking model can be transparentized and imitated by enumerating critical
queries/candidates and then train a ranking imitation model. Leveraging the
ranking imitation model, we can elaborately manipulate the ranking results and
transfer the manipulation attack to the target ranking model. For this purpose,
we propose an innovative gradient-based attack method, empowered by the
pairwise objective function, to generate adversarial triggers, which causes
premeditated disorderliness with very few tokens. To equip the trigger
camouflages, we add the next sentence prediction loss and the language model
fluency constraint to the objective function. Experimental results on passage
ranking demonstrate the effectiveness of the ranking imitation attack model and
adversarial triggers against various SOTA neural ranking models. Furthermore,
various mitigation analyses and human evaluation show the effectiveness of
camouflages when facing potential mitigation approaches. To motivate other
scholars to further investigate this novel and important problem, we make the
experiment data and code publicly available.Comment: 15 pages, 4 figures, accepted by ACM CCS 2022, Best Paper Nominatio
Special Libraries, September 1957
Volume 48, Issue 7https://scholarworks.sjsu.edu/sla_sl_1957/1006/thumbnail.jp
- …