A Comparative Study of the Effect of Word Segmentation On Chinese Terminology Extraction
PACLIC 20 / Wuhan, China / 1-3 November, 200
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are
in different languages, has of late become one of the major topics within the
information retrieval community. This paper proposes a Japanese/English CLIR
system, where we combine query translation and retrieval modules. We
currently target the retrieval of technical documents, and therefore the
performance of our system is highly dependent on the quality of the translation
of technical terms. However, the technical term translation is still
problematic in that technical terms are often compound words, and thus new
terms are progressively created by combining existing base words. In addition,
Japanese often represents loanwords based on its special phonogram.
Consequently, existing dictionaries find it difficult to achieve sufficient
coverage. To counter the first problem, we produce a Japanese/English
dictionary for base words, and translate compound words on a word-by-word
basis. We also use a probabilistic method to resolve translation ambiguity. For
the second problem, we use a transliteration method, which maps words
unlisted in the base word dictionary to their phonetic equivalents in the
target language. We evaluate our system using a test collection for CLIR, and
show that both the compound word translation and transliteration methods
improve the system performance.
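The word-by-word compound translation described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy dictionary `BASE_DICT`, the bigram table `BIGRAM_PROB`, the fallback probability `DEFAULT_PROB`, and the placeholder `transliterate` function are all hypothetical, and a bigram score is only one possible choice for the "probabilistic method" the abstract mentions.

```python
from itertools import product

# Hypothetical toy base-word dictionary: Japanese base word -> English candidates.
BASE_DICT = {
    "情報": ["information"],
    "検索": ["retrieval", "search"],
}

# Hypothetical bigram probabilities over English word pairs (assumed values).
BIGRAM_PROB = {
    ("information", "retrieval"): 0.6,
    ("information", "search"): 0.2,
}

# Assumed fallback probability for word pairs missing from the table.
DEFAULT_PROB = 0.01


def transliterate(word):
    """Placeholder only: a real system would map the word to its
    phonetic equivalent in the target language."""
    return f"<translit:{word}>"


def translate_compound(base_words):
    """Translate a compound word-by-word and keep the most probable combination."""
    # Words missing from the base dictionary fall back to transliteration.
    candidates = [BASE_DICT.get(w) or [transliterate(w)] for w in base_words]
    best, best_score = None, -1.0
    for combo in product(*candidates):
        score = 1.0
        # Score a candidate by the product of its adjacent-pair probabilities.
        for pair in zip(combo, combo[1:]):
            score *= BIGRAM_PROB.get(pair, DEFAULT_PROB)
        if score > best_score:
            best, best_score = combo, score
    return " ".join(best)
```

With the toy tables above, `translate_compound(["情報", "検索"])` selects "information retrieval" over "information search" because the first pair has the higher assumed probability.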
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions
Clinical Named Entity Recognition (CNER) aims to identify and classify
clinical terms such as diseases, symptoms, treatments, exams, and body parts in
electronic health records, which is a fundamental and crucial task for clinical
and translation research. In recent years, deep learning methods have achieved
significant success in CNER tasks. However, these methods depend greatly on
Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations
that are propagated through time, which makes model training slow.
In this paper, we propose a Residual Dilated Convolutional Neural Network with
Conditional Random Field (RD-CNN-CRF) to address this problem. Specifically, Chinese
characters and dictionary features are first projected into dense vector
representations, then they are fed into the residual dilated convolutional
neural network to capture contextual features. Finally, a conditional random
field is employed to capture dependencies between neighboring tags.
Computational results on the CCKS-2017 Task 2 benchmark dataset show that our
proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based
methods both in terms of computational performance and training time.
Comment: 8 pages, 3 figures. Accepted as regular paper by 2018 IEEE
International Conference on Bioinformatics and Biomedicine. arXiv admin note:
text overlap with arXiv:1804.0501
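To make the residual dilated convolution mentioned in the abstract above more concrete, here is a minimal pure-Python sketch over a sequence of scalar features. It is an assumption-laden illustration, not the RD-CNN-CRF architecture itself: the function names, the scalar (rather than vector) inputs, the zero padding, and the ReLU activation are all choices made here for brevity, and the CRF layer is omitted.

```python
def dilated_conv1d(seq, kernel, dilation):
    """Zero-padded 1D convolution whose kernel taps are `dilation` steps apart."""
    center = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        total = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(seq):  # positions outside the sequence count as zero
                total += w * seq[idx]
        out.append(total)
    return out


def residual_dilated_block(seq, kernel, dilation):
    """Apply a dilated convolution plus ReLU, then add the input back (skip connection)."""
    conv = dilated_conv1d(seq, kernel, dilation)
    activated = [max(0.0, c) for c in conv]
    return [x + a for x, a in zip(seq, activated)]


def receptive_field(kernel_size, dilations):
    """Number of input positions one output can see after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Stacking layers with growing dilation widens the context quickly: under these assumptions, three layers with kernel size 3 and dilations 1, 2, and 4 cover 15 input positions, and every position can be computed in parallel rather than step by step as in an RNN.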
What attracts vehicle consumers’ buying: A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?
Purpose:
The rapid growth of e-commerce has encouraged vehicle consumers to post individual reviews on online forums. The purpose of this paper is to examine vehicle consumers' consumption behavior and to make recommendations for potential consumers based on textual comments.
Design/methodology/approach:
A big data analytics-based approach is designed to discover vehicle consumer consumption behavior from an online perspective. To reduce the subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to perform sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain the specific sentiment value of each attribute class, contributing to multi-grade sentiment classification. To achieve intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytics viewpoint.
Findings:
The big data analytics indicate that “cost-effectiveness” is the factor that vehicle consumers care about most, and the data mining results enable automakers to better understand consumer consumption behavior.
Research limitations/implications:
The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation.
Originality/value:
Research on consumer consumption behavior is usually survey-based, and most previous studies of comment analysis focus on binary classification. The hybrid SSC-VIKOR approach is developed to fill this gap from the big data perspective.
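For readers unfamiliar with VIKOR, the ranking step named in the abstract above can be sketched with the standard VIKOR formulas. This is a generic illustration, not the paper's SSC-VIKOR procedure: the function name `vikor`, the assumption that every criterion is a benefit criterion, the example scores on a 1 to 9 scale, and the equal weights are all hypothetical.

```python
def vikor(scores, weights, v=0.5):
    """Return the VIKOR index Q for each alternative (lower is better).

    scores[i][j] is the score of alternative i on criterion j; all criteria
    are assumed to be benefit criteria (higher is better).
    v weights group utility against individual regret.
    """
    n_crit = len(weights)
    best = [max(row[j] for row in scores) for j in range(n_crit)]
    worst = [min(row[j] for row in scores) for j in range(n_crit)]

    S, R = [], []  # group utility and maximum individual regret
    for row in scores:
        regrets = []
        for j in range(n_crit):
            span = best[j] - worst[j]
            # Weighted normalized distance to the best value on criterion j.
            regrets.append(weights[j] * (best[j] - row[j]) / span if span else 0.0)
        S.append(sum(regrets))
        R.append(max(regrets))

    def scale(vals):
        lo, hi = min(vals), max(vals)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in vals]

    return [v * s + (1 - v) * r for s, r in zip(scale(S), scale(R))]


# Hypothetical sentiment scores for three brands on two attributes.
brand_scores = [[9, 9], [1, 1], [5, 5]]
q = vikor(brand_scores, weights=[0.5, 0.5])
ranking = sorted(range(len(q)), key=lambda i: q[i])  # best brand first
```

In this toy example the first brand dominates on both attributes and therefore receives the lowest Q value, so it is ranked first.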
A Novel Enhanced Move Recognition Algorithm Based on Pre-trained Models with Positional Embeddings
The recognition of abstracts is crucial for effectively locating the content
and clarifying the article. Existing move recognition algorithms lack the
ability to learn word position information to obtain contextual semantics. This
paper proposes a novel enhanced move recognition algorithm with an improved
pre-trained model and a gated network with attention mechanism for unstructured
abstracts of Chinese scientific and technological papers. The proposed
algorithm first performs summary data segmentation and vocabulary training. The
EP-ERNIEAT-GRU framework is leveraged to incorporate word positional
information, facilitating deep semantic learning and targeted feature
extraction. Experimental results demonstrate that the proposed algorithm
achieves 13.37 higher accuracy on the split dataset than on the original
dataset and a 7.55 improvement in accuracy over the basic comparison model.
Structural Stability of Lexical Semantic Spaces: Nouns in Chinese and French
Many studies in the neurosciences have dealt with the semantic processing of
words or categories, but few have looked into the semantic organization of the
lexicon considered as a system. The present study was designed to try to move
towards this goal, using both electrophysiological and corpus-based data, and
to compare two languages from different families: French and Mandarin Chinese.
We conducted an EEG-based semantic-decision experiment using 240 words from
eight categories (clothing, parts of a house, tools, vehicles,
fruits/vegetables, animals, body parts, and people) as the material. A
data-analysis method (correspondence analysis) commonly used in computational
linguistics was applied to the electrophysiological signals.
The present cross-language comparison indicated stability for the following
aspects of the languages' lexical semantic organizations: (1) the
living/nonliving distinction, which showed up as a main factor for both
languages; (2) greater dispersion of the living categories as compared to the
nonliving ones; (3) prototypicality of the animals category within the
living categories, and with respect to the living/nonliving distinction; and
(4) the existence of a person-centered reference gradient. Our
electrophysiological analysis indicated stability of the networks at play in
each of these processes. Stability was also observed in the data taken from
word usage in the languages (synonyms and associated words obtained from
textual corpora).
Comment: 17 pages, 4 figures