Word graphs: The third set
This is the third paper in a series on natural language processing in terms of knowledge graphs. A word is a basic unit in natural language processing, which is why we study word graphs. Word graphs were already built for prepositions and adwords (including adjectives, adverbs and Chinese quantity words) in two earlier papers. In this paper, we propose the concept of the logic word and classify logic words into groups in terms of their semantics and the way they are used in describing reasoning processes. A start is made with building the lexicon of logic words in terms of knowledge graphs.
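A minimal sketch of what a lexicon of logic words grouped by semantic class might look like as a data structure. The class names and the CAU (causal) relation label below are illustrative assumptions, not the paper's actual ontology.

    # Hypothetical lexicon: each logic word maps to a semantic class and a
    # tiny labeled graph linking abstract proposition slots.
    logic_word_lexicon = {
        "therefore": {
            "class": "conclusion",
            "graph": [("premise", "CAU", "conclusion")],
        },
        "because": {
            "class": "justification",
            "graph": [("conclusion", "CAU", "premise")],
        },
        "if": {
            "class": "condition",
            "graph": [("antecedent", "CAU", "consequent")],
        },
    }

    def words_in_class(lexicon, cls):
        """Return all logic words grouped under one semantic class."""
        return [w for w, entry in lexicon.items() if entry["class"] == cls]

    print(words_in_class(logic_word_lexicon, "condition"))  # ['if']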
How strongly do word reading times and lexical decision times correlate? Combining data from eye movement corpora and megastudies
We assess the amount of shared variance between three measures of visual word recognition latency: eye movement latencies, lexical decision times and naming times. After partialling out the effects of word frequency and word length, two well-documented predictors of word recognition latencies, we see that 7-44% of the variance is uniquely shared between lexical decision times and naming times, depending on the frequency range of the words used. A similar analysis of eye movement latencies shows that the percentage of variance they uniquely share either with lexical decision times or with naming times is much lower: it is 5-17% for gaze durations and lexical decision times in studies with target words presented in neutral sentences, but drops to 0.2% in corpus studies in which eye movements to all words are analysed. Correlations between gaze durations and naming latencies are lower still. These findings suggest that processing times in isolated word processing and continuous text reading are affected by specific task demands and presentation format, and that lexical decision times and naming times are not very informative in predicting eye movement latencies in text reading once the effects of word frequency and word length are taken into account. The difference between controlled experiments and natural reading suggests that reading strategies and stimulus materials may determine the degree to which the immediacy-of-processing assumption and the eye-mind assumption apply. Fixation times are more likely to exclusively reflect the lexical processing of the currently fixated word in controlled studies with unpredictable target words than in natural reading of sentences or texts.
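A minimal sketch of the analysis described above, on simulated rather than the study's data: partial out word frequency and length from two latency measures by regression, then correlate the residuals; the squared residual correlation estimates the variance the two measures uniquely share. The covariate effects and noise levels below are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    log_freq = rng.normal(3.0, 1.0, n)            # simulated log word frequency
    length = rng.integers(2, 12, n).astype(float)  # simulated word length

    # Two latency measures driven by the same covariates plus a shared component.
    shared = rng.normal(0, 20, n)
    lexical_decision = 600 - 30 * log_freq + 8 * length + shared + rng.normal(0, 40, n)
    naming = 500 - 20 * log_freq + 5 * length + shared + rng.normal(0, 40, n)

    def residualize(y, covariates):
        """Residuals of y after least-squares regression on the covariates."""
        X = np.column_stack([np.ones_like(y)] + covariates)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta

    r_ld = residualize(lexical_decision, [log_freq, length])
    r_nm = residualize(naming, [log_freq, length])
    r = np.corrcoef(r_ld, r_nm)[0, 1]
    print(f"partial correlation = {r:.2f}, shared variance = {100 * r * r:.1f}%")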
Effects of word processing on text revision
Revising is an evaluating and editing process that is an essential part of text production. Is text revision facilitated by the use of word processors? After examining the related research, it is difficult to conclude with certainty that the use of word processors is always effective in improving writers' revising skills, or that their use necessarily leads to the production of higher-quality texts. Their effectiveness depends on a large number of parameters (computer equipment, writing skills, task execution conditions) which psychologists are now starting to measure.
Chart-driven Connectionist Categorial Parsing of Spoken Korean
Most speech and natural language systems developed for English and other Indo-European languages neglect morphological processing and integrate speech and natural language at the word level. For agglutinative languages such as Korean and Japanese, however, morphological processing plays a major role in language processing, since these languages have very complex morphological phenomena and relatively simple syntactic functionality. Degenerate morphological processing limits the usable vocabulary size of the system, and a word-level dictionary results in an exponential explosion in the number of dictionary entries. For agglutinative languages, we need sub-word-level integration, which leaves room for general morphological processing. In this paper, we develop a phoneme-level integration model of speech and linguistic processing through general morphological analysis for agglutinative languages, together with an efficient parsing scheme for that integration. Korean is modeled lexically, based on the categorial grammar formalism with unordered argument and suppressed category extensions, and a chart-driven connectionist parsing method is introduced.
Comment: 6 pages, Postscript file, Proceedings of ICCPOL'9
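A minimal sketch of chart parsing with a categorial grammar, using plain forward/backward application on an English toy sentence. It illustrates only the general chart mechanism; the paper's Korean lexical model, its category extensions, and its connectionist component are not reproduced here, and the toy lexicon is an assumption.

    # Toy categorial lexicon: 'saw' is a transitive verb (S\NP)/NP.
    LEXICON = {
        "mary": ["NP"],
        "saw": ["(S\\NP)/NP"],
        "john": ["NP"],
    }

    def combine(left, right):
        """Forward application X/Y + Y -> X; backward application Y + X\\Y -> X."""
        results = []
        if left.endswith("/" + right):
            results.append(left[: -len("/" + right)].strip("()"))
        if right.endswith("\\" + left):
            results.append(right[: -len("\\" + left)].strip("()"))
        return results

    def parse(words):
        n = len(words)
        # chart[i][j] holds the categories spanning words i..i+j inclusive.
        chart = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            chart[i][0].update(LEXICON[w])
        for span in range(1, n):
            for i in range(n - span):
                for k in range(span):  # split point between the two sub-spans
                    for l in chart[i][k]:
                        for r in chart[i + k + 1][span - k - 1]:
                            chart[i][span].update(combine(l, r))
        return chart[0][n - 1]

    print(parse(["mary", "saw", "john"]))  # {'S'} if the derivation succeeds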
WordFence: Text Detection in Natural Images with Border Awareness
In recent years, text recognition has achieved remarkable success in recognizing scanned document text. However, word recognition in natural images is still an open problem, and generally requires time-consuming post-processing steps. We present a novel architecture for individual word detection in scene images based on semantic segmentation. Our contributions are twofold: the concept of WordFence, which detects border areas surrounding each individual word, and a novel pixelwise weighted softmax loss function which penalizes background and emphasizes small text regions. WordFence ensures that each word is detected individually, and the new loss function provides a strong training signal for both text and word border localization. The proposed technique avoids intensive post-processing, producing an end-to-end word detection system. We achieve superior localization recall on common benchmark datasets - 92% recall on ICDAR11 and ICDAR13 and 63% recall on SVT. Furthermore, our end-to-end word recognition system achieves a state-of-the-art 86% F-score on ICDAR13.
Comment: 5 pages, 2 figures, ICIP 201
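A minimal numpy sketch of a pixelwise weighted cross-entropy over a softmax, in the spirit of the loss described above. The specific weighting scheme here (upweighting border and word-interior pixels relative to background) is an assumption, not the paper's formula.

    import numpy as np

    def weighted_softmax_loss(logits, labels, pixel_weights):
        """logits: (H, W, C); labels: (H, W) int class ids; pixel_weights: (H, W)."""
        # Numerically stable log-softmax over the class axis.
        z = logits - logits.max(axis=-1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
        h, w = labels.shape
        # Negative log-likelihood of each pixel's true class.
        nll = -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
        return (pixel_weights * nll).sum() / pixel_weights.sum()

    # Toy 4x4 image, 3 classes: 0 background, 1 word interior, 2 word border.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 4, 3))
    labels = rng.integers(0, 3, size=(4, 4))
    weights = np.where(labels == 2, 5.0, np.where(labels == 1, 2.0, 1.0))
    print(weighted_softmax_loss(logits, labels, weights))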
An implementation of Apertium based Assamese morphological analyzer
Morphological analysis is an important branch of linguistics for any natural language processing technology. Morphology studies the structure and formation of the words of a language. In current NLP research, morphological analysis techniques have become increasingly popular. To process any language, the morphology of its words should first be analyzed. The Assamese language has a very complex morphological structure. In our work, we have used Apertium-based finite-state transducers to develop a morphological analyzer for the Assamese language in a limited domain, and we obtain 72.7% accuracy.
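A minimal sketch of the finite-state idea behind such an analyzer: a stem lexicon plus a suffix table, combined by suffix matching. The romanized Assamese forms and tags below are hypothetical illustrations; a real Apertium analyzer compiles dictionary (.dix) entries into an actual transducer.

    STEMS = {"kitap": "kitap<n>"}   # hypothetical stem entry: 'book'
    SUFFIXES = {
        "": "<sg>",                  # bare stem, singular (assumed tagging)
        "bor": "<pl>",               # hypothetical plural suffix
        "borok": "<pl><acc>",        # hypothetical plural + case suffix
    }

    def analyze(surface):
        """Return stem<tags> analyses, mimicking a lexical transducer's output."""
        analyses = []
        for suffix, tags in SUFFIXES.items():
            stem = surface[: len(surface) - len(suffix)] if suffix else surface
            if surface.endswith(suffix) and stem in STEMS:
                analyses.append(STEMS[stem] + tags)
        return analyses or ["*" + surface]  # '*' marks unknown words, as in Apertium

    print(analyze("kitapbor"))  # ['kitap<n><pl>'] under these toy entries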
Better Word Embeddings by Disentangling Contextual n-Gram Information
Pre-trained word vectors are ubiquitous in Natural Language Processing applications. In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings results in improved unigram embeddings. We claim that training word embeddings along with higher-order n-gram embeddings helps remove contextual information from the unigrams, resulting in better stand-alone word embeddings. We empirically show the validity of our hypothesis by outperforming other competing word representation models by a significant margin on a wide variety of tasks. We make our models publicly available.
Comment: NAACL 201
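A minimal sketch of the core idea: score a (target, context) pair with the sum of unigram and bigram context vectors, so recurring multi-word patterns can be absorbed by the n-gram vectors rather than baked into the unigram embeddings. The SGNS-style update, dimensions, and vocabulary below are assumptions, not the paper's exact training setup.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, lr = 50, 0.05
    uni = {w: rng.normal(scale=0.1, size=dim) for w in ["new", "york", "city"]}
    bi = {b: rng.normal(scale=0.1, size=dim) for b in ["new_york"]}
    tgt = {w: rng.normal(scale=0.1, size=dim) for w in ["city"]}

    def sgns_step(target, context_unis, context_bis, label):
        """One negative-sampling step; label is 1 for observed pairs, 0 for noise."""
        # Context representation: unigram vectors plus co-occurring n-gram vectors.
        ctx = sum(uni[w] for w in context_unis) + sum(bi[b] for b in context_bis)
        p = 1.0 / (1.0 + np.exp(-tgt[target] @ ctx))
        g = lr * (label - p)  # gradient of the logistic loss
        for w in context_unis:
            uni[w] += g * tgt[target]
        for b in context_bis:
            bi[b] += g * tgt[target]
        tgt[target] += g * ctx

    sgns_step("city", ["new", "york"], ["new_york"], label=1)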
