Word graphs: The third set
This is the third paper in a series on natural language processing in terms of knowledge graphs. A word is a basic unit in natural language processing, which is why we study word graphs. Word graphs were already built for prepositions and adwords (including adjectives, adverbs and Chinese quantity words) in two earlier papers. In this paper, we propose the concept of the logic word and classify logic words into groups in terms of their semantics and the way they are used in describing reasoning processes. A start is made with building the lexicon of logic words in terms of knowledge graphs.
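A minimal sketch of what a lexicon of logic words grouped by semantic class might look like as a data structure. The class names and the CAU (causal) relation label below are illustrative assumptions, not the paper's actual ontology.

    # Hypothetical lexicon: each logic word maps to a semantic class and a
    # tiny labeled graph linking abstract proposition slots.
    logic_word_lexicon = {
        "therefore": {
            "class": "conclusion",
            "graph": [("premise", "CAU", "conclusion")],
        },
        "because": {
            "class": "justification",
            "graph": [("conclusion", "CAU", "premise")],
        },
        "if": {
            "class": "condition",
            "graph": [("antecedent", "CAU", "consequent")],
        },
    }

    def words_in_class(lexicon, cls):
        """Return all logic words grouped under one semantic class."""
        return [w for w, entry in lexicon.items() if entry["class"] == cls]

    print(words_in_class(logic_word_lexicon, "condition"))  # ['if']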
How strongly do word reading times and lexical decision times correlate? Combining data from eye movement corpora and megastudies
We assess the amount of shared variance between three measures of visual word recognition latency: eye movement latencies, lexical decision times and naming times. After partialling out the effects of word frequency and word length, two well-documented predictors of word recognition latencies, we see that 7-44% of the variance is uniquely shared between lexical decision times and naming times, depending on the frequency range of the words used. A similar analysis of eye movement latencies shows that the percentage of variance they uniquely share either with lexical decision times or with naming times is much lower: it is 5-17% for gaze durations and lexical decision times in studies with target words presented in neutral sentences, but drops to 0.2% in corpus studies in which eye movements to all words are analysed. Correlations between gaze durations and naming latencies are lower still. These findings suggest that processing times in isolated word processing and continuous text reading are affected by specific task demands and presentation format, and that lexical decision times and naming times are not very informative in predicting eye movement latencies in text reading once the effects of word frequency and word length are taken into account. The difference between controlled experiments and natural reading suggests that reading strategies and stimulus materials may determine the degree to which the immediacy-of-processing assumption and the eye-mind assumption apply. Fixation times are more likely to exclusively reflect the lexical processing of the currently fixated word in controlled studies with unpredictable target words than in natural reading of sentences or texts.
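A minimal sketch of the analysis described above, on simulated rather than the study's data: partial out word frequency and length from two latency measures by regression, then correlate the residuals; the squared residual correlation estimates the variance the two measures uniquely share. The covariate effects and noise levels below are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    log_freq = rng.normal(3.0, 1.0, n)            # simulated log word frequency
    length = rng.integers(2, 12, n).astype(float)  # simulated word length

    # Two latency measures driven by the same covariates plus a shared component.
    shared = rng.normal(0, 20, n)
    lexical_decision = 600 - 30 * log_freq + 8 * length + shared + rng.normal(0, 40, n)
    naming = 500 - 20 * log_freq + 5 * length + shared + rng.normal(0, 40, n)

    def residualize(y, covariates):
        """Residuals of y after least-squares regression on the covariates."""
        X = np.column_stack([np.ones_like(y)] + covariates)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta

    r_ld = residualize(lexical_decision, [log_freq, length])
    r_nm = residualize(naming, [log_freq, length])
    r = np.corrcoef(r_ld, r_nm)[0, 1]
    print(f"partial correlation = {r:.2f}, shared variance = {100 * r * r:.1f}%")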
Effects of word processing on text revision
Revising is an evaluating and editing process that is an essential part of text production. Is text revision facilitated by the use of word processors? After examining the related research, it is difficult to conclude with certainty that the use of word processors is always effective in improving writers' revising skills, or that their use necessarily leads to the production of higher-quality texts. Their effectiveness depends on a large number of parameters (computer equipment, writing skills, task execution conditions) which psychologists are now starting to measure.
Chart-driven Connectionist Categorial Parsing of Spoken Korean
Most speech and natural language systems developed for English and other Indo-European languages neglect morphological processing and integrate speech and natural language at the word level. For agglutinative languages such as Korean and Japanese, however, morphological processing plays a major role in language processing, since these languages have very complex morphological phenomena and relatively simple syntactic functionality. Degenerate morphological processing limits the usable vocabulary size of the system, and a word-level dictionary results in an exponential explosion in the number of dictionary entries. For agglutinative languages, we need sub-word-level integration, which leaves room for general morphological processing. In this paper, we develop a phoneme-level integration model of speech and linguistic processing through general morphological analysis for agglutinative languages, together with an efficient parsing scheme for that integration. Korean is modeled lexically, based on the categorial grammar formalism with unordered argument and suppressed category extensions, and a chart-driven connectionist parsing method is introduced.
Comment: 6 pages, Postscript file, Proceedings of ICCPOL'9
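A minimal sketch of chart parsing with a categorial grammar, using plain forward/backward application on an English toy sentence. It illustrates only the general chart mechanism; the paper's Korean lexical model, its category extensions, and its connectionist component are not reproduced here, and the toy lexicon is an assumption.

    # Toy categorial lexicon: 'saw' is a transitive verb (S\NP)/NP.
    LEXICON = {
        "mary": ["NP"],
        "saw": ["(S\\NP)/NP"],
        "john": ["NP"],
    }

    def combine(left, right):
        """Forward application X/Y + Y -> X; backward application Y + X\\Y -> X."""
        results = []
        if left.endswith("/" + right):
            results.append(left[: -len("/" + right)].strip("()"))
        if right.endswith("\\" + left):
            results.append(right[: -len("\\" + left)].strip("()"))
        return results

    def parse(words):
        n = len(words)
        # chart[i][j] holds the categories spanning words i..i+j inclusive.
        chart = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            chart[i][0].update(LEXICON[w])
        for span in range(1, n):
            for i in range(n - span):
                for k in range(span):  # split point between the two sub-spans
                    for l in chart[i][k]:
                        for r in chart[i + k + 1][span - k - 1]:
                            chart[i][span].update(combine(l, r))
        return chart[0][n - 1]

    print(parse(["mary", "saw", "john"]))  # {'S'} if the derivation succeeds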
WordFence: Text Detection in Natural Images with Border Awareness
In recent years, text recognition has achieved remarkable success in recognizing scanned document text. However, word recognition in natural images is still an open problem, and generally requires time-consuming post-processing steps. We present a novel architecture for individual word detection in scene images based on semantic segmentation. Our contributions are twofold: the concept of WordFence, which detects border areas surrounding each individual word, and a novel pixelwise weighted softmax loss function which penalizes background and emphasizes small text regions. WordFence ensures that each word is detected individually, and the new loss function provides a strong training signal for both text and word border localization. The proposed technique avoids intensive post-processing, producing an end-to-end word detection system. We achieve superior localization recall on common benchmark datasets - 92% recall on ICDAR11 and ICDAR13 and 63% recall on SVT. Furthermore, our end-to-end word recognition system achieves a state-of-the-art 86% F-score on ICDAR13.
Comment: 5 pages, 2 figures, ICIP 201
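A minimal numpy sketch of a pixelwise weighted cross-entropy over a softmax, in the spirit of the loss described above. The specific weighting scheme here (upweighting border and word-interior pixels relative to background) is an assumption, not the paper's formula.

    import numpy as np

    def weighted_softmax_loss(logits, labels, pixel_weights):
        """logits: (H, W, C); labels: (H, W) int class ids; pixel_weights: (H, W)."""
        # Numerically stable log-softmax over the class axis.
        z = logits - logits.max(axis=-1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
        h, w = labels.shape
        # Negative log-likelihood of each pixel's true class.
        nll = -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
        return (pixel_weights * nll).sum() / pixel_weights.sum()

    # Toy 4x4 image, 3 classes: 0 background, 1 word interior, 2 word border.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 4, 3))
    labels = rng.integers(0, 3, size=(4, 4))
    weights = np.where(labels == 2, 5.0, np.where(labels == 1, 2.0, 1.0))
    print(weighted_softmax_loss(logits, labels, weights))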
An implementation of Apertium based Assamese morphological analyzer
Morphological analysis is an important branch of linguistics for any natural language processing technology. Morphology studies the structure and formation of the words of a language. In current NLP research, morphological analysis techniques have become increasingly popular. To process any language, the morphology of its words should first be analyzed. The Assamese language has a very complex morphological structure. In our work, we have used Apertium-based finite-state transducers to develop a morphological analyzer for the Assamese language in a limited domain, and we obtain 72.7% accuracy.
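A minimal sketch of the finite-state idea behind such an analyzer: a stem lexicon plus a suffix table, combined by suffix matching. The romanized Assamese forms and tags below are hypothetical illustrations; a real Apertium analyzer compiles dictionary (.dix) entries into an actual transducer.

    STEMS = {"kitap": "kitap<n>"}   # hypothetical stem entry: 'book'
    SUFFIXES = {
        "": "<sg>",                  # bare stem, singular (assumed tagging)
        "bor": "<pl>",               # hypothetical plural suffix
        "borok": "<pl><acc>",        # hypothetical plural + case suffix
    }

    def analyze(surface):
        """Return stem<tags> analyses, mimicking a lexical transducer's output."""
        analyses = []
        for suffix, tags in SUFFIXES.items():
            stem = surface[: len(surface) - len(suffix)] if suffix else surface
            if surface.endswith(suffix) and stem in STEMS:
                analyses.append(STEMS[stem] + tags)
        return analyses or ["*" + surface]  # '*' marks unknown words, as in Apertium

    print(analyze("kitapbor"))  # ['kitap<n><pl>'] under these toy entries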
Better Word Embeddings by Disentangling Contextual n-Gram Information
Pre-trained word vectors are ubiquitous in Natural Language Processing applications. In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings results in improved unigram embeddings. We claim that training word embeddings along with higher-order n-gram embeddings helps remove contextual information from the unigrams, resulting in better stand-alone word embeddings. We empirically show the validity of our hypothesis by outperforming other competing word representation models by a significant margin on a wide variety of tasks. We make our models publicly available.
Comment: NAACL 201
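A minimal sketch of the core idea: score a (target, context) pair with the sum of unigram and bigram context vectors, so recurring multi-word patterns can be absorbed by the n-gram vectors rather than baked into the unigram embeddings. The SGNS-style update, dimensions, and vocabulary below are assumptions, not the paper's exact training setup.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, lr = 50, 0.05
    uni = {w: rng.normal(scale=0.1, size=dim) for w in ["new", "york", "city"]}
    bi = {b: rng.normal(scale=0.1, size=dim) for b in ["new_york"]}
    tgt = {w: rng.normal(scale=0.1, size=dim) for w in ["city"]}

    def sgns_step(target, context_unis, context_bis, label):
        """One negative-sampling step; label is 1 for observed pairs, 0 for noise."""
        # Context representation: unigram vectors plus co-occurring n-gram vectors.
        ctx = sum(uni[w] for w in context_unis) + sum(bi[b] for b in context_bis)
        p = 1.0 / (1.0 + np.exp(-tgt[target] @ ctx))
        g = lr * (label - p)  # gradient of the logistic loss
        for w in context_unis:
            uni[w] += g * tgt[target]
        for b in context_bis:
            bi[b] += g * tgt[target]
        tgt[target] += g * ctx

    sgns_step("city", ["new", "york"], ["new_york"], label=1)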
