
    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    In automatic text summarization, preprocessing is an important phase for reducing the space of the textual representation. Classically, stemming and lemmatization have been widely used to normalize words. However, even with normalization, the curse of dimensionality can degrade summarizer performance on large texts. This paper describes a new word-normalization method that reduces the representation space further: each word is reduced to its initial letters, a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced from this representation, but often improves system performance dramatically. Summaries of trilingual corpora were evaluated automatically with Fresa; the results confirm an increase in performance regardless of the summarizer used.
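    A minimal sketch of the idea as the abstract describes it: truncate every word to its first n letters. The function name and the default n = 1 are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of "ultra-stemming" as described above: every word is
# truncated to its first n letters. The function name and the default
# n = 1 are illustrative assumptions, not the authors' implementation.

def ultra_stem(text: str, n: int = 1) -> list[str]:
    """Reduce each word to its first n letters, lowercased."""
    return [word[:n].lower() for word in text.split()]

print(ultra_stem("Stemming reduces the space of textual representation"))
# -> ['s', 'r', 't', 's', 'o', 't', 'r']
```

    With n = 1 the vocabulary collapses to at most the alphabet size, which is the extreme dimensionality reduction the abstract describes.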

    Consecutive retrieval with redundancy: an optimal linear and an optimal cyclic arrangement and their storage space requirements

    Keywords: information retrieval, file organization, consecutive retrieval property, consecutive retrieval with redundancy, storage space requirements
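    For illustration, a toy sketch of the consecutive retrieval (CR) property the keywords refer to: a linear arrangement of records has the CR property with respect to a query set if each query's relevant records occupy consecutive storage locations. The record names and queries below are invented, and duplicates are not handled, so this does not model the paper's redundancy-based arrangements.

```python
# Toy check of the consecutive retrieval (CR) property (illustrative only;
# record names and queries are invented, and duplicated records are not
# handled, so this is not the paper's redundancy-based construction).

def has_cr_property(arrangement, queries):
    positions = {record: i for i, record in enumerate(arrangement)}
    for query in queries:
        idx = sorted(positions[r] for r in query)
        if idx[-1] - idx[0] + 1 != len(idx):  # a gap: not consecutive
            return False
    return True

print(has_cr_property(["a", "b", "c"], [{"a", "b"}, {"b", "c"}]))  # True
print(has_cr_property(["a", "b", "c"], [{"a", "c"}]))              # False
```

    In the second example, storing a record redundantly (e.g., appending a second copy of "a" after "c") would restore the property; that storage/redundancy trade-off is the kind of question the paper addresses.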

    A magnetic stimulation examination of orthographic neighborhood effects in visual word recognition

    The split-fovea theory proposes that visual word recognition is mediated by the splitting of the foveal image, with letters to the left of fixation projected to the right hemisphere (RH) and letters to the right of fixation projected to the left hemisphere (LH). We applied repetitive transcranial magnetic stimulation (rTMS) over the left and right occipital cortex during a lexical decision task to investigate the extent to which word recognition processes could be accounted for by the split-fovea theory. Unilateral rTMS significantly impaired lexical decision latencies to centrally presented words, supporting the suggestion that the foveal representation of words is split between the cerebral hemispheres rather than bilateral. Behaviorally, we showed that words with many orthographic neighbors sharing the same initial letters ("lead neighbors") facilitated lexical decision more than words with few lead neighbors. This effect did not apply to end neighbors (orthographic neighbors sharing the same final letters). Crucially, rTMS over the RH impaired lead- but not end-neighborhood facilitation. The results support the split-fovea theory, in which the RH has primacy in representing the lead neighbors of a written word.
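    To make the neighbor definitions concrete, a hedged sketch: an orthographic neighbor is taken here to be a same-length word differing in exactly one letter; a lead neighbor shares the initial letters (the mismatch falls in the second half of the word), and an end neighbor shares the final letters. The half-split rule and the toy lexicon are assumptions for illustration, not the study's operationalization.

```python
# Hedged sketch of "lead" vs. "end" orthographic neighbors as glossed in
# the abstract. The one-letter-substitution neighbor definition, the
# half-split rule, and the toy lexicon are illustrative assumptions.

def neighbors(word: str, lexicon: set[str]) -> tuple[set[str], set[str]]:
    lead, end = set(), set()
    for cand in lexicon:
        if len(cand) != len(word) or cand == word:
            continue
        diffs = [i for i, (a, b) in enumerate(zip(word, cand)) if a != b]
        if len(diffs) == 1:  # exactly one substituted letter
            (lead if diffs[0] >= len(word) // 2 else end).add(cand)
    return lead, end

lead, end = neighbors("word", {"word", "wore", "ward", "cord", "work"})
print(sorted(lead), sorted(end))  # ['wore', 'work'] ['cord', 'ward']
```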

    Does Phenomenal Consciousness Overflow Attention? An Argument from Feature-Integration

    In the past two decades a number of arguments have been given in favor of the possibility of phenomenal consciousness without attentional access, otherwise known as phenomenal overflow. This paper shows that the empirical data commonly cited in support of this thesis are, at best, ambiguous between two equally plausible interpretations, one of which does not posit phenomenology beyond attention. Next, after citing evidence for the feature-integration theory of attention, the paper gives an account of the relationship between consciousness and attention that accommodates both the empirical data and our phenomenological intuitions without positing phenomenal consciousness beyond attention. Having undercut the motivation for accepting phenomenal overflow and given reasons to think it does not occur, I end with the tentative conclusion that attention is a necessary condition for phenomenal consciousness.

    The SLS-Berlin: Validation of a German Computer-Based Screening Test to Measure Reading Proficiency in Early and Late Adulthood

    Reading proficiency, i.e., successfully integrating early word-based information and utilizing this information in later processes of sentence and text comprehension, and its assessment are subject to extensive research. However, screening tests for German adults across the life span are virtually non-existent. The present article therefore introduces a standardized, computerized, sentence-based screening measure of reading proficiency for German adult readers, including norm data from 2,148 participants covering an age range from 16 to 88 years. The test was developed in accordance with the children's version of the Salzburger LeseScreening (SLS; Wimmer and Mayringer, 2014). The SLS-Berlin has high reliability and can easily be implemented in any research setting using the German language. We present a detailed description of the test and report the distribution of SLS-Berlin scores for the norm sample as well as for two subsamples of younger (below 60 years) and older (60 and older) adults. For all three samples, we conducted regression analyses to investigate the relationship between sentence characteristics and SLS-Berlin scores. In a second validation study, SLS-Berlin scores were compared with two (pseudo)word reading tests, a test measuring attention and processing speed, and eye movements recorded during expository text reading. Our results confirm the SLS-Berlin's sensitivity to early word decoding and later text-related comprehension processes. The test distinguished very well between skilled and less skilled readers, and also among less skilled readers, and is therefore a powerful and efficient screening test for assessing interindividual levels of reading proficiency in German adults.

    What May Visualization Processes Optimize?

    In this paper, we present an abstract model of visualization and inference processes and describe an information-theoretic measure for optimizing such processes. To obtain this abstraction, we first examined six classes of workflows in data analysis and visualization and identified four levels of typical visualization components, namely disseminative, observational, analytical, and model-developmental visualization. We noticed a common phenomenon across these levels: the transformation of data spaces (referred to as alphabets) usually corresponds to a reduction of maximal entropy along a workflow. Based on this observation, we establish an information-theoretic cost-benefit ratio that may be used as a cost function for optimizing a data visualization process. To demonstrate the validity of this measure, we examined a number of successful visualization processes in the literature and showed that the measure can mathematically explain their advantages over possible alternatives.
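    As a hedged illustration of the entropy observation (not the paper's cost-benefit measure itself): the maximal entropy of an alphabet with n distinct symbols is log2(n) bits, so a workflow that maps large data spaces onto progressively smaller ones can only hold or lower that bound. The stage names and alphabet sizes below are invented placeholders.

```python
# Hedged illustration (invented stages and sizes, not the paper's data):
# the maximal entropy of an alphabet with n distinct symbols is log2(n)
# bits, so transforming data into progressively smaller alphabets along
# a workflow can only keep or reduce that bound.
import math

workflow = [
    ("raw data records",   2 ** 20),  # states representable in the input
    ("derived statistics", 2 ** 10),  # states after analytical reduction
    ("visual categories",  8),        # states a viewer distinguishes
    ("final decision",     2),        # binary judgement
]

for stage, alphabet_size in workflow:
    print(f"{stage:>18}: maximal entropy = {math.log2(alphabet_size):5.1f} bits")
```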

    Individual differences in adult handwritten spelling-to-dictation

    We report an investigation of individual differences in handwriting latencies and error rates in a spelling-to-dictation task. Eighty adult participants wrote a list of 164 spoken words (presented in two sessions) and were also evaluated on a vocabulary test (Deltour, 1993). Various multiple regression analyses were performed on both writing latencies and errors. The analysis of the item means showed that the reliable predictors of spelling latencies were acoustic duration, cumulative word frequency, phonology-to-orthography (PO) consistency, the number of letters in the word, and the interaction between cumulative word frequency, PO consistency, and imageability. (Error rates were also predicted by frequency, consistency, length, and the same interaction.) The analysis of the participant means (and trials) showed that (1) there was both within- and between-session reliability across the sets of items, (2) there was no trade-off between the use of lexical and non-lexical information, and (3) participants with high vocabulary knowledge were more accurate (and somewhat faster) than those with low vocabulary knowledge, and showed a differential sensitivity to certain stimulus characteristics. We discuss the implications of these findings for theories of orthographic word production.
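    As a hedged sketch of the kind of item-level regression reported above (the data, column names, and effect sizes are synthetic placeholders; only the predictor set mirrors the abstract), such an analysis might look as follows with statsmodels:

```python
# Hedged sketch of an item-level regression like the one described above.
# The data are synthetic and the generating effect sizes invented; only
# the predictor set (duration, length, frequency x consistency x
# imageability) mirrors the abstract. Requires numpy, pandas, statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200  # synthetic "items"
items = pd.DataFrame({
    "duration":     rng.normal(450, 50, n),    # acoustic duration (ms)
    "frequency":    rng.normal(3.0, 1.0, n),   # log cumulative frequency
    "consistency":  rng.uniform(0.3, 1.0, n),  # PO consistency
    "imageability": rng.uniform(2.0, 6.0, n),  # rating scale
    "n_letters":    rng.integers(3, 10, n),    # word length
})
# Invented generating model for the synthetic latencies (ms).
items["latency"] = (300 + 0.5 * items["duration"] - 30 * items["frequency"]
                    - 80 * items["consistency"] + 15 * items["n_letters"]
                    + rng.normal(0, 40, n))

# The three-way term expands to all main effects and interactions.
model = smf.ols(
    "latency ~ duration + n_letters + frequency * consistency * imageability",
    data=items,
).fit()
print(model.params)
```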

    Stochastic accumulation of feature information in perception and memory

    It is now well established that the time course of perceptual processing influences the first second or so of performance in a wide variety of cognitive tasks. Over the last 20 years, there has been a shift from modeling the speed at which a display is processed to modeling the speed at which different features of the display are perceived, and formalizing how this perceptual information is used in decision making. The first of these models (Lamberts, 1995) was implemented to fit the time course of performance in a speeded perceptual categorization task and assumed a simple stochastic accumulation of feature information. Subsequently, similar approaches have been used to model performance in a range of cognitive tasks including identification, absolute identification, perceptual matching, recognition, visual search, and word processing, again assuming a simple stochastic accumulation of feature information from both the stimulus and representations held in memory. These models are typically fit to data from signal-to-respond experiments, in which the effect of stimulus exposure duration on performance is examined, but response times (RTs) and RT distributions have also been modeled. In this article, we review this approach and explore the insights it has provided about the interplay between perceptual processing, memory retrieval, and decision making in a variety of tasks. In doing so, we highlight how such approaches can continue to contribute usefully to our understanding of cognition.
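    A minimal simulation sketch of the general feature-sampling idea: features of a stimulus become available stochastically over time, here with independent exponential "perception" times per feature, so the probability a feature has been perceived by exposure duration t is 1 - exp(-q t). The exponential assumption matches the spirit of Lamberts (1995), but the rates and decision-free code below are illustrative, not any published model's parameters.

```python
# Sketch of stochastic feature accumulation: each feature i becomes
# available after an independent exponential waiting time with rate q_i,
# so P(feature i perceived by t) = 1 - exp(-q_i * t). Rates are invented.
import numpy as np

rng = np.random.default_rng(1)

def perceived_features(rates: np.ndarray, t: float) -> np.ndarray:
    """Simulate which features have been perceived by exposure duration t."""
    arrival_times = rng.exponential(1.0 / rates)  # one draw per feature
    return arrival_times <= t

rates = np.array([2.0, 1.0, 0.5])  # fast, medium, and slow features
for t in (0.2, 0.5, 1.5):
    sim = np.mean([perceived_features(rates, t) for _ in range(10_000)], axis=0)
    print(f"t={t}: simulated P(perceived) = {sim.round(2)}, "
          f"theoretical = {(1 - np.exp(-rates * t)).round(2)}")
```

    Fitting such a model to signal-to-respond data then amounts to asking how the set of features available at each exposure duration feeds the decision process.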