1,464 research outputs found
Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation
Text embedded in multimedia documents carries important semantic information that helps to access the content automatically. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second avoids the segmentation step by integrating a multi-scale scanning scheme that jointly localizes and recognizes characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images, and the reported results show that the proposed approaches outperform the state-of-the-art methods.
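The multi-scale scanning idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' system: the function name `multi_scale_windows`, the window widths, and the stride rule are all assumptions chosen for illustration. It only enumerates the candidate windows (position and width) that a character classifier would then score.

```python
from typing import Iterator, List, Tuple


def multi_scale_windows(
    line_width: int,
    window_widths: List[int],
    stride_fraction: float = 0.25,
) -> Iterator[Tuple[int, int]]:
    """Yield (x_start, width) for every window position at every scale.

    Hypothetical helper: widths and stride are illustrative assumptions,
    not parameters taken from the paper.
    """
    for width in window_widths:
        if width > line_width:
            continue  # this scale does not fit in the text line
        stride = max(1, int(width * stride_fraction))
        for x_start in range(0, line_width - width + 1, stride):
            yield (x_start, width)


# Enumerate windows over a 32-pixel-wide text line at two scales.
windows = list(multi_scale_windows(32, [8, 16]))
```

In a segmentation-free recognizer, each window would be passed to a classifier that outputs either a character label or a "no character" score, and a later step would pick the best-scoring sequence of windows.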
Uncovering the myth of learning to read Chinese characters: phonetic, semantic, and orthographic strategies used by Chinese as foreign language learners
Oral Session 6A: Lexical modeling, no. 6A.3. Chinese is considered one of the most challenging orthographies for non-native speakers to learn, particularly its characters. The Chinese character is the basic reading unit, in which sound, form and meaning converge. The predominant type of Chinese character is the semantic-phonetic compound, which is composed of phonetic and semantic radicals that give clues to the sound and the meaning, respectively. Over the last two decades, psycholinguistic research has made significant progress in specifying the roles of phonetic and semantic radicals in character processing among native Chinese speakers …
(Dis)connections between specific language impairment and dyslexia in Chinese
Poster Session, no. 26P.40. Specific language impairment (SLI) and dyslexia describe language-learning impairments that occur in the absence of a sensory, cognitive, or psychosocial impairment. SLI is primarily defined by an impairment in oral language, and dyslexia by a deficit in the reading of written words. SLI and dyslexia co-occur in school-age children learning English, with rates ranging from 17% to 75%. For children learning Chinese, SLI and dyslexia also co-occur. Wong et al. (2010) first reported on the presence of dyslexia in a clinical sample of 6- to 11-year-old school-age children with SLI. The study compared the reading-related cognitive skills of children with SLI and dyslexia (SLI-D) with 2 groups of children …
Recognition of Japanese handwritten characters with Machine learning techniques
The recognition of Japanese handwritten characters has always been a challenge for researchers. A large number of classes, their graphic complexity, and the existence of three different writing systems make this problem particularly difficult compared to Western writing. For decades, attempts have been made to address the problem using traditional OCR (Optical Character Recognition) techniques, with mixed results. With the recent popularization of machine learning techniques through neural networks, this research has been revitalized, bringing new approaches to the problem. These new results achieve performance levels comparable to human recognition. Furthermore, these new techniques have allowed collaboration with very different disciplines, such as the Humanities or East Asian studies, achieving advances in them that would not have been possible without this interdisciplinary work. In this thesis, these techniques are explored until reaching a sufficient level of understanding that allows us to carry out our own experiments, training neural network models with public datasets of Japanese characters. However, the scarcity of public datasets makes the task of researchers remarkably difficult. Our proposal to minimize this problem is the development of a web application that allows researchers to easily collect samples of Japanese characters through the collaboration of any user. Once the application is fully operational, the examples collected until that point will be used to create a new dataset in a specific format. Finally, we can use the new data to carry out comparative experiments with the previous neural network models
Detecting Multilingual Lines of Text with Fusion Moves
This thesis proposes an optimization-based algorithm for detecting lines of text in images taken by hand-held cameras. The majority of existing methods for this problem assume alphabet-based texts (e.g. in Latin or Greek) and use heuristics specific to such texts: proximity between letters within one line, larger distance between separate lines, etc. We are interested in a more challenging problem where images combine alphabetic and logographic characters from multiple languages whose typographic rules vary widely (e.g. English, Korean, and Chinese). The significantly higher complexity of fitting multiple lines of text in different languages calls for an energy-based formulation combining a data fidelity term and a regularization prior. Our data cost combines geometric errors and likelihoods given by a classifier trained on low-level features in each language. Our regularization term encourages sparsity based on label costs. Our energy can be efficiently minimized by fusion moves. The algorithm was evaluated on a database of images from the subway of the Seoul metropolitan area and proved to be robust.
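As a reading aid, an energy of the kind the abstract above describes (a data term plus a label-cost sparsity prior) is commonly written in the following generic form. The symbols are assumptions introduced here for illustration; the thesis itself may define its terms differently.

```latex
% Generic label-cost energy (illustrative notation, not taken from the thesis):
%   f      : labeling that assigns each detected character p to a line label f_p
%   D_p    : data cost of assigning p to line f_p (geometric error plus
%            classifier-based likelihood term)
%   h_\ell : fixed cost paid once for every line label \ell that is used
E(f) \;=\; \sum_{p} D_p(f_p) \;+\; \sum_{\ell \in \mathcal{L}} h_\ell \, \delta_\ell(f),
\qquad
\delta_\ell(f) =
\begin{cases}
1 & \text{if } \exists\, p : f_p = \ell,\\
0 & \text{otherwise.}
\end{cases}
```

The second sum penalizes each distinct line that is used, which is what makes the solution sparse: a new line is introduced only when it lowers the data cost by more than its label cost.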
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in, and retrieved from, a pre-attentional store during this task.
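The reported statistic F(1,4)=2.565, p=0.185 can be checked by hand. The sketch below assumes only the standard identity that an F statistic with (1, ν) degrees of freedom is the square of a Student t statistic with ν degrees of freedom, and uses the closed-form t CDF for ν = 4. The function names are illustrative.

```python
import math


def t4_cdf(t: float) -> float:
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1.0 - (t * t / u) / 12.0)


def p_value_f_1_4(f_stat: float) -> float:
    """Upper-tail p-value of an F(1, 4) statistic via the two-sided t(4) test."""
    t = math.sqrt(f_stat)
    return 2.0 * (1.0 - t4_cdf(t))


p = p_value_f_1_4(2.565)
print(round(p, 4))
```

The computed value agrees with the reported p=0.185 to within rounding of the F statistic, so the non-significant result in the abstract is internally consistent.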
Script Effects as the Hidden Drive of the Mind, Cognition, and Culture
This open access volume reveals the hidden power of the script we read in and how it shapes and drives our minds, ways of thinking, and cultures. Expanding on the Linguistic Relativity Hypothesis (i.e., the idea that language affects the way we think), this volume proposes the “Script Relativity Hypothesis” (i.e., the idea that the script in which we read affects the way we think) by offering a unique perspective on the effect of script (alphabets, morphosyllabaries, or multi-scripts) on our attention, perception, and problem-solving. Once we become literate, fundamental changes occur in our brain circuitry to accommodate the new demand for resources. The powerful effects of literacy have been demonstrated by research on literate versus illiterate individuals, as well as cross-scriptal transfer, indicating that literate brain networks function differently, depending on the script being read. This book identifies the locus of differences between the Chinese, Japanese, and Koreans, and between the East and the West, as the neural underpinnings of literacy. To support the “Script Relativity Hypothesis”, it reviews a vast corpus of empirical studies, including anthropological accounts of human civilization, social psychology, cognitive psychology, neuropsychology, applied linguistics, second language studies, and cross-cultural communication. It also discusses the impact of reading from screens in the digital age, as well as the impact of bi-script or multi-script use, which is a growing trend around the globe. As a result, our minds, ways of thinking, and cultures are now growing closer together, not farther apart. 
Examines the origin, emergence, and co-evolution of written language, the human mind, and culture within the purview of script effects; investigates how the scripts we read over time shape our cognition, mind, and thought patterns; provides a new outlook on the four representative writing systems of the world; discusses the consequences of literacy for the functioning of the mind
Character Recognition
Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field
Designing and implementing interactive and realistic augmented reality experiences
In this paper, we propose an approach for supporting the design and implementation of interactive and realistic Augmented Reality (AR). Despite the advances in AR technology, most software applications still fail to support AR experiences in which virtual objects appear merged into the real setting. To alleviate this situation, we propose combining model-based AR techniques with the advantages of current game engines to develop AR scenes in which the virtual objects collide, are occluded, project shadows and, in general, are integrated into the augmented environment more realistically. To evaluate the feasibility of the proposed approach, we extended an existing game platform named GREP to enhance it with AR capacities. The realism of the AR experiences produced with the software was assessed in an event in which more than 100 people played two AR games simultaneously. This work is supported by the projects CREAx and PACE, funded by the Spanish Ministry of Economy, Industry and Competitiveness (TIN2014-56534-R and TIN2016-77690-R).