2,854 research outputs found

    Feature Extraction Methods for Character Recognition


    Uncovering the myth of learning to read Chinese characters: phonetic, semantic, and orthographic strategies used by Chinese as foreign language learners

    Oral Session 6A: Lexical modeling, no. 6A.3. Chinese is considered one of the most challenging orthographies for non-native speakers to learn, particularly its characters. The Chinese character is the basic reading unit, in which sound, form and meaning converge. The predominant type of Chinese character is the semantic-phonetic compound, which is composed of phonetic and semantic radicals that give clues to the sound and meaning, respectively. Over the last two decades, psycholinguistic research has made significant progress in specifying the roles of phonetic and semantic radicals in character processing among native Chinese speakers …

    (Dis)connections between specific language impairment and dyslexia in Chinese

    Poster Session: no. 26P.40. Specific language impairment (SLI) and dyslexia describe language-learning impairments that occur in the absence of a sensory, cognitive, or psychosocial impairment. SLI is primarily defined by an impairment in oral language, and dyslexia by a deficit in the reading of written words. SLI and dyslexia co-occur in school-age children learning English, with rates ranging from 17% to 75%. For children learning Chinese, SLI and dyslexia also co-occur. Wong et al. (2010) first reported on the presence of dyslexia in a clinical sample of 6- to 11-year-old children with SLI. The study compared the reading-related cognitive skills of children with SLI and dyslexia (SLI-D) with two groups of children …

    Investigating oculomotor control during the learning and scanning of character strings

    Word spacing plays an important role in both word identification and saccadic targeting in the reading of spaced languages (e.g., English); however, this spacing facilitation is absent when word spacing is added to normally unspaced Chinese text read by native Chinese speakers (e.g., Grade-3 children, young adults, older adults). Frequency effects are well documented in the reading of normal text. However, it remains controversial whether frequency effects occur in non-reading tasks, such as searching for a target in normal text or text-like strings. Furthermore, it is unclear whether spacing also plays an important role in guiding eye movements in text-like string scanning, as it does in the reading of spaced languages. In three experiments, the present thesis examined how exposure frequency effects are established during the learning of novel stimuli in a learning session (Landolt-C clusters in Experiment 1 vs. pseudowords in Experiments 2 & 3) and how the simulated exposure frequency would affect the scanning of longer strings with or without boundary demarcation cues (spaced vs. unspaced shaded vs. unspaced) in a scanning session. Importantly, the present thesis investigated whether learning and scanning of novel character strings would be qualified by the stimulus type (Landolt-C vs. English pseudoword) and the population (English native speakers vs. Chinese participants). In Experiment 1, robust interactive effects between exposure frequency and learning blocks (e.g., learning rate effects) occurred during the learning of target stimuli. However, the exposure frequency effects did not carry over to the scanning session. Robust spacing effects occurred. Spacing facilitated eye movements to a greater degree than the shading manipulation. In Experiments 2 & 3, again, robust learning rate effects occurred in learning target pseudowords.
Exposure frequency was simulated successfully and effectively during learning; however, it showed no influence on eye movements in the scanning session. The meta-analysis across the three experiments demonstrated that learning was more effective with pseudoword stimuli than with Landolt-C stimuli, and more effective in Chinese participants than in English participants. Generally, the degree of shading facilitation was much smaller in the scanning of Landolt-C strings than of pseudoword strings, and it was smaller for English participants than for Chinese participants. The consistent occurrence of learning rate effects across experiments suggests the replicability and reliability of the current character learning paradigm. Spacing facilitation consistently occurred in scanning both Landolt-C strings and pseudoword strings, indicating that spacing plays an important role in non-reading string scanning tasks. The absence of exposure frequency effects in the scanning session across the three experiments suggests that exposure frequency effects might not occur in string scanning when the task is to search for a pre-learnt target in the string. The differential pattern of shading and spacing facilitation between Chinese participants and English participants suggests an influence from the writing system of the native language on eye movements in the current string scanning

    Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform

    In this research, an off-line handwriting recognition system for the Arabic alphabet is introduced. The system contains three main stages: preprocessing, segmentation and recognition. In the preprocessing stage, the Radon transform was used in the design of algorithms for page, line and word skew correction as well as for word slant correction. In the segmentation stage, a Hough transform approach was used for line extraction. For line-to-word and word-to-character segmentation, a statistical method based on a mathematical representation of the binary images of lines and words was used. Unlike most current handwriting recognition systems, our system simulates the human mechanism for image recognition, where images are encoded and saved in memory as groups according to their similarity to each other. Characters are decomposed into coefficient vectors using the fast wavelet transform; the vectors that represent a character in its different possible shapes are then saved as groups, with one representative for each group. Recognition is achieved by comparing the vector of the character to be recognized with the group representatives. Experiments showed that the proposed system achieves the recognition task with 90.26% accuracy. The system needs at most 3.41 seconds to recognize a single character in a text of 15 lines where each line has 10 words on average.
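
    The recognition step described in this abstract (wavelet coefficient vectors compared against group representatives) can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the flattening of a character into a 1-D pixel list, the Euclidean distance, and the names `haar_fwt` and `recognize` are all assumptions made for illustration, since the abstract does not specify the wavelet family or the comparison metric.

```python
import math


def haar_fwt(signal):
    """Full 1-D Haar fast wavelet transform (assumed wavelet family).

    The input length must be a power of two. Returns the final
    approximation coefficient followed by detail coefficients,
    ordered from coarsest to finest level.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        level_detail = [(a - b) / root2 for a, b in pairs]
        approx = [(a + b) / root2 for a, b in pairs]
        # Coarser levels go in front of the finer ones already collected.
        details = level_detail + details
    return approx + details


def recognize(pixels, representatives):
    """Return the label of the nearest group representative.

    `representatives` is a list of (label, coefficient_vector) pairs;
    one character may own several groups, one per handwritten shape.
    Euclidean distance is an assumed similarity measure.
    """
    vector = haar_fwt(pixels)
    label, _ = min(representatives, key=lambda rep: math.dist(vector, rep[1]))
    return label
```

    In use, each stored group would contribute one `(label, haar_fwt(pixels))` pair, and an unknown character is assigned the label of the closest pair. How the groups are formed and how each representative is chosen is not stated in the abstract, so the sketch takes the representatives as given.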

    Chinese character processing for computerized bibliographic information exchange : summary report of an international workshop held in Hong Kong, 17-20 Dec. 1984

    Meeting: Workshop on Chinese Character Processing for Computerized Bibliographic Applications, 17-20 Dec. 1984, H

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents.
In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other aspects of the web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.

    Minorities and the construction of a nation in post-socialist Laos

    In the Introduction [Chapter 1] I first introduce the concept of 'nation' by stressing its 'fuzziness', and by reviewing Western and non-Western interpretations of its definition. I then briefly review some pertinent events in Laos' recent history. I next explain the reasons for my choice of a certain terminology. In a third section, I introduce and justify my methodology. In Chapter Two, I introduce and discuss the theoretical framework and studies on Lao nationalism. I first look at the theories of nationalism put forward by Gellner, Anderson and Smith, three of the most influential thinkers on the subject, and note the limits of their theories with respect to my study. I then extend my discussion to theories of nationalism and ethnicity, and I argue that these propose a framework that is too constrained to explain the complexity of my research. I therefore suggest some other conceptual notions that may encompass the multiple outcomes of my study. Finally, I discuss studies that have dealt with the concepts of nation, nationalism and ethnicity in modern Laos, and show how my work may contribute to the fostering of research in this field. In Chapter Three, I review the historical relationships between the non-ethnic Lao people and the political authorities from the pre-modern period up to the proclamation of the Lao PDR in 1975. I focus in particular on three historical periods: pre-modern Laos (until the French colonisation), French rule (1893-1954) and the French and American Wars (1945-1974). Each period corresponds with a specific pattern of relationships between the non-ethnic Lao people and the political authority. Above all, I insist that the French and American Wars changed the role of the non-ethnic Lao populations socially, politically and historically.
From the periphery where they were symbolically and administratively confined, the participation of some of their members in the wars exposed these individuals to socialisation and politicisation processes. From that point onwards, the nationalist discourse would have to include multi-ethnicity in its rhetoric. In Chapter Four, I analyse ethnic classifications in contemporary Laos, with a brief review of previous policies. I first look at the ideologies that have influenced the Lao ethnic classification, namely, those of the former Soviet Union, China and Vietnam. Through an analysis of the construction of the latest official census (August 2000), I suggest a close relationship between ethnic categorisation and the nationalist discourse. I conclude with a study of Kaysone Phomvihane's guidelines on the concept of the nation in Laos. In Chapter Five, I question the Majority's ethnicity. I first argue that the constitution of a national identity in post-socialist Laos is being conducted through a dual process of exclusion and inclusion, involving a politics of Minority/Majority representation and a dichotomy between Tradition and Modernity. I extend my discussion to the nationalist discourse's search for particularism, through a politics of cultural discipline and a new approach to the narrative of the national history. At the same time, I suggest that the new form of nation, more centred on a spiritual principle, i.e. Buddhism, also originates in popular will, namely, the ethnic Lao population's. In Chapter Six, I reverse the perspective and disclose the voices of those being represented. I focus my analysis on a few members of ethnic minorities who hold, or have held, a position of authority. More precisely, I analyse their interpretations of the past through their narratives. I point out their pattern, logic and coherence, but also their discontinuities, omissions and exaggerations. All these characteristics are constitutive of these individuals' identity. 
Experience, however, is never monolithic. Experience structures narratives, which, in turn, structure experience, while all interpretations and expressions are historically, politically and institutionally situated. I therefore show that narratives can also change under new historical and political conditions. In Chapter Seven, I reflect on the issues of ethnicity and identity. I first study the ambiguities of the ethnicities of the individuals discussed in Chapter Six, caught in between the official categorisation, the Majority's ethnicity and their own perception of their ethnic identity. I then analyse what I call the crisis of identity induced by social, economic, political and institutional changes during the post-socialist era. The social and political identity of these educated members of ethnic minority groups is being challenged. Finally, I conclude with a specific case of instrumentalist ethnicity, which might prefigure the awakening of new identities in post-socialist Laos.

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet-to-be-solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but can also be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how the framework can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters and diacritics) in an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed.
This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed, and experiments were conducted to validate this framework more rigorously by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.
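
    The dissertation abstract above says that a transcript is formed from detected classes using their location information. A minimal sketch of that idea follows. It is a hypothetical illustration, not the dissertation's method: the `Detection` record, the `form_transcript` function, the confidence threshold, and the plain left-to-right ordering by horizontal position are all assumptions. In particular, the ordering rule used for diacritics or for Korean's two-dimensional arrangement is not given in the abstract and is not modelled here.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """One spotted script element (hypothetical record layout)."""
    label: str    # detected class, e.g. a character or diacritic
    x: float      # horizontal position of the detection in the word image
    score: float  # detector confidence in [0, 1]


def form_transcript(detections: List[Detection], min_score: float = 0.5) -> str:
    """Order confident detections left to right and join their labels.

    Assumes a left-to-right script and that reading order follows the
    x coordinate alone; both are simplifications for illustration.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x)
    return "".join(d.label for d in kept)
```

    Called on an unordered list of detections, `form_transcript` drops low-confidence entries and reads the rest in order of position. Because no lexicon is consulted, the output is determined purely by what was detected and where, which is the sense in which such a pipeline is lexicon-free.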