4,220 research outputs found

    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution meets all reasonable needs of documentalists and users of video archives. In general, we do not take an optimistic view of the usability of new technology in this domain, but digitization and growing computational power can be expected to cause a small revolution in video archiving. The volume of data leads to two views of the future: on the pessimistic side, the flood of data will outstrip annotation capacity; on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we attempt to describe the state of the art in technology, sampling progress in text, sound, and image processing, as well as in machine learning.

    Parametric classification in domains of characters, numerals, punctuation, typefaces and image qualities

    This thesis contributes to the optical font recognition (OFR) problem by developing a classifier system that differentiates ten typefaces using the single English character ‘e’. First, the features used in the classifier system are carefully selected after a thorough typographical study of global font features and previous related experiments. These features are modeled by multivariate normal laws so that learning can proceed by parameter estimation. The classifier system is then built from six independent schemes, each performing typeface classification with a different method. The results show remarkable performance in the field of font recognition. Finally, the classifiers have also been applied to lowercase characters, uppercase characters, digits, punctuation, and degraded images.
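
    A minimal sketch of the parametric approach described above (not the thesis's actual implementation): class-conditional multivariate normal models whose means and covariances are estimated from training data, with classification by maximum log-likelihood. The feature vectors, class labels, and dimensions below are hypothetical.

    import numpy as np

    # Sketch only: parametric classification with class-conditional
    # multivariate normal models; features and labels are made up.

    def fit_gaussian_classes(X, y):
        """Estimate a mean vector and covariance matrix per class."""
        params = {}
        for label in np.unique(y):
            Xc = X[y == label]
            mu = Xc.mean(axis=0)
            # Regularize slightly so the covariance stays invertible.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            params[label] = (mu, cov)
        return params

    def classify(x, params):
        """Assign x to the class with the highest Gaussian log-likelihood."""
        best_label, best_ll = None, -np.inf
        for label, (mu, cov) in params.items():
            diff = x - mu
            ll = -0.5 * (diff @ np.linalg.solve(cov, diff)
                         + np.log(np.linalg.det(cov)))
            if ll > best_ll:
                best_label, best_ll = label, ll
        return best_label

    # Hypothetical usage: 2-D font features for two typeface classes.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 1, (50, 2)),
                   rng.normal([3, 3], 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    params = fit_gaussian_classes(X, y)
    print(classify(np.array([2.8, 3.1]), params))  # expected: 1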

    Hemispheric lateralisation in the recognition of Chinese characters

    The Digital Archive of Buddhist Temple Gazetteers and Named Entity Recognition (NER) in classical Chinese

    Investigating Transposed Letter Effects in Korean using Masked Priming

    Typographical errors created by transposing two letters in a word (e.g., jugde for JUDGE) are often readily misperceived as the words themselves. This phenomenon, known as the transposed letter (TL) effect, has been widely used to study letter position coding in reading. Previous research by Lee and Taft (2009) found no TL effects in Korean, a nonlinear script, leading Lee and Taft to argue that the processing of letter position information varies as a function of the orthographic structure of a language. In particular, Lee and Taft suggested that, given the orthographic structure of Korean syllables, TL nonwords should not activate their base words and, therefore, no TL effects should exist in Korean. The purpose of the present research was to evaluate this claim using the masked-priming lexical decision task (LDT), a more conventional method for evaluating automatic processing than the simple, unprimed LDT used by Lee and Taft. TL primes were generated by transposing letters between syllables. Mirroring the manipulations used by Lee and Taft, there were three types of TL primes: onset1-onset2 transpositions, coda1-coda2 transpositions, and coda1-onset2 transpositions. Replacement primes, created by replacing the transposed letters in TL primes with two other letters, served as control primes for each condition. As Lee and Taft predicted, no facilitation effects emerged; however, there were significant inhibition effects following TL primes, effects that Lee and Taft’s analysis cannot explain.
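
    The prime-construction logic is concrete enough to sketch in code. A minimal Python illustration using the Latin-script example above (jugde/JUDGE); the study itself transposed Korean letters across syllable boundaries, and the word and positions here are only for demonstration.

    import random

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def tl_prime(word, i, j):
        """Transpose the letters at positions i and j (judge -> jugde)."""
        letters = list(word)
        letters[i], letters[j] = letters[j], letters[i]
        return "".join(letters)

    def replacement_prime(word, i, j, rng=random):
        """Replace the letters at i and j with two other letters,
        giving the control condition for the TL prime."""
        letters = list(word)
        for pos in (i, j):
            letters[pos] = rng.choice([c for c in ALPHABET
                                       if c not in (word[i], word[j])])
        return "".join(letters)

    print(tl_prime("judge", 2, 3))           # jugde
    print(replacement_prime("judge", 2, 3))  # e.g. jucme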

    Character Recognition

    Character recognition is one of the most widely used pattern recognition technologies in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction, and classification to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.

    The Unicode cookbook for linguists: Managing writing systems using orthography profiles

    This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection of the Unicode Standard and the International Phonetic Alphabet. Although these standards are often met with frustration by users, they provide language researchers and programmers with a consistent computational architecture needed to process, publish, and analyze lexical data from the world's languages. We bring to light common, but not always transparent, pitfalls that researchers face when working with Unicode and IPA. Having identified and overcome the pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools that work with languages using orthography profiles describing author- or document-specific orthographic conventions. In this cookbook we give a formal specification of orthography profiles and provide recipes using open-source tools to show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research. The book is a prime example of open publishing as envisioned by Language Science Press: it is open access, has accompanying open-source software, open peer review, versioning, and so on. The book is continuously being improved; you can follow its development at https://github.com/unicode-cookbook/cookbook/releases/latest.
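
    The central operation described here, segmenting text into graphemes according to an orthography profile, can be illustrated with a short sketch. The authors' accompanying open-source tools implement the full specification; the greedy longest-match segmenter below and its tiny profile are simplified assumptions for illustration only.

    # Sketch of orthography-profile segmentation by greedy longest match;
    # the profile entries below are made up for demonstration.
    profile = {"ng", "sh", "a", "g", "h", "i", "n", "s"}

    def segment(text, profile):
        """Split text into the longest graphemes listed in the profile."""
        max_len = max(len(g) for g in profile)
        out, i = [], 0
        while i < len(text):
            for size in range(min(max_len, len(text) - i), 0, -1):
                chunk = text[i:i + size]
                if chunk in profile:
                    out.append(chunk)
                    i += size
                    break
            else:
                out.append(text[i])  # unknown character: pass through
                i += 1
        return out

    print(segment("shingna", profile))  # ['sh', 'i', 'ng', 'n', 'a']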