17 research outputs found
Codes of Modernity: Infrastructures of Language and Chinese Scripts in an Age of Global Information Revolution
This dissertation explores the global history of Chinese script reforms (the effort to phoneticize Chinese language and/or simplify the writing system) from its inception in the 1890s to its demise in the 1980s. These reforms took place at the intersection of industrialization, colonialism, and new information technologies, such as alphabet-based telegraphy and breakthroughs in printing technologies. As these social and technological transformations put unprecedented pressure on knowledge management and the use of mental and clerical labor, many Chinese intellectuals claimed that learning Chinese characters consumed too much time and mental energy. Chinese script reforms, this dissertation argues, were an effort to increase speed in producing, transmitting, and accessing information, and thus meet the demands of the industrializing knowledge economy.
The industrializing knowledge economy that this dissertation explores was built on and sustained by a psychological understanding of the human subject as a knowledge machine, and it was part of a global moment in which the optimization of labor in knowledge production was a key concern for all modernizing economies. While Chinese intellectuals were inventing new signs of inscription, American behavioral psychologists, Soviet psycho-economists, and Central Asian and Ottoman technicians were all experimenting with new scripts in order to increase mental efficiency and productivity. This dissertation reveals the intimate connections between the Chinese and non-Chinese script engineering projects that were taking place synchronically across the world. The chapters of this work demonstrate for the first time, for instance, that the simplification of Chinese characters in the 1920s and 1930s was intimately connected to the discipline of behavioral psychology in the US. The first generation of Chinese psychologists employed the American psychologists' methods to track eye movements, count word frequencies, and statistically analyze the speed of reading, writing, and memorizing in order to simplify and "rationalize" the Chinese writing system in an effort to discipline and optimize mental labor. Other chapters explore the issue of mental and clerical optimization by finding the origins of the Chinese Latin Alphabet (CLA), the mother of pinyin, in hitherto unknown Eurasian connections. The CLA, the pages of this work show, was the product of a transnational exchange that involved Ottoman and Transcaucasian typographers as well as Russian engineers and Chinese communists who sought efficiency in knowledge production through inventing new scripts.
Situating the Chinese script reforms at this global intersection of psychology, economy, and linguistics, this dissertation examines the global connections and forces that turned the human subject into a knowledge worker who was cognitively managed through education, literacy, propaganda, and other measures of organizing information, all of which had the script at the center.
The search for efficiency and productivity, the core values of industrialism, lay at the heart of script reforms in China, but this search was inseparable from linguistic orders and political ambitions. Even if writing, transmitting, and learning a phonetic script could theoretically be easier and more efficient than the Chinese characters, the alphabet opened a veritable Pandora's Box around the issue of selection: given the complex linguistic landscape in China, which speech was a phonetic script supposed to represent? There were myriad languages spoken throughout the empire and the subsequent nation-state, most of which were mutually incomprehensible. Mandarin as spoken in Beijing was different from that spoken in the south, and "topolects" or regional languages such as Min or Cantonese were to Mandarin what Romanian is to English. As a linguistic life-or-death issue, phonetic scripts stood for the infrastructural possibilities and limitations in the representation of speeches. Some scripts, such as Lao Naixuan's phonetic script composed of more than a hundred signs, were capable of representing multiple Mandarin and non-Mandarin speeches, whereas others, such as the Phonetic Symbols, which have only thirty-seven syllabic signs, represented only one speech, i.e., Mandarin. Using Mandarin-oriented scripts to transcribe non-Mandarin speeches was like writing English with fifteen letters, hence the acrimonious disputes that fill the pages of this dissertation. Succinctly put, it was at the level of script invention that Chinese and non-Chinese actors engineered different infrastructures not only for laboring minds but also for the social world of Chinese languages. The history of information technologies and knowledge economy in China was thus inseparable from the world of speech and language, as each script offered a new potential to reassemble the written matter and the speaking mind in a different way.
"Codes of Modernity" thus conceptualizes the script itself as an infrastructural medium. A script was not merely a passive carrier of information, but an existential artifact. Building on an expanding literature on infrastructures, it endorses the observation that infrastructures, technologies, and the social world around them work in a recursive loop. An infrastructure is not just the physical object that permits the flow of information, goods, ideas, and people, but a sociotechnical product that enables the experience of culture, while imposing constraints on it at the same time. Like electricity grids, transportation systems, and sewage canals, the experience of scripts as infrastructures is the experience of thought worlds. After a long tradition of structuralism and poststructuralism that sought to understand the world through the semiotic prism of language, "Codes of Modernity" argues that it is time for an infrastructuralism that excavates the indispensable media that enable the production of language and thought.
Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing.
Arabic text recognition has not been researched as thoroughly as text recognition for other languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications such as postal address reading, check verification in banks, and office automation, there is a large interest in searching scanned documents that are available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase in text readers for visually impaired people, and understanding filled-in forms.
This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of state-of-the-art Arabic OCR systems.
Statistical analysis of Arabic text was carried out to estimate the occurrence probabilities of Arabic characters for use with Hidden Markov Models (HMM) and other techniques.
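As an illustration of the kind of character statistics mentioned above, the following is a minimal sketch of estimating character bigram probabilities from text lines. It is not the author's implementation: the function name `bigram_probabilities`, the add-alpha smoothing, and the toy input are all assumptions made for this example.

```python
from collections import Counter


def bigram_probabilities(lines, alpha=1.0):
    """Return a function prob(prev, nxt) estimating P(nxt | prev).

    Hypothetical helper: counts adjacent character pairs in `lines`
    and applies add-alpha smoothing over the observed alphabet.
    """
    context_counts = Counter()  # how often each character appears as `prev`
    bigram_counts = Counter()   # how often each (prev, nxt) pair appears
    alphabet = set()

    for line in lines:
        alphabet.update(line)
        for prev, nxt in zip(line, line[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1

    vocab_size = len(alphabet)

    def prob(prev, nxt):
        # Smoothed relative frequency of `nxt` following `prev`.
        numerator = bigram_counts[(prev, nxt)] + alpha
        denominator = context_counts[prev] + alpha * vocab_size
        return numerator / denominator

    return prob
```

Because Python strings are sequences of Unicode code points, the same sketch works on Arabic text lines; for each context character, the smoothed probabilities over the observed alphabet sum to one.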
Since there is no publicly available dataset for printed Arabic text for recognition purposes it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters. The script provides efficient representation for Arabic text in terms of effort and time.
Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images.
In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected.
The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase.
Eight different Arabic fonts were used in the classification phase. The recognition rates ranged from 98% to 99.9%, depending on the font. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters with a recognition rate above 95%.
Moreover, the recognition of printed Arabic text with multi-fonts was also conducted using the same technique. Fonts were categorized into different groups. New high recognition results were achieved.
To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the recognition rate by more than 1%.
King Fahd University of Petroleum and Minerals (KFUPM)
Biometrics Writer Recognition for Arabic language: Analysis and Classification techniques using Subwords Features
Handwritten text in any language is believed to convey a great deal of information about writers' personality and identity. Indeed, the handwritten signature has long been accepted as an authentication of the writer's physical stamp on financial and legal deals as well as official/personal documents and works of art. Handwritten documents are frequently used as evidence in forensic tasks. Handwriting skills are learnt and developed from the early schooling stages. Research interest in behavioral biometrics was the main driving force behind the growth in research into Writer Identification (WI) from handwritten text, but the recent rise in terrorism associated with extreme religious ideologies spreading primarily, but not exclusively, from the Middle East has led to a surge of interest in WI from handwritten text in Arabic and similar languages.
This thesis is the main outcome of extensive research investigations conducted with the aim of developing automatic identification of a person from handwritten Arabic text samples. My motivations and interests, as an Iraqi researcher, emanate from my multi-faceted desires to provide scientific support for my people in their fight against terrorism by providing forensic evidence, and to contribute to the ongoing digitization of the Iraqi National archive as well as the wealth of religious and historical archives in Iraq and the Middle East. Good knowledge of the underlying language is invaluable in this project.
Despite the rising interest in this recognition modality worldwide, Arabic writer identification has not been addressed as extensively as Latin writer identification. However, in recent years some new Arabic writer identification approaches have been proposed, some of which are reviewed in this thesis. Arabic is a cursive language when handwritten. This means that each and every writer in this language develops some unique features that could demonstrate the writer's habits and style. These habits and styles are considered as unique WI features and determining factors.
Existing dominating approaches to WI are based on recognizing handwriting habits/styles that are embedded in certain parts/components of the written texts. Although the appearance of these components within long text contains rich information and clues to writer identity, the most common approaches to WI in Arabic in the literature are based on features extracted from paragraph(s), line(s), word(s), character(s), and/or a part of a character. Generally, Arabic words are made up of one or more subwords; at the end of each there is a connected stroke whose style seems to be most representative of the writer's habits. Another feature of Arabic writing concerns the diacritics that are added to written words/subwords to add meaning and pronunciation. Subwords are more frequent in written Arabic text and appear as part of several different words or as full individual words. Thus, we propose a new approach based on the seemingly plausible hypothesis that subword-based WI yields a significant increase in accuracy over existing approaches. The thesis's most significant contributions can be summarized as follows:
- Developed a high performing segmentation of scanned text images, that combines threshold based binarisation, morphological operation and active shape model.
- Defined digital measures and formed 15-dimensional feature vector representations of subwords that implicitly cover their diacritics and strokes. A pilot study incrementally added features according to their writer-discriminating power, which reduced the subword feature vector dimension to 8, two of which were modelled as time series.
- For the dependent 8-dimensional WI scheme, we identify the best performing set of subwords (best 22 subwords out of 49 then followed by best 11 out of these 22 subwords).
- We established the validity of our hypothesis for different versions of subword-based WI schemes by providing empirical evidence from tests on a number of existing text-dependent and text-independent databases plus a simulated text-independent DB. The text-dependent scenario results exhibited the possible presence of the Doddington Zoo phenomenon.
- The final optimal subword-based WI scheme not only removes the need to include diacritics as part of the subword but also demonstrates that including diacritics within subwords impairs the WI discriminating power of subwords. This should not be taken to discredit research on diacritics-based WI. This WI scheme, based on the subword body (without diacritics), also eliminated the presence of the Doddington Zoo effect.
- Finally, a significant but unintended consequence of using subwords for WI is that there is no difference between a text-independent scenario and a text-dependent one. In fact, we shall demonstrate that the text-dependent database of the 27 words can be used to simulate the testing of the scheme for a text-independent database without the need to record such a DB.
Finally, we discussed ways of optimising the performance of our last scheme by complementing it with various image texture analysis features extracted from subwords, lines, paragraphs or the entire file of the scanned image. These included LBP and Gabor Filter. We also suggested the possible addition of a few more features.
The Syntax of Colophons
The present volume focuses on the colophons found in several pothi manuscripts from Central, South and South East Asia. Its contributions discuss the colophons' defining features, thus exposing their "syntax", focusing particularly on the tracing of recurring patterns. The information extrapolated from colophons is further analysed to obtain a better understanding of these distinct manuscript cultures.
The Nature of Writing – A Theory of Grapholinguistics [book cover]
Cover illustration: Purgatory: Canto VII – The Rule of the Mountain from A Typographic Dante (2008) by Barrie Tullett (also displayed in Barrie Tullett, Typewriter Art: A Modern Anthology, London: Laurence King Publishing, 2014, p. 167). With kind permission by Barrie Tullett. The text is taken from Dante. The Divine Comedy, translated by Dorothy L. Sayers, Harmondsworth-Middlesex: The Penguin Classics, 1949. On the lower part of the illustration, one can read the concluding verses of the Canto:
But now the poet was going on before;
"Forward!" said he; "look how the sun doth stand
Meridian-high, while on the Western shore
Night sets her foot upon Morocco's strand."