381 research outputs found

A Guide to Copy Cataloging Arabic Materials

Author: Wilson Kristen E.
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2005

For most catalogers, non-Roman script languages are more difficult to catalog than those in Roman scripts, and Arabic is particularly problematic. The cataloger must have a firm grasp of the language in order to correctly supply unwritten vowels and to use the standard Arabic-English dictionary which lists words by root rather than alphabetically. This manual presents the cataloger who does not have that language knowledge with strategies for effective copy cataloging searching. Topics include the development of Arabic cataloging automation, problems of name authority, distinguishing between Arabic and other languages written in the Arabic script, and using a non-alphabetic Arabic-to-English dictionary

Carolina Digital Repository

ArabTeX : a system for typesetting Arabic; user manual version 3.00

Author: Lagally Klaus
Publication date: 18/06/2013

ArabTeX is a package extending the capabilities of TeX/LaTeX to generate the Arabic writing from an ASCII transliteration for texts in several languages using the Arabic script. It consists of a TeX macro package and an Arabic font in several sizes, presently only available in the Naskhi style. ArabTeX will run with Plain TeX and also with LaTeX. It is compatible with NFSS, NFSS2 and the EDMAC package; other additions to TeX have not been tried. ArabTeX is primarily intended for generating the Arabic writing, but the standard scientific transliteration can also be easily produced. For languages other than Arabic that are customarily written in the Arabic script some limited support is available. ArabTeX defines its own input notation which is both machine, and human, readable, and suited for electronic transmission and Email communication. However, texts in some of the Arabic standard encodings can also be processed. ArabTeX is copyrighted, but free use for scientific, experimental and other strictly private, noncommercial purposes is granted. Offprints of publications using ArabTeX are welcome. Using ArabTeX otherwise requires a license agreement. There is no warranty of any kind, either expressed or implied. The entire risk as to the quality and performance rests with the user

Urdu Handwritten Characters Data Visualization and Recognition Using Distributed Stochastic Neighborhood Embedding and Deep Network

Author: Ali Sikandar
Coustaty Mickäel
Husnain Mujtaba
Khan Dost Muhammad
Khattak Hizbullah
Luqman Muhammad Muzzamil
Mumtaz Shahzad
Ogier Jean-Marc
Saad Missen Malik Muhammad
Samad Ali
Publication date: 03/09/2021

This study was supported by the China University of Petroleum-Beijing and Fundamental Research Funds for Central Universities under Grant no. 2462020YJRC001. Peer reviewed. Publisher PD

Aberdeen University Research

A Simple Approach to Unify Ambiguously Encoded Kurdish Characters

Author: Jaf Sardar
Publication date: 09/09/2016

In this study we outline a potential problem in the normalisation stage of processing texts that are based on a modified version of the Arabic alphabet. The main source of resources available for processing resource-scarce languages is raw text. We have identified an interesting challenge that must be addressed when normalising certain natural language texts. Many less-resourced languages, such as Kurdish, Farsi, Urdu, Pashtu, etc., use a modified version of the Arabic writing system. Many characters in harvested data from the Internet may have exactly the same form but be encoded with different Unicode values (ambiguous characters). It is important to identify ambiguous characters during the normalisation stage of most text processing tasks. We will demonstrate cases related to ambiguous Kurdish and Farsi characters and propose a semi-automatic approach to identifying and unifying ambiguously encoded characters.

Durham Research Online

A semi-automatic approach to identifying and unifying ambiguously encoded Arabic-based characters.

Author: Jaf Sardar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/03/2017

In this study, we outline a potential problem in normalising texts that are based on a modified version of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu, Pashtu, etc., use a modified version of the Arabic writing system. Many characters in harvested data from the Internet may have exactly the same form but encoded with different Unicode values (ambiguous characters). The existence of ambiguous characters in words leads to word duplication, thus it is important to identify and unify ambiguous characters during the normalisation stage. Here, we demonstrate cases related to ambiguous Kurdish and Farsi characters and propose a semi-automatic approach to identifying and unifying them
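The idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the paper: the `UNIFY` mapping is a hypothetical choice of three look-alike code points, and the names `unify` and `report_ambiguous` are invented for this example. The reporting step stands in for the "semi-automatic" part, where a person would review the candidates before a mapping is applied.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping chosen for illustration only; the paper's actual
# character table is not given in the abstract.
UNIFY = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF          -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
}
_TABLE = str.maketrans(UNIFY)


def report_ambiguous(text):
    """Count, by Unicode name, the characters that the mapping would replace.

    A reviewer can inspect this report before accepting the mapping.
    """
    return Counter(unicodedata.name(ch) for ch in text if ch in UNIFY)


def unify(text):
    """Replace each ambiguous character with its chosen canonical code point."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # The same word typed with ARABIC LETTER KAF and with ARABIC LETTER KEHEH.
    with_kaf = "\u0643\u062A\u0627\u0628"
    with_keheh = "\u06A9\u062A\u0627\u0628"
    print(with_kaf == with_keheh)                # the raw strings differ
    print(unify(with_kaf) == unify(with_keheh))  # they match after unification
    print(report_ambiguous(with_kaf))
```

Without a step like this, the two spellings would be counted as separate vocabulary items, which is the word duplication the abstract describes.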

Durham Research Online

Crossref

A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters

Author: Dong Minghui
Jaf Sardar
Lee Lung-Hao
Li Haizhou
Lu Yanfeng
Tseng Yuen-Hsien
Wu Chung-Hsien
Yu Liang-Chih
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 13/03/2017

Durham Research Online

Urdu Through Its Others: Ghazal, Canonization, and Translation.

Author: Grewal Sara
Publication date: 01/01/2016

My dissertation, "Urdu Through Its Others: Ghazal, Canonization, and Translation," analyzes the codification of the Urdu literary tradition as it is both celebrated and reviled in a wide variety of popular and scholarly media. I focus specifically on the genre of the ghazal, which, as the most canonical of Urdu literary forms, holds a unique cultural cachet throughout all of South Asia and the diaspora. The canonization of the ghazal reifies Urdu's linguistic boundaries through the project of literary histories and comparison with other proximate literary traditions like Hindi, Persian, and English. This reified notion of Urdu not only underwrites Anglicist colonial intervention in India by rhetorically painting Urdu as the backward foil to English's modern progressivism, but also continues to shape the national Urdu imaginary in which the language is both vilified as dangerously communalist and idealized as redemptively secular. Although canonizing literary histories point to Rekhtah as the historical antecedent of the Urdu language, I show, via readings of the ghazals of Urdu's "founder" Valī Dakkanī (1667-1707), that Rekhtah in fact represents a unique poetic mode--an idiom of translation that forces us to reconsider boundaries between languages against the standardizing forces of canonization. The uneven ways in which the translative quality of Rekhtah gets passed on to the Urdu tradition as it unfolds during the period of colonialism have shaped the ways in which Urdu is seen in the national imaginary as derivative, backward, and foreign. At the same time, popular narratives about ghazal work to naturalize the Urdu tradition in India, particularly through the nationalization of canonical poets Mirzā Ghālib (1797-1869) and Faiz Ahmed Faiz (1911-1984).
This dissertation diverges from existing attempts to establish canonical literary histories, or reconstruct a moment prior to translation, which ultimately reinforce colonial notions of both history and translation; instead, I focus on the traces of past texts and events as they continue to operate within the present--what I am calling historicity--ultimately arguing that moments of translation themselves constitute the Urdu language and literary tradition.
PhD, Comparative Literature, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135821/1/shakeem_1.pd

Deep Blue Documents at the University of Michigan

Generating an Arabic Calligraphy Text Blocks for Global Texture Analysis

Author: Bataineh Bilal
Omar Khairuddin
Sheikh Abdullah Siti Norul Huda
Publication venue: Insight Society
Publication date: 01/04/2011

The objective of this paper is to improve the current method for generating Arabic Calligraphy text blocks. We test on seven types of Arabic Calligraphy text. We apply projection profiles and a proposed filter to discriminate each line of the Arabic Calligraphy scripts. After performing text detection, skew correction, and text and line normalization in turn, we generate Arabic Calligraphy text blocks for global texture analysis. We compare our proposed filter with the current method and a median filter. The results show that the proposed filter outperforms both. The proposed method can be further improved to boost overall performance.
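As a rough illustration of the projection-profile step mentioned in this abstract, the sketch below splits a binarized page into text lines by summing ink pixels per row. It is a plain horizontal projection profile only: the paper's proposed filter is not described in the abstract and is not reproduced here. The function names, the `min_ink` threshold, and the list-of-lists image format (1 = ink, 0 = background) are all assumptions made for this example.

```python
def horizontal_profile(image):
    """Return the number of ink pixels in each row of a binary image."""
    return [sum(row) for row in image]


def split_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs for each run of rows containing ink.

    end_row is exclusive. Rows whose ink count is below min_ink are treated
    as the gaps between text lines.
    """
    profile = horizontal_profile(image)
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                   # a text line begins here
        elif ink < min_ink and start is not None:
            lines.append((start, y))    # the text line ended on the previous row
            start = None
    if start is not None:               # ink continues to the bottom edge
        lines.append((start, len(profile)))
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
        [0, 0, 0],
    ]
    print(split_lines(page))
```

Raising `min_ink` makes the split more tolerant of stray pixels between lines, at the cost of clipping thin strokes at the top or bottom of a line.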

International Journal on Advanced Science, Engineering and Information Technology

Improving Search via Named Entity Recognition in Morphologically Rich Languages – A Case Study in Urdu

Author: Riaz Kashif
Publication date: 01/02/2018

University of Minnesota Ph.D. dissertation. February 2018. Major: Computer Science. Advisors: Vipin Kumar, Blake Howald. 1 computer file (PDF); xi, 236 pages. Search is not a solved problem even in the world of Google and Bing's state-of-the-art engines. Google and similar search engines are keyword based. Keyword-based searching suffers from the vocabulary mismatch problem -- the terms in the document and the user's information request don't overlap, as with "cars" and "automobiles". This phenomenon is called synonymy. Similarly, the user's term may be polysemous -- a user is inquiring about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL). Concept search techniques like dimensionality reduction do not improve search in Morphologically Rich Languages. Names frequently occur in news text and determine the "what," "where," "when," and "who" in the news text. Named Entity Recognition (NER) attempts to recognize names automatically in text, but these techniques are far from mature in MRLs, especially in Arabic script languages. Urdu is one of the focus MRLs of this dissertation, alongside Arabic, Farsi, Hindi, and Russian, but it does not have the enabling technologies for NER and search. A corpus, a stop word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, this dissertation highlights the challenges of research in low-resource MRLs.

University of Minnesota Digital Conservancy