101 research outputs found

    A novel image matching approach for word spotting

    Word spotting has been adopted by various researchers as a complementary technique to Optical Character Recognition for document analysis and retrieval. Its applications include document indexing, image retrieval and information filtering. The important factors in word spotting techniques are pre-processing, the selection and extraction of suitable features, and the image matching algorithm. The Correlation Similarity Measure (CORR) algorithm is considered a fast matching algorithm, originally defined for finding similarities between binary patterns. In the word spotting literature, the CORR algorithm has been used successfully to compare Gradient, Structural and Concavity (GSC) binary features extracted from binary word images. However, binarization of images leads to a loss of very useful information. Furthermore, before GSC binary features are extracted, the word images must be skew corrected and slant normalized, which is not only difficult but in some cases impossible in Arabic and modified Arabic scripts. We present a new approach in which the CORR algorithm is used to compare gray-scale word images. In this approach, binarization, skew correction and slant normalization of word images are not required at all. Various features, i.e., projection profiles, word profiles and transitional features, are extracted from the gray-scale word images and converted into their binary equivalents, which are compared via the CORR algorithm with greater speed and higher accuracy. The experiments were conducted on gray-scale versions of newly created handwritten databases of the Pashto and Dari languages, written in modified Arabic scripts. For each of these languages we used 4599 words relating to 21 different word classes collected from 219 writers.
The average precision rates achieved for Pashto and Dari were 93.18% and 93.75%, respectively. The time taken to match a pair of images was 1.43 milliseconds. In addition, we present handwritten databases for two well-known Indo-Iranian languages, Pashto and Dari. These are large databases containing six types of data (Dates, Isolated Digits, Numeral Strings, Isolated Characters, Different Words and Special Symbols), written by native speakers of the corresponding languages.
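
    As a rough illustration of comparing two binary feature vectors with a correlation-style similarity, the sketch below uses the standard binary correlation (phi coefficient). It assumes this matches the CORR measure named in the abstract, which the abstract itself does not define; the function name and the zero-denominator convention are hypothetical choices made for this example, and the feature extraction step is not shown.

```python
import math


def corr_similarity(a, b):
    """Binary correlation (phi coefficient) between two 0/1 vectors.

    Assumption: this is the usual form of a correlation similarity for
    binary patterns; the abstract does not spell out its exact formula.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")

    # Count the four possible bit pairings across the two vectors.
    s11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    s00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    s10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)

    denom = math.sqrt((s10 + s11) * (s01 + s00) * (s11 + s01) * (s00 + s10))
    if denom == 0:
        # Hypothetical convention: a constant vector carries no
        # correlation information, so report 0.0.
        return 0.0
    return (s11 * s00 - s10 * s01) / denom


if __name__ == "__main__":
    query = [1, 0, 1, 0, 1, 1, 0, 0]
    candidate = [1, 0, 1, 0, 1, 0, 0, 0]
    print(corr_similarity(query, candidate))
```

    The score ranges from -1 (complementary patterns) to 1 (identical patterns), so a word spotting system built this way would rank candidate word images by descending score.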

    Proceedings of the 1st Conference on Central Asian Languages and Linguistics (ConCALL)

    The Conference on Central Asian Languages and Linguistics (ConCALL) was founded in 2014 at Indiana University by Dr. Öner Özçelik, the residing director of the Center for Languages of the Central Asian Region (CeLCAR). As the nation’s sole U.S. Department of Education funded Language Resource Center focusing on the languages of the Central Asian Region, CeLCAR’s main mission is to strengthen and improve the nation’s capacity for teaching and learning Central Asian languages through teacher training, research, materials development projects, and dissemination. As part of this mission, CeLCAR has the ultimate goal of unifying and fortifying the Central Asian language learning community by facilitating networking between linguists and language educators, encouraging research projects that will inform language instruction, and providing opportunities for professionals in the field to both showcase their work and receive feedback from their peers. Thus ConCALL was established to be the first international academic conference to bring together linguists and language educators in the languages of the Central Asian region, including both the Altaic and Eastern Indo-European languages spoken in the region, to focus on research into how these specific languages are represented formally and how they are acquired by second/foreign language learners, and also to present research-driven teaching methods.
Languages served by ConCALL include, but are not limited to: Azerbaijani, Dari, Karakalpak, Kazakh, Kyrgyz, Lokaabharan, Mari, Mongolian, Pamiri, Pashto, Persian, Russian, Shughnani, Tajiki, Tibetan, Tofalar, Tungusic, Turkish, Tuvan, Uyghur, Uzbek, Wakhi and more! The Conference on Central Asian Languages and Linguistics held at Indiana University on 16-17 May 2014 was made possible through the generosity of our sponsors: Center for Languages of the Central Asian Region (CeLCAR), Ostrom Grant Programs, IU's College of Arts and Humanities Center (CAHI), Inner Asian and Uralic National Resource Center (IAUNRC), IU's School of Global and International Studies (SGIS), IU's College of Arts and Sciences, Sinor Research Institute for Inner Asian Studies (SRIFIAS), IU's Department of Central Eurasian Studies (CEUS), and IU's Department of Linguistics.

    Improving Retrieval Accuracy in Main Content Extraction from HTML Web Documents

    The rapid growth of text-based information on the World Wide Web, and the variety of applications making use of this data, motivates the need for efficient and effective methods to identify and separate the “main content” from additional content items such as navigation menus, advertisements, design elements or legal disclaimers. Firstly, in this thesis, we study, develop, and evaluate R2L, DANA, DANAg, and AdDANAg, a family of novel algorithms for extracting the main content of web documents. The main concept behind R2L, which also provided the initial idea and motivation for the other three algorithms, is to exploit the particularities of Right-to-Left languages for obtaining the main content of web pages. As the English character set and the Right-to-Left character sets are encoded in different intervals of the Unicode character set, we can efficiently distinguish the Right-to-Left characters from the English ones in an HTML file. This enables the R2L approach to recognize areas of the HTML file with a high density of Right-to-Left characters and a low density of characters from the English character set. Having recognized these areas, R2L can successfully separate out only the Right-to-Left characters. The first extension of R2L, DANA, improves the effectiveness of the baseline algorithm by employing an HTML parser in a post-processing phase of R2L to extract the main content from areas with a high density of Right-to-Left characters. DANAg, the second extension of R2L, generalizes the idea of R2L to render it language independent. AdDANAg, the third extension of R2L, integrates a new preprocessing step to normalize the hyperlink tags. The presented approaches are analyzed with respect to efficiency and effectiveness. We compare them to several established main content extraction algorithms and show that we extend the state of the art in terms of both efficiency and effectiveness.
Secondly, automatically extracting the headline of web articles has many applications. We develop and evaluate a content-based and language-independent approach, TitleFinder, for unsupervised extraction of the headline of web articles. The proposed method achieves high performance in terms of effectiveness and efficiency and outperforms approaches operating on structural and visual features.
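
    The Unicode-interval idea behind R2L, as described in the abstract above, can be sketched roughly as follows. This is not the thesis's implementation: the per-line granularity, the `min_ratio` threshold and all function names are hypothetical, and only a few well-known Hebrew and Arabic Unicode blocks are listed.

```python
# Illustrative subset of right-to-left Unicode blocks (not exhaustive).
RTL_RANGES = (
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
)


def is_rtl(ch):
    """Return True if the character falls in one of the listed RTL blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in RTL_RANGES)


def line_counts(line):
    """Count right-to-left characters and ASCII letters in one line."""
    rtl = sum(1 for ch in line if is_rtl(ch))
    latin = sum(1 for ch in line if ch.isascii() and ch.isalpha())
    return rtl, latin


def rtl_dense_lines(html, min_ratio=0.5):
    """Return indices of lines where RTL characters dominate ASCII letters.

    Assumption: a simple ratio threshold stands in for whatever density
    criterion the thesis actually uses.
    """
    selected = []
    for index, line in enumerate(html.splitlines()):
        rtl, latin = line_counts(line)
        total = rtl + latin
        if total and rtl / total >= min_ratio:
            selected.append(index)
    return selected


if __name__ == "__main__":
    page = '<div class="nav"><a href="/home">Home</a></div>\n<p>سلام دنیا</p>'
    print(rtl_dense_lines(page))
```

    Because markup such as tag and attribute names consists of ASCII letters, lines dominated by right-to-left characters are likely to carry article text rather than navigation or layout code, which is the intuition the abstract describes.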

    Arthur Paul Afghanistan Collection Bibliography - Volume II: English and European Languages

    In December 1995, the first volume of this bibliography was published. Volume I included all the Pashto and Dari language titles that were in the Arthur Paul Afghanistan Collection at that time. Volume II includes English and European language materials. This volume contains titles that were added to the Collection prior to January 1998. The Arthur Paul Afghanistan Collection is one of the largest collections of research materials on Afghanistan and serves scholars from around the world

    Human aspects in Afghanistan: Handbook

    The Human Aspects of the Operational Environment in Afghanistan Handbook was created as part of a broader project, "Human Aspects of the Operational Environment (HAOE)", under the coordination of the Emerging Security Challenges Division / NATO HQ. The project aims to provide a comprehensive introduction to the human aspects in any theatre of operations and was conducted as a series of workshops spanning two years. There are many country books, studies and reports, within and outside the NATO community, that provide basic or specific information about Afghanistan. This handbook fuses information from the most relevant ones, adding the unique perspective and experience of eight contributors with various Afghanistan backgrounds.

    Afghanistan's legal culture from 1750 to the twenty-first century

    This thesis analyses Afghanistan’s legal culture from 1750 to the present day. To this end, historical legal developments are discussed in light of available records, such as the Law Gazettes and cases recorded by the law courts, as well as analyses in the form of books, research papers and objective reports from national and international sources. For the technical aspects of Afghanistan’s legal history and legal traditions (legal culture), the thesis aims to provide in-depth descriptions of how legal issues in Afghanistan have developed over a period of nearly three centuries. The legal affairs, with their strengths, weaknesses and deficiencies, are discussed in light of the given situations and in relation to the internal and external aspects of Afghanistan’s legal system, in order to produce informative research that shows whether a culture of legality has been established in Afghanistan. Through this thesis, the writer aims to provide insights as well as suggestions in relation to the issues discussed and analysed. The writer is conscious that on a topic like this, despite all the efforts he has made, there is, given the state of the question, simply no way of excluding the possibility of a gap or lacuna.

    The Beginnings of Islam in Afghanistan:Conquest, Acculturation, and Islamization


    Conceptions of Criticism. Cross-Cultural, Interdisciplinary, and Historical Studies of Structures of a Concept of Values

    This book is an introduction to the methods, applications, and history of criticism. Taking a historical perspective, it gives access to the conditions of criticism that depend upon historical situations and cultural conditions. The book introduces the main areas of applied criticism, showing their historical development and main applications in several cultures: in Europe from a diachronic approach, and in Asia, Africa, America, and Australia from a synchronic approach. It therefore addresses the specific terminology related to criticism in diverse cultures.

    From Kabul to the Academy: Narratives of Afghan Women's Journeys to and Through U.S. Doctoral Programs

    This study explored the experiences of seven Afghan women pursuing doctoral degrees in a variety of disciplines and programs across the United States. The guiding question for this study was: What factors influence Afghan women's journeys to and experiences in doctoral programs? In an attempt to understand Afghan women doctoral students, I provided a historical background of Afghanistan and education in Afghanistan followed by a literature review on South Asian women, the broader category for Afghan women. Within this literature review I explored the following components: culture, gender, immigration, experiences in postsecondary education; all factors that may be influential in the journey of South Asian women in U.S. postsecondary education. Finally, a critical race feminism theoretical framework was utilized to fuse the factors affecting South Asian women in higher education and provide a theoretical guide for further research specifically investigating Afghan women in doctoral programs. Through the use of narrative inquiry, I provided an individual and collective story of the lives of seven Afghan women in U.S. doctoral programs. From these stories, four themes emerged as influential in the lives of the participants. The four themes that emerged were faith, identity, capital, and family. Upon a thorough investigation of the themes and multiple sub-themes, several implications and recommendations were made. The findings of the study showed that there are no formulas to understand the complexities in the lives of Afghan women doctoral students, but several intersecting identities and factors that create the journey and help the reader understand their experiences.