    Automatic categorization of Ottoman poems

    Authorship attribution and identification of the time period of literary works are fundamental problems in the quantitative analysis of languages. We investigate two fundamentally different machine learning text categorization methods, Support Vector Machines (SVM) and Naïve Bayes (NB), and several style markers in the categorization of Ottoman poems according to their poets and time periods. We use the collected works (divans) of ten different Ottoman poets: two poets from each of five hundred-year periods ranging from the 15th to the 19th century. Our experimental evaluation and statistical assessments show that it is possible to obtain highly accurate and reliable classifications and to distinguish the methods and style markers in terms of their effectiveness. This work is partially supported by the Scientific and Technical Research Council of Turkey (TÜBİTAK) under grant number 109E006.

    Cross-document word matching for segmentation and retrieval of Ottoman divans

    Motivated by the need for automatic indexing and analysis of the huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the words. Divans have multiple copies (versions) produced by different writers in different writing styles, and word segmentation may be relatively easier to achieve in some of those versions than in others. Using this idea, we segment the versions that are difficult, if not impossible, to segment with traditional techniques by carrying over information from the simpler version. One version of a document is used as the source dataset and the other version of the same document is used as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset for detecting word boundaries. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words. We improve the performance of simple features by considering the words in context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising for finding word boundaries and for retrieving words across documents. © 2014, Springer-Verlag London
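
    The matching scheme over combinations of consecutive sub-words can be illustrated with a toy sketch. It is not the authors' implementation: the abstract describes matching on image features, whereas this sketch assumes sub-words are already available as short strings and uses a string similarity ratio as a stand-in. The names `similarity`, `segment_by_matching`, and `max_span` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real system would compare image features."""
    return SequenceMatcher(None, a, b).ratio()


def segment_by_matching(source_words, target_subwords, max_span=4):
    """Greedily assign runs of consecutive target sub-words to source words.

    Returns one (start, end) index pair per source word, with `end` exclusive.
    """
    boundaries = []
    pos = 0
    for i, query in enumerate(source_words):
        # Leave at least one sub-word for every source word still to be matched.
        remaining_words = len(source_words) - i - 1
        max_end = len(target_subwords) - remaining_words
        best_end, best_score = None, -1.0
        # Try every combination of 1..max_span consecutive sub-words.
        for end in range(pos + 1, min(pos + max_span, max_end) + 1):
            candidate = "".join(target_subwords[pos:end])
            score = similarity(query, candidate)
            if score > best_score:
                best_end, best_score = end, score
        if best_end is None:
            raise ValueError("not enough sub-words left to match all source words")
        boundaries.append((pos, best_end))
        pos = best_end
    return boundaries


if __name__ == "__main__":
    words = ["leyla", "mecnun"]               # words extracted from the source version
    subwords = ["le", "yla", "mec", "n", "un"]  # unsegmented sub-words in the target version
    print(segment_by_matching(words, subwords))
```

    Processing the source words in order is one simple way to use context: each match constrains where the next word can start.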

    A content-based social network study of Evliyâ Çelebi's Seyahatnâme-Bitlis Section

    Evliyâ Çelebi, an Ottoman writer, scholar and world traveler, visited most of the territories of the Ottoman Empire and some of its neighboring countries in the seventeenth century. He took notes about his trips and wrote a 10-volume book called Seyahatnâme (Book of Travels). In this paper, we present two methods for constructing social networks from textual data and apply them to the Seyahatnâme-Bitlis Section from book IV. The first social network construction method is based on the proximity of co-occurrence of names. The second method is based on 2-pair associations obtained by association rule mining, using sliding text blocks as transactions. The social networks obtained by these two methods are validated using a Monte Carlo approach by comparing them with the social network created by a scholar-historian. © 2012 Springer-Verlag London Limited
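
    The first construction method, based on the proximity of co-occurring names, can be sketched as follows. This is a minimal illustration rather than the paper's procedure: the function name `cooccurrence_network`, whitespace tokenization, single-token names, and the `window` parameter are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(tokens, names, window=5):
    """Count how often two distinct names occur within `window` tokens of each other.

    Returns a Counter mapping an alphabetically sorted name pair to its edge weight.
    """
    # Positions of every name mention in the token stream.
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in names]
    edges = Counter()
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b and j - i <= window:
            edges[tuple(sorted((a, b)))] += 1
    return edges


if __name__ == "__main__":
    text = "Abdal Han met Evliya near Bitlis then Evliya wrote to Melek"
    network = cooccurrence_network(text.split(), {"Abdal", "Evliya", "Melek"}, window=4)
    for pair, weight in network.items():
        print(pair, weight)
```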

    Redif extraction in handwritten Ottoman literary texts

    Repeated patterns, rhymes and redifs, are among the fundamental building blocks of Ottoman Divan poetry. They provide integrity to a poem by connecting its parts and bring a melody to its voice. In Ottoman literature, poets wrote their works by making use of the rhymes and redifs of previous poems according to the nazire (creative imitation) tradition, either to prove their expertise or to show respect towards old masters. Automatic recognition of redifs would provide important data mining opportunities in literary analyses of Ottoman poetry, the majority of which is in handwritten form. In this study, we propose a matching criterion and a method based on it, Redif Extraction using Contour Segments (RECS), that detects redifs in handwritten Ottoman literary texts using only visual analysis. Our method provides a success rate of 0.682 in a test collection of 100 poems. © 2010 IEEE

    Automatic categorization of Ottoman literary texts by poet and time period

    Millions of manuscripts and printed texts are available in the Ottoman language. The automatic categorization of Ottoman texts would make these documents much more accessible in various applications ranging from historical investigations to literary analyses. In this work, we use transcribed versions of Ottoman literary texts in the Latin alphabet and show that it is possible to develop effective Automatic Text Categorization techniques that can be applied to the Ottoman language. For this purpose, we use two fundamentally different machine learning methods, Naïve Bayes and Support Vector Machines, and employ four style markers: most frequent words, token lengths, two-word collocations, and type lengths. In the experiments, we use the collected works (divans) of ten different poets: two poets from each of five hundred-year periods ranging from the 15th to the 19th century. The experimental results show that it is possible to obtain highly accurate classifications in terms of poet and time period. Using statistical analysis, we are able to recommend which style marker and machine learning method should be used in future studies. © 2012 Springer-Verlag London Limited
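
    The four style markers named above can be computed from a transcribed text roughly as follows. This is a sketch under stated assumptions, not the authors' feature extraction: tokens are assumed to be lowercased and whitespace-separated, a "type" is taken to mean a distinct token, and the function name `style_markers` and the parameter `top_k` are hypothetical.

```python
from collections import Counter


def style_markers(text, top_k=3):
    """Compute simple style markers from a transcribed text."""
    tokens = text.lower().split()
    types = set(tokens)  # distinct tokens
    return {
        # The top_k most frequent words, most frequent first.
        "most_frequent_words": [w for w, _ in Counter(tokens).most_common(top_k)],
        # How many tokens (running words) have each length.
        "token_lengths": dict(Counter(len(t) for t in tokens)),
        # How many types (distinct words) have each length.
        "type_lengths": dict(Counter(len(t) for t in types)),
        # The top_k most frequent pairs of adjacent words.
        "collocations": Counter(zip(tokens, tokens[1:])).most_common(top_k),
    }


if __name__ == "__main__":
    markers = style_markers("gul bulbul gul gul bulbul yar", top_k=2)
    for name, value in markers.items():
        print(name, value)
```

    Feature vectors built from such counts could then be passed to a Naïve Bayes or SVM classifier.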

    Matching Islamic patterns in Kufic images

    In this study, we address the problem of matching patterns in Kufic calligraphy images. Because they are used as decorative elements, Kufic images are designed in a way that makes them difficult for non-experts to read. Therefore, available methods for handwriting recognition are not easily applicable to the recognition of Kufic patterns. We propose two new methods for Kufic pattern matching. The first method approximates the contours of connected components as lines and then utilizes a chain code representation. Sequence matching techniques with a penalty for gaps are exploited for handling the variations between different instances of sub-patterns. In the second method, skeletons of connected components are represented as a graph in which junction and end points are considered as nodes. Graph isomorphism techniques are then relaxed for partial graph matching. The methods are evaluated over a collection of 270 square Kufic images with 8,941 sub-patterns. Experimental results indicate that, besides retrieval and indexing of known patterns, our method also allows the discovery of new patterns. © 2015, Springer-Verlag London
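
    The first method combines a chain code representation with gap-penalized sequence matching. The sketch below illustrates that combination with an 8-direction Freeman chain code and a standard global alignment score. It is an assumption-laden illustration, not the paper's algorithm: the direction numbering, the scoring values, and the names `chain_code` and `alignment_score` are hypothetical, and contour points are assumed to be one pixel step apart.

```python
# Freeman 8-direction codes for unit steps (dx, dy); the numbering is a convention.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode a contour given as consecutive unit-step points as direction codes."""
    return [
        DIRECTIONS[(x1 - x0, y1 - y0)]
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two code sequences with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    square = chain_code([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
    variant = [0, 2, 6]  # the same shape with one contour segment missing
    print(square, alignment_score(square, variant))
```

    The gap penalty lets two instances of a sub-pattern match even when one has extra or missing contour segments.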

    Line segmentation of Ottoman documents [Osmanlıca belgelerin satırlara bölütlenmesi]

    Many researchers and historians from all around the world are interested in historical Ottoman archives. However, translating these documents requires competent historians, which is not feasible in terms of time and cost. Thus, automatic translation of these documents is required. In this paper, we study the preprocessing steps for accessing Ottoman manuscripts with a word-based search engine. These preprocessing steps are binarization and line segmentation of digitized documents. Traditional line segmentation methods designed for printed documents do not yield satisfactory results for historical and handwritten documents, so more complex line segmentation techniques must be used. In this study, we develop a projection-profile-based method for line segmentation and use local binarization. The experiments are conducted on a 120-page Ottoman archive, and the results show that the proposed system is successful. © 2012 IEEE
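
    The basic idea behind projection-profile line segmentation can be sketched as follows. This is the textbook form of the technique under simplifying assumptions, not the method developed in the paper: the image is assumed to be already binarized into rows of 0/1 values (1 = ink), and the names `segment_lines` and `min_ink` are hypothetical.

```python
def segment_lines(image, min_ink=1):
    """Split a binarized page into text lines using its horizontal projection profile.

    `image` is a list of rows, each a list of 0/1 pixels (1 = ink).
    Returns (start_row, end_row) pairs with `end_row` exclusive.
    """
    # Horizontal projection profile: amount of ink in each pixel row.
    profile = [sum(row) for row in image]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                  # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # the line ends at the first empty row
            start = None
    if start is not None:              # the page ends inside a text line
        lines.append((start, len(profile)))
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
        [0, 0, 0],
    ]
    print(segment_lines(page))
```

    Handwritten lines are often skewed or touching, which is why a plain global profile like this one tends to fail on historical documents.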

    OTAP Ottoman archives internet interface [OTAP Osmanlıca metinleri internet arayüzü]

    Within the Ottoman Text Archive Project (OTAP), a web interface has been developed to aid in the uploading, binarization, line and word segmentation, labeling, recognition and testing of Ottoman Turkish texts. Through this interface, it became possible to gather the expert knowledge of scholars working with Ottoman archives and to apply this knowledge in developing further technologies for the transliteration of historical manuscripts. © 2012 IEEE

    Turbulent Combustion Modeling with Fully Coupled Fully Implicit Compressible Solver

    The aim of this paper is to report on a recently developed fully coupled and fully implicit solver for turbulent combustion. All the equations are written in terms of primitive variables (pressure, velocity, temperature, turbulent parameters and species mass fractions) and solved in a fully coupled manner. The coupled system of equations is solved using an unstructured collocated Finite Volume (FV) approach with a fully implicit temporal discretization. An all-speed version of the AUSM approach is used along with time derivative preconditioning. The developed fully coupled solver can be applied to a wide range of flow speeds, from the incompressible limit to hypersonic regimes. The resulting system of equations is solved with a sparse GMRES solver using sparse ILU preconditioning. This sparse matrix solver is very effective, and its computational cost increases only linearly with the number of solution cells and the number of solution variables. Turbulence-reaction coupling is obtained using the Eddy Dissipation Model (EDM) approach.