703 research outputs found

    Query by String word spotting based on character bi-gram indexing

    In this paper we propose a segmentation-free query by string word spotting method. Both the documents and the query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi-gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally, we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting on single-writer and multi-writer standard datasets. Comment: To be published in ICDAR201
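    To make the PHOC idea in the abstract above concrete, here is a minimal sketch of how a query string might be turned into a binary pyramidal histogram of characters. The alphabet (lowercase letters plus digits), the pyramid levels (1, 2, 3) and the 50% overlap rule are assumptions chosen for illustration; the abstract does not state the configuration the authors used, and the function name `phoc` is hypothetical.

```python
from string import ascii_lowercase, digits

ALPHABET = ascii_lowercase + digits  # assumed 36-symbol alphabet
LEVELS = (1, 2, 3)                   # assumed pyramid levels


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Return a binary PHOC vector (list of 0/1) for `word`.

    At each level the word is split into `level` equal regions, and each
    region gets one bit per alphabet symbol.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1 / (n * level) to avoid float error:
            # character k spans [k * level, (k + 1) * level],
            # region r spans [r * n, (r + 1) * n].
            r_start, r_end = region * n, (region + 1) * n
            for k, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # symbols outside the assumed alphabet are skipped
                c_start, c_end = k * level, (k + 1) * level
                overlap = min(r_end, c_end) - max(r_start, c_start)
                # Assumed rule: a character belongs to a region when at
                # least half of its extent (width = level) lies inside it.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vector.extend(bits)
    return vector
```

    With these assumed defaults, `phoc("word")` returns a vector of 36 × (1 + 2 + 3) = 216 bits. In the setting the abstract describes, a vector of this kind would be the target that the attribute models learn to predict from image features, so that strings and images can be compared in the same space.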

    Arabic Manuscripts Analysis and Retrieval

    Integrating Visual and Textual Cues for Query-by-String Word Spotting

    Creating and Maintaining Consistent Documents with Elucidative Development

    Software systems usually consist of multiple artefacts, such as requirements, class diagrams, or source code. Documents, such as specifications and documentation, can also be viewed as artefacts. In practice, however, writing and updating documents is often neglected because it is expensive and brings no immediate benefit. Consequently, documents are often outdated and communicate wrong information about the software. The price is paid later, when a software system must be maintained and much of the implicit knowledge that existed at the time of the original development has been lost. A simple way to keep documents up to date is generation. However, not all documents can be fully generated. Usually, at least some content must be written by a human author. This handwritten content is lost if the documents must be regenerated. In this thesis, Elucidative Development is introduced. It is an approach to create documents by partial generation. Partial generation means that some parts of the document are generated whereas others are handwritten. Elucidative Development retains manually written content when the document is regenerated. An integral part of Elucidative Development is a guidance system, which informs the author about changes in the generated content and helps the author update the handwritten content.

    Contents:
    1 Introduction: 1.1 Contributions; 1.2 Scope of the Thesis; 1.3 Organisation
    2 Problem Analysis and Solution Outline: 2.1 Redundancy and Inconsistency; 2.2 Improving Consistency with Partial Generation; 2.3 Conclusion
    3 Background: 3.1 Grammar-Based Modularisation; 3.2 Model-Driven Software Development; 3.3 Round-Trip Engineering; 3.4 Conclusion
    4 Elucidative Development: 4.1 General Idea and Running Example; 4.2 Requirements of Elucidative Development; 4.3 Structure and Basic Concepts of Elucidative Documents; 4.4 Presentation Layer; 4.5 Guidance; 4.6 Conclusion
    5 Model-Driven Elucidative Development: 5.1 General Idea and Running Example; 5.2 Requirements of Model-Driven Elucidative Development; 5.3 Structure and Basic Concepts of Elucidative Documents in Model-Driven Elucidative Development; 5.4 Guidance; 5.5 Conclusion
    6 Extensions of Elucidative Development: 6.1 Validating XML-based Elucidative Documents; 6.2 Backpropagation-Based Round-Trip Engineering for Computed Text Document Fragments; 6.3 Conclusion
    7 Tool Support for an Elucidative Development Environment: 7.1 Managing Active References; 7.2 Inserting Computed Document Fragments; 7.3 Caching the Computed Document Fragments; 7.4 Elucidative Document Validation with Schemas; 7.5 Conclusion
    8 Related Work: 8.1 Related Documentation Approaches; 8.2 Consistency Approaches; 8.3 Compound Documents; 8.4 Conclusion
    9 Evaluation: 9.1 Creating and Maintaining the Cool Component Specification; 9.2 Creating and Maintaining the UML Specification; 9.3 Feasibility Studies; 9.4 Conclusion
    10 Conclusion

    A Computational Theory of Contextual Knowledge in Machine Reading

    Machine recognition of off-line handwriting can be achieved either by recognising words as individual symbols (word-level recognition) or by segmenting a word into parts, usually letters, and classifying those parts (letter-level recognition). Whichever method is used, current handwriting recognition systems cannot overcome the inherent ambiguity in writing without recourse to contextual information. This thesis presents a set of experiments that use Hidden Markov Models of language to resolve ambiguity in the classification process. It goes on to describe an algorithm designed to recognise a document written by a single author and to improve recognition by adapting to the writing style and learning new words. Learning and adaptation are achieved by reading the document over several iterations. The algorithm is designed to incorporate contextual processing, adaptation to modify the shape of known words, and learning of new words within a constrained dictionary. Adaptation occurs when a word that has previously been trained in the classifier is recognised at either the word or letter level and the word image is used to modify the classifier. Learning occurs when a new word that has not been in the training set is recognised at the letter level and is subsequently added to the classifier. Words and letters are recognised using a nearest neighbour classifier, with features based on the two-dimensional Fourier transform. By incorporating a measure of confidence based on the distribution of training points around an exemplar, adaptation and learning are constrained to occur only when a word is confidently classified. The algorithm was implemented and tested with a dictionary of 1000 words. Results show that adaptation of the letter classifier improved recognition on average by 3.9%, with only 1.6% at the whole-word level. Two experiments were carried out to evaluate the learning in the system. It was found that learning accounted for little improvement in the classification results and also that learning new words was prone to misclassifications being propagated.

    Non-Visual Representation of Complex Documents for Use in Digital Talking Books

    Essential written information such as text books, bills, and catalogues needs to be accessible to everyone. However, access is not always available to vision-impaired people, as they require electronic documents to be available in specific formats. To address the accessibility issues of electronic documents, this research aims to design an affordable, portable, standalone and simple-to-use complete reading system that will convert and describe complex components in electronic documents for print-disabled users.

    Capturing Synchronous Collaborative Design Activities: A State-Of-The-Art Technology Review
