235 research outputs found

    A Formal Model of Ambiguity and its Applications in Machine Translation

    Systems that process natural language must cope with and resolve ambiguity. In this dissertation, a model of language processing is advocated in which multiple inputs and multiple analyses of inputs are considered concurrently and a single analysis is only a last resort. Compared to conventional models, this approach can be understood as replacing single-element inputs and outputs with weighted sets of inputs and outputs. Although processing components must deal with sets (rather than individual elements), constraints are imposed on the elements of these sets, and the representations from existing models may be reused. However, to deal efficiently with large (or infinite) sets, compact representations of sets that share structure between elements, such as weighted finite-state transducers and synchronous context-free grammars, are necessary. These representations and algorithms for manipulating them are discussed in depth. To establish the effectiveness and tractability of the proposed processing model, it is applied to several problems in machine translation. Starting with spoken language translation, it is shown that translating a set of transcription hypotheses yields better translations than a baseline in which a single (1-best) transcription hypothesis is selected and then translated, independent of the translation model formalism used. More subtle forms of ambiguity that arise even in text-only translation (such as decisions conventionally made during system development about how to preprocess text) are then discussed, and it is shown that the ambiguity-preserving paradigm can be employed in these cases as well, again leading to improved translation quality. Finally, a model for supervised learning is introduced that learns from training data in which sets (rather than single elements) of correct labels are provided for each training instance. It is used to learn a model of compound word segmentation, which serves as a preprocessing step in machine translation.
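    The contrast between translating a single 1-best transcription and translating a weighted set of hypotheses can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two hypothetical transcription hypotheses, the translation candidates, and all probabilities are invented, and plain dictionaries stand in for the compact representations (weighted finite-state transducers, synchronous context-free grammars) that the dissertation actually discusses.

```python
# Toy illustration only: all hypotheses and scores below are invented.
# TRANSCRIPTS maps each (hypothetical) transcription hypothesis to its weight.
TRANSCRIPTS = {"I scream": 0.55, "ice cream": 0.45}

# TRANSLATIONS maps each source hypothesis to weighted translation candidates.
TRANSLATIONS = {
    "I scream": {"ich schreie": 0.5, "ich kreische": 0.5},
    "ice cream": {"Eiscreme": 0.9, "Eis": 0.1},
}


def one_best_pipeline(transcripts, translations):
    """Commit to the single best transcription, then translate only that one."""
    src, p_src = max(transcripts.items(), key=lambda kv: kv[1])
    tgt, p_tgt = max(translations[src].items(), key=lambda kv: kv[1])
    return tgt, p_src * p_tgt


def joint_decode(transcripts, translations):
    """Keep every transcription hypothesis and maximize the combined score."""
    candidates = (
        (tgt, p_src * p_tgt)
        for src, p_src in transcripts.items()
        for tgt, p_tgt in translations[src].items()
    )
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    print("1-best pipeline:", one_best_pipeline(TRANSCRIPTS, TRANSLATIONS))
    print("joint decoding: ", joint_decode(TRANSCRIPTS, TRANSLATIONS))
```

    In this toy setting the pipeline commits early to the slightly more probable transcription and ends up with a lower combined score than the joint search, which is the effect the abstract attributes to preserving ambiguity. The enumeration used here would not scale to large or infinite hypothesis sets.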

    Features and Algorithms for Visual Parsing of Handwritten Mathematical Expressions

    Math expressions are an essential part of scientific documents. Handwritten math expression recognition can benefit human-computer interaction, especially in the education domain, and is a critical part of document recognition and analysis. Parsing the spatial arrangement of symbols is an essential part of math expression recognition. A variety of parsing techniques have been developed during the past three decades, and they fall into two groups. The first group is graph-based parsing. It selects a path or sub-graph which obeys some rule to form a possible interpretation for the given expression. The second group is grammar-driven parsing. Grammars and related parameters are defined manually for different tasks. The time complexity of both groups of parsers is high, and they often impose strict constraints to reduce the computation. The aim of this thesis is to work towards building a straightforward and effective parser with as few constraints as possible. First, we propose using a line of sight graph for representing the layout of strokes and symbols in math expressions. It achieves a higher F-score than other graph representations and reduces the search space for parsing. Second, we modify the shape context feature with Parzen window density estimation. This feature set works well for symbol segmentation, symbol classification and symbol layout analysis. We get a higher symbol segmentation F-score than other systems on the CROHME 2014 dataset. Finally, we develop a Maximum Spanning Tree (MST) based parser using Edmonds' algorithm, which extracts an MST from the directed line of sight graph in two passes: first symbols are segmented, and then symbols and spatial relationships are labeled. The time complexity of our MST-based parsing is lower than the time complexity of CYK parsing with context-free grammars. Also, our MST-based parsing obtains a higher structure rate and expression rate than CYK parsing when symbol segmentation is accurate. Correct structure means we get the structure of the symbol layout tree correct, even though the label of an edge in the symbol layout tree might be wrong. The performance of our math expression recognition system with MST-based parsing is competitive on the CROHME 2012 and 2014 datasets. For future work, incorporating symbol classifier results and correcting segmentation errors in MST-based parsing needs more research.
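    As a rough illustration of the spanning-tree step, the sketch below computes the total score of a maximum spanning arborescence of a small directed graph with the Chu-Liu/Edmonds cycle-contraction scheme. It is not the thesis's parser: the function names, the toy graph, and the edge scores are assumptions, and only the score (not the selected edges, segmentation, or relation labels) is returned.

```python
import math


def _min_arborescence_cost(n, root, edges):
    """Chu-Liu/Edmonds: cost of a minimum spanning arborescence, or None."""
    total = 0
    while True:
        # Step 1: cheapest incoming edge for every node.
        in_w = [math.inf] * n
        pre = [-1] * n
        for u, v, w in edges:
            if u != v and w < in_w[v]:
                in_w[v], pre[v] = w, u
        if any(v != root and in_w[v] == math.inf for v in range(n)):
            return None  # some node cannot be reached from the root
        in_w[root] = 0

        # Step 2: add the chosen edges and look for cycles among them.
        comp = [-1] * n      # contracted component id of each node
        visited = [-1] * n   # which start node last walked through a node
        n_comp = 0
        for start in range(n):
            total += in_w[start]
            v = start
            while visited[v] != start and comp[v] == -1 and v != root:
                visited[v] = start
                v = pre[v]
            if v != root and comp[v] == -1:  # walked back into own path: cycle
                u = pre[v]
                while u != v:
                    comp[u] = n_comp
                    u = pre[u]
                comp[v] = n_comp
                n_comp += 1
        if n_comp == 0:
            return total  # chosen edges already form an arborescence

        # Step 3: contract each cycle to one node and reweight entering edges.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = n_comp
                n_comp += 1
        edges = [
            (comp[u], comp[v], w - in_w[v])
            for u, v, w in edges
            if comp[u] != comp[v]
        ]
        root, n = comp[root], n_comp


def max_arborescence_score(n, root, edges):
    """Maximum total score, obtained by negating the edge scores."""
    cost = _min_arborescence_cost(n, root, [(u, v, -w) for u, v, w in edges])
    return None if cost is None else -cost


# Hypothetical scored graph: (source, target, score); node 0 is the root.
EDGES = [(0, 1, 5), (0, 2, 1), (1, 2, 4), (2, 1, 6), (2, 3, 3), (1, 3, 2), (0, 3, 1)]

if __name__ == "__main__":
    print(max_arborescence_score(4, 0, EDGES))  # prints 12
```

    In the toy graph the greedily chosen edges 2→1 and 1→2 form a cycle. Contracting it leads to the tree 0→1, 1→2, 2→3 with score 5 + 4 + 3 = 12.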

    Statistical relational learning of semantic models and grammar rules for 3D building reconstruction from 3D point clouds

    Formal grammars are well suited for the estimation of models with an a-priori unknown number of parameters such as buildings and have proven their worth for 3D modeling and reconstruction of cities. However, the generation and design of corresponding grammar rules is a laborious task and relies on expert knowledge. This thesis presents novel approaches for reducing this effort using advanced machine learning methods, resulting in automatically learned sophisticated grammar rules. Learning a wide range of sophisticated rules that reflect the variety and complexity of buildings is a challenging task. This is especially the case if building structures and the underlying aggregation hierarchies are to be learned simultaneously with the building parameters and the constraints among them for a semantic interpretation. Thus, in this thesis, an incremental approach is followed. It separates the structure learning from the parameter distribution learning of building parts. Moreover, existing procedural approaches with formal grammars are better suited to the generation of virtual city models than to the reconstruction of existing buildings. To this end, Inductive Logic Programming (ILP) techniques are transferred and applied for the first time in the field of 3D building modeling. This enables the automatic learning of declarative logic programs, which are equivalent to attribute grammars and separate the representation of buildings and their parts from the reconstruction task. Stepwise bottom-up learning, starting from the smallest atomic features of a building part together with the semantic, topological and geometric constraints, is key to successfully learning a whole building part. Only a few examples are sufficient to learn from precise as well as noisy observations. Learning from uncertain data is realized using probability density functions, decision trees and uncertain projective geometry. This enables the handling and modeling of uncertain topology and geometric reasoning that takes noise into consideration. The uncertainty of the models themselves is also considered. Therefore, a novel method is developed for the learning of Weighted Attribute Context-Free Grammar (WACFG). On the one hand, the structure learning of façades (the context-free part of the grammar) is performed based on annotated derivation trees using specific Support Vector Machines (SVMs). The latter are able to derive probabilistic models from structured data and to predict the most likely tree given the observations. On the other hand, to the best of my knowledge, Statistical Relational Learning (SRL), especially Markov Logic Networks (MLNs), is applied for the first time in order to learn building part (shape and location) parameters as well as the constraints among these parts. The use of SRL makes it possible to profit from the elegant logical relational description and to benefit from the efficiency of statistical inference methods. In order to model latent prior knowledge and exploit the architectural regularities of buildings, a novel method is developed for the automatic identification of translational as well as axial symmetries. For symmetry identification, a supervised machine learning approach is followed based on an SVM classifier. Building upon the classification results, algorithms are designed for the representation of symmetries using context-free grammars from authoritative building footprints. In all steps, the machine learning is performed based on real-world data such as 3D point clouds and building footprints. The handling of uncertainty and occlusions is ensured. The presented methods have been successfully applied to real data.
The corresponding classification and reconstruction results are shown.
    Statistisches relationales Lernen von semantischen Modellen und Grammatikregeln für 3D Gebäuderekonstruktion aus 3D Punktwolken
    Formal grammars are very well suited to estimating models with an a-priori unknown number of parameters and have therefore proven to be a good approach to reconstructing cities by means of 3D city models. However, designing and creating the corresponding grammar rules requires expert knowledge and considerable effort. In this work, methods were developed that reduce this effort with the help of powerful machine learning techniques and enable rules to be learned automatically. Learning extensive grammars that reflect the variety and complexity of buildings and their components is a challenging task. This is especially the case when, for semantic interpretation, the structures and aggregation hierarchies are to be learned simultaneously with the parameters of the objects. For this reason, an incremental approach is followed here that separates the learning of structures from the learning of parameter distributions and constraints. Existing procedural approaches with formal grammars are better suited to generating synthetic city models and are of only limited use for reconstructing existing buildings. To address this, techniques of Inductive Logic Programming (ILP) are transferred to the field of 3D building modeling for the first time. This leads to the learning of declarative logic programs that are equivalent in expressive power to attribute grammars and separate the representation of buildings from the reconstruction task. Learning the disaggregated atomic components first, together with the semantic, topological and geometric relations, proved to be the key to learning a building part as a whole. Learning was based on a small number of both precise and noisy example models. To make the latter possible, probability density functions, decision trees and uncertain projective geometry were used. This allowed uncertain topological relations and imprecise geometry to be handled and modeled. To represent the uncertainty of the models themselves, a method for learning Weighted Attributed Context-Free Grammars (WACFG) was developed. On the one hand, the structure of façades (the context-free part of the grammar) was learned from annotated derivation trees using specific Support Vector Machines (SVMs), which are able to derive probabilistic models from structured data and to make predictions. On the other hand, to the best of my knowledge, methods of Statistical Relational Learning (SRL), in particular Markov Logic Networks (MLNs), were used for the first time to learn parameters of buildings as well as the relations and constraints among their components. Using SRL makes it possible to combine the elegant relational descriptions of logic with efficient methods of statistical inference. To model latent prior knowledge and exploit architectural regularities, a method was developed for automatically detecting translational and mirror symmetries and representing them with context-free grammars. For this purpose, an SVM classifier was developed and implemented using supervised learning. Based on it, algorithms for inducing grammar rules from building footprint data were designed.

    Hierarchical and Spatial Structures for Interpreting Images of Man-made Scenes Using Graphical Models

    The task of semantic scene interpretation is to label the regions of an image and their relations into meaningful classes. This task is a key ingredient of many computer vision applications, including object recognition, 3D reconstruction and robotic perception. It is challenging partly due to the ambiguities inherent in the image data. Images of man-made scenes, e.g. building facade images, exhibit strong contextual dependencies in the form of spatial and hierarchical structures. Modelling these structures is central to this interpretation task. Graphical models provide a consistent framework for statistical modelling. Bayesian networks and random fields are two popular types of graphical models, which are frequently used for capturing such contextual information. The motivation for our work comes from the belief that we can find a generic formulation for scene interpretation that has the benefits of both random fields and Bayesian networks. It should have clear semantic interpretability. Therefore, our key contribution is the development of a generic statistical graphical model for scene interpretation, which seamlessly integrates different types of image features with the spatial and hierarchical structural information defined over the multi-scale image segmentation. It unifies the ideas of existing approaches, e.g. conditional random fields (CRFs) and Bayesian networks (BNs), and has a clear statistical interpretation as the maximum a posteriori (MAP) estimate of a multi-class labelling problem. Given the graphical model structure, we derive the probability distribution of the model based on the factorization property implied by the model structure. The statistical model leads to an energy function that can be optimized approximately by either loopy belief propagation or a graph-cut-based move-making algorithm.
The particular type of the features, the spatial structure, and the hierarchical structure, however, is not prescribed. In the experiments, we concentrate on terrestrial man-made scenes as a particularly difficult problem. We demonstrate the application of the proposed graphical model to the task of multi-class classification of building facade image regions. By incorporating the spatial and hierarchical structures, the framework for scene interpretation allows for significantly better classification results on man-made scenes than the standard local classification approach. We investigate the performance of the algorithms on a public dataset to show the relative importance of the information from the spatial structure and the hierarchical structure. As a baseline for the region classification, we use an efficient randomized decision forest classifier. Two specific models are derived from the proposed graphical model, namely the hierarchical CRF and the hierarchical mixed graphical model. We show that these two models produce better classification results than both the baseline region classifier and the flat CRF.
    Hierarchische und räumliche Strukturen zur Interpretation von Bildern anthropogener Szenen unter Nutzung graphischer Modelle
    The goal of semantic image interpretation is to label image regions and their mutual relations and to assign them to meaningful classes. This is one of the main tasks in many areas of computer vision, such as object recognition, 3D reconstruction or robotic perception. Images of man-made scenes in particular, such as facade images, are characterized by strong spatial and hierarchical structures. Modelling these structures is a central part of the interpretation, and graphical models provide a suitable, consistent tool for their statistical modelling. Bayesian networks and random fields are two well-known and frequently used examples of graphical models for capturing context-dependent information. The motivation for this work lies in the conviction that we can find a generic formulation of image interpretation with clear semantic meaning that combines the advantages of Bayesian networks and random fields. The main contribution of this work is therefore the development of a generic statistical graphical model for image interpretation that integrates a wide variety of image feature types together with spatial and hierarchical structural information over a multi-scale image segmentation. The model unifies the ideas underlying existing work, such as conditional random fields (CRF) and Bayesian networks (BN). It has a clear statistical interpretation as the maximum a posteriori (MAP) estimator of a multi-class labelling problem. Given the structure of the graphical model and the factorization properties it defines, we derive the probability distribution of the model. This leads to an energy function that can be optimized approximately. The particular type of image features and the spatial and hierarchical structure are independent of this formulation. We show the application of the proposed graphical model to the multi-class labelling of image regions in facade images. We demonstrate that, by taking spatial and hierarchical structures into account, the proposed method for image interpretation yields significantly better classification results than classical local classification methods. The performance of the proposed method is evaluated on a publicly available dataset. To classify the image regions, we use a method based on an efficient random forest classifier. From the proposed general graphical model, two specific models are derived: a hierarchical conditional random field (hierarchical CRF) and a hierarchical mixed graphical model. We show that both models produce better classification results than the underlying local classifiers or the simple conditional random fields.

    Computing a partition function of a generalized pattern-based energy over a semiring

    Valued constraint satisfaction problems with ordered variables (VCSPO) are a special case of Valued CSPs in which variables are totally ordered and soft constraints are imposed on tuples of variables that do not violate the order. We study a restriction of VCSPO, in which soft constraints are imposed on a segment of adjacent variables and a constraint language $\Gamma$ consists of $\{0,1\}$-valued characteristic functions of predicates. This kind of potentials generalizes the so-called pattern-based potentials, which were applied in many tasks of structured prediction. For a constraint language $\Gamma$ we introduce a closure operator, $\overline{\Gamma^{\cap}} \supseteq \Gamma$, and give examples of constraint languages for which $|\overline{\Gamma^{\cap}}|$ is small. If all predicates in $\Gamma$ are cartesian products, we show that the minimization of a generalized pattern-based potential (or, the computation of its partition function) can be made in $\mathcal{O}(|V| \cdot |D|^2 \cdot |\overline{\Gamma^{\cap}}|^2)$ time, where $V$ is a set of variables, $D$ is a domain set. If, additionally, only non-positive weights of constraints are allowed, the complexity of the minimization task drops to $\mathcal{O}(|V| \cdot |\overline{\Gamma^{\cap}}| \cdot |D| \cdot \max_{\rho \in \Gamma} \|\rho\|^2)$, where $\|\rho\|$ is the arity of $\rho \in \Gamma$. For a general language $\Gamma$ and non-positive weights, the minimization task can be carried out in $\mathcal{O}(|V| \cdot |\overline{\Gamma^{\cap}}|^2)$ time. We argue that in many natural cases $\overline{\Gamma^{\cap}}$ is of moderate size, though in the worst case $|\overline{\Gamma^{\cap}}|$ can blow up and depend exponentially on $\max_{\rho \in \Gamma} \|\rho\|$.
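    The remark that minimization and partition-function computation share one algorithm can be illustrated on a much simpler special case than the one the paper treats. The sketch below assumes a plain chain with unary factors and factors on adjacent pairs only (no closure operator, no longer segments) and runs one forward recursion over an abstract semiring in O(|V| · |D|^2) time. The `Semiring` class, the function names, and the example numbers are assumptions for illustration, not the paper's algorithm.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    add: Callable[[float, float], float]  # combines alternative assignments
    mul: Callable[[float, float], float]  # combines factors of one assignment
    zero: float                           # identity of add
    one: float                            # identity of mul


MIN_PLUS = Semiring(min, lambda a, b: a + b, math.inf, 0.0)              # minimization
SUM_PRODUCT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)  # partition function


def chain_value(sr, unary, pairwise):
    """Forward recursion; unary[i][x] and pairwise[i][x][y] are semiring values."""
    d = len(unary[0])
    alpha = list(unary[0])
    for i in range(1, len(unary)):
        nxt = []
        for y in range(d):
            acc = sr.zero
            for x in range(d):
                acc = sr.add(acc, sr.mul(alpha[x], pairwise[i - 1][x][y]))
            nxt.append(sr.mul(acc, unary[i][y]))
        alpha = nxt
    total = sr.zero
    for a in alpha:
        total = sr.add(total, a)
    return total


def brute_force(sr, unary, pairwise):
    """Reference value obtained by enumerating all |D|^|V| assignments."""
    n, d = len(unary), len(unary[0])
    total = sr.zero
    for xs in product(range(d), repeat=n):
        val = sr.one
        for i, x in enumerate(xs):
            val = sr.mul(val, unary[i][x])
        for i in range(n - 1):
            val = sr.mul(val, pairwise[i][xs[i]][xs[i + 1]])
        total = sr.add(total, val)
    return total


# Hypothetical energies for 3 binary variables.
UNARY = [[0, 1], [2, 0], [1, 1]]
PAIRWISE = [[[0, 2], [1, 0]], [[0, 1], [3, 0]]]


def to_weights(table):
    """Map energies E to Boltzmann weights exp(-E), preserving nesting."""
    if isinstance(table, list):
        return [to_weights(t) for t in table]
    return math.exp(-table)


if __name__ == "__main__":
    print("minimum energy:", chain_value(MIN_PLUS, UNARY, PAIRWISE))
    print("partition function:",
          chain_value(SUM_PRODUCT, to_weights(UNARY), to_weights(PAIRWISE)))
```

    Swapping `MIN_PLUS` for `SUM_PRODUCT` changes only the two operations and their identities. The recursion itself is untouched, which is the sense in which the computation is "over a semiring".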