A Formal View on Training of Weighted Tree Automata by Likelihood-Driven State Splitting and Merging
The use of computers and algorithms to deal with human language, in both spoken and written form, is summarized by the term natural language processing (nlp). Modeling language in a way that is suitable for computers plays an important role in nlp. One idea is to use formalisms from theoretical computer science for that purpose. For example, one can try to find an automaton to capture the valid written sentences of a language. Finding such an automaton by way of examples is called training.
In this work, we also consider the structure of sentences by making use of trees. We use weighted tree automata (wta) in order to deal with such tree structures. Those devices assign weights to trees in order to, for example, distinguish between good and bad structures. The well-known expectation-maximization algorithm can be used to train the weights of a wta while the state behavior stays fixed. As a way to adapt the state behavior of a wta, state splitting, i.e. dividing a state into several new states, and state merging, i.e. replacing several states by a single new state, can be used. State splitting, state merging, and the expectation-maximization algorithm have already been combined into the state splitting and merging algorithm, which was successfully applied in practice. In our work, we formalized this approach in order to show properties of the algorithm. We also examined a new approach – the count-based state merging algorithm – which relies exclusively on state merging.
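The two operations can be illustrated with a minimal Python sketch. This is not the thesis's formal construction: the rule representation, the even division of probability mass over child copies, and the mixing weight lam (standing in for the distributor used when merging) are our own illustrative choices.

```python
from itertools import product

def split_state(rules, q):
    """Split state q of a probabilistic wta into two copies (q, 0) and (q, 1).

    A rule maps (symbol, children, parent) to a probability; for every parent
    state the probabilities of its rules sum to 1.  The mass of a rule is
    divided evenly over the child-copy combinations so that this
    normalization is preserved for each new parent copy.
    """
    def variants(state):
        return [(state, 0), (state, 1)] if state == q else [state]

    new_rules = {}
    for (sym, children, parent), p in rules.items():
        combos = list(product(*(variants(c) for c in children)))
        for new_parent in variants(parent):
            for new_children in combos:
                new_rules[(sym, tuple(new_children), new_parent)] = p / len(combos)
    return new_rules

def merge_states(rules, q0, q1, lam=0.5):
    """Merge q1 into q0; lam mixes the probabilities of the two parent
    copies, while rules that differ only in child copies are summed."""
    def m(state):
        return q0 if state == q1 else state

    merged = {}
    for (sym, children, parent), p in rules.items():
        w = lam if parent == q0 else (1.0 - lam) if parent == q1 else 1.0
        key = (sym, tuple(m(c) for c in children), m(parent))
        merged[key] = merged.get(key, 0.0) + w * p
    return merged

# A tiny automaton over a nullary symbol a and a unary symbol g.
rules = {("a", (), "q"): 0.4, ("g", ("q",), "q"): 0.6}
split = split_state(rules, "q")
restored = merge_states(split, ("q", 0), ("q", 1))
```

Splitting immediately followed by merging restores the original weights here; in the actual algorithm, an EM phase between the two steps moves the split copies apart.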
When dealing with trees, another important tool is binarization. A binarization is a strategy to code arbitrary trees by binary trees. For each of three different binarizations we showed that wta together with the binarization are as powerful as weighted unranked tree automata (wuta). We also showed that this is still true if only probabilistic wta and probabilistic wuta are considered.

How to Read This Thesis
1. Introduction
1.1. The Contributions and the Structure of This Work
2. Preliminaries
2.1. Sets, Relations, Functions, Families, and Extrema
2.2. Algebraic Structures
2.3. Formal Languages
3. Language Formalisms
3.1. Context-Free Grammars (CFGs)
3.2. Context-Free Grammars with Latent Annotations (CFG-LAs)
3.3. Weighted Tree Automata (WTAs)
3.4. Equivalences of WCFG-LAs and WTAs
4. Training of WTAs
4.1. Probability Distributions
4.2. Maximum Likelihood Estimation
4.3. Probabilities and WTAs
4.4. The EM Algorithm for WTAs
4.5. Inside and Outside Weights
4.6. Adaption of the Estimation of Corazza and Satta [CS07] to WTAs
5. State Splitting and Merging
5.1. State Splitting and Merging for Weighted Tree Automata
5.1.1. Splitting Weights and Probabilities
5.1.2. Merging Probabilities
5.2. The State Splitting and Merging Algorithm
5.2.1. Finding a Good π-Distributor
5.2.2. Notes About the Berkeley Parser
5.3. Conclusion and Further Research
6. Count-Based State Merging
6.1. Preliminaries
6.2. The Likelihood of the Maximum Likelihood Estimate and Its Behavior While Merging
6.3. The Count-Based State Merging Algorithm
6.3.1. Further Adjustments for Practical Implementations
6.4. Implementation of Count-Based State Merging
6.5. Experiments with Artificial Automata and Corpora
6.5.1. The Artificial Automata
6.5.2. Results
6.6. Experiments with the Penn Treebank
6.7. Comparison to the Approach of Carrasco, Oncina, and Calera-Rubio [COC01]
6.8. Conclusion and Further Research
7. Binarization
7.1. Preliminaries
7.2. Relating WSTAs and WUTAs via Binarizations
7.2.1. Left-Branching Binarization
7.2.2. Right-Branching Binarization
7.2.3. Mixed Binarization
7.3. The Probabilistic Case
7.3.1. Additional Preliminaries About WSAs
7.3.2. Constructing an Out-Probabilistic WSA from a Converging WSA
7.3.3. Binarization and Probabilistic Tree Automata
7.4. Connection to the Training Methods in Previous Chapters
7.5. Conclusion and Further Research
A. Proofs for Preliminaries
B. Proofs for Training of WTAs
C. Proofs for State Splitting and Merging
D. Proofs for Count-Based State Merging
Bibliography
List of Algorithms
List of Figures
List of Tables
Index
Table of Variable Names
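The binarization idea from the abstract above, coding arbitrary (unranked) trees by binary trees, can be sketched with the classic first-child/next-sibling encoding. This is only one simple encoding, closest in spirit to the right-branching case; the thesis treats left-branching, right-branching, and mixed binarizations formally, and the tree representation below is our own.

```python
# A tree is (label, [children]); the binary code uses None for
# "no child" / "no further sibling".

def encode(tree):
    """Encode a single unranked tree as a binary tree."""
    return encode_forest([tree])

def encode_forest(forest):
    # left position encodes the first child, right position the next sibling
    if not forest:
        return None
    (label, children), *rest = forest
    return (label, encode_forest(children), encode_forest(rest))

def decode(binary):
    """Inverse: recover the forest from its binary code."""
    if binary is None:
        return []
    label, first_child, next_sibling = binary
    return [(label, decode(first_child))] + decode(next_sibling)

t = ("S", [("a", []), ("b", []), ("c", [])])
b = encode(t)
```

The ternary node S with children a, b, c becomes the binary tree S(a(-, b(-, c)), -), and decoding restores the original tree exactly.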
Inklusionsorientierte Schulentwicklung
The volume opens up an interdisciplinary approach to the topic of inclusion-oriented school development by considering the dimensions of organizational, personnel, and teaching development from the perspectives of various (sub)disciplines and approaches and linking them with one another. In addition to a historically oriented line of inclusion research, which addresses questions of possible points of connection, path dependencies, or ruptures, the contributions of the volume are devoted, among other things, to analyzing the status quo of inclusion in schools, e.g. with regard to relevant characteristics of inclusion-oriented teaching or possible limits of inclusion research and implementation. Furthermore, future perspectives are derived from current scholarly discourses and findings.
Cutting out the middleman: calibrating and validating a dynamic vegetation model (ED2-PROSPECT5) using remotely sensed surface reflectance
Ecosystem models are often calibrated and/or validated against derived remote sensing data products, such as MODIS leaf area index. However, these data products are generally based on their own models, whose assumptions may not be compatible with those of the ecosystem model in question, and whose uncertainties are usually not well quantified.
Here, we develop an alternative approach whereby we modify an ecosystem model to predict full-range, high spectral resolution surface reflectance, which can then be compared directly against airborne and satellite data. Specifically, we coupled the two-stream representation of canopy radiative transfer in the Ecosystem Demography model (ED2) with a leaf radiative transfer model (PROSPECT 5) and a simple soil reflectance model. We then calibrated this model against reflectance observations from the NASA Airborne VIsible/InfraRed Imaging Spectrometer (AVIRIS) and survey data from 54 temperate forest plots in the northeastern United States. The calibration successfully constrained the posterior distributions of model parameters related to leaf biochemistry and morphology and canopy structure for five plant functional types. The calibrated model was able to accurately reproduce surface reflectance and leaf area index for sites with highly varied forest composition and structure, using a single common set of parameters across all sites. We conclude that having dynamic vegetation models directly predict surface
reflectance is a promising avenue for model calibration and validation using remote sensing data.
https://gmd.copernicus.org/preprints/gmd-2020-324/gmd-2020-324.pdf
Assembling evidence for identifying reservoirs of infection
Many pathogens persist in multihost systems, making the identification of infection reservoirs crucial for devising effective interventions. Here, we present a conceptual framework for classifying patterns of incidence and prevalence, and review recent scientific advances that allow us to study and manage reservoirs simultaneously. We argue that interventions can have a crucial role in enriching our mechanistic understanding of how reservoirs function and should be embedded as quasi-experimental studies in adaptive management frameworks. Single approaches to the study of reservoirs are unlikely to generate conclusive insights, whereas the formal integration of data and methodologies, involving interventions, pathogen genetics, and contemporary surveillance techniques, promises to open up new opportunities to advance understanding of complex multihost systems.
Beyond ecosystem modeling: a roadmap to community cyberinfrastructure for ecological data–model integration
In an era of rapid global change, our ability to understand and predict Earth's natural systems is lagging behind our ability to monitor and measure changes in the biosphere. Bottlenecks to informing models with observations have reduced our capacity to fully exploit the growing volume and variety of available data. Here, we take a critical look at the information infrastructure that connects ecosystem modeling and measurement efforts, and propose a roadmap to community cyberinfrastructure development that can reduce the divisions between empirical research and modeling and accelerate the pace of discovery. A new era of data–model integration requires investment in accessible, scalable, transparent tools that integrate the expertise of the whole community, including both modelers and empiricists. This roadmap focuses on five key opportunities for community tools: the underlying foundations of community cyberinfrastructure; data ingest; calibration of models to data; model–data benchmarking; and data assimilation and ecological forecasting. This community-driven approach is key to meeting the pressing needs of science and society in the 21st century.
Count-based state merging for probabilistic regular tree grammars
We present an approach to obtain language models from a tree corpus using probabilistic regular tree grammars (prtg). Starting with a prtg only generating trees from the corpus, the prtg is generalized step by step by merging nonterminals. We focus on bottom-up deterministic prtg to simplify the calculations.
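The starting point and the merging step can be sketched as follows. This is a minimal illustration under our own tree and grammar representation, not the paper's formalism: the state of a subtree is simply the subtree shape itself (which makes the grammar bottom-up deterministic), rules carry counts, and probabilities would be obtained from those counts by relative-frequency estimation.

```python
from collections import Counter

def initial_grammar(corpus):
    """Build a grammar generating exactly the subtree shapes seen in the
    corpus; rule counts record how often each rule is used."""
    rules = Counter()  # (label, child_states, state) -> count

    def state(tree):
        label, children = tree
        kids = tuple(state(c) for c in children)
        q = (label, kids)           # the subtree shape serves as the state
        rules[(label, kids, q)] += 1
        return q

    for t in corpus:
        state(t)
    return rules

def merge(rules, keep, drop):
    """Merge nonterminal `drop` into `keep` by adding up rule counts."""
    m = lambda q: keep if q == drop else q
    merged = Counter()
    for (label, kids, q), c in rules.items():
        merged[(label, tuple(m(k) for k in kids), m(q))] += c
    return merged

# Corpus: the trees g(a) and a.  Merging the state for g(a) into the
# state for a yields a grammar that also generates g(g(a)), g(g(g(a))),
# ... -- the generalization driven by merging.
corpus = [("g", (("a", ()),)), ("a", ())]
g0 = initial_grammar(corpus)
qa = ("a", ())
g1 = merge(g0, qa, ("g", (qa,)))
```

Which pairs of nonterminals to merge, and when to stop, is exactly what the count-based criterion in the paper decides; the sketch only shows the mechanics of a single merge.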