Modeling Global Syntactic Variation in English Using Dialect Classification
This paper evaluates global-scale dialect identification for 14 national
varieties of English as a means for studying syntactic variation. The paper
makes three main contributions: (i) introducing data-driven language mapping as
a method for selecting the inventory of national varieties to include in the
task; (ii) producing a large and dynamic set of syntactic features using
grammar induction rather than focusing on a few hand-selected features such as
function words; and (iii) comparing models across both web corpora and social
media corpora in order to measure the robustness of syntactic variation across
registers.
Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar
A usage-based Construction Grammar (CxG) posits that slot-constraints
generalize from common exemplar constructions. But what is the best model of
constraint generalization? This paper evaluates competing frequency-based and
association-based models across eight languages using a metric derived from the
Minimum Description Length paradigm. The experiments show that
association-based models produce better generalizations across all languages by
a significant margin.
Synthesis and reactivity of silylmethylcyclopropanes
PhD
Substituted tetrahydrofurans (THFs) are common structural motifs found in natural products.
The biological activity and structural complexity of these compounds make their efficient
construction with controlled regio- and stereochemistry a significant challenge in organic
synthesis. This thesis is concerned with investigating the use of silylmethylcyclopropanes as
precursors for the efficient and practical synthesis of tetrahydrofurans.
The first chapter consists of a review of the relevant literature comprising four sections. The
first section is a brief review of the current methods for the synthesis of tetrahydrofurans with
discussions of the advantages and disadvantages of these methods. Next, the concept of donor-acceptor
cyclopropanes is introduced and examples of how they have been employed in
tetrahydrofuran synthesis are given. The third section outlines the uses of silicon in organic
synthesis with particular reference to the physical and electronic influences of silicon on organic
molecules. Finally, the chapter concludes with an overview of the application of Lewis
acid-promoted cycloaddition reactions of allylsilanes and silylmethylcyclopropanes to the preparation
of tetrahydrofurans.
The second chapter discusses the preparation and purification of unsubstituted
silylmethylcyclopropanes outlining various conditions tried and the array of different
substituents that may be attached to the silicon. The successful Lewis acid-promoted [3+2]
cycloaddition reaction of various silylmethylcyclopropanes with -keto-aldehydes is presented,
together with a detailed account of the screening studies of different Lewis acids and aldehydes,
and optimisation of reaction conditions. The advantages of having a ketone functionality in the
final compound are practically demonstrated by way of several synthetic modifications to
produce a range of chemically diverse compounds containing the tetrahydrofuran substructure.
The third chapter presents the synthesis of substituted silylmethylcyclopropanes and their
attempted cyclisations using the conditions previously developed for unsubstituted
silylmethylcyclopropanes.
Following attempts to use Lewis acid-activated aldehydes in [3+2] cycloaddition reactions, and
the consequent disadvantage of randomly trialling Lewis acids, chapter four presents our
investigations into the use of NMR spectroscopy as a probe to establish a relative quantitative
scale of carbonyl activation with different Lewis acids. Our studies into this method are
presented along with the NMR data of several carbonyl-based Lewis bases complexed to the
Lewis acids that proved successful in the cycloaddition reactions.
Chapter five provides detailed experimental procedures and characterisation data for the
compounds described within this thesis.
Engineering and Physical Sciences Research Council
Everyone You’ll Never Meet
Everyone You’ll Never Meet is a multi-perspective mystery set in the fictional southern town of Ransom, South Carolina. It follows a young woman whose boyfriend disappears, a failed megachurch pastor at personal and professional crossroads, and a young father coming to grips with the shape of his life in light of a chance encounter with a murder victim.
Modeling the Complexity and Descriptive Adequacy of Construction Grammars
This paper uses the Minimum Description Length paradigm to model the complexity of CxGs (operationalized as the encoding size of a grammar) alongside their descriptive adequacy (operationalized as the encoding size of a corpus given a grammar). These two quantities are combined to measure the quality of potential CxGs against unannotated corpora, supporting discovery-device CxGs for English, Spanish, French, German, and Italian. The results show (i) that these grammars provide significant generalizations as measured using compression and (ii) that more complex CxGs with access to multiple levels of representation provide greater generalizations than single-representation CxGs.
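The two quantities named in this abstract (the encoding size of a grammar and the encoding size of a corpus given that grammar) can be illustrated with a toy two-part score. This is only a minimal sketch and not the paper's encoding scheme: the function names `encoding_size_bits` and `mdl_score` are hypothetical, and the assumption that each part is coded under its own empirical symbol distribution is made purely for illustration.

```python
import math
from collections import Counter


def encoding_size_bits(symbols):
    """Shannon code length (in bits) of a sequence of symbols,
    coded under the sequence's own empirical distribution.
    (Illustrative assumption, not the paper's encoder.)"""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())


def mdl_score(grammar_symbols, corpus_as_constructions):
    """Two-part score: L(grammar) + L(corpus | grammar).
    Lower totals indicate a better trade-off between grammar
    complexity and descriptive adequacy."""
    l_grammar = encoding_size_bits(grammar_symbols)
    l_corpus = encoding_size_bits(corpus_as_constructions)
    return l_grammar + l_corpus
```

Under this sketch, a candidate grammar is preferred when the sum of the two code lengths is smaller than that of its competitors.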
Representations of Language Varieties Are Reliable Given Corpus Similarity Measures
This paper measures similarity both within and between 84 language varieties
across nine languages. These corpora are drawn from digital sources (the web
and tweets), allowing us to evaluate whether such geo-referenced corpora are
reliable for modelling linguistic variation. The basic idea is that, if each
source adequately represents a single underlying language variety, then the
similarity between these sources should be stable across all languages and
countries. The paper shows that there is a consistent agreement between these
sources using frequency-based corpus similarity measures. This provides further
evidence that digital geo-referenced corpora consistently represent local
language varieties.
Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages
This paper uses computational experiments to explore the role of exposure in
the emergence of construction grammars. While usage-based grammars are
hypothesized to depend on a learner's exposure to actual language use, the
mechanisms of such exposure have only been studied in a few constructions in
isolation. This paper experiments with (i) the growth rate of the
constructicon, (ii) the convergence rate of grammars exposed to independent
registers, and (iii) the rate at which constructions are forgotten when they
have not been recently observed. These experiments show that the lexicon grows
more quickly than the grammar and that the growth rate of the grammar is not
dependent on the growth rate of the lexicon. At the same time,
register-specific grammars converge onto more similar constructions as the
amount of exposure increases. This means that the influence of specific
registers becomes less important as exposure increases. Finally, the rate at
which constructions are forgotten when they have not been recently observed
mirrors the growth rate of the constructicon. This paper thus presents a
computational model of usage-based grammar that includes both the emergence and
the unentrenchment of constructions.
Validating and Exploring Large Geographic Corpora
This paper investigates the impact of corpus creation decisions on large
multi-lingual geographic web corpora. Beginning with a 427 billion word corpus
derived from the Common Crawl, three methods are used to improve the quality of
sub-corpora representing specific language-country pairs like New Zealand
English: (i) the agreement of independent language identification systems, (ii)
hash-based deduplication, and (iii) location-specific outlier detection. The
impact of each of these steps is then evaluated at the language level and the
country level by using corpus similarity measures to compare each resulting
corpus with baseline data sets. The goal is to understand the impact of
upstream data cleaning decisions on downstream corpora with a specific focus on
under-represented languages and populations. The evaluation shows that the
validity of sub-corpora is improved with each stage of cleaning but that this
improvement is unevenly distributed across languages and populations. This
result shows how standard corpus creation techniques can accidentally exclude
under-represented populations.
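Of the three cleaning steps listed in this abstract, hash-based deduplication is the simplest to illustrate. The following is a minimal sketch only: the abstract does not say which hash function, text unit, or normalization was used, so SHA-256 over whitespace- and case-normalized documents is an assumption, and the function name `deduplicate` is hypothetical.

```python
import hashlib


def deduplicate(documents):
    """Keep the first occurrence of each document, comparing
    documents by a hash of their normalized text.
    (Normalization and hash choice are illustrative assumptions.)"""
    seen = set()
    unique = []
    for text in documents:
        # Lowercase and collapse runs of whitespace before hashing.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique
```

Storing only digests rather than full texts keeps memory use modest when the number of documents is large.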