273 research outputs found

    Referring to discourse participants in Ibero-Romance languages

    Synopsis: This volume brings together contributions by researchers focusing on personal pronouns in Ibero-Romance languages, going beyond the well-established variable of expressed vs. non-expressed subjects. While factors such as agreement morphology, topic shift and contrast or emphasis have been argued to account for variable subject expression, several corpus studies on Ibero-Romance languages have shown that the expression of subject pronouns goes beyond these traditionally established factors and is also subject to considerable dialectal variation. One of the factors affecting choice and expression of personal pronouns or other referential devices is whether the construction is used personally or impersonally. The use and emergence of new impersonal constructions, eventually also new (im)personal pronouns, as well as the variation found in the expression of human impersonality in different Ibero-Romance language varieties, is another interesting research area that has gained ground in recent years. In addition to variable subject expression, similar methods and theoretical approaches have been applied to study the expression of objects. Finally, the reference to the addressee(s) using different address pronouns and other address forms is an important field of study that is closely connected to the variable expression of pronouns. The present book sheds light on all these aspects of reference to discourse participants. The volume contains contributions with a strong empirical background, drawing on various methods and on both written and spoken corpus data from Ibero-Romance languages. The focus on discourse participants highlights the special properties of first and second person referents and the factors affecting them, which are often different from those affecting the anaphoric third person. The chapters are organized into three thematic sections: (i) Variable expression of subjects and objects, (ii) Between personal and impersonal, and (iii) Reference to the addressee.

    The Realisation of syntactic principles in non-standard Afrikaans: the correspondence of Jan Jonker Afrikaner (1820-1889)

    This study compares the syntax of nineteenth-century Orange River Afrikaans with Dutch and synchronic Afrikaans varieties, with particular attention to Griqua Afrikaans. It provides an account of the differences that are found between the earliest attestations of an extraterritorial variety of the Dutch language on southern African soil (the so-called Cape Dutch Vernacular) and the present-day outcome. The data collected for this study originate chiefly from a hitherto undisclosed corpus of letters kept in the Namibian State Archives by the so-called Oorlam-Nama, people of mixed descent who lived on the periphery of the nineteenth-century Cape colonial society. This thesis argues that nineteenth-century Orange River Afrikaans is a representative continuation of the earliest developments in the linguistic contact situation that existed at the Cape. The thesis advances that literacy and social class are important factors in the assessment of the written record from the Dutch colony at the Cape. The thesis centers around the letters by one author, Jan Jonker Afrikaner, written over a period of nearly twenty years in the second half of the nineteenth century. This legacy is a unique contribution to the diachronic data concerning the development of Afrikaans. From the data it is shown that this author had command of different registers, fluctuating between a near-perfect metropolitan Dutch and a Hollands that is classified as basilectal Afrikaans. The comparison of the data is set in a framework inspired by the concepts put forward in Generative Grammar. This has precipitated an exciting linguistic comparison of contemporary Afrikaans grammar with the diachronic material. This dissertation challenges the idea that the Khoesan languages were of little or no influence in the development of Afrikaans. The linguistic analysis of the nineteenth-century data reveals that the developments which took place cannot be attributed to one single origin. 
It is demonstrated that the innovations and change that can be identified run parallel to regular patterns that are found in other languages generally classified as creole languages. It is argued that the syntax of the Khoesan languages is a major reinforcing factor in the development of the syntactic idiosyncrasies that are identified as un-Germanic characteristics of Afrikaans. Limited to nonstandard varieties of Afrikaans, in the concluding sections the question is raised how these findings are to be addressed in the larger context of language change

    Discovering Causal Relations and Equations from Data

    Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, causal and equation discovery fields have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised with the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.

    Deep Neural Networks and Tabular Data: Inference, Generation, and Explainability

    Over the last decade, deep neural networks have enabled remarkable technological advancements, potentially transforming a wide range of aspects of our lives in the future. It is becoming increasingly common for deep-learning models to be used in a variety of situations in modern life, ranging from search and recommendations to financial and healthcare solutions, and the number of applications utilizing deep neural networks is still on the rise. However, a lot of recent research efforts in deep learning have focused primarily on neural networks and domains in which they excel. This includes computer vision, audio processing, and natural language processing. It is a general tendency for data in these areas to be homogeneous, whereas heterogeneous tabular datasets have received relatively scant attention despite the fact that they are extremely prevalent. In fact, more than half of the datasets on the Google dataset platform are structured and can be represented in a tabular form. The first aim of this study is to provide a thoughtful and comprehensive analysis of deep neural networks' application to modeling and generating tabular data. Apart from that, an open-source performance benchmark on tabular data is presented, where we thoroughly compare over twenty machine and deep learning models on heterogeneous tabular datasets. The second contribution relates to synthetic tabular data generation. Inspired by their success in other homogeneous data modalities, deep generative models such as variational autoencoders and generative adversarial networks are also commonly applied for tabular data generation. However, the use of Transformer-based large language models (which are also generative) for tabular data generation has received scant research attention. 
Our contribution to this literature consists of the development of a novel method for generating tabular data based on this family of autoregressive generative models that, on multiple challenging benchmarks, outperformed the current state-of-the-art methods for tabular data generation. Another crucial aspect for a deep-learning data system is that it needs to be reliable and trustworthy to gain broader acceptance in practice, especially in life-critical fields. One of the possible ways to bring trust into a data-driven system is to use explainable machine-learning methods. In spite of this, the current explanation methods often fail to provide robust explanations due to their high sensitivity to the hyperparameter selection or even changes of the random seed. Furthermore, most of these methods are based on feature-wise importance, ignoring the crucial relationship between variables in a sample. The third aim of this work is to address both of these issues by offering more robust and stable explanations, as well as taking into account the relationships between variables using a graph structure. In summary, this thesis made a significant contribution that touched many areas related to deep neural networks and heterogeneous tabular data as well as the usage of explainable machine learning methods

    From Pitch to Commentary Gantry: Investigating Syntactic Features in High-Pressure Events in the Sports Announcer Talk (SAT) of Football Commentators.

    This MA thesis investigates the various syntactic elements in the register of football commentators to determine the presence of prevailing notions of temporal pressure. More precisely, the register of football commentators is defined as Sports Announcer Talk (SAT). Drawing on the theories of Increment Functional Grammar (IFG), this investigation addresses four research questions concerning on-pitch occurrences, the application of holophrastic utterances, formulaic language, and syntactic characteristics in high-pressure situations. The study reveals a correlation between the employment of holophrastic utterances by commentators and the statistical metric Expected Goals (xG), wherein a predominant frequency of holophrastic utterances is associated with greater xG values. Furthermore, the prevalence and functions of time expressions are scrutinised, with the conclusion that different roles of commentary (play-by-play and colour commentators) exhibit varying frequencies of expressions. The analysis of formulaic routines in goal-scoring events identifies persistent structures involving player names, goals, metonymic and metaphoric sequences. Moreover, the pervasive usage of right dislocation (RD) structures is discussed to explore their connection with temporal pressure. This study posits that RD structures are ubiquitous in the SAT register of football commentary contributing to their fluency in both pressurised and unpressurised linguistic settings. Overall, this study establishes the prevalence of temporal pressure in the SAT of football commentators and emphasises the significance of certain syntactic elements in high-pressure situations

    Discovering causal relations and equations from data

    Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws, and principles that are invariant, robust, and causal has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventions on the system under study. With the advent of big data and data-driven methods, the fields of causal and equation discovery have developed and accelerated progress in computer science, physics, statistics, philosophy, and many applied fields. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for data-driven causal and equation discovery, point out connections, and showcase comprehensive case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is revolutionised with the efficient exploitation of observational data and simulations, modern machine learning algorithms and the combination with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems

    Application of the Machine Learning Tools in the Integrity Management of Pipelines Containing Dent-gouges and Corrosions

    Dent-gouges and corrosions are two of the well-known failure mechanisms that threaten the structural integrity of oil and gas pipelines. Dent-gouges or corrosions markedly reduce the burst capacity of pipelines as a result of localized wall thickness reduction. Fitness-for-service (FFS) assessment is commonly employed to maintain the integrity of in-service pipelines containing flaws, and the burst capacity evaluation is central to the FFS assessment. As the predictive accuracy of existing FFS models is generally very poor, the use of machine learning (ML) tools provides a viable option to develop burst capacity models with high accuracy. The main objective of the present thesis is to facilitate the FFS assessment of dent-gouges and corrosions based on ML tools. The first study proposes an improved burst capacity model for pipelines containing dent-gouges, obtained by adding a correction term to the European Pipeline Research Group (EPRG) burst capacity model using full-scale burst tests. Gaussian process regression (GPR) is employed to quantify the correction term, which is a function of six non-dimensional random variables incorporating the effect of pipe and geometric properties, sizes of dent-gouges, and internal pressure loading condition. The accuracy of the improved EPRG model, i.e. the EPRG-C model, is validated based on the comparison between the test and predicted burst capacities corresponding to the test data, and shown to be markedly greater than that of the EPRG model, suggesting the high effectiveness of the correction term. The second study presents a limit state-based assessment (LSBA) framework for pipelines containing dent-gouges to achieve reliability-consistent outcomes. The LSBA is formulated based on the EPRG-C model proposed in the first study by assigning appropriate partial safety factors to key variables as well as the internal pressure. 
The calibration of partial safety factors is carried out by making the outcomes of LSBA consistent with those of the reliability-based assessment given different pre-selected allowable failure probabilities. The failure probabilities corresponding to extensive assessment cases covering wide ranges of pipe geometric and material properties, sizes of dent-gouges and the model error are evaluated using the first-order reliability method. The validity of the calibrated partial safety factors is demonstrated using independent assessment cases and two illustrative examples. The advantages of LSBA over the deterministic assessment procedure in terms of achieving reliability-consistent assessment outcomes are further demonstrated. The third study employs a deep learning algorithm, the tabular generative adversarial network (TGAN), to generate synthetic burst tests by capturing the joint probability distribution based on real full-scale burst test data of corroded pipelines. Two other ML tools, random forest (RF) and extra tree (ET), are used to tune the hyper-parameters and validate the credibility of TGAN-generated data. A simple criterion is proposed to eliminate the outliers contained in the synthetic data. The results indicate that the synthetic burst test data match well with the real data, suggesting that TGAN can accurately capture the joint probability distribution of real test data and generate credible synthetic data. The fourth study develops new ML-based burst capacity models for dent-gouges with combined real and synthetic full-scale burst tests. The synthetic burst test data are generated using the TGAN framework proposed in the third study and are combined with the real burst tests to develop ML burst capacity models based on three ML tools, i.e. RF, ET and GPR. The proposed models are shown to be more accurate than the models developed using real test data only. 
The analysis result further indicates that trained models are markedly more accurate than the semi-empirical EPRG model widely employed in the pipeline industry

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

