45 research outputs found
Towards Machine-Assisted Meta Studies of Astrophysical Data From the Scientific Literature
We develop a new model for automatic extraction of reported measurements from the astrophysical literature, utilising modern Natural Language Processing techniques. We begin with a rules-based model for keyword-search-based extraction, and then proceed to develop artificial neural network models for full entity and relation extraction from free text. This process also requires the creation of hand-annotated datasets selected from the available astrophysical literature for training and validation purposes. We use a set of cosmological parameters to examine the model's ability to identify information relating to a specific parameter and to illustrate its capabilities, using the Hubble constant as a primary case study due to the well-document history of that parameter. Our results correctly highlight the current tension present in measurements of the Hubble constant and recover the 3.5σ discrepancy – demonstrating that the models are useful for meta-studies of astrophysical measurements from a large number of publications. From the other cosmological parameter results we can clearly observe the historical trends in the reported values of these quantities over the past two decades, and see the impacts of landmark publications on our understanding of cosmology. The outputs of these models, when applied to the article abstracts present in the arXiv repository, constitute a database of over 231,000 astrophysical numerical measurements, relating to over 61,000 different symbolic parameter representations – here a measurement refers to the combination of a numerical value and an identifier (i.e. a name or symbol) to give it physical meaning. We present an online interface (Numerical Atlas) to allow users to query and explore this database, based on parameter names and symbolic representations, and download the resulting datasets for their own research uses
Theoretical and empirical arguments for the reassessment of the notion of paradigm
The volume discusses the breadth of applications for an extended notion of paradigm. Paradigms in this sense are not only tools of morphological description but constitute the inherent structure of grammar. Grammatical paradigms are structural sets forming holistic, semiotic structures with an informational value of their own. We argue that as such, paradigms are a part of speaker knowledge and provide necessary structuring for grammaticalization processes. The papers discuss theoretical as well as conceptual questions and explore different domains of grammatical phenomena, ranging from grammaticalization, morphology, and cognitive semantics to modality, aiming to illustrate what the concept of grammatical paradigms can and cannot (yet) explain
MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS
This thesis presents a theoretical framework for the design of user-programmable
robots. The objective of the work is to investigate multi-modal unconstrained natural
instructions given to robots in order to design a learning robot. A corpus-centred
approach is used to design an agent that can reason, learn and interact with a human in a
natural unconstrained way. The corpus-centred design approach is formalised and
developed in detail. It requires the developer to record a human during interaction and
analyse the recordings to find instruction primitives. These are then implemented into a
robot. The focus of this work has been on how to combine speech and gesture using
rules extracted from the analysis of a corpus. A multi-modal integration algorithm is
presented, that can use timing and semantics to group, match and unify gesture and
language. The algorithm always achieves correct pairings on a corpus and initiates
questions to the user in ambiguous cases or missing information. The domain of card
games has been investigated, because of its variety of games which are rich in rules and
contain sequences. A further focus of the work is on the translation of rule-based
instructions. Most multi-modal interfaces to date have only considered sequential
instructions. The combination of frame-based reasoning, a knowledge base organised as
an ontology and a problem solver engine is used to store these rules. The understanding
of rule instructions, which contain conditional and imaginary situations require an agent
with complex reasoning capabilities. A test system of the agent implementation is also
described. Tests to confirm the implementation by playing back the corpus are
presented. Furthermore, deployment test results with the implemented agent and human
subjects are presented and discussed. The tests showed that the rate of errors that are
due to the sentences not being defined in the grammar does not decrease by an
acceptable rate when new grammar is introduced. This was particularly the case for
complex verbal rule instructions which have a large variety of being expressed
Paradigms regained
The volume discusses the breadth of applications for an extended notion of paradigm. Paradigms in this sense are not only tools of morphological description but constitute the inherent structure of grammar. Grammatical paradigms are structural sets forming holistic, semiotic structures with an informational value of their own. We argue that as such, paradigms are a part of speaker knowledge and provide necessary structuring for grammaticalization processes. The papers discuss theoretical as well as conceptual questions and explore different domains of grammatical phenomena, ranging from grammaticalization, morphology, and cognitive semantics to modality, aiming to illustrate what the concept of grammatical paradigms can and cannot (yet) explain
Towards the Repayment of Self-Admitted Technical Debt
Technical Debt is a metaphor used to express sub-optimal source code implementations that are introduced for short-term benefits that often must be paid back later, at an increased cost. In recent years, various empirical studies have focused on investigating source code comments that indicate Technical Debt, often referred to as Self-Admitted Technical Debt (SATD).
In this thesis, we survey research work on SATD, analyzing characteristics of current approaches and techniques for SATD, dividing literature in three categories: detection, comprehension, and repayment. To set the stage for novel and improved work on SATD, we compile tools, resources, and data sets made publicly available. We also identify areas that are missing investigation, open challenges, and discuss potential future research avenues. From the literature survey, we conclude that most findings and contributions have focused on techniques to identify, classify, and comprehend SATD. Few studies focused on the repayment or management of SATD, which is an essential goal of studying technical debt for software maintenance.
Therefore, we perform an empirical study towards SATD repayment. We conducted a preliminary online survey with developers to understand the elements they consider to prioritize SATD. With the acquired knowledge from the survey responses and previous literature work, we select metrics to estimate SATD repayment effort. We examine SATD instances found in software systems to see how it has been repaid and investigate the possibility of using historical data at the time of SATD introduction as indicators for SATD that should be addressed. We find two SATD repayment effort metrics that can be consistently modeled in our studied projects and surface the best early indicators for important SATD
Paradigms regained
The volume discusses the breadth of applications for an extended notion of paradigm. Paradigms in this sense are not only tools of morphological description but constitute the inherent structure of grammar. Grammatical paradigms are structural sets forming holistic, semiotic structures with an informational value of their own. We argue that as such, paradigms are a part of speaker knowledge and provide necessary structuring for grammaticalization processes. The papers discuss theoretical as well as conceptual questions and explore different domains of grammatical phenomena, ranging from grammaticalization, morphology, and cognitive semantics to modality, aiming to illustrate what the concept of grammatical paradigms can and cannot (yet) explain
Statistical modelling of lexical and syntactic complexity of postgraduate academic writing: a genre and corpus-based study of EFL,ESL, and English L1 M.A. dissertations
This research is an interdisciplinary study that adopts the principles of corpus linguistics and the methods of quantitative linguistics and statistical modelling to analyse the rhetorical sections of MA dissertations written by EFL, ESL, and English L1 postgraduate students. A discipline-specific corpus was analysed for 22 lexical and 11 syntactic complexity measures using three natural language processing tools [LCA-AW, TAALED, Coh Metrix] to find differences of academic texts by English L1 vs. L2 and to investigate the relationship between these linguistic indices. Structural factor analyses as well as the two statistical modelling methods of linear mixed-effects modelling and the supervised machine learning predictive classification modelling were then employed to verify the existing classification of the complexity indices, to explore their further dimensions, to investigate the effects of English language background and rhetorical sections on the production of lexically and syntactically complex texts, and finally to predict models that can best classify the group membership and the membership to the rhetorical sections based on the values of these measures. This investigation resulted in more than 20 specific findings with important implications for academic writing assessment of English L1 vs. L2, for academic writing research on rhetorical sections of English academic texts, for academic writing instruction especially materials development and syllabus designs in the EFL contexts, and academic immersion programmes, for the measure-testing and selection processes, and for methodological aspects of statistical modelling in corpus-based academic studies