
    Active learning and the Irish treebank

    We report on our ongoing work in developing the Irish Dependency Treebank, describe the results of two Inter-Annotator Agreement (IAA) studies, demonstrate improvements in annotation consistency which have a knock-on effect on parsing accuracy, and present the final set of dependency labels. We then investigate the extent to which active learning can play a role in treebank and parser development by comparing an active learning bootstrapping approach to a passive approach in which sentences are chosen at random for manual revision. We show that active learning outperforms passive learning, but when annotation effort is taken into account, it is not clear how much of an advantage the active learning approach has. Finally, we present results which suggest that adding automatic parses to the training data along with manually revised parses in an active learning setup does not greatly affect parsing accuracy.

    Universal dependencies for Irish

    Language resources that enable cross-lingual studies have become increasingly valuable for lesser-resourced languages such as Irish, as they allow for easier sharing of resources, thus overcoming the problem of data scarcity. The Universal Dependencies (UD) Project is an initiative aimed at cross-lingual studies of treebanks, linguistic structures and parsing. Its goal is to create a set of multilingual harmonised treebanks that are designed according to a universal annotation scheme. In this paper, we report on the conversion of the Irish Dependency Treebank (IDT) (Lynn, 2016) to a UD version of the treebank which we term the Irish Universal Dependency Treebank (IUDT). We report on the mapping of the IDT labelling scheme to the UD scheme, along with a clear description of the structural changes required in this conversion.

    Irish treebanking and parsing: a preliminary evaluation

    Language resources are essential for linguistic research and the development of NLP applications. Low-density languages, such as Irish, therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish – namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage the few existing resources. We discuss language-specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development.
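The abstract reports an agreement score without naming the statistic. Purely as an illustration, and assuming the score is a Cohen's kappa over the dependency labels that two annotators assign to the same tokens, a minimal Python sketch might look as follows. The function name and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (assumption, not from the treebank).
annotator_1 = ["subj", "obj", "det", "subj", "pobj", "det"]
annotator_2 = ["subj", "obj", "det", "obj", "pobj", "det"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly preferred over simple percentage agreement when a few labels dominate.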

    Morphological features of the Irish universal dependency treebank

    The Universal Dependencies Project (Nivre, [9]; Nivre et al., [10]) is an ongoing effort towards creating a set of harmonised dependency treebanks that are annotated and structured according to universal guidelines. This paper reports on the addition of morphological features to the Irish Universal Dependencies Treebank (IUDT). Our feature set subscribes to the feature inventory of the UD Project and has been mapped from Irish morpho-syntactic tags – the output of a Finite State Morphological Analyser for Irish (Uí Dhonnchadha and van Genabith [16]). Irish, a Celtic language, has some relatively unusual morphological features that require language-specific labels not covered by the universal feature set. In this paper, we summarise the Irish-specific features that we have added to this set by explaining the linguistic properties that they each describe. We also report on the first parsing experiments using the IUDT by assessing the effect that the inclusion of morphological features has on parsing accuracy.

    Denial of abortion in legal settings.

    Background: Factors such as poverty, stigma, lack of knowledge about the legal status of abortion, and geographical distance from a provider may prevent women from accessing safe abortion services, even where abortion is legal. Data on the consequences of abortion denial outside of the US, however, are scarce. Methods: In this article we present data from studies among women seeking legal abortion services in four countries (Colombia, Nepal, South Africa and Tunisia) to assess sociodemographic characteristics of legal abortion seekers, as well as the frequency and reasons that women are denied abortion care. Results: The proportion of women denied abortion services and the reasons for which they were denied varied widely by country. In Colombia, 2% of women surveyed did not receive the abortions they were seeking; in South Africa, 45% of women did not receive abortions on the day they were seeking abortion services. In both Tunisia and Nepal, 26% of women were denied their wanted abortions. Conclusions: The denial of legal abortion services may have serious consequences for women's health and wellbeing. Additional evidence on the risk factors for presenting later in pregnancy, predictors of seeking unsafe illegal abortion, and the health consequences of illegal abortion and childbirth after an unwanted pregnancy is needed. Such data would assist the development of programmes and policies aimed at increasing access to and utilisation of safe abortion services where abortion is legal, and harm reduction models for women who are unable to access legal abortion services.

    Cross-lingual transfer parsing for low-resourced languages: an Irish case study

    We present a study of cross-lingual direct transfer parsing for the Irish language. First, we discuss the mapping of the annotation scheme of the Irish Dependency Treebank to a universal dependency scheme. We explain our dependency label mapping choices and the structural changes required in the Irish Dependency Treebank. We then experiment with the universally annotated treebanks of ten languages from four language family groups to assess which languages are the most useful for cross-lingual parsing of Irish: we use these treebanks to train delexicalised parsing models, which are then applied to sentences from the Irish Dependency Treebank. The best results are achieved when using Indonesian, a language from the Austronesian language family.
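Delexicalised models are trained without word forms, so that a parser learned on one language can be applied to another through shared part-of-speech tags. The abstract does not describe the exact procedure. The sketch below assumes tab-separated CoNLL-X style rows (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and simply blanks the FORM and LEMMA columns. The function name and the toy sentence with its annotations are hypothetical.

```python
def delexicalise(conll_sentence):
    """Blank the FORM and LEMMA columns of one CoNLL-X style sentence."""
    rows = []
    for line in conll_sentence.strip().splitlines():
        cols = line.split("\t")
        cols[1] = "_"  # FORM: remove the word itself
        cols[2] = "_"  # LEMMA: remove the lemma as well
        rows.append("\t".join(cols))
    return "\n".join(rows)


# Hypothetical toy sentence (illustrative annotations, not treebank data).
toy_sentence = "\n".join([
    "1\tChonaic\tfeic\tVERB\tVERB\t_\t0\troot\t_\t_",
    "2\tmé\tmé\tPRON\tPRON\t_\t1\tnsubj\t_\t_",
    "3\tan\tan\tDET\tDET\t_\t4\tdet\t_\t_",
    "4\tfear\tfear\tNOUN\tNOUN\t_\t1\tdobj\t_\t_",
])
delexicalised = delexicalise(toy_sentence)
```

The tags, heads and dependency labels are left untouched, so the output can still serve as training data for a parser that sees only language-independent features.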

    Underground mining of aggregates. Main report

    This report examines the economic feasibility of underground mining for crushed rock aggregates in the UK, particularly in the London, South East and East of England regions (the South East area of England). These regions import substantial volumes of crushed rock, primarily from the East Midlands and South West regions, requiring relatively long transport distances to market for this bulk commodity. A key part of the research was to determine whether aggregate could be produced and delivered to a local market from an underground aggregates operation at a cost comparable with that for production and transport of the commodity from traditional surface quarries located further afield. In essence, the investigation asked: could the reduced transport costs compensate for the higher production costs underground so that underground crushed rock aggregates producers can compete with the established Leicestershire and Somerset surface quarries exporting to the South East? Work Programme: The research effort involved establishing and verifying cost models for aggregates production, stone processing (sizing and sorting), haulage of product to market, environmental impact mitigation, health and safety, decommissioning and restoration. Another major element of the work was the re-examination of the BGS exploratory borehole and geophysical databases to identify potential areas of crushed rock aggregates resource at depth in the South East area of England. Land use pressure is typically higher in this area of England than elsewhere, so another major part of the research was the identification of potential concurrent uses of land around the surface facilities of underground aggregates mines. The value, development costs and expected yields of these uses were estimated. 
These were also used to investigate potential economic benefits associated with after uses of remediated surface land above potential underground aggregates mines and also for the new underground space that would be created. Key technical issues such as subsidence within relatively heavily populated areas of the South East area of England were also addressed. Economic Results: The discounted cost of aggregate delivered, at a discount rate of 10%, was the metric used to appraise the options. This is the price of aggregate that leads to a zero net present value of project cash flows realised over the aggregates project life. The results show that the discounted costs of aggregate delivered to a local South East area of England market from an underground mine producing 3.5 million tonnes per annum (MTPA) of crushed rock aggregates are in the range of £13.03 per tonne to £13.93 per tonne for the top six prospect locations. These are greater than the corresponding cost for a “reference” quarry in Leicestershire producing 3.5 MTPA (£10.95 per tonne), but lower than that for a “reference” quarry in Leicestershire producing 1.25 MTPA (£16.48 per tonne). These figures indicate that underground crushed rock aggregate mines located within the South East area of England may be able to compete for a share in the overall market by replacing or displacing aggregate imported from the quarries in Leicestershire and Somerset producing around 1.25 MTPA or less. The surprise in these figures is not really that the more remote surface quarry has a lower discounted cost of aggregate delivered, but that the values for the quarry and underground mine are so close. The capital intensity for the development of underground aggregates mines was found to be higher than that required for surface quarries of comparable scale, by a factor ranging from 1.33 to 1.65, and thus may represent a disincentive for aggregates operators. 
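The discounted cost metric defined above can be made concrete. If p is the price per tonne, q_t the tonnes delivered and c_t the costs in year t, setting the net present value of (p * q_t - c_t) to zero at discount rate r gives p as the present value of costs divided by the present value of tonnage. The Python sketch below follows that definition under simplifying assumptions (annual cash flows, year 0 undiscounted, a constant price). The function names and all figures are hypothetical and are not taken from the report's cost models.

```python
def discounted_cost_per_tonne(costs, tonnes, rate=0.10):
    """Price per tonne at which the project's net present value is zero."""
    pv_costs = sum(c / (1 + rate) ** t for t, c in enumerate(costs))
    pv_tonnes = sum(q / (1 + rate) ** t for t, q in enumerate(tonnes))
    if pv_tonnes <= 0:
        raise ValueError("discounted tonnage must be positive")
    return pv_costs / pv_tonnes


def net_present_value(price, costs, tonnes, rate=0.10):
    """NPV of revenue minus cost for a constant price per tonne."""
    return sum(
        (price * q - c) / (1 + rate) ** t
        for t, (c, q) in enumerate(zip(costs, tonnes))
    )


# Hypothetical cash flows: costs in million pounds, output in million tonnes.
# Year 0 is capital spending only; years 1 to 3 are production years.
example_costs = [100.0, 30.0, 30.0, 30.0]
example_tonnes = [0.0, 3.5, 3.5, 3.5]
example_price = discounted_cost_per_tonne(example_costs, example_tonnes)
example_npv = net_present_value(example_price, example_costs, example_tonnes)
```

Because up-front capital is not discounted while later tonnage is, a more capital-intensive project shows a higher discounted cost per tonne even when its undiscounted average cost is the same.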
Carbon Emissions: The total carbon emissions of the “reference” 3.5 MTPA quarry in Leicestershire were estimated at 9.28 kg CO2/tonne of aggregate delivered. This is to be compared with carbon emissions for the 150 metre deep underground mines serving the local market, which were estimated at 9.31 kg CO2/tonne delivered for a Bletchley prospect using an adit to access the sub-surface and 14.25 kg CO2/tonne delivered for a prospect based on the Chitty borehole using a shaft. Depth of the mine is a key factor in determining the relative carbon emissions from each of the underground mining operations considered, as electricity consumption for ventilation, pumping and winding is proportional to depth. Recommendations: The current research generated seven principal recommendations, which are discussed in detail in the concluding section of the report. These are: (1) appraise policy incentives for underground aggregates mining; (2) conduct an industry-wide consultation on findings from the current research; (3) obtain public and stakeholder opinion on new uses for underground space; (4) conduct research into reducing the energy intensity of mine services; (5) develop a deep level aggregates-specific drilling campaign; (6) investigate underground aggregates mines developed from existing surface quarries; (7) investigate underground aggregates as co-products of industrial minerals mining.

    A validation study of the Eurostat harmonised European time use study (HETUS) diary using wearable technology

    Background: The central aim was to examine the accuracy of the full range of daily activities recorded in self-report time-use diaries against data from two objective passive data collection devices (wearable camera and accelerometer) serving as criterion reference instruments. This enabled systematic checks and comparisons on the timing, sequence and duration of activities recorded from the three data sources. Methods: Participants (n = 148) were asked to complete a single-day self-report paper time-use diary designed for use in the Harmonised European Time Use Study (HETUS), while simultaneously wearing a camera that continuously recorded images of their activities, and an accelerometer tracking physical movement. In a reconstruction interview shortly after the data collection period, participants viewed the camera images to help researchers interpret the image sequences. Of the initial 148 recruits (multi-seed snowball sample, 59% women, aged 18–91, 43% > 40), 131 returned usable diary and camera records (of whom 124 also provided a usable whole-day accelerometer record). We compare time allocation estimates from the diary and camera records, and also match the diary and camera records to the simultaneously recorded accelerometer vector magnitudes. Results: The data were examined at three analytic levels: aggregate, individual diarist and timeslot. The most important finding is that the estimates of mean daily time devoted to 8 of the 10 main activities differ by < 10% in the camera and diary records. The single case of major divergence (eating) can be explained by a systematic difference between the procedures followed by the self-reporting diarist and the observer coding the camera records. There are more substantial differences at the respondent level, with paired t-tests showing significant differences in time spent in 4 of the 10 categories. 45% of all variation in the accelerometer vector magnitudes in the timeslots is explained by camera and diary records. Detailed activity classifications perform much better than METs as predictors of actigraphy. Conclusions: The comparison of the diary with the camera and accelerometer records strongly supports using diary methodology for studying the full range of daily activity, particularly at aggregate levels. Accelerometer data could be combined with diary measures to improve estimation of METs equivalents for various types of active and sedentary behaviour.