14 research outputs found
XML technologies in language documentation workflows
More and more programs use XML formats for internal data storage, not only for interchange. This includes both general-purpose tools like MS Office and OpenOffice/LibreOffice and specialized linguistic software such as ELAN, EXMARaLDA, FLEx, Speech Analyzer, Arbil, WeSay, SayMore and so on. Thus more and more linguistic data are being created in XML, not just convertible to XML. Although not ideal (verbose, slow to process), XML formats offer a number of benefits that boost workflow efficiency. Importantly, XML documents can be processed with XSL transforms to derive new data while staying within the realm of XML (the XSL transforms are themselves XML and can be transformed by other XSL transforms), displayed as HTML, or published as PDF. Finally, there are now mature free native XML databases such as eXist-db and BaseX which offer the full cycle of operations in one application with a browser-based interface: store existing documents, browse and query data, create and edit data online, and apply XSLT to publish. I will illustrate this with examples of the transformations we used in a language documentation workflow to convert interlinear texts in Archi (East Caucasian) between various formats, including OpenOffice and FLEx. A related issue to be addressed is the need for a standard interchange format for interlinear texts.
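As a rough illustration of the kind of transformation step described above, the sketch below applies a stylesheet with Python and the lxml package; the file names (interlinear.xml, igt2html.xsl) are hypothetical placeholders, and the abstract itself does not prescribe any particular toolchain.

```python
# Minimal sketch of one XSLT step in such a workflow; file names are
# hypothetical placeholders, and Python with the lxml package is assumed.
from lxml import etree

doc = etree.parse("interlinear.xml")                  # source interlinear XML
transform = etree.XSLT(etree.parse("igt2html.xsl"))   # stylesheet mapping IGT to HTML

result = transform(doc)                               # result is again an XML tree
with open("interlinear.html", "w", encoding="utf-8") as out:
    out.write(str(result))                            # serialized as declared by xsl:output
```

The same pattern chains naturally: the output of one transform can be parsed again and fed into the next, which is what keeps the whole pipeline inside XML.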
Reflections on software and technology for language documentation
Technological developments in recent decades have enabled an unprecedented growth in the volume and quality of collected language data. Emerging challenges include ensuring the longevity of the records and making them accessible and reusable both for fellow researchers and for the speech communities. These records are robust research data on which verifiable claims can be based and on which future research can be built, and they are the basis for the revitalization of cultural practices, including language and music performance. Recording, storage and analysis technologies are becoming more lightweight and portable, allowing language speakers to participate actively in documentation activities. This also results in a growing need for training and support, and thus in more interaction and collaboration between linguists, developers and speakers. Both cutting-edge speech technologies and crowdsourcing methods can be used effectively to overcome bottlenecks between different stages of analysis. While the endeavour to develop a single all-purpose integrated workbench for documentary linguists may not be achievable, investing in robust open interchange formats that can be accessed and enriched by independent pieces of software seems more promising for the near future.
Design and baseline characteristics of the finerenone in reducing cardiovascular mortality and morbidity in diabetic kidney disease trial
Background: Among people with diabetes, those with kidney disease have exceptionally high rates of cardiovascular (CV) morbidity and mortality and of progression of their underlying kidney disease. Finerenone is a novel, nonsteroidal, selective mineralocorticoid receptor antagonist that has been shown to reduce albuminuria in type 2 diabetes (T2D) patients with chronic kidney disease (CKD) while carrying only a low risk of hyperkalemia. However, the effect of finerenone on CV and renal outcomes has not yet been investigated in long-term trials.
Patients and Methods: The Finerenone in Reducing CV Mortality and Morbidity in Diabetic Kidney Disease (FIGARO-DKD) trial aims to assess the efficacy and safety of finerenone compared with placebo in reducing clinically important CV and renal outcomes in T2D patients with CKD. FIGARO-DKD is a randomized, double-blind, placebo-controlled, parallel-group, event-driven trial running in 47 countries with an expected duration of approximately 6 years. FIGARO-DKD randomized 7,437 patients with an estimated glomerular filtration rate ≥ 25 mL/min/1.73 m² and albuminuria (urinary albumin-to-creatinine ratio ≥ 30 to ≤ 5,000 mg/g). The study has at least 90% power to detect a 20% reduction in the risk of the primary outcome (overall two-sided significance level α = 0.05), the composite of time to first occurrence of CV death, nonfatal myocardial infarction, nonfatal stroke, or hospitalization for heart failure.
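For illustration only, the relation between the stated power and the number of primary-outcome events in an event-driven trial can be approximated with the standard Schoenfeld formula; the sketch below (Python with SciPy) is a textbook calculation, not the trial's actual statistical analysis plan.

```python
# Illustrative Schoenfeld approximation of the number of primary-outcome events
# needed to detect a 20% risk reduction (hazard ratio 0.8) with 90% power at a
# two-sided alpha of 0.05 and 1:1 allocation. Not the FIGARO-DKD analysis plan.
from math import log
from scipy.stats import norm

alpha, power, hazard_ratio = 0.05, 0.90, 0.80
z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96
z_beta = norm.ppf(power)            # ~1.28

# D = (z_alpha + z_beta)^2 / (p1 * p2 * ln(HR)^2), with allocation p1 = p2 = 0.5
events = (z_alpha + z_beta) ** 2 / (0.25 * log(hazard_ratio) ** 2)
print(f"approximate number of events required: {events:.0f}")  # roughly 840-850
```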
Conclusions: FIGARO-DKD will determine whether an optimally treated cohort of T2D patients with CKD at high risk of CV and renal events will experience cardiorenal benefits with the addition of finerenone to their treatment regimen.
Trial Registration: EudraCT number: 2015-000950-39; ClinicalTrials.gov identifier: NCT02545049
Towards a more general model of interlinear text
The interlinear glossed text (IGT) is a complex object whose structural complexity depends on factors such as origin, intended use, and the languages involved. Developing tools and workflows for integrated linguistic analysis environments calls for particular attention to those aspects which in many common cases can be disregarded as insignificant; collaboration on ELAN-FLEx integration was thus a particular motivation for this paper.
IGT is often conceived of as a tree: the root node corresponds to the whole text, subdivided into smaller units (sentences, words, morphemes). Each unit has a number of associated annotations, generally one per information type, like sentence translation, part-of-speech label, morpheme gloss.
However, an IGT can easily amount to a large set of trees. Unresolved ambiguities of all kinds are one reason for this. Each pair of alternative analyses (e.g. two concurrent parses of a word) implies two distinct trees, identical except for the node in question and all its descendants. The more ambiguities arise, the more underlying trees must be posited. Still, all trees in such a tree family stem from a single analyzed object (the transcript or the original orthographic representation). Since storing entire trees for each combination of relevant alternatives is utterly inefficient, a more compact storage model is needed.
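One possible shape for such a compact model is sketched below as a hypothetical Python illustration (not the schema proposed in the paper): alternatives are stored locally at the ambiguous node, and the full tree family is enumerated only on demand; the word forms and glosses are invented placeholders.

```python
# Hypothetical sketch of a compact IGT store: instead of one full tree per
# combination of alternatives, each ambiguous node keeps its competing
# analyses locally, so the tree family is implied rather than enumerated.
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Morph:
    form: str
    gloss: str


@dataclass
class Word:
    surface: str
    parses: list[list[Morph]] = field(default_factory=list)  # alternative analyses


@dataclass
class Sentence:
    transcript: str
    translation: str
    words: list[Word] = field(default_factory=list)

    def readings(self):
        """Enumerate the underlying tree family only when it is needed."""
        return product(*(word.parses for word in self.words))


# Invented placeholder data: one word with two concurrent parses implies two
# trees that are identical except for this node and its descendants.
ambiguous = Word("tarka", parses=[
    [Morph("tar", "go"), Morph("ka", "PST")],   # verbal reading
    [Morph("tarka", "boat")],                   # nominal reading
])
sentence = Sentence("tarka", "placeholder translation", words=[ambiguous])
print(sum(1 for _ in sentence.readings()))  # -> 2
```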
Turning to the media dimension, an accurate transcript of spontaneous discourse is most often unsuitable for grammatical analysis without some preprocessing (normalization) that deals with various speech errors, incomprehensible fragments etc. and produces a grammatically correct and coherent text for subsequent analysis, whereas the 'raw' transcript feeds phonological and possibly discourse analysis. We thus get two distinct texts, interconnected but giving rise to independent (families of) analysis trees; only one of them is linked directly to the media timeline.
In some scenarios, more than one media-based timeline emerges, and these need to be interlinked (cf. the BOLD framework: sound annotations to sound events; retelling experiments, e.g. pear stories; sign languages translated from/into spoken languages). The reference axis may not even be a timeline in the proper sense (a text, or a path through a complex graphic image).
One should mention further complicating factors such as multi-speaker and multi-lingual settings, collaboration and versioning.
The overall structure (an XML sketch will be presented) might grow unreasonably complex for any specialized analysis component to handle. It may thus be efficient to use an intermediate repository, e.g. a unified underlying RDF representation [Nakhimovsky et al. 2012], to which all changes made in specific tools are merged.
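A rough sketch of this merging idea is given below, using Python with rdflib; the namespace, predicates and identifiers are invented for illustration and are not taken from Nakhimovsky et al. (2012).

```python
# Rough sketch of merging tool-specific annotations into one RDF repository.
# The namespace, predicates and identifiers below are invented placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

IGT = Namespace("http://example.org/igt#")
CORPUS = "http://example.org/corpus/"

repo = Graph()  # the unified underlying representation


def merge_time_alignment(graph, unit_id, start_ms, end_ms):
    """Record a time alignment, as it might arrive from a time-aligning tool."""
    unit = URIRef(CORPUS + unit_id)
    graph.add((unit, RDF.type, IGT.AnnotationUnit))
    graph.add((unit, IGT.startTime, Literal(start_ms)))
    graph.add((unit, IGT.endTime, Literal(end_ms)))


def merge_gloss(graph, unit_id, morphemes, glosses):
    """Record a morphological analysis, as it might arrive from a parser."""
    unit = URIRef(CORPUS + unit_id)
    graph.add((unit, IGT.morphemes, Literal(morphemes)))
    graph.add((unit, IGT.glosses, Literal(glosses)))


# Changes made in different specialized tools are merged into the same graph.
merge_time_alignment(repo, "s1.w3", 1520, 1840)
merge_gloss(repo, "s1.w3", "tar-ka", "go-PST")
print(repo.serialize(format="turtle"))
```

Because both tools address the same node URI, their contributions accumulate in one graph instead of producing divergent copies of the document.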
References
Bow, Cathy, Baden Hughes and Steven Bird. 2003. Towards a General Model of Interlinear Text.
Nakhimovsky, Alexander, Jeff Good and Tom Myers. 2012. Interoperability of Language Documentation Tools and Materials for Local Communities. Digital Humanities 2012.
INEL Bibliographie
The bibliography comprises 2,056 entries, including references to all relevant linguistic and ethnological publications on the Selkup and Kamas languages, as well as numerous further references for Dolgan, Ewenki, Nenets, Nganasan, Tatar and Enets. It is constantly being supplemented and revised by the members of the INEL project. A web-based, searchable version is available online.