Measuring the Correctness of Double-Keying: Error Classification and Quality Control in a Large Corpus of TEI-Annotated Historical Text
Among mass digitization methods, double-keying is considered to have the lowest error rate. The method requires two independent transcriptions of a text by two different operators. It is particularly well suited to historical texts, which often present difficulties such as poor master copies, spelling variation, or complex text structures. Providers of data entry services using the double-keying method generally advertise very high accuracy rates (around 99.95% to 99.98%). These percentages are usually estimated from small samples, and little if anything is reported about the actual amount of text or the text genres that were proofread, the error types, the proofreaders, and so on. Obtaining meaningful data on this problem requires analyzing a large amount of text representing a balanced sample of different text types, distinguishing the structural XML/TEI level from the typographical level, and differentiating between types of errors, which may originate from different sources and may not be equally severe. This paper presents an extensive approach to the analysis and correction of double-keying errors, applied by the DFG-funded project "Deutsches Textarchiv" (German Text Archive, hereafter DTA) to evaluate and, where possible, increase the transcription and annotation accuracy of double-keyed DTA texts. Statistical analyses of the results of proofreading a large quantity of text are presented; they confirm the commonly cited accuracy rates for the double-keying method.
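To make the advertised figures concrete, the sketch below converts an accuracy percentage into an expected error count and shows why small proofreading samples give weak estimates. It assumes accuracy is measured per character (the abstract does not state the unit; vendors may count differently), and it uses the Wilson score interval as a swapped-in, generic technique for bounding an observed error rate. It is not the DTA's evaluation procedure, and the function names are hypothetical.

```python
import math

def expected_errors(accuracy_percent: float, n_chars: int) -> float:
    """Expected erroneous characters, assuming accuracy is counted per character."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * n_chars

def accuracy_from_errors(n_errors: int, n_chars: int) -> float:
    """Observed accuracy (in percent) from a proofread sample."""
    return 100.0 * (1.0 - n_errors / n_chars)

def wilson_interval(n_errors: int, n_chars: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the true per-character error rate."""
    p = n_errors / n_chars
    denom = 1.0 + z * z / n_chars
    centre = (p + z * z / (2 * n_chars)) / denom
    half = z * math.sqrt(p * (1 - p) / n_chars + z * z / (4 * n_chars * n_chars)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

if __name__ == "__main__":
    # Advertised range applied to a hypothetical one-million-character text.
    for acc in (99.95, 99.98):
        print(f"{acc}% -> about {expected_errors(acc, 1_000_000):.0f} errors per million characters")
    # Same observed error rate (0.05%), small versus large proofread sample.
    for errors, n in ((5, 10_000), (500, 1_000_000)):
        lo, hi = wilson_interval(errors, n)
        print(f"n={n}: accuracy between {100 * (1 - hi):.3f}% and {100 * (1 - lo):.3f}%")
```

At the same observed rate, the interval from a 10,000-character sample is roughly ten times wider than the one from a million-character sample, which is one reason a large, balanced proofread corpus gives a more trustworthy accuracy figure.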
The DTA “Base Format”: A TEI Subset for the Compilation of a Large Reference Corpus of Printed Text from Multiple Sources
In this article we describe the DTA “Base Format” (DTABf), a strict subset of the TEI P5 tag set. The DTABf aims to balance expressiveness and precision and to provide an interoperable annotation scheme for the wide variety of text types found in historical corpora of printed text from multiple sources. It has been developed on the basis of a large amount of historical text data in the core corpus of the Deutsches Textarchiv (DTA) project and in text collections from 15 cooperating projects, currently totaling 210 million tokens. The DTABf is a “living” TEI format that is continuously adjusted whenever new text candidates for the DTA contain new structural phenomena. We also address other aspects of the DTABf, including consistency, interoperability with other TEI dialects, HTML and other presentations of the TEI texts, conversion into other formats, and linguistic analysis. We include examples of best practices illustrating how external corpora can be losslessly converted into the DTABf, enabling third parties to use it in their own projects. The DTABf is comprehensively documented, and several software tools are available for working with it, making it a widely used format for encoding historical printed German text.
Effector memory differentiation increases detection of replication-competent HIV-1 in resting CD4+ T cells from virally suppressed individuals.
Studies have demonstrated that intensive ART alone cannot eradicate HIV-1, as the virus rebounds within a few weeks of treatment interruption. Viral rebound may originate from several cellular subsets; however, most proviral DNA has been found in antigen-experienced resting CD4+ T cells. Achieving a cure for HIV-1 requires both an understanding of the mechanisms that drive HIV-1 persistence and sensitive assays to measure the frequency of infected cells after therapeutic interventions. Assays such as the quantitative viral outgrowth assay (QVOA) measure HIV-1 persistence during ART by activating resting CD4+ T cells ex vivo to induce latency reversal; however, recent studies have shown that only a fraction of replication-competent viruses can be induced by primary mitogen stimulation. Previous studies have shown a correlation between acquisition of an effector memory phenotype and HIV-1 latency reversal in the quiescent CD4+ T cell subsets that harbor the reservoir. Here, we apply our mechanistic understanding that differentiation into effector memory CD4+ T cells more effectively promotes HIV-1 latency reversal to significantly improve proviral measurements in the QVOA. This modified assay, termed differentiation QVOA (dQVOA), reveals a significantly higher frequency of the inducible replication-competent HIV-1 reservoir in resting CD4+ T cells.
A succinate/SUCNR1-brush cell defense program in the tracheal epithelium
Host-derived succinate accumulates in the airways during bacterial infection. Here, we show that luminal succinate activates murine tracheal brush (tuft) cells through a signaling cascade involving succinate receptor 1 (SUCNR1), phospholipase Cβ2, and the cation channel transient receptor potential channel subfamily M member 5 (TRPM5). Stimulated brush cells then trigger a long-range Ca2+ wave that spreads radially over the tracheal epithelium through a sequential signaling process. First, brush cells release acetylcholine, which excites nearby cells via muscarinic acetylcholine receptors. From there, the Ca2+ wave propagates through gap junction signaling, also reaching distant ciliated and secretory cells. These effector cells translate activation into enhanced ciliary activity and Cl− secretion, which act synergistically to boost mucociliary clearance, the major innate defense mechanism of the airways. Our data establish tracheal brush cells as a central hub for triggering a global epithelial defense program in response to a danger-associated metabolite.
Eco-evolutionary dynamics in fragmented landscapes
Peer reviewed, postprint
ART Suppresses Plasma HIV-1 RNA to a Stable Set Point Predicted by Pretherapy Viremia
Current antiretroviral therapy is effective in suppressing, but not eliminating, HIV-1 infection. Understanding the source of viral persistence is essential for developing strategies to eradicate HIV-1 infection. We therefore investigated the level of plasma HIV-1 RNA in patients whose viremia was suppressed to less than 50–75 copies/ml on standard protease inhibitor- or non-nucleoside reverse transcriptase inhibitor-containing antiretroviral therapy, using a new real-time PCR-based assay for HIV-1 RNA with a limit of detection of one copy. Single copy assay results revealed that >80% of patients on initial antiretroviral therapy for 60 wk had persistent viremia of one copy/ml or more, with an overall median of 3.1 copies/ml. The level of viremia correlated with pretherapy plasma HIV-1 RNA but not with the specific treatment regimen. Longitudinal studies revealed no significant decline in the level of viremia between 60 and 110 wk of suppressive antiretroviral therapy. These data suggest that the persistent viremia on current antiretroviral therapy is derived, at least in part, from long-lived cells that were infected before therapy was initiated.
- …