558 research outputs found

    Profiling of OCR'ed Historical Texts Revisited

    In the absence of ground truth it is not possible to automatically determine the exact spectrum and occurrences of OCR errors in an OCR'ed text. Yet, for interactive postcorrection of OCR'ed historical printings it is extremely useful to have a statistical profile available that provides an estimate of error classes with associated frequencies, and that points to conjectured errors and suspicious tokens. The method introduced in Reffle (2013) computes such a profile, combining lexica, pattern sets and advanced matching techniques in a specialized Expectation Maximization (EM) procedure. Here we improve this method in three respects. First, the method in Reffle (2013) is not adaptive: user feedback obtained by actual postcorrection steps cannot be used to compute refined profiles. We introduce a variant of the method that is open to adaptivity, taking correction steps of the user into account. This leads to higher precision with respect to recognition of erroneous OCR tokens. Second, new historical patterns are often found during postcorrection. We show that adding new historical patterns to the linguistic background resources leads to a second kind of improvement, enabling even higher precision by telling historical spellings apart from OCR errors. Third, the method in Reffle (2013) does not make any active use of tokens that cannot be interpreted in the underlying channel model. We show that adding these uninterpretable tokens to the set of conjectured errors leads to a significant improvement of the recall for error detection, at the same time improving precision.

    Rebricking frames and bases

    In 1949, Denis Gabor introduced the "complex signal" (nowadays called the "analytic signal") by combining a real function $f$ with its Hilbert transform $Hf$ into a complex function $f + iHf$. His aim was to extract phase information, an idea that has inspired techniques such as the monogenic signal and the complex dual-tree wavelet transform. In this manuscript, we consider two questions: When do two real-valued bases or frames $\{f_{n} : n\in\mathbb{N}\}$ and $\{g_{n} : n\in\mathbb{N}\}$ form a complex basis or frame of the form $\{f_{n} + i g_{n} : n\in\mathbb{N}\}$? And for which bounded linear operators $A$ does $\{f_{n} + i A f_{n} : n\in\mathbb{N}\}$ form a complex-valued orthonormal basis, Riesz basis or frame, when $\{f_{n} : n\in\mathbb{N}\}$ is a real-valued orthonormal basis, Riesz basis or frame? We call this approach rebricking. It is well known that the analytic signals do not span the complex vector space $L^{2}(\mathbb{R}; \mathbb{C})$, hence $H$ is not a rebricking operator. We give a full characterization of rebricking operators for bases, in particular orthonormal and Riesz bases, Parseval frames, and frames in general. We also examine the special case of finite-dimensional vector spaces and show that we can use any real, invertible matrix for rebricking if we allow for permutations in the imaginary part. Comment: 39 pages, 1 table

    Semantische Indexierung mit expliziten Wissensressourcen (Semantic Indexing with Explicit Knowledge Resources)



    Cognitive Reserve in Young and Old Healthy Subjects: Differences and Similarities in a Testing-the-Limits Paradigm with DSST

    Cognitive reserve (CR) is understood as the capacity to cope with challenging conditions, e.g. after brain injury, in states of brain dysfunction, or in age-related cognitive decline. CR in elderly subjects has attracted much research interest, but differences between healthy older and younger subjects have not been addressed in detail hitherto. Usually, one-time standard individual assessments are used to characterise CR. Here we observe CR as individual improvement in cognitive performance (gain) in a complex testing-the-limits paradigm, the digit symbol substitution test (DSST), with 10 repeated measurements, in 140 younger (20-30 yrs) and 140 older (57-74 yrs) healthy subjects. In addition, we assessed attention, memory and executive function, and mood and personality traits as potential influence factors for CR. We found that both younger and older subjects showed significant gains, which were significantly correlated with speed of information processing, verbal short-term memory and visual problem solving in the older group only. Gender, personality traits and mood did not significantly influence gains in either group. Surprisingly, about half of the older subjects performed at the level of the younger group, suggesting that interindividual differences in CR are possibly age-independent. We propose that these findings may also be understood as an indication that one-time standard individual measurements do not allow assessment of CR, and that the use of DSST in a testing-the-limits paradigm is a valuable assessment method for CR in young and elderly subjects.

    Towards a more comprehensive assessment of the intensity of historical European heat waves (1979–2019)

    Europe has been affected by record-breaking heat waves in recent decades. Using station data and a gridded reanalysis as input, four commonly used heat wave indices, the heat wave magnitude index daily (HWMId), excess heat factor (EHF), wet-bulb globe temperature (WBGT) and universal thermal climate index (UTCI), are computed. The extremeness of historical European heat waves between 1979 and 2019 is ranked using the four indices and different metrics. A normalisation to enable the comparison between the four indices is introduced. Additionally, a method to quantify the influence of the input parameters on heat wave magnitude is introduced. The spatio-temporal behaviour of heat waves is assessed by spatio-temporal tracking. The areal extent, large-scale intensity and duration are visualized using bubble plots. As expected, temperature explains the largest variance in all indices, but humidity is nearly as important in WBGT and wind speed plays a substantial role in UTCI. While the 2010 Russian heat wave is by far the most extreme event in duration and intensity in all normalized indices, the 2018 heat wave was comparable in size for EHF, WBGT and UTCI. Interestingly, the well-known 2003 central European heat wave was only the fifth and tenth strongest in cumulative intensity in WBGT and UTCI, respectively. The June and July 2019 heat waves were very intense but short-lived, and thus do not belong to the top heat waves in Europe when duration and areal extent are taken into account. Overall, the proposed normalized indices and the multi-metric assessment of large-scale heat waves allow for a more robust description of their extremeness and will be helpful to assess heat waves worldwide and in climate projections.

    The devil in the detail of storms


    Birth of the Biscane
