28 research outputs found

    Logical structure recognition for heterogeneous periodical collections

    No full text
    This work introduces a practical method for performing logical layout analysis on heterogeneous periodical collections. The described module is incorporated into the Fraunhofer document image understanding system and has been successfully used as part of mass digitization projects on more than 500 000 scanned pages. Our primary targets are documents with complex layouts, such as newspapers; however, the described methods can easily be adapted to non-periodical publications. While encouraging, experimental results obtained on a heterogeneous set of digitized newspaper and chronicle pages spanning about 70 years reflect the high complexity of the generic, automated layout analysis problem. Our results allow the identification of promising areas for future investigation and provide a baseline for current in-the-wild document logical structure recognition.

    Supporting radio archive workflows with vocabulary independent spoken keyword search

    No full text
    Archive departments of large radio broadcasters stand to benefit greatly from speech recognition technology and other audio processing techniques. To move towards a practical understanding of how these technologies can support archive staff, two large German radio broadcasters, Deutsche Welle and Westdeutscher Rundfunk, commissioned Fraunhofer IAIS to build a German-language radio archive prototype. This paper discusses the development and assessment of the spoken keyword search module of this prototype. The search module was designed and tested in a project group consisting of both multimedia researchers and archive professionals. As a result, the prototype is unique in that its design and evaluation are tuned explicitly to the requirements of archivists. The paper discusses the special needs of radio archive staff and how they were accommodated in the design of the keyword search functionality. In particular, the archive staff required a vocabulary-independent search facility capable of searching for keywords in an archive containing a high proportion of spontaneous speech. Keyword search is implemented using a fuzzy-matching algorithm, which performs a similarity search on syllable transcripts generated by the speech recognizer. An evaluation is carried out to assess whether the radio archive prototype fulfills the needs of archivists.
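
    The abstract does not say which fuzzy-matching algorithm the prototype uses. As a rough illustration of what a similarity search over syllable transcripts can look like, the sketch below uses approximate substring matching with edit distance over syllable tokens. This is a swapped-in, textbook technique, not the authors' method. The function name `fuzzy_find`, the `max_dist` threshold, and the example syllables are all hypothetical.

```python
from typing import List, Tuple


def fuzzy_find(query: List[str], transcript: List[str],
               max_dist: int) -> List[Tuple[int, int]]:
    """Approximate substring search over syllable tokens (illustrative only).

    Returns (end_index, distance) for every transcript position at which
    some substring ending there is within `max_dist` syllable edits
    (substitution, insertion, deletion) of `query`.
    """
    m = len(query)
    # prev[i]: best edit distance between query[:i] and any transcript
    # substring ending just before the current position.
    prev = list(range(m + 1))
    hits = []
    for j, syllable in enumerate(transcript):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == syllable else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra syllable in transcript
                         cur[i - 1] + 1)      # query syllable missing
        if cur[m] <= max_dist:
            hits.append((j, cur[m]))
        prev = cur
    return hits


# Hypothetical example: the recognizer mis-transcribed "tag" as "tak".
keyword = ["bun", "des", "tag"]
recognized = ["der", "bun", "des", "tak", "hat", "heu", "te"]
matches = fuzzy_find(keyword, recognized, max_dist=1)
```

    Because the search operates on syllables rather than dictionary words, a keyword that is absent from the recognizer's vocabulary can still be located, which is the property the archivists asked for.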

    A robust front page detection algorithm for large periodical collections

    No full text
    Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automatic segmentation of the document stream into issues heavily influence the overall output quality of a document image analysis system. As a solution to the issue segmentation problem, this paper introduces a robust, two-step front page detection algorithm. First, the salient connected components from the front page of the periodical are described using a multi-dimensional Gaussian distribution based on discrete cosine transform (DCT) features. Second, a graph model is computed by applying Delaunay triangulation to the selected set of components. A specialized, error-tolerant graph matching algorithm is used to compute the distance score between the model and each candidate page. Experiments on a large, real-world newspaper data set demonstrate the generality and effectiveness of the proposed method.

    Towards large scale vocabulary independent spoken term detection: Advances in the Fraunhofer IAIS audiomining system

    No full text
    This contribution presents the advances of the Fraunhofer IAIS Audiomining system for vocabulary-independent spoken term detection since the last SIGIR workshop on searching spontaneous conversational speech in 2007. Based on feedback from archivists involved in the development of the prototype, a set of requirements for spoken term detection systems was established, guiding the development of the overall system. After improving the automatic speech recognition (ASR) baseline with data from the broadcast domain, the syllable error rate on a set of broadcast news and broadcast conversation shows was improved by 45.6% relative, while the time required for analyzing the data was reduced by 90%. Based on the new ASR results, the F1 value of the fuzzy syllable search used for open-vocabulary spoken term detection was increased by 49% relative. The best results were achieved with a hybrid word and syllable system, with a relative F1 improvement of 58% compared to the 2007 prototype. With the better ASR baseline, exact string search on the syllable transcripts becomes a promising alternative, yielding precise results on large audiovisual archives with only small reductions in recall.
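
    The figures above are relative changes, not absolute percentage points. The following minimal sketch shows how a relative change and an F1 value are conventionally computed. The abstract gives no absolute scores, so the function names and all input values are hypothetical and chosen only to illustrate the arithmetic.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_change(baseline: float, new: float) -> float:
    """Change of `new` with respect to `baseline`, as a fraction of baseline."""
    return (new - baseline) / baseline


# Hypothetical values, NOT taken from the paper: an F1 of 0.50 rising
# to 0.79 is a 58% relative improvement but only 29 points absolute.
old_f1 = f1_score(0.5, 0.5)
gain = relative_change(old_f1, 0.79)
```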

    Improved Parameters Estimating Scheme for E-HMM with Application to Face Recognition

    No full text