11 research outputs found

    Automatic detection and extraction of artificial text in video

    Get PDF
    A significant challenge in large multimedia databases is the provision of efficient means for semantic indexing and retrieval of visual information. Artificial text in video is normally generated to supplement or summarise the visual content and is thus an important carrier of information that is highly relevant to the content of the video. As such, it is a potential ready-to-use source of semantic information. In this paper we present an algorithm for the detection and localisation of artificial text in video using a horizontal difference magnitude measure and morphological processing. The result of character segmentation, based on a modified version of the Wolf-Jolion algorithm [1][2], is enhanced using smoothing and multiple binarisation. The output text is fed to an "off-the-shelf" non-commercial OCR engine. Detection, localisation and recognition results for a 20-minute MPEG-1 encoded television programme are presented.
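
    The abstract names a horizontal difference magnitude measure followed by morphological processing. A minimal sketch of that general idea in OpenCV, not the authors' exact pipeline: the threshold, kernel size, and box-filtering rules below are illustrative assumptions.

```python
# Sketch of text detection via a horizontal difference magnitude
# measure plus morphological processing. Illustrative only; the
# threshold and kernel size are assumptions, not the paper's values.
import cv2
import numpy as np

def detect_text_boxes(frame_bgr, diff_thresh=40):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.int16)
    # Horizontal difference magnitude: absolute difference between
    # horizontally adjacent pixels, strong along vertical strokes.
    hdiff = np.abs(np.diff(gray, axis=1)).astype(np.uint8)
    _, mask = cv2.threshold(hdiff, diff_thresh, 255, cv2.THRESH_BINARY)
    # Morphological closing with a wide kernel merges stroke
    # responses into candidate text-line regions.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Keep wide, flat regions typical of horizontal caption lines.
        if w > 2 * h and h > 8:
            boxes.append((x, y, w, h))
    return boxes
```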

    Text Recognition Past, Present and Future

    Get PDF
    Text recognition in images is a research domain that attempts to develop computer programs able to read text from images. This creates a need for character recognition mechanisms, which leads to Document Image Analysis (DIA): converting documents in paper format into computer-generated electronic format. In this paper we review and analyse various methods for text recognition from different types of text images, such as scene images, document images, born-digital images and text in videos. Text recognition is an easy task for people who can read, but making a computer perform character recognition is highly difficult. The reasons include the variability, abstraction and absence of hard-and-fast rules that define the appearance of a visual character in text images, so the rules to be applied must be heuristically deduced from sample images. This paper reviews various existing methods, with the objective of summarising the well-known approaches.

    Automatic Car Number Plate Extraction Using Connected Components and Geometrical Features Approach

    Get PDF
    In today's era of advanced and secure digital technology, monitoring systems and security mechanisms play a critical role. Specialized security cameras in public areas and at pedestrian crossings can monitor and record real-time events as video clips that help track criminals. To obtain important data clearly and correctly from these video clips, detection and extraction methods are essential. The proposed system focuses on the detection and extraction of car number plates captured from speeding cars; the number plates are deblurred to overcome security threats, enhancing the motion-deblurring technique. Our proposed method combines a connected-component based approach with regional geometrical features. Key frames are first generated from an input video clip using a Discrete Wavelet Transform (DWT) based approach. From the key frame images, rectangular areas with high luminance values are detected and extracted as foreground regions, and the rest are discarded as background using regional geometric features. Finally, each rectangular area is checked for the presence of text: if it contains text, the system accepts it as a number plate and other regions are omitted. The accuracy of the method is evaluated in various experiments and compared with previous research. This system can be widely used in security monitoring applications.
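
    The core filtering step described here, connected components screened by regional geometric features, can be sketched as follows. The binarisation method and the area/aspect-ratio bounds are illustrative assumptions, not the paper's tuned values.

```python
# Sketch of plate-candidate detection with connected components
# filtered by regional geometric features. Otsu binarisation and the
# numeric bounds are assumptions for illustration.
import cv2

def find_plate_candidates(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # High-luminance regions: plates are typically bright rectangles.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    candidates = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        aspect = w / float(h)
        extent = area / float(w * h)  # how "filled" the box is
        # Plate-like geometry: wide rectangle, mostly solid.
        if 2.0 < aspect < 6.0 and extent > 0.5 and area > 400:
            candidates.append((x, y, w, h))
    return candidates
```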

    Video text detection and extraction using temporal information.

    Get PDF
    Luo Bo. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 55-60); abstracts in English and Chinese. Contents: Chapter 1, Introduction (background; text in videos; related work on connected-component, texture-classification and edge-detection based methods, and multi-frame enhancement; our contribution). Chapter 2, Caption Segmentation (temporal feature vectors; principal component analysis; PCA of temporal feature vectors). Chapter 3, Caption (Dis)Appearance Detection (abstract image sequence; abstract image refinement; detection of caption (dis)appearance). Chapter 4, System Overview (system implementation; computation of the system). Chapter 5, Experiment Results and Performance Analysis (the Gaussian classifier; training samples; testing data; caption (dis)appearance detection; caption segmentation; text line extraction; caption recognition). Chapter 6, Summary. Bibliography.
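
    Chapter 2 of the thesis builds temporal feature vectors (a pixel position's intensity traced across frames) and applies PCA to them; captions that persist over time produce a distinctive temporal signature. A minimal numpy sketch of that construction, assuming a grayscale frame stack of shape (T, H, W); this shows the general technique, not the thesis' exact classifier.

```python
# PCA over temporal feature vectors: each pixel position yields a
# vector of its intensities across T frames. General construction
# only; the downstream caption classifier is not reproduced here.
import numpy as np

def temporal_pca(frames, n_components=3):
    # frames: (T, H, W) grayscale stack; one feature vector per pixel.
    T, H, W = frames.shape
    X = frames.reshape(T, H * W).T.astype(np.float64)  # (H*W, T)
    X -= X.mean(axis=0, keepdims=True)  # center each temporal feature
    # Principal directions from the SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:n_components]      # (n_components, T)
    projected = X @ components.T        # per-pixel PCA features
    return projected.reshape(H, W, n_components)
```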

    Content-based querying of MPEG-2 encoded video images

    Get PDF
    Doctoral thesis, Content-Based Querying of MPEG-2 Encoded Video Images, T.C. Trakya Üniversitesi, Fen Bilimleri Enstitüsü (Graduate School of Natural and Applied Sciences), Department of Computer Engineering. The aim of this thesis is to develop a system that enables content-based querying of video encoded with MPEG-2, one of the most widely used video compression methods today. The developed system extracts the picture frames contained in MPEG-2 compressed video files and locates the scene text and artificial text appearing within these frames. The located text is then passed through an OCR process, yielding information about the content of the frame. This information is stored in a database, enabling content-based querying of the video file.
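
    The described pipeline (decode frames, OCR the text, store the results in a database for querying) can be sketched in a few lines. pytesseract stands in for the thesis' OCR stage, and the one-frame-per-second sampling and schema are assumptions.

```python
# Sketch of the frame -> OCR -> database indexing pipeline. The OCR
# engine (pytesseract), sampling rate, and table schema are all
# illustrative assumptions, not the thesis' implementation.
import sqlite3
import cv2
import pytesseract

def index_video(path, db_path="video_index.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS captions "
                "(video TEXT, frame INTEGER, text TEXT)")
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(int(fps), 1)  # sample roughly one frame per second
    frame_no = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_no % step == 0:
            # Tesseract grayscales internally, so the BGR frame is
            # acceptable for a sketch.
            text = pytesseract.image_to_string(frame).strip()
            if text:
                con.execute("INSERT INTO captions VALUES (?, ?, ?)",
                            (path, frame_no, text))
        frame_no += 1
    cap.release()
    con.commit()
    con.close()

# Content-based query: which frames mention a given word?
# con.execute("SELECT frame FROM captions WHERE text LIKE ?", ("%goal%",))
```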

    Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion

    Get PDF
    Semantic concepts, such as faces, buildings, and other real-world objects, are the most preferred instrument that humans use to navigate through and retrieve visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, people/no-people, etc., allows us to obtain semantic labels without full knowledge of all objects in the scene. Inferring the presence of high-level semantic concepts from low-level visual features is a research topic that has been attracting a significant amount of interest lately. However, the power of low-level visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, or the combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, the incorporation of readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with GPS information, offers a way to move towards a better understanding of the imaged scene. In this thesis we focus on the detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusion of low-level visual features with selected camera metadata, using a Support Vector Machine as an integration device, affects the performance of the building detector in a genuine personal photo collection. We implemented two approaches to the detection of buildings that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata. An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge orientation-based features extracted at three different scales, was tested on a dataset of 1720 outdoor images, with a classification accuracy of 88.22%. The second approach integrates the edge orientation-based features with the camera metadata-based features, both at the feature and at the decision level. The fusion approaches were evaluated using an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual-features-only approach by 2-3% on average regardless of the operating point chosen, while all the performance measures remain approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures.
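
    The feature-level (early) fusion with an SVM as the integration device can be sketched as follows with scikit-learn. The feature dimensions and the random placeholder data are assumptions standing in for the thesis' edge-orientation histograms and EXIF-derived metadata.

```python
# Minimal early-fusion sketch: concatenate low-level visual features
# (e.g. an edge-orientation histogram) with numeric camera-metadata
# features (exposure time, flash fired, focal length, ...) and train
# a single SVM. Shapes and data here are placeholder assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_visual = rng.random((100, 36))   # 36-bin edge-orientation histograms
X_meta = rng.random((100, 5))      # 5 numeric EXIF-derived features
y = rng.integers(0, 2, 100)        # 1 = building, 0 = non-building

# Feature-level (early) fusion: one concatenated vector per image,
# standardised so neither modality dominates the RBF kernel.
X = np.hstack([X_visual, X_meta])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
```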

    Extraction of Text from Images and Videos

    Get PDF
    Ph.D. thesis (Doctor of Philosophy).

    Computer-aided content analysis of digital video archives

    Full text link
    The transition from analogue to digital video has led to major changes within film archives in recent years. Digitisation of the films in particular opens up new possibilities for the archives. Wear and ageing of the film reels are ruled out, so the quality is preserved unchanged. In addition, network-based and thus considerably simpler access to the videos in the archives becomes possible. Additional services become available to archivists and users, providing extended search capabilities and easing navigation during playback. Searching within video archives relies on metadata that provide further information about the videos. A large share of this metadata is entered manually by archivists, which involves considerable time and high costs. Computer-aided analysis of a digital video makes it possible to reduce the effort of generating metadata for video archives. The first part of this dissertation presents new methods for recognising important semantic content in videos. In particular, newly developed algorithms for shot-cut detection, camera motion analysis, object segmentation and classification, text recognition and face recognition are presented. The automatically derived semantic information is very valuable because it eases the work with digital video archives. The information not only supports searching in the archives but also leads to the development of new applications, which are presented in the second part of the dissertation. For example, computer-generated video summaries can be produced, or videos can be adapted automatically to the properties of a playback device. A further focus of this dissertation is the analysis of historical films. Four European film archives provided a large number of historical video documentaries, shot in the early to mid twentieth century and digitised in recent years. Owing to the storage and wear of the film reels over several decades, many videos are heavily degraded by noise and contain clearly visible image defects. The image quality of the historical black-and-white films differs significantly from that of current videos, so reliable analysis with existing methods is often impossible. Within this dissertation, new algorithms are presented to enable reliable recognition of semantic content in historical videos as well.
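
    Shot-cut detection is the first of the named tasks. A standard histogram-difference detector is sketched below to make the task concrete; the dissertation develops more robust algorithms for noisy historical footage, and the similarity threshold here is an assumption.

```python
# Standard histogram-difference shot-cut detector, shown only to
# illustrate the task; not the dissertation's method, which targets
# degraded historical film. The threshold is an assumption.
import cv2

def detect_cuts(path, thresh=0.5):
    cap = cv2.VideoCapture(path)
    cuts, prev_hist, frame_no = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # 2D hue/saturation histogram, robust to brightness changes.
        hist = cv2.calcHist([hsv], [0, 1], None, [16, 16],
                            [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms => cut.
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < thresh:
                cuts.append(frame_no)
        prev_hist = hist
        frame_no += 1
    cap.release()
    return cuts
```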

    Content-based video indexing for sports applications using integrated multi-modal approach

    Full text link
    This thesis presents research based on an integrated multi-modal approach to sports video indexing and retrieval. By combining specific features extractable from multiple (audio-visual) modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users benefit from the integration of high-level semantics and descriptive mid-level features such as the whistle and close-up views of players. The main objective is to contribute to the three major components of sports video indexing systems. The first component is a set of powerful techniques to extract audio-visual features and semantic content automatically; the main purposes are to reduce manual annotation and to summarize lengthy content into a compact, meaningful and more enjoyable presentation. The second component is an expressive and flexible indexing technique that supports gradual index construction; the indexing scheme is essential in determining the methods by which users can access a video database. The third and last component is a query language that can generate dynamic video summaries for smart browsing and support user-oriented retrieval.
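
    The whistle is cited as a descriptive mid-level audio feature. One simple way to detect it, not necessarily the thesis' method, is as sustained energy in a narrow high-frequency band; the band limits (3-4.5 kHz) and dominance ratio below are assumptions.

```python
# Sketch of whistle detection as a mid-level audio feature: a referee
# whistle appears as concentrated energy in a narrow high-frequency
# band. Band limits and threshold are illustrative assumptions.
import numpy as np
from scipy.signal import spectrogram

def whistle_times(audio, sr, low=3000.0, high=4500.0, ratio=0.5):
    # audio: 1-D mono signal, sr: sample rate in Hz.
    freqs, times, Sxx = spectrogram(audio, fs=sr, nperseg=1024)
    band = (freqs >= low) & (freqs <= high)
    band_energy = Sxx[band].sum(axis=0)
    total_energy = Sxx.sum(axis=0) + 1e-12
    # Flag windows where the whistle band dominates the spectrum.
    return times[band_energy / total_energy > ratio]
```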