
    Text detection in natural scenes through weighted majority voting of DCT high pass filters, line removal, and color consistency filtering

    Detecting text in images presents the unique challenge of finding both in-scene and superimposed text of various sizes, fonts, colors, and textures in complex backgrounds. The goal of this system is not to recognize specific letters or words but only to determine whether a pixel is text or not. This pixel-level decision is made by applying a set of weighted classifiers built from high-pass filters, together with a series of image processing techniques. It is our assertion that the learned weighted combination of frequency filters, in conjunction with image processing techniques, may show better pixel-level text detection performance in terms of precision, recall, and f-metric than any of the components do individually. Qualitatively, our algorithm performs well and shows promising results. The quantitative results are not as high as desired, but they are not unreasonable. For the complete ensemble, the f-metric was found to be 0.36.
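    The weighted-voting idea described above can be illustrated with a minimal sketch (not the authors' implementation): blockwise DCT high-pass energy maps act as weak classifiers, and a weight vector combines their votes into a per-block text/non-text decision. The function names, block size, cutoff values, file name, and weights below are illustrative assumptions.

```python
import cv2
import numpy as np

def highpass_dct_energy(gray, block=8, cutoff=2):
    """Per-block fraction of DCT energy outside the low-frequency corner.
    Text regions tend to have strong high-frequency content."""
    h, w = gray.shape
    h8, w8 = h - h % block, w - w % block
    energy = np.zeros((h8 // block, w8 // block), np.float32)
    for by in range(0, h8, block):
        for bx in range(0, w8, block):
            patch = np.float32(gray[by:by + block, bx:bx + block])
            coeffs = cv2.dct(patch)
            total = float(np.sum(coeffs ** 2)) + 1e-6
            low = float(np.sum(coeffs[:cutoff, :cutoff] ** 2))
            energy[by // block, bx // block] = (total - low) / total
    return energy

def weighted_vote(energy_maps, weights, threshold=0.5):
    """Weighted majority vote: each filter casts a binary text/non-text vote per block."""
    votes = np.zeros_like(energy_maps[0])
    for e, w in zip(energy_maps, weights):
        votes += w * (e > e.mean())          # weighted binary vote from each filter
    return votes > threshold * sum(weights)

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)             # hypothetical input image
maps = [highpass_dct_energy(gray, cutoff=c) for c in (1, 2, 3)]  # illustrative filter bank
text_mask = weighted_vote(maps, weights=[0.5, 0.3, 0.2])         # illustrative learned weights
```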

    Digits Recognition on Medical Device

    With the rapid development of mobile health, mechanisms for automatic data input are becoming increasingly important for mobile health apps. In these apps, users are often required to input data frequently, especially numbers, from medical devices such as glucometers and blood pressure meters. However, these simple tasks are tedious and prone to error. Even though some Bluetooth devices can make these input operations easier, they are not widespread because they are expensive and require complicated protocol support. Therefore, we propose an automatic procedure to recognize the digits on the screens of medical devices with smartphone cameras. The whole procedure includes several “standard” components in computer vision: image enhancement, region-of-interest detection, and text recognition. Previous work exists for each component, but it has various weaknesses that lead to a low recognition rate. We propose several novel enhancements to each component. Experimental results suggest that our enhanced procedure raises the recognition rate from 6.2% (applying optical character recognition directly) to 62.1%. This procedure can be adopted (with human verification) to recognize the digits on the screens of medical devices with smartphone cameras.
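    The three-stage procedure (enhancement, region-of-interest detection, recognition) could look roughly like the sketch below, which substitutes generic OpenCV preprocessing and Tesseract with a digit whitelist for the paper's custom components; the function name, file name, and thresholds are assumptions, and a local Tesseract installation is required for pytesseract to work.

```python
import cv2
import pytesseract  # requires a local Tesseract OCR installation

def read_screen_digits(bgr):
    """Sketch: enhance contrast, locate the screen region, then OCR digits only."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)  # enhancement
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Region-of-interest detection: take the largest bright connected region as the screen.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    roi = gray[y:y + h, x:x + w]
    # Recognition: restrict Tesseract to digits and a decimal point on a single line.
    text = pytesseract.image_to_string(
        roi, config="--psm 7 -c tessedit_char_whitelist=0123456789.")
    return text.strip()

print(read_screen_digits(cv2.imread("glucometer.jpg")))  # hypothetical input image
```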

    A machine learning approach for digital image restoration

    This paper illustrates the process of image restoration in the sense of detecting images within a scanned document such as a photo album or scrapbook. The primary use case of this research is to accelerate the cropping process for the employees of Cinetis, a company based in Martigny, Switzerland, that specializes in the digitization of old media formats. In this paper, we first summarize the state of the art in this field of research, including explanations of various techniques and algorithms for feature and document detection used by digital companies.
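    As a rough illustration of locating photo regions on a scanned page for cropping, a classical contour-based baseline (not the paper's machine learning approach) might look like this; the function name, file name, and area threshold are assumptions.

```python
import cv2

def detect_photos(scan_path, min_area_ratio=0.02):
    """Sketch: find large rectangular regions (candidate photos) on a scanned page."""
    page = cv2.imread(scan_path)
    gray = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    edges = cv2.dilate(edges, None, iterations=2)              # close gaps in photo borders
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    page_area = page.shape[0] * page.shape[1]
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h > min_area_ratio * page_area:                 # ignore dust and small marks
            boxes.append((x, y, w, h))
    return boxes

for x, y, w, h in detect_photos("album_page.png"):             # hypothetical scan
    print(f"crop: x={x} y={y} w={w} h={h}")
```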

    Text Extraction From Natural Scene: Methodology And Application

    With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications for image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction is a challenging problem due to the cluttered backgrounds of natural scenes and the varied patterns of scene text itself. To address these problems, this dissertation proposes a framework for scene text extraction. Scene text extraction in our framework is divided into two components: detection and recognition. Scene text detection finds the regions containing text in camera-captured images and videos. Text layout analysis based on gradient and color cues extracts candidate text strings from the cluttered background, and text structural analysis then applies structural features designed to distinguish text from non-text outliers among those candidates. Scene text recognition transforms the image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification over a set of text character categories. We design robust and discriminative feature representations of STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed framework, which obtains better performance than previously published methods. Our scene text extraction framework is applied in four scenarios: 1) reading print labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage to assist navigation in indoor environments; and 4) combining with object tracking to extract scene text from natural-scene video. The proposed prototype systems and associated evaluation results show that our framework is able to address the challenges of real applications.
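    The scene text character (STC) prediction step is a multi-class classification problem. A simplified stand-in for the dissertation's multi-descriptor coding/pooling pipeline might use a single HOG descriptor and a linear SVM, as sketched below; the patch size, parameters, and function names are assumptions, not the dissertation's actual design.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def stc_features(char_patch):
    """HOG descriptor of a 32x32 grayscale character patch (one possible descriptor)."""
    return hog(char_patch, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_stc_classifier(patches, labels):
    """Multi-class STC classifier: HOG features plus a linear SVM (one-vs-rest)."""
    features = np.array([stc_features(p) for p in patches])
    classifier = LinearSVC(C=1.0)
    classifier.fit(features, labels)   # labels are character categories, e.g. '0'-'9', 'A'-'Z'
    return classifier
```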

    Rotation-invariant features for multi-oriented text detection in natural images.

    Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human-computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. With the increasing popularity of mobile computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed to capture the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol that is more suitable for benchmarking algorithms for detecting texts of varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on multi-oriented texts in complex natural scenes.
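    One common way to obtain a rotation-invariant descriptor for a candidate text component, offered here only as a generic stand-in for the paper's specially designed features, is to use the Hu moments of the component mask:

```python
import cv2
import numpy as np

def rotation_invariant_descriptor(component_mask):
    """Hu moments of a candidate component: invariant to rotation, scale, and translation."""
    m = cv2.moments(component_mask, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    # Log scale keeps the seven moments at comparable magnitudes.
    return np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```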

    NeatVision: a development environment for machine vision engineers

    This chapter details a free image analysis development environment for machine vision engineers. The environment provides high-level access to a wide range of image manipulation, processing, and analysis algorithms (over 300 to date) through a well-defined and easy-to-use graphical interface. Users can extend the core library using the developer’s interface, a plug-in which features automatic source code generation, compilation with full error feedback, and dynamic algorithm updates. The chapter also discusses key issues associated with the environment and outlines the advantages of adopting such a system for machine vision application development.

    Image Segmentation and Multiple skew estimation, correction in printed and handwritten documents

    Analysis of handwritten documents has always been a challenging task in the field of image processing, and various algorithms have been developed to address it. The segmentation and skew-detection algorithms implemented here work not only on printed or scanned document images but also on handwritten document images, which gives them an edge over other methodologies. Line segmentation for both printed and handwritten document images is performed using two methods, histogram projection and the Hough transform, under the assumption that the input image contains no major skew. For histogram projection to work correctly, the document must not contain even slight skew; the Hough transform gives better results than histogram projection in that case. Word segmentation is performed using connected component analysis: we first identify the connected components in the printed or handwritten document image. A methodology is then used to detect multiple skews in handwritten or printed documents. Using clustering algorithms, we detect multiple skewed blocks in a handwritten document image, a printed document image, or a combination of both. The algorithm also works for multiple skewed handwritten text blocks.
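    A minimal sketch of Hough-transform-based skew estimation and correction, assuming a page that binarizes cleanly and a small skew range; the thresholds, angle range, and function names are illustrative, not the paper's exact method.

```python
import cv2
import numpy as np

def skew_angle_hough(binary, angle_range=15):
    """Dominant skew angle (degrees) from Hough lines on a binarized document image."""
    lines = cv2.HoughLines(binary, 1, np.pi / 180, 200)
    if lines is None:
        return 0.0
    angles = []
    for rho, theta in lines[:, 0]:
        deg = np.degrees(theta) - 90.0        # 0 degrees corresponds to a horizontal text line
        if abs(deg) <= angle_range:
            angles.append(deg)
    return float(np.median(angles)) if angles else 0.0

def deskew(gray):
    """Binarize, estimate the dominant skew, and rotate the page to correct it."""
    binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    angle = skew_angle_hough(binary)
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_LINEAR, borderValue=255)
```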

    Framework for extracting and solving combination puzzles

    This thesis describes and investigates how computer vision and stereo vision algorithms may be applied to the problem of object detection, and in particular whether computer vision can aid in puzzle solving. The idea of using a computer application for puzzle solving came from the observation that all solution techniques are, in the end, algorithms. This leads to the conclusion that such problems are well suited to machines: a machine may need only milliseconds to compute a solution that takes a human minutes or hours. Unfortunately, machines cannot see puzzles from a human perspective and thus cannot analyze them directly. Hence, the contribution of this thesis is a study of different computer vision approaches, drawn from unrelated solutions, applied to the problem of translating a physical puzzle into an abstract structure that a machine can understand and solve. Currently, little has been written on this subject, so there is a good opportunity to contribute. This is achieved through empirical research, in the form of a set of experiments, to determine which approaches are suitable. To accomplish these goals, a substantial amount of computer vision theory was studied. In addition, the relevance of real-time operation was taken into account through a study of real-time structure-from-motion algorithms (SLAM, PTAM), which have been successfully applied to navigation and augmented reality problems, though none of them to extracting object characteristics. This thesis examines how these different approaches can be applied to the given problem to help inexperienced users solve combination puzzles. Moreover, as a side effect, it becomes possible to track object movement (rotation and translation), which can be used to manipulate a rendered model of the puzzle and increase the interactivity and engagement of the user.
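    A minimal sketch of recovering relative rotation and translation between two frames from matched features, as a generic stand-in for the full SLAM/PTAM-style tracking discussed above; the camera intrinsic matrix K, the feature count, and the function name are assumptions.

```python
import cv2
import numpy as np

def relative_motion(frame_a, frame_b, K):
    """Relative rotation R and (scale-free) translation t between two grayscale frames,
    estimated from matched ORB features."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t   # t is recovered only up to scale
```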