
    Text detection in natural scenes through weighted majority voting of DCT high pass filters, line removal, and color consistency filtering

    Detecting text in images presents the unique challenge of finding both in-scene and superimposed text of various sizes, fonts, colors, and textures in complex backgrounds. The goal of this system is not to recognize specific letters or words but only to determine whether a pixel is text or not. This pixel-level decision is made by applying a set of weighted classifiers built from high-pass filters, together with a series of image processing techniques. It is our assertion that the learned weighted combination of frequency filters, in conjunction with image processing techniques, may show better pixel-level text detection performance in terms of precision, recall, and f-metric than any of the components do individually. Qualitatively, our algorithm performs well and shows promising results. The quantitative results are not as high as desired, but they are not unreasonable. For the complete ensemble, the f-metric was found to be 0.36.
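    The weighted-voting idea described above can be illustrated with a minimal sketch (not the authors' implementation): blockwise DCT high-pass energy maps act as weak classifiers, and a weight vector combines their votes into a per-block text/non-text decision. The function names, block size, cutoff values, file name, and weights below are illustrative assumptions.

```python
import cv2
import numpy as np

def highpass_dct_energy(gray, block=8, cutoff=2):
    """Per-block fraction of DCT energy outside the low-frequency corner.
    Text regions tend to have strong high-frequency content."""
    h, w = gray.shape
    h8, w8 = h - h % block, w - w % block
    energy = np.zeros((h8 // block, w8 // block), np.float32)
    for by in range(0, h8, block):
        for bx in range(0, w8, block):
            patch = np.float32(gray[by:by + block, bx:bx + block])
            coeffs = cv2.dct(patch)
            total = float(np.sum(coeffs ** 2)) + 1e-6
            low = float(np.sum(coeffs[:cutoff, :cutoff] ** 2))
            energy[by // block, bx // block] = (total - low) / total
    return energy

def weighted_vote(energy_maps, weights, threshold=0.5):
    """Weighted majority vote: each filter casts a binary text/non-text vote per block."""
    votes = np.zeros_like(energy_maps[0])
    for e, w in zip(energy_maps, weights):
        votes += w * (e > e.mean())          # weighted binary vote from each filter
    return votes > threshold * sum(weights)

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)             # hypothetical input image
maps = [highpass_dct_energy(gray, cutoff=c) for c in (1, 2, 3)]  # illustrative filter bank
text_mask = weighted_vote(maps, weights=[0.5, 0.3, 0.2])         # illustrative learned weights
```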

    Digits Recognition on Medical Device

    With the rapid development of mobile health, mechanisms for automatic data input are becoming increasingly important for mobile health apps. In these apps, users are often required to input data frequently, especially numbers, from medical devices such as glucometers and blood pressure meters. However, these simple tasks are tedious and prone to error. Even though some Bluetooth devices can make these input operations easier, they are not widespread because they are expensive and require complicated protocol support. Therefore, we propose an automatic procedure to recognize the digits on the screens of medical devices with smartphone cameras. The whole procedure includes several “standard” components in computer vision: image enhancement, region-of-interest detection, and text recognition. Previous work exists for each component, but it has various weaknesses that lead to a low recognition rate. We propose several novel enhancements to each component. Experimental results suggest that our enhanced procedure raises the recognition rate from 6.2% (applying optical character recognition directly) to 62.1%. This procedure can be adopted (with human verification) to recognize the digits on the screens of medical devices with smartphone cameras.
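    The three-stage procedure (enhancement, region-of-interest detection, recognition) could look roughly like the sketch below, which substitutes generic OpenCV preprocessing and Tesseract with a digit whitelist for the paper's custom components; the function name, file name, and thresholds are assumptions, and a local Tesseract installation is required for pytesseract to work.

```python
import cv2
import pytesseract  # requires a local Tesseract OCR installation

def read_screen_digits(bgr):
    """Sketch: enhance contrast, locate the screen region, then OCR digits only."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)  # enhancement
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Region-of-interest detection: take the largest bright connected region as the screen.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    roi = gray[y:y + h, x:x + w]
    # Recognition: restrict Tesseract to digits and a decimal point on a single line.
    text = pytesseract.image_to_string(
        roi, config="--psm 7 -c tessedit_char_whitelist=0123456789.")
    return text.strip()

print(read_screen_digits(cv2.imread("glucometer.jpg")))  # hypothetical input image
```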

    A machine learning approach for digital image restoration

    This paper illustrates the process of image restoration in the sense of detecting images within a scanned document such as a photo album or scrapbook. The primary use case of this research is to accelerate the cropping process for the employees of Cinetis, a company based in Martigny, Switzerland, that specializes in the digitization of old media formats. In this paper, we first summarize the state of the art in this field of research, including explanations of various techniques and algorithms for feature and document detection used by digital companies.
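    As a rough illustration of locating photo regions on a scanned page for cropping, a classical contour-based baseline (not the paper's machine learning approach) might look like this; the function name, file name, and area threshold are assumptions.

```python
import cv2

def detect_photos(scan_path, min_area_ratio=0.02):
    """Sketch: find large rectangular regions (candidate photos) on a scanned page."""
    page = cv2.imread(scan_path)
    gray = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    edges = cv2.dilate(edges, None, iterations=2)              # close gaps in photo borders
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    page_area = page.shape[0] * page.shape[1]
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h > min_area_ratio * page_area:                 # ignore dust and small marks
            boxes.append((x, y, w, h))
    return boxes

for x, y, w, h in detect_photos("album_page.png"):             # hypothetical scan
    print(f"crop: x={x} y={y} w={w} h={h}")
```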

    Text Extraction From Natural Scene: Methodology And Application

    With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications for image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction is a challenging problem due to the cluttered backgrounds of natural scenes and the varied patterns of scene text itself. To address these problems, this dissertation proposes a framework for scene text extraction. Scene text extraction in our framework is divided into two components: detection and recognition. Scene text detection finds the regions containing text in camera-captured images and videos. Text layout analysis based on gradient and color cues extracts candidate text strings from the cluttered background, and text structural analysis then applies structural features designed to distinguish text from non-text outliers among those candidates. Scene text recognition transforms the image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification over a set of text character categories. We design robust and discriminative feature representations of STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed framework, which obtains better performance than previously published methods. Our scene text extraction framework is applied in four scenarios: 1) reading print labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage to assist navigation in indoor environments; and 4) combining with object tracking to extract scene text from natural-scene video. The proposed prototype systems and associated evaluation results show that our framework is able to address the challenges of real applications.
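    The scene text character (STC) prediction step is a multi-class classification problem. A simplified stand-in for the dissertation's multi-descriptor coding/pooling pipeline might use a single HOG descriptor and a linear SVM, as sketched below; the patch size, parameters, and function names are assumptions, not the dissertation's actual design.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def stc_features(char_patch):
    """HOG descriptor of a 32x32 grayscale character patch (one possible descriptor)."""
    return hog(char_patch, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_stc_classifier(patches, labels):
    """Multi-class STC classifier: HOG features plus a linear SVM (one-vs-rest)."""
    features = np.array([stc_features(p) for p in patches])
    classifier = LinearSVC(C=1.0)
    classifier.fit(features, labels)   # labels are character categories, e.g. '0'-'9', 'A'-'Z'
    return classifier
```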

    Rotation-invariant features for multi-oriented text detection in natural images.

    Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human-computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. With the increasing popularity of mobile computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed to capture the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol that is more suitable for benchmarking algorithms for detecting texts of varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on multi-oriented texts in complex natural scenes.
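    One common way to obtain a rotation-invariant descriptor for a candidate text component, offered here only as a generic stand-in for the paper's specially designed features, is to use the Hu moments of the component mask:

```python
import cv2
import numpy as np

def rotation_invariant_descriptor(component_mask):
    """Hu moments of a candidate component: invariant to rotation, scale, and translation."""
    m = cv2.moments(component_mask, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    # Log scale keeps the seven moments at comparable magnitudes.
    return np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```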

    NeatVision: a development environment for machine vision engineers

    This chapter details a free image analysis development environment for machine vision engineers. The environment provides high-level access to a wide range of image manipulation, processing, and analysis algorithms (over 300 to date) through a well-defined and easy-to-use graphical interface. Users can extend the core library using the developer’s interface, a plug-in which features automatic source code generation, compilation with full error feedback, and dynamic algorithm updates. The chapter also discusses key issues associated with the environment and outlines the advantages of adopting such a system for machine vision application development.

    Image Segmentation and Multiple skew estimation, correction in printed and handwritten documents

    Analysis of handwritten documents has always been a challenging task in the field of image processing, and various algorithms have been developed to address it. The segmentation and skew-detection algorithms implemented here work not only on printed or scanned document images but also on handwritten document images, which gives them an edge over other methodologies. Line segmentation for both printed and handwritten document images is performed using two methods, histogram projection and the Hough transform, under the assumption that the input image contains no major skew. For histogram projection to work correctly, the document must not contain even slight skew; the Hough transform gives better results than histogram projection in that case. Word segmentation is performed using connected component analysis: we first identify the connected components in the printed or handwritten document image. A methodology is then used to detect multiple skews in handwritten or printed documents. Using clustering algorithms, we detect multiple skewed blocks in a handwritten document image, a printed document image, or a combination of both. The algorithm also works for multiple skewed handwritten text blocks.
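    A minimal sketch of Hough-transform-based skew estimation and correction, assuming a page that binarizes cleanly and a small skew range; the thresholds, angle range, and function names are illustrative, not the paper's exact method.

```python
import cv2
import numpy as np

def skew_angle_hough(binary, angle_range=15):
    """Dominant skew angle (degrees) from Hough lines on a binarized document image."""
    lines = cv2.HoughLines(binary, 1, np.pi / 180, 200)
    if lines is None:
        return 0.0
    angles = []
    for rho, theta in lines[:, 0]:
        deg = np.degrees(theta) - 90.0        # 0 degrees corresponds to a horizontal text line
        if abs(deg) <= angle_range:
            angles.append(deg)
    return float(np.median(angles)) if angles else 0.0

def deskew(gray):
    """Binarize, estimate the dominant skew, and rotate the page to correct it."""
    binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    angle = skew_angle_hough(binary)
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_LINEAR, borderValue=255)
```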

    Framework for extracting and solving combination puzzles

    This thesis describes and investigates how computer vision and stereo vision algorithms may be applied to the problem of object detection, and in particular whether computer vision can aid in puzzle solving. The idea of using a computer application for puzzle solving came from the observation that all solution techniques are, in the end, algorithms. This leads to the conclusion that such problems are well suited to machines: a machine may need only milliseconds to compute a solution that takes a human minutes or hours. Unfortunately, machines cannot see puzzles from a human perspective and thus cannot analyze them directly. Hence, the contribution of this thesis is a study of different computer vision approaches, drawn from unrelated solutions, applied to the problem of translating a physical puzzle into an abstract structure that a machine can understand and solve. Currently, little has been written on this subject, so there is a good opportunity to contribute. This is achieved through empirical research, in the form of a set of experiments, to determine which approaches are suitable. To accomplish these goals, a substantial amount of computer vision theory was studied. In addition, the relevance of real-time operation was taken into account through a study of real-time structure-from-motion algorithms (SLAM, PTAM), which have been successfully applied to navigation and augmented reality problems, though none of them to extracting object characteristics. This thesis examines how these different approaches can be applied to the given problem to help inexperienced users solve combination puzzles. Moreover, as a side effect, it becomes possible to track object movement (rotation and translation), which can be used to manipulate a rendered model of the puzzle and increase the interactivity and engagement of the user.
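    A minimal sketch of recovering relative rotation and translation between two frames from matched features, as a generic stand-in for the full SLAM/PTAM-style tracking discussed above; the camera intrinsic matrix K, the feature count, and the function name are assumptions.

```python
import cv2
import numpy as np

def relative_motion(frame_a, frame_b, K):
    """Relative rotation R and (scale-free) translation t between two grayscale frames,
    estimated from matched ORB features."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t   # t is recovered only up to scale
```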