
    Document Text Detection in Video Frames Acquired by a Smartphone Based on Line Segment Detector and DBSCAN Clustering

    Automatic document text detection in video is an important task and a prerequisite for video retrieval, annotation, recognition, indexing, and content analysis. In this paper, we present an effective and efficient model for detecting page outlines within the frames of a video clip acquired by a smartphone. The model consists of four stages. In the first stage, all line segments of each video frame are detected with the Line Segment Detector (LSD). In the second stage, the line segments are grouped into clusters using the DBSCAN clustering algorithm, and prior knowledge is then used to distinguish the cluster belonging to the document page from the background. In the third and fourth stages, length and angle filtering are applied, respectively, to the selected cluster of line segments. Finally, a sorting operation detects the quadrilateral coordinates of the document page in the input video frame. The proposed model is evaluated on the ICDAR 2015 Smartphone Capture OCR dataset. Experimental results and a comparative study show that our model achieves encouraging and useful results and works efficiently across different classes of documents.
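
    The first two stages of such a pipeline map naturally onto off-the-shelf components. Below is a minimal sketch, assuming an OpenCV build that ships the LSD implementation and using scikit-learn's DBSCAN; clustering on segment midpoints and keeping the largest cluster as the page candidate are illustrative assumptions, not the paper's exact features or prior knowledge.

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def candidate_page_segments(frame, eps=40.0, min_samples=5):
    """Detect line segments with LSD, then group them with DBSCAN."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    lsd = cv2.createLineSegmentDetector()
    lines, _, _, _ = lsd.detect(gray)          # shape (N, 1, 4): x1, y1, x2, y2
    if lines is None:
        return None
    segs = lines.reshape(-1, 4)
    midpoints = (segs[:, :2] + segs[:, 2:]) / 2.0   # cluster segments by midpoint
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(midpoints)
    valid = labels[labels >= 0]                # drop DBSCAN noise points (-1)
    if valid.size == 0:
        return None
    page = np.bincount(valid).argmax()         # largest cluster = page candidate
    return segs[labels == page]
```

    The length and angle filters and the final corner sorting of the paper would then operate on the returned segments.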

    Digits Recognition on Medical Device

    With the rapid development of mobile health, mechanisms for automatic data input are becoming increasingly important for mobile health apps. In these apps, users are often required to input data frequently, especially numbers, from medical devices such as glucometers and blood pressure meters. These simple tasks are tedious and prone to error. Although some Bluetooth devices can make such input easier, they are not widely adopted because they are expensive and require complicated protocol support. We therefore propose an automatic procedure to recognize the digits on the screens of medical devices with smartphone cameras. The procedure comprises several "standard" computer-vision components: image enhancement, region-of-interest detection, and text recognition. Previous work exists for each component, but it has various weaknesses that lead to a low recognition rate. We propose several novel enhancements to each component. Experimental results suggest that our enhanced procedure raises the recognition rate from 6.2% (applying optical character recognition directly) to 62.1%. With human verification, this procedure can be adopted to recognize the digits on the screens of medical devices with smartphone cameras.
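
    The paper's specific enhancements are not reproduced here; the sketch below only illustrates the three-component skeleton under stated stand-ins: CLAHE for enhancement, the largest contour after Otsu binarization as the screen ROI, and a digit-whitelisted Tesseract pass (via the pytesseract wrapper) for recognition.

```python
import cv2
import pytesseract

def read_device_digits(image_path):
    """Enhance -> locate screen ROI -> recognize digits (illustrative stand-ins)."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    # 1) enhancement: CLAHE boosts local contrast in dim phone captures
    enhanced = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
    # 2) ROI detection: take the largest contour after Otsu binarization
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return ""
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    roi = enhanced[y:y + h, x:x + w]
    # 3) recognition: single text line, digits (and decimal point) only
    config = "--psm 7 -c tessedit_char_whitelist=0123456789."
    return pytesseract.image_to_string(roi, config=config).strip()
```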

    Text Detection in Natural Scenes and Technical Diagrams with Convolutional Feature Learning and Cascaded Classification

    An enormous number of digital images is generated and stored every day. Understanding the text in these images is an important challenge with large impacts for academic, industrial, and domestic applications. Recent studies address the difficulty of separating text targets from noise and background, all of which vary greatly in natural scenes. To tackle this problem, we develop a text detection system that analyzes and utilizes visual information in a data-driven, automatic, and intelligent way. The proposed method incorporates features learned from data, including patch-based coarse-to-fine detection (Text-Conv), connected component extraction using region growing, and graph-based word segmentation (Word-Graph). Text-Conv is a sliding-window detector, with convolution masks learned using the Convolutional k-means algorithm (Coates et al., 2011). Unlike convolutional neural networks (CNNs), a single vector/layer of convolution mask responses is used to classify patches. An initial coarse detection considers both local and neighboring patch responses, followed by refinement using varying aspect ratios and rotations for a smaller local detection window. Different levels of visual detail from the ground truth are utilized in each step, first using constraints on bounding box intersections, and then a combination of bounding box and pixel intersections. Combining masks from different Convolutional k-means initializations, e.g., seeded using random vectors and then support vectors, improves performance. The Word-Graph algorithm uses contextual information to improve word segmentation and prune false character detections based on visual features and spatial context. Our system obtains pixel, character, and word detection f-measures of 93.14%, 90.26%, and 86.77%, respectively, on the ICDAR 2015 Robust Reading Focused Scene Text dataset, outperforming state-of-the-art systems and producing highly accurate text detection masks at the pixel level. To investigate the utility of our feature learning approach for other image types, we perform tests on 8-bit greyscale USPTO patent drawing diagram images. An ensemble of AdaBoost classifiers with different convolutional features (MetaBoost) is used to classify patches as text or background. The Tesseract OCR system is used to recognize characters in detected labels and enhance performance. With appropriate pre-processing and post-processing, f-measures of 82% for part label location and 73% for valid part label locations and strings are obtained; these are the best reported to date for the USPTO patent diagram dataset used in our experiments. To sum up, an intelligent refinement of convolutional k-means-based feature learning and novel automatic classification methods are proposed for text detection, obtaining state-of-the-art results without the need for strong prior knowledge. Different ground truth representations, along with features including edges, color, shape, and spatial relationships, are used coherently to improve accuracy. Different variations of feature learning are explored, e.g., support-vector-seeded clustering and MetaBoost, with results suggesting that increased diversity in learned features benefits convolution-based text detectors.
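
    The feature-learning step this abstract builds on is well documented (Coates et al., 2011): cluster normalized, whitened image patches with k-means and use the centroids as convolution masks. A minimal sketch follows, assuming grayscale training images and scikit-learn; the patch size, mask count, and whitening constants are illustrative, not the dissertation's settings.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d

def learn_conv_masks(images, patch_size=(8, 8), n_masks=96, per_image=200):
    """Convolutional k-means: centroids of whitened patches become masks."""
    d = patch_size[0] * patch_size[1]
    patches = np.vstack([
        extract_patches_2d(img, patch_size, max_patches=per_image).reshape(-1, d)
        for img in images
    ]).astype(np.float64)
    # per-patch brightness/contrast normalization
    patches -= patches.mean(axis=1, keepdims=True)
    patches /= patches.std(axis=1, keepdims=True) + 1e-8
    # ZCA whitening decorrelates neighboring pixels before clustering
    cov = np.cov(patches, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    patches = patches @ (U @ np.diag(1.0 / np.sqrt(S + 0.01)) @ U.T)
    km = MiniBatchKMeans(n_clusters=n_masks, n_init=3).fit(patches)
    return km.cluster_centers_.reshape(n_masks, *patch_size)
```

    Patch responses to these masks, rather than a deep CNN stack, form the single-layer feature vector that a detector like Text-Conv classifies.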

    Registration and categorization of camera captured documents

    Camera-captured document image analysis concerns the processing of documents captured with hand-held sensors, smartphones, or other capturing devices using advanced image processing, computer vision, pattern recognition, and machine learning techniques. Because capturing in the real world is unconstrained, captured documents suffer from illumination variation, viewpoint variation, highly variable scale/resolution, background clutter, occlusion, and non-rigid deformations, e.g., folds and crumples. Document registration is the problem of registering the image of a template document, whose layout is known, with a test document image. The literature on camera-captured document mosaicing addresses the registration of captured documents under the assumption of a considerable amount of single-chunk overlapping content. These methods cannot be directly applied to the registration of forms, bills, and other commercial documents where the fixed content is distributed in tiny portions across the document. On the other hand, most existing document image registration methods work with scanned documents under affine transformation. The literature on document image retrieval addresses categorization of documents based on text, figures, etc.; however, the scalability of existing document categorization methodologies based on logo identification is very limited. This dissertation focuses on two problems: (i) registration of captured documents where the overlapping content is distributed in tiny portions across the documents, and (ii) categorization of captured documents into predefined logo classes that scales to large datasets using local invariant features. A novel methodology is proposed for the registration of user-defined Regions Of Interest (ROI) using corresponding local features from their neighborhood. The methodology enhances prior approaches to point-pattern-based registration, such as RANdom SAmple Consensus (RANSAC) and Thin Plate Spline-Robust Point Matching (TPS-RPM), to enable registration of cell phone and camera captured documents under non-rigid transformations. Three novel aspects are embedded in the methodology: (i) histogram-based uniformly transformed correspondence estimation, (ii) clustering of points located near the ROI to select only nearby regions for matching, and (iii) validation of the registration in the RANSAC and TPS-RPM algorithms. Experimental results on a dataset of 480 images captured using an iPhone 3GS and a Logitech Webcam Pro 9000 show an average registration accuracy of 92.75% using the Scale Invariant Feature Transform (SIFT). Robust local features for logo identification are determined empirically by comparisons among SIFT, Speeded-Up Robust Features (SURF), Hessian-Affine, Harris-Affine, and Maximally Stable Extremal Regions (MSER). Two different matching methods are presented for categorization: matching all features extracted from the query document as a single set, and segment-wise matching of query document features using a segmentation obtained by grouping the area under intersecting dense local affine covariant regions. The latter approach not only gives an approximate location of the predicted logo classes in the query document but also helps to increase prediction accuracy. To facilitate scalability to large datasets, inverted indexing of logo class features has been incorporated in both approaches. Experimental results on a dataset of real camera-captured documents show a peak 13.25% increase in F-measure accuracy using the latter approach as compared to the former.
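
    As a point of reference for the point-pattern step the thesis builds on, a baseline SIFT-plus-RANSAC registration fits in a few lines of OpenCV. The ratio threshold and reprojection tolerance below are conventional defaults; the thesis's non-rigid TPS-RPM extension and ROI-neighborhood clustering are not shown.

```python
import cv2
import numpy as np

def register_baseline(template_path, test_path, ratio=0.75):
    """Baseline rigid registration: SIFT matches filtered by RANSAC homography."""
    tmpl = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    test = cv2.imread(test_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(tmpl, None)
    kp2, des2 = sift.detectAndCompute(test, None)
    # Lowe's ratio test on 2-nearest-neighbor matches
    good = []
    for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:                  # a homography needs >= 4 correspondences
        return None
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC discards outlier matches and returns a 3x3 homography
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```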

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    Computer Vision for Timber Harvesting


    Contributions to the Content-Based Image Retrieval Using Pictorial Queries

    Mass access to digital cameras, personal computers, and the Internet has led to the creation of large volumes of data in digital format. In this context, tools designed to organize information and facilitate its retrieval are becoming increasingly relevant. Images are a particular kind of data that require specific description and indexing techniques. The area of computer vision devoted to the study of these techniques is called Content-Based Image Retrieval (CBIR). CBIR systems do not use text-based descriptions; instead, they rely on features extracted from the images themselves. In contrast to the more than 6,000 languages spoken in the world, descriptions based on visual features represent a universal means of expression. Intensive research in the field of CBIR systems has been applied to very diverse areas of knowledge. CBIR applications have thus been developed for medicine, intellectual property protection, journalism, graphic design, Internet information search, cultural heritage preservation, and so on. One of the key points of a CBIR application lies in the design of the user's functions. The user is responsible for formulating the queries from which the image search is carried out. We have focused our attention on systems in which the query is formulated through a pictorial representation. We propose a taxonomy of query systems composed of four different paradigms: Query-by-Selection, Query-by-Iconic-Composition, Query-by-Sketch, and Query-by-Illustration. Each paradigm incorporates a different level of expressive power for the user. From the simple selection of an image to the creation of a color illustration, the user takes control of the system's input data. Throughout the chapters of this thesis we have analyzed the influence that each query paradigm exerts on the internal processes of a CBIR system. We have also proposed a set of contributions, which we have illustrated from a practical point of view through a final application.
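
    Query-by-Selection, the simplest paradigm in the taxonomy, can be illustrated with a global color-histogram ranking; this is a generic CBIR baseline for exposition, not the descriptors the thesis actually proposes.

```python
import cv2

def query_by_selection(query_path, database_paths, top_k=5):
    """Rank database images by color-histogram similarity to a selected image."""
    def descriptor(path):
        img = cv2.imread(path)
        h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                         [0, 256, 0, 256, 0, 256])
        return cv2.normalize(h, h).flatten()   # normalized 512-bin BGR histogram
    q = descriptor(query_path)
    scored = [(p, cv2.compareHist(q, descriptor(p), cv2.HISTCMP_CORREL))
              for p in database_paths]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]
```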

    Detection and identification of registration and fishing gear in vessels

    Illegal, unreported and unregulated (IUU) fishing is a global menace to both marine ecosystems and sustainable fisheries. IUU products often come from fisheries lacking conservation and management measures, which allows bycatch limits to be violated or catches to go unreported. To counter this problem, some countries have adopted vessel monitoring systems (VMS) to track and monitor the activities of fishing vessels. The VMS approach is not flawless, and there are still known cases of IUU fishing. The present work is part of the PT2020 SeeItAll project of the company Xsealence and was carried out within INOV's tasks. A monitoring system using video cameras in ports (non-boarded system) was developed to detect vessel registrations, recording the time at which a vessel enters or leaves the port. A second system (boarded system) uses a camera placed on each vessel together with a machine learning algorithm to detect and record fishing activities, for later comparison with the vessel's fishing report.