Artistic ideation based on computer vision methods
We explore the problem of classifying the scenic categories that form the basis of an artist's ideation and design of cultural productions. The main objective is to evaluate the performance of SIFT descriptors, the bag-of-visual-words representation, and spatial pyramid matching when these computer vision methodologies are confronted with this kind of imagery. The results are promising: on average the performance score is around 70%, with a standard deviation of approximately 5%.
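The core of the bag-of-visual-words pipeline the abstract evaluates is vector quantization: each local SIFT descriptor is assigned to its nearest codebook center, and an image becomes a histogram of visual-word counts. A minimal sketch, assuming precomputed descriptors and a precomputed codebook (the toy 2-D vectors below are illustrative, not real SIFT output):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors (n x d) to the nearest codebook
    center (k x d) and return an L1-normalized k-bin histogram."""
    # Squared Euclidean distance from every descriptor to every center
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)  # visual-word index per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: six 2-D "descriptors" and a 3-word codebook
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
desc = np.array([[0.1, 0.2], [4.9, 5.1], [9.8, 0.3],
                 [0.0, 0.1], [5.2, 4.8], [0.2, 0.0]])
h = bovw_histogram(desc, codebook)  # counts [3, 2, 1] -> [0.5, 1/3, 1/6]
```

Spatial pyramid matching extends this by computing such histograms over successively finer image subregions and concatenating them, which restores coarse spatial layout lost by the plain bag-of-words.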
Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images
Automatic table detection in PDF documents has achieved great success, but tabular data extraction remains challenging due to integrity and noise issues in detected table areas. Accurate data extraction is especially crucial in the finance domain. Motivated by this, the aim of this research is to propose automated table detection and tabular data extraction from financial PDF documents. We propose a method consisting of three main processes: detecting table areas on each page image with a Faster R-CNN (Region-based Convolutional Neural Network) model with a Feature Pyramid Network (FPN); extracting contents and structure with a compound layout segmentation technique based on optical character recognition (OCR); and formulating regular-expression rules for table header separation. The tabular data extraction feature is embedded with rule-based filtering and restructuring functions that are highly scalable. We annotate a new Financial Documents dataset with table regions for the experiment. The detection model achieves excellent table detection performance on our customized dataset. The main contributions of this paper are the Financial Documents dataset with table-area annotations, the superior detection model, and the rule-based layout segmentation technique for tabular data extraction from PDF files.
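The abstract does not give the paper's actual regular-expression rules, but the header-separation step can be sketched as follows, with hypothetical column labels and a hypothetical two-or-more-spaces column delimiter standing in for the real rules:

```python
import re

# Hypothetical rule: a header row is a line made entirely of known
# financial column labels separated by runs of two or more spaces.
# The label set and delimiter are illustrative assumptions, not the
# paper's actual rules.
LABEL = r"(?:Date|Description|Debit|Credit|Amount|Balance)"
HEADER_RE = re.compile(rf"^{LABEL}(?:\s{{2,}}{LABEL})+$")

def split_header(lines):
    """Return (header_cells, body_lines) for OCR'd table text,
    or (None, lines) if no header row is found."""
    for i, line in enumerate(lines):
        if HEADER_RE.match(line.strip()):
            return re.split(r"\s{2,}", line.strip()), lines[i + 1:]
    return None, lines

cells, body = split_header([
    "Date  Description  Debit  Credit",
    "2020-01-02  Opening deposit    1,000.00",
])
```

Once the header cells are isolated, the rule-based filtering and restructuring described in the abstract can align each body line's fields to the recovered column labels.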
Text Detection Using Transformation Scaling Extension Algorithm in Natural Scene Images
In recent research, the importance of text detection and recognition in natural scene images has been increasingly emphasized. Natural scene text contains an enormous amount of useful semantic data that can be applied in a variety of vision-related applications. The detection of shape-robust text confronts two major challenges: (1) many traditional quadrangular bounding-box-based detectors fail to identify text with irregular shapes, since such text is difficult to enclose in perfect rectangles; (2) pixel-wise segmentation-based detectors sometimes struggle to separate closely positioned text instances from one another. Understanding the surroundings and extracting information from natural scene images depends heavily on the ability to detect and recognise text. Scene text can be aligned in a variety of ways, including horizontal, vertical, curved, and random alignments. This paper proposes a novel method, the Transformation Scaling Extension Algorithm (TSEA), for text detection using a mask-scoring R-ConvNN (Region Convolutional Neural Network). This method works exceptionally well at accurately identifying curved text and text with multiple orientations in real-world input images. The study incorporates a mask-scoring R-ConvNN network framework to enhance the model's ability to score masks correctly for the observed instances. By giving more weight to accurate mask predictions, the scoring system eliminates inconsistencies between mask quality and score and improves the effectiveness of instance segmentation. The paper also incorporates a Pyramid-based Text Proposal Network (PBTPN) and a Transformation Component Network (TCN) to enhance the feature extraction capabilities of the mask-scoring R-ConvNN for text identification and segmentation with the TSEA. Studies show that pyramid networks are especially effective in reducing false alarms caused by images with backgrounds that mimic text.
On the benchmark datasets ICDAR 2015 and SCUT-CTW1500, which contain multi-oriented and curved text, this method outperforms existing methods in extensive testing across several scales using a single model. This study expands the field of vision-oriented applications by highlighting the growing significance of effectively locating and detecting text in natural scenes.
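The mask-scoring idea the abstract builds on can be reduced to one operation: each detection's classification score is multiplied by its predicted mask IoU, so detections with poor masks are down-weighted. This is a sketch of that general technique, not the paper's exact formulation:

```python
def rescore(cls_scores, mask_ious):
    """Mask scoring in one step: weight each detection's
    classification score by its predicted mask quality (IoU),
    aligning the ranking score with actual mask accuracy."""
    return [s * iou for s, iou in zip(cls_scores, mask_ious)]

# A confident detection with a bad mask drops below a less confident
# detection with an accurate mask: 0.95*0.40 = 0.38, 0.80*0.90 = 0.72
scores = rescore([0.95, 0.80], [0.40, 0.90])
```

In a full system the mask IoU itself is predicted by a small head trained against the IoU between predicted and ground-truth masks; the multiplication above is what removes the mismatch between mask quality and score that the abstract describes.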
Automatic Classification of Bright Retinal Lesions via Deep Network Features
Diabetic retinopathy is diagnosed in a timely manner by experienced ophthalmologists from color eye-fundus images, in order to recognize potential retinal features and identify early-blindness cases. In this paper, we propose to extract deep features from the last fully-connected layer of four different pre-trained convolutional neural networks. These features are then fed into a non-linear classifier to discriminate three diabetic cases: normal, exudates, and drusen. Averaged across 1113 color retinal images collected from six publicly available annotated datasets, the deep-features approach performs better than the classical bag-of-words approach. The proposed approaches achieve an average accuracy between 91.23% and 92.00%, more than 13% improvement over traditional state-of-the-art methods.
Comment: Preprint submitted to Journal of Medical Imaging | SPIE (Tue, Jul 28, 2017)
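The abstract does not specify which non-linear classifier consumes the deep features; a k-nearest-neighbour vote is one simple non-linear choice and serves here as a stand-in. The 2-D "feature" vectors below are toy placeholders for the CNN's last fully-connected-layer activations:

```python
import numpy as np

def knn_predict(train_feats, train_labels, query, k=3):
    """k-nearest-neighbour vote over deep-feature vectors; a
    stand-in for the unspecified non-linear classifier."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]           # indices of k closest features
    votes = np.bincount(np.asarray(train_labels)[nearest])
    return votes.argmax()                      # majority class

# Toy 2-D "deep features" for three classes: 0=normal, 1=exudates, 2=drusen
feats = np.array([[0.0, 0.0], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9],
                  [0.0, 9.0], [0.2, 9.1]])
labels = [0, 0, 1, 1, 2, 2]
pred = knn_predict(feats, labels, np.array([5.05, 5.0]), k=3)  # -> 1
```

Because the features come from networks pre-trained on generic image data, only this lightweight classifier needs to be fit to the 1113 retinal images, which is what makes the transfer-learning approach practical on small medical datasets.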