Artistic ideation based on computer vision methods
We explore the problem of classifying the scenic categories that form the basis of an artist's ideation and design of cultural productions. The main objective is to evaluate the performance of SIFT descriptors, the bag-of-visual-words representation, and spatial pyramid matching when these computer vision methodologies are confronted with this kind of imagery. The results are promising: on average the performance score is around 70%, with a standard deviation of approximately 5%.
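The core of the bag-of-visual-words pipeline the abstract evaluates is vector quantization: each local SIFT descriptor is assigned to its nearest codebook center, and an image becomes a histogram of visual-word counts. A minimal sketch, assuming precomputed descriptors and a precomputed codebook (the toy 2-D vectors below are illustrative, not real SIFT output):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors (n x d) to the nearest codebook
    center (k x d) and return an L1-normalized k-bin histogram."""
    # Squared Euclidean distance from every descriptor to every center
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)  # visual-word index per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: six 2-D "descriptors" and a 3-word codebook
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
desc = np.array([[0.1, 0.2], [4.9, 5.1], [9.8, 0.3],
                 [0.0, 0.1], [5.2, 4.8], [0.2, 0.0]])
h = bovw_histogram(desc, codebook)  # counts [3, 2, 1] -> [0.5, 1/3, 1/6]
```

Spatial pyramid matching extends this by computing such histograms over successively finer image subregions and concatenating them, which restores coarse spatial layout lost by the plain bag-of-words.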
Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images
Automatic table detection in PDF documents has achieved great success, but tabular data extraction remains challenging due to integrity and noise issues in detected table areas. Accurate data extraction is especially crucial in the finance domain. Motivated by this, the aim of this research is to propose automated table detection and tabular data extraction from financial PDF documents. We propose a method consisting of three main processes: detecting table areas on each page image with a Faster R-CNN (Region-based Convolutional Neural Network) model with a Feature Pyramid Network (FPN); extracting contents and structure with a compound layout segmentation technique based on optical character recognition (OCR); and formulating regular-expression rules for table header separation. The tabular data extraction feature is embedded with rule-based filtering and restructuring functions that are highly scalable. We annotate a new Financial Documents dataset with table regions for the experiment. The detection model achieves excellent table detection performance on our customized dataset. The main contributions of this paper are the Financial Documents dataset with table-area annotations, the superior detection model, and the rule-based layout segmentation technique for tabular data extraction from PDF files.
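The abstract does not give the paper's actual regular-expression rules, but the header-separation step can be sketched as follows, with hypothetical column labels and a hypothetical two-or-more-spaces column delimiter standing in for the real rules:

```python
import re

# Hypothetical rule: a header row is a line made entirely of known
# financial column labels separated by runs of two or more spaces.
# The label set and delimiter are illustrative assumptions, not the
# paper's actual rules.
LABEL = r"(?:Date|Description|Debit|Credit|Amount|Balance)"
HEADER_RE = re.compile(rf"^{LABEL}(?:\s{{2,}}{LABEL})+$")

def split_header(lines):
    """Return (header_cells, body_lines) for OCR'd table text,
    or (None, lines) if no header row is found."""
    for i, line in enumerate(lines):
        if HEADER_RE.match(line.strip()):
            return re.split(r"\s{2,}", line.strip()), lines[i + 1:]
    return None, lines

cells, body = split_header([
    "Date  Description  Debit  Credit",
    "2020-01-02  Opening deposit    1,000.00",
])
```

Once the header cells are isolated, the rule-based filtering and restructuring described in the abstract can align each body line's fields to the recovered column labels.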
Text Detection Using Transformation Scaling Extension Algorithm in Natural Scene Images
In recent research, the importance of text detection and recognition in natural scene images has been increasingly emphasized. Natural scene text contains an enormous amount of useful semantic data that can be applied in a variety of vision-related applications. The detection of shape-robust text confronts two major challenges: (1) many traditional quadrangular bounding-box-based detectors fail to identify text with irregular shapes, since such text is difficult to enclose in perfect rectangles; (2) pixel-wise segmentation-based detectors sometimes struggle to separate closely positioned text instances from one another. Understanding the surroundings and extracting information from natural scene images depends heavily on the ability to detect and recognise text. Scene text can be aligned in a variety of ways, including horizontal, vertical, curved, and random alignments. This paper proposes a novel method, the Transformation Scaling Extension Algorithm (TSEA), for text detection using a mask-scoring R-ConvNN (Region Convolutional Neural Network). This method works exceptionally well at accurately identifying curved text and text with multiple orientations in real-world input images. The study incorporates a mask-scoring R-ConvNN network framework to enhance the model's ability to score masks correctly for the observed instances. By giving more weight to accurate mask predictions, the scoring system eliminates inconsistencies between mask quality and score and improves the effectiveness of instance segmentation. The paper also incorporates a Pyramid-based Text Proposal Network (PBTPN) and a Transformation Component Network (TCN) to enhance the feature extraction capabilities of the mask-scoring R-ConvNN for text identification and segmentation with the TSEA. Studies show that pyramid networks are especially effective in reducing false alarms caused by images with backgrounds that mimic text.
On the benchmark datasets ICDAR 2015 and SCUT-CTW1500, which contain multi-oriented and curved text, this method outperforms existing methods in extensive testing across several scales using a single model. This study expands the field of vision-oriented applications by highlighting the growing significance of effectively locating and detecting text in natural scenes.
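The mask-scoring idea the abstract builds on can be reduced to one operation: each detection's classification score is multiplied by its predicted mask IoU, so detections with poor masks are down-weighted. This is a sketch of that general technique, not the paper's exact formulation:

```python
def rescore(cls_scores, mask_ious):
    """Mask scoring in one step: weight each detection's
    classification score by its predicted mask quality (IoU),
    aligning the ranking score with actual mask accuracy."""
    return [s * iou for s, iou in zip(cls_scores, mask_ious)]

# A confident detection with a bad mask drops below a less confident
# detection with an accurate mask: 0.95*0.40 = 0.38, 0.80*0.90 = 0.72
scores = rescore([0.95, 0.80], [0.40, 0.90])
```

In a full system the mask IoU itself is predicted by a small head trained against the IoU between predicted and ground-truth masks; the multiplication above is what removes the mismatch between mask quality and score that the abstract describes.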
Automatic Classification of Bright Retinal Lesions via Deep Network Features
Diabetic retinopathy is diagnosed in a timely manner by experienced ophthalmologists from color eye-fundus images, in order to recognize potential retinal features and identify early-blindness cases. In this paper, we propose to extract deep features from the last fully-connected layer of four different pre-trained convolutional neural networks. These features are then fed into a non-linear classifier to discriminate three diabetic cases: normal, exudates, and drusen. Averaged across 1113 color retinal images collected from six publicly available annotated datasets, the deep-features approach performs better than the classical bag-of-words approach. The proposed approaches achieve an average accuracy between 91.23% and 92.00%, more than 13% improvement over traditional state-of-the-art methods.
Comment: Preprint submitted to Journal of Medical Imaging | SPIE (Tue, Jul 28, 2017)
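The abstract does not specify which non-linear classifier consumes the deep features; a k-nearest-neighbour vote is one simple non-linear choice and serves here as a stand-in. The 2-D "feature" vectors below are toy placeholders for the CNN's last fully-connected-layer activations:

```python
import numpy as np

def knn_predict(train_feats, train_labels, query, k=3):
    """k-nearest-neighbour vote over deep-feature vectors; a
    stand-in for the unspecified non-linear classifier."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]           # indices of k closest features
    votes = np.bincount(np.asarray(train_labels)[nearest])
    return votes.argmax()                      # majority class

# Toy 2-D "deep features" for three classes: 0=normal, 1=exudates, 2=drusen
feats = np.array([[0.0, 0.0], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9],
                  [0.0, 9.0], [0.2, 9.1]])
labels = [0, 0, 1, 1, 2, 2]
pred = knn_predict(feats, labels, np.array([5.05, 5.0]), k=3)  # -> 1
```

Because the features come from networks pre-trained on generic image data, only this lightweight classifier needs to be fit to the 1113 retinal images, which is what makes the transfer-learning approach practical on small medical datasets.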