Image processing for the extraction of nutritional information from food labels
Current techniques for tracking nutritional data require undesirable amounts of either time or manpower. People must choose between tediously recording and updating dietary information or depending on unreliable crowd-sourced or costly maintained databases. Our project aims to overcome these pitfalls by providing a programming interface for image analysis that reads and reports the information present on a nutrition label directly. Our solution is a C++ library that combines image pre-processing, optical character recognition, and post-processing techniques to pull the relevant information from an image of a nutrition label. We apply an understanding of a nutrition label's content and data organization to approach the accuracy of traditional data-entry methods. Our system currently provides around 80% accuracy for most label images, and we will continue working to improve it.
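The post-processing step this abstract describes can be illustrated with a small sketch: parsing raw OCR output of a label into structured nutrient fields. This is a generic illustration in Python, not the project's actual C++ API; the field patterns and function names are assumptions.

```python
import re

# Hypothetical pattern for lines like "Total Fat 8g 10%"; the real library's
# rules for label layout are more elaborate than this sketch.
NUTRIENT_RE = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>g|mg|mcg)\b",
    re.IGNORECASE,
)

def parse_label(ocr_text: str) -> dict:
    """Map nutrient names to (amount, unit) pairs from noisy OCR text."""
    facts = {}
    for line in ocr_text.splitlines():
        m = NUTRIENT_RE.match(line.strip())
        if m:
            facts[m.group("name").strip().lower()] = (
                float(m.group("amount")), m.group("unit").lower())
    return facts

sample = "Total Fat 8g 10%\nSodium 160mg 7%\nProtein 3g"
print(parse_label(sample))
# {'total fat': (8.0, 'g'), 'sodium': (160.0, 'mg'), 'protein': (3.0, 'g')}
```

Parsing against a known label vocabulary like this is one way OCR noise can be filtered out, since only plausible nutrient lines survive the match.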
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research efforts seeking to automatically process facsimiles and extract
information from them are multiplying, with document layout analysis as a
first essential step. While the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain with, among others,
the use of finer-grained segmentation typologies and the consideration of
complex, heterogeneous documents such as historical newspapers. Besides, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance.
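The core idea of the multimodal approach, fusing per-region visual and textual features before classification, can be sketched minimally. The fusion strategy (plain concatenation), dimensions, and classification head below are illustrative assumptions, not the paper's actual architecture.

```python
# Toy multimodal fusion: concatenate visual and textual feature vectors
# for one document region, then score with a linear head.

def fuse(visual: list, textual: list) -> list:
    """Concatenate visual and textual features for one document region."""
    return list(visual) + list(textual)

def score(features: list, weights: list, bias: float = 0.0) -> float:
    """Linear classification head standing in for the real segmentation model."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# Toy region: 3 visual features (e.g. layout cues) + 2 textual features
# (e.g. word-embedding summaries).
region = fuse([0.2, 0.9, 0.1], [0.7, 0.3])
print(len(region))                               # 5
print(round(score(region, [1, 0, 0, 1, 0]), 2))  # 0.9
```

In a real model the concatenation would happen inside a neural network and the textual features would come from embeddings aligned to image coordinates, but the fusion principle is the same.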
Optical character recognition with neural networks and post-correction with finite state methods
The optical character recognition (OCR) quality of the historical part of the Finnish newspaper and journal corpus is too low for reliable search and scientific research on the OCRed data. The estimated character error rate (CER) of the corpus, achieved with commercial software, is between 8 and 13%. There have been earlier attempts to train high-quality OCR models with open-source software such as Ocropy (https://github.com/tmbdev/ocropy) and Tesseract (https://github.com/tesseract-ocr/tesseract), but so far none of these methods has managed to train a mixed model that recognizes all of the data in the corpus, which would be essential for efficient re-OCRing. The difficulty lies in the fact that the corpus is printed in the two main languages of Finland (Finnish and Swedish) and in two font families (Blackletter and Antiqua). In this paper, we explore the training of a variety of OCR models with deep neural networks (DNN). First, we find an optimal DNN for our data and, with additional training data, successfully train high-quality mixed-language models. Furthermore, we revisit the effect of confidence voting on the OCR results with different model combinations. Finally, we perform post-correction on the new OCR results and carry out error analysis. The results show a significant boost in accuracy, with a CER of 1.7% on the Finnish and 2.7% on the Swedish test set. The greatest accomplishment of the study is the successful training of one mixed-language model for the entire corpus and the discovery of a voting setup that further improves the results. Peer reviewed.
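The character error rate quoted throughout this abstract is a standard metric: edit distance between the OCR hypothesis and the ground-truth reference, divided by the reference length. A minimal sketch of that computation:

```python
# CER = Levenshtein distance / reference length. This is the standard
# definition of the metric, not code from the paper itself.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of an OCR hypothesis against its reference."""
    return levenshtein(reference, hypothesis) / len(reference)

print(cer("abcde", "abXde"))  # 0.2  (one substitution over five characters)
```

A CER of 1.7% thus means roughly 1.7 character edits per 100 reference characters, which is why it is a meaningful target for corpus-wide re-OCRing.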
Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti
Machine learning techniques are presented for the automatic recognition of historical letters (XI-XVIII centuries) carved on the stone walls of St. Sophia Cathedral in Kyiv (Ukraine). A new image dataset of these carved Glagolitic and Cyrillic letters (CGCL) was assembled and pre-processed for recognition and prediction by machine learning methods. The dataset consists of more than 4000 images for 34 types of letters. Exploratory data analysis of the CGCL and notMNIST datasets showed that the carved letters can hardly be differentiated by dimensionality reduction methods such as t-distributed stochastic neighbor embedding (tSNE), because stone carving renders letters less distinctly than handwriting. Multinomial logistic regression (MLR) and 2D convolutional neural network (CNN) models were applied. The MLR model achieved area under the curve (AUC) values for the receiver operating characteristic (ROC) of no less than 0.92 and 0.60 for notMNIST and CGCL, respectively. The CNN model gave AUC values close to 0.99 for both notMNIST and CGCL, despite the much smaller size and lower quality of CGCL, even under heavy lossy data augmentation. The CGCL dataset has been published as an open-source resource for the data science community.
Comment: 11 pages, 9 figures, accepted for the 25th International Conference on Neural Information Processing (ICONIP 2018), 14-16 December 2018, Siem Reap, Cambodia.
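The ROC AUC figures reported above have a simple probabilistic reading: the chance that a randomly chosen positive example outscores a randomly chosen negative one. A minimal sketch of that computation (generic metric code, not the paper's evaluation script):

```python
# ROC AUC via the Mann-Whitney U statistic: the probability that a random
# positive example receives a higher score than a random negative one,
# with ties counting as half a win.

def roc_auc(labels: list, scores: list) -> float:
    """AUC for binary labels (1 = positive, 0 = negative) and real-valued scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # 1.0  (perfect separation)
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2]))  # 0.75
```

Under this reading, the MLR model's 0.60 on CGCL is barely better than chance (0.5), while the CNN's 0.99 is near-perfect separation.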
Content-Based Video Retrieval in Historical Collections of the German Broadcasting Archive
The German Broadcasting Archive (DRA) maintains the cultural heritage of
radio and television broadcasts of the former German Democratic Republic (GDR).
The uniqueness and importance of the video material stimulate great
scientific interest in the video content. In this paper, we present an
automatic video analysis and retrieval system for searching in historical
collections of GDR television recordings. It consists of video analysis
algorithms for shot boundary detection, concept classification, person
recognition, text recognition and similarity search. The performance of the
system is evaluated from a technical and an archival perspective on 2,500 hours
of GDR television recordings.
Comment: TPDL 2016, Hannover, Germany. The final version is available at Springer via DOI.
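The first analysis step named in this abstract, shot boundary detection, is commonly done by thresholding the histogram difference between consecutive frames. The sketch below is a generic illustration of that heuristic, not necessarily the DRA system's actual algorithm; frames are flat lists of grayscale pixel values here.

```python
# Hard-cut detection via normalized histogram difference between
# consecutive frames (a common textbook approach).

def histogram(frame: list, bins: int = 8, depth: int = 256) -> list:
    """Coarse grayscale histogram of a frame (flat list of pixel values)."""
    h = [0] * bins
    for p in frame:
        h[p * bins // depth] += 1
    return h

def shot_boundaries(frames: list, threshold: float = 0.5) -> list:
    """Indices where the normalized histogram difference to the previous
    frame exceeds the threshold, i.e. likely hard cuts."""
    cuts, prev = [], None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if prev is not None:
            # Difference is in [0, 1]: 0 for identical histograms, 1 for disjoint.
            diff = sum(abs(a - b) for a, b in zip(prev, h)) / (2 * len(frame))
            if diff > threshold:
                cuts.append(i)
        prev = h
    return cuts

dark = [10] * 64    # 8x8 frame, uniformly dark
bright = [240] * 64
print(shot_boundaries([dark, dark, bright, bright]))  # [2]
```

Real systems add refinements for gradual transitions (fades, dissolves), which simple frame-to-frame differencing misses, but the cut-detection core looks like this.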