Unsupervised Learning of Individuals and Categories from Images
Motivated by the existence of highly selective, sparsely firing cells observed in the human medial temporal lobe (MTL), we present an unsupervised method for learning and recognizing object categories from unlabeled images. In our model, a network of nonlinear neurons learns a sparse representation of its inputs through an unsupervised expectation-maximization process. We show that applying this strategy to an invariant feature-based description of natural images leads to the development of units displaying sparse, invariant selectivity for particular individuals or image categories, much like those observed in the MTL data.
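The unsupervised EM-style sparse learning described above can be illustrated with a toy alternation: an E-step that lets each input activate only its top-k units, and an M-step that refits each unit to the inputs it responded to. The hard top-k sparsity, unit count, and function name below are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def sparse_em(X, n_units=4, k=1, n_iter=30, seed=0):
    """Toy EM-style sketch: units learn a sparse code of the inputs.
    E-step: each input activates only its top-k units (hard sparsity).
    M-step: each unit's weight vector is refit to the inputs that
    activated it. A k-sparse, k-means-like illustration only."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_units, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(n_iter):
        A = X @ W.T                            # unit responses to each input
        # E-step: keep only the top-k responses per input (sparse code).
        thresh = np.sort(A, axis=1)[:, [-k]]
        S = np.where(A >= thresh, A, 0.0)
        # M-step: refit each unit to the inputs that activated it.
        for j in range(n_units):
            idx = S[:, j] > 0
            if idx.any():
                W[j] = X[idx].mean(axis=0)
                W[j] /= np.linalg.norm(W[j]) + 1e-12
    return W, S
```

With k=1 this reduces to a spherical k-means-like rule: each unit becomes selective for one cluster of inputs, a crude analogue of the sparse, selective units the abstract describes.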
A robust sparse representation model for hyperspectral image classification
Sparse representation has been extensively investigated for hyperspectral image (HSI) classification and has led to substantial performance improvements over traditional methods such as the support vector machine (SVM). However, existing sparsity-based classification methods typically assume Gaussian noise, neglecting the fact that HSIs are often corrupted by different types of noise in practice. In this paper, we develop a robust classification model that admits realistic mixed noise, comprising both Gaussian noise and sparse noise. We combine a model for mixed noise with a prior on the representation coefficients of the input data within a unified framework, which yields three robust classification methods based on sparse representation classification (SRC), joint SRC, and joint SRC at the superpixel level. Experimental results on simulated and real data demonstrate the effectiveness of the proposed method and the clear benefits of the introduced mixed-noise model.
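The mixed-noise idea above, Gaussian plus sparse noise, can be sketched as a small alternating optimization: model the input as x ≈ Da + s with a sparse code a and a sparse noise term s, then classify by class-wise reconstruction residual. The objective weights, the ISTA-style solver, and the function names below are illustrative assumptions rather than the authors' exact formulation:

```python
import numpy as np

def soft(v, t):
    # Soft-thresholding: the proximal operator of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def robust_src(x, D, labels, lam_a=0.1, lam_s=0.5, n_iter=500, step=None):
    """Sketch of SRC under mixed noise: x ~ D a + s, with sparse code a
    and sparse (impulse) noise s. Minimizes
        0.5*||x - D a - s||^2 + lam_a*||a||_1 + lam_s*||s||_1
    by alternating a proximal-gradient step in a with an exact
    soft-threshold update in s."""
    n, m = D.shape
    a = np.zeros(m)
    s = np.zeros(n)
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/Lipschitz for the a-step
    for _ in range(n_iter):
        r = x - D @ a - s
        a = soft(a + step * (D.T @ r), step * lam_a)
        s = soft(x - D @ a, lam_s)  # closed-form minimizer in s
    # Classify by the class whose atoms best reconstruct the denoised x - s.
    residuals = {}
    for c in set(labels):
        idx = [i for i, l in enumerate(labels) if l == c]
        residuals[c] = np.linalg.norm(x - s - D[:, idx] @ a[idx])
    return min(residuals, key=residuals.get), a, s
```

The point of the sketch is the split: the impulse corruption is absorbed into s rather than distorting the code a, which is what a Gaussian-only model would force.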
A deep representation for depth images from synthetic data
Convolutional Neural Networks (CNNs) trained on large-scale RGB databases have become the secret sauce in the majority of recent approaches for object categorization from RGB-D data. Thanks to colorization techniques, these methods exploit the filters learned from 2D images to extract meaningful representations in 2.5D. Still, the perceptual signature of these two kinds of images is very different: the first is usually strongly characterized by textures, the second mostly by silhouettes of objects. Ideally, one would like to have two CNNs, one for RGB and one for depth, each trained on a suitable data collection and able to capture the perceptual properties of its channel for the task at hand. This has not been possible so far due to the lack of a suitable depth database. This paper addresses this issue, proposing to opt for synthetically generated images rather than collecting a large-scale 2.5D database by hand. While clearly a proxy for real data, synthetic images allow one to trade quality for quantity, making it possible to generate a virtually infinite amount of data. We show that training the very same architecture typically used on visual data on such a collection yields very different filters, resulting in depth features that (a) better characterize the different facets of depth images and (b) are complementary to those derived from CNNs pre-trained on 2D datasets. Experiments on two publicly available databases show the power of our approach.
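The "colorization techniques" mentioned above can be illustrated with a minimal sketch that maps a single-channel depth image to three channels so that RGB-trained filters can be applied to it. The jet-like piecewise-linear mapping below is an assumed stand-in for the encodings actually used in the literature (e.g. jet colorization or HHA):

```python
import numpy as np

def colorize_depth(depth, eps=1e-8):
    """Map a single-channel depth map to a 3-channel image in [0, 1],
    so a network expecting RGB input can consume it. Assumption: a
    simple jet-like colormap; real pipelines use richer encodings."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / (d.max() - d.min() + eps)  # normalize to [0, 1]
    # Piecewise-linear jet-like mapping into R, G, B channels:
    # near depths lean blue, mid depths green, far depths red.
    r = np.clip(1.5 - np.abs(4 * d - 3), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4 * d - 2), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4 * d - 1), 0.0, 1.0)
    return np.stack([r, g, b], axis=-1)
```

Note that such a mapping injects no new information; it merely reshapes depth into the statistics an RGB-trained filter bank expects, which is precisely the mismatch the abstract argues a depth-specific CNN avoids.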
SemiSiROC: Semisupervised Change Detection With Optical Imagery and an Unsupervised Teacher Model
Change detection (CD) is an important yet challenging task in remote sensing. In this article, we underline that the combination of unsupervised and supervised methods in a semisupervised framework improves CD performance. We rely on half-sibling regression for optical change detection (SiROC) as an unsupervised teacher model to generate pseudolabels (PLs) and select only the most confident PLs for pretraining different student models. Our results are robust across three different competitive student models, two semisupervised PL baselines, two benchmark datasets, and a variety of loss functions. While the performance gains are highest with a limited number of labels, a notable effect of PL pretraining persists when more labeled data are used. Further, we show that the confidence selection of SiROC is indeed effective and that the performance gains generalize to scenes that were not used for PL training. Through PL pretraining, SemiSiROC allows student models to learn more refined shapes of changes and makes them less sensitive to differences in acquisition conditions.
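The confidence-based pseudo-label selection described above can be sketched in a few lines. Here the teacher is reduced to an array of change scores in [0, 1] (SiROC itself is not implemented), and the threshold and keep fraction are assumed parameters:

```python
import numpy as np

def select_confident_pls(scores, threshold=0.5, keep_frac=0.2):
    """Sketch of confidence selection for pseudo-labels: binarize the
    teacher's change scores into hard PLs, then keep only the fraction
    of pixels whose score lies farthest from the decision threshold.
    Returns the PLs and a boolean mask of the retained pixels."""
    scores = np.asarray(scores, dtype=float).ravel()
    pls = (scores >= threshold).astype(int)   # hard pseudo-labels
    confidence = np.abs(scores - threshold)   # distance from the boundary
    k = max(1, int(keep_frac * scores.size))
    keep = np.argsort(confidence)[::-1][:k]   # indices of the top-k confident
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep] = True
    return pls, mask
```

A student model would then be pretrained only on pixels where the mask is true, discarding the ambiguous near-threshold labels that are most likely to be wrong.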
Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription
We present a process for cost-effective transcription of cursive handwritten text images that has been tested on a 1,000-page 17th-century book about botanical species. The process comprised two main tasks, namely: (1) preprocessing: page layout analysis, text line detection, and extraction; and (2) transcription of the extracted text line images. Both tasks were carried out with semiautomatic procedures, aimed at incrementally minimizing user correction effort, by means of computer-assisted line detection and interactive handwritten text recognition technologies. The contribution derived from this work is threefold. First, we provide a detailed human-supervised transcription of a relatively large historical handwritten book, ready to be searchable, indexable, and accessible to cultural heritage scholars as well as the general public. Second, we have conducted the first longitudinal study to date on interactive handwritten text recognition, for which we provide a very comprehensive user assessment of the real-world performance of the technologies involved in this work. Third, as a result of this process, we have produced a detailed transcription and document layout information (i.e., high-quality labeled data) ready to be used by researchers working on automated technologies for document analysis and recognition.

This work is supported by the European Commission through the EU projects HIMANIS (JPICH program, Spanish grant Ref. PCIN-2015-068) and READ (Horizon 2020 program, grant Ref. 674943), and by the Universitat Politecnica de Valencia (grant number SP20130189). This work was also part of the Valorization and I+D+i Resources program of VLC/CAMPUS and has been funded by the Spanish MECD as part of the International Excellence Campus program.

Toselli, A. H., Leiva, L. A., Bordes-Cabrera, I., Hernández-Tornero, C., Bosch Campos, V., & Vidal, E. (2018). Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription. Digital Scholarship in the Humanities, 33(1), 173-202. https://doi.org/10.1093/llc/fqw064
Schroedinger Eigenmaps for Manifold Alignment of Multimodal Hyperspectral Images
Multimodal remote sensing is an emerging field, as it allows for many views of the same region of interest. Domain adaptation attempts to fuse these multimodal remotely sensed images by utilizing the concept of transfer learning to learn a fused outcome from data originating in different sources. Semisupervised Manifold Alignment (SSMA) maps multiple hyperspectral images (HSIs) from high-dimensional source spaces to a low-dimensional latent space where similar elements reside close together. SSMA preserves the original geometric structure of the respective HSIs while pulling similar data points together and pushing dissimilar data points apart. The SSMA algorithm comprises a geometric component, a similarity component, and a dissimilarity component. The geometric component of the SSMA method has roots in the original Laplacian Eigenmaps (LE) dimension reduction algorithm, and the projection functions have roots in the original Locality Preserving Projections (LPP) dimensionality reduction framework. The similarity and dissimilarity components form a semisupervised element that allows expert-labeled information to improve the image fusion process. Spatial-Spectral Schroedinger Eigenmaps (SSSE) was designed as a semisupervised enhancement to the LE algorithm, augmenting the Laplacian matrix with a user-defined potential function. However, this enhancement has yet to be explored in the LPP framework. The first part of this thesis proposes to use the spatial-spectral potential within the LPP algorithm, creating a new algorithm we call Schroedinger Eigenmap Projections (SEP). Through experiments on publicly available data with expert-labeled ground truth, we compare the performance of the SEP algorithm with that of the LPP algorithm. The second part of this thesis proposes incorporating the spatial-spectral potential from SSSE into the SSMA framework. Using two multi-angled HSIs, we explore the impact of incorporating this potential into SSMA.
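The core step of the Schroedinger Eigenmaps family, augmenting the graph Laplacian with a user-defined potential, can be sketched in a few lines. The fully connected heat-kernel graph, the parameter names, and the purely diagonal potential below are simplifying assumptions, not the thesis's full SSSE/SEP/SSMA pipeline:

```python
import numpy as np

def schroedinger_eigenmaps(X, potential, n_components=2, sigma=1.0, alpha=1.0):
    """Sketch of Schroedinger Eigenmaps: Laplacian Eigenmaps whose
    Laplacian is augmented by a diagonal potential. `potential` is a
    nonnegative vector (one value per point) encoding expert or spatial
    knowledge; `alpha` weighs it against the graph term."""
    # Heat-kernel affinity graph (fully connected here for simplicity).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                       # unnormalized graph Laplacian
    E = L + alpha * np.diag(potential)         # Schroedinger operator
    # Generalized problem E f = lam D f, solved via the symmetric
    # normalization D^{-1/2} E D^{-1/2} g = lam g with f = D^{-1/2} g.
    d_isqrt = 1.0 / np.sqrt(deg)
    Es = (E * d_isqrt[:, None]) * d_isqrt[None, :]
    Es = (Es + Es.T) / 2                       # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(Es)            # eigenvalues in ascending order
    F = d_isqrt[:, None] * vecs
    # Skip the first eigenvector (trivial when the potential vanishes).
    return F[:, 1:n_components + 1]
```

With a zero potential this reduces to plain Laplacian Eigenmaps; a nonzero potential pulls the embedding of the flagged points toward the origin, which is how expert labels steer the geometry.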