28 research outputs found

    Recent Advances of Manifold Regularization

    Semi-supervised learning (SSL), which uses a small amount of labeled data together with a large amount of unlabeled data to significantly improve learning performance, has received considerable attention. Manifold regularization is one of the most popular approaches. It exploits the geometry of the probability distribution that generates the data and incorporates it as a regularization term. Representative forms of manifold regularization include Laplacian regularization (LapR), Hessian regularization (HesR), and p-Laplacian regularization (pLapR). Many extensions and applications based on the manifold regularization framework have been reported. In this chapter, we review LapR and HesR and introduce an approximation algorithm for the graph p-Laplacian. We also study several extensions of this framework for pairwise constraints, p-Laplacian learning, hypergraph learning, etc.
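As a rough illustration of the Laplacian regularization idea mentioned in this abstract, the following is a minimal sketch of a linear, Laplacian-regularized least-squares fit. It is not taken from the chapter: the k-NN graph construction, the function names `knn_graph_laplacian` and `fit_linear_lapr`, and the hyperparameters `gamma_a`, `gamma_i`, and `k` are all assumptions chosen for the example.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W of a symmetrized k-NN graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is excluded from neighbor search.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0
    W = np.maximum(W, W.T)  # make the adjacency matrix symmetric
    return np.diag(W.sum(axis=1)) - W

def fit_linear_lapr(X, y_labeled, labeled_idx, gamma_a=1e-2, gamma_i=1.0, k=5):
    """Minimize ||X_l w - y||^2 + gamma_a ||w||^2 + gamma_i w^T X^T L X w over w."""
    L = knn_graph_laplacian(X, k)
    X_l = X[labeled_idx]
    d = X.shape[1]
    # Setting the gradient of the objective to zero gives a linear system in w.
    A = X_l.T @ X_l + gamma_a * np.eye(d) + gamma_i * X.T @ L @ X
    return np.linalg.solve(A, X_l.T @ y_labeled)

# Toy data (assumed): two well-separated clusters with one labeled point each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)) + [-2.0, 0.0],
               rng.normal(0, 0.3, (20, 2)) + [2.0, 0.0]])
labeled_idx = np.array([0, 20])
y_labeled = np.array([-1.0, 1.0])
w = fit_linear_lapr(X, y_labeled, labeled_idx)
preds = np.sign(X @ w)
```

The unlabeled points enter only through the term `X.T @ L @ X`, which penalizes predictions that differ between neighboring points on the graph. Hessian and p-Laplacian regularization would replace that term with a different penalty.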

    PiCoCo: Pixelwise Contrast and Consistency Learning for Semisupervised Building Footprint Segmentation

    Building footprint segmentation from high-resolution remote sensing (RS) images plays a vital role in urban planning, disaster response, and population density estimation. Convolutional neural networks (CNNs) have recently become a workhorse for generating building footprints. However, fully exploiting the predictive power of CNNs requires large-scale pixel-level annotations. Most state-of-the-art CNN-based methods focus on designing network architectures that improve building footprint predictions with full annotations, while little work has been done on building footprint segmentation with limited annotations. In this article, we propose a novel semisupervised learning method for building footprint segmentation that can effectively predict building footprints with a network trained on few annotations (e.g., only 0.0324 km² of a 2.25 km² area is labeled). The proposed method is based on the contrast between building and background pixels in latent space and on the consistency of the predictions obtained from the CNN models when the input RS images are perturbed. We therefore term the proposed semisupervised framework PiCoCo, as it enforces Pixelwise Contrast and Consistency during the learning phase. Our experiments, conducted on two benchmark building segmentation datasets, validate the effectiveness of the proposed framework compared with several state-of-the-art building footprint extraction and semisupervised semantic segmentation methods.

    Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images

    Land-use classification using remote sensing images covers a wide range of applications. With the more detailed spatial and textural information provided by very high resolution (VHR) remote sensing images, a greater range of objects and spatial patterns can be observed than ever before. This offers a new opportunity for advancing the performance of land-use classification. In this paper, we first introduce an effective midlevel visual elements-oriented land-use classification method based on “partlets,” a library of pretrained part detectors used for midlevel visual element discovery. Taking advantage of midlevel visual elements rather than low-level image features, a partlets-based method represents images by computing their responses to a large number of part detectors. As the number of part detectors grows, the main obstacle to broader application of this method is its computational cost. To address this problem, we next propose a novel framework for training coarse-to-fine shared intermediate representations, termed “sparselets,” from a large number of pretrained part detectors. This is achieved by building a single-hidden-layer autoencoder and a single-hidden-layer neural network with an L0-norm sparsity constraint, respectively. Comprehensive evaluations on a publicly available 21-class VHR land-use data set and comparisons with state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed method.

    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. In recent years, significant efforts have been made to develop datasets and to present approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small number of scene classes and images, the lack of image variation and diversity, and the saturation of accuracy. These limitations severely hinder the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in terms of scene classes and total image number, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset, and the results are reported as a useful baseline for future research. Comment: This manuscript is the accepted version for Proceedings of the IEE

    Learning graph-based representations and matchings for classification tasks (Aprendizado de representações e correspondências baseadas em grafos para tarefas de classificação)

    Advisor: Ricardo da Silva Torres. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Many real-world situations can be modeled through objects and their relationships, such as roads connecting cities on a map. The graph is a concept derived from the abstraction of these situations. Graphs are a powerful structural representation that encodes relationships among objects and among their components in a single formalism. This representation is so powerful that it is used in a wide range of applications, from bioinformatics to social networks. Thus, several pattern recognition problems are modeled to use graph-based representations. In classification problems, the relationships among objects or among their components are exploited to achieve effective and/or efficient solutions. In this thesis, we investigate the use of graphs in classification problems. Two research avenues are followed: 1) graph-based representations of multimodal objects; and 2) learning-based approaches to support graph matching. First, we investigated the use of the recently proposed Bag-of-Visual-Graphs method to represent regions in a remote sensing image classification problem, considering the spatial distribution of interest points within the image. When we combined color and texture representations, we obtained effective results on two datasets from the literature (Monte Santo and Campinas). Second, we proposed two new extensions of the Bag-of-Graphs method for the representation of multimodal objects. These approaches combine complementary views of different modalities (e.g., visual and textual descriptions). We validated them on the flooding detection problem proposed by the MediaEval initiative, achieving 86.9% accuracy on the top 50 returned results (Precision@50). We addressed the graph matching problem by proposing an original framework to learn the cost function in a graph edit distance method. We also presented a couple of formulations that use open-set recognition methods and complex network measurements to characterize local graph properties. To the best of our knowledge, we were the first to treat the cost learning process as an open-set recognition problem and the first to exploit complex network measurements in such problems. We achieved effective results, which are comparable to several baselines in graph classification problems.
    Doctorate; Computer Science; Doctor in Computer Science; 2016/18429-1; 141584/2016-5; CAPES; FAPESP; CNP

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructure through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multipath effects in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments demonstrate that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25%, and 95.09% across those same environments.
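To make the "attention-based" part of an attention-based BiLSTM more concrete, here is a minimal sketch of additive attention pooling over a sequence of hidden states. It is a generic formulation and not necessarily the one used in the paper. The function name `attention_pool`, the parameter names `W`, `b`, `v`, and all shapes are assumptions, and the hidden states `H` are taken as given rather than produced by an actual BiLSTM.

```python
import numpy as np

def attention_pool(H, W, b, v):
    """Additive attention over time steps.

    H: (T, d) hidden states, one row per time step (e.g. BiLSTM outputs).
    W: (d, a), b: (a,), v: (a,) are assumed learnable parameters.
    Returns the (d,) context vector and the (T,) attention weights.
    """
    scores = np.tanh(H @ W + b) @ v        # one scalar score per time step
    scores = scores - scores.max()         # stabilize the softmax numerically
    alpha = np.exp(scores) / np.exp(scores).sum()
    context = alpha @ H                    # weighted sum of hidden states
    return context, alpha
```

In a classifier, the context vector would typically be passed to a dense layer with a softmax over the activity classes. The weights `alpha` indicate which time steps of the CSI series contribute most to the prediction.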

    Motion capture data processing, retrieval and recognition.

    Character animation plays an essential role in feature films and computer games. Manually creating character animation is both tedious and inefficient, so motion capture (MoCap) techniques have been developed and have become the most popular method for creating realistic character animation. Commercial MoCap systems are expensive, and the capturing process itself usually requires an indoor studio environment. Procedural animation creation often lacks extensive user control during the generation process. Therefore, efficiently and effectively reusing MoCap data can bring significant benefits, which has motivated wider research on machine learning-based MoCap data processing. A typical workflow for reusing MoCap data can be divided into three stages: data capture, data management, and data reuse. There are still many challenges at each stage. For instance, data capture and management often suffer from data quality problems. Efficient and effective retrieval methods are also in demand because of the large amount of data involved. In addition, classification and understanding of actions are the fundamental basis of data reuse. This thesis proposes to use machine learning on MoCap data for reuse purposes, and a framework for motion capture data processing is designed. The modular design of this framework enables motion data refinement, retrieval, and recognition. The first part of this thesis reviews methods used in existing motion capture processing approaches in the literature and briefly introduces the relevant machine learning methods used in this framework. In general, the frameworks related to refinement, retrieval, and recognition are discussed. A motion refinement algorithm based on dictionary learning is then presented, in which kinematic structural and temporal information is exploited. The designed optimization method and data preprocessing technique ensure that the recovered result is smooth. After that, a motion refinement algorithm based on matrix completion is presented, in which the low-rank property and spatio-temporal information are exploited. This model does not require preparing data for training. The designed optimization method outperforms existing approaches in both effectiveness and efficiency. A motion retrieval method based on multi-view feature selection is also proposed, in which the intrinsic relations between visual words in each motion feature subspace are discovered as a means of improving retrieval performance. A provisional trace-ratio objective function and an iterative optimization method are also included. A non-negative matrix factorization-based motion data clustering method is proposed for recognition purposes, which aims to deal with large-scale unsupervised/semi-supervised problems. In addition, deep learning models are used for motion data recognition, e.g., 2D gait recognition and 3D MoCap recognition. To sum up, the research on motion data refinement, retrieval, and recognition is presented in this thesis with the aim of tackling the major challenges in motion reuse. The proposed motion refinement methods aim to provide high-quality, clean motion data for downstream applications. The designed multi-view feature selection algorithm aims to improve motion retrieval performance. The proposed motion recognition methods are equally essential for motion understanding. A collection of publications by the author of this thesis is noted in the publications section.
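The matrix-completion idea mentioned in this abstract can be illustrated with a small sketch. It is not the thesis's algorithm, which also exploits spatio-temporal information. The sketch only shows the generic low-rank step, using iterative rank-truncated SVD imputation. The function name `complete_low_rank`, the `rank` and `n_iters` parameters, and the convention that rows are frames and columns are marker coordinates are all assumptions.

```python
import numpy as np

def complete_low_rank(M, mask, rank=2, n_iters=200):
    """Fill missing entries of M by iterative rank-truncated SVD.

    M:    (frames, coordinates) motion matrix; values where mask is False are ignored.
    mask: boolean array of the same shape, True where an entry was observed.
    """
    X = np.where(mask, M, 0.0)  # start with zeros in the missing positions
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep the observed entries fixed; update only the missing ones.
        X = np.where(mask, M, low_rank)
    return X
```

No training data is needed here, since the missing values are inferred from the observed entries of the same matrix. That matches the abstract's remark that the matrix-completion model does not require preparing data for training.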

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    LIPIcs, Volume 277, GIScience 2023, Complete Volume