    Machine Learning Applied to Open Geographic Data (Aprendizado de máquina aplicado a dados geográficos abertos)

    Advisor: Alexandre Xavier Falcão. Thesis (PhD), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Geographical data are used in several applications, such as mapping, navigation, and urban planning. In particular, mapping services are routinely used and require up-to-date geographical data. However, due to budget limitations, authoritative maps suffer from inaccuracies in completeness and timeliness. In this context, crowdsourcing projects, such as Volunteered Geographic Information (VGI) systems, have emerged as an alternative way to obtain up-to-date geographical data. OpenStreetMap (OSM) is one of the largest VGI projects, with millions of users (consumers and producers of information) around the world, and the data collected in OSM are freely available.
OSM is edited by volunteers with different annotation skills, which makes the annotation quality heterogeneous across geographical regions. Despite these quality issues, OSM data have been used extensively in several applications (e.g., land-use mapping). On the other hand, it is crucial to improve the quality of OSM data so that applications that depend on accurate information (e.g., car routing) become more effective. In this thesis, we review and propose machine learning methods to improve the quality of OSM data. We present automatic and interactive methods focused on improving OSM data for humanitarian purposes. The methods can correct OSM annotations of building footprints in rural areas and enable efficient annotation of coconut trees from aerial images. The former is helpful in responding to crises that affect vulnerable areas, while the latter is useful for environmental monitoring and post-disaster assessment. Our methodology for automatically correcting existing OSM annotations of rural buildings consists of three tasks: alignment correction, removal of incorrect annotations, and addition of missing building annotations. This methodology obtains better results than supervised semantic segmentation methods and, more importantly, outputs vector footprints suitable for geographical data processing. Given that this automatic strategy may not attain accurate results in some regions, we propose an interactive approach that reduces the human effort required to correct rural building annotations in OSM. This strategy drastically reduces the amount of data that users need to analyze by automatically finding most of the existing annotation errors. Annotating objects in aerial imagery is time-consuming, especially when the number of objects is large. Thus, we propose a methodology in which annotation is performed in a 2D space obtained by projecting the image features.
This method allows users to annotate more objects efficiently than traditional photointerpretation does, collecting more effective labeled samples to train a classifier for object detection.
PhD (Doutorado) in Computer Science (Ciência da Computação); degree: Doctor of Computer Science. Grants: 2016/14760-5, 2017/10086-0. Funding: CAPES, FAPES
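
    The abstract describes annotating objects in a 2D space obtained by projecting image features, but does not name the projection technique. The sketch below is a minimal illustration under stated assumptions: it uses PCA (computed by power iteration) as a stand-in projection, and a hypothetical `label_region` helper that assigns one label to every unlabeled sample inside a rectangle the annotator draws on the 2D scatter plot. Neither function is taken from the thesis.

```python
import math
import random


def pca_2d(features, iters=200, seed=0):
    """Stand-in projection (assumption): map feature vectors onto their top-2
    principal components, found by power iteration on the covariance matrix
    while orthogonalizing against components already found."""
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in features]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            for c in components:  # remove directions already captured
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            v = [a / norm for a in w]
        components.append(v)
    # 2D coordinates of each sample in the plane spanned by the two components
    return [tuple(sum(r[j] * c[j] for j in range(d)) for c in components) for r in centered]


def label_region(points, box, label, labels):
    """Hypothetical helper: give `label` to every still-unlabeled point inside
    box = ((xmin, ymin), (xmax, ymax)) drawn by the annotator on the 2D plot."""
    (xmin, ymin), (xmax, ymax) = box
    for i, (x, y) in enumerate(points):
        if labels[i] is None and xmin <= x <= xmax and ymin <= y <= ymax:
            labels[i] = label
    return labels


# Usage: two well-separated groups of (made-up) image feature vectors
features = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5]]
points = pca_2d(features)
# One box over the right half-plane labels a whole group in a single action
labels = label_region(points, ((0.0, -math.inf), (math.inf, math.inf)), "coconut tree", [None] * 6)
print(labels)
```

    Nonlinear projections such as t-SNE are often preferred for this kind of visual exploration; PCA is used here only to keep the sketch dependency-free.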

    Tools of Trade of the Next Blue-Collar Job? Antecedents, Design Features, and Outcomes of Interactive Labeling Systems

    Supervised machine learning is becoming increasingly popular, and so is the need for annotated training data. Such data often need to be labeled manually by human workers, which is likely to have a negative impact on the workforce involved. To alleviate this issue, a new class of information systems has emerged: interactive labeling systems. However, this young but rapidly growing field lacks guidance and structure regarding the design of such systems. Against this backdrop, this paper describes the antecedents, design features, and outcomes of interactive labeling systems. We perform a systematic literature review, identifying 188 relevant articles. Our results are presented as a morphological box with 14 dimensions, which we evaluate using card sorting. By additionally offering this box as a web-based artifact, we provide actionable guidance on interactive labeling system development for scholars and practitioners. Lastly, we discuss imbalances in the distribution of articles across our morphological box and suggest directions for future work.

    The 5th Conference of PhD Students in Computer Science

    GEOBIA 2016: Solutions and Synergies, 14-16 September 2016, University of Twente Faculty of Geo-Information and Earth Observation (ITC): open access e-book

    IST Austria Thesis

    The human ability to recognize objects in complex scenes has driven research in the computer vision field for the past couple of decades. This thesis focuses on the object recognition task in images: given an image, we want the computer system to predict the class of the object that appears in it. A recent successful attempt to bridge the semantic understanding of images by humans and by computers uses attribute-based models. Attributes are semantic properties of objects that are shared across different categories and that both humans and computers can decide on. To explore attribute-based models, we take a statistical machine learning approach and address two key learning challenges in the context of object recognition: learning augmented attributes as a mid-level discriminative feature representation, and learning with attributes as privileged information. Our main contributions are parametric and non-parametric models and algorithms for these two frameworks. In the parametric approach, we explore an autoencoder model combined with the large margin nearest neighbor principle for mid-level feature learning, and linear support vector machines for learning with privileged information. In the non-parametric approach, we propose a supervised Indian Buffet Process for automatic augmentation of semantic attributes, and explore the Gaussian Process classification framework for learning with privileged information. A thorough experimental analysis shows the effectiveness of the proposed models in both the parametric and the non-parametric settings.
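
    The large margin nearest neighbor (LMNN) principle mentioned above can be made concrete with the standard LMNN objective for a linear map `L`: a pull term shrinks squared distances to same-class "target neighbors", and a hinge push term penalizes differently labeled "impostors" that come within a unit margin of those neighbors. The sketch below computes this loss in plain Python. It illustrates the generic principle only, not the thesis's autoencoder-based model; the helper names and the default weighting `mu=0.5` are assumptions.

```python
def sq_dist_mapped(L, a, b):
    """Squared Euclidean distance between a and b after applying the linear map L."""
    diff = [p - q for p, q in zip(a, b)]
    mapped = [sum(row[k] * diff[k] for k in range(len(diff))) for row in L]
    return sum(m * m for m in mapped)


def target_neighbors(X, y, k):
    """For each sample, its k nearest same-class samples in the original space."""
    result = []
    for i, xi in enumerate(X):
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        same.sort(key=lambda j: sum((p - q) ** 2 for p, q in zip(xi, X[j])))
        result.append(same[:k])
    return result


def lmnn_loss(L, X, y, k=1, mu=0.5):
    """LMNN objective: (1 - mu) * pull + mu * push, with a unit margin."""
    pull, push = 0.0, 0.0
    for i, neighbors in enumerate(target_neighbors(X, y, k)):
        for j in neighbors:
            d_ij = sq_dist_mapped(L, X[i], X[j])
            pull += d_ij  # keep target neighbors close
            for l in range(len(X)):
                if y[l] != y[i]:
                    # hinge: impostor l should be at least 1 unit farther away than j
                    push += max(0.0, 1.0 + d_ij - sq_dist_mapped(L, X[i], X[l]))
    return (1 - mu) * pull + mu * push


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
print(lmnn_loss([[1, 0], [0, 1]], X, y))    # 2.0: classes already separated by the margin
print(lmnn_loss([[1, 0], [0, 0.1]], X, y))  # about 7.64: shrinking the discriminative axis lets impostors in
```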

    The Role of Synthetic Data in Improving Supervised Learning Methods: The Case of Land Use/Land Cover Classification

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management. In remote sensing, Land Use/Land Cover (LULC) maps are important assets for various applications, promoting environmental sustainability and good resource management. However, their production remains a challenging task. Various factors make it difficult to generate accurate, up-to-date LULC maps, whether through automatic or photo-interpreted LULC mapping. Data preprocessing, a crucial step in any Machine Learning (ML) task, is particularly important in the remote sensing domain because of the overwhelming amount of raw, unlabeled data continuously gathered by multiple remote sensing missions. However, a significant part of the state of the art focuses on scenarios with full access to labeled training data with relatively balanced class distributions. This thesis focuses on the challenges found in automatic LULC classification tasks, specifically in data preprocessing. We focus on developing novel Active Learning (AL) and imbalanced learning techniques to improve ML performance in situations with limited training data and/or rare classes. We also show that many of the contributions presented are successful not only in remote sensing problems but also in various other multidisciplinary classification problems. The work presented in this thesis used open access datasets to test the contributions made in imbalanced learning and AL. All the data pulling, preprocessing, and experiments are available at https://github.com/joaopfonseca/publications. The algorithmic implementations are available in the Python package ml-research at https://github.com/joaopfonseca/ml-research.
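
    The abstract does not describe the proposed techniques in detail. As a generic illustration of how synthetic data can help with rare classes, here is a minimal SMOTE-style oversampler: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its `k` nearest minority-class neighbors. SMOTE is a well-known baseline used here as an assumption for illustration; this is not the thesis's method, and the `smote` function below does not mirror the API of the `ml-research` package.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling (illustrative sketch): create n_new synthetic
    minority-class samples by interpolating between a sample and one of its
    k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # k nearest neighbors of each minority sample (excluding the sample itself)
    neighbors = []
    for i, a in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(a, minority[j]))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        gap = rng.random()  # random position on the segment from sample i to sample j
        synthetic.append([p + gap * (q - p) for p, q in zip(minority[i], minority[j])])
    return synthetic


# Usage: a made-up rare class with four samples at the corners of the unit square
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2)
```

    Interpolating only between same-class neighbors keeps the synthetic samples inside the region already occupied by the rare class, rather than duplicating existing points verbatim.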

    Usage-driven Maintenance of Knowledge Organization Systems

    Knowledge Organization Systems (KOS) are typically used as background knowledge for document indexing in information retrieval. They have to be maintained and adapted constantly to reflect changes in the domain and the terminology. This thesis provides approaches that support the maintenance of hierarchical knowledge organization systems, such as thesauri, classifications, or taxonomies, by making information about the usage of KOS concepts available to the maintainer. The central contribution is the ICE-Map Visualization, a treemap-based visualization built on top of a generalized statistical framework that can visualize almost arbitrary usage information. The ICE-Map Visualization is used to demonstrate the proper selection of an existing KOS for the available documents and the evaluation of a KOS for different indexing techniques. For the creation of a new KOS, a crowdsourcing-based approach is presented that uses feedback from Amazon Mechanical Turk to relate terms hierarchically. An existing KOS is extended with new terms derived from the documents to be indexed using a machine learning approach that relates the terms to existing concepts in the hierarchy; the features are derived from text snippets in the result list of a web search engine. For splitting overpopulated concepts into new subconcepts, an interactive clustering approach is presented that can propose names for the new subconcepts. A framework is described that integrates all approaches of this thesis and contains the reference implementation of the ICE-Map Visualization. It is extendable and supports the implementation of evaluation methods that build on other evaluations. Additionally, it supports the visualization of results and the implementation of new visualizations. An important building block for practical applications is the simple linguistic indexer, presented as a minor contribution.
It is knowledge-poor and works without any training. This thesis applies computer science approaches in the domain of information science. The introduction describes the foundations in information science; the conclusion focuses on the relevance for practical applications, especially regarding the handling of different qualities of KOSs resulting from automatic and semiautomatic maintenance.
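
    The abstract does not specify how the ICE-Map Visualization lays out its treemap. For illustration only, the classic slice-and-dice algorithm below shows how a concept hierarchy annotated with usage counts can be turned into nested rectangles: each child receives a strip of its parent's rectangle proportional to its weight among its siblings, and the cut direction alternates with depth. The `(name, weight, children)` tuple format and the `slice_and_dice` function are hypothetical, not ICE-Map's actual data model or layout.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap layout (illustrative sketch).

    node is a hypothetical (name, weight, children) tuple. Cuts are vertical
    at even depths and horizontal at odd depths. Returns a list of
    (name, x, y, width, height) tuples, parents before children.
    """
    if out is None:
        out = []
    name, _weight, children = node
    out.append((name, x, y, w, h))
    total = sum(child[1] for child in children)
    offset = 0.0  # fraction of the parent already given to earlier siblings
    for child in children:
        share = child[1] / total if total else 0.0
        if depth % 2 == 0:  # side-by-side strips along the x axis
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:  # stacked strips along the y axis
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Usage: a made-up KOS fragment whose weights are concept usage counts
usage_tree = ("root", 10, [("a", 6, []), ("b", 4, [("b1", 1, []), ("b2", 3, [])])])
for rect in slice_and_dice(usage_tree, 0, 0, 100, 100):
    print(rect)
```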

    Return to Baguia: an ethnographic museum collection on the edge of living memory

    The question of what significance ethnographic museum collections might hold for source communities today, particularly when collections sit on the edge of living memory, is explored in this thesis through a case study of the Baguia Collection and its virtual return to the Makasae people of Baguia Sub-district, Timor-Leste, in 2014. The Baguia Collection was acquired by Dr Alfred Bühler on behalf of the Museum der Kulturen Basel, Switzerland, in 1935 using salvage ethnology methodologies. This diasporic collection now exists in Switzerland as a record of Bühler's accomplishments and of Swiss ethnographic history, and as a time capsule of Makasae heritage. This research explores an initial phase of engagement between the residents of Baguia and the Baguia Collection. Makasae responses to this Collection, which consists of 691 material culture objects and over 300 historical photos, raise issues pertinent to contemporary museology practice as it seeks to identify appropriate relational processes for collaborating with source communities. The research findings support proposals for flexible, pro-technological access to museum collections and their digital return to source communities, while also considering the inherent limitations and complexities of this methodology. I argue that the Baguia Collection has shared heritage values and that digital access arrangements will enhance the restitution of cultural knowledge and its subsequent inter-generational transmission in Baguia, while also providing the Museum der Kulturen Basel with more up-to-date and relevant information about the Collection. My project demonstrates that access to digital images of the Collection has enabled residents of Baguia to assert their cultural authority over the Collection, and that with further digital access they would activate the Collection to meet their own development agendas.
By animating the Collection through 'acts of transfer', the Baguia community illustrated the potential for the Collection to become a source of metacultural production that reinvigorates contemporary Makasae identity and develops Makasae social and cultural capital, while ultimately enhancing their capacity to aspire.