    Towards Multi-Level Classification in Deep Plant Identification

    Graduation thesis (Academic Doctorate in Engineering), Instituto Tecnológico de Costa Rica, 2018. In the last decade, automatic identification of organisms based on computer vision techniques has been a hot topic for both biodiversity scientists and machine learning specialists. Early on, plants became a particularly attractive subject of study for two main reasons. On the one hand, quick and accurate plant inventories are critical for biodiversity conservation; for example, they are indispensable for conducting ecosystem inventories, defining models for payments for environmental services, and tracking populations of invasive plant species, among others. On the other hand, plants are a more tractable group than, for instance, insects. First, the number of species is smaller (around 400,000, compared to more than 8 million insect species). Second, plants are better understood by the scientific community, particularly with respect to their morphometric features. Third, there are large, fast-growing databases of digital plant images generated by both scientists and the general public. Finally, an incremental approach, starting with "flat elements" such as leaves and then moving to the whole plant, made it feasible to apply computer vision techniques early on. As a result, even mobile apps for the general public are now available. This document presents the key results obtained while tackling the general problem of fully automating the identification of plant species based solely on images. It describes the key findings of a research path that started with a restricted scope, namely, the identification of plants from Costa Rica using a morphometric approach that considers only images of fresh leaves. Species from other regions of the world were then included, still using hand-crafted feature extractors. A key methodological turn was the subsequent use of Deep Learning techniques on images of any part of a plant.
    We then compared, for the first time, the accuracy of a Deep Learning approach on datasets of images of fresh plants with its accuracy on datasets of herbarium sheet images. Among the results obtained during this research, potential biases in automatic plant identification datasets were found and characterized. The feasibility of transfer learning between different regions of the world was also demonstrated. Even more importantly, it was demonstrated for the first time that herbarium sheet images are a good resource for identifying plant specimens mounted on herbarium sheets, which adds to the importance of herbaria around the globe. Finally, as the culmination of this research path, this document presents the results of developing a novel multi-level classification approach that uses knowledge about higher taxonomic levels not only to carry out family- and genus-level identifications but also to try to improve the accuracy of species-level identifications. This last step focuses on the creation of a hierarchical loss function based on known plant taxonomies, coupled with multi-level Deep Learning architectures, to guide model optimization with prior knowledge of a given class hierarchy.
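
The abstract mentions a hierarchical loss built on plant taxonomies and multi-level architectures but gives no formula. The sketch below shows one common way to realize that idea; it is not the thesis's actual loss. It assumes a network with separate species, genus, and family output heads, where only the species label is annotated and the higher-level targets are looked up in a taxonomy. The toy taxonomy, the `hierarchical_loss` function, and the per-level weights are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy taxonomy: species index -> (genus index, family index).
SPECIES = ["Quercus robur", "Quercus ilex", "Fagus sylvatica", "Acer campestre"]
GENERA = ["Quercus", "Fagus", "Acer"]
FAMILIES = ["Fagaceae", "Sapindaceae"]
TAXONOMY: Dict[int, Tuple[int, int]] = {0: (0, 0), 1: (0, 0), 2: (1, 0), 3: (2, 1)}


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -log_softmax(logits)[target]


def hierarchical_loss(
    species_logits: Sequence[float],
    genus_logits: Sequence[float],
    family_logits: Sequence[float],
    species_label: int,
    weights: Tuple[float, float, float] = (1.0, 0.5, 0.25),
) -> float:
    """Weighted sum of per-level cross-entropies (hypothetical sketch).

    Only the species label is given; genus and family targets are derived
    from TAXONOMY, so all levels are supervised consistently. The default
    weights are illustrative assumptions, not values from the thesis.
    """
    genus_label, family_label = TAXONOMY[species_label]
    w_s, w_g, w_f = weights
    return (
        w_s * cross_entropy(species_logits, species_label)
        + w_g * cross_entropy(genus_logits, genus_label)
        + w_f * cross_entropy(family_logits, family_label)
    )


# True label: Quercus robur (index 0).
# "near": species head confuses it with Quercus ilex, genus/family heads correct.
near = hierarchical_loss([0.0, 5.0, 0.0, 0.0], [5.0, 0.0, 0.0], [5.0, 0.0], 0)
# "far": every head points to Acer campestre, a different family.
far = hierarchical_loss([0.0, 0.0, 0.0, 5.0], [0.0, 0.0, 5.0], [0.0, 5.0], 0)
print(near < far)  # the congeneric confusion yields the smaller loss
```

Because genus and family targets come from the species label, a model whose higher-level heads are correct still earns partial credit when its species head confuses two congeneric species; the weights trade species accuracy against higher-level consistency.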

    Plant Identification in an Open-world (LifeCLEF 2016)

    The LifeCLEF plant identification challenge aims at evaluating plant identification methods and systems at a very large scale, close to the conditions of a real-world biodiversity monitoring scenario. The 2016 edition was conducted on a set of more than 110K images illustrating 1,000 plant species living in Western Europe, built through a large-scale participatory sensing platform initiated in 2011 that now involves tens of thousands of contributors. The main novelty over previous years is that the identification task was evaluated as an open-set recognition problem, i.e., a problem in which the recognition system has to be robust to unknown, never-before-seen categories. Beyond brute-force classification across the known classes of the training set, the main challenge was thus to automatically reject the false-positive classification hits caused by the unknown classes. This overview presents the resources and assessments of the challenge in more detail, summarizes the approaches and systems employed by the participating research groups, and provides an analysis of the main outcomes.
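
To make the open-set idea concrete, the sketch below uses a simple, widely known baseline: reject a query as "unknown" when the classifier's maximum softmax probability falls below a threshold. This is an assumption for illustration only; the abstract does not say which rejection strategies participants used, and the `open_set_predict` function and its default threshold of 0.5 are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the top known class index, or None to reject as unknown.

    Hypothetical baseline: a low maximum probability is treated as a sign
    that the query belongs to a class never seen during training.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


print(open_set_predict([4.0, 0.5, 0.1]))  # 0 (confident, accepted)
print(open_set_predict([1.0, 1.0, 1.0]))  # None (max probability is 1/3)
```

Raising the threshold rejects more false positives caused by unknown classes at the cost of also rejecting some correct but uncertain predictions of known classes.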

    LifeCLEF 2016: Multimedia Life Species Identification Challenges

    Using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap and build accurate knowledge of the identity, geographic distribution, and evolution of living species. Large, structured communities of nature observers (e.g., iSpot, Xeno-canto, Tela Botanica) as well as large-scale monitoring equipment have started to produce outstanding collections of multimedia records. Unfortunately, the performance of state-of-the-art analysis techniques on such data is still not well understood and is far from meeting real-world requirements. The LifeCLEF lab proposes to evaluate these challenges through 3 tasks related to multimedia information retrieval and fine-grained classification problems in 3 domains. Each task is based on large volumes of real-world data, and the measured challenges are defined in collaboration with biologists and environmental stakeholders to reflect realistic usage scenarios. For each task, we report the methodology, the datasets, the results, and the main outcomes.