5 research outputs found

    Digital repositories and linked data: lessons learned and challenges

    No full text
    Digital repositories have been used by universities and libraries to store their bibliographic, scientific, and/or institutional contents, and then make the corresponding metadata publicly available on the web through the OAI-PMH protocol. However, such metadata is not descriptive enough for a document to be easily discoverable. Even though the emergence of Semantic Web technologies has sparked the interest of digital repository providers in publishing and enriching their content using Linked Data (LD) technologies, those institutions have used different generation approaches, and in certain cases ad-hoc solutions for particular use cases, but none of them has compared the existing approaches to establish which one is the better solution prior to applying it. To address this question, we performed a benchmark study that compares two commonly used generation approaches, and we also describe our experience, lessons learned, and challenges found while publishing a DSpace digital repository as LD. Results show that the straightforward method for extracting data from a digital repository is the standard OAI-PMH protocol, whose execution time is much shorter than that of the database approach, while the additional data cleaning tasks are minimal. © 2019, Springer Nature Switzerland AG.
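    As an illustration of the OAI-PMH extraction path that the study found most practical, a minimal harvesting sketch follows. The endpoint URL is a placeholder, and the paper's actual pipeline (including any RDF mapping step) may differ.

        # Minimal OAI-PMH harvesting sketch (assumed endpoint URL; the paper's
        # actual extraction and LD generation pipeline are not reproduced here).
        import requests
        import xml.etree.ElementTree as ET

        OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

        def harvest(base_url, metadata_prefix="oai_dc"):
            """Yield Dublin Core records, following OAI-PMH resumption tokens."""
            params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
            while True:
                root = ET.fromstring(requests.get(base_url, params=params).content)
                for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
                    # Collect the Dublin Core elements of each record.
                    yield {elem.tag.split("}")[1]: elem.text
                           for elem in record.iter()
                           if elem.text and "purl.org/dc" in elem.tag}
                token = root.find(".//oai:resumptionToken", OAI_NS)
                if token is None or not token.text:
                    break
                params = {"verb": "ListRecords", "resumptionToken": token.text}

        # Example (hypothetical DSpace endpoint):
        # for rec in harvest("https://repository.example.edu/oai/request"):
        #     print(rec.get("title"))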

    Land cover classification of high resolution images from an ecuadorian andean zone using deep convolutional neural networks and transfer learning

    No full text
    Deep learning models have recently emerged as a popular way to apply machine learning in a variety of domains, including remote sensing, where several approaches for the classification of land cover and land use have been proposed. However, acquiring a sufficiently large dataset of labelled samples for training such models is often a significant challenge, and its absence leads to suboptimal models that do not generalize well over the different types of land cover. In this paper, we present an approach to perform land cover classification on a small dataset of high-resolution imagery from an area in the Andes of Ecuador, using deep convolutional neural networks together with techniques such as transfer learning, data augmentation, and some fine-tuning considerations. Results demonstrate that this method can achieve good classification accuracy when it is backed by good strategies to increase the number of samples in an imbalanced dataset.
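    The abstract outlines transfer learning with data augmentation over a small, imbalanced image set; the sketch below illustrates that general recipe in Keras. The backbone (ResNet50), image size, class count, and augmentation parameters are illustrative assumptions, not the configuration reported in the paper.

        # Transfer-learning sketch: pretrained backbone, frozen weights, new head,
        # simple augmentation. All hyperparameters are assumptions for illustration.
        import tensorflow as tf

        NUM_CLASSES = 6          # assumed number of land cover classes
        IMG_SHAPE = (224, 224, 3)

        base = tf.keras.applications.ResNet50(weights="imagenet",
                                              include_top=False,
                                              input_shape=IMG_SHAPE)
        base.trainable = False   # freeze the pretrained convolutional layers

        model = tf.keras.Sequential([
            base,
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])

        # Data augmentation to enlarge the small, imbalanced training set.
        augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
            rotation_range=30, horizontal_flip=True, vertical_flip=True,
            zoom_range=0.2, validation_split=0.2,
            preprocessing_function=tf.keras.applications.resnet50.preprocess_input)

        # train_gen = augmenter.flow_from_directory("tiles/", target_size=IMG_SHAPE[:2],
        #                                           subset="training")
        # model.fit(train_gen, epochs=20)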

    A ranking-based approach for supporting the initial selection of primary studies in a Systematic Literature Review

    No full text
    Traditionally, most of the steps involved in a Systematic Literature Review (SLR) are executed manually, which is costly in time and effort given the massive number of primary studies available online. This has motivated a great deal of research focused on automating the process. Current state-of-the-art methods combine active learning with manual selection of primary studies from a smaller set, so that they maximize the discovery of relevant papers while minimizing the number of papers reviewed by hand. In this work, we propose a novel strategy to further improve these methods, whose early success depends heavily on an effective selection of the initial papers to be read by researchers. Our PCA-based method combines different document representations and similarity metrics to cluster and rank the corpus content against an enriched representation of the research questions in the SLR protocol. Validation was carried out over four publicly available datasets corresponding to SLR studies from the Software Engineering domain. The proposed model proved more efficient than a BM25 baseline at selecting the initial set of relevant primary studies within the top 100 of the ranking, which makes it a promising method to bootstrap an active learning cycle.
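    The abstract describes ranking candidate studies against an enriched representation of the research questions using document representations, dimensionality reduction, and a similarity metric. The sketch below shows one plausible reading of that pipeline with TF-IDF, truncated SVD, and cosine similarity; the concrete representations, the PCA variant, and the enrichment step used in the paper are not specified in the abstract, so everything here is an assumption.

        # Illustrative ranking sketch: TF-IDF + truncated SVD + cosine similarity.
        # This is not the paper's exact method; representations and metrics are assumed.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.metrics.pairwise import cosine_similarity

        def rank_candidates(abstracts, enriched_questions, n_components=100):
            """Rank candidate abstracts by similarity to the enriched research questions."""
            vectorizer = TfidfVectorizer(stop_words="english")
            X = vectorizer.fit_transform(abstracts + [enriched_questions])
            # Reduce dimensionality (an SVD stand-in for the PCA-based step).
            n_components = min(n_components, X.shape[1] - 1)
            Z = TruncatedSVD(n_components=n_components).fit_transform(X)
            docs, query = Z[:-1], Z[-1:]
            scores = cosine_similarity(docs, query).ravel()
            return sorted(range(len(abstracts)), key=lambda i: scores[i], reverse=True)

        # Example: indices of the top 100 candidates to screen first.
        # top_100 = rank_candidates(candidate_abstracts, research_questions_text)[:100]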

    A general process for the semantic annotation and enrichment of electronic program guides

    No full text
    Electronic Program Guides (EPGs) are common resources used to inform the audience about the programming transmitted by TV stations and cable/satellite TV providers. However, they only provide basic metadata about the TV programs, while users may want additional information related to the content they are currently watching. This paper proposes a general process for the semantic annotation and subsequent enrichment of EPGs using external knowledge bases and natural language processing techniques, with the aim of tackling the lack of immediately available related information about TV programs. Additionally, we define an evaluation approach based on a distributed representation of words that enables TV content providers to verify the effectiveness of the system and run the enrichment process automatically. We test our proposal on a real-world dataset and demonstrate its effectiveness using different knowledge bases, word representation models, and similarity measures. Results show that the DBpedia and Google Knowledge Graph knowledge bases return the most relevant content during the enrichment process, while word2vec and fastText models with Word Mover's Distance as the similarity function can be combined to validate the effectiveness of the retrieval task.
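    The evaluation step pairs word embedding models with Word Mover's Distance to check that retrieved enrichment text is actually related to an EPG entry; the sketch below shows that comparison with gensim. The embedding file and the texts are placeholders, and the acceptance threshold is an assumption rather than a value from the paper.

        # WMD-based relevance check sketch (gensim). Model path, texts, and threshold
        # are illustrative assumptions; the paper's exact setup is not reproduced here.
        from gensim.models import KeyedVectors

        # Pretrained word2vec (or fastText) vectors in word2vec format (placeholder path).
        vectors = KeyedVectors.load_word2vec_format("embeddings.vec", binary=False)

        def is_relevant(epg_description, enrichment_text, threshold=1.0):
            """Accept the retrieved enrichment if its WMD to the EPG description is small."""
            a = epg_description.lower().split()
            b = enrichment_text.lower().split()
            distance = vectors.wmdistance(a, b)  # Word Mover's Distance (lower = closer)
            return distance <= threshold

        # Example:
        # keep = is_relevant("news program about the national football team",
        #                    "article describing the team's latest match results")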