    A systematic review of data quality issues in knowledge discovery tasks

    Data volumes are growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning

    Today, data availability has gone from scarce to superabundant. Technologies such as IoT, trends in social media, and the capabilities of smartphones are producing and digitizing large amounts of data that were previously unavailable. This massive increase in data creates opportunities for new business models, but it also demands new techniques and methods for data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras). Assessing the quality of a dataset supports conclusions about the information it contains, and this assessment is increasingly done with the aid of data cleaning approaches. Therefore, guaranteeing high data quality is considered the primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed process is evaluated on real datasets from the UCI Repository of Machine Learning Databases. To assess the data cleaning process, the datasets cleaned by DC-RM were used to train the same regression models proposed by the authors of the UCI datasets. The results achieved by the models trained on the datasets produced by DC-RM are better than or equal to those reported by the datasets' authors. This work has also been supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).

    Un Mosaico de Conservación, Desarrollo Humano y Tensiones en el Corredor Amboró-Madidi / A Mosaic of Conservation, Human Development and Tensions in the Amboró-Madidi Corridor

    The Amboró-Madidi conservation corridor is a global priority because of its high biodiversity and endemism. At the same time, the region's public lands are sought after by landless migrants from the highlands (altiplano). In this document we contrast the need for agricultural land with conservation priorities, producing a map that identifies three types of areas: 1) areas for agricultural use and human development; 2) areas of tension between human development and conservation; and 3) conservation and protection areas that do not require conservation actions. Keywords: Conservation, Human Development, Amboró-Madidi, Bolivia

    A case-based reasoning system for recommendation of data cleaning algorithms in classification and regression tasks

    Recent advances in information technologies (social networks, mobile applications, the Internet of Things, etc.) generate a deluge of digital data, but converting these data into useful information for business decisions is a growing challenge. Exploiting this massive amount of data through the knowledge discovery (KD) process involves identifying valid, novel, potentially useful, and understandable patterns in a huge volume of data. However, preparing the data is a non-trivial refinement task that requires technical expertise in data cleaning methods and algorithms. Consequently, choosing a suitable data analysis technique is difficult for non-expert users. To address these problems, we propose a case-based reasoning (CBR) system that recommends data cleaning algorithms for classification and regression tasks. In our approach, the problem space is represented by the meta-features of the dataset, its attributes, and the target variable. The solution space contains the data cleaning algorithms used for each dataset. We represent the cases through a Data Cleaning Ontology. The case retrieval mechanism is composed of a filter phase and a similarity phase. In the first phase, we define two filter approaches, based on clustering and on quartile analysis, which retrieve a reduced number of relevant cases. The second phase scores the similarity between a new case and the cases retrieved by the filters and ranks them accordingly. The proposed retrieval mechanism was evaluated by a panel of judges, who scored the similarity between a query case and all cases in the case base (ground truth). Against the judges' ranking, the retrieval mechanism reaches an average precision of 94.5% for the top 3, 84.55% for the top 7, and 78.35% for the top 10. The authors are grateful to the research groups Control Learning Systems Optimization Group (CAOS) of the Carlos III University of Madrid and Telematics Engineering Group (GIT) of the University of Cauca for technical support. In addition, the authors are grateful to COLCIENCIAS for the PhD scholarship granted to David Camilo Corrales. This work has also been supported by the project "Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT", financed by Convocatoria 04C-2018 "Banco de Proyectos Conjuntos UEES-Sostenibilidad" of the project "Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca", ID-3848, and by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).
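
    The two-phase retrieval described in this abstract (a filter phase followed by a similarity ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the case base, the three meta-features (number of instances, number of attributes, missing-value ratio), the choice to filter on a single meta-feature by quartile, and the distance-based similarity score are all assumptions made for illustration, and the ontology-based case representation is replaced by plain dictionaries.

```python
import bisect
import math
import statistics

# Hypothetical case base: dataset meta-features paired with the data
# cleaning algorithms that were applied to that dataset.
# features = (n_instances, n_attributes, missing_ratio)
CASES = [
    {"name": "c1", "features": (100, 5, 0.00), "algorithms": ["outlier detection"]},
    {"name": "c2", "features": (150, 7, 0.10), "algorithms": ["imputation"]},
    {"name": "c3", "features": (200, 12, 0.05), "algorithms": ["imputation"]},
    {"name": "c4", "features": (250, 8, 0.20), "algorithms": ["duplicate removal"]},
    {"name": "c5", "features": (1000, 20, 0.00), "algorithms": ["feature selection"]},
    {"name": "c6", "features": (2000, 30, 0.30), "algorithms": ["imputation"]},
    {"name": "c7", "features": (5000, 40, 0.10), "algorithms": ["feature selection"]},
    {"name": "c8", "features": (9000, 50, 0.00), "algorithms": ["outlier detection"]},
]


def quartile_bin(value, values):
    """Return 0-3: the quartile of `values` into which `value` falls."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return bisect.bisect_right(cuts, value)


def filter_phase(query, cases, index=0):
    """Keep only cases whose meta-feature `index` shares the query's quartile."""
    values = [case["features"][index] for case in cases]
    target = quartile_bin(query[index], values)
    return [c for c in cases if quartile_bin(c["features"][index], values) == target]


def similarity(a, b):
    """Map Euclidean distance to a score in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + math.dist(a, b))


def retrieve(query, cases, top_k=3):
    """Filter the case base, then rank the survivors by similarity to the query."""
    candidates = filter_phase(query, cases)
    ranked = sorted(candidates, key=lambda c: similarity(query, c["features"]), reverse=True)
    return ranked[:top_k]
```

    For a query dataset described by `(220, 10, 0.10)`, the filter keeps only the cases in the same quartile of dataset size, and the ranking then orders them by closeness to the query. The features are left unscaled here for brevity, so the largest-valued meta-feature dominates the distance; a practical system would normalize them first.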

    From Theory to Practice: A Data Quality Framework for Classification Tasks

    Data preprocessing is an essential step in knowledge discovery projects. Experts state that preprocessing tasks take between 50% and 70% of the total time of the knowledge discovery process. In this sense, several authors consider data cleaning one of the most cumbersome and critical tasks. Failure to ensure high data quality in the preprocessing stage will significantly reduce the accuracy of any data analytics project. In this paper, we propose a data quality framework for classification tasks (DQF4CT). Our approach is composed of (i) a conceptual framework that guides the user on how to deal with data problems in classification tasks, and (ii) an ontology that represents data cleaning knowledge and suggests appropriate data cleaning approaches. We present two case studies on real datasets: physical activity monitoring (PAM) and occupancy detection of an office room (OD). To evaluate our proposal, the datasets cleaned by DQF4CT were used to train the same classification algorithms used by the authors of PAM and OD. Additionally, we evaluated DQF4CT on datasets from the Repository of Machine Learning Databases of the University of California, Irvine (UCI). Of the results achieved by the models trained on the datasets cleaned by DQF4CT, 84% are better than those of the models reported by the datasets' authors. This work has also been supported by: Project "Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca", Convocatoria 03-2018 Publicación de artículos en revistas de alto impacto; Project "Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT - ID 4633", financed by Convocatoria 04C-2018 "Banco de Proyectos Conjuntos UEES-Sostenibilidad" of Project "Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca"; and the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).

    Financial and economic feasibility of sugar cane production in northern La Paz

    This sugar cane investment could yield a positive net present value of US$12.1 million. However, the sum is a valid projection only if the following conditions are met: i) the agricultural production system is community based; ii) substitution of traditional crops occurs; iii) no additional sugar mills are installed after the first one; iv) land tenure rights are respected; v) the project does not attract new settlers to the region; and vi) the Bolivian government maintains its position against biofuels. If the conditions do not hold, the sugar cane project will increase deforestation and generate losses of at least US$13.6 million to the Bolivian economy.

    Características sensoriales de papas tipo bastón fritas en aceites condimentados / Sensory characteristics of french fries using seasoned oils for frying

    The sensory characteristics of stick-cut French fries fried in refined corn oil seasoned with spices such as garlic, coriander, garlic chives, and hot chili were evaluated. The dehydrated spices were added to the oil at concentrations of 0.5, 1.0, and 2.0 g per 100 g of corn oil, while the control treatment consisted of unseasoned corn oil. The sensory evaluation consisted of triangle and discriminant tests. An acceptance test based on a hedonic scale was also applied, and the sensory profile was obtained with a descriptive test. The results showed differences in perception between the French fries made with the seasoned oils and those made with the control oil. The intensity of the pungent sensation was proportional to the concentration of hot chili used in the corn oil. The most relevant attributes of the sensory profile were the seasoned taste, the pungency, and the stick color, while other attributes (residual oil in the stick, floury sensation, moisture, and stick hardness) were unaffected by the type and concentration of the spices. Key words: corn oil, spices, French fries, sensory evaluation

    Un nuevo conjunto de datos para la detección de roya en cultivos de café Colombianos basado en clasificadores / A new dataset for coffee rust detection in Colombian crops based on classifiers

    Coffee production is the main agricultural activity in Colombia, and more than 350,000 Colombian families depend on the coffee harvest. Since coffee rust disease was first reported in the country in 1983, these families have had to face severe consequences. Recently, machine learning approaches have built a dataset for monitoring coffee rust incidence that takes into account weather conditions and physical crop properties. This background encouraged us to build a dataset for coffee rust detection in Colombian crops following the Cross Industry Standard Process for Data Mining (CRISP-DM). In this paper we define a dataset intended to produce accurate models; once built, the dataset is tested using three classifiers: Support Vector Regression, Backpropagation Neural Networks, and Regression Trees.

    Net carbon emissions from deforestation in Bolivia during 1990-2000 and 2000-2010: results from a carbon bookkeeping model

    Accurate estimates of global carbon emissions are critical for understanding global warming. This paper estimates net carbon emissions from land use change in Bolivia during the periods 1990-2000 and 2000-2010 using a model that takes into account deforestation, forest degradation, forest regrowth, gradual carbon decomposition and accumulation, as well as heterogeneity in both above-ground and below-ground carbon contents at the 10 by 10 km grid level. The approach permits detailed maps of net emissions by region and type of land cover. We estimate that net CO2 emissions from land use change in Bolivia increased from about 65 million tons per year during 1990-2000 to about 93 million tons per year during 2000-2010, while CO2 emissions per capita and per unit of GDP have remained fairly stable over the sample period. If we allow for estimated biomass to increase in mature forests, net CO2 emissions drop to close to zero. Finally, we find that these results are robust to alternative methods of calculating emissions.

    Sensoriamento remoto para culturas agrícolas baseado em um quadricóptero de baixo custo / Remote sensing for agricultural crops based on a low-cost quadcopter

    This paper presents a proposal for gathering information from crops by means of a low-cost quadcopter, the AR Drone 2.0. To achieve this, we designed a remote sensing system that addresses challenges identified in this research, such as acquiring aerial photographs of an entire crop and navigating the AR Drone over non-planar terrain. The project is currently at an early stage of development. The first stage examines the platform and the hardware and software tools used to build the proposed prototype. The second stage describes performance experiments on the AR Drone's stability and altitude sensors, in order to design an altitude control strategy over non-flat crops. In addition, graph-based shortest-path planning algorithms (Dijkstra, A*, and wavefront propagation) are evaluated with a simulated quadcopter. Implementing the shortest-path algorithms is the first step toward full coverage of a crop. Observations of the quadcopter's behavior in the Gazebo simulator and in real tests demonstrate the feasibility of the project, using the AR Drone as the platform of a remote sensing system for precision agriculture.
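
    Of the path planning algorithms named in this abstract, wavefront propagation is the simplest to illustrate. The sketch below is a generic grid version and not the authors' implementation: the occupancy grid (0 for free cells, 1 for obstacles), the 4-connected neighborhood, and the function names are assumptions. A breadth-first wave expands from the goal and labels every reachable free cell with its step distance; the path is then read off by walking from the start toward strictly smaller labels.

```python
from collections import deque

# 4-connected moves: down, up, right, left.
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

# Hypothetical occupancy grid: 0 = free cell, 1 = obstacle.
GRID = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def neighbors(rows, cols, r, c):
    """Yield the in-bounds 4-connected neighbors of cell (r, c)."""
    for dr, dc in STEPS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def wavefront(grid, goal):
    """Label each reachable free cell with its step distance to the goal."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0  # the goal is assumed to be a free cell
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in neighbors(rows, cols, r, c):
            if grid[nr][nc] == 0 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def extract_path(dist, start):
    """Follow decreasing labels from start to goal; None if start is unreachable."""
    rows, cols = len(dist), len(dist[0])
    r, c = start
    if dist[r][c] is None:
        return None
    path = [start]
    while dist[r][c] > 0:
        # A labeled cell always has a neighbor whose label is one smaller.
        r, c = next((nr, nc) for nr, nc in neighbors(rows, cols, r, c)
                    if dist[nr][nc] == dist[r][c] - 1)
        path.append((r, c))
    return path
```

    Calling `extract_path(wavefront(GRID, (2, 3)), (0, 0))` returns a shortest route of five moves around the obstacle row. Because every move costs one step here, the wave yields the same path length that Dijkstra or A* would find on this grid; those algorithms become preferable when moves have unequal costs or when a heuristic can prune the search.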