132 research outputs found

    Exploring Different Levels of Class Nomenclature in Random Forest Classification of Sentinel-2 Data

    Moraes, D., Benevides, P., Costa, H., Moreira, F. D., & Caetano, M. (2022). Exploring Different Levels of Class Nomenclature in Random Forest Classification of Sentinel-2 Data. In IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium: Proceedings (pp. 2279-2282). (International Geoscience and Remote Sensing Symposium (IGARSS); Vol. 2022-July). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IGARSS46834.2022.9883798
    Funding: The work has been supported by projects foRESTER (PCIF/SSI/0102/2017) and SCAPEFIRE (PCIF/MOS/0046/2017) and by Centro de Investigação em Gestão de Informação (MagIC), all funded by the Portuguese Foundation for Science and Technology (FCT). Value-added data processed by CNES for the Theia data centre www.theia-land.fr using Copernicus products. The processing uses algorithms developed by Theia's Scientific Expertise Centres.
    The current land cover mapping paradigm relies on automatic classification of satellite images. Supervised methods are the most widely used, which gives training data a crucial role, so aspects such as training sample size and quality should be carefully considered. This paper assesses the use of a detailed class nomenclature to reinforce class diversity in the training sample. A Random Forest (RF) classification of Sentinel-2 multi-temporal data was conducted, and the effects of sample size and class distribution were also evaluated. The results indicate that a detailed nomenclature gave higher classification accuracy. With respect to sample distribution, class sizes proportional to their occurrence in a reference land cover map outperformed an equal-size approach. The effect of sample size on classification performance was limited, as previous studies with RF suggested.
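The two sample-distribution strategies compared in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: the class names, the pixel counts, and the largest-remainder rounding rule are all assumptions made here for illustration.

```python
def equal_allocation(classes, total):
    """Split `total` training points as evenly as possible across classes."""
    base, extra = divmod(total, len(classes))
    # The first `extra` classes receive one additional point.
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(classes)}


def proportional_allocation(class_area, total):
    """Allocate `total` points in proportion to each class's area in a
    reference map, using largest-remainder rounding (an assumption)."""
    area_sum = sum(class_area.values())
    quotas = {c: total * a / area_sum for c, a in class_area.items()}
    alloc = {c: int(q) for c, q in quotas.items()}
    leftover = total - sum(alloc.values())
    # Hand the leftover points to the classes with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


# Hypothetical reference-map pixel counts per class (not from the paper).
reference_area = {"forest": 50, "agriculture": 30, "urban": 15, "water": 5}

print(equal_allocation(list(reference_area), 1000))
print(proportional_allocation(reference_area, 1000))
```

With equal allocation every class gets 250 points, while the proportional scheme gives the dominant class ten times as many points as the rarest one; that difference is what the abstract's comparison is about.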

    KLASIFIKASI PENUTUP LAHAN MENGGUNAKAN DATA LIDAR DENGAN PENDEKATAN MACHINE LEARNING (LAND COVER CLASSIFICATION USING LIDAR DATA WITH MACHINE LEARNING APPROACH)

    Lidar is a remote sensing technology. Lidar data are widely used and have been developed for mapping, detailed spatial planning, and natural disaster analysis. Such data are commonly processed either with software applications or with purpose-built algorithms such as machine learning. The aim of this study is to use lidar data for land cover classification with a machine learning method, namely the Support Vector Machine (SVM). The study site is the village of Tanjung Karang, Mataram City, Lombok. The classification applied is supervised classification, which requires training data. The land cover classes predicted in this study are limited to buildings, vegetation, roads, and open land. The data used for classification are lidar derivatives: DTM, DSM, nDSM, and intensity. The classification schemes used are single band and multi-band combinations. The reference data come from a topographic map (Peta Rupabumi Indonesia). The results show that classification with the band-combination scheme is more accurate than with the single-band scheme, an improvement of about 15-20%. This indicates that the bands complement one another in identifying objects during classification.
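As a rough illustration of the lidar derivatives named above, the sketch below computes an nDSM as the per-cell difference DSM minus DTM (the standard definition) and stacks several bands into one feature vector per cell, which is the form a multi-band classifier such as an SVM would take as input. The tiny rasters and the function names are hypothetical and are not taken from the study.

```python
def ndsm(dsm, dtm):
    """Normalized DSM: height above ground = surface height - terrain height."""
    return [[s - t for s, t in zip(s_row, t_row)] for s_row, t_row in zip(dsm, dtm)]


def stack_bands(*bands):
    """Turn same-shaped rasters into one feature vector per cell (row-major)."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[band[r][c] for band in bands] for r in range(rows) for c in range(cols)]


# Hypothetical 2x2 rasters: elevations in metres, intensity in arbitrary units.
dsm = [[12, 10], [15, 10]]
dtm = [[10, 10], [10, 10]]
intensity = [[40, 90], [35, 120]]

heights = ndsm(dsm, dtm)                   # [[2, 0], [5, 0]]
features = stack_bands(heights, intensity)
print(features)                            # [[2, 40], [0, 90], [5, 35], [0, 120]]
```

A single-band scheme would pass only one of these values per cell; the multi-band scheme gives the classifier height and intensity together, which is one plausible reading of why the bands complement each other.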

    Influence of Sample Size in Land Cover Classification Accuracy Using Random Forest and Sentinel-2 Data in Portugal

    Moraes, D., Benevides, P., Costa, H., Moreira, F. D., & Caetano, M. (2021). Influence of Sample Size in Land Cover Classification Accuracy Using Random Forest and Sentinel-2 Data in Portugal. In IGARSS 2021 - 2021 IEEE International Geoscience and Remote Sensing Symposium: Proceedings (pp. 4232-4235). IEEE. https://doi.org/10.1109/IGARSS47720.2021.9553924
    Classification accuracy of remote sensing images with supervised learning depends on the quality and characteristics of training samples. Size is a key aspect of a sample, and its impact on classification depends on several factors, including the classifier employed, the dimension of the feature space, and land cover characteristics. The Random Forest classifier is considered to have low sensitivity to variations in sample size. However, further investigation is required when feature spaces are large and training is performed with spectral subclasses of the land cover classes to be mapped. This paper assesses the impact of sample size on the classification accuracy of Random Forest using multitemporal Sentinel-2 data and a detailed set of training subclasses to produce a map with general land cover classes. The results revealed similar classification accuracies after major reductions in sample size.

    Identify The Authenticity of Rupiah Currency Using K Nearest Neighbor (K-NN) Algorithm

    The rupiah is the legal medium of exchange used in transactions in the Republic of Indonesia. Rupiah banknotes are often counterfeited. Genuine rupiah paper has a distinctive texture, so when processed digitally it is easy to distinguish from counterfeits. The rupiah authenticity identification system designed here with the K-NN method aims to make authenticity checks easier and to test the accuracy of the method. The study uses the Gray Level Co-occurrence Matrix (GLCM) for feature extraction and the K-Nearest Neighbor (K-NN) algorithm for identification. The testing phase uses 18 currency images. The results show an accuracy of 100% for k = 1, 77.78% for k = 3, and 55.56% for k = 5, so the highest accuracy occurs at k = 1. The value of k supplied to the K-NN classifier therefore affects the accuracy of the classification.
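The GLCM-plus-K-NN pipeline described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the horizontal pixel offset, the two texture features (contrast and angular second moment), the Euclidean distance, and the toy training values and labels are all choices made here.

```python
import math
from collections import Counter


def glcm(image, levels, dr=0, dc=1):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast and angular second moment (ASM) of a normalized GLCM."""
    n = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    asm = sum(p[i][j] ** 2 for i in range(n) for j in range(n))
    return [contrast, asm]


def knn_predict(train_x, train_y, x, k=1):
    """Majority vote among the k training vectors closest to x."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical 1-D feature values and labels, only to show the effect of k.
train_x = [[0.0], [1.0], [1.1], [1.2]]
train_y = ["genuine", "fake", "fake", "fake"]
print(knn_predict(train_x, train_y, [0.4], k=1))  # genuine
print(knn_predict(train_x, train_y, [0.4], k=3))  # fake
```

In this toy data the single nearest neighbour is the lone "genuine" sample, but with k = 3 the two next-nearest "fake" samples outvote it. With very small data sets, a larger k can pull in neighbours from the other class in the same way.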

    Modelling Stand Variables of Beech Coppice Forest Using Spectral Sentinel-2A Data and the Machine Learning Approach

    Background and Purpose: Coppice forests have a particular socio-economic and ecological role in forestry and environmental management. Their production sustainability and spatial stability have become imperative for the forestry sector as well as for local and global communities. Recently, integrated forest inventory and remotely sensed data analysed with non-parametric statistical methods have enabled more detailed insight into forest structural characteristics. The aim of this research was to estimate forest attributes of beech coppice forest stands in the Sarajevo Canton through the integration of inventory and Sentinel-2A satellite data using machine learning methods.
    Materials and Methods: Basal area, mean stand diameter, growing stock and total volume data were determined from the forest inventory designed for represented stands of coppice forests. Spectral data were collected from bands of the Sentinel-2A satellite image, vegetation indices (difference, normalized difference and ratio vegetation index) and biophysical variables (fraction of absorbed photosynthetically active radiation, leaf area index, fraction of vegetation cover, chlorophyll content in the leaf and canopy water content). The rule-based M5 model tree (M5P) and random forest (RF) machine learning methods were used for forest attribute estimation. Predictor subset selection used a wrapper approach with the M5P and RF learning schemes. Models were developed on training data subsets (402 sample plots) and evaluated on validation data subsets (207 sample plots). Model performance was evaluated by the root mean squared error as a percentage of the mean value (rRMSE) and by the square of the correlation coefficient between the observed and estimated stand variables.
    Results and Conclusions: Predictor subset selection yielded a number of predictors that varied by forest attribute and method, with larger subsets for RF (between 8 and 11). Spectral biophysical variables dominated in the subsets. RF produced smaller training-set errors than M5P for all attributes, while both methods delivered very high errors for validation sets (rRMSE above 50%). The lowest rRMSE of 50% was obtained for stand basal area. The observed variability explained by the M5P and RF models in training subsets was about 30% and 95% respectively; those values were much lower in test subsets (below 12%) but still significant. Differences between the sample and modelled forest attribute means were not significant, while modelled variability for all forest attributes was significantly lower (p<0.01). Additional information appears to be needed to increase prediction accuracy, so stand information (management classes, site class, soil type, canopy closure and others), a new sampling strategy and new spectral products could be integrated and examined in further, more complex modelling of forest attributes.
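The two evaluation measures named in the abstract above follow directly from their definitions: rRMSE is the root mean squared error expressed as a percentage of the observed mean, and the second measure is the squared Pearson correlation between observed and estimated values. The sketch below shows both; the sample numbers are made up for illustration and are not from the study.

```python
import math


def rrmse(observed, predicted):
    """Root mean squared error as a percentage of the observed mean."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100 * rmse / (sum(observed) / n)


def r_squared(observed, predicted):
    """Square of the Pearson correlation between observed and predicted."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical basal-area values for three plots (not from the paper).
observed = [10, 20, 30]
predicted = [12, 18, 30]
print(round(rrmse(observed, predicted), 2))      # 8.16
print(round(r_squared(observed, predicted), 3))
```

Note that the squared correlation ignores systematic bias: predictions that are exactly double the observations still score 1.0. That is one reason the abstract reports it alongside rRMSE rather than alone.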

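    Avaliação do impacto das amostras de treinamento na acurácia da classificação random forest dos sistemas integrados de produção agropecuária (Assessment of the impact of training samples on the accuracy of random forest classification of integrated agricultural production systems)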

    When conducting a supervised classification with machine learning algorithms such as Random Forest, the strategy for balancing the samples is fundamental, because it directly affects the results. These classifiers are sensitive to the proportions of training samples among the different classes. Understanding how these factors influence the classification of agricultural production areas, especially minority and complex systems such as iLP (Integração Lavoura-Pecuária, integrated crop-livestock systems), is extremely important for contributing to monitoring methodologies. To evaluate the impact of balancing, three Random Forest training data sets were tested: (i) Bset01: data balanced among three priority classes in the state of Mato Grosso; (ii) Bset02: unbalanced data with proportions reflecting field reality; and (iii) Bset03: data over-representing the rare iLP class. The best F-score values for the iLP class were obtained with Bset01 (0.81) and Bset02 (0.83), with a higher commission error for Bset01, suggesting better performance of Bset02.
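The F-score and commission error mentioned above are related through precision and recall: commission error is one minus precision (user's accuracy), omission error is one minus recall (producer's accuracy), and the F-score is their harmonic mean. The following sketch shows that relationship for a single class; the counts are invented for illustration and do not reproduce the study's results.

```python
def class_metrics(tp, fp, fn):
    """Per-class accuracy measures from true/false positive and false negative counts."""
    precision = tp / (tp + fp)   # user's accuracy
    recall = tp / (tp + fn)      # producer's accuracy
    return {
        "commission_error": 1 - precision,
        "omission_error": 1 - recall,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for a rare class (not from the study).
print(class_metrics(tp=80, fp=20, fn=20))
print(class_metrics(tp=90, fp=40, fn=10))
```

Two training sets can reach similar F-scores while differing in where the errors fall, which is why the abstract looks at commission error as well as the F-score when comparing Bset01 and Bset02.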

    Menurunkan Presentase Kredit Macet Nasabah Dengan Menggunakan Algoritma K-Nearest Neighbor (Reducing the Percentage of Customers' Bad Credit Using the K-Nearest Neighbor Algorithm)

    Abstract: FIF is a financial institution that provides a range of loan alternatives for customers, one of which is credit. In extending credit to customers, financial institutions face various problems and risks. One of them is customers who fall behind on their payments, which ultimately results in bad credit. This is a serious problem, and financial service providers need to be more careful in selecting customers because extending credit is very risky, especially at PT FIF Group Cabang Arjawinangun. The data for this final project were collected through observation, interviews, document study, and customer data from PT FIF Group Cabang Arjawinangun. Data processing followed the stages of knowledge discovery in databases (KDD): data, data cleaning, data transformation, data mining, pattern evaluation, and knowledge. The attributes used are the NIK number, payment regularity, prediction, confidence of bad credit, confidence of current credit, assets, and monthly turnover of the customers. The K-NN method, applied to a dataset of 296 records, yields an accuracy of 71%.
    Keywords: Credit, K-Nearest Neighbor (KNN), Prediction

    ASSESSMENT OF IMAGE CLASSIFICATION ALGORITHMS FOR LAND COVER CLASSIFICATIONS IN TULLY, NY

    The identification, delineation, and mapping of land cover is integral to resource management and planning, as it establishes a baseline for thematic mapping and change detection analysis. The availability of high-resolution satellite imagery and the development of machine learning algorithms have significantly improved the accuracy of land cover classification. In this study, land cover classification is performed on seven-band Landsat 9 imagery and eight-band PlanetScope imagery for the village of Tully, NY, with an area of 900 square kilometers. The resolution of Landsat imagery is 30 meters, whereas the resolution of PlanetScope imagery is 3 meters. A classification schema is developed in ArcGIS Pro with five classes: conifer forest, hardwood forest, agriculture, developed, and water. Pixel-based supervised classification is performed using Support Vector Machine (SVM), Random Trees (RT), K-Nearest Neighbor (K-NN), and Maximum Likelihood Classifier (MLC). The reference dataset for map accuracy assessment is acquired by an image interpreter using high-resolution imagery. All the classification methods for Landsat imagery have more than 78% accuracy, with SVM performing best at 82%. For PlanetScope imagery, SVM performed best with 85% accuracy, whereas MLC had the lowest accuracy of 77%.