Search CORE

1,393 research outputs found

Extracting Association Rules for Online Purchases at a Turkish Retail Company: A Case Study

Author: Elif Şafak Sivri
Mustafa Cem Kasapbaşı
Publication venue: 'Duzce Universitesi Bilim ve Teknoloji Dergisi'
Publication date: 01/07/2019

Directory of Open Access Journals

The K-Means Algorithm Evolution

Author: Almanza-Ortega Nelva Nely
Martínez-Rebollar Alicia
Pazos-Rangel Rodolfo
Pérez-Ortega Joaquín
Vega-Villalobos Andrea
Zavala-Díaz Crispín
Publication venue: 'IntechOpen'
Publication date: 03/04/2019

IntechOpen

Crossref

Using Pre-Fire High Point Cloud Density LiDAR Data to Predict Fire Severity in Central Portugal

Author: Fernandes Paulo M.
Fernández-Guisuraga José Manuel
Publication venue: 'MDPI AG'
Publication date: 01/01/2023

The wall-to-wall prediction of fuel structural characteristics conducive to high fire severity is essential to provide integrated insights for implementing pre-fire management strategies designed to mitigate the most harmful ecological effects of fire in fire-prone plant communities. Here, we evaluate the potential of high point cloud density LiDAR data from the Portuguese áGiLTerFoRus project to characterize pre-fire surface and canopy fuel structure and predict wildfire severity. The study area corresponds to a pilot LiDAR flight area of around 21,000 ha in central Portugal intersected by a mixed-severity wildfire that occurred one month after the LiDAR survey. Fire severity was assessed through the differenced Normalized Burn Ratio (dNBR) index computed from pre- and post-fire Sentinel-2A Level 2A scenes. In addition to continuous data, fire severity was also categorized (low or high) using appropriate dNBR thresholds for the plant communities in the study area. We computed several metrics related to the pre-fire distribution of surface and canopy fuel strata from a point cloud with a mean density of 10.9 points m⁻². The Random Forest (RF) algorithm was used to evaluate the capacity of the set of pre-fire LiDAR metrics to predict continuous and categorized fire severity. The accuracy of the RF regression and classification models for continuous and categorized fire severity data, respectively, was remarkably high (pseudo-R² = 0.57 and overall accuracy = 81%) considering that we only focused on variables related to fuel structure and loading. The pre-fire fuel metrics with the highest contribution to the RF models were proxies for horizontal fuel continuity (fractional cover metric) and the distribution of fuel loads and canopy openness up to a 10 m height (density metrics), indicating increased fire severity with higher surface fuel load and higher horizontal and vertical fuel continuity. The results show that the technical specifications of LiDAR acquisitions framed within the áGiLTerFoRus project enable accurate fire severity predictions through point cloud data with high density.
Portuguese Foundation for Science and Technolog
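
The dNBR index mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition NBR = (NIR - SWIR) / (NIR + SWIR) and dNBR = NBR(pre-fire) - NBR(post-fire); the record does not say which Sentinel-2 bands or which dNBR thresholds the authors used, so the reflectance inputs and the `threshold` argument below are hypothetical placeholders, not values from the study.

```python
import numpy as np

def nbr(nir, swir):
    """Normalized Burn Ratio, element-wise: (NIR - SWIR) / (NIR + SWIR)."""
    nir = np.asarray(nir, dtype=float)
    swir = np.asarray(swir, dtype=float)
    denom = nir + swir
    # Pixels with a zero denominator are left as NaN instead of raising a warning.
    out = np.full(denom.shape, np.nan)
    np.divide(nir - swir, denom, out=out, where=denom != 0)
    return out

def dnbr(nir_pre, swir_pre, nir_post, swir_post):
    """Differenced NBR: pre-fire NBR minus post-fire NBR."""
    return nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)

def categorize(dnbr_values, threshold):
    """Two-class severity label. `threshold` is a hypothetical, site-specific value.
    NaN pixels compare as False and therefore fall into "low"; mask them beforehand
    if that is not acceptable."""
    return np.where(np.asarray(dnbr_values) >= threshold, "high", "low")
```

Larger dNBR values indicate a stronger drop in NBR after the fire, which is why the high-severity class sits above the threshold in this sketch.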

Directory of Open Access Journals

Leon University (Spain)

Data Clustering: Algorithms and Its Applications

Author: Isewon Itunuoluwa
Oladipupo O. O.
Oyelade O. J.
Publication date: 01/01/2019

Data is useless if information or knowledge that can be used for further reasoning cannot be inferred from it. Cluster analysis groups data into categories (clusters) that are meaningful, useful, or both, based on shared characteristics. In research, clustering and classification have been used to analyze data in machine learning, bioinformatics, statistics, and pattern recognition, to mention a few fields. Different methods of clustering include Partitioning (K-means), Hierarchical (AGNES), Density-based (DBSCAN), Grid-based (STING), Soft clustering (FANNY), Model-based (SOM) and Ensemble clustering. Challenges and problems in clustering arise from large datasets, misinterpretation of results, and the efficiency and performance of clustering algorithms, all of which must be considered when choosing a clustering algorithm. In this paper, the application of data clustering was systematically discussed in view of the characteristics of the different clustering techniques that make them better suited or biased when applied to several types of data, such as uncertain data, multimedia data, graph data, biological data, stream data, text data, time series data, categorical data and big data. The suitability of the available clustering algorithms to different application areas was presented. Also investigated were some existing cluster validity methods used to evaluate the goodness of the clusters produced by the clustering algorithms.

Covenant University Repository

2014 Annual Research Symposium Abstract Book

Author: Trinity College
Publication venue: Trinity College Digital Repository
Publication date: 01/04/2014

2014 annual volume of abstracts for science research projects conducted by students at Trinity College.

Trinity College


Parallelizing k-means with Hadoop/Mahout for big data analytics

Author: Cui Jianbin
Publication venue: Brunel University London
Publication date: 01/01/2015

This thesis was submitted for the degree of Master of Philosophy and awarded by Brunel University London. The rapid development of Internet and cloud computing technologies has led to explosive generation and processing of huge amounts of data. The ever-increasing data volumes bring great value to societies but also raise a number of challenges. Data mining techniques have been widely used in decision analysis in financial, medical, management, business and many other fields. However, how to analyse and mine valuable information from massive data has become a crucial problem, as traditional methods can hardly achieve high scalability in data processing. Recently, MapReduce has emerged as a major programming model for big data analytics. Apache Hadoop, which is an open-source implementation of MapReduce, has been widely taken up by the community. Hadoop facilitates the utilization of a large number of inexpensive commodity computers. In addition, Hadoop provides support for dealing with faults, which is especially useful for long-running jobs. Mahout is a new open-source project of Apache, providing a number of machine learning and data mining algorithms based on the Hadoop platform. As a machine learning technique, K-means has been widely used in data analytics through clustering. However, K-means incurs high computational overhead when the data to be analysed is large. This thesis parallelizes K-means using the MapReduce model and implements a parallel K-means with Mahout on the Hadoop platform. The parallel K-means reduces the computation time significantly in comparison with the standard K-means when dealing with a large data set. In addition, this thesis evaluates the impact of Hadoop parameters on the performance of the Hadoop framework.
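
How K-means fits the MapReduce model can be sketched in a few lines. The following is an in-process simulation in plain Python, not Hadoop or Mahout code and not the thesis's implementation: the function names (`map_phase`, `reduce_phase`, `kmeans`) are hypothetical, and the split shown (map assigns each point to its nearest centroid, reduce averages the points of each cluster) is the usual textbook decomposition, assumed here for illustration.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def map_phase(points, centroids):
    """Map step: emit (cluster_id, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)

def reduce_phase(pairs, centroids):
    """Reduce step: sum the points of each cluster and divide by the count."""
    sums = {}
    counts = defaultdict(int)
    for cid, (point, n) in pairs:
        if cid in sums:
            sums[cid] = [s + p for s, p in zip(sums[cid], point)]
        else:
            sums[cid] = list(point)
        counts[cid] += n
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for cid, total in sums.items():
        new_centroids[cid] = tuple(s / counts[cid] for s in total)
    return new_centroids

def kmeans(points, centroids, iterations=10):
    """One map/reduce round per iteration; a fixed iteration count is a simplification."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids
```

Because each map call depends only on one point and the current centroids, the map step can be spread across many machines, while the reduce step only needs per-cluster sums and counts.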

Brunel University Research Archive

Integrating Remote Sensing and Machine Learning to Assess Forest Health and Susceptibility to Pest-induced Damage

Author: Bhattarai Rajeev
Publication venue: DigitalCommons@UMaine
Publication date: 15/12/2023

Spruce budworm (Choristoneura fumiferana; SBW) outbreaks are cyclically occurring phenomena in the northeastern USA and neighboring Canadian provinces. These outbreaks often occur at the landscape level, causing impaired growth and mortality of the host species, namely spruce (Picea sp.) and balsam fir (Abies balsamea (L.) Mill.). Acknowledging the recent SBW outbreak in Canadian provinces like Quebec and New Brunswick neighboring the state of Maine, our study devised comprehensive techniques to assess the susceptibility of Maine forests to SBW attack. This study aims to harness the power of remote sensing data and machine learning algorithms to model and map the susceptibility of forest in terms of host species availability and abundance (basal area per hectare; BAPH, and leaf area index; LAI), their maturity and the prevalent defense mechanisms. In terms of host species abundance mapping, our study explores the integration of satellite remote sensing data to model BAPH and LAI of two economically vital SBW host species, red spruce (Picea rubens Sarg.) and balsam fir, in Maine, USA. Combining Sentinel-1 synthetic aperture radar (SAR), Sentinel-2 multispectral, and site variables, we used Random Forest (RF) and Multi-Layer Perceptron (MLP) algorithms for modeling LAI and BAPH. The results demonstrated the superiority of RF over MLP, achieving smaller normalized root mean square error (nRMSE) by 0.01 and 0.06 for LAI and BAPH, respectively. Notably, Sentinel-2 variables, especially the red-edge spectral vegetation indices, played a significant role in both LAI and BAPH estimation, with the minor inclusion of site variables, particularly elevation. In addition, using various satellite remote sensing data such as Sentinel-1 C-band SAR, PALSAR L-band SAR and Sentinel-2 multispectral, along with site variables, the study developed large-scale SBW stand impact types and susceptibility maps for the entire state of Maine.
The susceptibility of the forest was assessed based on the availability of SBW host species and their maturity. Of the machine-learning algorithms tested (RF and MLP), the best model, which used site (elevation and aspect) and Sentinel-2 data, achieved an overall accuracy of 83.4% in predicting SBW host species. Furthermore, combining the host species data with age data from Land Change Monitoring, Assessment, and Projection (LCMAP) products, we could produce the SBW susceptibility map based on stand impact types with an overall accuracy of 88.3%. Moreover, the work builds upon the assessment of susceptibility of SBW host species, taking into account the concentration of several canopy traits using remote sensing and site data. The study focused on various foliar traits affecting insect herbivory, including nutritive traits such as nitrogen (N), phosphorus (P), potassium (K), and copper (Cu), non-nutritive traits such as iron (Fe) and calcium (Ca), and defensive parameters such as equivalent water thickness (EWT) and leaf mass per area (LMA). Using Sentinel-2 and site data, we developed trait estimation models using machine-learning algorithms like Random Forest (RF), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM). The accuracy of the developed models was evaluated based on the normalized root mean square error (nRMSE). Based on the model performances, we selected the XGB algorithm to estimate Ca, EWT, Fe, and K, whereas Cu, LMA, N, and P were estimated using the RF algorithm. Regarding the variables used, almost all the best performing models included Sentinel-2 red-edge indices and depth to water table (DWT) as the most important variables. Ultimately, the study proposed a novel framework connecting the concentrations of foliar traits in SBW host foliage to tree susceptibility to the pest, enabling the assessment of host susceptibility on a landscape level.
To sum up, this study highlights the advantages and effectiveness of integrating satellite remote sensing data for enhanced pest management, providing valuable insights into tree attributes and susceptibility to spruce budworm outbreaks in the Northeast USA. The findings offer essential tools for forest stakeholders to improve management strategies and mitigate potential forthcoming SBW outbreaks in the region.

University of Maine

Fast Approximate Algorithms for k-NN Search and k-NN Graph Construction

Author: Youngki Park
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/02/2015

Thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. 이상구.
Finding k-nearest neighbors (k-NN) is an essential part of recommender systems, information retrieval, and many data mining and machine learning algorithms. However, there are two main problems in finding k-nearest neighbors: 1) Existing approaches require a huge amount of time when the number of objects or dimensions scales up. 2) The k-NN computation methods do not show consistent performance over different search tasks and types of data. In this dissertation, we present fast and versatile algorithms for finding k-nearest neighbors in order to cope with these problems. The main contributions are summarized as follows: first, we present an efficient and scalable algorithm for finding an approximate k-NN graph by filtering node pairs whose large value dimensions do not match at all. Second, a fast collaborative filtering algorithm that utilizes the k-NN graph is presented. The main idea of this approach is to reverse the process of finding k-nearest neighbors in item-based collaborative filtering. Last, we propose a fast approximate algorithm for k-NN search by selecting query-specific signatures from a signature pool to pick high-quality k-NN candidates. The experimental results show that the proposed algorithms guarantee a high level of accuracy while also being much faster than the other algorithms over different types of search tasks and datasets.
Contents:
Front matter: Abstract; Contents; List of Figures; List of Tables
Chapter 1 Introduction: 1.1 Motivation and Challenges (1.1.1 Fast Approximation; 1.1.2 Versatility); 1.2 Our Solutions (1.2.1 Greedy Filtering; 1.2.2 Signature Selection LSH; 1.2.3 Reversed CF); 1.3 Contributions; 1.4 Outline
Chapter 2 Background and Related Work: 2.1 k-NN Search (2.1.1 Locality Sensitive Hashing; 2.1.2 LSH-based k-NN Search); 2.2 k-NN Graph Construction (2.2.1 LSH-based Approach; 2.2.2 Clustering-based Approach; 2.2.3 Heuristic-based Approach; 2.2.4 Similarity Join); 2.3 Summary
Chapter 3 Fast Approximate k-NN Graph Construction: 3.1 Introduction; 3.2 Problem Formulation; 3.3 Constructing a k-Nearest Neighbor Graph (3.3.1 Greedy Filtering; 3.3.2 Prefix Selection Scheme; 3.3.3 Optimization); 3.4 Theoretical Analysis (3.4.1 Preliminaries; 3.4.2 Graph Construction Time; 3.4.3 Graph Accuracy); 3.5 Experiments (3.5.1 Experimental Setup; 3.5.2 Performance Comparison); 3.6 Summary
Chapter 4 Fast Collaborative Filtering: 4.1 Introduction; 4.2 Related Work; 4.3 Fast Collaborative Filtering (4.3.1 Nearest Neighbor Graph Construction; 4.3.2 Fast Recommendation Algorithm); 4.4 Experiments (4.4.1 Experimental Setup; 4.4.2 Overall Comparison; 4.4.3 Effects of Parameter Changes); 4.5 Summary
Chapter 5 Fast Approximate k-NN Search: 5.1 Introduction; 5.2 Signature Selection LSH (5.2.1 Data-dependent LSH; 5.2.2 Signature Pool Generation; 5.2.3 Signature Selection; 5.2.4 Optimization Techniques); 5.3 S2LSH for Graph Construction (5.3.1 Feature Selection; 5.3.2 Signature Selection; 5.3.3 Optimization Techniques); 5.4 Theoretical Analysis; 5.5 Experiments (5.5.1 Experimental Setup; 5.5.2 Experimental Results; 5.5.3 Performance Analysis); 5.6 Summary
Chapter 6 Conclusion
Bibliography
Abstract in Korean (초록)
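
The filtering idea described in the abstract (skip node pairs whose large-value dimensions do not match at all) can be illustrated with a simplified sketch. This is not the thesis's algorithm: it assumes sparse vectors stored as dicts, uses a fixed, hypothetical `prefix_size` in place of the thesis's prefix selection scheme, and ranks the surviving candidate pairs by cosine similarity, which is also an assumption.

```python
import heapq
import math
from collections import defaultdict
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {dimension: weight} dicts."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def approx_knn_graph(vectors, k, prefix_size):
    """Approximate k-NN graph: compare only pairs that share a top-weighted dimension."""
    # Inverted index over each vector's `prefix_size` largest-valued dimensions.
    index = defaultdict(list)
    for node, vec in vectors.items():
        for dim in heapq.nlargest(prefix_size, vec, key=vec.get):
            index[dim].append(node)
    # Candidate pairs are nodes that appear together in at least one index entry.
    candidates = set()
    for nodes in index.values():
        candidates.update(combinations(sorted(nodes), 2))
    # Score only the candidates, then keep the k most similar neighbors per node.
    sims = defaultdict(list)
    for a, b in candidates:
        s = cosine(vectors[a], vectors[b])
        sims[a].append((s, b))
        sims[b].append((s, a))
    return {node: [n for _, n in heapq.nlargest(k, sims[node])] for node in vectors}
```

The result is approximate by construction: two vectors that overlap only in low-weight dimensions are never compared, which trades some recall for far fewer similarity computations than the all-pairs approach.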

SNU Open Repository and Archive

Enhancing Resiliency in Baltimore's Urban Forest

Author: Gutierrez Dania
Huber Kristiane
Poon Wing Sze
Robinson Becca
Publication date: 01/04/2015

The City of Baltimore, as part of their climate adaptation strategy, has pledged to double their tree canopy by 2017 in the hopes of mitigating a variety of climatic hazards that are projected to worsen in the future. These hazards include the length and magnitude of heat and precipitation events, sea-level rise, and increased prevalence of extreme weather events such as tornados and coastal storms. In keeping with this goal of forest expansion, one of the strategies (strategy NS-2) put forth in Baltimore’s Disaster Preparedness and Planning Project (DP3) is to “increase and enhance the resilience and health of Baltimore’s Urban Forest.” To help the City of Baltimore meet their goal of successfully and sustainably expanding their urban forest, we have completed a five-stage approach centered around the creation of an interactive spatial decision support tool: (1) identification of urban forestry best practices and analysis of precedence to inform successful tree selection and planting; (2) a review of existing urban forestry practices and policies in other cities to identify cities leading the way on planning and growing a resilient urban forest and to synthesize lessons, strategies implemented, and challenges in these locations; (3) the integration of the USDA vegetation database outlining the preferred growing conditions and a variety of other attributes for the majority of eastern hardwood species, with a spatial database that includes site-specific environmental, situational, and risk factors; (4) the creation of a user-friendly interactive tool that ranks trees from the vegetation database based on site-specific characteristics; and (5) beta-testing of the tool with a variety of Baltimore stakeholders to generate buy-in, ensure its usability and longevity as a solution, and provide recommendations for future iterations of this model.
Throughout the tool development process, we aimed to create an interface that is replicable across other cities, given the amount of need we have identified for a tool of this caliber, specificity, and integrated considerations. Based on beta-testing results with 17 stakeholders, carried out in Baltimore in late March 2015, our tool was well-received and supported, and we anticipate that Baltimore officials will work to publicly implement the tool in the coming months.
Master of Science, Natural Resources and Environment, University of Michigan
http://deepblue.lib.umich.edu/bitstream/2027.42/111049/1/Enhancing Resiliency in Baltimore's Urban Forest Final Report#268_2015.pd

Deep Blue Documents at the University of Michigan

A comparison of statistical machine learning methods in heartbeat detection and classification

Author: A.L. Goldberger
G.J. McLachlan
H. Feichtinger
J.A. Freeman
P. Chazal de
R.A. Johnson
R.O. Duda
T. Ince
Y.H. Hu
Publication venue: Springer Berlin Heidelberg
Publication date: 01/01/2012

In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of the classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.

Crossref

Research Archive of Indian Institute of Technology Hyderabad