
167 research outputs found

Global-Scale Resource Survey and Performance Monitoring of Public OGC Web Map Services

Authors: Cao Jun, Cheng Xiaoqiang, Gui Zhipeng, Liu Xiaojing, Wu Huayi
Publication venue: MDPI AG
Publication date: 01/06/2016

One of the most widely implemented service standards provided by the Open Geospatial Consortium (OGC) to the user community is the Web Map Service (WMS). WMS is widely employed globally, but there is limited knowledge of the global distribution, adoption status, or service quality of these online WMS resources. To fill this void, we investigated global WMS resources and performed distributed performance monitoring of these services. This paper describes a distributed monitoring framework that was used to monitor 46,296 WMSs continuously for over one year, and a crawling method used to discover these WMSs. We analyzed server locations, provider types, themes, the spatiotemporal coverage of map layers, and the service versions for 41,703 valid WMSs. Furthermore, we appraised the stability and performance of basic operations (i.e., GetCapabilities and GetMap) for 1210 selected WMSs. We discuss the major reasons for request errors and performance issues, as well as the relationship between service response times and the spatiotemporal distribution of client monitoring sites. This paper will help service providers, end users, and developers of standards to grasp the status of global WMS resources and to understand the adoption status of OGC standards. The conclusions drawn in this paper can benefit geospatial resource discovery and service performance evaluation, and can guide service performance improvements.
Comment: 24 pages; 15 figures
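The two operations named in the abstract, GetCapabilities and GetMap, are ordinary HTTP GET requests with standard query parameters. The following is a minimal sketch of how a monitoring probe might build and time such requests; it is not the authors' framework. The function names, the `fetch` callback, and the `example.org` endpoint are hypothetical, and the sketch assumes the base URL carries no query string of its own.

```python
import time
from urllib.parse import urlencode


def get_capabilities_url(base_url, version="1.3.0"):
    """Build a WMS GetCapabilities request URL."""
    params = {"SERVICE": "WMS", "REQUEST": "GetCapabilities", "VERSION": version}
    return base_url + "?" + urlencode(params)


def get_map_url(base_url, layer, bbox, width=256, height=256,
                version="1.3.0", crs="EPSG:4326", image_format="image/png"):
    """Build a WMS GetMap request URL.

    WMS 1.3.0 names the reference-system parameter CRS; earlier versions
    use SRS. The bbox tuple is passed through unchanged, so the caller is
    responsible for the axis order required by the version and CRS.
    """
    crs_key = "CRS" if version == "1.3.0" else "SRS"
    params = {
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        crs_key: crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


def timed_request(url, fetch):
    """Call fetch(url) and return (succeeded, elapsed_seconds).

    `fetch` is any callable that raises on failure, for example a thin
    wrapper around urllib.request.urlopen with a timeout.
    """
    start = time.perf_counter()
    try:
        fetch(url)
        succeeded = True
    except Exception:
        succeeded = False
    return succeeded, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical endpoint; a stand-in fetcher avoids any network access.
    url = get_capabilities_url("https://example.org/wms")
    print(url, timed_request(url, lambda u: b"<WMS_Capabilities/>"))
```

Passing the fetcher in as a callback keeps the timing logic testable without network access; a real probe would also record the HTTP status and the monitoring site's location.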

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

A Robust and Efficient Boundary Point Detection Method by Measuring Local Direction Dispersion

Authors: Gui Zhipeng, Peng Dehua, Wu Huayi
Publication date: 07/12/2023

Boundary points pose a significant challenge for machine learning tasks, including classification, clustering, and dimensionality reduction. Due to the similarity of features, boundary areas can result in mixed-up classes or clusters, leading to a crowding problem in dimensionality reduction. To address this challenge, numerous boundary point detection methods have been developed, but they are insufficient to accurately and efficiently identify the boundary points in non-convex structures and high-dimensional manifolds. In this work, we propose a robust and efficient method for detecting boundary points using Local Direction Dispersion (LoDD). LoDD considers that internal points are surrounded by neighboring points in all directions, while neighboring points of a boundary point tend to be distributed only in a certain directional range. LoDD adopts a density-independent K-Nearest Neighbors (KNN) method to determine neighboring points, and defines a statistic-based metric using the eigenvalues of the covariance matrix of KNN coordinates to measure the centrality of a query point. We demonstrated the validity of LoDD on five synthetic datasets (2-D and 3-D) and ten real-world benchmarks, and tested its clustering performance by pairing it with two typical clustering methods, K-means and Ncut. Our results show that LoDD achieves promising and robust detection accuracy in a time-efficient manner.
Comment: 11 pages, 6 figures, 3 tables
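The intuition in the abstract, that an interior point has neighbors in all directions while a boundary point does not, can be illustrated with a small 2-D sketch. The abstract does not give the exact LoDD statistic, so the score used here (the ratio of the smallest to the largest eigenvalue of the neighbor covariance matrix) is an assumed stand-in, and plain brute-force KNN replaces the paper's density-independent variant. All function names are hypothetical.

```python
import math


def k_nearest(points, query_index, k):
    """Return the k points closest to points[query_index] (brute force)."""
    query = points[query_index]
    others = [p for i, p in enumerate(points) if i != query_index]
    return sorted(others, key=lambda p: math.dist(p, query))[:k]


def covariance_eigenvalues_2d(neighbors):
    """Eigenvalues (small, large) of the 2x2 population covariance matrix."""
    n = len(neighbors)
    mean_x = sum(p[0] for p in neighbors) / n
    mean_y = sum(p[1] for p in neighbors) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in neighbors) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in neighbors) / n
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in neighbors) / n
    # Closed form for a symmetric 2x2 matrix [[var_x, cov_xy], [cov_xy, var_y]].
    half_trace = (var_x + var_y) / 2
    radius = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    return half_trace - radius, half_trace + radius


def dispersion_score(points, query_index, k=8):
    """Assumed centrality proxy: near 1 for interior points, lower at edges."""
    small, large = covariance_eigenvalues_2d(k_nearest(points, query_index, k))
    return small / large if large > 0 else 0.0


if __name__ == "__main__":
    # An 11 x 11 grid: compare the center, an edge midpoint, and a corner.
    grid = [(float(x), float(y)) for x in range(11) for y in range(11)]
    for label, point in (("center", (5.0, 5.0)), ("edge", (5.0, 0.0)), ("corner", (0.0, 0.0))):
        print(label, round(dispersion_score(grid, grid.index(point)), 3))
```

On a regular grid the center point's neighbors are symmetric, so both eigenvalues coincide, whereas neighbors of edge and corner points are squeezed into a narrower range of directions and the ratio drops.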

arXiv.org e-Print Archive

MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion

Authors: Gui Zhipeng, Peng Dehua, Wu Huayi
Publication date: 07/12/2023

As the most typical graph clustering method, spectral clustering is popular and attractive due to its remarkable performance, easy implementation, and strong adaptability. Classical spectral clustering measures the edge weights of the graph using a pairwise Euclidean-based metric, and solves the optimal graph partition by relaxing the constraints of the indicator matrix and performing Laplacian decomposition. However, Euclidean-based similarity might cause skewed graph cuts when handling non-spherical data distributions, and the relaxation strategy introduces information loss. Meanwhile, spectral clustering requires specifying the number of clusters, which is hard to determine without enough prior knowledge. In this work, we leverage the path-based similarity to enhance intra-cluster associations, and propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition. This algorithm enables the identification of arbitrarily shaped clusters and is robust to noise. To reduce the computational complexity of similarity calculation, we transform optimal path search into generating the maximum spanning tree (MST), and develop a fast MST (FastMST) algorithm to further improve its time-efficiency. Moreover, we define a density gradient factor (DGF) for separating the weakly connected clusters. The validity of our algorithm is demonstrated by testing on real-world benchmarks and a face recognition application. The source code of MeanCut is available at https://github.com/ZPGuiGroupWhu/MeanCut-Clustering.
Comment: 17 pages, 8 figures, 6 tables
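The step of turning optimal path search into a maximum spanning tree rests on a standard property: the best bottleneck (largest minimum edge weight) over all paths between two nodes equals the smallest edge weight on their path in the maximum spanning tree. The sketch below illustrates that property with plain Kruskal's algorithm; it is not the paper's FastMST, and the function names and toy graph are hypothetical.

```python
def maximum_spanning_tree(n_nodes, edges):
    """Kruskal's algorithm on edges sorted by descending weight.

    edges is a list of (u, v, similarity) tuples; returns an adjacency
    dict mapping each node to a list of (neighbor, similarity) pairs.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = {i: [] for i in range(n_nodes)}
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # keep the edge only if it joins two components
            parent[root_u] = root_v
            tree[u].append((v, w))
            tree[v].append((u, w))
    return tree


def path_similarity(tree, source, target):
    """Smallest edge weight on the tree path from source to target.

    Returns 0.0 when the nodes are disconnected and infinity when
    source == target (an empty path has no limiting edge).
    """
    stack = [(source, None, float("inf"))]
    while stack:
        node, previous, bottleneck = stack.pop()
        if node == target:
            return bottleneck
        for neighbor, weight in tree[node]:
            if neighbor != previous:  # a tree has no cycles, so this suffices
                stack.append((neighbor, node, min(bottleneck, weight)))
    return 0.0


if __name__ == "__main__":
    # Nodes 0 and 2 are weakly similar directly (0.2) but strongly
    # connected through node 1 (0.9 then 0.8).
    toy_edges = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.2), (2, 3, 0.1)]
    mst = maximum_spanning_tree(4, toy_edges)
    print(path_similarity(mst, 0, 2), path_similarity(mst, 0, 3))
```

In the toy graph, the path-based similarity between nodes 0 and 2 is governed by the chain through node 1 rather than by their weak direct edge, which is how such a measure can keep elongated clusters together.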

arXiv.org e-Print Archive

Interpreting the Curse of Dimensionality from Distance Concentration and Manifold Effect

Authors: Gui Zhipeng, Peng Dehua, Wu Huayi
Publication date: 07/01/2024

The characteristics of data, such as distribution and heterogeneity, become more complex and counterintuitive as the dimensionality increases. This phenomenon is known as the curse of dimensionality, where common patterns and relationships (e.g., internal and boundary patterns) that hold in low-dimensional space may be invalid in higher-dimensional space. It degrades the performance of regression, classification, and clustering models and algorithms. The curse of dimensionality can be attributed to many causes. In this paper, we first summarize five challenges associated with manipulating high-dimensional data, and explain the potential causes for the failure of regression, classification, or clustering tasks. Subsequently, we delve into two major causes of the curse of dimensionality, distance concentration and manifold effect, by performing theoretical and empirical analyses. The results demonstrate that nearest neighbor search (NNS) using three typical distance measurements, Minkowski distance, Chebyshev distance, and cosine distance, becomes meaningless as the dimensionality increases. Meanwhile, the data incorporates more redundant features, and the variance contribution of principal component analysis (PCA) is skewed towards a few dimensions. By interpreting the causes of the curse of dimensionality, we can better understand the limitations of current models and algorithms, and work to improve the performance of data analysis and machine learning tasks in high-dimensional space.
Comment: 17 pages, 11 figures
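Distance concentration can be observed with a few lines of simulation. The sketch below is an illustration under assumed settings (uniform random points in the unit hypercube, Euclidean distance, 200 samples, a fixed seed), not a reproduction of the paper's experiments. It reports the relative contrast, (farthest - nearest) / nearest, from a random query point; the function name is hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim, and
    distances are Euclidean (Minkowski distance with p = 2).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the nearest and farthest neighbors end up at nearly the same distance, so the contrast shrinks toward zero and nearest neighbor ranking carries less information.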

arXiv.org e-Print Archive

Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding

Authors: Gui Zhipeng, Peng Dehua, Wei Wenzhang, Wu Huayi
Publication date: 05/01/2024

As a pivotal approach in machine learning and data science, manifold learning aims to uncover the intrinsic low-dimensional structure within complex nonlinear manifolds in high-dimensional space. By exploiting the manifold hypothesis, various techniques for nonlinear dimension reduction have been developed to facilitate visualization, classification, clustering, and the extraction of key insights. Although existing manifold learning methods have achieved remarkable successes, they still suffer from extensive distortions of the global structure, which hinder the understanding of underlying patterns. Scalability issues also limit their applicability to large-scale data. Here, we propose a scalable manifold learning (scML) method that can handle large-scale and high-dimensional data in an efficient manner. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data, and then incorporates the non-landmarks into the learned space based on the constrained locally linear embedding (CLLE). We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types, and applied it to analyze single-cell transcriptomics and detect anomalies in electrocardiogram (ECG) signals. scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure. The experiments demonstrate notable robustness in embedding quality as the sample rate decreases.
Comment: 33 pages, 10 figures

arXiv.org e-Print Archive

Identifying Inequality of School Facilities and Resource Distribution Problems in Abbottabad City, Pakistan through Geographical Visualization

Authors: Tanveer Hafsa, Tanveer Hashir, us-Shan Rafi, Wu Huayi
Publication venue: Journal of Resources Development and Management
Publication date: 10/01/2018

Providing quality education in isolated areas is one of the major issues in recent times, mainly in underdeveloped countries, especially in South Asia. As the population is increasing rapidly in Pakistan, the resources cannot meet the requirements of quality education. One such example is the Abbottabad District of Pakistan, with 1900 schools in 51 union councils; the government educational authorities have not established any system for their proper management and monitoring. The main reasons behind the lack of resource management are the absence of effective visualization systems and the distance of schools from the main city. Mapping schools geographically to visualize them for analysis and resource management is an efficient and effective way to make better decisions. The purpose of this study is to geographically identify the inequality in the distribution of school facilities and resources, which can help educational authorities to diagnose problems when making decisions and managing schools. Use of static conventional paper methods has consistently led to poor results, because it is difficult to manage large numbers of schools. In this research, the complete educational data was compiled from both government and local resources and then properly arranged in a database. That database was then connected to a web server to present it publicly in a web-based application. The resulting map represents the spatial distribution of schools, depicting the improper distribution of almost 25% of the schools in different areas of Abbottabad. This research provides evidence that using a GIS-aided decision support and recommendation system will facilitate better resource planning for education in developing nations.
Keywords: Geographical Visualization, Resource Management, Geographical Distribution, Abbottabad

International Institute for Science, Technology and Education (IISTE): E-Journals