A Probabilistic Embedding Clustering Method for Urban Structure Detection
Urban structure detection is a basic task in urban geography. Clustering is a
core technique for detecting patterns such as urban spatial structure and urban
functional regions. In the big data era, diverse urban sensing datasets that
record information such as human behaviour and social activity suffer from
high dimensionality and high noise. Unfortunately, state-of-the-art clustering
methods do not handle high dimensionality and high noise concurrently. In this
paper, a probabilistic embedding clustering method is proposed. First, we
propose a Probabilistic Embedding Model (PEM) that learns latent features from
high-dimensional urban sensing data through a probabilistic model. The latent
features capture the essential patterns hidden in high-dimensional data, while
the probabilistic model reduces the uncertainty caused by high noise. Second,
by tuning its parameters, the model can discover two kinds of urban structure,
homophily and structural equivalence, i.e., communities with intensive
interaction or communities that play the same roles in the urban structure. We
evaluated the model through experiments on real-world data from Shanghai
(China), which showed that the method can discover both kinds of urban
structure.
Comment: 6 pages, 7 figures, ICSDM201
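The abstract does not spell out which parameters switch the model between homophily and structural equivalence. The same two notions are controlled in node2vec by a return parameter `p` and an in-out parameter `q`; in the node2vec paper, `q < 1` pushes walks outward and tends to surface homophily, while `q > 1` keeps walks local and tends to surface structural equivalence. The sketch below shows a node2vec-style biased random walk to make that mechanism concrete. It is not the PEM described above; the graph, node names and parameter values are illustrative assumptions.

```python
import random
from typing import Dict, List, Set

# Undirected graph as an adjacency map: node -> set of neighbouring nodes.
# Assumes every node has at least one neighbour.
Graph = Dict[str, Set[str]]


def transition_probs(graph: Graph, prev: str, curr: str,
                     p: float, q: float) -> Dict[str, float]:
    """node2vec-style second-order transition probabilities out of `curr`,
    given that the walk arrived from `prev` (unweighted edges)."""
    weights = {}
    for nxt in graph[curr]:
        if nxt == prev:
            weights[nxt] = 1.0 / p      # step back to where the walk came from
        elif nxt in graph[prev]:
            weights[nxt] = 1.0          # stay at distance 1 from `prev`
        else:
            weights[nxt] = 1.0 / q      # move outward, away from `prev`
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def biased_walk(graph: Graph, start: str, length: int,
                p: float, q: float, rng: random.Random) -> List[str]:
    """Generate one biased random walk of `length` nodes starting at `start`."""
    walk = [start]
    if length > 1:
        # The first step has no predecessor, so choose a neighbour uniformly.
        walk.append(rng.choice(sorted(graph[start])))
    while len(walk) < length:
        probs = transition_probs(graph, walk[-2], walk[-1], p, q)
        nodes = sorted(probs)  # fixed ordering keeps walks reproducible under a seed
        walk.append(rng.choices(nodes, weights=[probs[n] for n in nodes])[0])
    return walk
```

For example, with `graph = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}`, a walk that moved from A to B with `p=1.0, q=0.5` returns to A with probability 0.25, moves to C (also adjacent to A) with 0.25, and moves outward to D with 0.5. In node2vec, walks like these are then fed to a skip-gram model to learn node embeddings, which can be clustered.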
Revealing intra-urban spatial structure through an exploratory analysis by combining road network abstraction model and taxi trajectory data
The unprecedented urbanization in China has dramatically changed the urban
spatial structure of cities. With the proliferation of individual-level
geospatial big data, previous studies have widely used the network abstraction
model to reveal the underlying urban spatial structure. However, the
construction of network abstraction models primarily focuses on the topology of
the road network without considering the individual travel flows along it.
Individual travel flows reflect urban dynamics, which can further help us
understand the underlying spatial structure. This study therefore
aims to reveal the intra-urban spatial structure by integrating the road
network abstraction model and individual travel flows. To achieve this goal, we
1) quantify the spatial interaction relatedness of road segments based on the
Word2Vec model using large volumes of taxi trip data, then 2) characterize the
road abstraction network model according to the identified spatial interaction
relatedness, and 3) implement a community detection algorithm to reveal
sub-regions of a city. Our results reveal three levels of hierarchical spatial
structures in the Wuhan metropolitan area. This study provides a data-driven
approach to the investigation of urban spatial structure via identifying
traffic interaction patterns on the road network, offering insights for urban
planning practice and transportation management.
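Step 1 treats each taxi trip as a "sentence" whose "words" are road-segment IDs, so that segments traversed close together become context for one another. The sketch below shows how such a corpus yields skip-gram training pairs, and how learned segment embeddings could be turned into weighted relatedness edges for a community detection step. The window size, similarity threshold, segment IDs and toy embedding vectors are assumptions for illustration; training the embeddings themselves would require a Word2Vec implementation (for example gensim's), which is not included here, and the abstract does not name the community detection algorithm used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def skipgram_pairs(trips: Sequence[Sequence[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Turn each trip (an ordered list of road-segment IDs) into
    (center, context) pairs, the training signal of a skip-gram model."""
    pairs = []
    for trip in trips:
        for i, center in enumerate(trip):
            lo, hi = max(0, i - window), min(len(trip), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, trip[j]))
    return pairs


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness_edges(emb: Dict[str, Sequence[float]],
                      threshold: float) -> Dict[Tuple[str, str], float]:
    """Weighted edges between segments whose embedding similarity reaches
    `threshold` (hypothetical cut-off); input to community detection."""
    ids = sorted(emb)
    edges = {}
    for k, a in enumerate(ids):
        for b in ids[k + 1:]:
            s = cosine(emb[a], emb[b])
            if s >= threshold:
                edges[(a, b)] = s
    return edges
```

A fixed threshold is the simplest way to sparsify an all-pairs similarity matrix; a k-nearest-neighbour graph is a common alternative when similarity scales vary across the city.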
Conflating point of interest (POI) data: A systematic review of matching methods
Point of interest (POI) data provide digital representations of places in the
real world, and have been increasingly used to understand human-place
interactions, support urban management, and build smart cities. Many POI
datasets have been developed, which often have different geographic coverages,
attribute focuses, and data quality. From time to time, researchers may need to
conflate two or more POI datasets in order to build a better representation of
the places in the study areas. While various POI conflation methods have been
developed, a systematic review has been lacking; consequently, it is difficult
for researchers new to POI conflation to quickly grasp and use the existing
methods. This paper fills this gap. Following the protocol of Preferred
Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), we conduct a
systematic review by searching through three bibliographic databases using
reproducible syntax to identify related studies. We then focus on a main step
of POI conflation, i.e., POI matching, and systematically summarize and
categorize the identified methods. Current limitations and future opportunities
are discussed afterwards. We hope that this review can provide some guidance
for researchers interested in conflating POI datasets for their research.
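To make the POI matching step concrete, the sketch below scores candidate pairs by combining spatial proximity with name similarity, an idea many matching approaches share. It is a generic illustration and not any specific method from the review: the 0.6 name weight, the 100 m distance cut-off and the 0.7 acceptance threshold are arbitrary assumptions, and Python's `difflib.SequenceMatcher` stands in for whatever string-similarity measure a given method uses.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class POI:
    name: str
    lat: float
    lon: float


def haversine_m(a: POI, b: POI) -> float:
    """Great-circle distance in metres, assuming a spherical Earth (R = 6,371 km)."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def match_score(a: POI, b: POI, max_dist_m: float = 100.0, w_name: float = 0.6) -> float:
    """Weighted sum of name similarity and spatial proximity, in [0, 1]."""
    name_sim = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    dist_sim = max(0.0, 1.0 - haversine_m(a, b) / max_dist_m)  # 0 beyond max_dist_m
    return w_name * name_sim + (1 - w_name) * dist_sim


def best_match(poi: POI, candidates: List[POI],
               threshold: float = 0.7) -> Optional[Tuple[POI, float]]:
    """Highest-scoring candidate from the other dataset, or None if below threshold."""
    scored = [(c, match_score(poi, c)) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda cs: cs[1])
    return best if best[1] >= threshold else None
```

Because the distance term drops to zero beyond the cut-off, a far-away candidate can score at most the name weight (0.6 here), so with a 0.7 threshold a name match alone never suffices; tuning these values is exactly where published methods differ.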
Crowdsourced Data Mining for Urban Activity: A Review of Data Sources, Applications and Methods
The penetration of devices integrated with location-based and internet services has generated massive amounts of data about the everyday life of citizens and has tracked their activities in cities. Crowdsourced data, such as social media data, POI data and data from collaborative websites, have become a fine-grained proxy for urban activity and are widely used in urban studies research. However, due to the heterogeneity of crowdsourced data types and the limitation that previous studies mainly focus on specific applications, a systematic review of crowdsourced data mining for urban activity is still lacking. To fill this gap, this paper conducts a literature search in the Web of Science database, selecting 226 highly related papers published between 2013 and 2019. Based on these papers, the review first conducts a bibliometric analysis identifying the underpinning domains, pivotal scholars and papers on this topic. The review then synthesises previous research into three parts: main applications of different data sources and data fusion; application of spatial analysis to mobility patterns, functional areas and event detection; and application of socio-demographic and perception analysis to city attractiveness, demographic characteristics and sentiment analysis. The challenges of this type of data are discussed at the end. This study provides a systematic and current review for both researchers and practitioners interested in the applications of crowdsourced data mining for urban activity.
This research is funded by a scholarship from the China Scholarship Council.
A Data-driven, High-performance and Intelligent CyberInfrastructure to Advance Spatial Sciences
In the field of Geographic Information Science (GIScience), we have witnessed the unprecedented data deluge brought about by the rapid advancement of high-resolution data observing technologies. For example, with the advancement of Earth Observation (EO) technologies, a massive amount of EO data including remote sensing data and other sensor observation data about earthquake, climate, ocean, hydrology, volcano, glacier, etc., are being collected on a daily basis by a wide range of organizations. In addition to the observation data, human-generated data including microblogs, photos, consumption records, evaluations, unstructured webpages and other Volunteered Geographical Information (VGI) are incessantly generated and shared on the Internet.
Meanwhile, the emerging cyberinfrastructure rapidly increases our capacity for handling such massive data with regard to data collection and management, data integration and interoperability, data transmission and visualization, high-performance computing, etc. Cyberinfrastructure (CI) consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high-performance networks to improve research productivity and enable breakthroughs that are not otherwise possible.
The Geospatial CI (GCI, or CyberGIS), as the synthesis of CI and GIScience, has inherent advantages in enabling computationally intensive spatial analysis and modeling (SAM) and collaborative geospatial problem solving and decision making.
This dissertation is dedicated to addressing several critical issues and improving the performance of existing methodologies and systems in the field of CyberGIS. It includes three parts. The first part develops methodologies, based on machine learning and semantic search, that help researchers efficiently and effectively find appropriate open geospatial datasets among millions of records provided by thousands of organizations around the world. The second part develops an interoperable and replicable geoprocessing service by combining a high-performance computing (HPC) environment, the core spatial statistics and analysis algorithms of the widely adopted open-source Python package PySAL (Python Spatial Analysis Library), and the rich datasets acquired in the first part. The third part studies optimization strategies for feature data transmission and visualization, aiming to solve the performance issues of transmitting large feature datasets over the Internet and visualizing them on the client (browser) side.
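As an illustration of the kind of core spatial statistic the second part exposes as a service, the sketch below computes global Moran's I from scratch, where $I = \frac{n}{W}\,\frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z_i = x_i - \bar{x}$ and $W = \sum_i \sum_j w_{ij}$. This is plain Python, not PySAL's API (PySAL's `esda` package provides a production implementation); the helper `line_contiguity` and the toy values are hypothetical, and the weights here are unstandardized binary weights.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[int, int], float]  # (i, j) -> w_ij; absent pairs mean w_ij = 0


def morans_i(values: List[float], weights: Weights) -> float:
    """Global Moran's I for observations `values` under spatial weights `weights`.
    Assumes the values are not all equal (otherwise the denominator is zero)."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]                       # deviations from the mean
    s0 = sum(weights.values())                           # W, the sum of all weights
    num = sum(w * z[i] * z[j] for (i, j), w in weights.items())
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


def line_contiguity(n: int) -> Weights:
    """Hypothetical toy weights: n areas in a row, each adjacent to the next."""
    w: Weights = {}
    for i in range(n - 1):
        w[(i, i + 1)] = 1.0
        w[(i + 1, i)] = 1.0
    return w
```

For four areas in a row, a smooth gradient `[1, 2, 3, 4]` gives I = 1/3 (positive autocorrelation), while an alternating pattern `[1, 0, 1, 0]` gives I = -1 (perfect negative autocorrelation).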
Taken together, the three parts constitute an endeavor towards the methodological improvement and implementation practice of a data-driven, high-performance and intelligent CI to advance spatial sciences.
Dissertation/Thesis: Doctoral Dissertation, Geography, 201
Content-location relationships: a framework to explore correlations between space-based and place-based user-generated content
Tang, V., & Painho, M. (2023). Content-location relationships: a framework to explore correlations between space-based and place-based user-generated content. International Journal of Geographical Information Science, 37(8), 1840–1871. https://doi.org/10.1080/13658816.2023.2213869

The authors acknowledge the funding from the Portuguese national funding agency for science, research and technology (Fundação para a Ciência e a Tecnologia – FCT) through the CityMe project (EXPL/GES-URB/1429/2021; https://cityme.novaims.unl.pt/) and the project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.

The use of social media and location-based networks through GPS-enabled devices provides geospatial data for a plethora of applications in urban studies. However, the extent to which information found in geo-tagged social media activity corresponds to the spatial context is still a topic of debate. In this article, we developed a framework aimed at retrieving the thematic and spatial relationships between content originating from space-based (Twitter) and place-based (Google Places and OSM) sources of geographic user-generated content, based on topics identified by the embedding-based BERTopic model. The contribution of the framework lies in the combination of methods selected to improve on previous work focused on content-location relationships. Using the city of Lisbon (Portugal) to test our methodology, we first applied the embedding-based topic model to aggregated textual data from each source. The results highlight the complexity of content-location relationships, which are mostly based on thematic profiles. Nonetheless, the framework can be applied to other cities and extended with other metrics to support research exploring the correlation between online discourse and geography.
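One simple way to compare thematic profiles across sources, once a topic model has assigned each document a topic ID, is to turn each source's assignments into topic shares and correlate them. The abstract does not state which correlation metrics the framework uses, so the Pearson correlation below is an illustrative assumption, as are the toy topic IDs; the only model-specific detail is that BERTopic labels outlier documents with topic `-1`, which the sketch ignores by default.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def topic_profile(topic_ids: Iterable[int], ignore: int = -1) -> Dict[int, float]:
    """Share of documents per topic, skipping the outlier label (-1 in BERTopic)."""
    counts = Counter(t for t in topic_ids if t != ignore)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items()}


def pearson(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Pearson correlation of two topic profiles over the union of their topics
    (a topic missing from one profile counts as share 0)."""
    topics = sorted(set(p) | set(q))
    if not topics:
        return 0.0
    x = [p.get(t, 0.0) for t in topics]
    y = [q.get(t, 0.0) for t in topics]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0
```

Computing such a profile per spatial unit (for example per grid cell or parish) for the space-based and place-based sources, and correlating them unit by unit, would give a map of where online discourse and place content agree thematically.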