Regional surname affinity: a spatial network approach
OBJECTIVE
We investigate surname affinities among areas of modern‐day China by constructing a spatial network and performing community detection. The study reports a geographical genealogy of the Chinese population that is the result of population origins, historical migrations, and societal evolution.
MATERIALS AND METHODS
We acquire data from the census records supplied by China's National Citizen Identity Information System, including the surname and regional information of 1.28 billion registered Chinese citizens. We propose a multilayer minimum spanning tree (MMST) to construct a spatial network based on the matrix of isonymic distances, which is often used to characterize the dissimilarity of surname structure among areas. We use the fast unfolding algorithm to detect network communities.
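The abstract does not spell out how the MMST is built. A minimal sketch, assuming one plausible reading: each layer is a minimum spanning tree (Kruskal's algorithm) computed over the edges not used by earlier layers, and the network is the union of all layers. Under that reading, 10 layers over 362 prefectures give 10 × 361 = 3,610 edges, which matches the count reported in the results. The isonymic distance is taken here as the Euclidean distance between surname relative-frequency vectors; the function names and the dictionary input format are hypothetical.

```python
import math
from itertools import combinations


def isonymic_distance(p, q):
    """Euclidean distance between two surname-frequency profiles.

    p, q: dicts mapping surname -> relative frequency in an area (assumed format).
    """
    names = set(p) | set(q)
    return math.sqrt(sum((p.get(s, 0.0) - q.get(s, 0.0)) ** 2 for s in names))


def multilayer_mst(profiles, layers):
    """Union of `layers` edge-disjoint minimum spanning trees (hypothetical MMST reading)."""
    n = len(profiles)
    # Candidate edges (distance, i, j), sorted once; ties break on node indices.
    remaining = sorted(
        (isonymic_distance(profiles[i], profiles[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    network = set()
    for _layer in range(layers):
        parent = list(range(n))  # fresh union-find forest for this layer

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]  # path halving
                a = parent[a]
            return a

        layer = []
        for _dist, i, j in remaining:  # Kruskal: cheapest edge joining two components
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                layer.append((i, j))
        if len(layer) != n - 1:
            raise ValueError("unused edges no longer connect all areas")
        network.update(layer)
        used = set(layer)
        remaining = [e for e in remaining if (e[1], e[2]) not in used]
    return network
```

Each layer adds exactly n − 1 edges, so the edge count of the union is a quick sanity check on any implementation of this reading.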
RESULTS
We obtain a 10‐layer MMST network of 362 prefecture nodes and 3,610 edges derived from the matrix of the Euclidean distances among these areas. These prefectures are divided into eight groups in the spatial network via community detection. We evaluate the partition by comparing the inter‐distances and intra‐distances of the communities and obtain a meaningful regional ethnicity classification.
DISCUSSION
The visualization of the resulting communities on the map indicates that the prefectures in the same community are usually geographically adjacent. The formation of this partition is influenced by geographical factors, historic migrations, trade and economic factors, as well as cultural and linguistic isolation. The MMST algorithm proves to be effective in geo‐genealogy and ethnicity classification because it retains essential information about surname affinity and highlights the geographical consanguinity of the population.
National Natural Science Foundation of China, Grant/Award Numbers: 61773069, 71731002; National Social Science Foundation of China, Grant/Award Number: 14BSH024; China Scholarship Council, Grant/Award Numbers: 201606045048, 201706040188, 201706040015; DOE, Grant/Award Number: DE-AC07-05Id14517; DTRA, Grant/Award Number: HDTRA1-14-1-0017; NSF, Grant/Award Numbers: CHE-1213217, CMMI-1125290, PHY-1505000
Published version
Multi-Step Processing of Spatial Joins
Spatial joins are one of the most important operations for combining spatial objects of several relations. In this paper, spatial join processing is studied in detail for extended spatial objects in two-dimensional data space. We present an approach for spatial join processing that is based on three steps. First, a spatial join is performed on the minimum bounding rectangles of the objects, returning a set of candidates. Various approaches for accelerating this step of join processing were examined at last year’s conference [BKS 93a]. In this paper, we focus on the problem of how to compute the answers from the set of candidates, which is handled by
the following two steps. In the second step, sophisticated approximations
are used to identify answers as well as to filter out false hits from
the set of candidates. For this purpose, we investigate various types
of conservative and progressive approximations. In the last step, the
exact geometry of the remaining candidates has to be tested against
the join predicate. The time required for computing spatial join
predicates can essentially be reduced when objects are adequately
organized in main memory. In our approach, objects are first decomposed
into simple components which are exclusively organized
by a main-memory resident spatial data structure. Overall, we
present a complete approach for spatial join processing on complex
spatial objects. The performance of the individual steps of our approach
is evaluated with data sets from real cartographic applications.
The results show that our approach reduces the total execution
time of the spatial join by factors
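The three-step pipeline can be sketched for simple polygons. This is a minimal illustration under stated assumptions, not the paper's implementation: step 1 uses a nested loop over precomputed MBRs where the paper relies on spatial access methods; step 2 uses only a progressive approximation (an axis-parallel rectangle assumed to lie inside each polygon and supplied by the caller), although the paper also studies conservative approximations for discarding false hits; step 3 tests the exact geometry with segment-crossing and point-in-polygon checks. All function names are hypothetical.

```python
from itertools import product


def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _on_segment(p, q, r):
    """For r collinear with p-q: does r lie within the segment's extent?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, q1, q2):
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1)) or (o4 == 0 and _on_segment(q1, q2, p2)))


def point_in_polygon(pt, poly):
    """Even-odd ray casting towards +x."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def polygons_intersect(a, b):
    """Exact test: some edges cross, or one polygon lies inside the other."""
    edges_a = list(zip(a, a[1:] + a[:1]))
    edges_b = list(zip(b, b[1:] + b[:1]))
    if any(segments_intersect(p1, p2, q1, q2) for (p1, p2), (q1, q2) in product(edges_a, edges_b)):
        return True
    return point_in_polygon(a[0], b) or point_in_polygon(b[0], a)


def spatial_join(rel_r, rel_s):
    """rel_r, rel_s: lists of (polygon, inner_rect or None). Returns (pairs, exact_tests)."""
    boxes_r = [mbr(p) for p, _ in rel_r]
    boxes_s = [mbr(p) for p, _ in rel_s]
    hits, exact_tests = [], 0
    for i, (poly_r, inner_r) in enumerate(rel_r):
        for j, (poly_s, inner_s) in enumerate(rel_s):
            # Step 1: the MBR filter produces candidates.
            if not rects_intersect(boxes_r[i], boxes_s[j]):
                continue
            # Step 2: overlapping progressive approximations prove a hit.
            if inner_r and inner_s and rects_intersect(inner_r, inner_s):
                hits.append((i, j))
                continue
            # Step 3: exact geometry for the remaining candidates.
            exact_tests += 1
            if polygons_intersect(poly_r, poly_s):
                hits.append((i, j))
    return hits, exact_tests
```

Counting `exact_tests` makes the point of the middle step visible: every candidate resolved by an approximation is one fewer expensive geometric test.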
Spatial adaptation in heteroscedastic regression: Propagation approach
The paper concerns the problem of pointwise adaptive estimation in regression
when the noise is heteroscedastic and incorrectly known. The use of the local
approximation method, which includes local polynomial smoothing as a
particular case, leads to a finite family of estimators corresponding to
different degrees of smoothing. Data-driven choice of the localization degree in
this case can be understood as the problem of selection from this family. This
task can be performed by the FLL technique suggested in Katkovnik and Spokoiny
(2008), which is based on Lepski's method. An important issue with this type of
procedure, namely the choice of certain tuning parameters, was addressed in
Spokoiny and Vial (2009). The authors called their approach to the parameter
calibration "propagation". In the present paper the propagation approach is
developed and justified for the heteroscedastic case in the presence of noise
misspecification. Our analysis shows that the adaptive procedure allows a
misspecification of the covariance matrix with a relative error of order
1/log(n), where n is the sample size. Comment: 47 pages. This is the final version of the paper published
at http://dx.doi.org/10.1214/08-EJS180 in the Electronic Journal of Statistics
(http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
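For readers unfamiliar with Lepski-type selection, the sketch below shows the basic idea at a single point x0: compute a nested family of local constant (kernel-weighted mean) estimates for increasing bandwidths, and accept a larger bandwidth only while its estimate stays within z standard deviations of every smaller-bandwidth estimate. This is a generic illustration, not the FLL procedure or the propagation calibration of the paper: the critical value z is a single hypothetical constant rather than being calibrated, the Epanechnikov-type kernel is an assumption, and the per-observation noise levels sigma are treated as known (whereas the paper allows them to be misspecified).

```python
import numpy as np


def local_constant_family(x, y, sigma, x0, bandwidths):
    """Kernel-weighted means at x0 and their standard deviations, one per bandwidth.

    sigma holds per-observation noise standard deviations (heteroscedastic, assumed known).
    """
    thetas, stds = [], []
    for h in bandwidths:
        w = np.maximum(0.0, 1.0 - ((x - x0) / h) ** 2)  # Epanechnikov-type weights
        total = w.sum()
        if total == 0.0:
            raise ValueError(f"no observations within bandwidth {h}")
        thetas.append(np.dot(w, y) / total)
        # Var(sum w_i y_i / sum w_i) = sum w_i^2 sigma_i^2 / (sum w_i)^2
        stds.append(np.sqrt(np.dot(w ** 2, sigma ** 2)) / total)
    return np.array(thetas), np.array(stds)


def lepski_index(thetas, stds, z):
    """Index of the last bandwidth accepted before the first rejection.

    Bandwidth k is accepted if |thetas[k] - thetas[j]| <= z * stds[j] for every j < k.
    Bandwidths are assumed sorted in increasing order.
    """
    chosen = 0
    for k in range(1, len(thetas)):
        if all(abs(thetas[k] - thetas[j]) <= z * stds[j] for j in range(k)):
            chosen = k
        else:
            break  # the first rejection stops the search
    return chosen
```

Near a jump, the larger windows start averaging across it and their estimates drift away from the small-bandwidth ones, so the rule stops before the jump enters the window. The choice of z is exactly the tuning problem that the propagation approach addresses.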
Estimating Travel Cost Model: Spatial Approach
travel cost model, spatial analysis, Environmental Economics and Policy
Spatial Weighting Matrix Selection in Spatial Lag Econometric Model
This paper investigates the choice of spatial weighting matrix in a spatial lag model framework. In the empirical literature, the choice of spatial weighting matrix has been characterized by a great deal of arbitrariness. The number of possible spatial weighting matrices is large, which until recently was considered to preclude investigation into the appropriateness of the empirical choices. Recently, Kostov (2010) proposed a new approach that transforms the problem into an equivalent variable selection problem. This article expands that transformation approach into a two-step selection procedure. The proposed approach aims at reducing the arbitrariness in the selection of the spatial weighting matrix in spatial econometrics, and it allows a wide range of variable selection methods to be applied to this high-dimensional selection problem. The suggested approach consists of a screening step that reduces the number of candidate spatial weighting matrices, followed by an estimation step that selects the final model. An empirical application of the proposed methodology is presented, in which a range of different combinations of screening and estimation methods are employed and found to produce similar results. The proposed methodology is shown to be able to approximate, and provide indications of, what the ‘true’ spatial weighting matrix could be, even when it is not amongst the considered alternatives. The similarity in results obtained using different methods suggests that their relative computational costs could be the primary reason for choosing among them. Some further extensions and applications are also discussed.
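The two-step idea can be sketched as follows. This is a simplified illustration under explicit assumptions, not Kostov's procedure or the article's: candidates are row-normalized k-nearest-neighbour matrices, the screening step ranks each candidate W by |corr(Wy, y)|, and the estimation step regresses y on [X, Wy] by ordinary least squares and keeps the candidate with the lowest BIC. OLS ignores the endogeneity of the spatial lag Wy, so a real implementation would use instrumental variables or maximum likelihood; all function names are hypothetical.

```python
import numpy as np


def knn_weights(coords, k):
    """Row-normalized k-nearest-neighbour spatial weighting matrix."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # an area is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0 / k
    return W


def screen(y, candidates, keep):
    """Screening step: keep the `keep` candidates whose lag W @ y correlates most with y."""
    scores = [abs(np.corrcoef(W @ y, y)[0, 1]) for W in candidates]
    return [int(i) for i in np.argsort(scores)[::-1][:keep]]


def estimate(y, X, candidates, survivors):
    """Estimation step: OLS of y on [X, W @ y] per survivor; return the lowest-BIC index.

    Simplification: OLS is inconsistent for spatial lag models because W @ y is endogenous.
    """
    n = len(y)
    best, best_bic = None, np.inf
    for i in survivors:
        Z = np.column_stack([X, candidates[i] @ y])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = float(np.sum((y - Z @ beta) ** 2))
        bic = n * np.log(rss / n) + Z.shape[1] * np.log(n)
        if bic < best_bic:
            best, best_bic = i, bic
    return best
```

Screening is cheap (one matrix-vector product per candidate), so it can prune a large candidate set before the costlier model fits, which mirrors the division of labour described in the abstract.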
Modified Linear Projection for Large Spatial Data Sets
Recent developments in engineering techniques for spatial data collection
such as geographic information systems have resulted in an increasing need for
methods to analyze large spatial data sets. These sorts of data sets can be
found in various fields of the natural and social sciences. However, model
fitting and spatial prediction using these large spatial data sets are
impractically time-consuming, because of the necessary matrix inversions.
Various methods have been developed to deal with this problem, including a
reduced rank approach and a sparse matrix approximation. In this paper, we
propose a modification to an existing reduced rank approach to capture both the
large- and small-scale spatial variations effectively. We have used simulated
examples and an empirical data analysis to demonstrate that our proposed
approach consistently performs well when compared with other methods. In
particular, the performance of our new method does not depend on the dependence
properties of the spatial covariance functions. Comment: 29 pages, 5 figures, 4 tables
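The abstract does not give the modification itself. As a hedged illustration of the general idea of pairing a reduced-rank term (large-scale variation) with a sparse correction (small-scale variation), the sketch below builds a low-rank linear-projection approximation C_nm C_mm^{-1} C_mn from a set of knots and adds the residual covariance multiplied elementwise by a compactly supported Wendland taper. This follows the full-scale-approximation style and is an assumption, not necessarily the authors' method; the exponential covariance, knot grid, and taper range are hypothetical choices. For clarity the code forms dense n × n matrices, whereas a practical version would store only the tapered entries in a sparse matrix.

```python
import numpy as np


def pairwise_dist(a, b):
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def exp_cov(a, b, sill=1.0, scale=0.3):
    """Exponential covariance (hypothetical parameter values)."""
    return sill * np.exp(-pairwise_dist(a, b) / scale)


def wendland_taper(a, b, gamma):
    """Compactly supported taper: 1 at distance 0, exactly 0 beyond gamma."""
    r = np.clip(pairwise_dist(a, b) / gamma, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)


def linear_projection(locs, knots, cov):
    """Reduced-rank approximation C_nm C_mm^{-1} C_mn."""
    c_nm = cov(locs, knots)
    return c_nm @ np.linalg.solve(cov(knots, knots), c_nm.T)


def projection_plus_taper(locs, knots, cov, gamma):
    """Low-rank term plus tapered residual, restoring small-scale structure."""
    full = cov(locs, locs)
    low = linear_projection(locs, knots, cov)
    return low + (full - low) * wendland_taper(locs, locs, gamma)
```

Because the taper equals 1 at distance zero, this approximation reproduces the variances exactly, and because the taper lies in [0, 1], its elementwise error can only be smaller than that of the projection alone.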
Recover Fine-Grained Spatial Data from Coarse Aggregation
In this paper, we study a new type of spatial sparse recovery problem, namely
inferring the fine-grained spatial distribution of certain density data in a
region based only on the aggregate observations recorded for each of its
subregions. One typical example of this spatial sparse recovery problem is to
infer the spatial distribution of cellphone activities based on aggregate mobile
traffic volumes observed at sparsely scattered base stations. We propose a
novel Constrained Spatial Smoothing (CSS) approach, which exploits the local
continuity that exists in many types of spatial data to perform sparse recovery
via finite-element methods, while enforcing the aggregated observation
constraints through an innovative use of the ADMM algorithm. We also improve
the approach to further utilize additional geographical attributes. Extensive
evaluations based on a large dataset of phone call records and a demographic
dataset from the city of Milan show that our approach significantly outperforms
various state-of-the-art approaches, including Spatial Spline Regression (SSR). Comment: Accepted by ICDM 2017, 6 pages
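A heavily simplified version of constrained spatial smoothing can be written with scaled ADMM. The sketch replaces the paper's finite-element formulation with a graph Laplacian L on a regular grid (an assumption) and solves: minimize x^T L x / 2 subject to A x = b (each subregion's cells sum to its observed aggregate) and x >= 0. The x-update solves the KKT system of the equality-constrained quadratic step, the z-update projects onto the nonnegative orthant, and u is the scaled dual variable. It does not use the additional geographical attributes mentioned above, and the function names are hypothetical.

```python
import numpy as np


def grid_laplacian(rows, cols):
    """Graph Laplacian of a 4-connected rows x cols grid (stand-in for finite elements)."""
    n = rows * cols
    L = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    L[i, i] += 1.0
                    L[j, j] += 1.0
                    L[i, j] -= 1.0
                    L[j, i] -= 1.0
    return L


def constrained_smoothing(L, A, b, rho=1.0, iters=500):
    """Scaled ADMM for: min 0.5 x^T L x  s.t.  A x = b,  x >= 0."""
    n, m = L.shape[0], A.shape[0]
    # KKT matrix of the x-update: (L + rho I) x + A^T nu = rho (z - u),  A x = b.
    kkt = np.block([[L + rho * np.eye(n), A.T], [A, np.zeros((m, m))]])
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        x = np.linalg.solve(kkt, np.concatenate([rho * (z - u), b]))[:n]
        z = np.maximum(0.0, x + u)  # projection onto x >= 0
        u += x - z                  # scaled dual update
    return x, z
```

Every x iterate satisfies the aggregate constraints exactly, while z is always nonnegative; the two converge to each other as ADMM proceeds. Since the KKT matrix does not change, a practical version would factor it once instead of calling `solve` in every iteration.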
