Optimal Stratification of Univariate Populations via stratifyR Package
Stratification reduces the variance of sample estimates for population parameters by creating homogeneous
strata. Often, surveyors stratify the population using the most convenient variables such
as age, sex, region, etc. Such convenient methods often do not produce internally homogeneous
strata; hence, the precision of the estimates of the variables of interest could be further improved.
This paper introduces an R package called 'stratifyR', which proposes a method for optimal
stratification of survey populations for a univariate study variable that follows a particular distribution
estimated from a data set that is available to the surveyor. The stratification problem is
formulated as a mathematical programming problem and solved by using a dynamic programming
technique. Methods for several distributions such as uniform, Weibull, gamma, normal, lognormal,
exponential, right-triangular, Cauchy and Pareto are presented. The package is able to construct
optimal stratification boundaries (OSB) and calculate optimal sample sizes (OSS) under Neyman
allocation. Several examples, using simulated data, are presented to illustrate the stratified designs
that can be constructed with the proposed methodology. Results reveal that the proposed method
computes OSB that are precise and comparable to those of established methods. All the calculations
presented in this paper were carried out using the stratifyR package that will be made available on
CRAN.
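The allocation step mentioned in this abstract can be illustrated with a minimal sketch. It is written in Python rather than with the stratifyR package, and the function name, inputs and example figures are hypothetical. It assumes the stratum sizes N_h and standard deviations S_h are already known, and splits a total sample size in proportion to N_h * S_h, which is the Neyman allocation rule.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Split n_total across strata in proportion to N_h * S_h (Neyman allocation)."""
    weights = [N * s for N, s in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all strata have zero variability")
    # Real-valued allocation; rounding to integers is left to the caller.
    return [n_total * w / total for w in weights]


# Hypothetical strata: sizes and standard deviations chosen only for illustration.
sizes = [500, 300, 200]
sds = [2.0, 5.0, 10.0]
print(neyman_allocation(sizes, sds, 100))
```

The smallest but most variable stratum receives the largest share of the sample, which is the behaviour Neyman allocation is designed to produce.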
On optimum stratification
In this manuscript, we discuss the problem of determining the optimum stratification of a study (or main) variable based on an auxiliary variable that follows a uniform distribution. Stratifying the survey variable using the auxiliary variable may lead to substantial gains in the precision of the estimates. The problem is formulated as a Nonlinear Programming Problem (NLPP), which turns out to be a multistage decision problem and is solved using a dynamic programming technique.
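To make the multistage idea concrete, here is a minimal sketch of a dynamic programme for stratum boundaries. It is a discrete analogue on a finite list of values, not the continuous formulation of the manuscript. It assumes the objective is the sum over strata of W_h * S_h, the quantity that the variance under Neyman allocation depends on, and the function name is hypothetical.

```python
import math


def optimal_strata(values, n_strata):
    """Split sorted values into n_strata contiguous groups minimising sum_h W_h * S_h."""
    x = sorted(values)
    n = len(x)
    if not 1 <= n_strata <= n:
        raise ValueError("n_strata must be between 1 and len(values)")
    # Prefix sums of x and x^2 give each group's variance in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # W_h * S_h for the group x[i:j], using the divide-by-m variance.
        m = j - i
        mean = (s1[j] - s1[i]) / m
        var = max((s2[j] - s2[i]) / m - mean * mean, 0.0)
        return (m / n) * math.sqrt(var)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting x[:j] into k groups.
    best = [[inf] * (n + 1) for _ in range(n_strata + 1)]
    cut = [[0] * (n + 1) for _ in range(n_strata + 1)]
    best[0][0] = 0.0
    for k in range(1, n_strata + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    cut[k][j] = i
    # Walk back through the stored cut points to recover the groups.
    groups, j = [], n
    for k in range(n_strata, 0, -1):
        i = cut[k][j]
        groups.append(x[i:j])
        j = i
    groups.reverse()
    return best[n_strata][n], groups
```

Each stage k decides where the k-th stratum starts, given the best solution for the first k-1 strata. The triple loop costs O(L * n^2) for L strata and n values, so the sketch suits small populations only.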
Heuristic Algorithm for Univariate Stratification Problem
In sampling theory, stratification corresponds to a technique used in
surveys, which allows segmenting a population into homogeneous subpopulations
(strata) to produce statistics with a higher level of precision. In particular,
this article proposes a heuristic to solve the univariate stratification
problem, which is widely studied in the literature. One version of the problem fixes the number
of strata and the precision level and seeks the boundaries that define
the strata so as to minimize the total sample size allocated to them. A
heuristic based on a stochastic optimization method and an exact optimization
method was developed to achieve this goal. The performance of this heuristic
was evaluated through computational experiments, considering its application in
various populations used in other works in the literature, based on 20
scenarios that combine different numbers of strata and levels of precision.
From the analysis of the obtained results, it is possible to verify that the
heuristic had a performance superior to four algorithms in the literature in
more than 94% of the cases, particularly concerning the known algorithms of
Kozak and Lavallee-Hidiroglou.
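The objective that a boundary-search heuristic of this kind would evaluate can be sketched as follows. This is not the heuristic of the article: it only computes, for one candidate set of boundaries, the sample size needed to reach a target coefficient of variation for the stratified mean. It assumes Neyman allocation, simple random sampling within strata and no take-all stratum, and the function and parameter names are hypothetical.

```python
import math
import statistics


def required_sample_size(values, boundaries, target_cv):
    """Sample size under Neyman allocation for the stratified mean to reach target_cv.

    boundaries are the inclusive upper limits of all strata except the last.
    """
    x = sorted(values)
    N = len(x)
    mean = sum(x) / N
    strata, start = [], 0
    for b in list(boundaries) + [x[-1]]:
        end = start
        while end < N and x[end] <= b:
            end += 1
        strata.append(x[start:end])
        start = end
    if any(len(s) < 2 for s in strata):
        raise ValueError("every stratum needs at least two units")
    # Sums of N_h * S_h and N_h * S_h^2, with S_h the (N_h - 1)-divisor deviation.
    sum_ns = sum(len(s) * statistics.stdev(s) for s in strata)
    sum_ns2 = sum(len(s) * statistics.variance(s) for s in strata)
    target_var = (target_cv * mean) ** 2
    # n = (sum N_h S_h)^2 / (N^2 V + sum N_h S_h^2), with finite population correction.
    n = sum_ns ** 2 / (N ** 2 * target_var + sum_ns2)
    return math.ceil(n)
```

A search procedure would call such a function repeatedly with different candidate boundaries and keep the set that gives the smallest result.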
Stratification of skewed populations
In this research an algorithm is derived for stratifying skewed populations which is much simpler to implement than any of those currently available. It is based on the suggestion by numerous researchers in the field that it is desirable when stratifying skewed populations to arrange for equal coefficients of variation in each subinterval. Our new algorithm makes the breaks in geometric progression and achieves near-equal stratum coefficients of variation when the populations are skewed. Simulation studies on real skewed populations have shown that the new method compares favourably to those commonly used in terms of precision of the estimator of the mean.
We also apply the geometric method to the Lavallée-Hidiroglou (1988) algorithm, an iterative method designed specifically for skewed populations. We show that taking geometric boundaries as the starting points results, in most cases, in quicker convergence of the algorithm and in smaller sample sizes than the default starting points for the same precision.
Finally, geometric stratification is applied to the Pareto distribution, a typical model of skewed data. We show that if any finite range of this distribution is broken into a given number of strata, with boundaries obtained using geometric progression, then the stratum coefficients of variation are equal.
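Breaks in geometric progression can be sketched in a few lines. The sketch assumes a strictly positive variable with known minimum and maximum, and takes the common ratio as (max/min)^(1/L) for L strata. The function name is hypothetical.

```python
def geometric_boundaries(x_min, x_max, n_strata):
    """Stratum break points in geometric progression between x_min and x_max."""
    if x_min <= 0 or x_max <= x_min:
        raise ValueError("need 0 < x_min < x_max")
    if n_strata < 1:
        raise ValueError("n_strata must be at least 1")
    # Common ratio so that x_min * ratio**n_strata equals x_max.
    ratio = (x_max / x_min) ** (1.0 / n_strata)
    # Interior break points only; x_min and x_max themselves are not returned.
    return [x_min * ratio ** h for h in range(1, n_strata)]


# Hypothetical range 1 to 10000 split into 4 strata: breaks near 10, 100 and 1000.
print(geometric_boundaries(1.0, 10000.0, 4))
```

Because each break is a constant multiple of the previous one, the strata widen towards the upper tail, which is where a skewed population is most spread out.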
Optimal stratification in stratified designs using Weibull-distributed auxiliary information
Sampling has evolved into a universally accepted approach for gathering information and data mining, as it is widely accepted that a reasonably modest-sized sample can sufficiently characterize a much larger population. In stratified sampling designs, the whole population is divided into homogeneous strata in order to achieve higher precision in the estimation. This paper proposes an efficient method of constructing optimum stratum boundaries (OSB) and determining optimum sample size (OSS) for the survey variable. In practice, the survey variable is unavailable prior to conducting the survey. Thus, the method is based on an auxiliary variable, which is usually readily available from past surveys. To illustrate the application with an example using real data, the auxiliary variable considered for this problem follows a Weibull distribution. The stratification problem is formulated as a Mathematical Programming Problem (MPP) that seeks to minimize the variance of the estimated population parameter under Neyman allocation. The solution procedure employs the dynamic programming technique, which results in substantial gains in the precision of the estimates of the population characteristics.
Report of the Workshop on Survey Design and Data Analysis (WKSAD) [21-25 June, 2004, Aberdeen, UK]
Contributors: Knut Korsbrekke, Michael Penningto
The interpretation and characterisation of lineaments identified from Landsat TM imagery of SW England
Two Landsat TM scenes of SW England and a sub-scene of North Cornwall have been
analysed visually in order to examine the effect of resolution on lineament interpretation. Images
were viewed at several different scales as a result of varying image resolution whilst maintaining a
fixed screen pixel size. Lineament analysis at each scale utilised GIS techniques and involved
several stages: initial lineament identification and digitisation; removal of lineaments related to
anthropogenic features to produce cleansed lineament maps; compilation of lineament attributes
using ARC/INFO; cluster analysis for identification of lineament directional families; and line
sampling of lineament maps in order to determine spacing.
SW England lies within the temperate zone of Europe and the extensive agricultural cover
and infrastructure conceal the underlying geology. The consequences of this for lineament
analysis were examined using sub-images of North Cornwall. Here anthropogenic features are
visible at all resolutions between 30m and 120m pixel sizes but lie outside the observation
threshold at 150m. Having confidence that lineaments at this resolution are of non-anthropogenic
origin optimises lineament identification since the image may be viewed in greater detail. On this
basis, lineament analysis of SW England was performed using image resolutions of 150m.
Valuable geological information below the observation threshold in 150m resolution images is
likely, however, to be contained in the lineament maps produced from higher resolution images.
For images analysed at higher resolutions, therefore, knowledge-based rules were established in
order to cleanse the lineament populations.
Compiled lineament maps were 'ground truthed' (primarily involving comparison with
published geological maps but included phases of field mapping) in order to characterise their
geological affinities. The major lineament trends were correlated to lithotectonic boundaries and
cross-cutting fracture sets. Major lineament trends produced distinct frequency/orientation
maxima. Multiple minor geological structures, however, produced semi-overlapping groups. A
clustering technique was devised to resolve overlapping groups into lineament directional families.
The newly defined lineament directional families were further analysed in two ways:
(i) Analysis of the spatial density of the length and frequency of lineaments indicates that
individual and multiple lineament directional families vary spatially and are compartmentalised into
local tectonic domains, often bounded by major lineaments. Hence, such density maps provide
useful additional information about the structural framework of SW England.
(ii) Lineament spacing and length of the lineament directional families were analysed for
the effect of scale and geological causes on their frequency/size distributions. Spacings of fracture
lineaments were found to follow power-law distributions, whereas lengths showed both power-law and non-power-law
distributions. Furthermore, the type of frequency/size distribution for a lineament directional family
can change with increasing resolution.
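One common way to quantify a power-law frequency/size distribution is the maximum-likelihood estimate of its exponent. The abstract does not say how the distributions were fitted, so the sketch below is an assumption for illustration only. It treats spacings as continuous values above a chosen lower cutoff x_min, and the function name is hypothetical.

```python
import math


def power_law_exponent(samples, x_min):
    """Maximum-likelihood exponent alpha for p(x) ~ x**(-alpha), x >= x_min."""
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    # Only values at or above the cutoff are assumed to follow the power law.
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one sample above x_min")
    return 1.0 + len(tail) / log_sum
```

Repeating the estimate on lineament maps produced at different image resolutions would be one way to check whether the fitted exponent changes with scale.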
Sampling in the evaluation of ore deposits
Sampling is an error-generating process, and these errors should be reduced to a minimum if an accurate ore reserve estimation is to be made from the sample values. Error in sampling can arise from the sampling procedure as well as from where and how each sample is taken from the deposit. Sampling procedure involves sample collection, sample reduction and analysis, and the error from each of these three stages has an equal influence on the total error of the process. Error due to sampling procedure should be identified and eliminated at an early stage in the evaluation programme. An ore deposit should be subdivided into sampling strata along geological boundaries, and once these boundaries have been established they should be adhered to for the evaluation programme. The sampling of each stratum depends on the small-scale structures in which the grade is distributed, and this distribution in relation to sample size controls sample variance, sample bias and the volume of influence of each sample. Cluster sampling can be used where an impractically large sample is necessary to reduce sample variance or increase the volume of influence of samples. Sample bias can be reduced by composing a large number of small samples. Sampling patterns should be designed with reference to the volumes of influence of samples, and in favourable geology, geostatistical or statistical techniques can be used to predict the precision of an ore reserve estimation in terms of the number of samples taken. Different ore deposits have different sampling characteristics and problems which can be directly related to the geology of the mineralization. If geology is disregarded when sampling an ore deposit, an evaluation programme cannot claim to give an accurate estimate of the ore reserves.