Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification
High spatial resolution (1–5 m) remotely sensed datasets are increasingly being used to map land cover over large geographic areas using supervised machine learning algorithms. Although many studies have compared machine learning classification methods, sample selection methods for acquiring training and validation data and cross-validation techniques for tuning classifier parameters are rarely investigated, particularly on large, high spatial resolution datasets. This work therefore examines four sample selection methods (simple random, proportional stratified random, disproportional stratified random, and deliberative sampling) as well as three cross-validation tuning approaches (k-fold, leave-one-out, and Monte Carlo). In addition, the effect on accuracy of localizing sample selection to a small geographic subset of the entire area, an approach sometimes used to reduce the costs associated with training data collection, is investigated. These methods are evaluated in the context of support vector machine (SVM) classification and geographic object-based image analysis (GEOBIA), using high spatial resolution National Agricultural Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters covering a 2,609 km² regional-scale area in northeastern West Virginia, USA. Stratified statistical sampling methods were found to generate the highest classification accuracy. Using a small number of training samples collected from only a subset of the study area provided a similar level of overall accuracy to a sample of equivalent size collected in a dispersed manner across the entire regional-scale dataset. There were minimal differences in accuracy among the cross-validation tuning methods. The processing times for Monte Carlo and leave-one-out cross-validation were high, especially with large training sets; for this reason, k-fold cross-validation appears to be a good choice. Classifications trained with samples collected deliberately (i.e., not randomly) were less accurate than classifiers trained on statistical samples, possibly because of the high positive spatial autocorrelation in the deliberative training set. Thus, if possible, training samples should be selected randomly, and deliberative samples should be avoided.
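A minimal sketch of how the three cross-validation tuning strategies could be compared for SVM parameter tuning. scikit-learn, the synthetic stand-in data, and the parameter grid are assumptions for illustration; the abstract does not specify the study's actual tooling or features.

```python
# Hypothetical sketch: compare k-fold, leave-one-out, and Monte Carlo
# cross-validation for tuning an SVM's C and gamma. scikit-learn and the
# synthetic stand-in data are assumptions; the study's actual image-object
# features are not specified in the abstract.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, LeaveOneOut, ShuffleSplit
from sklearn.svm import SVC

# Stand-in for a table of per-object spectral/LIDAR features and class labels.
X, y = make_classification(n_samples=150, n_features=10, random_state=0)

param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
strategies = {
    "k-fold (k=5)": KFold(n_splits=5, shuffle=True, random_state=0),
    "leave-one-out": LeaveOneOut(),
    "Monte Carlo (20 random 80/20 splits)": ShuffleSplit(n_splits=20, test_size=0.2, random_state=0),
}

for name, cv in strategies.items():
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, n_jobs=-1)
    search.fit(X, y)
    print(f"{name}: best params {search.best_params_}, tuned CV accuracy {search.best_score_:.3f}")
```

As the abstract notes, leave-one-out and Monte Carlo splitting require many more model fits than k-fold, which is the practical reason k-fold is often preferred for large training sets.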
The Preventive Effects of Arrest on Intimate Partner Violence: Research, Policy and Theory
This research addresses the limitations of prior analyses and reviews of five experiments testing the specific deterrent effect of arrest on intimate partner violence by applying consistent eligibility criteria, common independent and outcome measures, and appropriate statistical tests to individual-level data. Based on 4,032 cases involving adult males who assaulted their female intimate partners, multivariate regression analyses show consistent but modest reductions in subsequent offenses targeting the original victim that are attributable to arresting the suspect. Although the reductions attributable to arrest are similar across all five studies, other factors, such as the suspect's prior arrest record, are stronger predictors of subsequent offenses. The effect of arrest is also modest compared with the general decline in offenses toward the same victim during the follow-up period.
Role of the Euclidean Signature in Lattice Calculations of Quasidistributions and Other Nonlocal Matrix Elements
Lattice quantum chromodynamics (QCD) provides the only known systematic, nonperturbative method for first-principles calculations of nucleon structure. However, for quantities such as light-front parton distribution functions (PDFs) and generalized parton distributions (GPDs), the restriction to Euclidean time prevents direct calculation of the desired observable. Recently, progress has been made in relating these quantities to matrix elements of spatially nonlocal, zero-time operators, referred to as quasidistributions. Still, even for these time-independent matrix elements, potential subtleties have been identified in the role of the Euclidean signature. In this work, we investigate the analytic behavior of spatially nonlocal correlation functions and demonstrate that the matrix elements obtained from Euclidean lattice QCD are identical to those obtained using the Lehmann-Symanzik-Zimmermann reduction formula in Minkowski space. After arguing the equivalence on general grounds, we also show that it holds in a perturbative calculation, where special care is needed to identify the lattice prediction. Finally, we present a proof of the uniqueness of the matrix elements obtained from Minkowski and Euclidean correlation functions to all orders in perturbation theory.
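For orientation, the spatially nonlocal, equal-time matrix element underlying a quasidistribution is conventionally written (in one common convention; this standard definition is supplied here for context and is not quoted from the abstract) as

```latex
\tilde{q}(x, P_z) \;=\; \int \frac{dz}{4\pi}\, e^{\,i x P_z z}\,
  \langle P \,|\, \bar{\psi}(z)\, \gamma^{z}\, W(z,0)\, \psi(0) \,|\, P \rangle ,
```

where $W(z,0)$ is a Wilson line along the $z$ direction and the hadron carries large momentum $P_z$. Because the operator involves fields only at equal time, the matrix element can be evaluated on a Euclidean lattice, which is what makes the signature question above relevant.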
Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data
The size of the training data set is a major determinant of classification accuracy. Nevertheless, collecting a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample (n = 10,000) to a very small sample (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area, high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes; NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
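A minimal sketch of the kind of experiment described here: hold out a fixed test set, subsample the training pool at progressively smaller sizes, and record overall accuracy for each classifier. scikit-learn, the synthetic data, and the two classifiers shown (RF and SVM) are assumptions standing in for the study's actual image-object features and six-algorithm comparison.

```python
# Hypothetical sketch: measure how overall accuracy responds as the training
# set shrinks from n = 10,000 to n = 40, for RF and SVM, against a fixed
# held-out test set. scikit-learn and the synthetic data are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=12500, n_features=20, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=2000,
                                                  stratify=y, random_state=0)

for n in [10000, 2000, 500, 315, 40]:  # training sizes echoing the abstract
    X_sub, _, y_sub, _ = train_test_split(X_pool, y_pool, train_size=n,
                                          stratify=y_pool, random_state=0)
    for name, clf in [("RF", RandomForestClassifier(n_estimators=200, random_state=0)),
                      ("SVM", SVC(kernel="rbf", gamma="scale"))]:
        clf.fit(X_sub, y_sub)
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"n = {n:5d}  {name}: overall accuracy {acc:.3f}")
```

Keeping the test set fixed across all sample sizes is what makes the accuracy curves comparable; only the training subset changes.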
PDFs in small boxes
PDFs can be studied directly using lattice QCD by evaluating matrix elements of non-local operators. A number of groups are pursuing numerical calculations and investigating possible systematic uncertainties. One systematic that has received less attention is the effect of calculating in a finite spacetime volume. Here we present first attempts to assess the role of the finite volume for spatially non-local operators. We find that these matrix elements may suffer from large finite-volume artifacts and more careful investigation is needed. (Proceedings of the 36th Annual International Symposium on Lattice Field Theory, LATTICE2018, 22-28 July 2018, Michigan State University, East Lansing, Michigan, USA.)
Variation of malaria transmission and morbidity with altitude in Tanzania and with introduction of alphacypermethrin treated nets
BACKGROUND: Highland areas with naturally less intense malaria transmission may provide models of how lowland areas might become if transmission were permanently reduced by sustained vector control. It has been argued that vector control should not be attempted in areas of intense transmission. METHODS: Mosquitoes were sampled with light traps, pyrethrum spray and window exit traps, and tested by ELISA for sporozoites. Incidence of malaria infection was measured by clearing existing infections from children with chlorproguanil-dapsone and then taking weekly blood samples. Prevalence of malaria infection, fever, anaemia and splenomegaly was measured in children of different age groups. All these measurements were made in highland and lowland areas of Tanzania before and after provision of bednets treated with alphacypermethrin. RESULTS: Entomological inoculation rates (EIR) were about 17 times greater in a lowland than a highland area, but incidence of infection differed only by a factor of about 2.5. Malaria morbidity was significantly less prevalent in the highlands than in the lowlands. Treated nets in the highlands and lowlands led to a 69–75% reduction in EIR. Malaria morbidity showed a significant decline in younger children at both altitudes after introduction of treated nets; in children aged 6–12 the decline was only significant in the highlands. CONCLUSIONS: There was no evidence that the health benefits to young children due to the nets in the lowlands were "paid for" by poorer health later in life. Our data support the idea of universal provision of treated nets, not a focus on areas of natural hypo-endemicity.
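For context, the entomological inoculation rate reported in the results combines mosquito biting intensity with infectivity; a standard formulation (supplied here for orientation, not quoted from the paper) is

```latex
\mathrm{EIR} \;=\; ma \times s ,
```

where $ma$ is the human biting rate (bites per person per night, e.g. estimated from light-trap catches) and $s$ is the sporozoite rate (the fraction of mosquitoes testing ELISA-positive); multiplying the nightly value by 365 gives an annual EIR.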
Non-equilibrium dynamics and entropy production in the human brain
Living systems operate out of thermodynamic equilibrium at small scales, consuming energy and producing entropy in the environment in order to perform molecular and cellular functions. However, it remains unclear whether non-equilibrium dynamics manifest at macroscopic scales, and if so, how such dynamics support higher-order biological functions. Here we present a framework to probe for non-equilibrium dynamics by quantifying entropy production in macroscopic systems. We apply our method to the human brain, an organ whose immense metabolic consumption drives a diverse range of cognitive functions. Using whole-brain imaging data, we demonstrate that the brain fundamentally operates out of equilibrium at large scales. Moreover, we find that the brain produces more entropy, operating further from equilibrium, when performing physically and cognitively demanding tasks. By simulating an Ising model, we show that macroscopic non-equilibrium dynamics can arise from asymmetries in the interactions at the microscale. Together, these results suggest that non-equilibrium dynamics are vital for cognition, and provide a general tool for quantifying the non-equilibrium nature of macroscopic systems.
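A minimal sketch of one common way to quantify broken detailed balance from a discrete-state time series: estimate entropy production as the KL-divergence rate between forward and time-reversed transition probabilities. The discretization, estimator, and NumPy implementation are illustrative assumptions, not necessarily the authors' exact pipeline.

```python
# Hypothetical sketch: entropy production estimated from the asymmetry of
# empirical transitions in a discrete-state trajectory. The estimator is
# zero when forward and reverse transitions balance (detailed balance).
import numpy as np

def entropy_production_rate(states: np.ndarray, n_states: int) -> float:
    """KL-divergence estimate of entropy production (nats per time step)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    joint = counts / counts.sum()  # empirical P(x_t = i, x_{t+1} = j)
    rate = 0.0
    for i in range(n_states):
        for j in range(n_states):
            if joint[i, j] > 0 and joint[j, i] > 0:
                rate += joint[i, j] * np.log(joint[i, j] / joint[j, i])
    return rate

rng = np.random.default_rng(0)

# Driven 3-state cycle: steps forward with probability 0.8, backward with 0.2.
s, driven = 0, [0]
for _ in range(20000):
    s = (s + (1 if rng.random() < 0.8 else -1)) % 3
    driven.append(s)

print(entropy_production_rate(np.array(driven), 3))            # clearly positive
print(entropy_production_rate(rng.integers(0, 3, 20001), 3))   # near zero (reversible)
```

The driven cycle breaks detailed balance at the micro level, so the estimator returns a positive rate, mirroring the paper's point that microscale asymmetries can produce macroscopic irreversibility.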
Large-Area, High Spatial Resolution Land Cover Mapping Using Random Forests, GEOBIA, and NAIP Orthophotography: Findings and Recommendations
Despite the need for quality land cover information, large-area, high spatial resolution land cover mapping has proven to be a difficult task for a variety of reasons, including large data volumes, the complexity of developing training and validation datasets, data availability, and heterogeneity in data and landscape conditions. We investigate the use of geographic object-based image analysis (GEOBIA), random forest (RF) machine learning, and National Agriculture Imagery Program (NAIP) orthophotography for mapping general land cover across the entire state of West Virginia, USA, an area of roughly 62,000 km². We obtained an overall accuracy of 96.7% and a Kappa statistic of 0.886 using a combination of NAIP orthophotography and ancillary data. Despite the high overall classification accuracy, some classes were difficult to differentiate, as highlighted by the low user's and producer's accuracies for the barren, impervious, and mixed developed classes. In contrast, forest, low vegetation, and water were generally mapped accurately. The inclusion of ancillary data and first- and second-order textural measures generally improved classification accuracy, whereas band indices and object geometric measures were less valuable. Including super-object attributes improved the classification slightly; however, this increased the computational time and complexity. From the findings of this research and previous studies, recommendations are provided for mapping large spatial extents.
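A minimal sketch of the object-based workflow described above, starting from a table of per-object attributes and reporting overall, user's, and producer's accuracy. The segmentation step is assumed to happen upstream; the feature names, class list, random stand-in data, and scikit-learn/pandas usage are illustrative assumptions, not the study's actual inputs.

```python
# Hypothetical sketch: random forest classification of image-objects using
# per-object features (spectral means, texture, geometry, ancillary layers),
# with per-class user's and producer's accuracies from the confusion matrix.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000  # image-objects exported from an upstream GEOBIA segmentation
objects = pd.DataFrame({
    "mean_red": rng.normal(size=n), "mean_nir": rng.normal(size=n),  # spectral means
    "glcm_contrast": rng.normal(size=n),                              # texture
    "area": rng.normal(size=n), "compactness": rng.normal(size=n),    # geometry
    "lidar_height": rng.normal(size=n),                               # ancillary layer
})
labels = rng.integers(0, 6, n)  # stand-in for forest, low vegetation, water, barren, impervious, mixed developed

X_train, X_test, y_train, y_test = train_test_split(objects, labels, test_size=0.3,
                                                    stratify=labels, random_state=0)
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0).fit(X_train, y_train)

cm = confusion_matrix(y_test, rf.predict(X_test))   # rows: reference, columns: mapped
producers = np.diag(cm) / cm.sum(axis=1)            # omission-error complement, per reference class
users = np.diag(cm) / cm.sum(axis=0)                # commission-error complement, per mapped class
print("overall accuracy:", np.trace(cm) / cm.sum())
print("producer's accuracy by class:", np.round(producers, 3))
print("user's accuracy by class:", np.round(users, 3))
```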