
    Delineation of high resolution climate regions over the Korean Peninsula using machine learning approaches

    In this research, climate classification maps over the Korean Peninsula at 1 km resolution were generated from satellite-based monthly temperature and precipitation variables using machine learning approaches. Random forest (RF), artificial neural networks (ANN), k-nearest neighbor (KNN), logistic regression (LR), and support vector machines (SVM) were used to develop models. Training and validation of these models were conducted using in-situ observations from the Korea Meteorological Administration (KMA) from 2001 to 2016. The rules of the traditional Köppen-Geiger (K-G) climate classification were used to classify climate regions. The input variables were land surface temperature (LST) from the Moderate Resolution Imaging Spectroradiometer (MODIS), monthly precipitation data from the Tropical Rainfall Measuring Mission (TRMM) 3B43 product, and the Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM). The overall accuracy (OA) based on validation data from 2001 to 2016 exceeded 95% for all models. DEM and minimum winter temperature were two distinct variables over the study area with particularly high relative importance. ANN produced a more realistic spatial distribution of the classified climates despite having a slightly lower OA than the others. The accuracy of the models using high-altitude in-situ data of the Mountain Meteorology Observation System (MMOS) was also assessed. Although the MMOS record was relatively short (2013 to 2017), it showed that the snowy, dry and cold winter and cool summer class (Dwc) is widely located in the eastern coastal region of South Korea. Temporal shifting of climate was examined through a comparison of climate maps produced by period: from 1950 to 2000, from 1983 to 2000, and from 2001 to 2013. A shrinking trend of snow classes (D) over the Korean Peninsula was clearly observed from the ANN-based climate classification results. Shifting trends of climate, with a decrease in snow (D) classes and an increase in temperate (C) classes, were clearly shown in the maps produced using the proposed approaches, consistent with the results from the reanalysis data of the Climatic Research Unit (CRU) and Global Precipitation Climatology Centre (GPCC).
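The temperate (C) and snow (D) classes discussed in this abstract are separated in the Köppen-Geiger scheme mainly by the mean temperature of the coldest month. The following is a minimal sketch of that thermal split only, not the authors' code: it assumes twelve monthly mean temperatures in °C, skips the arid (B) test and all precipitation sub-classes such as Dwc, and uses a hypothetical function name. The C/D threshold is left as a parameter because published variants of the scheme use either -3 °C or 0 °C.

```python
def thermal_group(monthly_temp_c, cd_threshold_c=-3.0):
    """Return a simplified Köppen-Geiger main group (A, C, D or E).

    Assumption: only the temperature criteria are applied; the arid (B)
    group and precipitation-based sub-classes are deliberately ignored.
    """
    if len(monthly_temp_c) != 12:
        raise ValueError("expected 12 monthly mean temperatures")
    t_hot = max(monthly_temp_c)   # warmest-month mean temperature
    t_cold = min(monthly_temp_c)  # coldest-month mean temperature
    if t_hot < 10.0:
        return "E"  # polar: no month reaches 10 °C
    if t_cold >= 18.0:
        return "A"  # tropical: every month is at least 18 °C
    if t_cold > cd_threshold_c:
        return "C"  # temperate: coldest month above the C/D threshold
    return "D"      # snow/cold: coldest month at or below the threshold


# Hypothetical station with a coldest month of -2.4 °C.
station = [-2.4, 0.4, 5.7, 12.5, 17.8, 22.2, 24.9, 25.7, 21.2, 14.8, 7.2, 0.4]
print(thermal_group(station))                      # threshold -3 °C
print(thermal_group(station, cd_threshold_c=0.0))  # threshold 0 °C
```

A station whose coldest month sits between the two thresholds changes class depending on which variant is used, which is one reason a small winter warming can move a grid cell from D to C.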

    Hospitals, Managed Care, and the Charity Caseload in California

    Many observers have blamed HMOs for increasing financial pressures on private hospitals and causing them to cut back on the provision of charity care. We examine this issue using data on all hospital discharges in California between 1988 and 1996. We find that public hospitals in counties with higher HMO penetration do take on a larger share of the county's charity caseload. However, these public hospitals also take on larger shares of most other types of patients. At the hospital level, we find little evidence that either for-profit or non-profit private hospitals respond to HMO penetration by turning away uninsured and Medicaid patients. On the contrary, in the for-profit sector higher HMO penetration is linked to reductions in the share of privately insured patients in the caseload, and corresponding increases in the share of Medicare patients and Medicaid births. Since HMO penetration reduces the price paid by privately insured patients, they may be less attractive to for-profit hospitals relative to the publicly insured.

    Universality in protein residue networks

    Residue networks representing 595 nonhomologous proteins are studied. These networks exhibit universal topological characteristics as they belong to the topological class of modular networks formed by several highly interconnected clusters separated by topological cavities. There are some networks which tend to deviate from this universality. These networks represent small-size proteins having fewer than 200 residues. We explain such differences in terms of the domain structure of these proteins. On the other hand, we find that the topological cavities characterizing protein residue networks match very well with protein binding sites. We then investigate the effect of the cutoff value used in building the residue network. For small cutoff values, less than 5 Å, the cavities found are very large, corresponding almost to the whole protein surface. On the contrary, for large cutoff values, more than 10.0 Å, only very large cavities are detected and the networks look very homogeneous. These findings are useful for practical purposes as well as for identifying "protein-like" complex networks. Finally, we show that the main topological class of residue networks is not reproduced by random networks growing according to the Erdős-Rényi model or the preferential attachment method of Barabási-Albert. However, the Watts-Strogatz (WS) model reproduces very well the topological class as well as other topological properties of residue networks. We propose here a more biologically appealing modification of the WS model to describe residue networks.
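The role of the cutoff value can be illustrated with a minimal sketch of how a residue network is commonly built: two residues are linked when their representative atoms lie within the cutoff distance. This is not the authors' implementation. It assumes one 3D coordinate per residue (for example the C-alpha atom) given in Å, and the function names are hypothetical.

```python
import math


def residue_network(coords, cutoff):
    """Build an undirected contact network as an adjacency dict.

    Assumption: coords holds one (x, y, z) tuple per residue, in Å, and two
    residues are connected when their distance is at most `cutoff`.
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def edge_count(adjacency):
    """Count undirected edges (each edge is stored twice in the dict)."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


# Toy chain of four residues spaced 3.8 Å apart along one axis.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for cutoff in (5.0, 8.0, 12.0):
    print(cutoff, edge_count(residue_network(chain, cutoff)))
```

Raising the cutoff only ever adds edges, so the network becomes denser and more homogeneous as the cutoff grows, in line with the behavior the abstract reports for large cutoff values.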

    Principal variable selection to explain grain yield variation in winter wheat from features extracted from UAV imagery

    Background: Automated phenotyping technologies are continually advancing the breeding process. However, collecting various secondary traits throughout the growing season and processing massive amounts of data still take great effort and time. Selecting a minimum number of secondary traits that have the maximum predictive power has the potential to reduce phenotyping efforts. The objective of this study was to select principal features extracted from UAV imagery and critical growth stages that contributed the most to explaining winter wheat grain yield. Five dates of multispectral images and seven dates of RGB images were collected by a UAV system during the spring growing season in 2018. Two classes of features (variables), totaling 172 variables, were extracted for each plot from the vegetation index and plant height maps, including pixel statistics and dynamic growth rates. A parametric algorithm, LASSO regression (the least absolute shrinkage and selection operator), and a non-parametric algorithm, random forest, were applied for variable selection. The regression coefficients estimated by LASSO and the permutation importance scores provided by random forest were used to determine the ten most important variables influencing grain yield from each algorithm. Results: Both selection algorithms assigned the highest importance score to the variables related to plant height around the grain filling stage. Some variables related to vegetation indices were also selected by the algorithms, mainly at early to mid growth stages and during senescence. Compared with the yield prediction using all 172 variables derived from measured phenotypes, prediction using the selected variables performed comparably or even better. We also noticed that the prediction accuracy on the adapted NE lines (r = 0.58–0.81) was higher than on the other lines (r = 0.21–0.59) included in this study with different genetic backgrounds.
    Conclusions: With the ultra-high resolution plot imagery obtained by UAS-based phenotyping, we are now able to derive more features, such as the variation of plant height or vegetation indices within a plot rather than just an averaged number, that are potentially very useful for breeding purposes. However, too many features or variables can be derived in this way. The promising results from this study suggest that a selected subset of those variables can achieve grain yield prediction accuracies comparable to those of the full set, while possibly allowing a better allocation of effort and resources in phenotypic data collection and processing.
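Permutation importance, which the abstract uses with random forest, can be sketched in a model-agnostic way: shuffle one variable at a time and record how much the prediction error grows. The version below is an illustrative assumption, not the study's pipeline. It uses plain Python, mean squared error as the score, a stand-in `predict` function instead of a trained random forest, and hypothetical feature names.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn.

    Assumption: X is a list of rows (lists of floats) and predict maps one
    row to one prediction. Larger values mean the model relies more on
    that variable.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: "yield" depends only on the first (hypothetical) feature.
feature_names = ["plant_height", "vegetation_index"]
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
scores = permutation_importance(lambda row: 2.0 * row[0], X, y)
for name, score in sorted(zip(feature_names, scores), key=lambda p: -p[1]):
    print(name, score)
```

Ranking variables by this score and keeping the top few mirrors the selection of the ten most important variables described above, although a real analysis would score a fitted model on held-out plots.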

    Quantitative measures of crowd patterns in agent-based models of street protests

    In this work we describe the introduction of quantitative measures of emergent crowd patterns in an existing Agent-Based model (ABM) of street protests with multiple actors (police, protester and ‘media’ agents). The model was applied to a scenario of a police force defending a government building which protesters seek to invade. The improved model provided a coherent ‘narrative’ of the simulations and highlighted the realistic and unrealistic aspects of the agents’ interactions. Two new types of police agents – ‘defensive’ and ‘offensive’ – were introduced, leading to a realistic model representation of police cordons defending a site or charging to disperse clusters of violent protesters. The new quantitative measures provided information on cluster size and orientation of clusters of violent protesters, as well as police coverage and protester breaching of the defensive perimeter, together with the time history of the bursts of localized fights and arrests. It was shown how the quantitative measures of the emergent properties can be used for both parameterization and validation of the model.

    Dynamic User Segmentation and Usage Profiling

    Usage data of a group of users distributed across a number of categories, such as songs, movies, webpages, links, regular household products, mobile apps, games, etc., can be ultra-high dimensional and massive in size. More often than not, this kind of data is categorical and sparse in nature, making it even more difficult to interpret any underlying hidden patterns such as clusters of users. However, if this information can be estimated accurately, it will have a large impact in different business areas such as user recommendations for apps, songs, movies, and other similar products, health analytics using electronic health record (EHR) data, and driver profiling for insurance premium estimation or fleet management. In this work, we propose a clustering strategy for such categorical big data, utilizing the hidden sparsity of the dataset. Most traditional clustering methods fail to give proper clusters for such data and end up giving one big cluster with small clusters around it, irrespective of the true structure of the data clusters. We propose a feature transformation, which maps the binary-valued usage vector to a lower dimensional continuous feature space in terms of groups of usage categories, termed covariate classes. The lower dimensional feature representations in terms of covariate classes can be used for clustering. We implemented the proposed strategy and applied it to a large, very high-dimensional song playlist dataset for performance validation. The results are encouraging, as we achieved similar-sized user clusters with minimal between-cluster overlap in the feature space (8% on average). As the proposed strategy has a very generic framework, it can be utilized as the analytic engine of many of the above-mentioned business use cases, allowing an intelligent and dynamic personal recommendation system or a support system for smart business decision-making.
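The abstract does not spell out the feature transformation, so the following is only one plausible reading, offered as a minimal sketch. It assumes the covariate classes are already given as groups of category indices, and it maps a binary usage vector to the fraction of categories used within each group. The function name and the grouping are hypothetical.

```python
def covariate_class_features(usage, groups):
    """Map a binary usage vector to one continuous feature per group.

    Assumption: each feature is the share of the group's categories that
    the user has used, giving a value in [0, 1] per covariate class.
    """
    features = []
    for members in groups:
        if not members:
            raise ValueError("each covariate class needs at least one category")
        features.append(sum(usage[i] for i in members) / len(members))
    return features


# Six categories (e.g. songs) collapsed into three hypothetical classes.
usage = [1, 0, 1, 1, 0, 0]
groups = [[0, 1], [2, 3], [4, 5]]
print(covariate_class_features(usage, groups))
```

The output has one dimension per covariate class rather than one per category, which is what makes a standard clustering method applicable to the transformed data.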