165 research outputs found

    Two-Stage Bagging Pruning for Reducing the Ensemble Size and Improving the Classification Performance

    Ensemble methods, such as the traditional bagging algorithm, can usually improve the performance of a single classifier. However, they usually require large storage space and relatively time-consuming predictions. Many approaches have been developed to reduce the ensemble size and improve classification performance by pruning the traditional bagging algorithm. In this article, we propose a two-stage strategy to prune the traditional bagging algorithm by combining two simple approaches: accuracy-based pruning (AP) and distance-based pruning (DP). These two methods, as well as their two combinations, “AP+DP” and “DP+AP”, as the two-stage pruning strategy, were all examined. Compared with the single pruning methods, the two-stage pruning methods further reduce the ensemble size and improve the classification performance. The “AP+DP” method generally performs better than the “DP+AP” method when using four base classifiers: decision tree, Gaussian naive Bayes, K-nearest neighbor, and logistic regression. Moreover, compared to traditional bagging, the two-stage method “AP+DP” improved the classification accuracy by 0.88%, 4.06%, 1.26%, and 0.96%, respectively, under the four base classifiers, averaged over 28 datasets. “AP+DP” also outperformed three other existing algorithms, Brag, Nice, and TB, assessed on 8 common datasets. In summary, the proposed two-stage pruning methods are simple and promising approaches that can both reduce the ensemble size and improve the classification accuracy.
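
The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not define the AP and DP criteria, so both are assumptions here: AP keeps the members with the highest validation accuracy, and DP keeps the members whose predictions are closest (in Hamming distance) to the majority vote of the current ensemble. The function names, the `keep` parameters, and the toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter

def majority_vote(pred_rows):
    """Column-wise majority vote over equal-length prediction lists."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*pred_rows)]

def accuracy(pred, truth):
    """Fraction of positions where the prediction matches the label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def accuracy_prune(preds, truth, keep):
    """AP (assumed form): keep the `keep` most accurate members."""
    ranked = sorted(preds, key=lambda m: accuracy(preds[m], truth), reverse=True)
    return {m: preds[m] for m in ranked[:keep]}

def distance_prune(preds, keep):
    """DP (assumed form): keep the `keep` members closest to the ensemble vote."""
    vote = majority_vote(list(preds.values()))
    def hamming(m):
        return sum(p != v for p, v in zip(preds[m], vote))
    ranked = sorted(preds, key=hamming)
    return {m: preds[m] for m in ranked[:keep]}

# Toy validation labels and per-member predictions (made up for illustration).
truth = [1, 0, 1, 1, 0, 0]
preds = {
    "m0": [1, 0, 1, 1, 0, 0],
    "m1": [1, 0, 1, 0, 0, 0],
    "m2": [0, 0, 1, 1, 0, 1],
    "m3": [0, 1, 0, 1, 1, 0],
    "m4": [1, 1, 1, 1, 0, 0],
}

# "AP+DP": accuracy-based pruning first, then distance-based pruning.
stage1 = accuracy_prune(preds, truth, keep=4)
stage2 = distance_prune(stage1, keep=3)
pruned_vote = majority_vote(list(stage2.values()))
print(sorted(stage2), accuracy(pruned_vote, truth))
```

Swapping the two calls gives the “DP+AP” ordering. In practice the prediction lists would come from base classifiers trained on bootstrap samples.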

    Changes of Soil Biogeochemistry under Native and Exotic Plants Species

    Invasive plant species are major threats to biodiversity and ecosystem stability. The purpose of this study is to understand the impacts of invasive plants on soil nutrient cycling and ecological functions. Soil samples were collected from the rhizosphere and non-rhizosphere of both native and exotic plants from three genera, Lantana, Ficus and Schinus, at Tree Tops Park in South Florida, USA. Experimental results showed that the cultivable bacterial population in the soil under Brazilian pepper (invasive Schinus) was approximately ten times greater than under all other plants. Also, Brazilian pepper grew under conditions of significantly lower available phosphorus but higher phosphatase activity than at the other sampled sites. Moreover, the respiration rates and soil macronutrients in rhizosphere soils of exotic plants were significantly higher than those of the natives (Phosphorus, p=0.034; Total Nitrogen, p=0.0067; Total Carbon, p=0.0243). Overall, the soil biogeochemical status under invasive plants differed from that under the natives.

    Bayesian Inference using the Proximal Mapping: Uncertainty Quantification under Varying Dimensionality

    In statistical applications, it is common to encounter parameters supported on a varying or unknown dimensional space. Examples include the fused lasso regression, the matrix recovery under an unknown low rank, etc. Despite the ease of obtaining a point estimate via the optimization, it is much more challenging to quantify their uncertainty -- in the Bayesian framework, a major difficulty is that if assigning the prior associated with a p-dimensional measure, then there is zero posterior probability on any lower-dimensional subset with dimension d < p; to avoid this caveat, one needs to choose another dimension-selection prior on d, which often involves a highly combinatorial problem. To significantly reduce the modeling burden, we propose a new generative process for the prior: starting from a continuous random variable such as multivariate Gaussian, we transform it into a varying-dimensional space using the proximal mapping. This leads to a large class of new Bayesian models that can directly exploit the popular frequentist regularizations and their algorithms, such as the nuclear norm penalty and the alternating direction method of multipliers, while providing a principled and probabilistic uncertainty estimation. We show that this framework is well justified in the geometric measure theory, and enjoys a convenient posterior computation via the standard Hamiltonian Monte Carlo. We demonstrate its use in the analysis of the dynamic flow network data. Comment: 26 pages, 4 figures
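
As a rough illustration of the generative process described in this abstract, the sketch below pushes a standard Gaussian draw through the proximal mapping of an l1 penalty, which is coordinate-wise soft-thresholding. The choice of the l1 norm, the names `soft_threshold` and `sample_prior`, and the values of `p` and `lam` are assumptions made for the example; the paper's own construction may differ.

```python
import math
import random

def soft_threshold(beta, lam):
    """Proximal mapping of lam * ||.||_1: shrink each coordinate toward zero by lam."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta]

def sample_prior(p, lam, rng):
    """Draw a p-dimensional standard Gaussian, then apply the proximal mapping."""
    beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return soft_threshold(beta, lam)

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_prior(5, 1.0, rng) for _ in range(1000)]

# Number of nonzero coordinates per draw: the effective dimension varies.
nonzeros = [sum(x != 0.0 for x in d) for d in draws]
print(sorted(set(nonzeros)))
```

Coordinates with magnitude below `lam` are mapped exactly to zero, so the transformed variable puts positive probability on lower-dimensional subsets even though the starting variable is continuous.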

    Streamlining Social Media Information Retrieval for Public Health Research with Deep Learning

    The utilization of social media in epidemic surveillance has been well established. Nonetheless, bias is often introduced when pre-defined lexicons are used to retrieve a relevant corpus. This study introduces a framework aimed at curating extensive dictionaries of medical colloquialisms and Unified Medical Language System (UMLS) concepts. The framework comprises three modules: a BERT-based Named Entity Recognition (NER) model that identifies medical entities from social media content, a deep-learning powered normalization module that standardizes the extracted entities, and a semi-supervised clustering module that assigns the most probable UMLS concept to each standardized entity. We applied this framework to COVID-19-related tweets from February 1, 2020, to April 30, 2022, generating a symptom dictionary (available at https://github.com/ningkko/UMLS_colloquialism/) composed of 9,249 standardized entities mapped to 876 UMLS concepts and 38,175 colloquial expressions. This framework demonstrates encouraging potential in addressing the constraints of keyword matching information retrieval in social media-based public health research. Comment: Accepted to ICHI 2023 (The 11th IEEE International Conference on Healthcare Informatics) as a poster presentation

    Effect of germination on nutritional quality of soybean

    Soybeans are rich in functional nutrients such as protein, essential amino acids, polyunsaturated fatty acids, minerals, vitamins and dietary fibre, which can help maintain optimal body weight and prevent Alzheimer's disease and cardiovascular disease. At the same time, they also contain some anti-nutritional factors that have adverse effects on the digestion, absorption and utilization of nutrients. Cellular and metabolic events are induced by water absorption during seed germination. After absorbing water, dry soybean seeds begin to mobilize organelles and enzyme activities, repair and activate physiological and biochemical processes, and start basic metabolism. After that, by controlling the key factors of metabolism, the cells complete the processes of material transformation, energy metabolism, ROS balance, cell wall acidification and loosening, hypocotyl elongation, etc., and finish germination. As a result, cell metabolism causes significant changes in proteins, lipids, sugars, vitamins and minerals: some functional nutritional factors beneficial to humans and animals increase, while anti-nutritional factors decrease markedly, which greatly improves the taste quality and economic value of soybean. This paper introduces the germination physiology, functional nutrients and their metabolic pathways during soybean germination. It provides an important reference for the study of the theory and practical technology of soybean germination.

    Improvement of hydrothermal stability of zeolitic imidazolate frameworks

    The metal-organic framework ZIF-8, which undergoes hydrolysis under hydrothermal conditions, is endowed with high water-resistance after a shell-ligand-exchange-reaction. The stabilized ZIF-8 retains its structural characteristics with improved application performances in adsorption and membrane separation. © 2013 The Royal Society of Chemistry