165 research outputs found
Two-Stage Bagging Pruning for Reducing the Ensemble Size and Improving the Classification Performance
Ensemble methods, such as the traditional bagging algorithm, can usually improve the performance of a single classifier. However, they usually require large storage space as well as relatively time-consuming predictions. Many approaches have been developed to reduce the ensemble size and improve the classification performance by pruning the traditional bagging algorithm. In this article, we propose a two-stage strategy to prune the traditional bagging algorithm by combining two simple approaches: accuracy-based pruning (AP) and distance-based pruning (DP). These two methods, as well as their two combinations, “AP+DP” and “DP+AP”, which form the two-stage pruning strategy, were all examined. Compared with the single pruning methods, we found that the two-stage pruning methods can further reduce the ensemble size and improve the classification accuracy. The “AP+DP” method generally performs better than the “DP+AP” method when using four base classifiers: decision tree, Gaussian naive Bayes, K-nearest neighbor, and logistic regression. Moreover, compared to traditional bagging, the two-stage method “AP+DP” improved the classification accuracy by 0.88%, 4.06%, 1.26%, and 0.96%, respectively, averaged over 28 datasets under the four base classifiers. It was also observed that “AP+DP” outperformed three other existing algorithms, Brag, Nice, and TB, assessed on 8 common datasets. In summary, the proposed two-stage pruning methods are simple and promising approaches, which can both reduce the ensemble size and improve the classification accuracy.
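The two-stage idea in the abstract above can be illustrated with a minimal sketch. The abstract does not define the AP or DP criteria, so the rules below are assumptions made only for illustration: AP is taken to keep the most accurate fraction of ensemble members on a validation set, and DP is taken to drop any member whose validation predictions lie within a normalized Hamming distance of a member already kept. The function names and the parameters `keep_fraction` and `min_distance` are hypothetical and are not taken from the paper.

```python
def accuracy(pred, y):
    """Fraction of validation labels that a member predicts correctly."""
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def accuracy_prune(preds, y, keep_fraction=0.5):
    """Assumed AP rule: keep the most accurate fraction of members."""
    ranked = sorted(range(len(preds)),
                    key=lambda i: accuracy(preds[i], y),
                    reverse=True)
    n_keep = max(1, round(keep_fraction * len(preds)))
    return sorted(ranked[:n_keep])


def hamming(a, b):
    """Normalized Hamming distance between two prediction vectors."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_prune(preds, indices, min_distance=0.1):
    """Assumed DP rule: greedily keep members that differ enough
    from every member already kept."""
    kept = []
    for i in indices:
        if all(hamming(preds[i], preds[j]) >= min_distance for j in kept):
            kept.append(i)
    return kept


def two_stage_prune(preds, y, keep_fraction=0.5, min_distance=0.1):
    """'AP+DP' ordering: accuracy-based pruning first, then distance-based."""
    stage_one = accuracy_prune(preds, y, keep_fraction)
    return distance_prune(preds, stage_one, min_distance)


# Toy validation labels and per-member predictions (made-up data).
y_val = [0, 1, 0, 1, 1, 0]
member_preds = [
    [0, 1, 0, 1, 1, 0],  # perfect
    [0, 1, 0, 1, 1, 0],  # duplicate of member 0
    [0, 1, 0, 1, 0, 0],  # one mistake
    [1, 0, 1, 0, 0, 1],  # always wrong
    [1, 1, 1, 1, 1, 1],  # constant prediction
    [0, 0, 0, 0, 0, 0],  # constant prediction
]
kept_members = two_stage_prune(member_preds, y_val)
print(kept_members)
```

Swapping the order of the two calls inside `two_stage_prune` would give the “DP+AP” variant under the same assumed rules.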
Changes of Soil Biogeochemistry under Native and Exotic Plants Species
Invasive plant species are major threats to biodiversity and ecosystem stability. The purpose of this study is to understand the impacts of invasive plants on soil nutrient cycling and ecological functions. Soil samples were collected from the rhizosphere and non-rhizosphere of both native and exotic plants from three genera, Lantana, Ficus and Schinus, at Tree Tops Park in South Florida, USA. Experimental results showed that the cultivable bacterial population in the soil under Brazilian pepper (invasive Schinus) was approximately ten times greater than under all other plants. Also, Brazilian pepper grew in soil with significantly lower available phosphorus but higher phosphatase activity than the other sampled sites. Moreover, the respiration rates and soil macronutrients in rhizosphere soils of exotic plants were significantly higher than those of the natives (Phosphorus, p=0.034; Total Nitrogen, p=0.0067; Total Carbon, p=0.0243). Overall, the soil biogeochemical status under invasive plants differed from that under native plants.
Bayesian Inference using the Proximal Mapping: Uncertainty Quantification under Varying Dimensionality
In statistical applications, it is common to encounter parameters supported
on a varying or unknown dimensional space. Examples include the fused lasso
regression, the matrix recovery under an unknown low rank, etc. Despite the
ease of obtaining a point estimate via the optimization, it is much more
challenging to quantify their uncertainty. In the Bayesian framework, a major
difficulty is that if the prior is associated with a p-dimensional
measure, then there is zero posterior probability on any lower-dimensional
subset with dimension d < p; to avoid this caveat, one needs to choose another
dimension-selection prior on d, which often involves a highly combinatorial
problem. To significantly reduce the modeling burden, we propose a new
generative process for the prior: starting from a continuous random variable
such as multivariate Gaussian, we transform it into a varying-dimensional space
using the proximal mapping.
This leads to a large class of new Bayesian models that can directly exploit
the popular frequentist regularizations and their algorithms, such as the
nuclear norm penalty and the alternating direction method of multipliers, while
providing a principled and probabilistic uncertainty estimation.
We show that this framework is well justified in the geometric measure
theory, and enjoys a convenient posterior computation via the standard
Hamiltonian Monte Carlo. We demonstrate its use in the analysis of the dynamic
flow network data.
Comment: 26 pages, 4 figures
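The generative process described in this abstract (a continuous random variable pushed through a proximal mapping) can be illustrated with a small sketch. The abstract does not say which proximal mapping is used in a given model, so this example assumes the simplest one: the proximal mapping of the penalty lam * ||x||_1, which is elementwise soft-thresholding. The names `soft_threshold`, `sample_prior` and `effective_dimension` are hypothetical. The point is only that a Gaussian draw mapped this way lands on exact zeros with positive probability, so the number of nonzero coordinates varies from draw to draw.

```python
import math
import random


def soft_threshold(beta, lam):
    """Proximal mapping of lam * ||x||_1: shrink each coordinate toward
    zero by lam and set it to exactly zero if |b| <= lam."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta]


def sample_prior(p, lam, rng):
    """One prior draw: standard Gaussian vector in R^p, then the
    (assumed) L1 proximal mapping."""
    beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return soft_threshold(beta, lam)


def effective_dimension(theta):
    """Number of coordinates that are not exactly zero."""
    return sum(1 for t in theta if t != 0.0)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_prior(p=10, lam=1.0, rng=rng) for _ in range(5)]
dimensions = [effective_dimension(theta) for theta in draws]
print(dimensions)
```

Other penalties named in the abstract, such as the nuclear norm, have their own proximal mappings; this sketch does not cover them.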
Streamlining Social Media Information Retrieval for Public Health Research with Deep Learning
The utilization of social media in epidemic surveillance has been well
established. Nonetheless, bias is often introduced when pre-defined lexicons
are used to retrieve the relevant corpus. This study introduces a framework aimed
at curating extensive dictionaries of medical colloquialisms and Unified
Medical Language System (UMLS) concepts. The framework comprises three modules:
a BERT-based Named Entity Recognition (NER) model that identifies medical
entities from social media content, a deep-learning powered normalization
module that standardizes the extracted entities, and a semi-supervised
clustering module that assigns the most probable UMLS concept to each
standardized entity. We applied this framework to COVID-19-related tweets from
February 1, 2020, to April 30, 2022, generating a symptom dictionary (available
at https://github.com/ningkko/UMLS_colloquialism/) composed of 9,249
standardized entities mapped to 876 UMLS concepts and 38,175 colloquial
expressions. This framework demonstrates encouraging potential in addressing
the constraints of keyword matching information retrieval in social media-based
public health research.
Comment: Accepted to ICHI 2023 (The 11th IEEE International Conference on
Healthcare Informatics) as a poster presentation
Effect of germination on nutritional quality of soybean
Soybeans are rich in functional nutrients such as protein, essential amino acids, polyunsaturated fatty acids, minerals, vitamins and dietary fibre, which can help maintain optimal body weight and prevent Alzheimer's disease and cardiovascular disease. At the same time, they also contain some anti-nutritional factors, which have adverse effects on the digestion, absorption and utilization of nutrients. Cellular and metabolic events are induced by water absorption during seed germination. After absorbing water, dry soybean seeds begin to mobilize organelles and enzyme activities, repair and activate physiological and biochemical processes, and start basic metabolism. After that, by controlling the key factors of metabolism, the cells complete the processes of material transformation, energy metabolism, ROS balance, cell wall acidification and loosening, hypocotyl elongation, etc., and finish germination. As a result, cell metabolism causes significant changes in proteins, lipids, sugars, vitamins and minerals: some functional nutritional factors that are beneficial for humans and animals increase, while the anti-nutritional factors decrease markedly, which greatly improves the taste quality and economic value of soybean. This paper introduces the germination physiology, the functional nutrients and their metabolic pathways during soybean germination. It provides an important reference for the theory and practical technology of soybean germination.
Improvement of hydrothermal stability of zeolitic imidazolate frameworks
The metal-organic framework ZIF-8, which undergoes hydrolysis under hydrothermal conditions, is endowed with high water-resistance after a shell-ligand-exchange-reaction. The stabilized ZIF-8 retains its structural characteristics with improved application performances in adsorption and membrane separation. © 2013 The Royal Society of Chemistry