Robust distance correlation for variable screening
High-dimensional data are common in modern statistical applications, and
variable selection methods play an indispensable role in identifying the critical
features for scientific discoveries. Traditional best subset selection methods
are computationally intractable with a large number of features, while
regularization methods such as Lasso, SCAD and their variants perform poorly on
ultrahigh-dimensional data because of low computational efficiency and
algorithmic instability. Sure screening methods have become popular alternatives: they first
rapidly reduce the dimension using a simple measure such as marginal
correlation and then apply a regularization method. A number of screening
methods for different models or problems have been developed; however, none of
them targets data with heavy tails, which is another
important characteristic of modern big data. In this paper, we propose a
robust distance correlation (RDC) based sure screening method to perform
screening in ultrahigh-dimensional regression with heavy-tailed data. The
proposed method shares the good properties of the original model-free
distance correlation based screening, with the additional merit of robustly
estimating the distance correlation when the data are heavy-tailed, which improves
model selection performance in screening. We conducted extensive simulations
under different scenarios of heavy tailedness to demonstrate the advantage of
our proposed procedure as compared to other existing model-based or model-free
screening procedures with improved feature selection and prediction
performance. We also applied the method to high-dimensional heavy-tailed RNA
sequencing (RNA-seq) data from The Cancer Genome Atlas (TCGA) pancreatic cancer
cohort, where RDC was shown to outperform the other methods in prioritizing the
most essential and biologically meaningful genes.
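To make the screening idea in the abstract above concrete, the following is a minimal sketch of marginal screening with the classical (non-robust) sample distance correlation. It is not the RDC estimator proposed in the paper, whose details the abstract does not give. The function names `distance_correlation` and `dc_screen` and the `keep` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered pairwise Euclidean distance matrix of a sample."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Classical sample distance correlation (V-statistic form).

    Assumption: this is the plain estimator, not a robust variant.
    """
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()                       # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def dc_screen(X, y, keep):
    """Rank columns of X by distance correlation with y; keep the top `keep`."""
    scores = np.array([distance_correlation(X[:, j], y)
                       for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1][:keep]      # indices, largest score first
    return order, scores
```

A typical use would be to call `dc_screen(X, y, keep=d)` with `d` much smaller than the number of columns and then fit a regularized model on the retained columns only. Because distance correlation is model-free, a feature that enters the response nonlinearly (for example through its square) can still receive a high score.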
Bayesian indicator variable selection of multivariate response with heterogeneous sparsity for multi-trait fine mapping
Variable selection has played a critical role in contemporary statistics
and scientific discoveries. Numerous regularization and Bayesian variable
selection methods have been developed in the past two decades,
but they mainly target only one response. As more data are
collected nowadays, it is common to obtain and analyze multiple correlated
responses from the same study. Running a separate regression for each response
ignores their correlation, so multivariate analysis is recommended. Existing
multivariate methods select variables related to all responses without
considering the possible heterogeneous sparsity of different responses, i.e.
some features may only predict a subset of responses but not the rest. In this
paper, we develop a novel Bayesian indicator variable selection method for
multivariate regression models with a large number of grouped predictors,
targeting multiple correlated responses with possibly heterogeneous sparsity
patterns. The method is motivated by the multi-trait fine mapping problem in
genetics to identify the variants that are causal to multiple related traits.
Our new method features selection at the individual level and the group level, as
well as selection specific to each response. In addition, we propose a new concept,
the subset posterior inclusion probability, for inference to prioritize predictors
that target subset(s) of responses. Extensive simulations with varying
sparsity and heterogeneity levels and dimension have shown the advantage of our
method in variable selection and prediction performance as compared to existing
general Bayesian multivariate variable selection methods and Bayesian fine
mapping methods. We also applied our method to a real data example in imaging
genetics and identified important causal variants for brain white matter
structural change in different regions.
Comment: 29 pages, 3 figures
Shear strength recovery of sand with self-healing polymeric capsules
© 2024 The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/
Self-healing approaches are increasingly being explored in various fields as a potential method to recover damaged material properties. Because they recover without external intervention, self-healing techniques are emerging as a potential solution to arrest or prevent the development of large-strain problems in soils (e.g., landslides) and other ground effects that influence the serviceability of structures (e.g., differential settlement). In this study, a microcapsule-based self-healing sand was developed, and its performance during mixing and compaction, shearing, and recovery of shear strength was demonstrated. The cargo used for sand improvement, tung oil (a hardening oil), was encapsulated in calcium alginate capsules by the ionic gelation method. The surface properties, internal structure, thermal stability and molecular structure of the capsules were evaluated by advanced material characterization techniques. The survivability of the capsules during mixing and compaction was assessed by measuring the content of tung oil released into the sand, while their influence on sand shear strength and its recovery was assessed with shear box tests. The results showed that the capsules could rupture due to movement of the sand particles, releasing the tung oil cargo, which then hardened, minimizing the sand's strain-softening response and enhancing up to 76% of the sand shear strength (at a normal stress of 10 kPa and a capsule content of 4%). This study demonstrates the potential of a capsule-based self-healing system to provide ‘smart’ autonomous soil strength recovery, and thus the potential to actively control the large-strain behavior of soils.
Peer reviewed
MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis is a long-standing research interest in the
field of opinion mining, and in recent years, researchers have gradually
shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA
tasks. However, the datasets currently used in the research are limited to
individual elements of specific tasks, usually focusing on in-domain settings,
ignoring implicit aspects and opinions, and with a small data scale. To address
these issues, we propose a large-scale Multi-Element Multi-Domain dataset
(MEMD) that covers the four elements across five domains, including nearly
20,000 review sentences and 30,000 quadruples annotated with explicit and
implicit aspects and opinions for ABSA research. Meanwhile, we evaluate
generative and non-generative baselines on multiple ABSA subtasks under the
open domain setting, and the results show that open domain ABSA as well as
mining implicit aspects and opinions remain ongoing challenges to be addressed.
The datasets are publicly released at https://github.com/NUSTM/MEMD-ABSA
Shelf Life Prediction of UHT Milk Packaging Based on BP Neural Network
To investigate the effects of initial protein and fat content and of storage temperature on the shelf life of UHT pure milk packaging, three types of UHT pure milk were used as research objects, and the browning index and protein hydrolysis index of the samples were measured experimentally during storage at 23, 30, and 37 ℃. The dataset was integrated, the specific input parameters were determined based on performance on the prediction set, and the shelf life of UHT pure milk packaging was predicted with a back-propagation (BP) neural network. The results showed that the fitting degrees of the BP neural network model for the browning index and protein hydrolysis index of UHT milk were 0.9412 and 0.9527, respectively, compared with 0.8799 and 0.9211 for a traditional multiple linear regression model. The BP neural network model with an optimized number of hidden-layer neurons therefore had higher prediction accuracy for the changes in characteristic indicators during the storage period of UHT pure milk, providing technical support for rapid and accurate prediction of the shelf life of UHT pure milk with different formulas.
Evaluation of Changes in the Characteristic Flavor of Ultra-high Temperature Sterilized Milk under the Effects of Temperature and Light
To study changes in the characteristic flavor of ultra-high temperature sterilized (UHT) milk under the influence of storage temperature and light, headspace solid phase microextraction (SPME) combined with gas chromatography-mass spectrometry (GC-MS) was used to detect the volatile flavor components of the product. Descriptive sensory evaluation, orthogonal partial least squares-discriminant analysis (OPLS-DA) and the entropy weight method were used to determine the relationship between major characteristic flavors and characteristic substances. The effects of temperature and light flux on the flavor changes of different formulations of UHT milk were analyzed, and a model for comprehensive analysis of the characteristic flavors of UHT milk was developed based on the effects of initial unsaturated fatty acid content, temperature and light flux. The results of this research provide support for the quality control of different formulations of UHT milk.
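The entropy weight method named in the abstract above can be sketched as follows. This is the standard textbook form of the method, not code from the study; the function name `entropy_weights` and the layout of the input matrix (rows are samples, columns are indicators) are assumptions made for illustration, and the values are assumed to be non-negative with a positive sum in every column.

```python
import numpy as np


def entropy_weights(scores):
    """Standard entropy weight method (illustrative sketch).

    scores: (n_samples, n_indicators) array of non-negative values,
            with n_samples >= 2 and every column sum > 0 (assumed).
    Returns one weight per indicator; the weights sum to 1.
    """
    x = np.asarray(scores, dtype=float)
    n = x.shape[0]
    p = x / x.sum(axis=0, keepdims=True)          # column-wise proportions
    # Treat 0 * log(0) as 0 so zero entries do not produce NaN.
    safe_p = np.where(p > 0, p, 1.0)
    plogp = np.where(p > 0, p * np.log(safe_p), 0.0)
    entropy = -plogp.sum(axis=0) / np.log(n)      # scaled to lie in [0, 1]
    divergence = 1.0 - entropy                    # more spread, more weight
    # Note: if every indicator is constant, divergence sums to ~0 and the
    # division below is ill-defined; callers should guard against that case.
    return divergence / divergence.sum()
```

An indicator that takes the same value for every sample has maximal entropy and therefore receives a weight of essentially zero, while an indicator whose values vary strongly across samples receives most of the weight.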
Evaluating the causal effect of tobacco smoking on white matter brain aging: a two-sample Mendelian randomization analysis in UK Biobank.
BACKGROUND AND AIMS: Tobacco smoking is a risk factor for impaired brain function, but its causal effect on white matter brain aging remains unclear. This study aimed to measure the causal effect of tobacco smoking on white matter brain aging.
DESIGN: Mendelian randomization (MR) analysis using two non-overlapping data sets (with and without neuroimaging data) from UK Biobank (UKB). The group exposed to smoking and control group consisted of current smokers and never smokers, respectively. Our main method was generalized weighted linear regression with other methods also included as sensitivity analysis.
SETTING: United Kingdom.
PARTICIPANTS: The study cohort included 23 624 subjects [10 665 males and 12 959 females with a mean age of 54.18 years, 95% confidence interval (CI) = 54.08, 54.28].
MEASUREMENTS: Genetic variants were selected as instrumental variables under the MR analysis assumptions: (1) associated with the exposure; (2) influenced outcome only via exposure; and (3) not associated with confounders. The exposure smoking status (current versus never smokers) was measured by questionnaires at the initial visit (2006-10). The other exposure, cigarettes per day (CPD), measured the average number of cigarettes smoked per day for current tobacco users over the life-time. The outcome was the 'brain age gap' (BAG), the difference between predicted brain age and chronological age, computed by training a machine learning model on a non-overlapping set of never smokers.
FINDINGS: The estimated BAG had a mean of 0.10 (95% CI = 0.06, 0.14) years. The MR analysis showed evidence of positive causal effect of smoking behaviors on BAG: the effect of smoking is 0.21 (in years, 95% CI = 6.5 × 10
CONCLUSIONS: There appears to be a significant causal effect of smoking on the brain age gap, which suggests that smoking prevention can be an effective intervention for accelerated brain aging and the age-related decline in cognitive function.
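The two quantities at the core of the abstract above can be sketched in a few lines: the brain age gap as defined there (predicted brain age minus chronological age), and a two-sample Mendelian randomization estimate from per-variant summary statistics. The estimator below is the common fixed-effect inverse-variance weighted (IVW) estimator, used here as an assumption for illustration; the abstract names "generalized weighted linear regression" as the main method and does not specify it further. All function and argument names are hypothetical.

```python
import numpy as np


def brain_age_gap(predicted_age, chronological_age):
    """BAG = predicted brain age minus chronological age (in years)."""
    return (np.asarray(predicted_age, dtype=float)
            - np.asarray(chronological_age, dtype=float))


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from summary statistics.

    beta_exposure: variant-exposure associations (one per instrument)
    beta_outcome:  variant-outcome associations
    se_outcome:    standard errors of the variant-outcome associations
    Returns (estimate, standard_error). Illustrative sketch only; it
    ignores uncertainty in beta_exposure and assumes valid instruments.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = bx ** 2 / se ** 2
    estimate = np.sum(bx * by / se ** 2) / np.sum(weights)
    std_err = np.sqrt(1.0 / np.sum(weights))
    return float(estimate), float(std_err)
```

The IVW estimate is equivalent to a weighted regression of the variant-outcome associations on the variant-exposure associations through the origin, so variants with more precise outcome associations contribute more to the result.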
Elevated Blood Pressure Accelerates White Matter Brain Aging Among Late Middle-Aged Women: A Mendelian Randomization Study in the UK Biobank
BACKGROUND: Elevated blood pressure (BP) is a modifiable risk factor associated with cognitive impairment and cerebrovascular diseases. However, the causal effect of BP on white matter brain aging remains unclear.
METHODS: In this study, we focused on N = 228 473 individuals of European ancestry who had genotype data and clinical BP measurements available (103 929 men and 124 544 women, mean age = 56.49, including 16 901 participants with neuroimaging data available) collected from UK Biobank (UKB). We first established a machine learning model to compute the outcome variable brain age gap (BAG) based on white matter microstructure integrity measured by fractional anisotropy derived from diffusion tensor imaging data. We then performed a two-sample Mendelian randomization analysis to estimate the causal effect of BP on white matter BAG in the whole population and subgroups stratified by sex and age brackets using two nonoverlapping data sets.
RESULTS: The hypertension group is on average 0.31 years (95% CI = 0.13-0.49; P < 0.0001) older in white matter brain age than the nonhypertension group. Women are on average 0.81 years (95% CI = 0.68-0.95; P < 0.0001) younger in white matter brain age than men. The Mendelian randomization analyses showed an overall significant positive causal effect of DBP on white matter BAG (0.37 years/10 mmHg, 95% CI 0.034-0.71, P = 0.0311). In stratified analysis, the causal effect was found most prominent among women aged 50-59 and aged 60-69.
CONCLUSION: High BP can accelerate white matter brain aging among late middle-aged women, providing insights on planning effective control of BP for women in this age group.