MedLens: Improve mortality prediction via medical signs selecting and regression interpolation
Monitoring the health status of patients and predicting mortality in advance
is vital for providing patients with timely care and treatment. Massive medical
signs in electronic health records (EHR) are fitted into advanced machine
learning models to make predictions. However, the data-quality problem of
original clinical signs is less discussed in the literature. Based on an
in-depth measurement of the missing rate and correlation score across various
medical signs and a large amount of patient hospital admission records, we
discovered that the overall missing rate is extremely high and that a large number
of uninformative signs can hurt the performance of prediction models. We
concluded that improving data quality alone can raise the baseline accuracy
of different prediction algorithms. We designed MEDLENS, with an automatic
vital medical signs selection approach via statistics and a flexible
interpolation approach for high missing rate time series. After augmenting the
data-quality of original medical signs, MEDLENS applies ensemble classifiers to
boost the accuracy and reduce the computation overhead at the same time. It
achieves a high accuracy of 0.96 AUC-ROC and 0.81 AUC-PR,
which exceeds the previous benchmark.
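The two data-quality steps named in this abstract, dropping signs with a high missing rate and interpolating the gaps that remain, can be illustrated with a minimal sketch. It is not the MEDLENS implementation: the function names, the `max_missing` threshold, and the use of plain linear interpolation (in place of the paper's regression interpolation) are all assumptions made for illustration.

```python
def missing_rate(values):
    """Fraction of entries that are None (i.e. not recorded)."""
    return sum(v is None for v in values) / len(values)


def select_signs(table, max_missing=0.5):
    """Keep only the signs whose missing rate is at most max_missing.

    `table` maps a sign name to its time series; the 0.5 default is a
    hypothetical threshold, not a value taken from the paper.
    """
    return {name: vals for name, vals in table.items()
            if missing_rate(vals) <= max_missing}


def interpolate(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Gaps at either edge copy the nearest known value. Linear interpolation
    is a stand-in here for the paper's regression interpolation.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to interpolate from
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


if __name__ == "__main__":
    signs = {"heart_rate": [80.0, None, 82.0, 81.0],
             "rare_lab": [None, None, None, 5.0]}
    kept = select_signs(signs)
    print({name: interpolate(vals) for name, vals in kept.items()})
```

In this toy input, `rare_lab` is missing in three of four time steps and is dropped, while the single gap in `heart_rate` is filled from its neighbours.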
An Integrative Paradigm for Enhanced Stroke Prediction: Synergizing XGBoost and xDeepFM Algorithms
Stroke prediction plays a crucial role in preventing and managing this
debilitating condition. In this study, we address the challenge of stroke
prediction using a comprehensive dataset, and propose an ensemble model that
combines the power of XGBoost and xDeepFM algorithms. Our work aims to improve
upon existing stroke prediction models by achieving higher accuracy and
robustness. Through rigorous experimentation, we validate the effectiveness of
our ensemble model using the AUC metric. By comparing our findings with
those of other models in the field, we gain valuable insights into the merits
and drawbacks of various approaches. This, in turn, contributes significantly
to the progress of machine learning and deep learning techniques specifically
in the domain of stroke prediction.
Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems
Recommender systems are expected to be assistants that help human users find
relevant information automatically without explicit queries. As recommender
systems evolve, increasingly sophisticated learning techniques are applied and
have achieved better performance in terms of user engagement metrics such as
clicks and browsing time. The increase in the measured performance, however,
can have two possible attributions: a better understanding of user preferences,
and a more proactive ability to exploit human bounded rationality to induce
user over-consumption. A natural follow-up question is whether current
recommendation algorithms are manipulating user preferences. If so, can we
measure the manipulation level? In this paper, we present a general framework
for benchmarking the degree of manipulations of recommendation algorithms, in
both slate recommendation and sequential recommendation scenarios. The
framework consists of four stages: initial preference calculation, training
data collection, algorithm training and interaction, and metrics calculation
that involves two proposed metrics. We benchmark some representative
recommendation algorithms in both synthetic and real-world datasets under the
proposed framework. We observe that a high online click-through rate does
not necessarily reflect a better understanding of users' initial preferences, but
can instead result from prompting users to choose more documents they initially did not favor.
Moreover, we find that the training data have notable impacts on the
manipulation degrees, and algorithms with more powerful modeling abilities are
more sensitive to such impacts. The experiments also verified the usefulness of
the proposed metrics for measuring the degree of manipulations. We advocate
that future recommendation algorithm studies should be treated as an
optimization problem with constrained user preference manipulations.
Comment: 33 pages, 11 figures, 4 tables, ACM Transactions on Information Systems
Large-scale Interactive Recommendation with Tree-structured Policy Gradient
Reinforcement learning (RL) has recently been introduced to interactive
recommender systems (IRS) because of its nature of learning from dynamic
interactions and planning for long-run performance. However, because an IRS typically has
thousands of items to recommend (i.e., thousands of actions), most existing
RL-based methods fail to handle such a large discrete action space
and thus become inefficient. Existing work that tries to deal with
the large discrete action space problem by utilizing the deep deterministic
policy gradient framework suffers from the inconsistency between the continuous
action representation (the output of the actor network) and the real discrete
action. To avoid such inconsistency and achieve high efficiency and
recommendation effectiveness, in this paper, we propose a Tree-structured
Policy Gradient Recommendation (TPGR) framework, where a balanced hierarchical
clustering tree is built over the items and picking an item is formulated as
seeking a path from the root to a certain leaf of the tree. Extensive
experiments on carefully-designed environments based on two real-world datasets
demonstrate that our model provides superior recommendation performance and
significant efficiency improvement over state-of-the-art methods.
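The idea of picking an item by walking from the root to a leaf can be sketched as follows. This is a minimal illustration that assumes a perfectly balanced tree with a fixed branching factor whose leaves are numbered left to right; the hierarchical clustering that builds the tree and the learned policy are omitted. `node_policy` is a hypothetical stand-in that returns child probabilities for a node identified by the path taken so far.

```python
import random


def path_to_item(path, branching):
    """Map a root-to-leaf path (one child index per level) to a leaf/item id."""
    item = 0
    for child in path:
        item = item * branching + child
    return item


def item_to_path(item, branching, depth):
    """Inverse of path_to_item: recover the child index chosen at each level."""
    path = []
    for _ in range(depth):
        path.append(item % branching)
        item //= branching
    return path[::-1]


def sample_path(node_policy, branching, depth, rng):
    """Sample one child per level, so an item costs `depth` small decisions
    instead of one decision over branching**depth actions."""
    path = []
    for _ in range(depth):
        probs = node_policy(tuple(path))  # hypothetical per-node policy
        child = rng.choices(range(branching), weights=probs)[0]
        path.append(child)
    return path


if __name__ == "__main__":
    branching, depth = 3, 3  # 3**3 = 27 items

    def uniform(_node):
        return [1.0] * branching  # placeholder for a learned policy

    path = sample_path(uniform, branching, depth, random.Random(0))
    print(path, "-> item", path_to_item(path, branching))
```

With branching factor c and depth d the tree covers c^d items, while each step of the walk only chooses among c children, which is where the efficiency gain over a flat action space comes from.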
U-rank: Utility-oriented Learning to Rank with Implicit Feedback
Learning to rank with implicit feedback is one of the most important tasks in
many real-world information systems where the objective is some specific
utility, e.g., clicks and revenue. However, we point out that existing methods
based on the probabilistic ranking principle do not necessarily achieve the highest
utility. To this end, we propose a novel ranking framework called U-rank that
directly optimizes the expected utility of the ranking list. With a
position-aware deep click-through rate prediction model, we address the
attention bias considering both query-level and item-level features. Due to the
item-specific attention bias modeling, the optimization for expected utility
corresponds to a maximum weight matching on the item-position bipartite graph.
We base the optimization of this objective on an efficient Lambdaloss
framework, which is supported by both theoretical and empirical analysis. We
conduct extensive experiments for both web search and recommender systems over
three benchmark datasets and two proprietary datasets, where the performance
gain of U-rank over state-of-the-art methods is demonstrated. Moreover, our proposed
U-rank has been deployed on a large-scale commercial recommender, and a large
improvement over the production baseline has been observed in online A/B
testing.
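The reduction of ranking to a maximum weight matching on the item-position bipartite graph can be shown on a toy case. The sketch below is not the U-rank algorithm: it assumes a hypothetical matrix `weights[i][p]` that already holds the expected utility of placing item i at position p, and it uses brute-force enumeration, which is only practical for very short lists, in place of a proper assignment solver.

```python
from itertools import permutations


def best_ranking(weights):
    """Return (ranking, value) maximising total expected utility.

    weights[i][p] is the assumed expected utility of item i at position p
    (a square matrix). ranking[p] is the item placed at position p.
    Brute force over all permutations; fine only for tiny lists.
    """
    n = len(weights)
    best, best_value = None, float("-inf")
    for perm in permutations(range(n)):
        value = sum(weights[perm[p]][p] for p in range(n))
        if value > best_value:
            best, best_value = perm, value
    return list(best), best_value


if __name__ == "__main__":
    # Item 0 keeps most of its utility at position 1; item 1 loses most of it.
    weights = [[0.90, 0.80],
               [0.85, 0.30]]
    print(best_ranking(weights))
```

Here item 0 has the higher utility at the top position, yet the matching places item 1 first, because item 0 loses far less when moved down. Sorting items by a single score would miss this, which is the effect of item-specific attention bias the abstract refers to.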
How Can Recommender Systems Benefit from Large Language Models: A Survey
Recommender systems (RS) play an important role in matching users' information
needs in Internet applications. In natural language processing (NLP) domains,
large language models (LLMs) have shown astonishing emergent abilities (e.g.,
instruction following, reasoning), thus giving rise to the promising research
direction of adapting LLM to RS for performance enhancements and user
experience improvements. In this paper, we conduct a comprehensive survey on
this research direction from an application-oriented view. We first summarize
existing research works from two orthogonal perspectives: where and how to
adapt LLM to RS. For the "WHERE" question, we discuss the roles that LLM could
play in different stages of the recommendation pipeline, i.e., feature
engineering, feature encoder, scoring/ranking function, and pipeline
controller. For the "HOW" question, we investigate the training and inference
strategies, resulting in two fine-grained taxonomy criteria, i.e., whether to
tune LLMs or not, and whether to involve a conventional recommendation model
(CRM) for inference. Detailed analysis and general development trajectories are
provided for both questions, respectively. Then, we highlight key challenges in
adapting LLM to RS from three aspects, i.e., efficiency, effectiveness, and
ethics. Finally, we summarize the survey and discuss the future prospects. We
also actively maintain a GitHub repository for papers and other related
resources in this rising direction:
https://github.com/CHIANGEL/Awesome-LLM-for-RecSys
Comment: 15 pages; 3 figures; summarization table in appendix
Adaptive optimal output regulation for wheel-legged robot Ollie: A data-driven approach
The dynamics of a robot may vary during operation due to both internal and external factors, such as non-ideal motor characteristics and unmodeled loads, which can lead to deteriorated control performance and even instability. In this paper, an adaptive optimal output regulation (AOOR)-based controller is designed for the wheel-legged robot Ollie to deal with possible model uncertainties and disturbances using a data-driven approach. We test the AOOR-based controller by forcing the robot to stand still, which is a conventional index for judging balance controllers for two-wheel robots. Through online training with a small amount of data, the resultant AOOR achieves optimal control performance and stabilizes the robot within a small displacement in extensive experiments under different working conditions. Finally, the robot balances a rolling cylindrical bottle on its top using the AOOR-based balance control, whereas it fails with the initial controller. Experimental results demonstrate that the AOOR-based controller is effective and highly robust to model uncertainties and external disturbances.
A functional variant in the Stearoyl-CoA desaturase gene promoter enhances fatty acid desaturation in pork
There is growing public concern about reducing saturated fat intake. Stearoyl-CoA desaturase (SCD) is the lipogenic enzyme responsible for the biosynthesis of oleic acid (18:1) by desaturating stearic acid (18:0). Here we describe a total of 18 mutations in the promoter and 3′ non-coding region of the pig SCD gene and provide evidence that allele T at AY487830:g.2228T>C in the promoter region enhances fat desaturation (the ratio 18:1/18:0 in muscle increases from 3.78 to 4.43 in opposite homozygotes) without affecting fat content (18:0+18:1, intramuscular fat content, and backfat thickness). No mutations that could affect the functionality of the protein were found in the coding region. First, we proved in a purebred Duroc line that the C-T-A haplotype of the 3 single nucleotide polymorphisms (SNPs) (g.2108C>T; g.2228T>C; g.2281A>G) of the promoter region was additively associated with enhanced 18:1/18:0 both in muscle and subcutaneous fat, but not in liver. We show that this association was consistent over a 10-year period of overlapping generations and, in line with these results, that the C-T-A haplotype displayed greater SCD mRNA expression in muscle. The effect of this haplotype was validated both internally, by comparing opposite homozygote siblings, and externally, by using experimental Duroc-based crossbreds. Second, the g.2281A>G and the g.2108C>T SNPs were excluded as causative mutations using new and previously published data, restricting the causality to the g.2228T>C SNP, the last source of genetic variation within the haplotype. This mutation is positioned in the core sequence of several putative transcription factor binding sites, so that there are several plausible mechanisms by which allele T enhances 18:1/18:0 and, consequently, the proportion of monounsaturated to saturated fat.
This research was supported by grants from the Spanish Ministry of Science and Innovation (AGL2009-09779 and AGL2012-33529). RRF is the recipient of a PhD scholarship from the Spanish Ministry of Science and Innovation (BES-2010-034607). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.