95 research outputs found
Gait-Based Smart Pairing System for Personal Wearable Devices
With the rapid development of embedded technology and mobile computing, we have seen a growing number of Internet of Things (IoT) devices on the market. As the number of wearable devices belonging to the same user increases rapidly, secure pairing between legitimate devices becomes an important research problem. In this chapter, we propose the first gait-based shared key generation system that helps two devices generate a common secure key by exploiting the user’s unique walking pattern. The system is based on the fact that sensors at different positions on the same user exhibit similar accelerometer signals when the user is walking. Therefore, the acceleration can be used as shared secret information to generate a common key on different devices independently. Our experimental results show that the keys generated by two independent devices on the same body can achieve a 100% bit agreement rate. The proposed key generation protocol can establish a 128-bit key in 5 s (about 10 steps) with entropy varying from 0.93 to 1. We also find that the proposed scheme can run in real time on modern smartphones and requires low system cost.
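As a rough illustration of the idea in this abstract, the sketch below turns two similar acceleration traces into bit strings and measures their bit agreement rate. The mean-threshold quantizer, the function names, and the sample values are all assumptions made for illustration; the abstract does not describe the system's actual quantization or reconciliation steps.

```python
from statistics import mean


def quantize(samples):
    """Map each sample to 1 if it is above the trace mean, else 0.

    Hypothetical quantizer: the abstract does not specify the real one.
    """
    threshold = mean(samples)
    return [1 if s > threshold else 0 for s in samples]


def bit_agreement_rate(bits_a, bits_b):
    """Fraction of positions at which two equal-length bit lists match."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit lists must have the same length")
    matches = sum(a == b for a, b in zip(bits_a, bits_b))
    return matches / len(bits_a)


# Made-up acceleration samples from two positions on the same walker.
wrist = [0.2, 1.1, -0.4, 0.9, -0.8, 1.3, -0.1, 0.7]
pocket = [0.3, 1.0, -0.5, 1.2, -0.6, 1.1, 0.0, 0.8]

key_a = quantize(wrist)
key_b = quantize(pocket)
rate = bit_agreement_rate(key_a, key_b)
```

Each device thresholds against its own trace mean, so a constant offset between the two sensors does not change the resulting bits.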
Uncovering User Interest from Biased and Noised Watch Time in Video Recommendation
In video recommendation, watch time is commonly adopted as an indicator
of user interest. However, watch time is not only influenced by the matching of
users' interests but also by other factors, such as duration bias and noisy
watching. Duration bias refers to the tendency for users to spend more time on
videos with longer durations, regardless of their actual interest level. Noisy
watching, on the other hand, describes users taking time to determine whether
they like a video or not, which can result in users spending time watching
videos they do not like. Consequently, the existence of duration bias and noisy
watching makes watch time an inadequate label for indicating user interest.
Furthermore, current methods primarily address duration bias and ignore the
impact of noisy watching, which may limit their effectiveness in uncovering
user interest from watch time. In this study, we first analyze the generation
mechanism of users' watch time from a unified causal viewpoint. Specifically,
we consider the watch time as a mixture of the user's actual interest level,
the duration-biased watch time, and the noisy watch time. To mitigate both the
duration bias and noisy watching, we propose Debiased and Denoised watch time
Correction (DCo), which can be divided into two steps: First, we employ a
duration-wise Gaussian Mixture Model plus frequency-weighted moving average for
estimating the bias and noise terms; then we utilize a sensitivity-controlled
correction function to separate the user interest from the watch time, which is
robust to the estimation error of bias and noise terms. The experiments on two
public video recommendation datasets and online A/B testing indicate the
effectiveness of the proposed method. Comment: Accepted by Recsys'2
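To make the notion of duration bias concrete, the sketch below groups records by video duration and subtracts each group's mean watch time. This is not the DCo method: a plain per-bucket mean stands in for the duration-wise Gaussian Mixture Model, and the noise term and the sensitivity-controlled correction are left out. The function names, the bucket width, and the records are hypothetical.

```python
from collections import defaultdict


def bucket_means(records, bucket_width):
    """Mean watch time per duration bucket.

    records: iterable of (video_duration, watch_time) pairs in seconds.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for duration, watch in records:
        bucket = int(duration // bucket_width)
        sums[bucket] += watch
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


def centered_watch_time(records, bucket_width):
    """Watch time minus the mean of its duration bucket (a crude bias estimate)."""
    means = bucket_means(records, bucket_width)
    return [watch - means[int(duration // bucket_width)]
            for duration, watch in records]


# Made-up (duration, watch_time) records: two short videos, three long ones.
records = [(10, 8), (12, 4), (55, 30), (58, 10), (50, 20)]
means = bucket_means(records, bucket_width=30)
centered = centered_watch_time(records, bucket_width=30)
```

In this toy data the long videos have a higher mean watch time than the short ones, so comparing raw watch times across buckets would favor long videos; the centered values compare each record only with videos of similar length.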
Dexmedetomidine Versus Propofol Sedation Improves Sublingual Microcirculation After Cardiac Surgery: A Randomized Controlled Trial
Objectives: To compare the effects of dexmedetomidine and propofol on sublingual microcirculation in patients after cardiac surgery. Design: A prospective, randomized, single-blind study. Setting: University hospital. Participants: Adult patients undergoing elective valve surgery with cardiopulmonary bypass. Interventions: On arrival in the intensive care unit (ICU), patients were assigned randomly to receive either dexmedetomidine (0.2-1.5 μg/kg/h) or propofol (5-50 μg/kg/min) with open-label titration to a target Richmond Agitation-Sedation Scale of 0 to –3. Measurements and Main Results: Sublingual microcirculation was recorded using sidestream dark-field imaging at ICU admission (baseline [T1]) and 4 hours (T2) and 24 hours after ICU admission (T3). At T2, median changes in perfused small-vessel density and the De Backer score from baseline were significantly greater in the dexmedetomidine group (n = 29) than in the propofol group (n = 32) (1.3 v 0 mm/mm², p = 0.025; 0.9 v –0.1/mm, p = 0.005, respectively); median changes in small-vessel density and the proportion of perfused small vessels from baseline also tended to be higher in the dexmedetomidine group compared with the propofol group (1.0 v –0.1 mm/mm², p = 0.050; 2.1% v 0.5%, p = 0.062, respectively). At T3, there still was a trend toward greater improvements in the small-vessel density, proportion of perfused small vessels, perfused small-vessel density, and De Backer score from baseline in the dexmedetomidine group than in the propofol group. Conclusions: This trial demonstrated that dexmedetomidine sedation may be better able to improve microcirculation in cardiac surgery patients during the early postoperative period compared with propofol.
FlashDecoding++: Faster Large Language Model Inference on GPUs
Large Language Models (LLMs) are becoming increasingly important in various
domains. However, the following challenges still remain unsolved in
accelerating LLM inference: (1) Synchronized partial softmax update. The
softmax operation requires a synchronized update operation among each partial
softmax result, leading to ~20% overheads for the attention computation in
LLMs. (2) Under-utilized computation of flat GEMM. The shape of matrices
performing GEMM in LLM inference is flat, leading to under-utilized computation
and >50% performance loss after padding zeros in previous designs. (3)
Performance loss due to static dataflow. Kernel performance in LLM depends on
varied input data features, hardware configurations, etc. A single and static
dataflow may lead to a 50.25% performance loss for GEMMs of different shapes in
LLM inference.
We present FlashDecoding++, a fast LLM inference engine supporting mainstream
LLMs and hardware back-ends. To tackle the above challenges, FlashDecoding++
creatively proposes: (1) Asynchronized softmax with unified max value.
FlashDecoding++ introduces a unified max value technique for different partial
softmax computations to avoid synchronization. (2) Flat GEMM optimization with
double buffering. FlashDecoding++ points out that flat GEMMs with different
shapes face varied bottlenecks. Then, techniques like double buffering are
introduced. (3) Heuristic dataflow with hardware resource adaptation.
FlashDecoding++ heuristically optimizes dataflow using different hardware
resources considering input dynamics. Due to the versatility of optimizations in
FlashDecoding++, FlashDecoding++ can achieve up to 4.86x and 2.18x speedup on
NVIDIA and AMD GPUs, respectively, compared to Hugging Face implementations.
FlashDecoding++ also achieves an average speedup of 1.37x compared to
state-of-the-art LLM inference engines on mainstream LLMs.
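The unified max value idea in point (1) rests on a standard property of softmax: subtracting the same constant from every score leaves the result unchanged. If every chunk uses one shared offset, the per-chunk exponent sums can simply be added, with no rescaling step between chunks. The sketch below shows this in plain Python; it is not the FlashDecoding++ GPU kernel, and the function names, the chunk size, and the offset `phi` are assumptions. In particular, the abstract does not say how the unified value is chosen, and a poorly chosen `phi` could overflow or underflow `exp`.

```python
import math


def softmax_reference(scores):
    """Standard numerically stable softmax using the global maximum."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_unified_offset(scores, chunk_size, phi):
    """Softmax computed chunk by chunk with one fixed offset phi for all chunks.

    Because every chunk subtracts the same phi, the partial sums can be
    added directly; no chunk has to wait for another chunk's maximum.
    """
    chunk_exps = []
    chunk_sums = []
    for start in range(0, len(scores), chunk_size):
        chunk = scores[start:start + chunk_size]
        exps = [math.exp(s - phi) for s in chunk]
        chunk_exps.append(exps)
        chunk_sums.append(sum(exps))
    total = sum(chunk_sums)  # single reduction over independent partial sums
    return [e / total for exps in chunk_exps for e in exps]


# Made-up attention scores and a hypothetical offset.
scores = [1.0, 2.5, -0.3, 0.7, 3.1, 0.0]
expected = softmax_reference(scores)
result = softmax_unified_offset(scores, chunk_size=2, phi=2.0)
```

For these inputs the two functions agree up to floating-point rounding, which is the property that lets the partial computations proceed independently.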