Mobile AR Depth Estimation: Challenges & Prospects -- Extended Version
Metric depth estimation plays an important role in mobile augmented reality
(AR). With accurate metric depth, we can achieve more realistic user
interactions such as object placement and occlusion detection. While
specialized hardware like LiDAR shows promise, its restricted availability
(only on selected high-end mobile devices) and performance limitations, such
as limited range and sensitivity to the environment, make it less ideal.
Monocular depth estimation, on the other hand, relies solely on mobile
cameras, which are ubiquitous, making it a promising alternative for mobile AR.
In this paper, we investigate the challenges and opportunities of achieving
accurate metric depth estimation in mobile AR. We tested four different
state-of-the-art monocular depth estimation models on a newly introduced
dataset (ARKitScenes) and identified three types of challenges: hardware-,
data-, and model-related. Furthermore, our research provides
promising future directions to explore and solve those challenges. These
directions include (i) using more hardware-related information from the mobile
device's camera and other available sensors, (ii) capturing high-quality data
to reflect real-world AR scenarios, and (iii) designing a model architecture to
utilize the new information.
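One concrete way direction (i) could play out, sketched here as a toy and not taken from the paper: a relative monocular depth map can be rescaled to metric units using a few sparse metric anchors, e.g. depths of tracked feature points reported by the AR framework. All names below are illustrative.

```python
# Illustrative sketch (not from the paper): recover metric scale for a
# relative monocular depth prediction using sparse metric anchor depths,
# such as those an AR framework reports for tracked feature points.

def median_scale(pred_rel, anchors_metric):
    """Estimate a single scale factor as the median ratio between sparse
    metric anchor depths and the relative predictions at the same pixels.

    pred_rel:       relative depth values at the anchor pixels
    anchors_metric: metric depths (metres) at those same pixels
    """
    ratios = sorted(m / r for m, r in zip(anchors_metric, pred_rel))
    mid = len(ratios) // 2
    if len(ratios) % 2:
        return ratios[mid]
    return 0.5 * (ratios[mid - 1] + ratios[mid])

# Toy usage: the relative predictions are roughly half the metric truth.
rel = [1.0, 2.0, 4.0]
metric = [2.1, 3.9, 8.0]
s = median_scale(rel, metric)           # median ratio, robust to outliers
metric_map = [s * d for d in rel]       # rescaled, now in metres
```

The median (rather than a mean or least-squares fit) keeps a single bad anchor from corrupting the scale estimate.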
Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Automatic emotion recognition from speech, which is an important and challenging task in the field of affective computing, heavily relies on the effectiveness of the speech features for classification. Previous approaches to emotion recognition have mostly focused on the extraction of carefully hand-crafted features. How to model spatio-temporal dynamics for speech emotion recognition effectively is still under active investigation. In this paper, we propose a method to tackle the problem of emotion-relevant feature extraction from speech by leveraging Attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks with fully convolutional networks in order to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. The experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions compared with other existing emotion recognition algorithms.
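The attention mechanism described above summarises a variable-length sequence of frame features into one utterance-level vector. A minimal sketch of that pooling step, assuming plain lists in place of the paper's BiLSTM tensors:

```python
import math

# Illustrative sketch (not the paper's exact model): softmax attention
# pooling over per-frame features, as used to collapse frame-level BiLSTM
# outputs into a single utterance-level vector before classification.

def attention_pool(frames, scores):
    """Softmax-normalise frame scores and return the weighted sum of frames.

    frames: list of feature vectors, one per time step
    scores: one unnormalised relevance score per frame
    """
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]

# Toy usage: the second frame gets a much higher score, so the pooled
# vector lies close to that frame.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = attention_pool(frames, [0.0, 5.0, 0.0])
```

In the full model the scores themselves are produced by a small learned layer over each frame; here they are fixed to keep the example self-contained.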
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Latent Diffusion Models (LDMs) have achieved remarkable results in
synthesizing high-resolution images. However, the iterative sampling process is
computationally intensive and leads to slow generation. Inspired by Consistency
Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling
swift inference with minimal steps on any pre-trained LDMs, including Stable
Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as
solving an augmented probability flow ODE (PF-ODE), LCMs are designed to
directly predict the solution of such ODE in latent space, mitigating the need
for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently
distilled from pre-trained classifier-free guided diffusion models, a
high-quality 768×768, 2~4-step LCM takes only 32 A100 GPU hours to train.
Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method
that is tailored for fine-tuning LCMs on customized image datasets. Evaluation
on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve
state-of-the-art text-to-image generation performance with few-step inference.
Project Page: https://latent-consistency-models.github.io
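The core idea, that a consistency function maps any point on the PF-ODE trajectory directly to its endpoint instead of iterating a solver, can be illustrated with a toy linear ODE where the exact map is known in closed form. This is a didactic analogy, not LCM itself:

```python
import math

# Toy illustration (not LCM itself): for the linear ODE dx/dt = -x, an
# Euler solver needs many small steps to carry x(t) back to x(0), while
# the exact "consistency function" f(x, t) = x * e^t jumps to the endpoint
# in one evaluation. LCMs learn such a direct map for the diffusion
# PF-ODE in latent space.

def euler_to_origin(x_t, t, n_steps):
    """Integrate dx/dt = -x backward in time from t to 0 with n Euler steps."""
    dt = t / n_steps
    x = x_t
    for _ in range(n_steps):
        x = x + dt * x          # reversed time s = t - tau gives dx/ds = +x
    return x

def consistency_fn(x_t, t):
    """Exact solution map: returns x(0) from any point (x_t, t) in one shot."""
    return x_t * math.exp(t)

x0 = 2.0
t = 1.0
x_t = x0 * math.exp(-t)         # simulate the forward flow out to time t
one_shot = consistency_fn(x_t, t)        # single evaluation
iterated = euler_to_origin(x_t, t, 1000) # 1000 solver steps, still approximate
```

The one-shot map is exact while the iterated solver only converges as the step count grows, which is precisely the trade-off LCM distillation removes.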
Melodic Phrase Segmentation By Deep Neural Networks
Automated melodic phrase detection and segmentation is a classical task in
content-based music information retrieval and also the key towards automated
music structure analysis. However, traditional methods still cannot satisfy
practical requirements. In this paper, we explore and adapt various neural
network architectures to see if they can be generalized to work with the
symbolic representation of music and produce satisfactory melodic phrase
segmentation. The main issue of applying deep-learning methods to phrase
detection is the sparse labeling problem of training sets. We propose two
tailored label-engineering schemes with corresponding training techniques for
different neural networks in order to make decisions at a sequential level.
Experimental results show that the CNN-CRF architecture performs best,
offering finer segmentation and faster training, while CNN, Bi-LSTM-CNN, and
Bi-LSTM-CRF are acceptable alternatives.
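One plausible form the label engineering could take (my reconstruction for illustration, not necessarily the paper's scheme): phrase boundaries are rare positives in a long note sequence, so a network trained on raw 0/1 targets mostly sees zeros. Widening each boundary into a small window of positive labels densifies the training signal.

```python
# Illustrative sketch of boundary-label widening for the sparse-label
# problem: each phrase boundary and its neighbours within `radius` time
# steps are marked positive, instead of a single isolated 1.

def widen_labels(length, boundaries, radius=1):
    """Return one 0/1 label per time step, with each boundary position and
    its neighbours within `radius` steps set to 1."""
    labels = [0] * length
    for b in boundaries:
        for i in range(max(0, b - radius), min(length, b + radius + 1)):
            labels[i] = 1
    return labels

# Toy usage: a 10-step melody with phrase boundaries at steps 3 and 8.
labels = widen_labels(10, [3, 8], radius=1)   # → dense targets around each boundary
```

At inference time the widened predictions would be collapsed back to single boundary positions, e.g. by taking the peak of each positive run.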
An edge cloud and Fibonacci-Diffie-Hellman encryption scheme for secure printer data transmission
Network printers face increasing security threats from network attacks that can lead to sensitive information leakage and data tampering. To address these risks, we propose a novel Fibonacci-Diffie-Hellman (FIB-DH) encryption scheme using edge cloud collaboration. Our approach utilizes properties of third-order Fibonacci matrices combined with the Diffie-Hellman key exchange to encrypt printer data transmissions. The encrypted data is transmitted via edge cloud servers and verified by the receiver using inverse Fibonacci transforms. Our experiments demonstrate that the FIB-DH scheme can effectively improve printer data transmission security against common attacks compared to conventional methods. The results show reduced vulnerabilities to leakage and tampering attacks in our approach. This work provides an innovative application of cryptographic techniques to strengthen security for network printer communications.
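A hedged sketch of how a matrix-based Diffie-Hellman exchange of this kind can work (my reconstruction, not the paper's exact scheme): the public base is a third-order Fibonacci (tribonacci) Q-matrix, private keys are exponents, and since powers of one matrix commute, both parties derive the same shared matrix, which can then seed a symmetric cipher for the printer traffic. The prime and keys below are toy values.

```python
# Hedged sketch (reconstruction, not the paper's exact FIB-DH scheme):
# Diffie-Hellman over powers of a third-order Fibonacci Q-matrix mod a prime.

P = 1_000_003                               # toy public prime modulus
Q = [[1, 1, 1], [1, 0, 0], [0, 1, 0]]       # tribonacci Q-matrix (assumed base)

def mat_mul(a, b, p):
    """3x3 matrix product mod p."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) % p
             for j in range(3)] for i in range(3)]

def mat_pow(m, e, p):
    """Square-and-multiply exponentiation of a 3x3 matrix mod p."""
    result = [[int(i == j) for j in range(3)] for i in range(3)]  # identity
    while e:
        if e & 1:
            result = mat_mul(result, m, p)
        m = mat_mul(m, m, p)
        e >>= 1
    return result

# Toy exchange: a and b are the two parties' private keys.
a, b = 123457, 654321
pub_a = mat_pow(Q, a, P)                    # A publishes Q^a
pub_b = mat_pow(Q, b, P)                    # B publishes Q^b
shared_a = mat_pow(pub_b, a, P)             # A computes (Q^b)^a
shared_b = mat_pow(pub_a, b, P)             # B computes (Q^a)^b — same matrix
```

Both sides arrive at Q^(ab) mod P without ever transmitting a private key, exactly as in scalar Diffie-Hellman.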
Association of Geriatric Nutritional Risk Index with Mortality in Hemodialysis Patients: A Meta-Analysis of Cohort Studies
Background/Aims: Geriatric nutritional risk index (GNRI) was developed as a “nutrition-related” risk index and was reported in different populations as associated with the risk of all-cause and cardiovascular morbidity and mortality. Therefore, GNRI can be used to classify patients according to a risk of complications in relation to conditions associated with protein-energy wasting (PEW). However, not all reports pointed to the prognostic ability of the GNRI. The purpose of this study was to assess the associations of GNRI with mortality in chronic hemodialysis patients. Methods: We electronically searched original articles published in peer-reviewed journals from their inception to September 2018 in the PubMed, Embase, and Cochrane Library databases. The primary outcome was all-cause and cardiovascular mortality. We pooled unadjusted and adjusted odds ratios (ORs) with 95% confidence intervals (95% CIs) using Review Manager 5.3 software. Results: A total of 10,739 patients from 19 cohort studies published from 2010 to 2018 were included. A significant negative association was found between the GNRI and all-cause mortality in patients with chronic hemodialysis (OR, 0.90; 95% CI, 0.84-0.97, p=0.004) (per unit increase) and (OR, 2.15; 95% CI, 1.88-2.46, p<0.00001) (low vs. high GNRI). Moreover, there was also a significant negative association between the GNRI (per unit increase) and cardiovascular events (OR, 0.98; 95% CI, 0.97-1.00, p=0.01), as well as cardiovascular mortality (OR, 0.89; 95% CI, 0.80-0.99, p=0.03). Conclusion: Our findings supported the hypothesis that the low GNRI is associated with an increased risk of all-cause and cardiovascular mortality in chronic hemodialysis patients. Based on our literature review, GNRI has been found to be an effective tool for identifying patients with nutrition-related risk of all-cause and cardiovascular disease.
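The pooling step behind results like these is standard fixed-effect inverse-variance meta-analysis on the log-OR scale, the approach tools such as Review Manager implement. A minimal sketch with made-up study values (not the paper's data):

```python
import math

# Illustrative sketch (toy data, not the study's computation): fixed-effect
# inverse-variance pooling of odds ratios. The SE of log(OR) is recovered
# from each study's 95% CI width, studies are weighted by 1/SE^2, and the
# pooled OR with its 95% CI is returned.

def pool_odds_ratios(ors_with_ci):
    """Each item is (OR, ci_low, ci_high); returns (pooled OR, lo, hi)."""
    num = den = 0.0
    for or_, lo, hi in ors_with_ci:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # CI spans ±1.96 SE
        w = 1.0 / se**2                                  # inverse-variance weight
        num += w * math.log(or_)
        den += w
    pooled = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled))

# Toy usage with three hypothetical per-unit-increase studies.
or_pooled, lo, hi = pool_odds_ratios([(0.90, 0.80, 1.01),
                                      (0.85, 0.70, 1.03),
                                      (0.95, 0.88, 1.02)])
```

Working on the log scale makes the OR distribution approximately normal, which is what justifies the weighted average and the symmetric CI before exponentiating back.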