23 research outputs found
Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation
A cloud server has spent considerable time, energy and money training a
Viola-Jones type object detector with high accuracy. Clients can upload their
photos to the cloud server to find objects. However, clients do not want the
content of their photos to leak. Meanwhile, the cloud server is reluctant to
leak any parameters of the trained object detector. Ten years ago, Avidan &
Butman introduced Blind Vision, a method for securely evaluating a Viola-Jones
type object detector. Blind Vision uses standard cryptographic tools and is
painfully slow to compute, taking a couple of hours to scan a single image. The
purpose of this work is to explore an efficient method that speeds up the
process. We propose the Random Base Image (RBI) Representation. The original
image is divided into random base images. Only the base images are submitted
randomly to the cloud server. Thus, the content of the image cannot be leaked.
Meanwhile, a random vector and the secure Millionaire protocol are leveraged to
protect the parameters of the trained object detector. The RBI representation
makes the integral image usable again, which greatly accelerates detection. The
experimental results show that our method retains the detection accuracy of the
plain vision algorithm and is significantly faster than traditional Blind
Vision, with only a very low theoretical probability of information leakage.
Comment: 6 pages, 3 figures. To appear in the proceedings of the IEEE
International Conference on Multimedia and Expo (ICME), Jul 10, 2017 - Jul
14, 2017, Hong Kong, Hong Kong
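The reason a split into base images can coexist with the integral image is that the integral image is linear: the box sum over an image equals the sum of the box sums over its parts. The sketch below illustrates only that property. The additive split in `split_into_random_bases` is an assumption made for illustration (the abstract does not say how the base images are constructed), and the code makes no claim about the security of the actual protocol.

```python
import random


def integral_image(img):
    """Summed-area table: ii[r][c] is the sum of img[0..r][0..c]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r][c] = row_sum + (ii[r - 1][c] if r > 0 else 0)
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle, using four table lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


def split_into_random_bases(img, n_bases, rng):
    """Hypothetical additive split: n_bases random images that sum to img."""
    h, w = len(img), len(img[0])
    bases = [
        [[rng.randint(-255, 255) for _ in range(w)] for _ in range(h)]
        for _ in range(n_bases - 1)
    ]
    # The last base image is chosen so that all bases add up to the original.
    last = [
        [img[r][c] - sum(b[r][c] for b in bases) for c in range(w)]
        for r in range(h)
    ]
    return bases + [last]


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
bases = split_into_random_bases(image, 3, random.Random(0))

# Box sum over the bottom-right 2x2 region, computed two ways.
direct = box_sum(integral_image(image), 1, 1, 2, 2)
via_bases = sum(box_sum(integral_image(b), 1, 1, 2, 2) for b in bases)
```

Here `direct` and `via_bases` agree, which is why rectangle features can still be evaluated from per-base integral images.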
The Effects of Yoga on College Students' Mental Health: A Systematic Review
The mental health of college students is an increasingly serious public health problem, and effective, healthy interventions are needed. A growing body of research has examined yoga, but there are few randomized controlled trials (RCTs) on the effects of yoga interventions on students' mental health. Therefore, this study examined the effects, and the quality of the evidence, of yoga interventions on mental health in college students. We searched PubMed (Medline), Cochrane, Web of Science, CNKI, VIP Chinese Science and Technology Journal Database (VIP) and WanFang Database for RCTs of yoga interventions for college students' mental health. After screening, 17 articles met the requirements and were included, and the Cochrane risk-of-bias tool RoB 2.0 was used to evaluate their quality. Of the 17 articles reviewed, three were rated as low risk of bias, five as possibly at risk of bias, and nine as high risk of bias. The 17 studies are predominantly of low methodological quality and lack multi-center, large-sample collaborative research. Almost all researchers mentioned the use of randomization in their articles, but they did not indicate which randomization method was used. There was no description of allocation concealment, blinding, participant dropout, follow-up, etc., so it was impossible to judge whether the trial design was correct or whether random grouping was indeed undertaken. This study found that most of the so-called randomized controlled trials are doubtful, which reduces the strength and credibility of this study. Therefore, improving the research quality of yoga interventions and standardizing the writing of scientific research articles are problems that need to be solved in the current field of sports psychology research in China.
Current evidence shows that yoga exercise can relax the body and mind, thereby improving the level of mental health, and complete yoga (exercise, breathing, meditation) significantly relieves the symptoms of depression. Performing yoga postures and exercises promotes blood circulation, effectively improves sleep, and regulates breathing to stabilize autonomic nerves, relieve stress, and eliminate mental tension. In the future, yoga practice can be used as a non-medical intervention to treat mental illness. The quality of the current randomized controlled trials of yoga interventions for the mental health of college students is generally low. Randomized controlled trials with reasonable methodological design, strict implementation, and sufficient follow-up time are still needed. Researchers should strengthen the systematic study of clinical trial methodology and strictly refer to the Cochrane manual list for clinical research reports in order to improve the quality of literature reports.
ModelScope Text-to-Video Technical Report
This paper introduces ModelScopeT2V, a text-to-video synthesis model that
evolves from a text-to-image synthesis model (i.e., Stable Diffusion).
ModelScopeT2V incorporates spatio-temporal blocks to ensure consistent frame
generation and smooth movement transitions. The model can adapt to varying
frame numbers during training and inference, rendering it suitable for both
image-text and video-text datasets. ModelScopeT2V brings together three
components (i.e., VQGAN, a text encoder, and a denoising UNet), comprising
1.7 billion parameters in total, of which 0.5 billion are dedicated to
temporal capabilities. The model demonstrates superior performance over
state-of-the-art methods across three evaluation metrics. The code and an
online demo are available at
https://modelscope.cn/models/damo/text-to-video-synthesis/summary.
Comment: Technical report. Project page:
https://modelscope.cn/models/damo/text-to-video-synthesis/summary
Measuring Pointwise V-Usable Information In-Context-ly
In-context learning (ICL) is a new learning paradigm that has gained
popularity along with the development of large language models. In this work,
we adapt a recently proposed hardness metric, pointwise V-usable
information (PVI), to an in-context version (in-context PVI). Compared to the
original PVI, in-context PVI is more efficient in that it requires only a few
exemplars and does not require fine-tuning. We conducted a comprehensive
empirical analysis to evaluate the reliability of in-context PVI. Our findings
indicate that in-context PVI estimates exhibit similar characteristics to the
original PVI. Specific to the in-context setting, we show that in-context PVI
estimates remain consistent across different exemplar selections and numbers of
shots. The variance of in-context PVI estimates across different exemplar
selections is insignificant, which suggests that in-context PVI estimates are
stable. Furthermore, we demonstrate how in-context PVI can be employed to
identify challenging instances. Our work highlights the potential of in-context
PVI and provides new insights into the capabilities of ICL.
Comment: EMNLP 2023 Findings
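As a rough illustration of the quantity being estimated, PVI compares the log-probability a model assigns to the gold label when it sees the input against the log-probability it assigns when the input is withheld. The sketch below assumes that, in the in-context setting, both probabilities come from the same prompted model (once with the test input, once with an empty input); how the prompts are built is not specified in the abstract, so the function simply takes the two probabilities as arguments.

```python
import math


def pvi(p_with_input, p_null_input):
    """Pointwise V-usable information in bits.

    p_with_input: probability of the gold label given the input.
    p_null_input: probability of the gold label given an empty (null) input.
    Both values are assumed to be supplied by the caller's model.
    """
    return math.log2(p_with_input) - math.log2(p_null_input)


# The input helps the model: positive PVI, an easier instance.
easy = pvi(0.8, 0.5)

# The input hurts the model: negative PVI, a candidate challenging instance.
hard = pvi(0.25, 0.5)
```

Instances can then be ranked by this score, with the lowest values flagged as the hardest.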
Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition
Open-set action recognition aims to reject unknown human action cases that are
out of the distribution of the training set. Existing methods mainly focus on
learning better uncertainty scores but overlook the importance of feature
representations. We find that features with richer semantic diversity can
significantly improve the open-set performance under the same uncertainty
scores. In this paper, we begin with analyzing the feature representation
behavior in the open-set action recognition (OSAR) problem based on the
information bottleneck (IB) theory, and propose to enlarge the
instance-specific (IS) and class-specific (CS) information contained in the
feature for better performance. To this end, a novel Prototypical Similarity
Learning (PSL) framework is proposed to keep the instance variance within the
same class to retain more IS information. Besides, we notice that unknown
samples sharing similar appearances to known samples are easily misclassified
as known classes. To alleviate this issue, video shuffling is further
introduced in our PSL to learn distinct temporal information between original
and shuffled samples, which we find enlarges the CS information. Extensive
experiments demonstrate that the proposed PSL can significantly boost both the
open-set and closed-set performance and achieves state-of-the-art results on
multiple benchmarks. Code is available at https://github.com/Jun-CEN/PSL.
Comment: To appear at CVPR202
CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation
2D RGB images and 3D LIDAR point clouds provide complementary knowledge for
the perception system of autonomous vehicles. Several 2D and 3D fusion methods
have been explored for the LIDAR semantic segmentation task, but they suffer
from different problems. 2D-to-3D fusion methods require strictly paired data
during inference, which may not be available in real-world scenarios, while
3D-to-2D fusion methods cannot explicitly make full use of the 2D information.
Therefore, we propose a Bidirectional Fusion Network with Cross-Modality
Knowledge Distillation (CMDFusion) in this work. Our method has two
contributions. First, our bidirectional fusion scheme explicitly and implicitly
enhances the 3D feature via 2D-to-3D fusion and 3D-to-2D fusion, respectively,
which surpasses either one of the single fusion schemes. Second, we distill
the 2D knowledge from a 2D network (Camera branch) to a 3D network (2D
knowledge branch) so that the 3D network can generate 2D information even for
those points not in the FOV (field of view) of the camera. In this way, RGB
images are not required during inference anymore since the 2D knowledge branch
provides 2D information according to the 3D LIDAR input. We show that our
CMDFusion achieves the best performance among all fusion-based methods on
SemanticKITTI and nuScenes datasets. The code will be released at
https://github.com/Jun-CEN/CMDFusion
Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification
Recent advances in large language models (LLMs) have shown impressive ability
in biomedical question-answering, but have not been adequately investigated for
more specific biomedical applications. This study investigates the performance
of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical
tasks beyond question-answering. Because no patient data can be passed to the
OpenAI API public interface, we evaluated model performance with over 10000
samples as proxies for two fundamental tasks in the clinical domain -
classification and reasoning. The first task is classifying whether statements
of clinical and policy recommendations in scientific literature constitute
health advice. The second task is causal relation detection from the biomedical
literature. We compared LLMs with simpler models, such as bag-of-words (BoW)
with logistic regression, and fine-tuned BioBERT models. Despite the excitement
around viral ChatGPT, we found that fine-tuning for two fundamental NLP tasks
remained the best strategy. The simple BoW model performed on par with the most
complex LLM prompting. Prompt engineering required significant investment.
Comment: 28 pages, 2 tables and 4 figures. Submitting for review
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
A diffusion probabilistic model (DPM), which constructs a forward diffusion
process by gradually adding noise to data points and learns the reverse
denoising process to generate new samples, has been shown to handle complex
data distribution. Despite its recent success in image synthesis, applying DPMs
to video generation is still challenging due to high-dimensional data spaces.
Previous methods usually adopt a standard diffusion process, where frames in
the same video clip are destroyed with independent noises, ignoring the content
redundancy and temporal correlation. This work presents a decomposed diffusion
process via resolving the per-frame noise into a base noise that is shared
among all frames and a residual noise that varies along the time axis. The
denoising pipeline employs two jointly-learned networks to match the noise
decomposition accordingly. Experiments on various datasets confirm that our
approach, termed VideoFusion, surpasses both GAN-based and diffusion-based
alternatives in high-quality video generation. We further show that our
decomposed formulation can benefit from pre-trained image diffusion models and
supports text-conditioned video creation well.
Comment: Accepted to CVPR202
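The noise decomposition described above can be sketched in a few lines: each frame's noise mixes one base noise shared by the whole clip with a residual noise drawn per frame. The mixing weight `lam` and the square-root weighting (chosen here so that each frame's noise keeps unit variance) are assumptions for illustration; the abstract does not give the exact parameterization.

```python
import math
import random


def decomposed_noise(num_frames, dim, lam, rng):
    """Per-frame noise = sqrt(lam) * shared base + sqrt(1 - lam) * residual.

    lam is a hypothetical mixing weight in [0, 1]: lam = 1 gives every frame
    the same noise, lam = 0 gives fully independent per-frame noise.
    """
    base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    frames = []
    for _ in range(num_frames):
        residual = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        frames.append([
            math.sqrt(lam) * b + math.sqrt(1.0 - lam) * r
            for b, r in zip(base, residual)
        ])
    return base, frames


rng = random.Random(0)
base, shared_only = decomposed_noise(num_frames=4, dim=8, lam=1.0, rng=rng)
_, mixed = decomposed_noise(num_frames=4, dim=8, lam=0.5, rng=rng)
```

With `lam=1.0` all frames receive exactly the base noise, while intermediate values produce frames whose noise is correlated through the shared component.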
An intensity-enhanced method for handling mobile laser scanning point clouds
Currently, mobile laser scanning (MLS) systems can conveniently and rapidly measure the backscattered laser beam properties of object surfaces in large-scale roadway scenes. These properties are digitized as the intensity value stored in the acquired point cloud data, and intensity, as an important information source, has been widely used in a variety of applications, including road marking inventory, manhole cover detection, and pavement inspection. However, the collected intensity often deviates from the object reflectance due to two main factors, i.e. different scanning distances and worn-out surfaces. Therefore, in this paper, we present a new intensity-enhanced method to gradually and efficiently achieve intensity enhancement in MLS point clouds. Concretely, to eliminate the intensity inconsistency caused by different scanning distances, the direct relationship between scanning distance and intensity value is modeled to correct the inconsistent intensity. To handle the low contrast between 3D points with different intensities, we propose to introduce and adapt the dark channel prior for adaptively transforming the intensity information in point cloud scenes. To remove isolated intensity noise, multiple filters are integrated to achieve denoising in regions with different point densities. The proposed method is evaluated on four MLS datasets, which were acquired in different road scenarios with different MLS systems. Extensive experiments and discussions demonstrate that the proposed method achieves remarkable performance in enhancing the intensities in MLS point clouds.
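To make the distance-correction step concrete, the sketch below normalizes each intensity to a common reference range. It assumes a simple power-law falloff of intensity with distance; this model, the function name `correct_intensity`, and the parameters `ref_distance` and `exponent` are all hypothetical, since the abstract states only that the distance-intensity relationship is modeled, not which model is used.

```python
def correct_intensity(intensity, distance, ref_distance=10.0, exponent=2.0):
    """Rescale an intensity measured at `distance` to `ref_distance`.

    Assumes intensity falls off as distance ** -exponent (an illustrative
    choice, not the model from the paper).
    """
    return intensity * (distance / ref_distance) ** exponent


# (intensity, scanning distance in meters) for three hypothetical points
points = [(50.0, 20.0), (200.0, 10.0), (800.0, 5.0)]
corrected = [correct_intensity(i, d) for i, d in points]
```

Under this assumed model the three example points, which differ only in range, end up with the same corrected intensity.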