23 research outputs found

Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation

Author: Chen Yingya
Ge Shiming
Jin Xin
Li Xiaodong
Song Chenggen
Yuan Peng
Zhao Geng
Publication date: 30/03/2017

A cloud server spends considerable time, energy, and money training a high-accuracy Viola-Jones type object detector. Clients can upload their photos to the cloud server to find objects. However, a client does not want the content of his or her photos to leak, and the cloud server is likewise reluctant to leak any parameters of the trained object detector. Ten years ago, Avidan & Butman introduced Blind Vision, a method for securely evaluating a Viola-Jones type object detector. Blind Vision uses standard cryptographic tools and is painfully slow to compute, taking a couple of hours to scan a single image. The purpose of this work is to explore an efficient method that can speed up the process. We propose the Random Base Image (RBI) Representation. The original image is divided into random base images. Only the base images are submitted randomly to the cloud server. Thus, the content of the image cannot be leaked. Meanwhile, a random vector and the secure Millionaire protocol are leveraged to protect the parameters of the trained object detector. The RBI representation makes the integral image usable again, which yields a large speedup. The experimental results show that our method retains the detection accuracy of the plain vision algorithm and is significantly faster than traditional Blind Vision, with only a very low theoretical probability of information leakage.
Comment: 6 pages, 3 figures. To appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10, 2017 - Jul 14, 2017, Hong Kong, Hong Kong

arXiv.org e-Print Archive

Crossref
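The abstract above relies on the integral image, the summed-area table that lets a Viola-Jones detector evaluate any rectangle sum with four lookups. The sketch below illustrates that data structure and its linearity, which is one plausible reason an image split into additive parts can still use it. The two-part additive split (`base_a`, `base_b`) is a hypothetical stand-in chosen for illustration; it is not the paper's actual RBI construction, which the abstract does not specify.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 8))

# Hypothetical additive split (an assumption, not the paper's scheme):
# base_a is random and base_b is whatever makes the two sum to the image.
base_a = rng.integers(-1000, 1000, size=image.shape)
base_b = image - base_a

# Linearity: the integral images of the parts add up to that of the whole.
combined = integral_image(base_a) + integral_image(base_b)
print(np.array_equal(combined, integral_image(image)))
print(box_sum(combined, 1, 2, 3, 4) == int(image[1:4, 2:6].sum()))
```

Because the table is built once and each rectangle query is constant-time, the cost of scanning many detection windows is dominated by the number of windows rather than their size.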

The Effects of Yoga on College Students' Mental Health: A Systematic Review

Author: Bo Shen
Chen Ding
Yingya Pu
Yumei Jiang
Publication venue: 'IUScholarWorks'
Publication date: 01/02/2023

The mental health of college students is an increasingly serious public health problem, and effective, healthy interventions are needed. Research on yoga is growing, but there are few randomized controlled trials (RCTs) of the effects of yoga intervention on students' mental health. Therefore, this study examined the effects of yoga intervention on mental health in college students and the quality of the trials reporting them. We searched PubMed (Medline), Cochrane, Web of Science, CNKI, the VIP Chinese Science and Technology Journal Database (VIP), and the WanFang Database for RCTs of yoga intervention in college students' mental health. After screening, 17 articles met the requirements and were included, and the Cochrane risk-of-bias tool RoB 2.0 was used to evaluate their quality. Of the 17 articles reviewed, three were rated as low risk of bias, five as possibly at risk of bias, and nine as high risk of bias. The 17 studies are predominantly of low methodological quality and lack multi-centered, large-sample collaborative research. Almost all researchers mentioned the use of randomization in their articles, but they did not indicate which randomization method was used. There was no description of allocation concealment, blinding, case shedding, case follow-up, etc., so it was impossible to judge whether the trial design was correct or whether random grouping was, indeed, undertaken. This study found that most of the so-called randomized controlled trials are doubtful, which effectively reduces the strength and credibility of this study. Therefore, improving the research quality of yoga intervention and standardizing the writing of scientific research articles are problems that need to be solved in the current field of sports psychology research in China.
Current evidence shows that yoga exercise can relax the body and mind, thereby improving the level of mental health, and that complete yoga (exercise, breathing, meditation) significantly relieves the symptoms of depression. Performing yoga postures and exercises promotes blood circulation, effectively improves sleep, and regulates breathing to stabilize autonomic nerves, relieve stress, and eliminate mental tension. In the future, yoga practice can be used as a non-medical intervention to treat mental illness. The quality of the current randomized controlled trials of yoga intervention in the mental health of college students is generally low. Randomized controlled trials with reasonable methodological design, strict implementation, and sufficient follow-up time are still needed. It is recommended that researchers strengthen the systematic study of clinical trial methodology and strictly refer to the Cochrane manual list for clinical research reports in order to improve the quality of literature reports.

Boise State University - ScholarWorks

ModelScope Text-to-Video Technical Report

Author: Chen Dayou
Wang Jiuniu
Wang Xiang
Yuan Hangjie
Zhang Shiwei
Zhang Yingya
Publication date: 12/08/2023

This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion). ModelScopeT2V incorporates spatio-temporal blocks to ensure consistent frame generation and smooth movement transitions. The model can adapt to varying frame numbers during training and inference, making it suitable for both image-text and video-text datasets. ModelScopeT2V brings together three components (i.e., VQGAN, a text encoder, and a denoising UNet), comprising 1.7 billion parameters in total, of which 0.5 billion are dedicated to temporal capabilities. The model demonstrates superior performance over state-of-the-art methods across three evaluation metrics. The code and an online demo are available at https://modelscope.cn/models/damo/text-to-video-synthesis/summary.
Comment: Technical report. Project page: https://modelscope.cn/models/damo/text-to-video-synthesis/summary

arXiv.org e-Print Archive

Measuring Pointwise $\mathcal{V}$-Usable Information In-Context-ly

Author: Bitterman Danielle
Chen Shan
Gurevych Iryna
Li Yingya
Lu Sheng
Savova Guergana
Publication date: 08/12/2023

In-context learning (ICL) is a new learning paradigm that has gained popularity along with the development of large language models. In this work, we adapt a recently proposed hardness metric, pointwise $\mathcal{V}$-usable information (PVI), to an in-context version (in-context PVI). Compared to the original PVI, in-context PVI is more efficient in that it requires only a few exemplars and does not require fine-tuning. We conducted a comprehensive empirical analysis to evaluate the reliability of in-context PVI. Our findings indicate that in-context PVI estimates exhibit similar characteristics to the original PVI. Specific to the in-context setting, we show that in-context PVI estimates remain consistent across different exemplar selections and numbers of shots. The variance of in-context PVI estimates across different exemplar selections is insignificant, which suggests that in-context PVI is stable. Furthermore, we demonstrate how in-context PVI can be employed to identify challenging instances. Our work highlights the potential of in-context PVI and provides new insights into the capabilities of ICL.
Comment: EMNLP 2023 Findings

arXiv.org e-Print Archive
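For the PVI entry above, a minimal sketch of the quantity being estimated may help. PVI for an instance is the log-probability (in bits) that a model assigns to the gold label given the input, minus the log-probability it assigns given a null input. The function below takes those two probabilities as plain numbers; how they are obtained in the in-context setting (for example, from prompts whose exemplars include or omit the inputs) is an assumption here and is not spelled out in the abstract. The function name is hypothetical.

```python
import math

def pointwise_v_information(p_with_input, p_without_input):
    """PVI in bits: log2 p(y | x) - log2 p(y | null input)."""
    return math.log2(p_with_input) - math.log2(p_without_input)

# The input halves the probability of the gold label: a hard instance.
print(pointwise_v_information(0.25, 0.5))  # -1.0
# The input changes nothing: zero usable information.
print(pointwise_v_information(0.5, 0.5))   # 0.0
```

Low or negative values flag instances where seeing the input does not help, or actively hurts, which is how the metric is used to identify challenging instances.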

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

Author: Cen Jun
Chen Qifeng
Pei Yixuan
Qing Zhiwu
Wang Xiang
Zhang Shiwei
Zhang Yingya
Publication date: 25/03/2023

Open-set action recognition aims to reject unknown human action cases that fall outside the distribution of the training set. Existing methods mainly focus on learning better uncertainty scores but overlook the importance of feature representations. We find that features with richer semantic diversity can significantly improve the open-set performance under the same uncertainty scores. In this paper, we begin by analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance. To this end, a novel Prototypical Similarity Learning (PSL) framework is proposed to keep the instance variance within the same class to retain more IS information. In addition, we notice that unknown samples that look similar to known samples are easily misclassified as known classes. To alleviate this issue, video shuffling is further introduced in our PSL to learn distinct temporal information between original and shuffled samples, which we find enlarges the CS information. Extensive experiments demonstrate that the proposed PSL can significantly boost both the open-set and closed-set performance and achieves state-of-the-art results on multiple benchmarks. Code is available at https://github.com/Jun-CEN/PSL.
Comment: To appear at CVPR202

arXiv.org e-Print Archive

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

Author: Cen Jun
Chen Qifeng
Li Kun
Luo Maochun
Pei Yixuan
Zhang Shiwei
Zhang Yingya
Zheng Hang
Publication date: 09/07/2023

2D RGB images and 3D LIDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles. Several 2D and 3D fusion methods have been explored for the LIDAR semantic segmentation task, but they suffer from different problems. 2D-to-3D fusion methods require strictly paired data during inference, which may not be available in real-world scenarios, while 3D-to-2D fusion methods cannot explicitly make full use of the 2D information. Therefore, we propose a Bidirectional Fusion Network with Cross-Modality Knowledge Distillation (CMDFusion) in this work. Our method has two contributions. First, our bidirectional fusion scheme explicitly and implicitly enhances the 3D feature via 2D-to-3D fusion and 3D-to-2D fusion, respectively, which surpasses either single fusion scheme. Second, we distill the 2D knowledge from a 2D network (Camera branch) to a 3D network (2D knowledge branch) so that the 3D network can generate 2D information even for those points not in the FOV (field of view) of the camera. In this way, RGB images are no longer required during inference, since the 2D knowledge branch provides 2D information according to the 3D LIDAR input. We show that our CMDFusion achieves the best performance among all fusion-based methods on the SemanticKITTI and nuScenes datasets. The code will be released at https://github.com/Jun-CEN/CMDFusion.

arXiv.org e-Print Archive

Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification

Author: Aerts Hugo JWL
Bitterman Danielle S.
Chen Shan
Li Yingya
Lu Sheng
Savova Guergana K.
Van Hoang
Publication date: 05/04/2023

Recent advances in large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates the performance of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical tasks beyond question-answering. Because no patient data can be passed to the OpenAI API public interface, we evaluated model performance with over 10,000 samples as proxies for two fundamental tasks in the clinical domain: classification and reasoning. The first task is classifying whether statements of clinical and policy recommendations in scientific literature constitute health advice. The second task is causal relation detection from the biomedical literature. We compared LLMs with simpler models, such as bag-of-words (BoW) with logistic regression, and fine-tuned BioBERT models. Despite the excitement around viral ChatGPT, we found that fine-tuning for two fundamental NLP tasks remained the best strategy. The simple BoW model performed on par with the most complex LLM prompting. Prompt engineering required significant investment.
Comment: 28 pages, 2 tables and 4 figures. Submitting for review

arXiv.org e-Print Archive

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

Author: Chen Dayou
Huang Yan
Luo Zhengxiong
Shen Yujun
Tan Tieniu
Wang Liang
Zhang Yingya
Zhao Deli
Zhou Jingren
Publication date: 12/10/2023

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions. Despite its recent success in image synthesis, applying DPMs to video generation is still challenging due to high-dimensional data spaces. Previous methods usually adopt a standard diffusion process, where frames in the same video clip are destroyed with independent noises, ignoring the content redundancy and temporal correlation. This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis. The denoising pipeline employs two jointly learned networks to match the noise decomposition accordingly. Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation. We further show that our decomposed formulation can benefit from pre-trained image diffusion models and supports text-conditioned video creation well.
Comment: Accepted to CVPR202

arXiv.org e-Print Archive
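The noise decomposition described in the VideoFusion abstract can be pictured with a short sketch: each frame's noise is a mix of one base sample shared by all frames and a residual sample drawn per frame. The mixing weight `lam` and the square-root weighting (chosen so that each frame's noise keeps unit variance) are assumptions made for illustration; the abstract does not state the exact formula, and the function name is hypothetical.

```python
import numpy as np

def decomposed_noise(num_frames, frame_shape, lam, rng):
    """Per-frame Gaussian noise built from a shared base and per-frame residuals.

    lam in [0, 1] is an assumed mixing weight: 1.0 gives identical noise in
    every frame, 0.0 gives fully independent noise per frame.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    base = rng.standard_normal(frame_shape)                      # shared by all frames
    residual = rng.standard_normal((num_frames, *frame_shape))   # varies along time
    # Broadcasting adds the same base to every frame's residual.
    return np.sqrt(lam) * base + np.sqrt(1.0 - lam) * residual

rng = np.random.default_rng(0)
noise = decomposed_noise(num_frames=8, frame_shape=(32, 32), lam=0.5, rng=rng)
print(noise.shape)  # (8, 32, 32)
```

With this weighting, the correlation between the noise of any two frames is approximately `lam`, in contrast to the independent per-frame noise of a standard diffusion process.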

An intensity-enhanced method for handling mobile laser scanning point clouds

Author: Hao Chen
Huan Luo
Jonathon Li
Lina Fang
Yingya Guo
Publication venue: 'Elsevier BV'
Publication date: 01/03/2022

Currently, mobile laser scanning (MLS) systems can conveniently and rapidly measure the backscattered laser beam properties of object surfaces in large-scale roadway scenes. Such properties are digitized as the intensity value stored in the acquired point cloud data, and intensity, as an important information source, has been widely used in a variety of applications, including road marking inventory, manhole cover detection, and pavement inspection. However, the collected intensity often deviates from the object reflectance due to two main factors: different scanning distances and worn-out surfaces. Therefore, in this paper, we present a new intensity-enhanced method to gradually and efficiently enhance intensity in MLS point clouds. Concretely, to eliminate the intensity inconsistency caused by different scanning distances, the direct relationship between scanning distance and intensity value is modeled to correct the inconsistent intensity. To handle the low contrast between 3D points with different intensities, we propose to introduce and adapt the dark channel prior for adaptively transforming the intensity information in point cloud scenes. To remove isolated intensity noise, multiple filters are integrated to denoise regions with different point densities. The proposed method is evaluated on four MLS datasets acquired in different road scenarios with different MLS systems. Extensive experiments and discussions demonstrate that the proposed method performs remarkably well at enhancing the intensities in MLS point clouds.

Directory of Open Access Journals

Method for windmill farm monitoring

Author: Chen Ni Ya
Chen Yao
Chioua Moncef
Yu Rongrong
Zhou Yingya
Publication date: 01/01/2016

PolyPublie