Suitability of Mycorrhiza-Defective Rice and Its Progenitor for Studies on the Control of Nitrogen Loss in Paddy Fields via Arbuscular Mycorrhiza
Employing mycorrhiza-defective mutants and their progenitors does not require inoculation or elimination of the resident microbial community in experimental studies of mycorrhizal soil ecology. We aimed to examine the suitability of mycorrhiza-defective rice (non-mycorrhizal; Oryza sativa L., cv. Nipponbare) and its progenitor (mycorrhizal) for evaluating the control of nitrogen (N) loss from paddy fields via arbuscular mycorrhizal (AM) fungi. We grew the two rice lines in soil containing the full community of AM fungi and investigated root AM colonization. In the absence of AM fungi, we estimated rice N content, soil N concentration, and the microbial community on the basis of phospholipid fatty acids; we also quantified N loss via NH3 volatilization, N2O emission, runoff, and leaching. In the presence of AM fungi, we found no evidence of AM colonization in the non-mycorrhizal rice, whereas the mycorrhizal rice was colonized, with 17–24% of roots colonized. In the absence of AM fungi, the two rice lines had similar N content, soil N concentration, and microbial community. Importantly, there was no significant difference in N loss via any of the four pathways between the mycorrhizal and non-mycorrhizal systems. This mycorrhizal/non-mycorrhizal rice pair is therefore suitable for further research on the role of AM fungi in the control of soil N loss in paddy fields.
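The headline finding (no significant difference in N loss via any pathway) implies a per-pathway comparison between the two rice systems. As a minimal sketch of such a comparison, with hypothetical data and a two-sample t-test standing in for whatever statistics the study actually used:

```python
# Illustrative sketch, not from the paper: comparing N loss between the
# mycorrhizal and non-mycorrhizal rice systems for each pathway.
# All values and replicate counts below are hypothetical.
from scipy import stats

pathways = {
    # hypothetical N loss measurements, 4 replicates per rice line
    "NH3 volatilization": ([12.1, 11.8, 12.6, 12.0], [12.3, 11.9, 12.4, 12.2]),
    "N2O emission":       ([0.42, 0.39, 0.45, 0.41], [0.40, 0.44, 0.43, 0.42]),
    "runoff":             ([3.1, 2.9, 3.3, 3.0],     [3.2, 3.0, 3.1, 3.1]),
    "leaching":           ([5.6, 5.4, 5.9, 5.5],     [5.7, 5.5, 5.6, 5.8]),
}

for name, (mycorrhizal, non_mycorrhizal) in pathways.items():
    t, p = stats.ttest_ind(mycorrhizal, non_mycorrhizal)
    print(f"{name}: t={t:.2f}, p={p:.3f}")  # p > 0.05 -> no significant difference
```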
DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization
Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, since both hands and face convey critical linguistic information in signed languages, sign language videos cannot preserve signer privacy. While signers have expressed interest, for a variety of applications, in sign language video anonymization that would effectively preserve linguistic content, attempts to develop such technology have had limited success, given the complexity of hand movements and facial expressions. Existing approaches rely predominantly on precise pose estimations of the signer in video footage and often require sign language video datasets for training. These requirements prevent them from processing videos 'in the wild,' in part because of the limited diversity present in current sign language video datasets. To address these limitations, our research introduces DiffSLVA, a novel methodology that utilizes pre-trained large-scale diffusion models for zero-shot text-guided sign language video anonymization. We incorporate ControlNet, which leverages low-level image features such as HED (Holistically-Nested Edge Detection) edges, to circumvent the need for pose estimation. Additionally, we develop a specialized module dedicated to capturing facial expressions, which are critical for conveying essential linguistic information in signed languages. We then combine the above methods to achieve anonymization that better preserves the essential linguistic content of the original signer. This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications, which would offer significant benefits to the Deaf and Hard-of-Hearing communities. We demonstrate the effectiveness of our approach with a series of signer anonymization experiments.
Comment: Project webpage: https://github.com/Jeffery9707/DiffSLVA
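The core idea, conditioning a pre-trained diffusion model on HED edges instead of pose, can be sketched per frame with off-the-shelf components. This is an illustrative approximation under stated assumptions, not the authors' pipeline (which adds temporal machinery and a dedicated facial-expression module); the frame filenames and prompt are hypothetical:

```python
# Per-frame sketch of HED-conditioned, text-guided generation with a
# pre-trained ControlNet; illustrative only, not the DiffSLVA system.
import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

frame = Image.open("signer_frame.png")  # one frame of the source video
edges = hed(frame)                      # HED edges stand in for pose estimation
anonymized = pipe(
    "a different person signing, photorealistic",  # text guidance swaps identity
    image=edges, num_inference_steps=20,
).images[0]
anonymized.save("anonymized_frame.png")
```

Applied frame by frame like this, the edges carry hand shapes and motion while the text prompt replaces the signer's appearance; the real system additionally has to keep identity and motion consistent across frames.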
A multimodal spatio-temporal GCN model with enhancements for isolated sign recognition
We propose a multimodal network using skeletons and handshapes as input to recognize individual signs and detect their boundaries in American Sign Language (ASL) videos. Our method integrates a spatio-temporal Graph Convolutional Network (GCN) architecture to estimate human skeleton keypoints; it uses a late-fusion approach for both forward and backward processing of video streams. Our core method is designed to extract and analyze features from ASL videos, to enhance the accuracy and efficiency of recognition of individual signs. A Gating module based on per-channel multi-layer convolutions is employed to evaluate significant frames for recognition of isolated signs (a sketch of this idea follows below). Additionally, an auxiliary multimodal branch network, integrated with a transformer, is designed to estimate the linguistic start and end frames of an isolated sign within a video clip. We evaluated the performance of our approach on multiple datasets that include isolated, citation-form signs as well as signs pre-segmented from continuous signing based on linguistic annotations of the start and end points of signs within sentences. We achieved very promising results when using both types of sign videos combined for training, with overall sign recognition accuracy of 80.8% Top-1 and 95.2% Top-5 for citation-form signs, and 80.4% Top-1 and 93.0% Top-5 for signs pre-segmented from continuous signing.
SUB00002686 - Rutgers, The State University of New Jersey; IIS-2212302 - National Science Foundation
Published version
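A hedged sketch of the per-channel, multi-layer convolutional gating idea; the exact architecture is not specified in the abstract, so the layer choices, names, and shapes here are assumptions:

```python
# Sketch of a per-channel gating module over frames, in the spirit of the
# paper's Gating module (details are assumptions, not the authors' code).
import torch
import torch.nn as nn

class FrameGate(nn.Module):
    """Scores each frame per channel with depthwise temporal convolutions."""
    def __init__(self, channels: int, kernel: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel, padding=kernel // 2, groups=channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel, padding=kernel // 2, groups=channels),
            nn.Sigmoid(),  # per-channel, per-frame importance in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames) pooled GCN features
        return x * self.net(x)  # emphasize frames significant for recognition

feats = torch.randn(2, 256, 64)  # 2 clips, 256 channels, 64 frames
gated = FrameGate(256)(feats)
pooled = gated.mean(dim=-1)      # late temporal pooling before a classifier head
```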
Diffusion models for Sign Language video anonymization
Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, this does not preserve privacy. Since critical linguistic information is transmitted through facial expressions, the face cannot be obscured. While signers have expressed interest, for a variety of applications, in sign language video anonymization that would effectively preserve linguistic content, attempts to develop such technology have had limited success and generally require pose estimation that cannot be readily carried out in the wild. To address current limitations, our research introduces DiffSLVA, a novel methodology that uses pre-trained large-scale diffusion models for text-guided sign language video anonymization. We incorporate ControlNet, which leverages low-level image features such as HED (Holistically-Nested Edge Detection) edges, to circumvent the need for pose estimation. Additionally, we develop a specialized module to capture linguistically essential facial expressions. We then combine the above methods to achieve anonymization that preserves the essential linguistic content of the original signer. This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications, which would offer significant benefits to the Deaf and Hard-of-Hearing communities.
SUB00002686 - Rutgers, The State University of New Jersey; IIS-2212302 - National Science Foundation
Published version
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Video creation has become increasingly popular, yet the expertise and effort required for editing often pose barriers to beginners. In this paper, we explore the integration of large language models (LLMs) into the video editing workflow to reduce these barriers. Our design vision is embodied in LAVE, a novel system that provides LLM-powered agent assistance and language-augmented editing features. LAVE automatically generates language descriptions for the user's footage, serving as the foundation for enabling the LLM to process videos and assist in editing tasks. When the user provides editing objectives, the agent plans and executes relevant actions to fulfill them. Moreover, LAVE allows users to edit videos through either the agent or direct UI manipulation, providing flexibility and enabling manual refinement of agent actions. Our user study, which included eight participants ranging from novices to proficient editors, demonstrated LAVE's effectiveness. The results also shed light on user perceptions of the proposed LLM-assisted editing paradigm and its impact on users' creativity and sense of co-creation. Based on these findings, we propose design implications to inform the future development of agent-assisted content editing.
Comment: Paper accepted to the ACM Conference on Intelligent User Interfaces (ACM IUI) 2024
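The described loop (auto-generated footage descriptions feed an LLM, which plans editing actions from a user objective) can be illustrated with a small agent sketch. This is not LAVE's implementation; the action set, prompt format, and model name are assumptions:

```python
# Illustrative agent loop in the spirit of LAVE (not the authors' code):
# the LLM sees clip descriptions plus an objective and returns a plan
# that the host system would then execute on the editing timeline.
import json
from openai import OpenAI

client = OpenAI()

clip_descriptions = {  # hypothetical auto-generated footage descriptions
    "clip_01.mp4": "sunset over a beach, slow pan",
    "clip_02.mp4": "close-up of a coffee cup, steam rising",
}
ACTIONS = ["trim", "reorder", "add_title", "remove_clip"]

def plan_edits(objective: str) -> list[dict]:
    prompt = (
        f"Footage: {json.dumps(clip_descriptions)}\n"
        f"Objective: {objective}\n"
        f"Reply with only a JSON list of actions chosen from {ACTIONS}, "
        'e.g. [{"action": "trim", "clip": "clip_01.mp4", "start": 0, "end": 5}].'
    )
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    # a real system would validate the plan before executing it
    return json.loads(resp.choices[0].message.content)

for step in plan_edits("make a calm 30-second morning montage"):
    print("execute:", step)  # stand-in for applying the action to the timeline
```

Keeping the agent's output as structured actions rather than free text is what makes the "manual refinement of agent actions" mentioned above possible: each step can be shown in the UI and edited before it runs.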
Nonreciprocal spontaneous parametric process
Mediated by the interaction with quantum vacuum fields, a laser field propagating in a nonlinear optical medium can generate new light fields via a spontaneous parametric process. Because the field-vacuum interaction is direction-independent, such a process is inherently independent of the propagation direction of light and has thus far been reciprocal. In this work, we experimentally demonstrate a nonreciprocal spontaneous parametric four-wave mixing process in sodium atomic vapors with dispersive nonlinearity, and further demonstrate broadband optical isolation by unidirectionally coupling the probe field to an auxiliary quantum vacuum field in another four-wave mixing process. Thanks to the broad bandwidth of the spontaneous parametric process, in combination with the Doppler and power-induced broadening of atomic energy levels, we achieve optical isolation with a bandwidth larger than 100 GHz at an isolation ratio >25 dB. Considering that both spontaneous parametric processes and wave mixing in nonlinear media have been realized in diverse on-chip photonic platforms, our work paves the way for integrated broadband optical isolation and can thus boost the scalability and functionality of photonic chips.
Comment: 17 pages, 4 figures
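For reference, the isolation figures quoted above follow the standard definition of the isolation ratio, and a four-wave mixing process obeys energy conservation among the four fields. These are textbook relations, not expressions taken from the paper:

```latex
% Standard definitions (not specific to this paper): isolation ratio in dB
% from forward/backward transmittances, and energy conservation in
% four-wave mixing (two pump photons convert to signal and idler photons).
\mathrm{IR} = 10 \log_{10}\!\frac{T_{\mathrm{forward}}}{T_{\mathrm{backward}}}\ \mathrm{dB},
\qquad
\omega_{p_1} + \omega_{p_2} = \omega_s + \omega_i .
```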
Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning
Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field, because acquiring task-specific skeleton annotations at large scales is difficult. Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hierarchical spatial-temporal nature of human skeletons. Different from such superficial supervision at the video level, we propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder (Hi-TRS) to explicitly capture spatial, short-term, and long-term temporal dependencies at the frame, clip, and video levels, respectively. To evaluate the proposed self-supervised pre-training scheme with Hi-TRS, we conduct extensive experiments covering three skeleton-based downstream tasks: action recognition, action detection, and motion prediction. Under both supervised and semi-supervised evaluation protocols, our method achieves state-of-the-art performance. Additionally, we demonstrate that the prior knowledge learned by our model in the pre-training stage has strong transfer capability to different downstream tasks.
Comment: Accepted to ECCV 2022
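The frame/clip/video hierarchy described above can be sketched as three stacked transformer encoders, each pooling one level before feeding the next. Dimensions, layer counts, and pooling choices below are assumptions, not the Hi-TRS architecture:

```python
# Minimal sketch of a hierarchical skeleton encoder in the spirit of Hi-TRS:
# a spatial transformer over joints per frame, a short-term transformer over
# frames per clip, and a long-term transformer over clips per video.
import torch
import torch.nn as nn

def encoder(dim: int, heads: int = 4, layers: int = 2) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
    return nn.TransformerEncoder(layer, layers)

class HierarchicalSkeletonEncoder(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.joint_proj = nn.Linear(3, dim)  # (x, y, z) coordinate -> feature
        self.spatial = encoder(dim)          # over joints within a frame
        self.short_term = encoder(dim)       # over frames within a clip
        self.long_term = encoder(dim)        # over clips within a video

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, clips, frames, joints, 3)
        b, c, f, j, _ = x.shape
        h = self.joint_proj(x)                                 # embed joints
        h = self.spatial(h.reshape(b * c * f, j, -1)).mean(1)  # frame features
        h = self.short_term(h.reshape(b * c, f, -1)).mean(1)   # clip features
        h = self.long_term(h.reshape(b, c, -1)).mean(1)        # video feature
        return h

video = torch.randn(2, 4, 16, 25, 3)  # 2 videos, 4 clips, 16 frames, 25 joints
print(HierarchicalSkeletonEncoder()(video).shape)  # torch.Size([2, 64])
```

The self-supervised pre-training objectives then attach at each level (frame, clip, video), which is what lets the encoder learn dependencies at all three time scales rather than only video-level contrast.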
Sign language video anonymization
Deaf signers who wish to communicate in their native language frequently share videos on the Web. However, video cannot preserve the privacy that is often desirable for discussion of sensitive topics, since both hands and face convey critical linguistic information and therefore cannot be obscured without degrading communication. Deaf signers have expressed interest in video anonymization that would preserve linguistic content; however, attempts to develop such technology have thus far shown limited success. We are developing a new method for such anonymization, with input from ASL signers. We modify a motion-based image animation model to generate high-resolution videos with the signer identity changed, but with preservation of linguistically significant motions and facial expressions. An asymmetric encoder-decoder structured image generator is used to generate the high-resolution target frame from the low-resolution source frame based on the optical flow and confidence map. We explicitly guide the model to attain clear generation of hands and face by using bounding boxes to improve the loss computation. FID and KID scores are used to evaluate the realism of the generated frames. This technology shows great potential for practical applications to benefit deaf signers.
Published version
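The FID/KID evaluation mentioned above can be reproduced with standard tooling. A minimal sketch using torchmetrics (the authors' exact evaluation setup is not specified; the random tensors stand in for real and generated frames):

```python
# Sketch of FID/KID realism evaluation for generated frames, using
# torchmetrics; illustrative, not the paper's evaluation code.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

real = torch.randint(0, 256, (100, 3, 256, 256), dtype=torch.uint8)  # real frames
fake = torch.randint(0, 256, (100, 3, 256, 256), dtype=torch.uint8)  # generated

fid = FrechetInceptionDistance(feature=2048)
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())  # lower = generated frames closer to real

kid = KernelInceptionDistance(subset_size=50)
kid.update(real, real=True)
kid.update(fake, real=False)
kid_mean, kid_std = kid.compute()
print(f"KID: {kid_mean.item():.4f} ± {kid_std.item():.4f}")
```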
American Sign Language video anonymization to support online participation of Deaf and Hard of Hearing users
Without a commonly accepted writing system for American Sign Language (ASL), Deaf or Hard of Hearing (DHH) ASL signers who wish to express opinions or ask questions online must post a video of their signing, if they prefer not to use written English, a language in which they may feel less proficient. Since the face conveys essential linguistic meaning, the face cannot simply be removed from the video in order to preserve anonymity. Thus, DHH ASL signers cannot easily discuss sensitive, personal, or controversial topics in their primary language, limiting engagement in online debate or inquiries about health or legal issues. We explored several recent attempts to address this problem through development of “face swap” technologies to automatically disguise the face in videos while preserving essential facial expressions and natural human appearance. We presented several prototypes to DHH ASL signers (N=16) and examined their interests in and requirements for such technology. After viewing transformed videos of other signers and of themselves, participants evaluated the understandability, naturalness of appearance, and degree of anonymity protection of these technologies. Our study revealed users’ perception of key trade-offs among these three dimensions, factors that contribute to each, and their views on transformation options enabled by this technology, for use in various contexts. Our findings guide future designers of this technology and inform selection of applications and design features.
Accepted manuscript