18 research outputs found
The reactions of ruthenium(II) polypyridyl complexes
Ruthenium(II) polypyridine complexes have been extensively studied because of their unique redox and photochemical properties. A typical example of such complexes is tris(2,2′-bipyridyl)ruthenium(II). In this study, this complex was synthesized and characterized using electronic spectroscopy and cyclic voltammetry, and it was shown that the ruthenium concentration could be accurately determined using ICP-MS. The complex was found to be very stable in various chemical environments. Spectrophotometric investigations showed that persulphate and lead dioxide readily oxidize Ru(bpy)3 2+ to Ru(bpy)3 3+ in the presence of heat and H2SO4, respectively. The oxidation of Ru(bpy)3 2+ by cerium(IV) was observed to occur at an approximately 3:2 [Ce(IV)]/[Ru(II)] mole ratio. The resultant Ru(bpy)3 3+ solution was unstable in the presence of light, and Ru(bpy)3 2+ was gradually recovered. The regeneration of Ru(bpy)3 2+ from Ru(bpy)3 3+ was found to be a multistep process that appears to involve the formation of an intermediate species. The following reaction model was found to best explain the kinetic data obtained: Ru(bpy)3 2+ + Ce(IV) → Ru(bpy)3 3+; Ru(bpy)3 3+ → Ru(bpy)3 2+; Ru(bpy)3 3+ → Ru* intermediate; Ru* intermediate → Ru(bpy)3 2+. Theoretical rate constants were also calculated for the same process under the experimental conditions, and the comparison between the experimental and theoretical results gave good agreement. In addition, the factors that influence the rate of regeneration of Ru(bpy)3 2+ from Ru(bpy)3 3+ are discussed.
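As a rough illustration, the four-step model above can be integrated numerically. The rate constants, concentrations, and time step below are purely hypothetical placeholders, not the fitted values from this study:

```python
# Forward-Euler integration of the proposed kinetic model (illustrative only).
k1 = 1.0e4   # Ru(bpy)3 2+ + Ce(IV) -> Ru(bpy)3 3+   (M^-1 s^-1, assumed)
k2 = 0.05    # Ru(bpy)3 3+ -> Ru(bpy)3 2+             (s^-1, assumed)
k3 = 0.02    # Ru(bpy)3 3+ -> Ru* intermediate        (s^-1, assumed)
k4 = 0.01    # Ru* intermediate -> Ru(bpy)3 2+        (s^-1, assumed)

# 3:2 [Ce(IV)]/[Ru(II)] starting ratio, as reported for the oxidation (M)
ru2, ce4, ru3, ru_star = 1.0e-4, 1.5e-4, 0.0, 0.0
dt = 0.01  # s; small enough for the fastest rate here

for _ in range(int(2000 / dt)):          # integrate t = 0 .. 2000 s
    ox = k1 * ru2 * ce4                  # oxidation flux by Ce(IV)
    d_ru2 = -ox + k2 * ru3 + k4 * ru_star
    d_ce4 = -ox
    d_ru3 = ox - (k2 + k3) * ru3
    d_star = k3 * ru3 - k4 * ru_star
    ru2 += dt * d_ru2
    ce4 += dt * d_ce4
    ru3 += dt * d_ru3
    ru_star += dt * d_star
```

Total ruthenium (ru2 + ru3 + ru_star) is conserved by construction, and once the Ce(IV) is exhausted the Ru(bpy)3 3+ and intermediate pools relax back to Ru(bpy)3 2+, mirroring the gradual recovery described above.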
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Temporal video grounding (TVG) aims to retrieve the time interval of a
language query from an untrimmed video. A significant challenge in TVG is the
low "Semantic Noise Ratio (SNR)" of untrimmed videos: performance degrades as
the SNR decreases. Prior works have addressed this challenge with sophisticated techniques.
In this paper, we propose a no-frills TVG model that consists of two core
modules, namely multi-scale neighboring attention and zoom-in boundary
detection. The multi-scale neighboring attention restricts each video token to
aggregate visual context only from its neighbors, enabling the extraction of
the most discriminative information across multi-scale feature hierarchies
despite the high noise ratio. The zoom-in boundary detection then performs
localized discrimination among the selected top candidates for fine-grained
grounding adjustment. With an end-to-end training strategy, our model achieves
competitive performance on different TVG benchmarks, while also offering
faster inference and fewer model parameters, thanks to its lightweight
architecture.
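A minimal sketch of the neighborhood restriction described above (not the authors' implementation: query/key/value projections, attention heads, and the full multi-scale hierarchy are omitted, and the window sizes are placeholders):

```python
import numpy as np

def neighboring_attention(x, window):
    """Self-attention in which each token attends only to tokens within
    `window` positions of itself; everything farther away is masked out."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                      # dot-product scores
    idx = np.arange(T)
    far = np.abs(idx[:, None] - idx[None, :]) > window
    scores[far] = -np.inf                              # mask non-neighbors
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # row-wise softmax
    return w @ x

# Multi-scale variant: the same restriction applied with growing windows.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
pyramid = [neighboring_attention(tokens, w) for w in (1, 2, 4)]
```

With `window=0` each token attends only to itself and the layer reduces to the identity; widening the window trades locality for context, which is the knob the multi-scale hierarchy varies.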
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Recent studies have presented compelling evidence that large language models
(LLMs) can equip embodied agents with the self-driven capability to interact
with the world, which marks an initial step toward versatile robotics. However,
these efforts tend to overlook the visual richness of open worlds, rendering
the entire interactive process akin to "a blindfolded text-based game."
Consequently, LLM-based agents frequently encounter challenges in intuitively
comprehending their surroundings and producing responses that are easy to
understand. In this paper, we propose Steve-Eye, an end-to-end trained large
multimodal model designed to address this limitation. Steve-Eye integrates the
LLM with a visual encoder, which enables it to process visual-text inputs and
generate multimodal feedback. In addition, we use a semi-automatic strategy to
collect an extensive dataset comprising 850K open-world instruction pairs,
empowering our model to encompass three essential functions for an agent:
multimodal perception, foundational knowledge base, and skill prediction and
planning. Lastly, we develop three open-world evaluation benchmarks, then carry
out extensive experiments from a wide range of perspectives to validate our
model's capability to strategically act and plan. Codes and datasets will be
released. Comment: 19 pages, 19 figures
Human-Object Interaction Detection: A Quick Survey and Examination of Methods
Human-object interaction detection is a relatively new task in the world of
computer vision and visual semantic information extraction. With the goal of
machines identifying interactions that humans perform on objects, there are
many real-world use cases for the research in this field. To our knowledge,
this is the first general survey of the state-of-the-art and milestone works in
this field. We provide a basic survey of the developments in the field of
human-object interaction detection. Many works in this field use multi-stream
convolutional neural network architectures, which combine features from
multiple sources in the input image. Most commonly these are the human and
object in question, as well as the spatial relationship between the two. As far as we
are aware, there have not been in-depth studies performed that look into the
performance of each component individually. In order to provide insight to
future researchers, we perform an individualized study that examines the
performance of each component of a multi-stream convolutional neural network
architecture for human-object interaction detection. Specifically, we examine
the HORCNN architecture as it is a foundational work in the field. In addition,
we provide an in-depth look at the HICO-DET dataset, a popular benchmark in the
field of human-object interaction detection. Code and papers can be found at
https://github.com/SHI-Labs/Human-Object-Interaction-Detection. Comment: Published at The 1st International Workshop On Human-Centric
Multimedia Analysis, at ACM Multimedia Conference 202
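As a toy sketch of the multi-stream, late-fusion pattern examined above (the layer sizes, score heads, and feature inputs are illustrative stand-ins, not HORCNN's actual CNN branches):

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 600  # HICO-DET defines 600 human-object interaction categories

def make_head(in_dim, hidden=128):
    # A tiny two-layer score head standing in for a full CNN branch.
    return (rng.normal(scale=0.01, size=(in_dim, hidden)),
            rng.normal(scale=0.01, size=(hidden, N_CLASSES)))

# One stream per cue: human crop, object crop, pairwise spatial layout.
heads = {"human": make_head(512), "object": make_head(512), "spatial": make_head(64)}

def stream_score(feat, head):
    w1, w2 = head
    return np.tanh(feat @ w1) @ w2

def hoi_scores(feats):
    # Late fusion: per-stream class scores are summed into one prediction.
    return sum(stream_score(feats[name], heads[name]) for name in heads)

feats = {"human": rng.normal(size=512),
         "object": rng.normal(size=512),
         "spatial": rng.normal(size=64)}
scores = hoi_scores(feats)
```

An individualized study like the one described can then zero out one stream at a time and measure how the fused scores change, isolating each component's contribution.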
LLaMA Rider: Spurring Large Language Models to Explore the Open World
Recently, various studies have leveraged Large Language Models (LLMs) to aid
decision-making and planning in environments, trying to align the LLMs'
knowledge with the conditions of the world. Nonetheless, the capacity of LLMs to
continuously acquire environmental knowledge and adapt in an open world remains
uncertain. In this paper, we propose an approach to spur LLMs to explore the
open world, gather experiences, and learn to improve their task-solving
capabilities. In this approach, a multi-round feedback-revision mechanism is
utilized to encourage LLMs to actively select appropriate revision actions
guided by feedback information from the environment. This facilitates
exploration and enhances the model's performance. In addition, we integrate
sub-task relabeling to help the LLM maintain consistency in sub-task
planning and learn the combinatorial structure among tasks,
enabling it to complete a wider range of tasks through training based on the
acquired exploration experiences. By evaluation in Minecraft, an open-ended
sandbox world, we demonstrate that our approach LLaMA-Rider enhances the
efficiency of the LLM in exploring the environment, and effectively improves
the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k
instances of collected data, showing minimal training costs compared to the
baseline using reinforcement learning. Comment: 18 pages
Spatially Resolved Investigation and Control of the Bistability in Single Crystals of the [Fe(bbpya)(NCS)2] Spin Crossover Complex
The spin transition in single crystals of the [FeII(bbpya)(NCS)2] (bbpya = N,N-bis(2,2′-bipyrid-6-yl)amine) mononuclear complex was investigated by a combination of X-ray diffraction, Raman spectroscopy, and optical and atomic force microscopy (AFM) methods. These studies, performed around 440 K, revealed an extremely abrupt spin transition associated with a structural phase transition from a triclinic (low spin) to a monoclinic (mixed low spin/high spin) structure. Spatially resolved observations of this transition evidenced a clear phase separation associated with heterogeneous nucleation and the formation of a moving macroscopic interface whose velocity reached in some cases 300 μm s−1. Using photothermal control, it was possible to stabilize biphasic states of the crystal and then acquire AFM images of the phase boundary. A "sawtooth"-like topography was repeatedly observed, which most likely emerges so as to minimize the elastic strain. Remarkably, fine spatial control of the phase boundary could also be achieved using the AFM probe itself, through probe–sample convective heat exchange.
Bone-to-Bone Ligament Preserving Laminoplasty with Ultrasonic Osteotome Assistance for Intraspinal Tumors: A Technical Note and Clinical Follow-Up
Objective Laminectomy has been widely used for intraspinal tumor resection. However, the tilted spinous process and narrow lateral laminae of the thoracic spine, along with the hypertrophic ligamentum flavum of the lumbar spine, pose certain problems for lamina removal in the traditional laminectomy. We improved the laminectomy method with an ultrasonic osteotome to treat thoracolumbar tumors and assessed its safety and superiority. Methods A retrospective analysis was performed in 86 patients with thoracolumbar (T4–L5) spinal tumors treated by resection, including 44 with the lamina removed using the traditional method and 42 with the lamina removed using bone-to-bone ligament preserving (BLP) laminoplasty, which preserves the posterior ligament complex. Age, sex, and tumor size, location, and depth were compared between the two groups. The length of incision and bone window, time to remove the vertebral lamina, and epidural effusion volume were recorded at 2 weeks after surgery in the two groups. Postoperative reexamination by magnetic resonance imaging (MRI) at 2 weeks and 3 months after surgery was compared with preoperative MRI to assess the change in vertebral lamina displacement. Results There were no statistical differences in age, sex, or tumor size, depth, or location between the two groups. The BLP laminoplasty did not increase the risk of dural, spinal cord, or nerve injuries. The difference between the incision and tumor length, as well as the difference between the bone window and tumor length, was smaller in the BLP laminoplasty group than in the traditional laminectomy group, and the BLP laminoplasty took less time than the traditional laminectomy (p < 0.05). There was no significant difference in the volume of epidural effusion between the two groups at 2 weeks postoperatively, or in the displacement of the returned vertebral plate observed in the sagittal and axial positions. The same was true for the displacement at 3 months postoperatively in the axial position. However, the sagittal displacement in the BLP laminoplasty group was smaller than that in the traditional laminectomy group (p < 0.05). Conclusions The BLP laminoplasty is safe for the resection of thoracolumbar spinal canal tumors. It is less traumatic and faster, with less displacement of the returned lamina, resulting in a stable repair of the spine.
On-Orbit Modulation Transfer Function Estimation Based on the Refined Image Kernel
To overcome the limitations of traditional on-orbit modulation transfer function (MTF) measurement methods, which depend heavily on natural features, scenery, artificial edges, and point-source targets, this paper presents an on-orbit MTF measurement method for remote sensing imagers based on the refined image kernel (RIK) acquired directly from remote sensing images. First, the kernel is estimated from remote sensing sub-images with rich texture details using an iterative support detection (ISD) algorithm, and it is then refined by central-pixel energy concentration (EC) to obtain the RIK. Second, the MTF curves are calculated by interpolating the RIK and applying the Fourier transform. Finally, the final MTF is taken as the average of the MTFs at the Nyquist frequency obtained from each RIK. To demonstrate the feasibility and validity of this method, the measured MTFs were compared with the results of the ISO 12233 edge method, with an error of no more than 7%. The relative error of the measured results does not exceed 5% for an image signal-to-noise ratio (SNR) above 20 dB. The results of on-orbit MTF measurement using remote sensing images from the Jilin-1 satellite show a maximum error of less than 2% compared with the ISO 12233 edge method. These results demonstrate that the proposed method provides highly accurate and robust measurements, can substantially increase the efficiency of on-orbit MTF measurement, and offers a reference for high-frequency monitoring of on-orbit satellite stability and optical imaging quality
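The kernel-to-MTF step can be sketched as follows. This is a simplified stand-in: a synthetic Gaussian kernel replaces the ISD/EC-refined RIK, and the interpolation and multi-kernel averaging steps are omitted:

```python
import numpy as np

def mtf_profile(kernel):
    """1-D MTF slice from a blur kernel: normalize, FFT, take the magnitude,
    and read the profile from DC outward toward the Nyquist frequency."""
    k = kernel / kernel.sum()                 # unit DC response
    mtf = np.abs(np.fft.fftshift(np.fft.fft2(k)))
    mtf /= mtf.max()                          # so that MTF(0) = 1
    c = kernel.shape[0] // 2                  # DC sits at the center after fftshift
    return mtf[c, c:]                         # horizontal slice, DC -> highest freq

# Synthetic Gaussian blur kernel standing in for the refined image kernel (RIK)
x = np.arange(-7, 8)
g = np.exp(-(x[:, None] ** 2 + x[None, :] ** 2) / (2 * 1.5 ** 2))
profile = mtf_profile(g)
mtf_near_nyquist = profile[-1]  # a sharper imager would keep this value higher
```

Averaging this value over several RIKs extracted from different sub-images would mimic the final averaging step described in the abstract.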
Accommodating Audio Modality in CLIP for Multimodal Processing
Multimodal processing has attracted much attention lately, especially with the success of pre-training. However, exploration has mainly focused on vision-language pre-training, as introducing more modalities can greatly complicate model design and optimization. In this paper, we extend the state-of-the-art vision-language model CLIP to accommodate the audio modality for vision-language-audio multimodal processing. Specifically, we apply inter-modal and intra-modal contrastive learning to explore the correlation between audio and the other modalities, in addition to the inner characteristics of the audio modality. Moreover, we design an audio type token to dynamically learn different types of audio information for different scenarios, as general audio conveys both verbal and nonverbal heterogeneous information. Our proposed CLIP4VLA model is validated on different downstream tasks, including video retrieval and video captioning, and achieves state-of-the-art performance on the benchmark datasets MSR-VTT, VATEX, and Audiocaps. The corresponding code and checkpoints will be released at https://github.com/ludanruan/CLIP4VLA
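A minimal sketch of the inter-modal contrastive objective on pre-computed embeddings (a generic symmetric InfoNCE, not CLIP4VLA's exact formulation; the encoders, the intra-modal term, and the audio type token are omitted, and the temperature is a placeholder):

```python
import numpy as np

def symmetric_info_nce(a, b, temperature=0.07):
    """Contrastive loss between two modality batches; (i, i) pairs are the
    positives and every other pairing in the batch is a negative."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # cosine-similarity space
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature

    def xent_diag(l):
        l = l - l.max(axis=1, keepdims=True)           # stable log-softmax
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average over the audio->video and video->audio directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 32))                  # stand-in audio embeddings
video = audio + 0.05 * rng.normal(size=(8, 32))   # well-aligned counterparts
loss_aligned = symmetric_info_nce(audio, video)
loss_shuffled = symmetric_info_nce(audio, video[::-1])
```

The loss drops as matching audio-video pairs move together in the shared embedding space, which is the mechanism the inter-modal objective relies on.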