
    The reactions of ruthenium(II) polypyridyl complexes

    Ruthenium(II) polypyridine complexes have been studied extensively because of their unique redox and photochemical properties. A typical example of such complexes is tris(2,2′-bipyridyl)ruthenium(II). In this study, this complex was synthesized and characterized using electronic spectroscopy and cyclic voltammetry, and it was shown that the ruthenium concentration could be determined accurately using ICP-MS. The complex was found to be very stable in a range of chemical environments. Spectrophotometric investigations showed that persulphate and lead dioxide readily oxidize Ru(bpy)₃²⁺ to Ru(bpy)₃³⁺ in the presence of heat and H₂SO₄, respectively. The oxidation of Ru(bpy)₃²⁺ by cerium(IV) was observed to occur at a [Ce(IV)]/[Ru(II)] mole ratio of approximately 3:2. The resultant Ru(bpy)₃³⁺ solution was unstable in the presence of light, and Ru(bpy)₃²⁺ was gradually recovered. The regeneration of Ru(bpy)₃²⁺ from Ru(bpy)₃³⁺ was found to be a multistep process that appears to involve the formation of an intermediate species. The following reaction model was found to best explain the kinetic data obtained:

    Ru(bpy)₃²⁺ + Ce(IV) → Ru(bpy)₃³⁺
    Ru(bpy)₃³⁺ → Ru(bpy)₃²⁺
    Ru(bpy)₃³⁺ → Ru* (intermediate)
    Ru* (intermediate) → Ru(bpy)₃²⁺

    Theoretical rate constants were also calculated for the same process under the experimental conditions, and the experimental and theoretical results were in good agreement. In addition, the factors that influence the rate of regeneration of Ru(bpy)₃²⁺ from Ru(bpy)₃³⁺ are discussed.
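
    To make these kinetics concrete, the sketch below numerically integrates the rate equations implied by this reaction model. It is a minimal illustration under stated assumptions: the rate constants k1-k4 and the initial concentrations are hypothetical placeholders, not values fitted in the study.

    ```python
    # Minimal sketch: numerical integration of the proposed reaction model.
    from scipy.integrate import solve_ivp

    # Hypothetical rate constants (placeholders, not fitted values).
    k1, k2, k3, k4 = 1.0e3, 5.0e-3, 2.0e-3, 1.0e-3

    def model(t, y):
        ru2, ce4, ru3, ru_star = y   # [Ru(bpy)3 2+], [Ce(IV)], [Ru(bpy)3 3+], [Ru*]
        ox = k1 * ru2 * ce4          # Ru(bpy)3 2+ + Ce(IV) -> Ru(bpy)3 3+
        direct = k2 * ru3            # Ru(bpy)3 3+ -> Ru(bpy)3 2+
        to_star = k3 * ru3           # Ru(bpy)3 3+ -> Ru* intermediate
        from_star = k4 * ru_star     # Ru* intermediate -> Ru(bpy)3 2+
        return [-ox + direct + from_star,  # d[Ru(II)]/dt
                -ox,                       # d[Ce(IV)]/dt
                ox - direct - to_star,     # d[Ru(III)]/dt
                to_star - from_star]       # d[Ru*]/dt

    # A 3:2 [Ce(IV)]/[Ru(II)] starting mixture, matching the observed stoichiometry.
    sol = solve_ivp(model, (0.0, 3600.0), [1.0e-4, 1.5e-4, 0.0, 0.0], method="LSODA")
    print(sol.y[:, -1])  # concentrations after one hour
    ```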

    No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

    Temporal video grounding (TVG) aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)" of untrimmed videos: the lower the SNR, the worse the performance. Prior works have addressed this challenge with sophisticated techniques. In this paper, we propose a no-frills TVG model that consists of two core modules: multi-scale neighboring attention and zoom-in boundary detection. The multi-scale neighboring attention restricts each video token to aggregating visual context only from its neighbors, enabling the extraction of the most distinguishing information across multi-scale feature hierarchies despite the high proportion of noise. The zoom-in boundary detection then performs fine-grained, local discrimination on the selected top candidates to adjust the grounding boundaries. With an end-to-end training strategy, our model achieves competitive performance on different TVG benchmarks while also offering faster inference and fewer model parameters, thanks to its lightweight architecture.
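
    As a rough illustration of the first module, the sketch below implements single-head neighboring attention with a banded mask, applied at several window sizes to mimic a multi-scale hierarchy. The shapes, the window sizes, the absence of learned projections, and the mean fusion are simplifying assumptions, not the paper's exact design.

    ```python
    # Minimal sketch of neighboring (local-window) attention over video tokens.
    import torch
    import torch.nn.functional as F

    def neighboring_attention(x, window):
        """x: (T, d) video tokens. Each token attends only to tokens within
        `window` positions of itself (its temporal neighbors)."""
        T, d = x.shape
        scores = x @ x.t() / d ** 0.5                     # (T, T) attention logits
        idx = torch.arange(T)
        mask = (idx[None, :] - idx[:, None]).abs() > window
        scores = scores.masked_fill(mask, float("-inf"))  # block non-neighbor pairs
        return F.softmax(scores, dim=-1) @ x

    x = torch.randn(128, 256)  # 128 video tokens with 256-dim features
    hierarchy = [neighboring_attention(x, w) for w in (2, 4, 8)]  # multi-scale windows
    out = torch.stack(hierarchy).mean(0)  # one simple way to fuse the scales
    ```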

    Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

    Recent studies have presented compelling evidence that large language models (LLMs) can equip embodied agents with the self-driven capability to interact with the world, marking an initial step toward versatile robotics. However, these efforts tend to overlook the visual richness of open worlds, rendering the entire interactive process akin to "a blindfolded text-based game." Consequently, LLM-based agents frequently struggle to comprehend their surroundings intuitively and to produce responses that are easy to understand. In this paper, we propose Steve-Eye, an end-to-end trained large multimodal model designed to address this limitation. Steve-Eye integrates an LLM with a visual encoder, enabling it to process visual-text inputs and generate multimodal feedback. In addition, we use a semi-automatic strategy to collect an extensive dataset comprising 850K open-world instruction pairs, empowering our model to cover three essential functions for an agent: multimodal perception, a foundational knowledge base, and skill prediction and planning. Lastly, we develop three open-world evaluation benchmarks and carry out extensive experiments from a wide range of perspectives to validate our model's capability to act and plan strategically. Code and datasets will be released.
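
    The abstract describes the now-common pattern of coupling a visual encoder to an LLM. A minimal sketch of that general pattern follows; the dimensions, module names, and prepend-visual-tokens strategy are generic assumptions for illustration, not Steve-Eye's actual implementation.

    ```python
    # Generic sketch: project visual-encoder features into the LLM's
    # token-embedding space and prepend them to the text sequence.
    import torch
    import torch.nn as nn

    class VisualAdapter(nn.Module):
        """Projects frozen visual-encoder features into the LLM embedding space."""
        def __init__(self, vis_dim=1024, llm_dim=4096):
            super().__init__()
            self.proj = nn.Linear(vis_dim, llm_dim)

        def forward(self, vis_feats, text_embeds):
            # vis_feats: (B, N, vis_dim) patch/frame features from the visual encoder
            # text_embeds: (B, L, llm_dim) token embeddings from the LLM
            vis_tokens = self.proj(vis_feats)               # (B, N, llm_dim)
            return torch.cat([vis_tokens, text_embeds], 1)  # multimodal input sequence

    adapter = VisualAdapter()
    seq = adapter(torch.randn(2, 32, 1024), torch.randn(2, 16, 4096))
    print(seq.shape)  # torch.Size([2, 48, 4096])
    ```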

    Human-Object Interaction Detection: A Quick Survey and Examination of Methods

    Human-object interaction detection is a relatively new task in computer vision and visual semantic information extraction, with the goal of having machines identify the interactions that humans perform on objects; the research in this field has many real-world use cases. To our knowledge, this is the first general survey of the state-of-the-art and milestone works in the field. We provide a basic survey of developments in human-object interaction detection. Many works in this field use multi-stream convolutional neural network architectures, which combine features from multiple sources in the input image, most commonly the human and object in question as well as the spatial relationship between the two. As far as we are aware, no in-depth studies have examined the performance of each component individually. To provide insight to future researchers, we perform an individualized study that examines the performance of each component of a multi-stream convolutional neural network architecture for human-object interaction detection. Specifically, we examine the HORCNN architecture, as it is a foundational work in the field. In addition, we provide an in-depth look at the HICO-DET dataset, a popular benchmark in human-object interaction detection. Code and papers can be found at https://github.com/SHI-Labs/Human-Object-Interaction-Detection.
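
    For readers unfamiliar with the multi-stream design examined here, the sketch below shows the basic fusion scheme in the spirit of HORCNN: separate human, object, and pairwise-spatial streams each score the interaction classes, and the scores are summed. The feature dimensions and linear heads are placeholder assumptions standing in for the full convolutional streams.

    ```python
    import torch
    import torch.nn as nn

    class MultiStreamHOI(nn.Module):
        """Three-stream scoring head: human, object, and pairwise-spatial streams
        each predict class scores, fused by elementwise addition."""
        def __init__(self, feat_dim=2048, n_classes=600):
            super().__init__()
            self.human_fc = nn.Linear(feat_dim, n_classes)       # human appearance stream
            self.object_fc = nn.Linear(feat_dim, n_classes)      # object appearance stream
            self.spatial_fc = nn.Linear(2 * 64 * 64, n_classes)  # two-channel spatial pattern

        def forward(self, h_feat, o_feat, sp_map):
            # h_feat, o_feat: (B, feat_dim) pooled box features
            # sp_map: (B, 2, 64, 64) binary masks of the human and object boxes
            return (self.human_fc(h_feat) + self.object_fc(o_feat)
                    + self.spatial_fc(sp_map.flatten(1)))

    scores = MultiStreamHOI()(torch.randn(4, 2048),
                              torch.randn(4, 2048),
                              torch.randn(4, 2, 64, 64))
    ```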

    LLaMA Rider: Spurring Large Language Models to Explore the Open World

    Recently, various studies have leveraged Large Language Models (LLMs) to support decision-making and planning in interactive environments and have tried to align the LLMs' knowledge with the world's conditions. Nonetheless, the capacity of LLMs to continuously acquire environmental knowledge and adapt in an open world remains uncertain. In this paper, we propose an approach that spurs LLMs to explore the open world, gather experiences, and learn to improve their task-solving capabilities. The approach uses a multi-round feedback-revision mechanism to encourage LLMs to actively select appropriate revision actions guided by feedback information from the environment, which facilitates exploration and enhances the model's performance. In addition, we integrate sub-task relabeling to help LLMs maintain consistency in sub-task planning and learn the combinatorial nature of tasks, enabling the model to complete a wider range of tasks through training on the acquired exploration experiences. Through evaluation in Minecraft, an open-ended sandbox world, we demonstrate that our approach, LLaMA-Rider, enhances the efficiency with which the LLM explores the environment and effectively improves the LLM's ability to accomplish more tasks through fine-tuning on merely 1.3k instances of collected data, incurring minimal training cost compared to a baseline using reinforcement learning.
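
    The multi-round feedback-revision loop can be pictured roughly as follows; `llm_propose` and the environment interface are hypothetical stand-ins, since the paper's actual APIs are not given in the abstract.

    ```python
    # Hedged sketch of a multi-round feedback-revision exploration loop.
    def explore_task(env, llm_propose, task, max_rounds=5):
        feedback = None
        trajectory = []
        for _ in range(max_rounds):
            action = llm_propose(task, feedback)      # LLM selects or revises an action
            observation, success, feedback = env.step(action)
            trajectory.append((task, action, observation))
            if success:                               # stop revising once the task succeeds
                break
        return trajectory  # experiences later relabeled and used for fine-tuning
    ```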

    Spatially Resolved Investigation and Control of the Bistability in Single Crystals of the [Fe(bbpya)(NCS)2] Spin Crossover Complex

    The spin transition in single crystals of the [FeII(bbpya)(NCS)2] (bbpya = N,N-bis(2,2′-bipyrid-6-yl)amine) mononuclear complex was investigated by a combination of X-ray diffraction, Raman spectroscopy, and optical and atomic force microscopy (AFM) methods. These studies, performed around 440 K, revealed an extremely abrupt spin transition associated with a structural phase transition from a triclinic (low-spin) to a monoclinic (mixed low-spin/high-spin) structure. Spatially resolved observations of this transition evidenced a clear phase separation associated with heterogeneous nucleation and the formation of a moving macroscopic interface, whose velocity reached 300 μm s⁻¹ in some cases. Using photothermal control, it was possible to stabilize biphasic states of the crystal and then acquire AFM images of the phase boundary. A "sawtooth"-like topography was repeatedly observed, which most likely emerges so as to minimize the elastic strain. Remarkably, fine spatial control of the phase boundary could also be achieved using the AFM probe itself, through probe–sample convective heat exchange.

    Bone‐to‐Bone Ligament Preserving Laminoplasty with Ultrasonic Osteotome Assistance for Intraspinal Tumors: A Technical Note and Clinical Follow‐Up

    Objective: Laminectomy has been widely used for intraspinal tumor resection. However, the tilted spinous processes and narrow lateral laminae of the thoracic spine, along with the hypertrophic ligamentum flavum of the lumbar spine, pose problems for lamina removal in traditional laminectomy. We improved the laminectomy method using an ultrasonic osteotome to treat thoracolumbar tumors and assessed its safety and advantages. Methods: A retrospective analysis was performed on 86 patients with thoracolumbar (T4–L5) spinal tumors treated by resection: 44 had the lamina removed using the traditional method and 42 using bone‐to‐bone ligament preserving (BLP) laminoplasty, which preserves the posterior ligament complex. Age, sex, and tumor size, location, and depth were compared between the two groups. The lengths of the incision and bone window, the time taken to remove the vertebral lamina, and the epidural effusion volume at 2 weeks after surgery were recorded in the two groups. Postoperative magnetic resonance imaging (MRI) at 2 weeks and 3 months after surgery was compared with preoperative MRI to assess the change in vertebral lamina displacement. Results: There were no statistically significant differences in age, sex, or tumor size, depth, or location between the two groups. The BLP laminoplasty did not increase the risk of dural, spinal cord, or nerve injuries. The differences between the incision length and tumor length, and between the bone window length and tumor length, were smaller in the BLP laminoplasty group than in the traditional laminectomy group, and the BLP laminoplasty took less time than the traditional laminectomy (p < 0.05). There was no significant difference between the two groups in the volume of epidural effusion at 2 weeks postoperatively, or in the displacement of the returned vertebral lamina observed in the sagittal and axial positions; the same was true for the axial displacement at 3 months postoperatively. However, the sagittal displacement at 3 months in the BLP laminoplasty group was smaller than that in the traditional laminectomy group (p < 0.05). Conclusions: BLP laminoplasty is safe for the resection of thoracolumbar spinal canal tumors. It is less traumatic and faster, with less displacement of the returned lamina, resulting in a stable repair of the spine.

    On-Orbit Modulation Transfer Function Estimation Based on the Refined Image Kernel

    To overcome the limitations of traditional on-orbit modulation transfer function (MTF) measurement methods, which depend heavily on natural features, scenery, artificial edges, and point-source targets, this paper presents an on-orbit MTF measurement method for remote sensing imagers based on a refined image kernel (RIK) acquired directly from remote sensing images. First, a kernel is estimated from remote sensing sub-images with rich texture detail using an iterative support detection (ISD) algorithm and then refined by central-pixel energy concentration (EC) to obtain the RIK. Second, the MTF curves are calculated by interpolating the RIK and applying the Fourier transform. Finally, the final MTF is taken as the average of the MTF values at the Nyquist frequency obtained from each RIK. To demonstrate the feasibility and validity of the method, the measured MTFs were compared with the results of the ISO 12233 edge method, with an error of no more than 7%; the relative error of the measured results does not exceed 5% for image signal-to-noise ratios (SNR) above 20 dB. On-orbit MTF measurements using remote sensing images from the Jilin-1 satellite show a maximum error of less than 2% compared with the ISO 12233 edge method. These results demonstrate that the proposed method provides highly accurate and robust results and can increase the efficiency of on-orbit MTF measurement, providing a reference for high-frequency monitoring of satellites' on-orbit stability and optical imaging quality.
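
    The last two steps (Fourier-transforming the kernel and reading the MTF at the Nyquist frequency) can be sketched as below. The toy kernel and the zero-padding interpolation are illustrative assumptions; the kernel stands in for the RIK produced by the ISD-plus-EC pipeline.

    ```python
    import numpy as np

    def mtf_from_kernel(kernel, pad=256):
        """kernel: 2-D estimated blur kernel. Returns spatial frequencies
        (cycles/pixel), the horizontal MTF profile, and the value at Nyquist."""
        k = kernel / kernel.sum()              # unit gain at zero frequency
        otf = np.fft.fft2(k, s=(pad, pad))     # zero-padding interpolates the curve
        mtf = np.abs(otf[0, : pad // 2 + 1])   # 1-D cut along one frequency axis
        freqs = np.arange(pad // 2 + 1) / pad  # 0 ... 0.5 cycles per pixel
        return freqs, mtf, mtf[-1]             # mtf[-1] is the value at Nyquist

    kernel = np.outer(*(np.hanning(9),) * 2)   # toy 9x9 kernel for illustration only
    freqs, mtf, mtf_nyq = mtf_from_kernel(kernel)
    print(f"MTF at Nyquist: {mtf_nyq:.3f}")
    ```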

    Accommodating Audio Modality in CLIP for Multimodal Processing

    Multimodal processing has attracted much attention lately, especially with the success of pre-training. However, exploration has mainly focused on vision-language pre-training, as introducing more modalities can greatly complicate model design and optimization. In this paper, we extend the state-of-the-art vision-language model CLIP to accommodate the audio modality for vision-language-audio multimodal processing. Specifically, we apply inter-modal and intra-modal contrastive learning to explore the correlation between audio and the other modalities, in addition to the inner characteristics of the audio modality itself. Moreover, we design an audio type token to dynamically learn the different types of audio information conveyed in different scenarios, since general audio carries both verbal and nonverbal heterogeneous information. Our proposed CLIP4VLA model is validated on different downstream tasks, including video retrieval and video captioning, and achieves state-of-the-art performance on the benchmark datasets MSR-VTT, VATEX, and AudioCaps. The corresponding code and checkpoints will be released at https://github.com/ludanruan/CLIP4VLA.
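
    A minimal sketch of the symmetric contrastive objective that such inter-modal and intra-modal learning typically builds on is shown below; the embedding sizes and temperature are illustrative assumptions rather than CLIP4VLA's actual hyperparameters.

    ```python
    # Symmetric InfoNCE loss between two sets of paired embeddings. The same
    # form can be applied inter-modally (audio-video, audio-text) and
    # intra-modally (two augmented views of the same audio).
    import torch
    import torch.nn.functional as F

    def contrastive_loss(a, b, temperature=0.07):
        # a, b: (B, d) embeddings; matched pairs share a row index
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature  # (B, B) similarity matrix
        targets = torch.arange(a.size(0), device=a.device)
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    audio = torch.randn(32, 512)  # batch of audio embeddings
    video = torch.randn(32, 512)  # batch of paired video embeddings
    loss = contrastive_loss(audio, video)
    ```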