121 research outputs found

    Speaker Extraction with Co-Speech Gestures Cue

    Speaker extraction seeks to extract the clean speech of a target speaker from multi-talker mixture speech. Previous studies have used a pre-recorded speech sample or a face image of the target speaker as the speaker cue. In human communication, co-speech gestures that are naturally timed with speech also contribute to speech perception. In this work, we explore the use of co-speech gesture sequences, e.g. hand and body movements, as the speaker cue for speaker extraction. Such gestures can be obtained from low-resolution video recordings and are therefore more readily available than face recordings. We propose two networks that use the co-speech gestures cue to perform attentive listening on the target speaker: one implicitly fuses the cue into the speaker extraction process, while the other performs speech separation first and then explicitly uses the cue to associate a separated speech stream with the target speaker. The experimental results show that the co-speech gestures cue is informative for identifying the target speaker, and the quality of the extracted speech shows significant improvements over the unprocessed mixture speech.
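    As a purely illustrative sketch of the second, explicit-association variant, one could score each separated stream against the gesture cue and pick the best match. The correlation of a stream's energy envelope with a gesture-motion signal below is our own simplification, not the paper's learned association network:

```python
def pick_target_stream(streams, gesture_motion):
    """Pick the separated stream whose frame-level energy envelope best
    correlates with a gesture-motion signal. Toy stand-in for a learned
    association module; plain Pearson correlation is an assumption here."""
    def pearson(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a) ** 0.5
        vb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (va * vb) if va and vb else 0.0

    # Energy envelope: magnitude per frame for each separated stream.
    envelopes = [[abs(frame) for frame in s] for s in streams]
    scores = [pearson(env, gesture_motion) for env in envelopes]
    return max(range(len(streams)), key=lambda i: scores[i])
```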

    Effect of low molecular weight heparin and ulinastatin as a combined therapy on soluble myeloid cell expression and intestinal mucosal function in patients with severe pancreatitis

    Purpose: To investigate the effect of low molecular weight heparin (LMWH) combined with ulinastatin on soluble myeloid cell expression and intestinal mucosal function (IMF) in patients with severe pancreatitis. Methods: A total of 107 patients with severe pancreatitis were divided into two groups: a control group (CG, n = 53) and a study group (SG, n = 54). The CG was treated with LMWH, while the SG received the same treatment plus ulinastatin. The following parameters were evaluated in the two groups: treatment effects, IMF, time for various indicators to normalize, vascular endothelial function, complication symptoms, T-lymphocyte subgroup indicators, inflammatory factors, anti-inflammatory factors, soluble B7-H2, and soluble myeloid cell receptor-1 (sTREM-1) levels. Results: After treatment, the SG showed lower L/M values, DAO and D-lactic acid levels than the CG (p < 0.05). Gastrointestinal function, leukocytes, amylase, and body temperature in the SG took a shorter time to return to normal than in the CG (p < 0.05). IL-10 levels in the SG were higher than in the CG, while sB7-H2, TNF-α, sTREM-1 and IL-1 levels were lower (p < 0.05). After treatment, NO levels in the SG were higher, but TXB2, vWF and ET levels were lower than in the CG (p < 0.05). In addition, CD4+ and CD4+/CD8+ indicators were higher, and CD8+ lower, in the SG than in the CG (p < 0.05). Conclusion: Ulinastatin + LMWH improves IMF in patients with severe pancreatitis, shortens the time for various indicators to normalize, and reduces the incidence of complications. However, further clinical trials are required to validate this therapeutic strategy for the management of severe pancreatitis. Keywords: Low molecular weight heparin; Ulinastatin; Severe pancreatitis; Soluble myeloid cell expression; Intestinal mucosal function; Treatment effect

    Audio Visual Speaker Localization from EgoCentric Views

    The use of audio and visual modalities for speaker localization has been well studied in the literature by exploiting their complementary characteristics. However, most previous works employ static sensors mounted at fixed positions. In contrast, in this work we explore the egocentric setting, where the heterogeneous sensors are embodied and may move with a human to facilitate speaker localization. Compared to the static scenario, the egocentric setting is more realistic for smart-home applications, e.g., a service robot. However, it also brings new challenges, such as blurred images, frequent disappearance of the speaker from the wearer's field of view, and occlusions. In this paper, we study egocentric audio-visual speaker DOA (direction-of-arrival) estimation and address the challenges mentioned above. Specifically, we propose a transformer-based audio-visual fusion method to estimate the DOA of the speaker relative to the wearer, and design a training strategy to mitigate the problem of the speaker disappearing from the camera's view. We also develop a new dataset for simulating out-of-view scenarios by creating a scene in which a camera wearer walks around while a speaker is moving at the same time. The experimental results show that our proposed method offers promising tracking accuracy on this new dataset. Finally, we adapt the proposed method to the multi-speaker scenario. Experiments on EasyCom show the effectiveness of the proposed model for multiple speakers in real scenarios, achieving state-of-the-art results on the sphere active speaker detection task and the wearer activity prediction task. The simulated dataset and related code are available at https://github.com/KawhiZhao/Egocentric-Audio-Visual-Speaker-Localization.

    Efficient secret key reusing attribute-based encryption from lattices

    Attribute-based encryption (ABE) schemes built from lattices are believed to resist quantum attacks and can be widely applied in Internet of Things and cloud scenarios. One of the most attractive features of ABE is fine-grained access control, which provides an effective way to ensure data security. In this work, we propose an efficient ciphertext-policy attribute-based encryption scheme based on the hardness assumption of the learning with errors (LWE) problem. Unlike other similar schemes, a user's secret key needs to be generated only once, and it can be used to decrypt ciphertexts under different access policies by combining secret key fragments. In particular, we propose a method for binding users' secret keys to their attributes and identities, which solves the collusion attack problem. The scheme is proved selectively secure under the LWE assumption.
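    To make the LWE hardness assumption concrete, here is a toy single-bit Regev-style encryption in pure Python. It illustrates only the noise-based encode/decode idea behind LWE, not the paper's ABE construction, and the parameters are far too small to be secure:

```python
import random

q, n = 257, 16                          # toy modulus and dimension (insecure!)
s = [random.randrange(q) for _ in range(n)]   # secret vector in Z_q^n

def encrypt(bit):
    """LWE sample (a, <a,s> + e + bit*floor(q/2)) with small noise e."""
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(-4, 5)         # small noise keeps decryption correct
    b = (sum(ai * si for ai, si in zip(a, s)) + e + bit * (q // 2)) % q
    return a, b

def decrypt(a, b):
    """Remove <a,s>; the residue sits near 0 for bit 0, near q/2 for bit 1."""
    d = (b - sum(ai * si for ai, si in zip(a, s))) % q
    return 1 if q // 4 < d < 3 * q // 4 else 0
```

Security rests on the fact that without s, the pairs (a, b) are hard to distinguish from uniform noise; real schemes use much larger n and q.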

    Profile driven dataflow optimisation of mean shift visual tracking

    Profile-guided optimisation is a common technique used by compilers and runtime systems to shorten execution runtimes and to optimise locality-aware scheduling and memory access on heterogeneous hardware platforms. Some profiling tools trace the execution of low-level code, whilst others are designed for abstract models of computation to provide rich domain-specific context in profiling reports. We have implemented mean shift, a computer vision tracking algorithm, in the RVC-CAL dataflow language, and use both dynamic runtime and static dataflow profiling mechanisms to identify and eliminate bottlenecks in our naive initial version. We use these profiling reports to tune the CPU scheduler, reducing runtime by 88%, and to optimise our dataflow implementation, reducing runtime by a further 43% for an overall runtime reduction of 93%. We also assess the portability of our mean shift optimisations by trading off CPU runtime against resource utilisation on FPGAs. Applying all dataflow optimisations significantly reduces the FPGA design footprint, requiring fewer slice LUTs and less block memory.

    SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables

    Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence. We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims that 1) originate from authentic scientific publications and 2) require compositional reasoning for verification. The claims are paired with evidence-containing scientific tables annotated with labels. Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models, including table-based pretraining models and large language models. All models except GPT-4 achieved performance barely above random guessing. Popular prompting techniques, such as Chain-of-Thought, do not yield significant performance gains on SCITAB. Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning. Our code and data are publicly available at https://github.com/XinyuanLu00/SciTab. Comment: Accepted at EMNLP 2023 (main conference, long paper).

    Modeling Multi-wavelength Pulse Profiles of Millisecond Pulsar PSR B1821-24

    PSR B1821-24 is a solitary millisecond pulsar (MSP) that radiates multi-wavelength pulsed photons. It has complex radio, X-ray and γ-ray pulse profiles with distinct peak phase separations that challenge the traditional caustic emission models. Using the single-pole annular gap model with a suitable magnetic inclination angle (α = 40°) and viewing angle (ζ = 75°), we managed to reproduce its pulse profiles in three wavebands. We find that the middle radio peak originates from the core gap region at high altitudes, while the other two radio peaks originate from the annular gap region at relatively low altitudes. The two peaks in both the X-ray and γ-ray wavebands fundamentally originate from the annular gap region, while γ-ray emission generated in the core gap region contributes somewhat to the first γ-ray peak. Precisely reproducing the multi-wavelength pulse profiles of PSR B1821-24 enables us to understand the emission regions of distinct wavebands and to justify pulsar emission models. Comment: Accepted for publication in Ap