
    Eigenvalue problem for p-Laplacian three-point boundary value problems on time scales

    Abstract: Let $\mathbb{T}$ be a time scale such that $0, T \in \mathbb{T}$, $\beta, \gamma \geqslant 0$ and $0 < \eta < \rho(T)$. We consider the following p-Laplacian three-point boundary value problem on time scales: $(\varphi_p(u^{\Delta}(t)))^{\nabla} + \lambda h(t) f(u(t)) = 0$, $t \in (0, T)$; $u(0) - \beta u^{\Delta}(0) = \gamma u^{\Delta}(\eta)$; $u^{\Delta}(T) = 0$, where $p > 1$, $\lambda > 0$, $h \in C_{ld}((0, T), [0, \infty))$ and $f \in C([0, \infty), (0, \infty))$. Some sufficient conditions for the nonexistence and existence of at least one or two positive solutions of the boundary value problem are established. In doing so, the usual restriction that $f_0 = \lim_{u \to 0^{+}} f(u)/\varphi_p(u)$ and $f_{\infty} = \lim_{u \to \infty} f(u)/\varphi_p(u)$ exist is removed. An example is also given to illustrate the main results.
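    For readability, the boundary value problem and the two limits whose existence is not required can be displayed as follows. This is only a reconstruction of the abstract's notation; the definition of $\varphi_p$ in the comment is the standard one and is not stated in the abstract itself.

```latex
% The three-point boundary value problem on the time scale \mathbb{T},
% restated from the abstract; \varphi_p(s) = |s|^{p-2} s is the usual p-Laplacian.
\begin{gather*}
\bigl(\varphi_p(u^{\Delta}(t))\bigr)^{\nabla} + \lambda\, h(t)\, f(u(t)) = 0,
  \qquad t \in (0, T),\\
u(0) - \beta\, u^{\Delta}(0) = \gamma\, u^{\Delta}(\eta), \qquad u^{\Delta}(T) = 0,\\
f_0 = \lim_{u \to 0^{+}} \frac{f(u)}{\varphi_p(u)}, \qquad
f_{\infty} = \lim_{u \to \infty} \frac{f(u)}{\varphi_p(u)}.
\end{gather*}
```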

    Connecting Speech Encoder and Large Language Model for ASR

    The impressive capability and versatility of large language models (LLMs) have attracted increasing attention in automatic speech recognition (ASR), with several pioneering studies attempting to build integrated ASR models by connecting a speech encoder with an LLM. This paper presents a comparative study of three commonly used connector structures: fully connected layers, multi-head cross-attention, and Q-Former. Speech encoders from the Whisper model series as well as LLMs from the Vicuna model series with different model sizes were studied. Experiments were performed on the commonly used LibriSpeech, Common Voice, and GigaSpeech datasets, where the LLMs with Q-Formers demonstrated consistent and considerable word error rate (WER) reductions over LLMs with the other connector structures. Q-Former-based LLMs also generalise well to out-of-domain data: 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard. Moreover, a novel segment-level Q-Former is proposed to enable LLMs to recognise speech segments whose duration exceeds the input length limit of the encoders, which results in 17% relative WER reductions over the other connector structures on 90-second-long speech data.
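    To illustrate the connector idea compared in this paper, the sketch below implements a minimal Q-Former-style connector in PyTorch: a fixed set of learnable query vectors cross-attends to the speech-encoder output and is projected into the LLM embedding space. The module name, dimensions, and single-layer design are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a Q-Former-style connector between a speech encoder and an LLM.
# Dimensions and the single cross-attention layer are illustrative assumptions.
import torch
import torch.nn as nn


class QFormerConnector(nn.Module):
    def __init__(self, enc_dim=1280, llm_dim=4096, num_queries=64, num_heads=8):
        super().__init__()
        # Learnable queries that summarise the variable-length encoder output.
        self.queries = nn.Parameter(torch.randn(num_queries, enc_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(enc_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(enc_dim)
        self.proj = nn.Linear(enc_dim, llm_dim)  # map into the LLM embedding space

    def forward(self, enc_out):  # enc_out: (batch, frames, enc_dim)
        q = self.queries.unsqueeze(0).expand(enc_out.size(0), -1, -1)
        attended, _ = self.cross_attn(q, enc_out, enc_out)
        # A fixed number of "speech tokens" to prepend to the LLM's text embeddings.
        return self.proj(self.norm(q + attended))


if __name__ == "__main__":
    speech_features = torch.randn(2, 1500, 1280)  # e.g. Whisper-style encoder output
    tokens = QFormerConnector()(speech_features)
    print(tokens.shape)  # torch.Size([2, 64, 4096])
```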

    Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

    Audio-visual large language models (LLMs) have drawn significant attention, yet the fine-grained combination of both input streams remains under-explored, which is challenging but necessary for LLMs to understand general video inputs. To this end, a fine-grained audio-visual joint representation (FAVOR) learning framework for multimodal LLMs is proposed in this paper, which extends a text-based LLM to simultaneously perceive speech and audio events in the audio input stream and images or videos in the visual input stream, at the frame level. To fuse the audio and visual feature streams into joint representations and to align the joint space with the LLM input embedding space, we propose a causal Q-Former structure with a causal attention module to enhance the capture of causal relations among the audio-visual frames across time. An audio-visual evaluation benchmark (AVEB) is also proposed, comprising six representative single-modal tasks and five cross-modal tasks that reflect audio-visual co-reasoning abilities. While achieving competitive single-modal performance on the audio, speech, and image tasks in AVEB, FAVOR achieved over 20% accuracy improvement on the video question-answering task when fine-grained information or temporal causal reasoning is required. In addition, FAVOR demonstrated remarkable video comprehension and reasoning abilities on tasks that other multimodal LLMs have not addressed. An interactive demo of FAVOR is available at https://github.com/BriansIDP/AudioVisualLLM.git, and the training code and model checkpoints will be released soon.
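    As an illustration of frame-level audio-visual fusion with causal attention, the sketch below fuses time-aligned audio and visual frame features and restricts each fused frame to attend only to itself and earlier frames. The additive fusion, layer sizes, and module name are illustrative assumptions, not the FAVOR implementation.

```python
# Sketch of frame-level audio-visual fusion with a causal attention mask:
# each fused frame may only attend to itself and to earlier frames.
# The additive fusion and the layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class CausalAVFusion(nn.Module):
    def __init__(self, audio_dim=768, video_dim=1024, model_dim=1024, num_heads=8):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, model_dim)
        self.video_proj = nn.Linear(video_dim, model_dim)
        self.attn = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(model_dim)

    def forward(self, audio, video):  # both: (batch, frames, dim), time-aligned
        x = self.audio_proj(audio) + self.video_proj(video)  # joint per-frame features
        t = x.size(1)
        # True above the diagonal means "masked": frame i cannot see frames j > i.
        causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        fused, _ = self.attn(x, x, x, attn_mask=causal_mask)
        return self.norm(x + fused)


if __name__ == "__main__":
    a = torch.randn(2, 50, 768)   # 50 audio frames
    v = torch.randn(2, 50, 1024)  # 50 video frames, aligned to the audio frames
    print(CausalAVFusion()(a, v).shape)  # torch.Size([2, 50, 1024])
```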

    Quantification of the performance of iterative and non-iterative computational methods of locating partial discharges using RF measurement techniques

    Partial discharge (PD) is an electrical discharge phenomenon that occurs when the insulation material of high voltage equipment is subjected to high electric field stress. Its occurrence can be an indication of incipient failure within power equipment such as power transformers, underground transmission cables or switchgear. Radio frequency measurement methods can be used to detect and locate discharge sources by measuring the propagated electromagnetic wave arising as a result of ionic charge acceleration. An array of at least four receiving antennas may be employed to detect any radiated discharge signals, and the three-dimensional position of the discharge source can then be calculated using different algorithms. These algorithms fall into two categories: iterative and non-iterative. This paper evaluates, through simulation, the location performance of an iterative method (the standard least squares method) and a non-iterative method (the Bancroft algorithm). Simulations were carried out using (i) a "Y"-shaped antenna array and (ii) a square-shaped antenna array, each consisting of four antennas. The results show that PD location accuracy is influenced by the algorithm's error bound, the number of iterations and the initial values for the iterative algorithms, as well as by the antenna arrangement for both the non-iterative and iterative algorithms. Furthermore, this research proposes a novel approach for selecting adequate error bounds and numbers of iterations using the results of the non-iterative method, thus resolving some of the iterative method's dependencies.
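    The sketch below shows the kind of iterative least-squares location step described here: given arrival times at four antennas, the source position and a range offset are estimated by nonlinear least squares. The antenna geometry, the assumed source position and the solver settings are illustrative assumptions, not the configurations simulated in the paper.

```python
# Sketch of iterative source location with four receiving antennas: solve for
# the source position and a range offset by nonlinear least squares over the
# measured arrival times. Geometry and values are illustrative assumptions.
import numpy as np
from scipy.optimize import least_squares

C = 3e8  # propagation speed of the radiated electromagnetic wave, m/s

# A "Y"-shaped four-antenna array (x, y, z) in metres -- illustrative coordinates.
antennas = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [-0.5, 0.87, 0.0],
                     [-0.5, -0.87, 0.0]])

def arrival_times(source, t0=0.0):
    return t0 + np.linalg.norm(antennas - source, axis=1) / C

def residuals(params, pseudo_ranges):
    x, y, z, d0 = params  # source position (m) and range offset d0 = C * t0 (m)
    ranges = np.linalg.norm(antennas - np.array([x, y, z]), axis=1)
    return pseudo_ranges - d0 - ranges

true_source = np.array([2.0, 1.5, 0.8])
pseudo_ranges = C * arrival_times(true_source)  # measured times scaled to metres

# Convergence depends on the initial guess, the error bounds and the iteration
# limit -- the dependencies the paper discusses for iterative methods.
initial_guess = np.array([0.5, 0.5, 0.5, 0.0])
solution = least_squares(residuals, initial_guess, args=(pseudo_ranges,))
print("estimated source position:", solution.x[:3])
```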

    Hybrid acoustic metamaterial as super absorber for broadband low-frequency sound

    A hybrid acoustic metamaterial is proposed as a new class of sound absorber, which exhibits superior broadband low-frequency sound absorption as well as excellent mechanical stiffness and strength. Based on the honeycomb-corrugation hybrid core (H-C hybrid core), we introduce perforations in both the top facesheet and the corrugation, forming a perforated honeycomb-corrugation hybrid (PHCH) that achieves super-broadband low-frequency sound absorption. Applying the theory of micro-perforated panels (MPP), we establish a theoretical method to calculate the sound absorption coefficient of this new kind of metamaterial. Perfect sound absorption is found at just a few hundred hertz, with a two-octave bandwidth over which the absorption coefficient exceeds 0.5. To verify this model, a finite element model is developed to calculate the absorption coefficient and analyze the viscous-thermal energy dissipation. It is found that viscous energy dissipation in the perforation regions dominates the total energy consumed. This new kind of acoustic metamaterial shows promise for engineering applications: it can serve as a multifunctional material with extraordinary low-frequency sound absorption, excellent stiffness/strength and impact energy absorption.
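    For context on the type of calculation involved, the sketch below evaluates the normal-incidence absorption coefficient of a single micro-perforated panel backed by an air cavity using the classical Maa MPP model. This is only an illustration of standard MPP theory under assumed parameter values; the paper's model couples MPP theory to the honeycomb-corrugation hybrid core and is more involved.

```python
# Normal-incidence absorption coefficient of a single micro-perforated panel
# (MPP) backed by an air cavity, following the classical Maa MPP model.
# All parameter values below are illustrative assumptions, not the paper's.
import numpy as np

RHO0, C0 = 1.21, 343.0   # air density (kg/m^3) and sound speed (m/s)
ETA = 1.81e-5            # dynamic viscosity of air (Pa*s)

def mpp_absorption(f, d=0.4e-3, t=1.0e-3, sigma=0.01, D=50e-3):
    """f: frequency (Hz); d: hole diameter; t: panel thickness;
    sigma: perforation ratio; D: backing cavity depth (all SI units)."""
    omega = 2 * np.pi * f
    k = d * np.sqrt(omega * RHO0 / (4 * ETA))        # perforate constant
    # Normalised resistance and mass reactance of the perforations (Maa).
    r = (32 * ETA * t) / (sigma * RHO0 * C0 * d**2) * (
        np.sqrt(1 + k**2 / 32) + np.sqrt(2) / 32 * k * d / t)
    m = (omega * t) / (sigma * C0) * (
        1 + 1 / np.sqrt(9 + k**2 / 2) + 0.85 * d / t)
    # The backing cavity contributes a reactance of -cot(omega * D / c0).
    x = m - 1 / np.tan(omega * D / C0)
    return 4 * r / ((1 + r)**2 + x**2)

freqs = np.linspace(100, 2000, 5)
print(np.round(mpp_absorption(freqs), 3))
```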

    UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer

    Traditional channel-wise pruning methods, which reduce the number of network channels, struggle to effectively prune efficient CNN models that use depth-wise convolutional layers and certain efficient modules, such as the popular inverted residual blocks. Prior depth pruning methods, which reduce network depth, are unsuitable for pruning some efficient models because of the normalization layers they contain. Moreover, fine-tuning a subnet obtained by directly removing activation layers corrupts the original model weights, hindering the pruned model from achieving high performance. To address these issues, we propose a novel depth pruning method for efficient models. Our approach introduces a novel block pruning strategy and a progressive training method for the subnet. Additionally, we extend our pruning method to vision transformer models. Experimental results demonstrate that our method consistently outperforms existing depth pruning methods across various pruning configurations. We obtained three pruned ConvNeXtV1 models by applying our method to ConvNeXtV1, which surpass most SOTA efficient models with comparable inference performance. Our method also achieves state-of-the-art pruning performance on the vision transformer model.
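    To illustrate the general depth-pruning idea this abstract builds on, the sketch below zeroes out the residual branches of selected blocks in a toy backbone (equivalent to skipping those blocks) and then fine-tunes the shallower subnet. The toy model, the dropped indices and the training loop are illustrative assumptions, not the UPDP block-pruning strategy or its progressive training schedule.

```python
# Sketch of depth pruning: zero out the residual branches of selected blocks
# (equivalent to removing those blocks) and fine-tune the shallower subnet.
# The toy backbone and training loop are illustrative assumptions only.
import torch
import torch.nn as nn


class ZeroBranch(nn.Module):
    """Stands in for a removed residual branch: x + 0 == x, so the block is skipped."""
    def forward(self, x):
        return torch.zeros_like(x)


class ToyBackbone(nn.Module):
    def __init__(self, depth=8, dim=64):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = x + blk(x)  # residual blocks, so dropping a branch is well defined
        return x


def prune_depth(model, drop_indices):
    for i in drop_indices:
        model.blocks[i] = ZeroBranch()
    return model


if __name__ == "__main__":
    model = prune_depth(ToyBackbone(), drop_indices=[2, 5])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x, target = torch.randn(16, 64), torch.randn(16, 64)
    for _ in range(3):  # stand-in for the subsequent fine-tuning stage
        loss = nn.functional.mse_loss(model(x), target)
        opt.zero_grad(); loss.backward(); opt.step()
```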