172 research outputs found

    Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

    Existing works on weakly-supervised audio-visual video parsing adopt the hybrid attention network (HAN) as the multi-modal embedding to capture the cross-modal context. HAN embeds the audio and visual modalities with a shared network, where cross-attention is performed at the input. However, such an early fusion method heavily entangles the two not fully correlated modalities and leads to sub-optimal performance in detecting single-modality events. To deal with this problem, we propose the messenger-guided mid-fusion transformer to reduce the uncorrelated cross-modal context in the fusion. The messengers condense the full cross-modal context into a compact representation that preserves only useful cross-modal information. Furthermore, because microphones capture audio events from all directions while cameras only record visual events within a restricted field of view, unaligned cross-modal context from audio occurs more frequently in visual event predictions. We thus propose cross-audio prediction consistency to suppress the impact of irrelevant audio information on visual event prediction. Experiments consistently illustrate the superior performance of our framework compared to existing state-of-the-art methods. Comment: WACV 202

    Optimal control and bifurcation analysis of a delayed fractional-order SIRS model with general incidence rate and delayed control

    A fractional-order generalized SIRS model that accounts for the incubation period is established in this paper for the transmission of emerging pathogens. The corresponding Hopf bifurcation is discussed by selecting the time delay as the bifurcation parameter. In order to control the occurrence of Hopf bifurcation and achieve better dynamic behaviors, a delayed feedback control is applied to the model. Further, the delayed fractional-order optimal control problem (DFOCP) is proposed and discussed. The parameters of the proposed model are identified from measurement data of coronavirus disease 2019 (COVID-19). Based on the results of parameter identification, the corresponding DFOCP with delayed control is numerically solved.

    Generalized Few-Shot Point Cloud Segmentation Via Geometric Words

    Existing fully-supervised point cloud segmentation methods suffer in dynamic testing environments with emerging new classes. Few-shot point cloud segmentation algorithms address this problem by learning to adapt to new classes at the sacrifice of segmentation accuracy for the base classes, which severely impedes their practicality. This largely motivates us to present the first attempt at a more practical paradigm of generalized few-shot point cloud segmentation, which requires the model to generalize to new categories with only a few support point clouds and simultaneously retain the capability to segment base classes. We propose geometric words to represent geometric components shared between the base and novel classes, and incorporate them into a novel geometric-aware semantic representation to facilitate better generalization to the new classes without forgetting the old ones. Moreover, we introduce geometric prototypes to guide the segmentation with geometric prior knowledge. Extensive experiments on S3DIS and ScanNet consistently illustrate the superior performance of our method over baseline methods. Our code is available at: https://github.com/Pixie8888/GFS-3DSeg_GWs. Comment: Accepted by ICCV 202

    Global dynamics for a class of reaction–diffusion multigroup SIR epidemic models with time fractional-order derivatives

    This paper investigates the global dynamics for a class of multigroup SIR epidemic models with time fractional-order derivatives and reaction–diffusion. The fractional order considered in this paper lies in (0, 1], for which the propagation speed of the process is slower than that of Brownian motion, leading to anomalous subdiffusion. Furthermore, a generalized incidence function is considered so that the data itself can flexibly determine the functional form of incidence rates in practice. Firstly, the existence, nonnegativity, and ultimate boundedness of the solution for the proposed system are studied. Moreover, the basic reproduction number R0 is calculated and shown to be a threshold: the disease-free equilibrium point of the proposed system is globally asymptotically stable when R0 ≤ 1, while when R0 > 1, the proposed system is uniformly persistent and the endemic equilibrium point is globally asymptotically stable. Finally, the theoretical results are verified by numerical simulation.

    MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models

    With the advancement of deep learning technologies, general-purpose large models such as GPT-4 have demonstrated exceptional capabilities across various domains. Nevertheless, there remains a demand for high-quality, domain-specific outputs in areas like healthcare, law, and finance. This paper first evaluates the existing large models for specialized domains and discusses their limitations. To cater to the specific needs of certain domains, we introduce the "MiChao-HuaFen 1.0" pre-trained corpus dataset, tailored for the news and governmental sectors. The dataset, sourced from publicly available internet data from 2022, underwent multiple rounds of cleansing and processing to ensure high quality and reliable origins, with provisions for consistent and stable updates. This dataset not only supports the pre-training of large models for Chinese vertical domains but also aids in propelling deep learning research and applications in related fields. Comment: 4 pages, 2 figures

    Supramolecular Assembly and Stimuli-Responsive Behavior of Multielement Hybrid Copolymers

    In the context of organic polymers, hybrid elements can be defined as those beyond C, H, O, and N. Polymers comprising hybrid elements, such as Si, P, B, or metal ions, have attracted great attention in the design of high-performance or smart materials. Introducing hybrid elements into a polymeric network may also lead to the formation of new intermolecular interactions, thus promoting the self-organization of polymer chains into controllable structures and morphologies. In this chapter, we introduce some of the important recent developments in the design and self-assembly of hybrid amphiphilic copolymers. Specific attention is paid to hybrid amphiphilic copolymers containing POSS, boronic acid, or boronate functional moieties. We introduce the design, synthesis, self-assembly behavior, and properties of these hybrid amphiphilic copolymers in detail. The advantages and drawbacks of these polymers and their corresponding nanoassemblies are also discussed.

    ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

    We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. We also propose the protein-as-word language modeling approach to train ProtLLM. By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates. Additionally, we construct a large-scale interleaved protein-text dataset, named InterPT, for pre-training. This dataset comprehensively encompasses both (1) structured data sources like protein annotations and (2) unstructured data sources like biological research papers, thereby endowing ProtLLM with crucial knowledge for understanding proteins. We evaluate ProtLLM on classic supervised protein-centric tasks and explore its novel protein-language applications. Experimental results demonstrate that ProtLLM not only achieves superior performance against protein-specialized baselines on protein-centric tasks but also induces zero-shot and in-context learning capabilities on protein-language tasks. Comment: https://protllm.github.io/project

    OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images

    This paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images. More precisely, OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City. To alleviate the substantial pixel-wise annotation effort, we propose an efficient street-view image annotation pipeline that leverages the existing label maps of the satellite view and the transformation relations between different views (satellite, panorama, and mono-view). With the new OmniCity dataset, we provide benchmarks for a variety of tasks including building footprint extraction, height estimation, and building plane/instance/fine-grained segmentation. Compared with the existing multi-level and multi-view benchmarks, OmniCity contains a larger number of images with richer annotation types and more views, provides more benchmark results of state-of-the-art models, and introduces a novel task for fine-grained building instance segmentation on street-level panorama images. Moreover, OmniCity provides new problem settings for existing tasks, such as cross-view image matching, synthesis, segmentation, and detection, and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation. The OmniCity dataset as well as the benchmarks will be available at https://city-super.github.io/omnicity