Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Existing works on weakly-supervised audio-visual video parsing adopt hybrid
attention network (HAN) as the multi-modal embedding to capture the cross-modal
context. It embeds the audio and visual modalities with a shared network, where
the cross-attention is performed at the input. However, such an early fusion
method highly entangles the two non-fully correlated modalities and leads to
sub-optimal performance in detecting single-modality events. To deal with this
problem, we propose the messenger-guided mid-fusion transformer to reduce the
uncorrelated cross-modal context in the fusion. The messengers condense the
full cross-modal context into a compact representation to only preserve useful
cross-modal information. Furthermore, because microphones capture
audio events from all directions, while cameras only record visual events
within a restricted field of view, there is a more frequent occurrence of
unaligned cross-modal context from audio for visual event predictions. We thus
propose cross-audio prediction consistency to suppress the impact of irrelevant
audio information on visual event prediction. Experiments consistently
illustrate the superior performance of our framework compared to existing
state-of-the-art methods. Comment: WACV 202
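The messenger idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the function names (`attend`, `messenger_fusion`), the number of messengers `k`, the absence of learned projections, and the residual connection are all assumptions made for illustration. It only shows how a few messenger tokens can condense one modality so that the other modality reads from the compact summary rather than from the full sequence.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def messenger_fusion(target, source, messengers):
    """Hypothetical mid-fusion step (an assumption, not the authors' code).

    target:     (T, d) tokens of the modality being updated
    source:     (T, d) tokens of the other modality
    messengers: (k, d) tokens, with k much smaller than T
    """
    # Step 1: the k messengers condense the full source sequence.
    summary = attend(messengers, source, source)
    # Step 2: target tokens see only the k summaries, never the raw source.
    fused = target + attend(target, summary, summary)
    return fused, summary


rng = np.random.default_rng(0)
T, d, k = 10, 16, 2  # illustrative sizes only
audio = rng.standard_normal((T, d))
visual = rng.standard_normal((T, d))
messengers = rng.standard_normal((k, d))

fused_visual, summary = messenger_fusion(visual, audio, messengers)
```

In this sketch the cross-modal context reaching the visual tokens is limited to `k` vectors, which is the bottleneck effect the abstract attributes to the messengers.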
Optimal control and bifurcation analysis of a delayed fractional-order SIRS model with general incidence rate and delayed control
A fractional-order generalized SIRS model that accounts for the incubation period is established in this paper for the transmission of emerging pathogens. The corresponding Hopf bifurcation is analyzed with the time delay as the bifurcation parameter. To control the occurrence of the Hopf bifurcation and achieve better dynamic behavior, a delayed feedback control is applied to the model. Further, the delayed fractional-order optimal control problem (DFOCP) is proposed and discussed. The parameters of the proposed model are identified from measurement data of coronavirus disease 2019 (COVID-19). Based on the results of parameter identification, the corresponding DFOCP with delayed control is solved numerically.
Generalized Few-Shot Point Cloud Segmentation Via Geometric Words
Existing fully-supervised point cloud segmentation methods suffer in the
dynamic testing environment with emerging new classes. Few-shot point cloud
segmentation algorithms address this problem by learning to adapt to new
classes at the sacrifice of segmentation accuracy for the base classes, which
severely impedes their practicality. This largely motivates us to present the
first attempt at a more practical paradigm of generalized few-shot point cloud
segmentation, which requires the model to generalize to new categories with
only a few support point clouds and simultaneously retain the capability to
segment base classes. We propose the geometric words to represent geometric
components shared between the base and novel classes, and incorporate them into
a novel geometric-aware semantic representation to facilitate better
generalization to the new classes without forgetting the old ones. Moreover, we
introduce geometric prototypes to guide the segmentation with geometric prior
knowledge. Extensive experiments on S3DIS and ScanNet consistently illustrate
the superior performance of our method over baseline methods. Our code is
available at: https://github.com/Pixie8888/GFS-3DSeg_GWs. Comment: Accepted by ICCV 202
Global dynamics for a class of reaction–diffusion multigroup SIR epidemic models with time fractional-order derivatives
This paper investigates the global dynamics of a class of multigroup SIR epidemic models with time fractional-order derivatives and reaction–diffusion. The fractional order considered in this paper lies in (0, 1], for which the propagation of the process is slower than Brownian motion, leading to anomalous subdiffusion. Furthermore, a generalized incidence function is considered so that the data itself can flexibly determine the functional form of the incidence rates in practice. First, the existence, nonnegativity, and ultimate boundedness of the solution of the proposed system are studied. Moreover, the basic reproduction number R0 is calculated and shown to be a threshold: the disease-free equilibrium point of the proposed system is globally asymptotically stable when R0 ≤ 1, while when R0 > 1 the proposed system is uniformly persistent and the endemic equilibrium point is globally asymptotically stable. Finally, the theoretical results are verified by numerical simulation.
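The threshold role of R0 can be seen in a much simpler setting than the one studied in the paper. The sketch below is an assumption-laden illustration, not the authors' model: it uses the classical single-group SIR system with integer-order derivatives, no diffusion, and no demography, for which R0 = beta / gamma. The function names, parameter values, and the forward-Euler scheme are all hypothetical choices. Because this simplified model has no endemic equilibrium, it only shows the invasion threshold: the infected fraction decays when R0 ≤ 1 and produces an outbreak when R0 > 1.

```python
def basic_reproduction_number(beta, gamma):
    """R0 for the classical SIR model (transmission rate over recovery rate)."""
    return beta / gamma


def simulate_sir(beta, gamma, i0=1e-3, dt=0.01, t_end=200.0):
    """Forward-Euler integration of classical SIR in population fractions.

    Returns the final (s, i, r) and the largest infected fraction observed.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(t_end / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        peak = max(peak, i)
    return s, i, r, peak


# Illustrative parameter sets on either side of the threshold.
below = simulate_sir(beta=0.1, gamma=0.2)  # R0 = 0.5
above = simulate_sir(beta=0.5, gamma=0.2)  # R0 = 2.5
```

With these values, `below` never exceeds its initial infected fraction, while `above` shows a clear epidemic peak before the infection dies out.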
MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models
With the advancement of deep learning technologies, general-purpose large
models such as GPT-4 have demonstrated exceptional capabilities across various
domains. Nevertheless, there remains a demand for high-quality, domain-specific
outputs in areas like healthcare, law, and finance. This paper first evaluates
the existing large models for specialized domains and discusses their
limitations. To cater to the specific needs of certain domains, we introduce
the "MiChao-HuaFen 1.0" pre-trained corpus dataset, tailored for the news and
governmental sectors. The dataset, sourced from publicly available internet
data from 2022, underwent multiple rounds of cleansing and processing to ensure
high quality and reliable origins, with provisions for consistent and stable
updates. This dataset not only supports the pre-training of large models for
Chinese vertical domains but also aids in propelling deep learning research and
applications in related fields. Comment: 4 pages, 2 figures
Supramolecular Assembly and Stimuli-Responsive Behavior of Multielement Hybrid Copolymers
With respect to organic polymers, hybrid elements can be defined as those other than C, H, O, and N. Polymers comprising hybrid elements such as Si, P, B, or metal ions have attracted great attention in the design of high-performance or smart materials. Introducing hybrid elements into a polymeric network may also lead to the formation of new intermolecular interactions, thus promoting the self-organization of polymer chains into controllable structures and morphologies. In this chapter, we introduce some of the important recent developments in the design and self-assembly of hybrid amphiphilic copolymers. Particular attention is paid to hybrid amphiphilic copolymers containing POSS, boronic acid, or boronate functional moieties. We describe the design, synthesis, self-assembly behavior, and properties of these hybrid amphiphilic copolymers in detail. The advantages and drawbacks of these polymers and their corresponding nanoassemblies are also discussed.
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
We propose ProtLLM, a versatile cross-modal large language model (LLM) for
both protein-centric and protein-language tasks. ProtLLM features a unique
dynamic protein mounting mechanism, enabling it to handle complex inputs where
the natural language text is interspersed with an arbitrary number of proteins.
Besides, we propose the protein-as-word language modeling approach to train
ProtLLM. By developing a specialized protein vocabulary, we equip the model
with the capability to predict not just natural language but also proteins from
a vast pool of candidates. Additionally, we construct a large-scale interleaved
protein-text dataset, named InterPT, for pre-training. This dataset
comprehensively encompasses both (1) structured data sources like protein
annotations and (2) unstructured data sources like biological research papers,
thereby endowing ProtLLM with crucial knowledge for understanding proteins. We
evaluate ProtLLM on classic supervised protein-centric tasks and explore its
novel protein-language applications. Experimental results demonstrate that
ProtLLM not only achieves superior performance against protein-specialized
baselines on protein-centric tasks but also induces zero-shot and in-context
learning capabilities on protein-language tasks. Comment: https://protllm.github.io/project
OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images
This paper presents OmniCity, a new dataset for omnipotent city understanding
from multi-level and multi-view images. More precisely, the OmniCity contains
multi-view satellite images as well as street-level panorama and mono-view
images, constituting over 100K pixel-wise annotated images that are
well-aligned and collected from 25K geo-locations in New York City. To
alleviate the substantial pixel-wise annotation efforts, we propose an
efficient street-view image annotation pipeline that leverages the existing
label maps of satellite view and the transformation relations between different
views (satellite, panorama, and mono-view). With the new OmniCity dataset, we
provide benchmarks for a variety of tasks including building footprint
extraction, height estimation, and building plane/instance/fine-grained
segmentation. Compared with the existing multi-level and multi-view benchmarks,
OmniCity contains a larger number of images with richer annotation types and
more views, provides more benchmark results of state-of-the-art models, and
introduces a novel task for fine-grained building instance segmentation on
street-level panorama images. Moreover, OmniCity provides new problem settings
for existing tasks, such as cross-view image matching, synthesis, segmentation,
detection, etc., and facilitates the development of new methods for large-scale
city understanding, reconstruction, and simulation. The OmniCity dataset as
well as the benchmarks will be available at
https://city-super.github.io/omnicity