Kosmos-2.5: A Multimodal Literate Model
We present Kosmos-2.5, a multimodal literate model for machine reading of
text-intensive images. Pre-trained on large-scale text-intensive images,
Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1)
generating spatially-aware text blocks, where each block of text is assigned
its spatial coordinates within the image, and (2) producing structured text
output that captures styles and structures into the markdown format. This
unified multimodal literate capability is achieved through a shared Transformer
architecture, task-specific prompts, and flexible text representations. We
evaluate Kosmos-2.5 on end-to-end document-level text recognition and
image-to-markdown text generation. Furthermore, the model can be readily
adapted for any text-intensive image understanding task with different prompts
through supervised fine-tuning, making it a general-purpose tool for real-world
applications involving text-rich images. This work also paves the way for the
future scaling of multimodal large language models.
From Hydrodynamics to Jet Quenching, Coalescence, and Hadron Cascade: A Coupled Approach to Solving the R_{AA}⊗v_{2} Puzzle.
Hydrodynamics and jet quenching are responsible for the elliptic flow v_{2} and suppression of large transverse momentum (p_{T}) hadrons, respectively, two of the most important phenomena leading to the discovery of a strongly coupled quark-gluon plasma in high-energy heavy-ion collisions. A consistent description of the hadron suppression factor R_{AA} and v_{2}, especially at intermediate p_{T}, however, remains a challenge. We solve this long-standing R_{AA}⊗v_{2} puzzle by including quark coalescence for hadronization and final state hadron cascade in the coupled linear Boltzmann transport-hydro model that combines concurrent jet transport and hydrodynamic evolution of the bulk medium. We illustrate that quark coalescence and hadron cascade, two keys to solving the puzzle, also lead to a splitting of v_{2} for pions, kaons, and protons in the intermediate p_{T} region. We demonstrate for the first time that experimental data on R_{AA}, v_{2}, and their hadron flavor dependence from low to intermediate and high p_{T} in high-energy heavy-ion collisions can be understood within this coupled framework.
Identification of Novel Inhibitors against Coactivator Associated Arginine Methyltransferase 1 Based on Virtual Screening and Biological Assays
Overexpression of coactivator associated arginine methyltransferase 1 (CARM1), a protein arginine N-methyltransferase (PRMT) family enzyme, is associated with various diseases including cancers. Consequently, the development of small-molecule inhibitors targeting PRMTs has significant value for both research and therapeutic purposes. In this study, by combining structure-based virtual screening with biochemical assays, we identified two compounds, DC_C11 and DC_C66, as novel inhibitors of CARM1. Cellular studies revealed that the two inhibitors are cell membrane permeable and effectively block proliferation of cancer cells including HeLa, K562, and MCF7. We further predicted the binding mode of these inhibitors through molecular docking analysis, which indicated that the inhibitors competitively occupy the binding site of the substrate and disrupt the protein-protein interactions between CARM1 and its substrates. Overall, this study sheds light on the development of small-molecule CARM1 inhibitors with novel scaffolds.
A representative-based framework for parsing and summarizing events in surveillance videos
This paper presents a novel representative-based framework for parsing and summarizing events in long surveillance videos. The proposed framework first extracts object blob sequences and utilizes them to represent events in a surveillance video. Then, a sequence filtering strategy is introduced which detects and eliminates noisy blob sequences based on their spatial and temporal characteristics. After clustering the blob sequences into different event types, we further introduce a representative-based model which integrates location, size, and appearance cues to select a representative blob sequence from each cluster, and creates a snapshot image for each representative blob sequence. Based on the blob-sequence clustering and representative-sequence selection results, two schemes are further proposed to summarize the contents of the input surveillance video: (1) a type-based scheme, which shows snapshot images to users and creates a summary video for a specific event cluster according to the user-selected snapshot image; and (2) a representative-based scheme, which creates a summary video using only the extracted representative blob sequences. Experimental results show that our approach can create more effective and well-organized summarization results compared with the state-of-the-art methods.
FRIH: Fine-grained Region-aware Image Harmonization
Image harmonization aims to generate a more realistic appearance of
foreground and background for a composite image. Existing methods perform the
same harmonization process for the whole foreground. However, the implanted
foreground always contains different appearance patterns. All the existing
solutions ignore the differences between color blocks and lose some specific
details. Therefore, we propose a novel global-local two-stage framework for
Fine-grained Region-aware Image Harmonization (FRIH), which is trained
end-to-end. In the first stage, the whole input foreground mask is used to make
a global coarse-grained harmonization. In the second stage, we adaptively
cluster the input foreground mask into several submasks by the corresponding
pixel RGB values in the composite image. Each submask and the coarsely adjusted
image are concatenated respectively and fed into a lightweight cascaded module,
adjusting the global harmonization performance according to the region-aware
local feature. Moreover, we further designed a fusion prediction module by
fusing features from all the cascaded decoder layers together to generate the
final result, which could utilize the different degrees of harmonization
results comprehensively. Without bells and whistles, our FRIH algorithm
achieves the best performance on the iHarmony4 dataset (PSNR of 38.19 dB) with a
lightweight model. Our model has only 11.98 M parameters, far fewer than
existing methods.
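The second-stage step described in the FRIH abstract, splitting the foreground mask into submasks by pixel RGB values, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or how many clusters the authors use, so the plain k-means below, the fixed `k`, and the function name `cluster_foreground_mask` are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def cluster_foreground_mask(image, mask, k=3, iters=10, seed=0):
    """Split a binary foreground mask into k submasks by RGB similarity.

    Hypothetical helper: plain k-means stands in for the paper's
    (unspecified) adaptive clustering.
    image: (H, W, 3) array of RGB values; mask: (H, W) boolean array.
    Returns a (k, H, W) boolean array whose submasks partition the mask.
    Assumes the mask contains at least k foreground pixels.
    """
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)                      # foreground coordinates
    pixels = image[ys, xs].astype(np.float64)      # (N, 3) RGB samples
    # Initialise the centres from k distinct foreground pixels.
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # Assign every foreground pixel to its nearest centre in RGB space.
        dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's centre to the mean of its members.
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    submasks = np.zeros((k,) + mask.shape, dtype=bool)
    submasks[labels, ys, xs] = True                # one submask per cluster
    return submasks
```

In the pipeline the abstract describes, each submask would then be concatenated with the coarsely harmonized image and passed to the cascaded module; that part is not sketched here.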
Identification of Selective, Cell Active Inhibitors of Protein Arginine Methyltransferase 5 through Structure-Based Virtual Screening and Biological Assays
Protein arginine methyltransferase 5 (PRMT5), a type II PRMT enzyme,
is reported as an important therapeutic target in leukemia and lymphoma.
In the present study, based on the combination of virtual screening
and biochemical validations, we discovered a series of small-molecule
inhibitors targeting PRMT5. Among those, DC_Y134 exhibited the most
potent activity with an IC50 value of 1.7 μM and displayed
good selectivity against other methyltransferases. Further treatment
with DC_Y134 inhibited the proliferation of several hematological
malignancy cell lines by causing cell cycle arrest and apoptosis.
Western blot assays indicated that DC_Y134 reduced cellular levels of
symmetric arginine dimethylation. In addition, we analyzed the binding mode of
DC_Y134 through molecular docking, which revealed that DC_Y134 occupies
the binding site of substrate arginine and explained the selectivity
of this inhibitor. Taken together, compound DC_Y134 could be used
to elucidate the biological roles of PRMT5 and serve as a lead compound
for treatment of hematologic malignancies.
Manipulation of the Electronic Transport Properties of Charge-Transfer Oxide Thin Films of NdNiO3 Using Static and Electric-Field-Controllable Dynamic Lattice Strain
Using perovskite-type charge-transfer oxide thin films of NdNiO3 (NNO) as a model system, we demonstrate that the effects of lattice strain on the electronic transport properties can be more comprehensively understood by growing NNO films on a number of (001)-, (011)-, and (111)-cut single-crystal substrates with different lattice mismatches including the relaxor-based 0.31Pb(In1/2Nb1/2)O3-0.35Pb(Mg1/3Nb2/3)O3-0.34PbTiO3 (PIN-PMN-PT) and 0.71Pb(Mg1/3Nb2/3)O3-0.29PbTiO3 (PMN-PT) ferroelectric (FE) single crystals. In addition to the static lattice strains from conventional substrates (e.g., SrTiO3, LaAlO3), we impose in-plane compressive or tensile strains on NNO films in situ using FE/ferroelastic domain switching of FE substrates. An unprecedented electric-field-induced large out-of-plane compressive strain (-0.53%) and in-plane tensile strain (+0.81%) are achieved in the 25-nm NNO film by switching the polarization direction of the PIN-PMN-PT substrate at T = 200 K. This value is approximately 7.4 to 45 times larger than those previously reported in FE substrate-based heterostructures. As a result of the induced large lattice strain, the resistivity of the NNO film is modulated up to 125%. Further, taking advantage of the linear piezoelectric strain, a quantitative relationship between the resistivity and the in-plane strain of the NNO film is established, with a gauge factor of (Δρ/ρ)/δε_xx ∼ 40.8. Moreover, using the domain-engineered FE/ferroelastic switching of PMN-PT substrates, multiple stable resistance states with good retention and endurance properties can be obtained at room temperature and the metal-to-insulator transition temperature (T_MI) of NNO films can be modified by precisely controlling the electric-field-pulse sequence as a result of the nonvolatile remnant strain transferring from the PMN-PT to the NNO film.
Our results demonstrate that the electric-field-tunable ferroelastic/piezoelectric strain approach can be utilized to gain deeper insight into the intrinsic strain-property relationship of perovskite nickelate films and provide a simple and energy efficient way to construct multistate resistive memories.
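As a rough illustration of what the reported gauge factor means, the short worked example below applies it to a hypothetical in-plane strain change. The value δε_xx = 0.1% is an assumption chosen only for the arithmetic and does not come from the abstract; the linear relation is the one the abstract attributes to the linear piezoelectric strain regime.

```latex
% Gauge factor as reported in the abstract (linear piezoelectric regime):
\[
  \frac{\Delta\rho/\rho}{\delta\epsilon_{xx}} \approx 40.8
  \quad\Longrightarrow\quad
  \frac{\Delta\rho}{\rho} \approx 40.8\,\delta\epsilon_{xx}
\]
% Hypothetical strain change (assumed for illustration only):
\[
  \delta\epsilon_{xx} = 0.1\% = 1\times10^{-3}
  \quad\Longrightarrow\quad
  \frac{\Delta\rho}{\rho} \approx 40.8 \times 10^{-3} \approx 4.1\%
\]
```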