174 research outputs found

    COMMA: Co-Articulated Multi-Modal Learning

    Pretrained large-scale vision-language models such as CLIP have demonstrated excellent generalizability over a series of downstream tasks. However, they are sensitive to variations in the input text prompts and need carefully selected prompt templates to achieve satisfactory performance. Recently, various methods have been proposed to dynamically learn the prompts as textual inputs, avoiding laborious hand-crafted prompt engineering in the fine-tuning process. We notice that these methods are suboptimal in two aspects. First, the prompts of the vision and language branches in these methods are usually separated or uni-directionally correlated. Thus, the prompts of both branches are not fully correlated and may not provide enough guidance to align the representations of both branches. Second, most previous methods achieve better performance on seen classes but degrade performance on unseen classes compared to CLIP. This is because the essential generic knowledge learned in the pretraining stage is partly forgotten in the fine-tuning process. In this paper, we propose Co-Articulated Multi-Modal Learning (COMMA) to handle the above limitations. Specifically, our method considers prompts from both branches when generating the prompts, to enhance the representation alignment of both branches. Besides, to alleviate forgetting of the essential knowledge, we minimize the feature discrepancy between the learned prompts and the embeddings of hand-crafted prompts in the pre-trained CLIP in the late transformer layers. We evaluate our method across three representative tasks: generalization to novel classes, new target datasets and unseen domain shifts. Experimental results demonstrate the superiority of our method, which exhibits a favorable performance boost on all tasks with high efficiency. Comment: Accepted to AAAI2024. Code is available at https://github.com/hulianyuyy/COMM

    AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition

    Raw videos have been shown to contain considerable feature redundancy: in many cases only a portion of the frames is enough for accurate recognition. In this paper, we are interested in whether such redundancy can be effectively leveraged to facilitate efficient inference in continuous sign language recognition (CSLR). We propose a novel adaptive model (AdaBrowse) that dynamically selects the most informative subsequence from input video sequences by modelling this problem as a sequential decision task. Specifically, we first utilize a lightweight network to quickly scan input videos and extract coarse features. These features are then fed into a policy network that selects a subsequence to process. The selected subsequence is finally passed to a normal CSLR model for sentence prediction. As only a portion of frames is processed in this procedure, the total computation can be considerably reduced. Besides temporal redundancy, we are also interested in whether the inherent spatial redundancy can be exploited to achieve further efficiency, i.e., by dynamically selecting the lowest input resolution for each sample; the resulting model is referred to as AdaBrowse+. Extensive experimental results on four large-scale CSLR datasets, i.e., PHOENIX14, PHOENIX14-T, CSL-Daily and CSL, demonstrate the effectiveness of AdaBrowse and AdaBrowse+, which achieve accuracy comparable with state-of-the-art methods with 1.44× throughput and 2.12× fewer FLOPs. Comparisons with other commonly used 2D CNNs and adaptive efficient methods verify the effectiveness of AdaBrowse. Code is available at https://github.com/hulianyuyy/AdaBrowse. Comment: ACMMM202

    NCACO-score: An effective main-chain dependent scoring function for structure modeling

    Background: Development of effective scoring functions is a critical component to the success of protein structure modeling. Previously, many efforts have been dedicated to the development of scoring functions. Despite these efforts, development of an effective scoring function that can achieve both good accuracy and fast speed still presents a grand challenge. Results: Based on a coarse-grained representation of a protein structure by using only four main-chain atoms: N, Cα, C and O, we develop a knowledge-based scoring function, called NCACO-score, that integrates different structural information to rapidly model protein structure from sequence. In testing on the Decoys'R'Us sets, we found that NCACO-score can effectively recognize native conformers from their decoys. Furthermore, we demonstrate that NCACO-score can effectively guide fragment assembly for protein structure prediction, which has achieved a good performance in building the structure models for hard targets from CASP8 in terms of both accuracy and speed. Conclusions: Although NCACO-score is developed based on a coarse-grained model, it is able to discriminate native conformers from decoy conformers with high accuracy. NCACO is a very effective scoring function for structure modeling.

    A centimeter-scale achromatic hybrid metalens with polarization-insensitivity in the visible

    Metalenses, featuring ultra-compactness and CMOS compatibility, are limited by the compromise between the diameter, numerical aperture, and working waveband. To address this problem, we propose and numerically demonstrate a centimeter-scale metasurface-refractive hybrid metalens working in the band of 440-700 nm. Revisiting the generalized Snell's law, we present the phase profile of a chromatic aberration correction metasurface that can be applied to a plano-convex refractive lens of an arbitrary surface type. Simulated by our semi-vector method, the designed achromatic hybrid metalens achieves 81% chromatic aberration suppression and polarization insensitivity. Broadband imaging results of the hybrid metalens are further provided, verifying its achromatism. It can find applications in camera lenses and other optical systems that need compact, high-performance lenses. Comment: 10 pages, 5 figures

    Efficient wide-bandgap perovskite solar cells with open-circuit voltage deficit below 0.4 V via hole-selective interface engineering

    Wide-bandgap mixed-halide perovskite solar cells (WBG-PSCs) are promising top cells for efficient tandem photovoltaics to achieve high power conversion efficiency (PCE) at low cost. However, the open-circuit voltage (VOC) of WBG-PSCs is still unsatisfactory, as the VOC-deficit is generally larger than 0.45 V. Herein, we report a buried interface engineering strategy that substantially improves the VOC of WBG-PSCs by inserting amphiphilic molecular hole-selective materials featuring a cyanovinyl phosphonic acid (CPA) anchoring group between the perovskite and substrate. The assembly and redistribution of CPA-based amphiphilic molecules at the perovskite-substrate buried interface not only promotes the growth of a low-defect crystalline perovskite thin film, but also suppresses the photo-induced halide phase separation. The energy level alignment between the wide-bandgap perovskite and the hole-selective layer is further improved by modulating the substituents on the triphenylamine donor moiety (methoxyls for MPA-CPA, methyls for MePA-CPA, and bare TPA-CPA). Using a 1.68 eV bandgap perovskite, the MePA-CPA-based devices achieved an unprecedentedly high VOC of 1.29 V and PCE of 22.3% under standard AM 1.5 sunlight. The VOC-deficit (<0.40 V) is the lowest value reported for WBG-PSCs. This work not only provides an effective approach to decreasing the VOC-deficit of WBG-PSCs, but also confirms the importance of energy level alignment at the charge-selective layers in PSCs.
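
    The reported figures can be checked with a short worked calculation. The abstract does not define the VOC-deficit, so this sketch assumes the conventional definition: the bandgap voltage E_g/q minus the measured open-circuit voltage. The symbols E_g and q are introduced here for illustration only.

```latex
\[
V_{\mathrm{OC}}^{\mathrm{deficit}}
  = \frac{E_g}{q} - V_{\mathrm{OC}}
  = 1.68\,\mathrm{V} - 1.29\,\mathrm{V}
  = 0.39\,\mathrm{V} < 0.40\,\mathrm{V}
\]
```

    Under that assumption, the 1.68 eV bandgap and the 1.29 V VOC are consistent with the stated deficit below 0.40 V.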

    The Influence of Neighbourhood Environment on Airbnb: a Geographically Weighed Regression Analysis

    Sharing accommodation has emerged recently as a new business model in the accommodation sector. Due to the potential gentrification Airbnb might bring to an area, it is critical to understand the spatial patterns of the sharing economy and its possible determinants. The neighbourhood environment has proven to be an important factor in the traditional hotel business, and whether the same holds for sharing accommodation is worth investigating. In this study, location data for 29,780 houses/apartments listed on Airbnb.com in London were collected. Using Ordinary Least Squares and Geographically Weighted Regression analysis, the spatial distribution features of Airbnb and its relationship with the neighbourhood environment in London were explored. The results show that sharing accommodation is mainly located in the city centre and around tourist attractions. Neighbourhood elements such as Water, Vegetation Coverage, Art & Human Landscape, Travel & Transport, University, and Nightlife Spot emerged as important factors influencing Airbnb. In addition, the distribution of Airbnb in London is spatially non-stationary: in some areas a high density of Airbnb listings is associated with higher transportation accessibility, while in other areas it is associated with more attractions or nightlife spots. This suggests that the role of different factors varies across regions, supporting Tobler's first law of geography.
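
    Spatial non-stationarity, as captured by Geographically Weighted Regression, can be illustrated with a minimal sketch. It is not the study's implementation: the function name `gwr_local_fit`, the Gaussian kernel, the bandwidth, and the synthetic data are all assumptions made for illustration. The idea is to fit a separate weighted least-squares line at each target location, with nearby observations weighted more heavily than distant ones.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Fit y = intercept + slope * x by weighted least squares at one location.

    Weights follow a Gaussian distance-decay kernel centred on `target`
    (an assumed kernel choice; other kernels are possible).
    """
    weights = [
        math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords
    ]
    total = sum(weights)
    # Weighted means of predictor and response.
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    # Weighted variance and covariance give the local slope.
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data (hypothetical): the predictor's effect is 1 in the west
# cluster and 3 in the east cluster, so a single global slope would mislead.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1, 2, 3, 4, 3, 6, 9, 12]

_, slope_west = gwr_local_fit(coords, x, y, target=(0.5, 0.5), bandwidth=1.0)
_, slope_east = gwr_local_fit(coords, x, y, target=(10.5, 0.5), bandwidth=1.0)
print(slope_west, slope_east)
```

    The two local slopes differ because each fit is dominated by its own neighbourhood, which is the sense in which a relationship is called spatially non-stationary.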

    Clinical, radiological, and laboratory features of HIV-negative pulmonary cryptococcosis with regard to serum lateral flow assay

    Introduction: Cryptococcosis is the second most common invasive yeast infection in China. Pulmonary cryptococcosis (PC) is difficult to diagnose due to the lack of specific clinical features and the limitations of diagnostic techniques. Although the lateral flow assay is very useful in diagnosing cryptococcal infection, quite a few patients with PC present with a negative serum lateral flow assay (sLFA). Methods: We conducted a retrospective study of HIV-negative patients who were diagnosed with PC in our hospital over the past decade to explore the potential relationship between the clinical profiles and sLFA in PC. Results: In total, 112 patients with sLFA tested were enrolled in this study, of which 58.93% were male. The positivity rate of sLFA for PC was 91.07%. The extent of pulmonary lesions was positively correlated with sLFA grade (Spearman r = 0.268, p < 0.01). Solitary nodule (SN) and pneumonia were the most common imaging findings in PC with negative and positive sLFA, respectively. Among 65 symptomatic PC patients, 14 presented with fever and had higher hypersensitive C-reactive protein (hsCRP) levels and more extensive pulmonary involvement (Mann-Whitney U test, p < 0.05) than those without fever. Symptomatic PC patients were more likely to have positive sLFA results (Mann-Whitney U test, p = 0.05) than asymptomatic ones. Discussion: In conclusion, negative sLFA cannot exclude PC in patients with a solitary nodule in the lung. Positive sLFA is more reliable in diagnosing PC in symptomatic patients with diffuse lesions in the lung, who generally experience a more severe systemic inflammatory reaction.

    Kinetic-MHD hybrid simulation of fishbone modes excited by fast ions on the experimental advanced superconducting tokamak (EAST)

    Kinetic-MagnetoHydroDynamic hybrid simulations are carried out to investigate fishbone modes excited by fast ions on the Experimental Advanced Superconducting Tokamak. The simulations use a realistic equilibrium reconstructed from experimental data with the constraint of the q = 1 surface location (q is the safety factor). An anisotropic slowing-down distribution is used to model the distribution of the fast ions from neutral beam injection. The resonance condition is used to identify the interaction between the fishbone mode and the fast ions, which shows that the fishbone mode is simultaneously in resonance with the bounce motion of the trapped particles and the transit motion of the passing particles. Both the passing and trapped particles are important in destabilizing the fishbone mode. The simulations show that the mode frequency chirps down as the mode reaches the nonlinear stage, during which there is a substantial flattening of the perpendicular pressure of fast ions, compared with that of the parallel pressure. For passing particles, the resonance remains within the q = 1 surface, while, for trapped particles, the resonant location moves out radially during the nonlinear evolution. In addition, parameter scanning is performed to examine the dependence of the linear frequency and growth rate of fishbones on the pressure and injection velocity of fast ions.

    1,7-Dihydroxy-2,3,4-trimethoxy-9H-xanthen-9-one monohydrate from Halenia elliptica

    The title compound, C16H14O7·H2O, possesses a planar three-ring skeleton; its carbonyl, one of the two hydroxy and two of the three methoxy O atoms and the water molecule form hydrogen bonds, giving rise to a layer structure.