
    S4E3: What is AI and what roles does it play in our lives?

    Artificial Intelligence, or AI, sounds like a futuristic concept from science fiction movies, but it is very much with us in the present day. We interact with this emerging technology on a daily basis when we apply for jobs, order groceries, access our bank accounts, apply for a loan and scroll through social media. In Episode 3 of Season 4 of “The Maine Question,” we examine AI, how it improves our lives and how it can cause problems. Penny Rheingans, director of the University of Maine’s School of Computing and Information Science, and Roy Turner, a UMaine associate professor of computer science, help us unravel the fascinating and complicated story of AI.

    FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

    Image captioning is a central task in computer vision which has experienced substantial progress following the advent of vision-language pre-training techniques. In this paper, we highlight a frequently overlooked limitation of captioning models that often fail to capture semantically significant elements. This drawback can be traced back to the text-image datasets; while their captions typically offer a general depiction of image content, they frequently omit salient details. To mitigate this limitation, we propose FuseCap - a novel method for enriching captions with additional visual information, obtained from vision experts, such as object detectors, attribute recognizers, and Optical Character Recognizers (OCR). Our approach fuses the outputs of such vision experts with the original caption using a large language model (LLM), yielding enriched captions that present a comprehensive image description. We validate the effectiveness of the proposed caption enrichment method through both quantitative and qualitative analysis. Our method is then used to curate the training set of a captioning model based on BLIP, which surpasses current state-of-the-art approaches in generating accurate and detailed captions while using significantly fewer parameters and training data. As additional contributions, we provide a dataset comprising 12M image-enriched caption pairs and show that the proposed method largely improves image-text retrieval.
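    The fusion step described above can be pictured with a short sketch. This is an illustration only, not the authors' implementation: the prompt wording, the expert-output format, and the `llm` callable are hypothetical stand-ins for whatever LLM and vision experts are actually used.

```python
from typing import Callable, Dict, List


def build_fusion_prompt(caption: str, expert_outputs: Dict[str, List[str]]) -> str:
    """Assemble one prompt asking an LLM to merge the original caption with
    detections, attributes, and OCR strings into a single enriched caption."""
    lines = [f"Original caption: {caption}"]
    for expert, findings in expert_outputs.items():
        lines.append(f"{expert}: {', '.join(findings)}")
    lines.append(
        "Rewrite the caption so it fluently incorporates all of the "
        "information above without inventing new content."
    )
    return "\n".join(lines)


def enrich_caption(caption: str,
                   expert_outputs: Dict[str, List[str]],
                   llm: Callable[[str], str]) -> str:
    """`llm` is any text-generation callable (hosted or local model)."""
    return llm(build_fusion_prompt(caption, expert_outputs))


if __name__ == "__main__":
    experts = {
        "object detector": ["a red bus", "two pedestrians"],
        "attribute recognizer": ["bus: double-decker"],
        "OCR": ["route 73"],
    }
    # Print the assembled prompt; plug a real LLM into enrich_caption() for output.
    print(build_fusion_prompt("a bus on a city street", experts))
```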

    Impact of Conversion on Short and Long-Term Outcome in Laparoscopic Resection of Curable Colorectal Cancer

    The authors found that conversion in laparoscopic surgery for curable colorectal cancer is associated with a worse peri-operative outcome and worse disease-free survival.

    VASR: Visual Analogies of Situation Recognition

    A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to what?). Unlike previous work on visual analogy that focused on simple image transformations, we tackle complex analogies requiring understanding of scenes. We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label ~80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope our dataset will encourage the development of new analogy-making models. Website: https://vasr-dataset.github.io. Accepted to AAAI 2023.
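    A common baseline for analogy tasks of this form (not necessarily one of the models evaluated in the paper) is embedding arithmetic: choose the candidate closest to B + (A' − A) in an image-embedding space such as CLIP's. A minimal sketch, with random vectors standing in for real image embeddings:

```python
import numpy as np


def solve_analogy(emb_a: np.ndarray, emb_a2: np.ndarray, emb_b: np.ndarray,
                  candidate_embs: np.ndarray) -> int:
    """Return the index of the candidate with highest cosine similarity
    to the analogy target  B + (A' - A)."""
    target = emb_b + (emb_a2 - emb_a)
    target = target / np.linalg.norm(target)
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return int(np.argmax(cands @ target))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, a2, b = rng.normal(size=(3, 512))        # stand-ins for image embeddings
    candidates = rng.normal(size=(4, 512))      # four candidate B' embeddings
    candidates[2] = b + (a2 - a)                # plant the "correct" answer
    print(solve_analogy(a, a2, b, candidates))  # -> 2
```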

    Identification and Localization of Myxococcus xanthus Porins and Lipoproteins

    Myxococcus xanthus DK1622 contains inner (IM) and outer membranes (OM) separated by a peptidoglycan layer. Integral membrane β-barrel proteins are found exclusively in the OM, where they form pores allowing the passage of nutrients, waste products and signals. One porin, Oar, is required for intercellular communication of the C-signal. An oar mutant produces CsgA but is unable to ripple or stimulate csgA mutants to develop, suggesting that Oar is the channel for C-signaling. Six prediction programs were evaluated for their ability to identify β-barrel proteins. No program was reliable unless the predicted proteins were first parsed using SignalP, LipoP and TMHMM, after which TMBETA-SVM and TMBETADISC-RBF identified β-barrel proteins most accurately. 228 β-barrel proteins were predicted from among 7,331 protein-coding regions, representing 3.1% of total genes. Sucrose density gradients were used to separate vegetative-cell IM and OM fractions, and LC-MS/MS of OM proteins identified 54 β-barrel proteins. Another class of membrane proteins, the lipoproteins, is anchored in the membrane via a lipid moiety at the N-terminus. 44 OM proteins identified by LC-MS/MS were predicted lipoproteins. Lipoproteins are distributed between the IM, OM and ECM according to an N-terminal sorting sequence that varies among species. Sequence analysis revealed conservation of alanine at the +7 position of mature ECM lipoproteins, lysine at the +2 position of IM lipoproteins, and no noticeable conservation within the OM lipoproteins. Site-directed mutagenesis and immuno-transmission electron microscopy showed that alanine at the +7 position is essential for sorting of the lipoprotein FibA into the ECM. FibA appears at normal levels in the ECM even when a +2 lysine is added to the signal sequence. These results suggest that ECM proteins have a unique method of secretion. It is now possible to target lipoproteins to specific IM, OM and ECM locations by manipulating the amino acid sequence near the +1 cysteine processing site.
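    The screening logic described above (parse candidates with SignalP, LipoP and TMHMM before applying β-barrel predictors) can be summarized with a small sketch. The record fields, cutoff, and example scores are illustrative assumptions; in practice each tool is run separately and its report parsed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_signal_peptide: bool   # SignalP-style call: exported protein
    is_lipoprotein: bool       # LipoP-style call: lipoprotein signal (excluded here)
    tm_helices: int            # TMHMM-style count of alpha-helical TM segments
    barrel_score: float        # TMBETA-SVM / TMBETADISC-RBF style score


def predicted_beta_barrels(candidates, score_cutoff=0.5):
    """Keep proteins that are exported, are not lipoproteins, have no
    alpha-helical TM segments, and score above the beta-barrel cutoff."""
    return [
        c.name for c in candidates
        if c.has_signal_peptide
        and not c.is_lipoprotein
        and c.tm_helices == 0
        and c.barrel_score >= score_cutoff
    ]


if __name__ == "__main__":
    demo = [                                     # illustrative scores only
        Candidate("Oar", True, False, 0, 0.93),
        Candidate("FibA", True, True, 0, 0.10),
        Candidate("InnerMembraneX", True, False, 7, 0.40),
    ]
    print(predicted_beta_barrels(demo))          # -> ['Oar']
```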

    CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

    Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text. However, current scene text recognizers are unaware of the bigger picture as they operate on cropped text images. In this study, we harness the representative capabilities of modern vision-language models, such as CLIP, to provide scene-level information to the crop-based recognizer. We achieve this by fusing a rich representation of the entire image, obtained from the vision-language model, with the recognizer's word-level features via a gated cross-attention mechanism. This component gradually shifts to the context-enhanced representation, allowing for stable fine-tuning of a pretrained recognizer. We demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP TExt Recognition), on leading text recognition architectures and achieve state-of-the-art results across multiple benchmarks. Furthermore, our analysis highlights improved robustness to out-of-vocabulary words and enhanced generalization in low-data regimes. Accepted for publication at ICCV 2023.
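    A minimal sketch of a gated cross-attention fusion layer of the kind described above, assuming the scene-level embedding comes from a vision-language model (e.g. a CLIP image embedding) and the word-level features from a pretrained recognizer. The dimensions, the zero-initialized gate, and the residual form are illustrative assumptions, not the published CLIPTER code.

```python
import torch
import torch.nn as nn


class GatedCrossAttentionFusion(nn.Module):
    def __init__(self, feat_dim: int = 256, scene_dim: int = 512, num_heads: int = 4):
        super().__init__()
        self.scene_proj = nn.Linear(scene_dim, feat_dim)   # map scene embedding to feature space
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))           # gate starts closed: output == recognizer features

    def forward(self, word_feats: torch.Tensor, scene_emb: torch.Tensor) -> torch.Tensor:
        # word_feats: (batch, seq_len, feat_dim); scene_emb: (batch, scene_dim)
        scene = self.scene_proj(scene_emb).unsqueeze(1)     # (batch, 1, feat_dim)
        fused, _ = self.attn(query=word_feats, key=scene, value=scene)
        # Gate opens gradually during fine-tuning, shifting toward the context-enhanced representation.
        return word_feats + torch.tanh(self.gate) * fused


if __name__ == "__main__":
    layer = GatedCrossAttentionFusion()
    feats = torch.randn(2, 26, 256)     # word-level recognizer features
    scene = torch.randn(2, 512)         # e.g. a CLIP image embedding
    print(layer(feats, scene).shape)    # torch.Size([2, 26, 256])
```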

    Diffractometer‐Control Software For Bragg‐Rod Measurements

    We present Generalized Diffractometer Control (gdc), a diffractometer-control software package developed specifically for high-precision measurements of Bragg rods; we discuss its features and analyze its performance in data collection. gdc, implemented at several APS beamlines, controls a six-circle diffractometer in either Eulerian or kappa geometry, yet does not assume a mechanically ideal diffractometer; instead, the measured directions of the diffractometer axes (and the direction of the incident beam) are input parameters. The LabVIEW-based program features a graphical interface, making it straightforward to find all the commands and operations. Other features include optimized scans along Bragg rods, straightforward background subtraction, and extensive sets of pseudomotors. © 2004 American Institute of Physics. Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/87660/2/1221_1.pdf
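    As a rough illustration of the kind of measurement such software automates, here is a hypothetical rod-scan loop with background subtraction. The `move_to_hkl` and `count` functions are stand-in stubs, not gdc's actual API (which is LabVIEW-based), and the angle calculation from measured axis directions is omitted entirely.

```python
import numpy as np


def move_to_hkl(h: float, k: float, l: float) -> None:
    """Stub: a real controller would solve the six-circle angles for (h, k, l)
    from the measured axis and incident-beam directions, not an ideal geometry."""


def count(seconds: float = 1.0) -> float:
    """Stub detector readout; replace with a real counter."""
    return float(np.random.poisson(100))


def rod_scan(h, k, l_values, bg_offset=0.05):
    """Step L along the (h, k) Bragg rod; at each point measure a nearby
    off-rod background and subtract it from the on-rod signal."""
    results = []
    for l in l_values:
        move_to_hkl(h, k, l)
        signal = count()
        move_to_hkl(h + bg_offset, k, l)   # small offset in H for background
        background = count()
        results.append((l, signal - background))
    return results


if __name__ == "__main__":
    print(rod_scan(1, 0, np.arange(0.1, 0.6, 0.1)))
```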