4 research outputs found

    Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models

    Full text link
    The synergy of language and vision models has given rise to Large Language and Vision Assistant models (LLVAs), designed to engage users in rich conversational experiences intertwined with image-based queries. These comprehensive multimodal models seamlessly integrate vision encoders with Large Language Models (LLMs), expanding their applications in general-purpose language and visual comprehension. The advent of Large Multimodal Models (LMMs) heralds a new era in Artificial Intelligence (AI) assistance, extending the horizons of AI utilization. This paper takes a unique perspective on LMMs, exploring their efficacy in performing image classification tasks using tailored prompts designed for specific datasets. We also investigate the LLVAs' zero-shot learning capabilities. Our study includes a benchmarking analysis across four diverse datasets: MNIST, Cats Vs. Dogs, Hymenoptera (Ants Vs. Bees), and an unconventional dataset comprising Pox Vs. Non-Pox skin images. The results of our experiments demonstrate the model's remarkable performance, achieving classification accuracies of 85%, 100%, 77%, and 79% for the respective datasets without any fine-tuning. To bolster our analysis, we assess the model's performance after fine-tuning for specific tasks. In one instance, fine-tuning is conducted over a dataset comprising images of faces of children with and without autism. Prior to fine-tuning, the model demonstrated a test accuracy of 55%, which significantly improved to 83% after fine-tuning. These results, coupled with our prior findings, underscore the transformative potential of LLVAs and their versatile applications in real-world scenarios. Comment: 5 pages, 6 figures, 4 tables. Accepted at The International Symposium on Foundation and Large Language Models (FLLM2023).
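
    The abstract does not name the specific LMM or the exact prompt wording; as a minimal sketch, zero-shot classification of a Cats Vs. Dogs image with an open LLaVA-style model through the Hugging Face transformers API could look like the following. The model checkpoint, prompt, and label set are assumptions for illustration, not the paper's setup.

    # Hypothetical sketch: zero-shot classification by prompting an open LLaVA-style LMM.
    # Model choice and prompt wording are assumptions, not the paper's exact setup.
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"  # assumed open-weight stand-in
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    image = Image.open("sample.jpg")  # e.g. one Cats Vs. Dogs test image
    # Dataset-tailored prompt that constrains the answer to the label set.
    prompt = "USER: <image>\nClassify this image as 'cat' or 'dog'. Answer with one word.\nASSISTANT:"

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=5)
    print(processor.decode(output_ids[0], skip_special_tokens=True))
    # Accuracy is then the fraction of such predictions matching the ground-truth labels.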

    Can ChatGPT be Your Personal Medical Assistant?

    Full text link
    The advanced large language model (LLM) ChatGPT has shown its potential in different domains and remains unmatched among LLMs owing to its characteristics. This study aims to evaluate the potential of using a fine-tuned ChatGPT model as a personal medical assistant in the Arabic language. To do so, this study uses publicly available online question-and-answer datasets in the Arabic language, comprising almost 430K questions and answers across 20 disease-specific categories. The GPT-3.5-turbo model was fine-tuned on a portion of this dataset. The performance of the fine-tuned model was evaluated through automated and human evaluation. The automated evaluations include perplexity, coherence, similarity, and token count. Native Arabic speakers with medical knowledge evaluated the generated text for relevance, accuracy, precision, logic, and originality. The overall result shows that ChatGPT has a bright future in medical assistance. Comment: 5 pages, 7 figures, 2 tables. Accepted at The International Symposium on Foundation and Large Language Models (FLLM2023).
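
    The paper fine-tunes GPT-3.5-turbo on a portion of the Arabic question-answer data; a minimal sketch of that flow with the OpenAI Python SDK (v1.x) could look as follows. The JSONL file name and the example record are placeholders, not the study's actual data or hyperparameters.

    # Hypothetical sketch of the GPT-3.5-turbo fine-tuning flow (OpenAI Python SDK v1.x).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Each training record is a chat-formatted example, e.g.:
    # {"messages": [{"role": "user", "content": "<Arabic medical question>"},
    #               {"role": "assistant", "content": "<Arabic answer>"}]}
    training_file = client.files.create(
        file=open("arabic_medical_qa.jsonl", "rb"),  # placeholder file name
        purpose="fine-tune",
    )

    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )
    print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until the job completes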

    Fast and Efficient Image Generation Using Variational Autoencoders and K-Nearest Neighbor OveRsampling Approach

    No full text
    Researchers gravitate towards Generative Adversarial Networks (GANs) to create artificial images. However, GANs suffer from convergence issues, mode collapse, and the overall complexity of balancing the Nash equilibrium. The images generated are often distorted, rendering them useless. We propose a combination of Variational Autoencoders (VAEs) and a statistical oversampling method called K-Nearest Neighbor OveRsampling (KNNOR) to create artificial images. This combination of VAE and KNNOR results in more life-like images with reduced distortion. We fine-tune several pre-trained networks on a separate set of real and fake face images to test images generated by our method against images generated by conventional Deep Convolutional GANs (DCGANs). We also compare the combination of VAEs and the Synthetic Minority Oversampling Technique (SMOTE) to establish the efficacy of KNNOR against naive oversampling methods. Not only are our methods better able to convince the classifiers that the generated images are authentic, but the models are also half the size of the DCGANs. The code is publicly available on GitHub.
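
    The abstract describes creating images by oversampling in the latent space learned by a VAE; the toy sketch below shows only the latent-space step, using plain k-nearest-neighbor interpolation with scikit-learn as a simplified stand-in for KNNOR. The encoder/decoder calls, latent dimension, and interpolation rule are assumptions, not the authors' implementation.

    # Toy sketch: oversample VAE latent codes by interpolating each sampled code
    # towards one of its k nearest neighbours, then decode the synthetic codes.
    # Simplified stand-in for KNNOR, not the authors' exact algorithm.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_oversample(latents, n_new, k=5, seed=0):
        rng = np.random.default_rng(seed)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(latents)
        _, idx = nn.kneighbors(latents)           # idx[:, 0] is the point itself
        synthetic = []
        for _ in range(n_new):
            i = rng.integers(len(latents))        # pick a real latent code
            j = idx[i, rng.integers(1, k + 1)]    # pick one of its k neighbours
            alpha = rng.random()                  # random step along the segment
            synthetic.append(latents[i] + alpha * (latents[j] - latents[i]))
        return np.stack(synthetic)

    # latents = vae.encode(real_images)           # assumed trained VAE encoder
    # new_latents = knn_oversample(latents, n_new=1000)
    # fake_images = vae.decode(new_latents)       # decode synthetic codes into images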

    Cyber-Physical System Demonstration of an Automated Shuttle-Conveyor-Belt Operation for Inventory Control of Multiple Stockpiles: A Proof of Concept

    No full text
    Smart manufacturing in the so-called Industry 4.0 age pushes the research and development of laboratory-scale proofs of concept before deployment in pilots and real-size equipment. As such, we present a cyber-physical system (CPS) demonstration for the mining industry, engineered to autonomously manage the handling of solids flowing on a conveyor belt that drops material into containers, forming multiple stockpiles per belt. The CPS controls the inventories of multiple stockpiles using mixed-integer optimization that minimizes the squared deviation of the measured inventories from their targets (heights). Within the sensing-optimizing-actuating (SOA) cycle, the CPS demonstration is performed as follows. First, the sensing (data measurement, data processing, and system evaluation) uses a deep neural network in real time to assess the level of material stored in transparent containers. Second, the optimizing (mathematical programming, optimization techniques, and decision-making capabilities) uses a flowsheet network formulation called the unit-operation-port-state superstructure (UOPSS), which permits a fast solution for the position-idle-time-varying discrete manipulated variables as operational schedules. Third, the actuating (cyber-physical integration) implements a physical actuation solution through an integrated CPS environment. According to our experimental findings, stockpiling process control in a smart manufacturing context has enormous potential for autonomously controlling the inventories of multiple stockpiles.
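
    The schedule decides which stockpile the shuttle-conveyor feeds in each period so that the squared deviation of the final inventories from their target heights is minimized; the brute-force toy below illustrates that decision on a tiny instance. The numbers and the exhaustive search are illustrative assumptions only, not the mixed-integer UOPSS formulation used in the demonstration.

    # Toy illustration: choose the shuttle position (one stockpile fed per period)
    # that minimizes the squared deviation of final inventories from target heights.
    # Exhaustive search stands in for the paper's mixed-integer optimization; all
    # numbers are made up for illustration.
    from itertools import product

    n_stockpiles, n_periods = 3, 6
    flow_per_period = 1.0                  # material dropped per period
    initial = [0.0, 1.0, 2.0]              # assumed starting inventories
    target = [3.0, 3.0, 3.0]               # assumed target heights

    best_cost, best_schedule = float("inf"), None
    for schedule in product(range(n_stockpiles), repeat=n_periods):
        inventory = list(initial)
        for pile in schedule:
            inventory[pile] += flow_per_period
        cost = sum((inv - tgt) ** 2 for inv, tgt in zip(inventory, target))
        if cost < best_cost:
            best_cost, best_schedule = cost, schedule

    print("shuttle position per period:", best_schedule)
    print("squared deviation from targets:", best_cost)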