6 research outputs found

    RE-Tagger: A light-weight Real-Estate Image Classifier

    Real-estate image tagging is an essential use case that saves the effort involved in manual annotation and enhances the user experience. This paper proposes an end-to-end pipeline (referred to as RE-Tagger) for the real-estate image classification problem. We present a two-stage transfer learning approach using a custom InceptionV3 architecture to classify images into different categories (i.e., bedroom, bathroom, kitchen, balcony, hall, and others). Finally, we released the application as a REST API, hosted as a web application running on a 2-core machine with 2 GB of RAM. The demo video is available here.
    Comment: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (DEMO TRACK)

    Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models

    Visual Question Answering is a challenging task, as it requires seamless interaction between perceptual, linguistic, and background knowledge systems. While the recent progress of visual and natural language models like BLIP has led to improved performance on this task, we lack understanding of the ability of such models to perform on different kinds of questions and reasoning types. As our initial analysis of BLIP-family models revealed difficulty with answering fine-detail questions, we investigate the following question: Can visual cropping be employed to improve the performance of state-of-the-art visual question answering models on fine-detail questions? Given the recent success of the BLIP-family models, we study a zero-shot and a fine-tuned BLIP model. We define three controlled subsets of the popular VQA-v2 benchmark to measure whether cropping can help model performance. Besides human cropping, we devise two automatic cropping strategies based on multi-modal embedding by CLIP and BLIP visual QA model gradients. Our experiments demonstrate that the performance of BLIP model variants can be significantly improved through human cropping, and automatic cropping methods can produce comparable benefits. A deeper dive into our findings indicates that the performance enhancement is more pronounced in zero-shot models than in fine-tuned models and more salient with smaller bounding boxes than larger ones. We perform case studies to connect quantitative differences with qualitative observations across question types and datasets. Finally, we see that the cropping enhancement is robust, as we gain an improvement of 4.59% (absolute) in the general VQA-random task by simply inputting a concatenation of the original and gradient-based cropped images. We make our code available to facilitate further innovation on visual cropping methods for question answering.
    Comment: 16 pages, 5 figures, 7 tables
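    The cropping idea in this abstract can be illustrated with a minimal sketch. It assumes an image stored as a list of pixel rows and a bounding box given as (left, top, right, bottom). The helper names, the margin parameter, and the choice to pass the two views as a simple list are all hypothetical; the abstract does not say how the original and cropped images are concatenated, nor how the automatic strategies choose the box.

```python
def expand_and_clamp(box, width, height, margin=0):
    """Grow a (left, top, right, bottom) box by `margin` pixels, kept inside the image."""
    left, top, right, bottom = box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def crop(image, box):
    """Return the sub-image covered by box; right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def original_plus_crop(image, box, margin=0):
    """Pair the full image with its cropped view (hypothetical input format)."""
    height, width = len(image), len(image[0])
    clamped = expand_and_clamp(box, width, height, margin)
    return [image, crop(image, clamped)]


# Toy 4x4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
views = original_plus_crop(image, (1, 1, 3, 3))
```

    In this toy case, `views[0]` is the untouched image and `views[1]` is the 2x2 region around the detail of interest; a real pipeline would pass both views to the VQA model.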

    Privacy Aware Question-Answering System for Online Mental Health Risk Assessment

    Social media platforms have enabled individuals suffering from mental illnesses to share their lived experiences and find the online support necessary to cope. However, many users fail to receive genuine clinical support, thus exacerbating their symptoms. Screening users based on what they post online can aid providers in administering targeted healthcare and minimize false positives. Pre-trained Language Models (LMs) can assess users' social media data and classify them in terms of their mental health risk. We propose a Question-Answering (QA) approach to assess mental health risk using the Unified-QA model on two large mental health datasets. To protect user data, we extend Unified-QA by anonymizing the model training process using differential privacy. Our results demonstrate the effectiveness of modeling risk assessment as a QA task, specifically for mental health use cases. Furthermore, the model's performance decreases by less than 1% with the inclusion of differential privacy. The proposed system's performance is indicative of a promising research direction that will lead to the development of privacy-aware diagnostic systems.
    Comment: 5 pages, 2 figures, 3 tables

    DCNN-GA: A Deep Neural Net Architecture for Navigation of UAV in Indoor Environment

    No full text
    The applications of unmanned aerial vehicles (UAVs) in military, intelligent transportation, agriculture, rescue operations, natural environment mapping, and many other allied domains have increased exponentially during the past few years. Their use cases range from aerial surveillance and data retrieval to real-time communication networks. Though UAVs were traditionally used only outdoors, many indoor applications, such as rescue operations and inventory tracking in warehouses, have recently emerged and are being actively explored. One of the major challenges for indoor drone applications is navigation and obstacle avoidance. Indoors, the global positioning system fails to provide accurate localization and navigation. To address this issue, we introduce a scheme that facilitates the autonomous navigation of UAVs (which have an onboard camera) in the indoor corridors of a building using deep-neural-network-based processing of images. For a deep neural network, selecting a good combination of hyperparameters for better prediction is a complicated task. In this article, the hyperparameter tuning of a convolutional neural network is achieved by using genetic algorithms. The proposed architecture (DCNN-GA) is compared with state-of-the-art ImageNet models. The experimental results show that the proposed algorithm achieves the minimum loss and high performance.
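    As a rough illustration of tuning hyperparameters with a genetic algorithm, the sketch below evolves a small population of candidate settings. Everything specific in it is an assumption: the search space, the population settings, and above all the `fitness` function, which is a toy stand-in for training a CNN and measuring validation performance. It is not the DCNN-GA procedure itself, which the abstract does not detail.

```python
import random

# Hypothetical search space; the abstract does not list the tuned hyperparameters.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "num_filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "dropout": [0.0, 0.25, 0.5],
}


def random_individual(rng):
    """Sample one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def fitness(individual):
    """Toy stand-in for validation accuracy; a real run would train a CNN here."""
    target = {"learning_rate": 1e-3, "num_filters": 64, "kernel_size": 3, "dropout": 0.25}
    return sum(1.0 for name in target if individual[name] == target[name])


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene comes from either parent with equal probability."""
    return {
        name: (parent_a[name] if rng.random() < 0.5 else parent_b[name])
        for name in SEARCH_SPACE
    }


def mutate(individual, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def evolve(pop_size=12, generations=15, elite=2, seed=0):
    """Run the GA; return the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - elite:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = population[:elite] + children  # elitism keeps the best unchanged
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
```

    Because the top `elite` candidates survive unchanged, the best fitness never decreases from one generation to the next. This matters when each evaluation is as costly as a full training run.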

    Federated Learning Meets Human Emotions: A Decentralized Framework for Human-Computer Interaction for IoT Applications

    No full text
    As stated by Spock, 'change is the essential process of all existence,' which is reflected in everyday applications in our daily lives. We, as humans, just need to find a way to make the best use of the current technological advances. The pandemic has managed to exploit our deepest vulnerabilities and insecurities. We need to cope with a lot of things, just to be comfortable in the new normal. Hence, we can rely on technology, the greatest asset developed by humans. In this article, we discuss how we can enhance the work environment in offices post-pandemic. We combine federated learning with emotion analysis to create a state-of-the-art, simple, secure, and efficient emotion monitoring system. We combine facial expression and speech signals to find out macroexpressions and create an emotion index that is monitored to assess the mental health of the user. Federated learning enables users to train the model locally without compromising their privacy. Instead of sending data to the centralized server, the proposed scheme sends only model weights, which are combined at the server to make a better global model that is then pushed back to the users. Because this does not violate privacy or data-sharing constraints, the model can then be trained across organizations to achieve optimal results. The data collected from users are monitored to analyze their mental health, and users are presented with counseling solutions during low times. Technology is a panacea that has enabled us to survive in this pandemic, and by using our solution to improve work culture and the environment in post-pandemic times
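    The server-side step described in this abstract, combining the weights sent by clients into a global model, can be sketched as a weighted average. The function name and the weighting by local sample count are assumptions in the style of federated averaging; the abstract does not state the exact aggregation rule.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by each client's sample count.

    client_weights: one flat list of floats per client (hypothetical format).
    client_sizes: number of local training samples per client (assumed weighting).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client weight vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(weights) != dim for weights in client_weights):
        raise ValueError("all clients must send weight vectors of the same length")
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two clients; the second holds three times as much local data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

    Only the weight vectors and sample counts reach the server in this sketch; the raw facial and speech data never leave the client.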

    Federated Learning and Autonomous UAVs for Hazardous Zone Detection and AQI Prediction in IoT Environment

    No full text
    Air pollution monitoring, finding the hazardous zone, and future air quality predictions have recently become a significant issue for many researchers. With the adverse effect of low air quality on human health, it has become necessary to predict the air quality index (AQI) accurately and on time. The unmanned aerial vehicle (UAV) can collect air quality data with high spatial and temporal resolutions. Using a fleet of UAVs could be considered a good option. In the proposed work, we implement a distributed federated learning (FL) algorithm within a UAV swarm that collects air quality data using built-in sensors. A scheme for finding the area with the highest AQI value is proposed using swarm intelligence. The collected data are then fed to a CNN-LSTM model to predict the AQI. The trained local model is sent to the central server, and the server aggregates the received models from UAVs in the swarm. A global model is created and is transmitted to the UAV swarm again in the next iteration. The proposed architecture is compared with other time-series models. The results show that the proposed model predicts AQI daily with a minimal error rate on a real-time data set from Delhi.