23 research outputs found

    Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)

    The advent of foundation models signals a new era in artificial intelligence. The Segment Anything Model (SAM) is the first foundation model for image segmentation. In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups. The increasing requirement for annotated eye-image datasets presents a significant opportunity for SAM to redefine the landscape of data annotation in gaze estimation. Our investigation centers on SAM's zero-shot learning abilities and the effectiveness of prompts such as bounding boxes or point clicks. Our results are consistent with studies in other domains: depending on the feature, SAM's segmentation effectiveness can be on par with specialized models, and prompts improve its performance, as evidenced by an IoU of 93.34% for pupil segmentation in one dataset. Foundation models like SAM could revolutionize gaze estimation by enabling quick and easy image segmentation, reducing reliance on specialized models and extensive manual annotation.
    Comment: 14 pages, 8 figures, 1 table, submitted to ETRA 2024: ACM Symposium on Eye Tracking Research & Application

    Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language

    Falomir Z, Kluth T. Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language. Cognitive Processing. 2018;19(2):265-284.
    The challenge of describing 3D real scenes is tackled in this paper using qualitative spatial descriptors. A key point to study is which qualitative descriptors to use and how these descriptors must be organized to produce a suitable cognitive explanation. To find answers, a survey test was carried out with human participants, who openly described a scene containing some pieces of furniture. The data obtained in this survey were analysed and, taking them into account, the QSn3D computational approach was developed, which uses an Xbox 360 Kinect to obtain 3D data from a real indoor scene. Object features are computed on these 3D data to identify objects in indoor scenes. The object orientation is computed, and qualitative spatial relations between the objects are extracted. These qualitative spatial relations are the input to a grammar which applies saliency rules obtained from the survey study and generates cognitive natural language descriptions of scenes. Moreover, these qualitative descriptors can be expressed as first-order logical facts in Prolog for further reasoning. Finally, a validation study is carried out to test whether the descriptions provided by the QSn3D approach are human readable. The obtained results show that their acceptability is higher than 82%.

    A lightweight network based on dual-stream feature fusion and dual-domain attention for white blood cells segmentation

    Introduction: Accurate white blood cell segmentation from cytopathological images is crucial for evaluating leukemia. However, segmentation is difficult in clinical practice. Given the very large number of cytopathological images to be processed, diagnosis becomes cumbersome and time consuming, and diagnostic accuracy also depends closely on experts' experience, fatigue, and mood. Moreover, fully automatic white blood cell segmentation is challenging for several reasons: cell deformation, blurred cell boundaries, cell color differences, and cell overlap or adhesion.
    Methods: The proposed method improves the feature representation capability of the network while reducing parameters and computational redundancy by utilizing the feature reuse of the Ghost module to reconstruct a lightweight backbone network. Additionally, a dual-stream feature fusion network (DFFN) based on the feature pyramid network is designed to enhance detailed information acquisition. Furthermore, a dual-domain attention module (DDAM) is developed to extract global features from both the frequency and spatial domains simultaneously, resulting in better cell segmentation performance.
    Results: Experimental results on the ALL-IDB and BCCD datasets demonstrate that our method outperforms existing instance segmentation networks such as Mask R-CNN, PointRend, MS R-CNN, SOLOv2, and YOLACT with an average precision (AP) of 87.41%, while significantly reducing parameters and computational cost.
    Discussion: Our method is significantly better than the current state-of-the-art single-stage methods in terms of both the number of parameters and FLOPs, and our method has the best performance among all compared methods. However, the performance of our method is still lower than that of the two-stage instance segmentation algorithms. In future work, how to design a more lightweight network model while ensuring good accuracy will become an important problem.

    Visualization of Regression Models Using Discriminative Dimensionality Reduction

    Schulz A, Hammer B. Visualization of Regression Models Using Discriminative Dimensionality Reduction. In: Computer Analysis of Images and Patterns. Lecture Notes in Computer Science. Vol 9257. Cham: Springer Science + Business Media; 2015: 437-449