
    Few-shot linguistic grounding of visual attributes and relations using Gaussian kernels

    Understanding complex visual scenes is one of the fundamental problems in computer vision, but learning in this domain is challenging due to the inherent richness of the visual world and the vast number of possible scene configurations. Current state-of-the-art approaches to scene understanding often employ deep networks, which require large and densely annotated datasets. This goes against the seemingly intuitive learning abilities of humans and our ability to generalise from few examples to unseen situations. In this paper, we propose a unified framework for learning visual representations of words denoting attributes such as “blue” and relations such as “left of”, based on Gaussian models operating in a simple, unified feature space. The strength of our model is that it only requires a small number of weak annotations and is able to generalise easily to unseen situations, such as recognising object relations in unusual configurations. We demonstrate the effectiveness of our model on the predicate detection task. Our model outperforms the state of the art on this task in both the normal and zero-shot scenarios, while training on a dataset an order of magnitude smaller.
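
    A minimal, hypothetical sketch of the grounding idea described above (not the authors' code): a word such as “blue” is modelled by fitting a single regularised Gaussian to a handful of feature vectors in an assumed feature space (mean RGB here), and new candidates are scored by log-likelihood.

```python
import numpy as np

class GaussianConcept:
    """Fit one Gaussian to the feature vectors of a word's few examples."""

    def __init__(self, reg=1e-3):
        self.reg = reg  # diagonal regulariser, needed with very few samples

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + self.reg * np.eye(X.shape[1])
        self.cov_inv = np.linalg.inv(cov)
        self.log_det = np.linalg.slogdet(cov)[1]
        return self

    def log_likelihood(self, x):
        d = np.asarray(x, dtype=float) - self.mean
        k = d.shape[0]
        return -0.5 * (d @ self.cov_inv @ d + self.log_det + k * np.log(2 * np.pi))

# Ground "blue" from three weakly annotated examples (assumed mean-RGB features)
# and score two new patches.
blue = GaussianConcept().fit([[0.10, 0.20, 0.90],
                              [0.15, 0.25, 0.85],
                              [0.05, 0.10, 0.95]])
print(blue.log_likelihood([0.12, 0.22, 0.88]))  # high: a bluish patch
print(blue.log_likelihood([0.90, 0.10, 0.10]))  # low: a reddish patch
```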

    PlaNet-ClothPick: effective fabric flattening based on latent dynamic planning

    Why do Recurrent State Space Models such as PlaNet fail at cloth manipulation tasks? Recent work has attributed this to the blurry prediction of the observation, which makes it difficult to plan directly in the latent space. This paper explores the reasons behind this by applying PlaNet in the pick-and-place fabric-flattening domain. We find that the sharp discontinuity of the transition function at the contour of the fabric makes it difficult to learn an accurate latent dynamics model, causing the MPC planner to produce pick actions slightly outside of the article. By limiting the picking space to the cloth mask and training on specially engineered trajectories, our mesh-free PlaNet-ClothPick surpasses visual planning and policy-learning methods on principal metrics in simulation, achieving similar performance to state-of-the-art mesh-based planning approaches. Notably, our model exhibits faster action inference and requires fewer transition-model parameters than the state-of-the-art robotic systems in this domain. Supplementary materials are available at: https://sites.google.com/view/planet-clothpick
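
    One concrete ingredient of the approach above is constraining pick actions to the cloth mask so the planner cannot grasp just outside the fabric. The snippet below is a toy illustration of that constraint only (the helper name and (row, col) pixel convention are assumptions; the actual system plans in PlaNet's latent space, which is not reproduced here).

```python
import numpy as np

def project_pick_to_mask(pick_rc, cloth_mask):
    """Snap a (row, col) pick proposal to the nearest pixel on the cloth mask.

    cloth_mask: 2-D boolean array, True where fabric is visible.
    Returns the proposal unchanged if it already lies on the cloth.
    """
    r, c = int(round(pick_rc[0])), int(round(pick_rc[1]))
    if cloth_mask[r, c]:
        return (r, c)
    rows, cols = np.nonzero(cloth_mask)        # coordinates of all on-cloth pixels
    d2 = (rows - r) ** 2 + (cols - c) ** 2     # squared distances to the proposal
    i = int(np.argmin(d2))
    return (int(rows[i]), int(cols[i]))

# A planner proposes a pick slightly off the fabric; snap it back onto the cloth.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                          # toy square of fabric
print(project_pick_to_mask((0, 0), mask))      # -> (2, 2)
```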

    Data-driven robotic manipulation of cloth-like deformable objects: the present, challenges and future prospects

    Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that do not show a detectable level of compression strength when two points on the article are pushed towards each other; they include objects such as ropes (1D), fabrics (2D) and bags (3D). In general, CDOs’ many degrees of freedom (DoF) introduce severe self-occlusion and complex state–action dynamics as significant obstacles to perception and manipulation systems. These challenges exacerbate existing issues of modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application of data-driven control methods to four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms.

    Visualization as Intermediate Representations (VLAIR) for human activity recognition

    Ambient, binary, event-driven sensor data is useful for many human activity recognition applications such as smart homes and ambient-assisted living. These sensors are privacy-preserving, unobtrusive, inexpensive and easy to deploy in scenarios that require detection of simple activities such as going to sleep and leaving the house. However, classification performance is still a challenge, especially when multiple people share the same space or when different activities take place in the same areas. To improve classification performance, we develop what we call the Visualization as Intermediate Representations (VLAIR) approach. The main idea is to re-represent the data as visualizations (generated pixel images), in a similar way to how visualizations are created for humans to analyze and communicate data. We can then feed these images to a convolutional neural network, whose strength resides in extracting effective visual features. We have tested five variants (mappings) of the VLAIR approach and compared them to a collection of classifiers commonly used in classic human activity recognition. The best of the VLAIR approaches outperforms the best baseline, with a strong advantage in recognising less frequent activities and in distinguishing users and activities in common areas. We conclude the paper with a discussion of why and how VLAIR can be useful in human activity recognition scenarios and beyond.
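
    The sketch below illustrates the re-representation step with one hypothetical mapping (the sensor layout, image size and decay factor are assumptions, not taken from the paper): each binary activation is drawn at its sensor's floor-plan position with an intensity that encodes recency, and the resulting single-channel image would then be passed to a CNN (omitted here).

```python
import numpy as np

# Hypothetical floor-plan coordinates (row, col) for a few binary sensors.
SENSOR_POSITIONS = {
    "door_front": (2, 2),
    "motion_kitchen": (10, 20),
    "bed_pressure": (25, 5),
}

def events_to_image(events, shape=(32, 32), decay=0.9):
    """Render a window of (timestep, sensor_id) events as a grayscale image.

    Recency is encoded as brightness, so the picture summarises the window in a
    way a human could also read; the array can be fed to a CNN as one channel.
    """
    img = np.zeros(shape, dtype=float)
    t_last = max(t for t, _ in events)
    for t, sensor in events:
        r, c = SENSOR_POSITIONS[sensor]
        img[r, c] = max(img[r, c], decay ** (t_last - t))  # brighter = more recent
    return img

# Three events in one activity window: front door, then kitchen, then bed.
window = [(0, "door_front"), (3, "motion_kitchen"), (7, "bed_pressure")]
image = events_to_image(window)
print(image.shape, round(image.max(), 3))
```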

    Interpretable feature maps for robot attention

    Attention is crucial for autonomous agents interacting with complex environments. In a real scenario, our expectations drive attention, as we look for crucial objects to complete our understanding of the scene. But most visual attention models to date are designed to drive attention in a bottom-up fashion, without context, and the features they use are not always suitable for driving top-down attention. In this paper, we present an attentional mechanism based on semantically meaningful, interpretable features. We show how to generate a low-level semantic representation of the scene in real time, which can be used to search for objects based on specific features such as colour, shape, orientation, speed, and texture.
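
    As a hedged illustration of interpretable feature maps driving top-down attention (not the paper's implementation), the sketch below builds one map as per-pixel similarity to a target colour and fuses named maps with task-dependent weights that encode what the robot is currently looking for.

```python
import numpy as np

def colour_feature_map(rgb_image, target_rgb, sigma=0.2):
    """Per-pixel similarity to a target colour: one interpretable feature map."""
    diff = rgb_image.astype(float) - np.asarray(target_rgb, dtype=float)
    dist2 = (diff ** 2).sum(axis=-1)
    return np.exp(-dist2 / (2 * sigma ** 2))

def top_down_attention(feature_maps, weights):
    """Weighted sum of named feature maps; the weights express what we seek."""
    acc = np.zeros_like(next(iter(feature_maps.values())))
    for name, fmap in feature_maps.items():
        acc += weights.get(name, 0.0) * fmap
    return acc / max(acc.max(), 1e-9)              # normalise to [0, 1]

# Bias attention towards red, roughly round objects (shape map is a stub here).
img = np.random.rand(64, 64, 3)
maps = {
    "red": colour_feature_map(img, (1.0, 0.0, 0.0)),
    "round": np.random.rand(64, 64),               # placeholder for a shape map
}
attention = top_down_attention(maps, {"red": 0.7, "round": 0.3})
print(attention.shape, float(attention.max()))
```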

    BIMP: A real-time biological model of multi-scale keypoint detection in V1

    We present an improved, biologically inspired and multiscale keypoint operator. Models of single- and double-stopped hypercomplex cells in area V1 of the mammalian visual cortex are used to detect stable points of high complexity at multiple scales. Keypoints represent line and edge crossings, junctions and terminations at fine scales, and blobs at coarse scales. They are detected by applying first and second derivatives to responses of complex cells in combination with two inhibition schemes to suppress responses along lines and edges. A number of optimisations make our new algorithm much faster than previous biologically inspired models, achieving real-time performance on modern GPUs and competitive speeds on CPUs. In this paper we show that the keypoints exhibit state-of-the-art repeatability in standardised benchmarks, often yielding best-in-class performance. This makes them interesting both in biological models and as a useful detector in practice. We also show that keypoints can be used as a data selection step, significantly reducing the complexity in state-of-the-art object categorisation.
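
    The snippet below is only a crude stand-in for the line/edge inhibition idea (a Harris-style structure-tensor test, not the V1 cell model used by BIMP): points whose local structure varies in two directions, such as junctions and crossings, survive, while responses along plain edges are suppressed.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def complexity_keypoints(img, k=0.05, win=3, thresh=1e-4):
    """Keep points whose structure varies in two directions (junctions, crossings,
    blobs) and suppress points that vary in only one (plain lines and edges)."""
    gy, gx = np.gradient(img.astype(float))
    # Structure tensor entries, averaged over a small window.
    sxx = uniform_filter(gx * gx, win)
    syy = uniform_filter(gy * gy, win)
    sxy = uniform_filter(gx * gy, win)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    response = det - k * trace ** 2
    return response > thresh

# The corners of a bright square should respond; its straight edges should not.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
print(np.argwhere(complexity_keypoints(img))[:4])
```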

    A parametric spectral model for texture-based salience

    We present a novel saliency mechanism based on texture. Local texture at each pixel is characterised by the 2D spectrum obtained from oriented Gabor filters. We then apply a parametric model and describe the texture at each pixel by a combination of two 1D Gaussian approximations. This results in a simple model which consists of only four parameters. These four parameters are then used as feature channels and standard Difference-of-Gaussian blob detection is applied in order to detect salient areas in the image, similar to the Itti and Koch model. Finally, a diffusion process is used to sharpen the resulting regions. Evaluation on a large saliency dataset shows a significant improvement of our method over the baseline Itti and Koch model.
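
    A minimal sketch of the four-parameter description for a single pixel, assuming the oriented Gabor energies are already available and ignoring the circularity of orientation (an illustration of the parameterisation, not the paper's code):

```python
import numpy as np

def spectral_parameters(gabor_energy, orientations, frequencies):
    """Summarise one pixel's 2-D Gabor spectrum with two 1-D Gaussian fits:
    (mean, std) over orientation and (mean, std) over frequency -> 4 parameters.

    gabor_energy: array of shape (n_orientations, n_frequencies) for one pixel.
    """
    e = np.asarray(gabor_energy, dtype=float)
    w = e / max(e.sum(), 1e-9)                    # normalise to a distribution
    p_orient = w.sum(axis=1)                      # marginal over orientations
    p_freq = w.sum(axis=0)                        # marginal over frequencies
    mu_o = (p_orient * orientations).sum()
    sd_o = np.sqrt((p_orient * (orientations - mu_o) ** 2).sum())
    mu_f = (p_freq * frequencies).sum()
    sd_f = np.sqrt((p_freq * (frequencies - mu_f) ** 2).sum())
    return np.array([mu_o, sd_o, mu_f, sd_f])

# A pixel whose texture is strongly oriented near 45 degrees at mid frequency.
orients = np.array([0.0, 45.0, 90.0, 135.0])
freqs = np.array([0.1, 0.2, 0.4])
energy = np.zeros((4, 3)); energy[1, 1] = 1.0
print(spectral_parameters(energy, orients, freqs))  # -> [45.  0.  0.2 0. ]
```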

    Re-identification of individuals from images using spot constellations: a case study in Arctic charr (Salvelinus alpinus)

    The long-term monitoring of Arctic charr in lava caves is funded by the Icelandic Research Fund, RANNÍS (research grant nos. 120227 and 162893). E.A.M. was supported by the Icelandic Research Fund, RANNÍS (grant no. 162893) and a NERC research grant awarded to M.B.M. (grant no. NE/R011109/1). M.B.M. was supported by a University Research Fellowship from the Royal Society (London). C.A.L. and B.K.K. were supported by Hólar University, Iceland. The Titan Xp GPU used for this research was donated to K.T. by the NVIDIA Corporation.

    The ability to re-identify individuals is fundamental to the individual-based studies that are required to estimate many important ecological and evolutionary parameters in wild populations. Traditional methods of marking individuals and tracking them through time can be invasive and imperfect, which can affect these estimates and create uncertainties for population management. Here we present a photographic re-identification method that uses spot constellations in images to match specimens through time. Photographs of Arctic charr (Salvelinus alpinus) were used as a case study. Classical computer vision techniques were compared with new deep-learning techniques for mask and spot extraction. We found that a U-Net approach trained on a small set of human-annotated photographs performed substantially better than a baseline feature-engineering approach. For matching the spot constellations, two algorithms were adapted, and, depending on whether a fully or semi-automated set-up is preferred, we show how either one or a combination of these algorithms can be implemented. Within our case study, our pipeline both successfully identified unmarked individuals from photographs alone and re-identified individuals that had lost tags, resulting in an approximately 4[…]. Our multi-step pipeline involves little human supervision and could be applied to many organisms.
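
    The paper adapts two existing constellation-matching algorithms; the sketch below is neither of them, but conveys the general flavour of scoring a match between two spot sets: centre both constellations, compute an optimal one-to-one assignment, and count the spots that align within a distance threshold.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def constellation_score(spots_a, spots_b, max_dist=5.0):
    """Fraction of spots matched between two constellations after centring,
    using an optimal one-to-one assignment; higher suggests the same individual."""
    a = np.asarray(spots_a, float) - np.mean(spots_a, axis=0)
    b = np.asarray(spots_b, float) - np.mean(spots_b, axis=0)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    rows, cols = linear_sum_assignment(cost)
    matched = int((cost[rows, cols] < max_dist).sum())
    return matched / max(len(a), len(b))

# The same constellation observed again with a pure translation scores 1.0.
spots1 = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 30.0], [12.0, 28.0]])
spots2 = spots1 + 2.0
print(constellation_score(spots1, spots2))
```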

    Biologically inspired vision for human-robot interaction

    Human-robot interaction is an interdisciplinary research area that is becoming more and more relevant as robots start to enter our homes, workplaces, schools, etc. In order to navigate safely among us, robots must be able to understand human behavior, to communicate, and to interpret instructions from humans, either by recognizing their speech or by understanding their body movements and gestures. We present a biologically inspired vision system for human-robot interaction which integrates several components: visual saliency, stereo vision, face and hand detection and gesture recognition. Visual saliency is computed using color, motion and disparity. Both the stereo vision and gesture recognition components are based on keypoints coded by means of cortical V1 simple, complex and end-stopped cells. Hand and face detection is achieved by using a linear SVM classifier. The system was tested on a child-sized robot.
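
    A minimal, hypothetical sketch of the saliency-fusion step (channel weights and normalisation are assumptions, not the paper's choices): each conspicuity map is rescaled to [0, 1] and the channels are averaged with task-dependent weights.

```python
import numpy as np

def combine_saliency(colour, motion, disparity, weights=(1.0, 1.0, 1.0)):
    """Fuse per-pixel conspicuity maps into one saliency map: rescale each channel
    to [0, 1], then take a weighted average."""
    maps = [np.asarray(m, dtype=float) for m in (colour, motion, disparity)]
    maps = [(m - m.min()) / max(m.max() - m.min(), 1e-9) for m in maps]
    return sum(w * m for w, m in zip(weights, maps)) / sum(weights)

# Toy conspicuity maps: a region that is both moving and close should dominate.
h, w = 48, 64
colour = np.random.rand(h, w)
motion = np.zeros((h, w)); motion[20:30, 30:40] = 1.0
disparity = np.zeros((h, w)); disparity[20:30, 30:40] = 0.8
saliency = combine_saliency(colour, motion, disparity)
print(saliency.shape, np.unravel_index(saliency.argmax(), saliency.shape))
```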