Multi-view Traffic Intersection Dataset (MTID)
The Multi-view Traffic Intersection Dataset (MTID) is a traffic surveillance dataset containing footage of the same intersection recorded from multiple points of view over the same time span. Traffic in all views has been carefully annotated to pixel-level accuracy.
Spatially Variant Super-Resolution (SVSR) benchmarking dataset
The Spatially Variant Super-Resolution (SVSR) benchmarking dataset contains 1119 low-resolution images degraded by complex noise of varying intensity and type, together with their corresponding noise-free ×2 and ×4 high-resolution counterparts, for evaluating the robustness of real-world super-resolution methods. The dataset is also suitable for evaluating denoisers.
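A minimal sketch of how the ×2/×4 pairs could be used to score a super-resolution method with PSNR; the variable names and surrounding file handling are assumptions for illustration, not part of the dataset documentation.

    import numpy as np

    def psnr(reference, estimate, max_value=255.0):
        """Peak signal-to-noise ratio between two equally sized images."""
        mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
        if mse == 0:
            return float("inf")  # identical images
        return 10.0 * np.log10((max_value ** 2) / mse)

    # Hypothetical usage: hr is a noise-free high-resolution ground truth,
    # sr is the output of the super-resolution method under test.
    # score = psnr(hr, sr)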
LISA Traffic Sign Dataset
The world's largest dataset of images of US traffic signs.
AAU RainSnow Traffic Surveillance Dataset
Rain, Snow, and Bad Weather in Traffic Surveillance. Computer vision-based image analysis lays the foundation for automatic traffic surveillance. This works well in daylight, when the road users are clearly visible to the camera, but often struggles when the visibility of the scene is impaired by insufficient lighting or bad weather conditions such as rain, snow, haze, and fog. For this dataset, we have focused on collecting traffic surveillance video in rainfall and snowfall, capturing 22 five-minute videos from seven different traffic intersections. The illumination of the scenes varies from broad daylight to twilight and night. The scenes feature glare from the headlights of cars, reflections from puddles, and blur from raindrops on the camera lens. We have collected the data using a conventional RGB colour camera and a thermal infrared camera. If combined, these modalities should enable robust detection and classification of road users even under challenging weather conditions. 100 frames have been selected randomly from each five-minute sequence, and every road user in these frames is annotated at a per-pixel, instance level with a corresponding category label. In total, 2,200 frames are annotated, containing 13,297 objects.
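A minimal sketch of the early-fusion idea hinted at above: stack an RGB frame and a thermal frame into one four-channel input for a detector. It assumes the two streams are already registered to the same image plane; the dataset's own registration tooling is not reproduced here.

    import cv2
    import numpy as np

    def fuse_rgb_thermal(rgb_path, thermal_path):
        """Return an H x W x 4 array with BGR channels plus one thermal channel."""
        rgb = cv2.imread(rgb_path, cv2.IMREAD_COLOR)              # H x W x 3
        thermal = cv2.imread(thermal_path, cv2.IMREAD_GRAYSCALE)  # H x W
        if rgb.shape[:2] != thermal.shape[:2]:
            thermal = cv2.resize(thermal, (rgb.shape[1], rgb.shape[0]))
        return np.dstack([rgb, thermal])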
Benchmark movement data set for trust assessment in human robot collaboration
In the Drapebot project, a worker collaborates with a large industrial manipulator on two tasks: collaborative transport of carbon fibre patches and collaborative draping. To enable data-driven trust assessment, the worker is equipped with a motion tracking suit, and the body movement data is labeled with trust scores from a standard trust questionnaire.
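A minimal sketch of the data-driven trust assessment idea: summarise the motion-tracking stream in fixed-length windows and fit a regressor from those features to the questionnaire trust scores. The windowing scheme, feature choice, and model are assumptions for illustration only.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def window_features(motion, window=100):
        """Summarise a (frames x channels) motion array as per-window mean and std."""
        feats = []
        for start in range(0, len(motion) - window + 1, window):
            chunk = motion[start:start + window]
            feats.append(np.concatenate([chunk.mean(axis=0), chunk.std(axis=0)]))
        return np.asarray(feats)

    # Hypothetical usage: X holds windowed features from the motion suit,
    # y holds the corresponding trust scores.
    # model = RandomForestRegressor().fit(X, y)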
Tell me what you see: An exploratory investigation of visual mental imagery evoked by music
The link between musical structure and evoked visual mental imagery (VMI), that is, seeing in the absence of a corresponding sensory stimulus, has yet to be thoroughly investigated. We explored this link by manipulating the characteristics of four pieces of music for synthesizer, guitars, and percussion (songs). Two original songs were selected on the basis of a pilot study, and two were new, specially composed to combine the musical and acoustical characteristics of the originals. A total of 135 participants were randomly assigned to one of four groups, each of which listened to one song; 73% of participants reported experiencing VMI. There were similarities between participants’ descriptions of the mental imagery evoked by each song and clear differences between songs. A combination of coding and content analysis produced 10 categories: Nature, Places and settings, Objects, Time, Movements and events, Color(s), Humans, Affects, Literal sound, and Film. Regardless of whether or not they had reported experiencing VMI, participants then carried out a card-sorting task in which they selected the terms they thought best described a scene or setting appropriate to the music they had heard, and rated emotional dimensions. The results confirmed those of the content analysis. Taken together, participants’ ratings, descriptions of VMI, and selection of terms in the card-sorting task confirmed that new songs combining the characteristics of original songs evoke the elements of VMI associated with the latter. The findings are important for understanding the musical and acoustical characteristics that may influence our experiences of music, including VMI.
BrackishMOT
BrackishMOT is the first underwater multi-object tracking (MOT) dataset captured in turbid waters. It contains 98 sequences featuring fish, crabs, jellyfish, and more.
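A minimal sketch of loading tracking ground truth in the MOTChallenge text format (frame, id, x, y, w, h, ...); whether BrackishMOT ships its annotations in exactly this layout is an assumption here, so check the dataset documentation before relying on it.

    import csv
    from collections import defaultdict

    def load_tracks(gt_path):
        """Group ground-truth boxes by track id: id -> list of (frame, (x, y, w, h))."""
        tracks = defaultdict(list)
        with open(gt_path, newline="") as f:
            for row in csv.reader(f):
                frame, track_id = int(row[0]), int(row[1])
                box = tuple(float(v) for v in row[2:6])
                tracks[track_id].append((frame, box))
        return tracks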
AAU VAP Trimodal People Segmentation Dataset
Context
How do you design a computer vision algorithm that is able to detect and segment people when they are captured by a visible light camera, a thermal infrared camera, and a depth sensor? And how do you fuse the three inherently different data streams such that you can reliably transfer features from one modality to another? Feel free to download our dataset and try it out yourselves!

Content
The dataset features a total of 5724 annotated frames divided into three indoor scenes. Activity in scenes 1 and 3 uses the full depth range of the Kinect for XBOX 360 sensor, whereas activity in scene 2 is constrained to a depth range of plus/minus 0.250 m in order to suppress the parallax between the two physical sensors. Scenes 1 and 2 are situated in a closed meeting room with little natural light to disturb the depth sensing, whereas scene 3 is situated in an area with wide windows and a substantial amount of sunlight. In each scene, a total of three persons are interacting, reading, walking, sitting, etc. Every person is annotated with a unique in-scene ID at the pixel level in the RGB modality. For the thermal and depth modalities, annotations are transferred from the RGB images using a registration algorithm found in registrator.cpp. We have used our AAU VAP Multimodal Pixel Annotator to create the ground-truth, pixel-based masks for all three modalities.
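A minimal sketch of the annotation-transfer step in spirit: warp an RGB-space instance mask into another modality's image plane with a planar homography. The actual registration in registrator.cpp is not reproduced here, and the homography H is assumed to be given.

    import cv2
    import numpy as np

    def transfer_mask(rgb_mask, H, target_shape):
        """Warp an instance mask from the RGB view into the target modality."""
        return cv2.warpPerspective(
            rgb_mask.astype(np.uint8),
            H,
            (target_shape[1], target_shape[0]),  # (width, height)
            flags=cv2.INTER_NEAREST,             # keep instance labels crisp
        )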
