Multiband Probabilistic Cataloging: A Joint Fitting Approach to Point Source Detection and Deblending
Probabilistic cataloging (PCAT) outperforms traditional cataloging methods on single-band optical data in crowded fields. We extend our work to multiple bands, achieving greater sensitivity (~0.4 mag) and greater speed (500×) compared to previous single-band results. We demonstrate the effectiveness of multiband PCAT on mock data, in terms of both recovering accurate posteriors in the catalog space and directly deblending sources. When applied to Sloan Digital Sky Survey (SDSS) observations of M2, taking Hubble Space Telescope data as truth, our joint fit on r- and i-band data goes ~0.4 mag deeper than single-band probabilistic cataloging and has a false discovery rate less than 20% for F606W ≤ 20. Compared to DAOPHOT, the two-band SDSS catalog fit goes nearly 1.5 mag deeper using the same data and maintains a lower false discovery rate down to F606W ~ 20.5. Given recent improvements in computational speed, multiband PCAT shows promise in application to large-scale surveys and is a plausible framework for joint analysis of multi-instrument observational data. https://github.com/RichardFeder/multiband_pcat
Feature Representation for Online Signature Verification
Biometric systems have been used in a wide range of applications and have improved the authentication of people. Signature verification is one of the most common biometric methods, with techniques that employ various specifications of a signature. Recently, deep learning has achieved great success in many fields, such as image, sound, and text processing. In this paper, a deep learning method is used for feature extraction and feature selection.
Comment: 10 pages, 10 figures. Submitted to IEEE Transactions on Information Forensics and Security
Efficient Min-cost Flow Tracking with Bounded Memory and Computation
This thesis is a contribution to solving multi-target tracking in an optimal fashion for demanding real-time computer vision applications. We introduce a challenging benchmark, recorded with our autonomous driving platform AnnieWAY. Three main challenges of tracking are addressed: solving the data association (min-cost flow) problem faster than standard solvers, extending this approach to an online setting, and making it real-time capable via a tight approximation of the optimal solution.
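The min-cost-flow formulation of data association mentioned above can be illustrated with a small self-contained sketch: detections in consecutive frames become nodes, association costs become edge weights, and an optimal one-to-one matching falls out of the flow. The node layout, costs, and the successive-shortest-path solver below are illustrative assumptions, not the thesis implementation.

```python
from collections import defaultdict

def min_cost_flow(n, edges, s, t, max_flow):
    """Unit-capacity min-cost flow via successive shortest paths
    (Bellman-Ford on the residual graph)."""
    graph = defaultdict(list)

    def add(u, v, cap, cost):
        # Forward edge and its reverse, each storing the other's index.
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for u, v, cap, cost in edges:
        add(u, v, cap, cost)

    flow, total_cost = 0, 0
    while flow < max_flow:
        dist = {i: float("inf") for i in range(n)}
        dist[s] = 0
        parent = {}
        for _ in range(n - 1):  # Bellman-Ford handles negative residual costs
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        parent[v] = (u, i)
                        updated = True
            if not updated:
                break
        if dist[t] == float("inf"):
            break
        v = t  # augment one unit of flow along the shortest path
        while v != s:
            u, i = parent[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u
        flow += 1
        total_cost += dist[t]
    return flow, total_cost

# Node ids: 0 = source, 1-2 = detections in frame t (at x = 0.0, 5.0),
# 3-4 = detections in frame t+1 (at x = 4.8, 0.3), 5 = sink.
# Edge costs are inter-frame distances scaled by 10.
edges = [
    (0, 1, 1, 0), (0, 2, 1, 0),    # source -> frame-t detections
    (1, 3, 1, 48), (1, 4, 1, 3),   # association costs
    (2, 3, 1, 2), (2, 4, 1, 47),
    (3, 5, 1, 0), (4, 5, 1, 0),    # frame-(t+1) detections -> sink
]
flow, cost = min_cost_flow(6, edges, s=0, t=5, max_flow=2)
```

The optimal assignment pairs each detection with its nearest neighbor in the next frame (total scaled cost 3 + 2 = 5); real trackers extend the same graph with entry/exit and occlusion edges across many frames.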
COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs
Counterfactual examples have proven to be valuable in the field of natural
language processing (NLP) for both evaluating and improving the robustness of
language models to spurious correlations in datasets. Despite their
demonstrated utility for NLP, multimodal counterfactual examples have been
relatively unexplored due to the difficulty of creating paired image-text data
with minimal counterfactual changes. To address this challenge, we introduce a
scalable framework for automatic generation of counterfactual examples using
text-to-image diffusion models. We use our framework to create
COCO-Counterfactuals, a multimodal counterfactual dataset of paired image and
text captions based on the MS-COCO dataset. We validate the quality of
COCO-Counterfactuals through human evaluations and show that existing
multimodal models are challenged by our counterfactual image-text pairs.
Additionally, we demonstrate the usefulness of COCO-Counterfactuals for
improving out-of-domain generalization of multimodal vision-language models via
training data augmentation.
Comment: Accepted to NeurIPS 2023 Datasets and Benchmarks Track
Detection and height estimation of buildings from SAR and optical images using conditional random fields
[no abstract]
Knowledge and Reasoning for Image Understanding
abstract: Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques that are used to locate ("where"), characterize, and recognize ("what") objects, regions, and their attributes in the image. However, the notion of "understanding" (and the goal of artificially intelligent machines) goes beyond factual recall of the recognized components and includes reasoning and thinking beyond what can be seen (or perceived). Understanding is often evaluated by asking questions of increasing difficulty. Thus, the expected functionalities of an intelligent Image Understanding system can be expressed in terms of the functionalities required to answer questions about an image. Answering questions about images requires primarily three components: image understanding, question (natural language) understanding, and reasoning based on knowledge. Any question that asks beyond what can be directly seen requires modeling of commonsense (or background/ontological/factual) knowledge and reasoning.
Knowledge and reasoning have seen scarce use in image understanding applications. In this thesis, we demonstrate the utility of incorporating background knowledge and using explicit reasoning in image understanding applications. We first present a comprehensive survey of previous work that utilized background knowledge and reasoning in understanding images. This survey outlines the limited use of commonsense knowledge in high-level applications. We then present a set of vision- and reasoning-based methods to solve several applications and show that these approaches benefit in terms of accuracy and interpretability from the explicit use of knowledge and reasoning. We propose novel knowledge representations of images, knowledge acquisition methods, and a new implementation of an efficient probabilistic logical reasoning engine that can utilize publicly available commonsense knowledge to solve applications such as visual question answering and image puzzles. Additionally, we identify the need for new datasets that explicitly require external commonsense knowledge to solve. We propose the new task of Image Riddles, which requires a combination of vision and reasoning based on ontological knowledge, and we collect a sufficiently large dataset to serve as an ideal testbed for vision and reasoning research. Lastly, we propose end-to-end deep architectures that can combine vision, knowledge, and reasoning modules together and achieve large performance boosts over state-of-the-art methods.
Doctoral Dissertation, Computer Science, 201
Multi-Surface Simplex Spine Segmentation for Spine Surgery Simulation and Planning
This research proposes to develop a knowledge-based multi-surface simplex deformable model for segmentation of healthy as well as pathological lumbar spine data. It aims to provide a more accurate and robust segmentation scheme for identification of intervertebral disc pathologies to assist with spine surgery planning. A robust technique that combines multi-surface and shape statistics-aware variants of the deformable simplex model is presented. Statistical shape variation within the dataset is captured by application of principal component analysis and incorporated during the segmentation process to refine results. In cases where shape statistics hinder detection of the pathological region, user assistance is allowed to disable the prior shape influence during deformation. Results have been validated against user-assisted expert segmentation.
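The statistical-shape step described above (capture shape variation with PCA, then constrain deformations to plausible shapes) can be sketched as follows. The toy landmark data, the number of retained modes, and the 3-sigma clamp are illustrative assumptions, not details from the study.

```python
import numpy as np

# Toy training set: 5 "shapes", each 4 landmarks flattened to 8 coords.
rng = np.random.default_rng(0)
base = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
shapes = base + 0.05 * rng.standard_normal((5, 8))

mean_shape = shapes.mean(axis=0)
X = shapes - mean_shape
# Principal modes of shape variation via SVD of the centred data.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigenvalues = S**2 / (len(shapes) - 1)

k = 2                 # keep the two dominant modes
P = Vt[:k].T          # (8, k) basis of allowed deformations

def constrain(shape, n_sigma=3.0):
    """Project a candidate shape into the PCA subspace and clamp each
    mode coefficient to +/- n_sigma standard deviations, so the
    deformable model stays within the learned shape distribution."""
    b = P.T @ (shape - mean_shape)
    limit = n_sigma * np.sqrt(eigenvalues[:k])
    b = np.clip(b, -limit, limit)
    return mean_shape + P @ b
```

In a segmentation loop, `constrain` would be applied after each deformation step; disabling the prior (as the abstract allows for pathological regions) amounts to skipping this projection.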
An investigation into the geomorphology of the Hebron Fault, Namibia, using a satellite-derived, high-resolution digital elevation model (DEM)
The Hebron fault scarp in southern Namibia is 45 km in length, with an average height of 5.5 m and a maximum height of 8.9 m. Namibia is a Stable Continental Region (SCR), a slowly deforming area within a continental plate. The country also has little recorded seismicity, the largest earthquake in the International Seismological Centre (ISC) catalogue being MW 5.4. If the Hebron fault scarp was formed in a single event, this would represent an MW 7.3 earthquake. SCRs do occasionally experience large earthquakes; however, the recurrence intervals between these events are much longer than in rapidly deforming areas. Consequently, studying palaeo-earthquakes allows the record of seismicity to be extended and the characteristics of SCR events to be better understood. These studies may help refine the Mmax estimates required for seismic hazard assessment. Previous work on Hebron has been limited to field descriptions and theodolite-surveyed scarp heights. Furthermore, there have been several interpretations of the fault mechanism and the number of rupture events. This study produces a high-resolution Digital Elevation Model (DEM) via stereophotogrammetry using pan-sharpened WorldView-3 satellite imagery (0.31 m resolution). The DEM was used for several geomorphological analyses, including measuring the scarp height at 160 locations along its length, measuring river channel displacements, and identifying knickpoints along river profiles. Results indicate that the scarp formed from a normal, dip-slip fault that ruptured in a single event. This scenario would imply a high slip-to-length ratio. A comparison with other SCR fault scarps in the literature shows that Hebron's slip-to-length ratio falls within the range found on other SCR faults. This study also discusses the implications of the results for seismic hazard assessment in the region. Due to the poor seismic record, probabilistic seismic hazard analysis (PSHA) will calculate a low seismic risk for Namibia.
As large earthquakes can occur in SCRs, deterministic seismic hazard analysis (DSHA) can be used to inform policy makers of worst-case scenarios.
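As a sanity check on the single-event MW 7.3 figure quoted above, the standard Hanks-Kanamori relation can be applied to the scarp dimensions: seismic moment M0 = mu * A * D, then MW = (2/3) log10(M0) - 6.07 for M0 in N*m. The down-dip width and crustal rigidity below are assumed round values, not taken from the study.

```python
import math

def moment_magnitude(length_m, width_m, slip_m, rigidity_pa=3.0e10):
    """Hanks-Kanamori moment magnitude from seismic moment
    M0 = rigidity * fault area * average slip (SI units, M0 in N*m)."""
    m0 = rigidity_pa * length_m * width_m * slip_m
    return (2.0 / 3.0) * math.log10(m0) - 6.07

# Hebron scarp: 45 km length and 5.5 m average slip are from the abstract;
# the 15 km down-dip width and 3e10 Pa rigidity are assumptions.
mw = moment_magnitude(length_m=45e3, width_m=15e3, slip_m=5.5)
```

With these assumptions the relation gives MW of about 7.3, consistent with the single-event scenario in the abstract; a different assumed width would shift the estimate by a few tenths of a magnitude unit.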
Using child-friendly movie stimuli to study the development of face, place, and object regions from age 3 to 12 years
Scanning young children while they watch short, engaging, commercially-produced movies has emerged as a promising approach for increasing data retention and quality. Movie stimuli also evoke a richer variety of cognitive processes than traditional experiments, allowing the study of multiple aspects of brain development simultaneously. However, because these stimuli are uncontrolled, it is unclear how effectively distinct profiles of brain activity can be distinguished from the resulting data. Here we develop an approach for identifying multiple distinct subject-specific Regions of Interest (ssROIs) using fMRI data collected during movie-viewing. We focused on the test case of higher-level visual regions selective for faces, scenes, and objects. Adults (N = 13) were scanned while viewing a 5.6-min child-friendly movie, as well as a traditional localizer experiment with blocks of faces, scenes, and objects. We found that just 2.7 min of movie data could identify subject-specific face, scene, and object regions. While successful, movie-defined ssROIs still showed weaker domain selectivity than traditional ssROIs. Having validated our approach in adults, we then used the same methods on movie data collected from 3- to 12-year-old children (N = 122). Movie response timecourses in 3-year-old children's face, scene, and object regions were already significantly and specifically predicted by timecourses from the corresponding regions in adults. We also found evidence of continued developmental change, particularly in the face-selective posterior superior temporal sulcus. Taken together, our results reveal both early maturity and functional change in face, scene, and object regions, and more broadly highlight the promise of short, child-friendly movies for developmental cognitive neuroscience.
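The "significantly and specifically predicted" analysis described above amounts to a region-by-region correlation check: a child ROI's timecourse should correlate more strongly with the matching adult ROI than with mismatched ones. A minimal sketch with synthetic timecourses (the shared-signal construction and noise level are illustrative assumptions, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500  # timepoints; regions are face, scene, object

# Toy adult group-average timecourses, one row per region.
adult = rng.standard_normal((3, T))
# Child timecourses: matching adult signal plus independent noise.
child = 0.6 * adult + rng.standard_normal((3, T))

def corr(a, b):
    """Pearson correlation between two 1-D timecourses."""
    return np.corrcoef(a, b)[0, 1]

# Region-by-region correlation matrix: specificity means each diagonal
# entry (matching regions) exceeds the off-diagonal (mismatched) ones.
C = np.array([[corr(child[i], adult[j]) for j in range(3)]
              for i in range(3)])
specific = all(C[i, i] > C[i, j]
               for i in range(3) for j in range(3) if i != j)
```

In the actual study the prediction would be computed over averaged subject groups with appropriate statistics; this sketch only shows the shape of the specificity comparison.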