    Automated Search for Resource-Efficient Branched Multi-Task Networks

    Full text link
    The multi-modal nature of many vision problems calls for neural network architectures that can perform multiple tasks concurrently. Typically, such architectures have been handcrafted in the literature. However, given the size and complexity of the problem, this manual architecture exploration likely exceeds human design abilities. In this paper, we propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching (tree-like) structures in the encoding stage of a multi-task neural network. To allow flexibility within resource-constrained environments, we introduce a proxyless, resource-aware loss that dynamically controls the model size. Evaluations across a variety of dense prediction tasks show that our approach consistently finds high-performing branching structures within limited resource budgets.
    Comment: British Machine Vision Conference (BMVC) 2020
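
    As a concrete illustration of the resource-aware objective described in this abstract, below is a minimal, hypothetical PyTorch sketch: it adds to the task losses a differentiable penalty on the expected model size implied by soft branch-selection probabilities. The function name, the linear cost model, and the budget normalization are illustrative assumptions, not the authors' exact formulation.

        import torch.nn.functional as F

        def resource_aware_loss(task_losses, branch_logits, layer_costs, budget, beta=0.1):
            """Combine per-task losses with a differentiable model-size penalty.

            task_losses:   list of scalar task losses
            branch_logits: (num_layers, num_choices) logits over branching choices
            layer_costs:   (num_layers, num_choices) parameter cost of each choice
            budget:        target resource budget, in the same units as layer_costs
            """
            probs = F.softmax(branch_logits, dim=-1)        # soft branch selection
            expected_cost = (probs * layer_costs).sum()     # expected model size
            penalty = F.relu(expected_cost / budget - 1.0)  # active only over budget
            return sum(task_losses) + beta * penalty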

    Breathing New Life into 3D Assets with Generative Repainting

    Full text link
    Diffusion-based text-to-image models have attracted immense attention from the vision community, artists, and content creators. Broad adoption of these models is due to significant improvements in the quality of generations and efficient conditioning on various modalities, not just text. However, lifting the rich generative priors of these 2D models into 3D is challenging. Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields. We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools and demonstrate their ability to work together in a non-learned fashion. Such modularity has the intrinsic advantage of easing partial upgrades, which has become an important property in such a fast-paced domain. Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools, and outputs a painted input geometry in several formats. We conduct a large-scale study on a wide range of objects and categories from the ShapeNetSem dataset and demonstrate the advantages of our approach, both qualitatively and quantitatively. Project page: https://www.obukhov.ai/repainting_3d_asset
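
    A conceptual sketch may help make the orchestration above concrete: the pipeline alternates between repainting rendered views with a 2D diffusion model and fitting a radiance field to restore 3D consistency. All three callables below are hypothetical stand-ins for the off-the-shelf tools, not the project's actual interfaces.

        def repaint_asset(render_views, refine_view_2d, fit_field, num_rounds=3):
            """render_views(field) -> list of images rendered from the asset
            (field may be None on the first round); refine_view_2d(image) -> image
            repainted by a pretrained 2D diffusion model; fit_field(images) ->
            radiance field fitted to the refined views for 3D consistency."""
            field = None
            for _ in range(num_rounds):
                views = render_views(field)                    # render current state
                refined = [refine_view_2d(v) for v in views]   # 2D generative refinement
                field = fit_field(refined)                     # 3D consistency enforcement
            return field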

    Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation

    Full text link
    We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting. Semantic segmentation and monocular depth estimation are shown to be complementary tasks; in a multi-task learning setting, a proper encoding of their relationships can further improve performance on both tasks. Motivated by this observation, we propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions. To capture the cross-task relationships, we propose a neural network architecture that contains task-specific and cross-task refinement heads. Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain. We experimentally observe improvements in both tasks' performance because the complementary information present in these tasks is better captured. Specifically, we show that: (1) our approach improves performance on all tasks when they are complementary and mutually dependent; (2) CTRL helps to improve the performance of both semantic segmentation and depth estimation in the challenging UDA setting; (3) the proposed ISL training scheme further improves the semantic segmentation performance. The implementation is available at https://github.com/susaha/ctrl-uda.
    Comment: Accepted at CVPR 2021; updated results according to the released source code
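
    The pseudo-labelling step behind ISL can be sketched as follows: target-domain segmentation predictions are converted into hard labels, and low-confidence pixels are masked out so they contribute no supervision. This is a minimal sketch; the confidence threshold and ignore index are assumptions, not values from the paper.

        import torch

        def make_pseudo_labels(seg_logits, threshold=0.9, ignore_index=255):
            """seg_logits: (B, C, H, W) target-domain segmentation predictions.
            Returns (B, H, W) hard labels with unreliable pixels marked as ignored."""
            probs = torch.softmax(seg_logits, dim=1)
            conf, labels = probs.max(dim=1)            # per-pixel confidence and class
            labels[conf < threshold] = ignore_index    # drop low-confidence pixels
            return labels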

    Designing Efficient Deep Neural Networks: Topological Optimization, Quantization and Multi-Task Learning

    No full text
    The design of more complex and powerful deep neural networks has consistently advanced the state-of-the-art in a wide range of tasks. In the pursuit of increased performance, computational efficiency is often severely compromised, as seen in the significant growth in the number of parameters, required floating-point operations, and latency. While these advances have increased interest in deploying deep neural networks in downstream applications such as robotics and augmented reality, such applications require computationally efficient alternatives. This thesis focuses on the design of efficient deep neural networks: improving performance under computational constraints, or decreasing complexity with minor performance degradation. Firstly, we present a novel convolutional operation reparameterization and its application to multi-task learning. By reparameterizing the convolutional operations, we achieve performance comparable to single-task models at a fraction of the total number of parameters. Secondly, we conduct an extensive study evaluating the efficacy of self-supervised tasks as auxiliary tasks in a multi-task learning framework. We find that jointly training a target task with self-supervised tasks can improve performance and robustness, commonly outperforms training with labeled auxiliary tasks, and requires no modifications to the architecture used at deployment. Thirdly, we propose a novel transformer layer for efficient single-object visual tracking. We demonstrate that the performance of real-time single-object trackers can be significantly improved without compromising latency, while consistently outperforming alternative transformer layers. Finally, we investigate the efficacy of adapting interest point detection and description networks for use on computationally limited platforms. We find that mixed-precision quantization of network components, coupled with a binary descriptor normalization layer, yields minor performance degradation while reducing the size of sparse 3D maps and improving matching and inference speed by at least an order of magnitude. To conclude, this thesis focuses on the design of deep neural networks under computational limitations. With increasing interest in and demand for efficient deep networks, we envision that the presented work will pave the way towards even more efficient methods, bridging the gap with better-performing alternatives.
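
    To make the reparameterization idea tangible, here is a hedged PyTorch sketch: a convolutional filter bank shared across tasks, followed by a lightweight per-task 1x1 adapter, so most parameters are amortized over all tasks. The decomposition shown is an illustrative assumption, not the thesis' exact operator.

        import torch.nn as nn

        class ReparamConv(nn.Module):
            """Shared convolution plus a small per-task 1x1 adapter."""

            def __init__(self, in_ch, out_ch, num_tasks, kernel_size=3):
                super().__init__()
                # Parameters shared across all tasks.
                self.shared = nn.Conv2d(in_ch, out_ch, kernel_size,
                                        padding=kernel_size // 2)
                # One lightweight task-specific adapter per task.
                self.adapters = nn.ModuleList(
                    [nn.Conv2d(out_ch, out_ch, 1) for _ in range(num_tasks)])

            def forward(self, x, task_id):
                return self.adapters[task_id](self.shared(x))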

    Scleral Buckling versus Vitrectomy for Retinal Detachment Repair: Comparison of Visual Fields and Nerve Fiber Layer Thickness

    No full text
    PURPOSE: To retrospectively compare visual field loss and retinal nerve fiber layer (RNFL) defects in cases of Rhegmatogenous Retinal Detachment (RRD) treated by Scleral Buckle (SB) or Pars Plana Vitrectomy (PPV). DESIGN: Retrospective, comparative, non-randomized, interventional case series. METHODS: A review of 50 eyes with primary RRD and uncomplicated surgical treatment, 25 of which were treated by vitrectomy with C3F8 injection and the other 25 by external buckle. The Pars Plana Vitrectomy (PPV) group and the Scleral Buckle (SB) group were compared retrospectively (at least nine months after surgery) with respect to functional and structural changes. Visual Fields (VF) were studied using static automated perimetry; Total Deviation (TD) and Pattern Deviation (PD) values for the preoperatively attached and detached areas were compared separately. Optic nerve head morphology was studied using the Heidelberg Retina Tomograph (HRT), and the RNFL using spectral-domain Optical Coherence Tomography (SD-OCT). RESULTS: In both groups, the preoperatively detached areas had more affected TD and PD values than the preoperatively attached areas (p=0.001 for both). The preoperatively attached areas of the SB group had better mean TD and mean PD values than the preoperatively attached areas of the PPV group (p=0.007 and p=0.009, respectively). The RNFL and HRT values showed no statistically significant difference between the two groups. CONCLUSIONS: In our study of RRD cases treated by SB or PPV, the preoperatively detached areas had more affected VF indices than the preoperatively attached areas in both groups. VF values for the preoperatively attached areas were better in the SB cases than in the PPV cases. The RNFL and HRT values showed no statistically significant difference between the two groups.
    Morphological changes, represented by RNFL and HRT values, showed no statistically significant difference between RRD patients treated by Scleral Buckle (SB) and those treated by Pars Plana Vitrectomy (PPV) with C3F8 injection. This was not the case for functional changes, as studied using visual fields. It seems that a detached retina, despite successful reattachment, suffers permanent functional damage as a result of the detachment. Our results suggest that the injection of C3F8 is likely to affect retinal function as well, to a degree substantially smaller than that of the retinal detachment itself, most probably at the level of the RNFL. Further studies are needed to elucidate the exact nature and localization of these findings.

    Efficient Visual Tracking with Exemplar Transformers

    Full text link
    The design of more complex and powerful neural network models has significantly advanced the state-of-the-art in visual object tracking. These advances can be attributed to deeper networks, or the introduction of new building blocks, such as transformers. However, in the pursuit of increased tracking performance, runtime is often hindered. Furthermore, efficient tracking architectures have received surprisingly little attention. In this paper, we introduce the Exemplar Transformer, a transformer module utilizing a single instance-level attention layer for real-time visual object tracking. E.T.Track, our visual tracker that incorporates Exemplar Transformer modules, runs at 47 FPS on a CPU. This is up to 8x faster than other transformer-based models. When compared to lightweight trackers that can operate in real time on standard CPUs, E.T.Track consistently outperforms all other methods on the LaSOT, OTB-100, NFS, TrackingNet, and VOT-ST2020 datasets. Code and models are available at https://github.com/pblatter/ettrack
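
    As a rough illustration of instance-level attention, the sketch below pools the search-region features into a single query that attends over a small set of learned exemplar keys and values, keeping the cost far below dense self-attention. The pooling, dimensions, and learned-exemplar parameterization are assumptions, not the paper's exact module.

        import torch
        import torch.nn as nn

        class ExemplarAttention(nn.Module):
            """Single-query attention over a few learned exemplar tokens."""

            def __init__(self, dim, num_exemplars=4):
                super().__init__()
                self.to_q = nn.Linear(dim, dim)
                # Learned exemplar keys and values, shared across frames.
                self.keys = nn.Parameter(torch.randn(num_exemplars, dim))
                self.values = nn.Parameter(torch.randn(num_exemplars, dim))

            def forward(self, feats):                      # feats: (B, HW, dim)
                q = self.to_q(feats.mean(dim=1))           # one instance-level query
                scale = self.keys.shape[-1] ** 0.5
                attn = torch.softmax(q @ self.keys.t() / scale, dim=-1)
                out = attn @ self.values                   # (B, dim)
                return feats + out.unsqueeze(1)            # broadcast over locations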