
    Nerfstudio: A Modular Framework for Neural Radiance Field Development

    Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. To streamline the development and deployment of NeRF research, we propose Nerfstudio, a modular PyTorch framework. Our framework includes plug-and-play components for implementing NeRF-based methods, making it easy for researchers and practitioners to incorporate NeRF into their projects. Additionally, the modular design enables support for extensive real-time visualization tools, streamlined pipelines for importing captured in-the-wild data, and tools for exporting to video, point cloud, and mesh representations. The modularity of Nerfstudio enables the development of Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality while remaining flexible to future modifications. To promote community-driven development, all associated code and data are made publicly available with open-source licensing at https://nerf.studio.
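At the core of every NeRF-based method that a framework like this composes is the same volume-rendering step: alpha-compositing predicted densities and colors along a camera ray. The sketch below is a minimal NumPy illustration of that standard equation, not Nerfstudio's actual implementation; the function name and the sample values are hypothetical.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite per-sample densities and colors along one ray,
    following the standard NeRF volume-rendering equation."""
    # Opacity of each sample: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability the ray reaches sample i unoccluded
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * trans
    # Weighted sum of sample colors gives the rendered pixel color
    return (weights[:, None] * colors).sum(axis=0)

# Toy example: four samples along a single ray
densities = np.array([0.0, 0.5, 2.0, 0.1])
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
deltas = np.full(4, 0.25)
pixel = composite_ray(densities, colors, deltas)
```

In a modular framework this step is one interchangeable component among many (samplers, fields, renderers), which is what lets methods like Nerfacto mix pieces from different papers.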

    FEW SHOT PHOTOGRAMMETRY: A COMPARISON BETWEEN NERF AND MVS-SFM FOR THE DOCUMENTATION OF CULTURAL HERITAGE

    3D documentation for the Digital Cultural Heritage (DCH) domain is a field that is becoming increasingly interdisciplinary, breaking down boundaries that have long separated experts from different domains. In the past, there have been ambiguous claims of ownership over skills, methodologies, and expertise in the heritage sciences. This study contributes to the dialogue between these disciplines by presenting a novel approach for the 3D documentation of an ancient statue. The method combines TLS acquisition with an MVS pipeline using images from a DJI Mavic 2 drone. Additionally, the study compares the accuracy and final products of the Deep Points (DP) and Neural Radiance Fields (NeRF) methods, using the TLS acquisition as the validation ground truth. First, a TLS acquisition of the statue was performed with a Faro Focus scanner. Next, a multi-view stereo (MVS) pipeline was applied to 2D images captured by the drone from a distance of approximately 1 meter around the statue. Finally, the same image set, reduced by 90%, was used to train and run the NeRF network. The main contribution of this paper is to improve our understanding of this method by comparing the accuracy and final products of the two approaches, DP and NeRF, against the TLS ground truth. Results show that the NeRF approach outperforms DP in accuracy and produces a more realistic final product. This paper has important implications for the field of CH preservation, as it offers a new and effective method for generating 3D models of ancient statues. This technology can help document and preserve important cultural artifacts for future generations, while also providing new insights into the history and culture of different civilizations. Overall, the results of this study demonstrate the potential of combining TLS and NeRF for generating accurate and realistic 3D models of ancient statues.
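Accuracy against a TLS ground truth, as used in this comparison, is typically summarized by nearest-neighbor (cloud-to-cloud) distances between the reconstructed point cloud and the scan. The sketch below shows that computation on synthetic data; it is a generic illustration with hypothetical names, not the authors' pipeline, and uses a brute-force search where a real pipeline would use a k-d tree.

```python
import numpy as np

def cloud_to_cloud_error(reconstruction, ground_truth):
    """For each reconstructed point, the distance to its nearest
    ground-truth point (brute force; real tools use a k-d tree)."""
    # Pairwise distances (n_rec, n_gt) via broadcasting
    diff = reconstruction[:, None, :] - ground_truth[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1)).min(axis=1)
    return {"mean": dists.mean(),
            "rmse": np.sqrt((dists ** 2).mean()),
            "p95": np.percentile(dists, 95)}

# Synthetic check: a noisy copy of the ground-truth cloud
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 1.0, size=(500, 3))
rec = gt + rng.normal(0.0, 0.005, size=gt.shape)
stats = cloud_to_cloud_error(rec, gt)
```

Reporting mean, RMSE, and a high percentile together distinguishes a globally accurate reconstruction from one with localized outliers, which matters when comparing methods like DP and NeRF.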

    On the use of heterogeneous computing in high-energy particle physics at the ATLAS detector

    A dissertation submitted in fulfillment of the requirements for the degree of Master of Physics in the School of Physics, November 1, 2017. The ATLAS detector at the Large Hadron Collider (LHC) at CERN is undergoing upgrades to its instrumentation, as well as to the hardware and software that comprise its Trigger and Data Acquisition (TDAQ) system. The increased energy will yield larger cross sections for interesting physics processes, but will also lead to increased artifacts in on-line reconstruction in the trigger, as well as increased trigger rates, beyond the current system's capabilities. To meet these demands, it is likely that the massive parallelism of general-purpose computing on graphics processing units (GPGPU) will be utilised. This dissertation addresses the problem of integrating GPGPU into the existing TDAQ platform, detailing and analysing GPGPU performance in a high-throughput, on-line environment like ATLAS. Preliminary tests show low to moderate speed-ups on the GPU relative to the CPU, indicating that to achieve a more significant performance increase it may be necessary to alter the current platform beyond pairing suitable GPUs with CPUs in an optimal ratio. Possible solutions are proposed and recommendations for future work are given.
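The "low to moderate speed-up" finding is exactly what Amdahl's law predicts when only part of a trigger chain can be offloaded: the serial remainder caps the overall gain no matter how fast the GPU kernel is. A minimal sketch of that bound (function name and numbers are illustrative, not from the dissertation):

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Overall speedup when only a fraction of the workload is
    offloaded to an accelerator (Amdahl's law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / kernel_speedup)

# Even a 50x GPU kernel yields under 2x overall if only half
# of the trigger workload can be offloaded:
overall = amdahl_speedup(0.5, 50.0)
```

This is why the dissertation concludes that pairing GPUs to CPUs in a better ratio is not enough: raising the offloadable fraction of the platform itself is what moves the bound.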

    Text-to-Video: Image Semantics and NLP

    When aiming to automatically translate an arbitrary text into a visual story, the main challenge is to find a semantically close visual representation whose displayed meaning remains the same as in the given text. Moreover, the appearance of an image itself largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. In recent years, social networking has become highly popular, leading to an enormous and still increasing amount of data available online. Photo-sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis exploits this huge source of user-generated data, which provides initial links between images, words, and other meaningful data. To approach visual semantics, this work presents various methods for analyzing the visual structure and appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies the various meanings of ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and use deep learning to learn aesthetic rankings directly from a broad diversity of user reactions in online social behavior. To gain even deeper insight into the influence of visual appearance on an observer, we explore how simple image processing can actually change emotional perception, and we derive a simple but effective image filter.
To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework capable of generating picture stories in different styles that are not only semantically relevant but also visually coherent.
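Measuring semantic relatedness between textual and visual sources, as this thesis does, commonly reduces to comparing embedding vectors from the two modalities. The sketch below ranks candidate images against a text query by cosine similarity; it is a generic illustration with toy vectors, not the thesis' hierarchical querying algorithm, and the embeddings are assumed to be precomputed.

```python
import numpy as np

def rank_by_similarity(text_vec, image_vecs):
    """Rank candidate images by cosine similarity between their
    embedding vectors and a text embedding (all precomputed)."""
    t = text_vec / np.linalg.norm(text_vec)
    imgs = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = imgs @ t                      # cosine similarity per image
    order = np.argsort(-sims)            # best match first
    return order, sims

# Toy 3-dimensional "embeddings"
text = np.array([1.0, 0.0, 1.0])
images = np.array([[1.0, 0.1, 0.9],     # close match
                   [0.0, 1.0, 0.0],     # unrelated
                   [0.5, 0.5, 0.5]])    # partial match
order, sims = rank_by_similarity(text, images)
```

A hierarchical query would apply this kind of ranking repeatedly, narrowing candidates at each level rather than scoring the whole collection at once.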

    Object Detection in Images Using Convolutional Neural Networks

    Object detection is a subfield of computer vision that is currently heavily based on machine learning. For the past decade, the field of machine learning has been dominated by so-called deep neural networks, which take advantage of improvements in computing power and data availability. A subtype of neural network called a convolutional neural network (CNN) is well suited to image-related tasks. The network is trained to look for different features, such as edges, corners, and colour differences, across the image and to combine these into more complex shapes. For object detection, the system has to both estimate the locations of probable objects and classify them. For this master's thesis, we reviewed the current literature on convolutional object detection and tested the implementability of one of the methods. We found that convolutional object detection is still evolving as a technology, despite already outperforming other object detection methods. Thanks to the free availability of datasets and pretrained networks, it is possible to create a functional implementation of a deep neural network without access to specialist hardware. Pretrained networks can also be used as a starting point for training new networks, decreasing costly training time. For the experimental part, we implemented Fast R-CNN using MATLAB and MatConvNet and tested a general object detector on two different traffic-related datasets. We found that Fast R-CNN is relatively precise and considerably faster than the original convolutional object detection method, R-CNN, and can be implemented on a home computer. More advanced methods, such as Faster R-CNN and SSD, improve further on the speed of Fast R-CNN. We also experimented with a geometry-based scene estimation model, which was reported to improve the precision of a previous-generation object detection method. We found no such improvement with our implementation of Fast R-CNN, although further adjustments are possible. Combining whole-scene modelling with convolutional networks is a potential subject of further study.
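Evaluating a detector like Fast R-CNN on the traffic datasets rests on intersection-over-union (IoU): a predicted box counts as correct when its IoU with a ground-truth box exceeds a threshold (0.5 is the common choice). A minimal, self-contained implementation (the corner-format convention here is an assumption, not MatConvNet's API):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes
    given as (x1, y1, x2, y2) corner coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes are disjoint)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x2 strip: intersection 2, union 6
score = iou((0, 0, 2, 2), (1, 0, 3, 2))
```

The same function also drives non-maximum suppression, which removes duplicate detections of one object by discarding lower-scoring boxes that overlap a kept box too strongly.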

    Scalable data clustering using GPUs

    The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry data, is very time consuming on a single CPU. Fortunately these techniques lend themselves naturally to large scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA's CUDA framework and Tesla architecture, were investigated as a low-cost, high performance solution to a number of clustering algorithms. C-means and Expectation Maximization with Gaussian mixture models were implemented using the CUDA framework. The algorithm implementations use a hybrid of CUDA, OpenMP, and MPI to scale to many GPUs on multiple nodes in a high performance computing environment. This framework is envisioned as part of a larger cloud-based workflow service where biologists can apply multiple algorithms and parameter sweeps to their data sets and quickly receive a thorough set of results that can be further analyzed by experts. Improvements over previous GPU-accelerated implementations range from 1.42x to 21x for C-means and 3.72x to 5.65x for the Gaussian mixture model on non-trivial data sets. Using a single NVIDIA GTX 260, speedups are on average 90x for C-means and 74x for Gaussians with flow cytometry files compared to optimized C code running on a single core of a modern Intel CPU. Using the TeraGrid Lincoln high performance cluster at NCSA, C-means achieves 42% parallel efficiency and a CPU speedup of 4794x with 128 Tesla C1060 GPUs. The Gaussian mixture model achieves 72% parallel efficiency and a CPU speedup of 6286x.
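The fuzzy c-means algorithm parallelizes so well because each iteration is dominated by an independent per-point membership computation, which maps naturally onto one GPU thread per data point. Below is a minimal NumPy sketch of one iteration (membership update, then center update) to show that structure; it is an illustration of the standard algorithm, not the paper's CUDA implementation, and the data and fuzzifier m=2 are hypothetical.

```python
import numpy as np

def cmeans_step(X, centers, m=2.0):
    """One fuzzy c-means iteration. The per-point membership update
    is the embarrassingly parallel part offloaded to the GPU."""
    # Squared distances from every point to every center: shape (n, c)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    d2 = np.maximum(d2, 1e-12)                 # guard against /0
    # Membership u_ij proportional to d_ij^(-2/(m-1)), normalized per point
    inv = d2 ** (-1.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)
    # New centers: membership-weighted means of the data
    w = u ** m
    new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, new_centers

# Two well-separated synthetic clusters, deliberately poor initial centers
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
centers = np.array([[0.5, 0.5], [2.5, 2.5]])
u, centers = cmeans_step(X, centers)
```

The reduction in the center update is the step that forces inter-GPU communication, which is why multi-node scaling (via MPI in this work) shows less than perfect parallel efficiency.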