255 research outputs found

    ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools

    Get PDF
    Real-time tool segmentation from endoscopic videos is an essential part of many computer-assisted robotic surgical systems and of critical importance in robotic surgical data science. We propose two novel deep learning architectures for automatic segmentation of non-rigid surgical instruments. Both methods take advantage of automated deep-learning-based multi-scale feature extraction while trying to maintain an accurate segmentation quality at all resolutions. The two proposed methods encode the multi-scale constraint inside the network architecture. The first proposed architecture enforces it by cascaded aggregation of predictions and the second proposed network does it by means of a holistically-nested architecture where the loss at each scale is taken into account for the optimization process. As the proposed methods are for real-time semantic labeling, both present a reduced number of parameters. We propose the use of parametric rectified linear units for semantic labeling in these small architectures to increase the regularization ability of the design and maintain the segmentation accuracy without overfitting the training sets. We compare the proposed architectures against state-of-the-art fully convolutional networks. We validate our methods using existing benchmark datasets, including ex vivo cases with phantom tissue and different robotic surgical instruments present in the scene. Our results show a statistically significant improved Dice Similarity Coefficient over previous instrument segmentation methods. We analyze our design choices and discuss the key drivers for improving accuracy.Comment: Paper accepted at IROS 201

    Navigation system based in motion tracking sensor for percutaneous renal access

    Get PDF
    Tese de Doutoramento em Engenharia BiomédicaMinimally-invasive kidney interventions are daily performed to diagnose and treat several renal diseases. Percutaneous renal access (PRA) is an essential but challenging stage for most of these procedures, since its outcome is directly linked to the physician’s ability to precisely visualize and reach the anatomical target. Nowadays, PRA is always guided with medical imaging assistance, most frequently using X-ray based imaging (e.g. fluoroscopy). Thus, radiation on the surgical theater represents a major risk to the medical team, where its exclusion from PRA has a direct impact diminishing the dose exposure on both patients and physicians. To solve the referred problems this thesis aims to develop a new hardware/software framework to intuitively and safely guide the surgeon during PRA planning and puncturing. In terms of surgical planning, a set of methodologies were developed to increase the certainty of reaching a specific target inside the kidney. The most relevant abdominal structures for PRA were automatically clustered into different 3D volumes. For that, primitive volumes were merged as a local optimization problem using the minimum description length principle and image statistical properties. A multi-volume Ray Cast method was then used to highlight each segmented volume. Results show that it is possible to detect all abdominal structures surrounding the kidney, with the ability to correctly estimate a virtual trajectory. Concerning the percutaneous puncturing stage, either an electromagnetic or optical solution were developed and tested in multiple in vitro, in vivo and ex vivo trials. The optical tracking solution aids in establishing the desired puncture site and choosing the best virtual puncture trajectory. However, this system required a line of sight to different optical markers placed at the needle base, limiting the accuracy when tracking inside the human body. Results show that the needle tip can deflect from its initial straight line trajectory with an error higher than 3 mm. Moreover, a complex registration procedure and initial setup is needed. On the other hand, a real-time electromagnetic tracking was developed. Hereto, a catheter was inserted trans-urethrally towards the renal target. This catheter has a position and orientation electromagnetic sensor on its tip that function as a real-time target locator. Then, a needle integrating a similar sensor is used. From the data provided by both sensors, one computes a virtual puncture trajectory, which is displayed in a 3D visualization software. In vivo tests showed a median renal and ureteral puncture times of 19 and 51 seconds, respectively (range 14 to 45 and 45 to 67 seconds). Such results represent a puncture time improvement between 75% and 85% when comparing to state of the art methods. 3D sound and vibrotactile feedback were also developed to provide additional information about the needle orientation. By using these kind of feedback, it was verified that the surgeon tends to follow a virtual puncture trajectory with a reduced amount of deviations from the ideal trajectory, being able to anticipate any movement even without looking to a monitor. Best results show that 3D sound sources were correctly identified 79.2 ± 8.1% of times with an average angulation error of 10.4º degrees. Vibration sources were accurately identified 91.1 ± 3.6% of times with an average angulation error of 8.0º degrees. Additionally to the EMT framework, three circular ultrasound transducers were built with a needle working channel. One explored different manufacture fabrication setups in terms of the piezoelectric materials, transducer construction, single vs. multi array configurations, backing and matching material design. The A-scan signals retrieved from each transducer were filtered and processed to automatically detect reflected echoes and to alert the surgeon when undesirable anatomical structures are in between the puncture path. The transducers were mapped in a water tank and tested in a study involving 45 phantoms. Results showed that the beam cross-sectional area oscillates around the ceramics radius and it was possible to automatically detect echo signals in phantoms with length higher than 80 mm. Hereupon, it is expected that the introduction of the proposed system on the PRA procedure, will allow to guide the surgeon through the optimal path towards the precise kidney target, increasing surgeon’s confidence and reducing complications (e.g. organ perforation) during PRA. Moreover, the developed framework has the potential to make the PRA free of radiation for both patient and surgeon and to broad the use of PRA to less specialized surgeons.Intervenções renais minimamente invasivas são realizadas diariamente para o tratamento e diagnóstico de várias doenças renais. O acesso renal percutâneo (ARP) é uma etapa essencial e desafiante na maior parte destes procedimentos. O seu resultado encontra-se diretamente relacionado com a capacidade do cirurgião visualizar e atingir com precisão o alvo anatómico. Hoje em dia, o ARP é sempre guiado com recurso a sistemas imagiológicos, na maior parte das vezes baseados em raios-X (p.e. a fluoroscopia). A radiação destes sistemas nas salas cirúrgicas representa um grande risco para a equipa médica, aonde a sua remoção levará a um impacto direto na diminuição da dose exposta aos pacientes e cirurgiões. De modo a resolver os problemas existentes, esta tese tem como objetivo o desenvolvimento de uma framework de hardware/software que permita, de forma intuitiva e segura, guiar o cirurgião durante o planeamento e punção do ARP. Em termos de planeamento, foi desenvolvido um conjunto de metodologias de modo a aumentar a eficácia com que o alvo anatómico é alcançado. As estruturas abdominais mais relevantes para o procedimento de ARP, foram automaticamente agrupadas em volumes 3D, através de um problema de optimização global com base no princípio de “minimum description length” e propriedades estatísticas da imagem. Por fim, um procedimento de Ray Cast, com múltiplas funções de transferência, foi utilizado para enfatizar as estruturas segmentadas. Os resultados mostram que é possível detetar todas as estruturas abdominais envolventes ao rim, com a capacidade para estimar corretamente uma trajetória virtual. No que diz respeito à fase de punção percutânea, foram testadas duas soluções de deteção de movimento (ótica e eletromagnética) em múltiplos ensaios in vitro, in vivo e ex vivo. A solução baseada em sensores óticos ajudou no cálculo do melhor ponto de punção e na definição da melhor trajetória a seguir. Contudo, este sistema necessita de uma linha de visão com diferentes marcadores óticos acoplados à base da agulha, limitando a precisão com que a agulha é detetada no interior do corpo humano. Os resultados indicam que a agulha pode sofrer deflexões à medida que vai sendo inserida, com erros superiores a 3 mm. Por outro lado, foi desenvolvida e testada uma solução com base em sensores eletromagnéticos. Para tal, um cateter que integra um sensor de posição e orientação na sua ponta, foi colocado por via trans-uretral junto do alvo renal. De seguida, uma agulha, integrando um sensor semelhante, é utilizada para a punção percutânea. A partir da diferença espacial de ambos os sensores, é possível gerar uma trajetória de punção virtual. A mediana do tempo necessário para puncionar o rim e ureter, segundo esta trajetória, foi de 19 e 51 segundos, respetivamente (variações de 14 a 45 e 45 a 67 segundos). Estes resultados representam uma melhoria do tempo de punção entre 75% e 85%, quando comparados com o estado da arte dos métodos atuais. Além do feedback visual, som 3D e feedback vibratório foram explorados de modo a fornecer informações complementares da posição da agulha. Verificou-se que com este tipo de feedback, o cirurgião tende a seguir uma trajetória de punção com desvios mínimos, sendo igualmente capaz de antecipar qualquer movimento, mesmo sem olhar para o monitor. Fontes de som e vibração podem ser corretamente detetadas em 79,2 ± 8,1% e 91,1 ± 3,6%, com erros médios de angulação de 10.4º e 8.0 graus, respetivamente. Adicionalmente ao sistema de navegação, foram também produzidos três transdutores de ultrassom circulares com um canal de trabalho para a agulha. Para tal, foram exploradas diferentes configurações de fabricação em termos de materiais piezoelétricos, transdutores multi-array ou singulares e espessura/material de layers de suporte. Os sinais originados em cada transdutor foram filtrados e processados de modo a detetar de forma automática os ecos refletidos, e assim, alertar o cirurgião quando existem variações anatómicas ao longo do caminho de punção. Os transdutores foram mapeados num tanque de água e testados em 45 phantoms. Os resultados mostraram que o feixe de área em corte transversal oscila em torno do raio de cerâmica, e que os ecos refletidos são detetados em phantoms com comprimentos superiores a 80 mm. Desta forma, é expectável que a introdução deste novo sistema a nível do ARP permitirá conduzir o cirurgião ao longo do caminho de punção ideal, aumentado a confiança do cirurgião e reduzindo possíveis complicações (p.e. a perfuração dos órgãos). Além disso, de realçar que este sistema apresenta o potencial de tornar o ARP livre de radiação e alarga-lo a cirurgiões menos especializados.The present work was only possible thanks to the support by the Portuguese Science and Technology Foundation through the PhD grant with reference SFRH/BD/74276/2010 funded by FCT/MEC (PIDDAC) and by Fundo Europeu de Desenvolvimento Regional (FEDER), Programa COMPETE - Programa Operacional Factores de Competitividade (POFC) do QREN

    Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

    Full text link
    Speaking rate refers to the average number of phonemes within some unit time, while the rhythmic patterns refer to duration distributions for realizations of different phonemes within different phonetic structures. Both are key components of prosody in speech, which is different for different speakers. Models like cycle-consistent adversarial network (Cycle-GAN) and variational auto-encoder (VAE) have been successfully applied to voice conversion tasks without parallel data. However, due to the neural network architectures and feature vectors chosen for these approaches, the length of the predicted utterance has to be fixed to that of the input utterance, which limits the flexibility in mimicking the speaking rates and rhythmic patterns for the target speaker. On the other hand, sequence-to-sequence learning model was used to remove the above length constraint, but parallel training data are needed. In this paper, we propose an approach utilizing sequence-to-sequence model trained with unsupervised Cycle-GAN to perform the transformation between the phoneme posteriorgram sequences for different speakers. In this way, the length constraint mentioned above is removed to offer rhythm-flexible voice conversion without requiring parallel data. Preliminary evaluation on two datasets showed very encouraging results.Comment: 8 pages, 6 figures, Submitted to SLT 201

    DocMIR: An automatic document-based indexing system for meeting retrieval

    Get PDF
    This paper describes the DocMIR system which captures, analyzes and indexes automatically meetings, conferences, lectures, etc. by taking advantage of the documents projected (e.g. slideshows, budget tables, figures, etc.) during the events. For instance, the system can automatically apply the above-mentioned procedures to a lecture and automatically index the event according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter's computer nor any conscious intervention of the speaker throughout the presentation. The only material required by the system is the electronic presentation file of the speaker. Even if not provided, the system would temporally segment the presentation and offer a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that records events, synchronously. Once the recording is over, indexing is automatically performed by analyzing the content of the captured video containing projected documents and detects the scene changes, identifies the documents, computes their duration and extracts their textual content. Each of the captured images is identified from a repository containing all original electronic documents, captured audio-visual data and metadata created during post-production. The identification is based on documents' signatures, which hierarchically structure features from both layout structure and color distributions of the document images. Video segments are finally enriched with textual content of the identified original documents, which further facilitate the query and retrieval without using OCR. The signature-based indexing method proposed in this article is robust and works with low-resolution images and can be applied to several other applications including real-time document recognition, multimedia IR and augmented reality system

    Kernel Spectral Clustering and applications

    Full text link
    In this chapter we review the main literature related to kernel spectral clustering (KSC), an approach to clustering cast within a kernel-based optimization setting. KSC represents a least-squares support vector machine based formulation of spectral clustering described by a weighted kernel PCA objective. Just as in the classifier case, the binary clustering model is expressed by a hyperplane in a high dimensional space induced by a kernel. In addition, the multi-way clustering can be obtained by combining a set of binary decision functions via an Error Correcting Output Codes (ECOC) encoding scheme. Because of its model-based nature, the KSC method encompasses three main steps: training, validation, testing. In the validation stage model selection is performed to obtain tuning parameters, like the number of clusters present in the data. This is a major advantage compared to classical spectral clustering where the determination of the clustering parameters is unclear and relies on heuristics. Once a KSC model is trained on a small subset of the entire data, it is able to generalize well to unseen test points. Beyond the basic formulation, sparse KSC algorithms based on the Incomplete Cholesky Decomposition (ICD) and L0L_0, L1,L0+L1L_1, L_0 + L_1, Group Lasso regularization are reviewed. In that respect, we show how it is possible to handle large scale data. Also, two possible ways to perform hierarchical clustering and a soft clustering method are presented. Finally, real-world applications such as image segmentation, power load time-series clustering, document clustering and big data learning are considered.Comment: chapter contribution to the book "Unsupervised Learning Algorithms

    Efficient material characterization by means of the Doppler effect in microwaves

    Get PDF
    Subject of this thesis is the efficient material characterization and defects detection by means of the Doppler effect with microwaves. The first main goal of the work is to develop a prototype of a microwave Doppler system for Non-Destructive Testing (NDT) purposes. Therefore it is necessary that the Doppler system satisfies the following requirements: non-expensive, easily integrated into industrial process, allows fast measurements. The Doppler system needs to include software for hardware control, measurements, and fast signal processing. The second main goal of the thesis is to establish and experimentally confirm possible practical applications of the Doppler system. The Doppler system consists of the following parts. The hardware part is designed in a way to ensure fast measurement and easy adjustment to the different radar types. The software part of the system contains tools for: hardware control, data acquisition, signal processing and representing data to the user. In this work firstly a new type of 2D Doppler amplitude imaging was developed and formalized. Such a technique is used to derive information about the measured object from several angles of view. In the thesis special attention was paid to the frequency analysis of the mea- sured signals as a means to improve spatial resolution of the radar. In the context of frequency analysis we present 2D Doppler frequency imaging and compare it with amplitude imaging. In the thesis the spatial resolution ability of CW radars is examined and im- proved. We show that the joint frequency and the amplitude signal processing allows to significantly increase the spatial resolution of the radar.Das Thema dieser Dissertation ist die effiziente Materialcharakterisierung und Fehlerdetektion durch Nutzung des Dopplereffektes mittels Mikrowellen. Das erste Hauptziel der Arbeit ist die Entwicklung eines Prototyps eines Mikrowellen-Doppler-Systems im Bereich der zerstörungsfreien Prüfung. Das Doppler-System muss folgenden Voraussetzungen erfüllen: es sollte preisgünstig sein, leicht in industrielle Prozesse integrierbar sein und schnelle Messungen erlauben. Das Doppler-System muss die Software für die Hardware-Kontrolle, den Messablauf und die schnelle Signalverarbeitung beinhalten. Das zweite Hauptziel der Dissertation ist es, mögliche praktische Anwendungsfelder des Doppler-Systems zu identifizieren und experimentell zu bearbeiten. Das Doppler-System besteht aus zwei Teilen. Der Hardware-Teil ist so konstruiert, dass er schnelle Messungen und leichte Anpassungen an verschiedene Sensor- und Radartypen zulässt. Der Software-Teil des Systems beinhaltet Werkzeuge für: Hardware-Kontrolle, Datenerfassung, Signalverarbeitung und Programme, um die Daten für den Benutzer zu präsentieren. In dieser Arbeit wurde zuerst ein neuer Typ der 2D-Doppler-Amplitudenbildgebung entwickelt und formalisiert. Dieser Technik wird dafür benutzt, Informationen über die gemessenen Objekte von verschiedenen Blickpunkten aus zu erhalten. In dieser Doktorarbeit wird der Frequenzanalyse der gemessenen Signale besondere Aufmerksamkeit geschenkt, um die Ortsauflösung des Radars zu verbessern. Im Kontext der Frequenzanalyse wird die 2D-Doppler-Frequenzbildgebung präsentiert und mit der Amplitudenbildgebung vergleichen. In dieser Dissertation werden die räumliche Auflösungsmöglichkeiten von CW-Radaren untersucht und verbessert. Es wird gezeigt, dass es die Frequenz- und Amplitudensignalverarbeitung erlaubt, die Ortsauflösung des Radars erheblich zu erhöhen
    corecore