ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools
Real-time tool segmentation from endoscopic videos is an essential part of
many computer-assisted robotic surgical systems and of critical importance in
robotic surgical data science. We propose two novel deep learning architectures
for automatic segmentation of non-rigid surgical instruments. Both methods take
advantage of automated deep-learning-based multi-scale feature extraction while
trying to maintain an accurate segmentation quality at all resolutions. The two
proposed methods encode the multi-scale constraint inside the network
architecture. The first proposed architecture enforces it by cascaded
aggregation of predictions and the second proposed network does it by means of
a holistically-nested architecture where the loss at each scale is taken into
account for the optimization process. As the proposed methods are for real-time
semantic labeling, both present a reduced number of parameters. We propose the
use of parametric rectified linear units for semantic labeling in these small
architectures to increase the regularization ability of the design and maintain
the segmentation accuracy without overfitting the training sets. We compare the
proposed architectures against state-of-the-art fully convolutional networks.
We validate our methods using existing benchmark datasets, including ex vivo
cases with phantom tissue and different robotic surgical instruments present in
the scene. Our results show a statistically significant improved Dice
Similarity Coefficient over previous instrument segmentation methods. We
analyze our design choices and discuss the key drivers for improving accuracy.
Comment: Paper accepted at IROS 201
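The abstract reports its results as a Dice Similarity Coefficient. As a minimal sketch of that metric for binary instrument masks (the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    # Number of pixels labelled as instrument in both masks.
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

A value of 1.0 means the predicted and reference masks coincide, and 0.0 means they do not overlap at all.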
Navigation system based in motion tracking sensor for percutaneous renal access
Doctoral thesis in Biomedical Engineering.
Minimally invasive kidney interventions are performed daily to diagnose and treat several renal
diseases. Percutaneous renal access (PRA) is an essential but challenging stage of most of these
procedures, since its outcome is directly linked to the physician’s ability to precisely visualize and
reach the anatomical target.
Nowadays, PRA is always guided by medical imaging, most frequently X-ray
based imaging (e.g. fluoroscopy). Radiation in the operating theater therefore represents a major risk to
the medical team, and removing it from PRA would directly reduce the dose exposure
of both patients and physicians.
To address these problems, this thesis aims to develop a new hardware/software framework
to intuitively and safely guide the surgeon during PRA planning and puncturing.
In terms of surgical planning, a set of methodologies was developed to increase the certainty of
reaching a specific target inside the kidney. The most relevant abdominal structures for PRA were
automatically clustered into different 3D volumes. To do so, primitive volumes were merged by solving a local
optimization problem based on the minimum description length principle and image statistical
properties. A multi-volume Ray Cast method was then used to highlight each segmented volume.
Results show that it is possible to detect all abdominal structures surrounding the kidney and
to correctly estimate a virtual trajectory.
Concerning the percutaneous puncturing stage, both an electromagnetic and an optical solution
were developed and tested in multiple in vitro, in vivo and ex vivo trials. The optical tracking solution
aids in establishing the desired puncture site and choosing the best virtual puncture trajectory.
However, this system requires a line of sight to different optical markers placed at the needle base,
which limits the accuracy when tracking inside the human body. Results show that the needle tip can
deflect from its initial straight-line trajectory with an error higher than 3 mm. Moreover, a complex
registration procedure and initial setup are needed.
On the other hand, a real-time electromagnetic tracking solution was developed. Here, a catheter
was inserted trans-urethrally towards the renal target. This catheter has a position and orientation
electromagnetic sensor on its tip that functions as a real-time target locator. Then, a needle
integrating a similar sensor is used. From the data provided by both sensors, a virtual puncture
trajectory is computed and displayed in 3D visualization software. In vivo tests showed median renal and
ureteral puncture times of 19 and 51 seconds, respectively (range 14 to 45 and 45 to 67 seconds).
Such results represent a puncture time improvement of between 75% and 85% compared with
state-of-the-art methods.
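The thesis does not spell out how the virtual trajectory is derived from the two sensors. As a rough sketch under stated assumptions, guidance could reduce to the distance from the needle tip to the catheter-tip target and the angle between the needle axis and the straight line to that target. The function name `puncture_guidance`, the shared coordinate frame, and the millimetre units are all hypothetical.

```python
import numpy as np

def puncture_guidance(needle_tip, needle_dir, target):
    """Return (distance, angle in degrees) from the needle to the target.

    Assumes both sensor readings are expressed in the same tracker frame.
    """
    needle_tip = np.asarray(needle_tip, dtype=float)
    needle_dir = np.asarray(needle_dir, dtype=float)
    target = np.asarray(target, dtype=float)

    # Straight line from the needle tip to the catheter-tip target.
    to_target = target - needle_tip
    distance = float(np.linalg.norm(to_target))
    dir_norm = float(np.linalg.norm(needle_dir))
    if distance == 0.0 or dir_norm == 0.0:
        # Target reached or no usable orientation: report no angular error.
        return distance, 0.0

    # Angle between the needle axis and the line to the target.
    cos_angle = np.dot(needle_dir, to_target) / (dir_norm * distance)
    angle_deg = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return distance, angle_deg
```

An angle near zero would indicate that the needle axis points at the target; a real system would also have to handle sensor noise and calibration, which this sketch ignores.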
3D sound and vibrotactile feedback were also developed to provide additional information about
the needle orientation. With these kinds of feedback, the surgeon tended to
follow a virtual puncture trajectory with fewer deviations from the ideal trajectory and
was able to anticipate any movement even without looking at a monitor. Best results show that 3D
sound sources were correctly identified 79.2 ± 8.1% of the time with an average angulation error of
10.4°. Vibration sources were accurately identified 91.1 ± 3.6% of the time with an average
angulation error of 8.0°.
In addition to the EMT framework, three circular ultrasound transducers were built with a needle
working channel. Different fabrication setups were explored in terms of piezoelectric
materials, transducer construction, single vs. multi-array configurations, and backing and matching
material design. The A-scan signals retrieved from each transducer were filtered and processed to
automatically detect reflected echoes and to alert the surgeon when undesirable anatomical
structures lie in the puncture path. The transducers were mapped in a water tank and
tested in a study involving 45 phantoms. Results showed that the beam cross-sectional area
oscillates around the ceramic radius and that echo signals could be detected automatically in
phantoms longer than 80 mm.
It is therefore expected that introducing the proposed system into the PRA procedure
will guide the surgeon along the optimal path towards the precise kidney target, increasing the
surgeon’s confidence and reducing complications (e.g. organ perforation) during PRA. Moreover, the
developed framework has the potential to make PRA free of radiation for both patient and surgeon
and to broaden the use of PRA to less specialized surgeons.
Minimally invasive renal interventions are performed daily for the treatment and
diagnosis of several renal diseases. Percutaneous renal access (PRA) is an essential and
challenging step in most of these procedures. Its outcome is directly
related to the surgeon’s ability to visualize and precisely reach the anatomical target.
Nowadays, PRA is always guided using imaging systems, most
often X-ray based (e.g. fluoroscopy). The radiation from these systems in operating rooms
represents a major risk to the medical team, and removing it would directly
reduce the dose to which patients and surgeons are exposed.
To solve the existing problems, this thesis aims to develop
a hardware/software framework that intuitively and safely guides the surgeon
during PRA planning and puncture.
In terms of planning, a set of methodologies was developed to
increase the effectiveness with which the anatomical target is reached. The abdominal structures most
relevant to the PRA procedure were automatically grouped into 3D volumes by solving
a global optimization problem based on the minimum description length principle and
image statistical properties. Finally, a Ray Cast procedure with multiple transfer
functions was used to emphasize the segmented structures. The results show that
it is possible to detect all abdominal structures surrounding the kidney and to
correctly estimate a virtual trajectory.
Regarding the percutaneous puncture stage, two motion tracking solutions
(optical and electromagnetic) were tested in multiple in vitro, in vivo and ex vivo trials. The solution
based on optical sensors helped compute the best puncture point and define the best
trajectory to follow. However, this system needs a line of sight to different
optical markers attached to the needle base, which limits the accuracy with which the needle is tracked
inside the human body. The results indicate that the needle can deflect as
it is inserted, with errors greater than 3 mm.
On the other hand, a solution based on electromagnetic sensors
was developed and tested. To that end, a catheter with a position and orientation sensor at its tip
was placed trans-urethrally next to the renal target. A needle integrating a similar
sensor is then used for the percutaneous puncture. From the spatial difference between the two
sensors, a virtual puncture trajectory can be generated. The median time needed to
puncture the kidney and ureter along this trajectory was 19 and 51 seconds, respectively
(ranges 14 to 45 and 45 to 67 seconds). These results represent a puncture time
improvement of between 75% and 85% compared with current state-of-the-art methods.
In addition to visual feedback, 3D sound and vibratory feedback were explored to provide
complementary information about the needle position. With this type of feedback, the
surgeon tends to follow a puncture trajectory with minimal deviations and is also able
to anticipate any movement, even without looking at the monitor. Sound and vibration sources
were correctly detected 79.2 ± 8.1% and 91.1 ± 3.6% of the time, with average angulation errors
of 10.4° and 8.0°, respectively.
In addition to the navigation system, three circular ultrasound transducers
with a working channel for the needle were also produced. To that end, different
fabrication configurations were explored in terms of piezoelectric materials, multi-array or
single transducers, and the thickness/material of the backing layers. The signals from each transducer
were filtered and processed to automatically detect the reflected echoes and thus
alert the surgeon when anatomical variations exist along the puncture path. The
transducers were mapped in a water tank and tested on 45 phantoms. The results
showed that the beam cross-sectional area oscillates around the ceramic radius and that
reflected echoes are detected in phantoms longer than 80 mm.
It is therefore expected that introducing this new system into PRA will
guide the surgeon along the ideal puncture path, increasing the surgeon’s confidence and
reducing possible complications (e.g. organ perforation). Furthermore, this
system has the potential to make PRA radiation-free and to extend it to less
specialized surgeons.
The present work was made possible by the support of the Portuguese Science and
Technology Foundation through the PhD grant with reference SFRH/BD/74276/2010, funded by
FCT/MEC (PIDDAC) and by Fundo Europeu de Desenvolvimento Regional (FEDER), Programa
COMPETE - Programa Operacional Factores de Competitividade (POFC) do QREN
Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences
Speaking rate refers to the average number of phonemes within some unit time,
while the rhythmic patterns refer to duration distributions for realizations of
different phonemes within different phonetic structures. Both are key
components of prosody in speech, which is different for different speakers.
Models like cycle-consistent adversarial networks (Cycle-GAN) and variational
auto-encoders (VAE) have been successfully applied to voice conversion tasks
without parallel data. However, due to the neural network architectures and
feature vectors chosen for these approaches, the length of the predicted
utterance has to be fixed to that of the input utterance, which limits the
flexibility in mimicking the speaking rates and rhythmic patterns of the
target speaker. On the other hand, sequence-to-sequence learning models have been used
to remove the above length constraint, but they need parallel training data.
In this paper, we propose an approach that uses a sequence-to-sequence model
trained with unsupervised Cycle-GAN to perform the transformation between the
phoneme posteriorgram sequences of different speakers. In this way, the length
constraint mentioned above is removed to offer rhythm-flexible voice conversion
without requiring parallel data. Preliminary evaluation on two datasets showed
very encouraging results.
Comment: 8 pages, 6 figures, Submitted to SLT 201
DocMIR: An automatic document-based indexing system for meeting retrieval
This paper describes the DocMIR system, which automatically captures, analyzes and indexes meetings, conferences, lectures, etc. by taking advantage of the documents projected during the events (e.g. slideshows, budget tables, figures, etc.). For instance, the system can automatically apply these procedures to a lecture and index the event according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter's computer nor any conscious intervention by the speaker during the presentation. The only material the system requires is the speaker's electronic presentation file. Even if the file is not provided, the system still temporally segments the presentation and offers a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that record events synchronously. Once the recording is over, indexing is performed automatically by analyzing the captured video containing the projected documents: the system detects the scene changes, identifies the documents, computes their duration and extracts their textual content. Each of the captured images is identified from a repository containing all original electronic documents, captured audio-visual data and metadata created during post-production. The identification is based on document signatures, which hierarchically structure features from both the layout structure and the color distributions of the document images. Video segments are finally enriched with the textual content of the identified original documents, which further facilitates query and retrieval without using OCR. The signature-based indexing method proposed in this article is robust, works with low-resolution images and can be applied to several other applications including real-time document recognition, multimedia IR and augmented reality system
Kernel Spectral Clustering and applications
In this chapter we review the main literature related to kernel spectral
clustering (KSC), an approach to clustering cast within a kernel-based
optimization setting. KSC represents a least-squares support vector machine
based formulation of spectral clustering described by a weighted kernel PCA
objective. Just as in the classifier case, the binary clustering model is
expressed by a hyperplane in a high dimensional space induced by a kernel. In
addition, the multi-way clustering can be obtained by combining a set of binary
decision functions via an Error Correcting Output Codes (ECOC) encoding scheme.
Because of its model-based nature, the KSC method encompasses three main steps:
training, validation, testing. In the validation stage model selection is
performed to obtain tuning parameters, like the number of clusters present in
the data. This is a major advantage compared to classical spectral clustering
where the determination of the clustering parameters is unclear and relies on
heuristics. Once a KSC model is trained on a small subset of the entire data,
it is able to generalize well to unseen test points. Beyond the basic
formulation, sparse KSC algorithms based on the Incomplete Cholesky
Decomposition (ICD) and $L_0$, $L_1$, $L_0 + L_1$, Group Lasso regularization are
reviewed. In that respect, we show how it is possible to handle large scale
data. Also, two possible ways to perform hierarchical clustering and a soft
clustering method are presented. Finally, real-world applications such as image
segmentation, power load time-series clustering, document clustering and big
data learning are considered.
Comment: chapter contribution to the book "Unsupervised Learning Algorithms
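The abstract says multi-way clustering combines a set of binary decision functions through an ECOC encoding. A minimal sketch of the decoding step, assuming the binary scores and a codebook of one sign pattern per cluster are already available (the function name `ecoc_assign` and the example codebook are hypothetical, and ties go to the lowest cluster index):

```python
import numpy as np

def ecoc_assign(scores, codebook):
    """Assign each point to the cluster whose codeword is nearest in Hamming distance.

    scores:   (n_points, n_functions) real-valued binary decision function outputs
    codebook: (n_clusters, n_functions) array of +1/-1 codewords
    """
    # Binarize the decision function outputs into +1/-1 sign patterns.
    signs = np.where(np.asarray(scores, dtype=float) >= 0, 1, -1)
    codebook = np.asarray(codebook)
    # Count the positions where each point's pattern differs from each codeword.
    hamming = (signs[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return hamming.argmin(axis=1)
```

For example, with three clusters encoded by two binary functions as `[[1, 1], [1, -1], [-1, 1]]`, a point with scores `[0.5, -0.2]` has sign pattern `[1, -1]` and falls into the second cluster.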
Efficient material characterization by means of the Doppler effect in microwaves
The subject of this thesis is efficient material characterization and defect detection by means of the Doppler effect with microwaves. The first main goal of the work is to develop a prototype of a microwave Doppler system for Non-Destructive Testing (NDT) purposes. The Doppler system must therefore satisfy the following requirements: it should be inexpensive, easy to integrate into industrial processes, and allow fast measurements. The Doppler system needs to include software for hardware control, measurements, and fast signal processing. The second main goal of the thesis is to establish and experimentally confirm possible practical applications of the Doppler system. The Doppler system consists of the following parts. The hardware part is designed to ensure fast measurement and easy adjustment to different radar types. The software part of the system contains tools for hardware control, data acquisition, signal processing and presenting data to the user. In this work, a new type of 2D Doppler amplitude imaging was first developed and formalized. This technique is used to derive information about the measured object from several angles of view. Special attention was paid to the frequency analysis of the measured signals as a means to improve the spatial resolution of the radar. In the context of frequency analysis, we present 2D Doppler frequency imaging and compare it with amplitude imaging. The spatial resolution of CW radars is examined and improved. We show that joint frequency and amplitude signal processing significantly increases the spatial resolution of the radar.
The subject of this dissertation is efficient material characterization and defect detection using the Doppler effect with microwaves. The first main goal of the work is the development of a prototype microwave Doppler system for non-destructive testing.
The Doppler system must meet the following requirements: it should be inexpensive, easy to integrate into industrial processes, and allow fast measurements. The Doppler system must include the software for hardware control, the measurement procedure, and fast signal processing. The second main goal of the dissertation is to identify possible practical fields of application for the Doppler system and to investigate them experimentally. The Doppler system consists of two parts. The hardware part is designed to allow fast measurements and easy adaptation to different sensor and radar types. The software part of the system contains tools for hardware control, data acquisition, signal processing, and programs for presenting the data to the user. In this work, a new type of 2D Doppler amplitude imaging was first developed and formalized. This technique is used to obtain information about the measured objects from different viewpoints. In this doctoral thesis, special attention is paid to the frequency analysis of the measured signals in order to improve the spatial resolution of the radar. In the context of frequency analysis, 2D Doppler frequency imaging is presented and compared with amplitude imaging. In this dissertation, the spatial resolution capabilities of CW radars are examined and improved. It is shown that frequency and amplitude signal processing makes it possible to considerably increase the spatial resolution of the radar
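The abstract relies on the Doppler effect in a CW radar without stating the relation it uses. For reference, the standard monostatic formula is f_d = 2 v f_0 cos(θ) / c, sketched below. The function name `doppler_shift_hz` and the 24 GHz carrier in the usage note are illustrative assumptions, not values from the thesis.

```python
import math

SPEED_OF_LIGHT_MPS = 299_792_458.0  # speed of light in vacuum, m/s

def doppler_shift_hz(speed_mps, carrier_hz, angle_deg=0.0):
    """Doppler shift seen by a monostatic CW radar.

    speed_mps:  speed of the object relative to the sensor
    carrier_hz: transmitted carrier frequency
    angle_deg:  angle between the velocity vector and the radar line of sight
    """
    # Only the velocity component along the line of sight shifts the frequency.
    radial_speed = speed_mps * math.cos(math.radians(angle_deg))
    # Factor 2 accounts for the two-way path (transmit and reflect).
    return 2.0 * radial_speed * carrier_hz / SPEED_OF_LIGHT_MPS
```

For instance, `doppler_shift_hz(1.0, 24e9)` evaluates to roughly 160 Hz, which shows why motion of about a metre per second gives audio-range signals at microwave carrier frequencies.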
- …