COVID-Net CT-2: Enhanced Deep Neural Networks for Detection of COVID-19 from Chest CT Images Through Bigger, More Diverse Learning
The COVID-19 pandemic continues to rage on, with multiple waves causing
substantial harm to health and economies around the world. Motivated by the use
of CT imaging at clinical institutes around the world as an effective
complementary screening method to RT-PCR testing, we introduced COVID-Net CT, a
neural network tailored for detection of COVID-19 cases from chest CT images as
part of the open-source COVID-Net initiative. However, one potential limiting
factor is the restricted quantity and diversity of the training data, given the
single-nation patient cohort used. In this study, we introduce COVID-Net CT-2,
enhanced deep neural
networks for COVID-19 detection from chest CT images trained on the largest
quantity and diversity of multinational patient cases in research literature.
We introduce two new CT benchmark datasets, the largest comprising a
multinational cohort of 4,501 patients from at least 15 countries. We leverage
explainability to investigate the decision-making behaviour of COVID-Net CT-2,
with the results for select cases reviewed and reported on by two
board-certified radiologists with over 10 and 30 years of experience,
respectively. The COVID-Net CT-2 neural networks achieved accuracy, COVID-19
sensitivity, PPV, specificity, and NPV of 98.1%/96.2%/96.7%/99%/98.8% and
97.9%/95.7%/96.4%/98.9%/98.7%, respectively. Explainability-driven performance
validation shows that COVID-Net CT-2's decision-making behaviour is consistent
with radiologist interpretation by leveraging correct, clinically relevant
critical factors. The results are promising and suggest the strong potential of
deep neural networks as an effective tool for computer-aided COVID-19
assessment. While not a production-ready solution, we hope the open-source,
open-access release of COVID-Net CT-2 and benchmark datasets will continue to
enable researchers, clinicians, and citizen data scientists alike to build upon
them.
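The five reported metrics all follow directly from a per-class confusion matrix. As a minimal illustration of how they relate (the counts below are invented, not the actual COVID-Net CT-2 test results):

```python
# Hypothetical illustration: how the reported screening metrics relate to a
# confusion matrix. The counts are made up, not COVID-Net CT-2's results.

def screening_metrics(tp, fp, tn, fn):
    """Compute the five metrics quoted in the abstract from raw counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall for the COVID-19 class
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)           # negative predictive value
    return accuracy, sensitivity, ppv, specificity, npv

acc, sens, ppv, spec, npv = screening_metrics(tp=96, fp=3, tn=97, fn=4)
print(f"acc={acc:.3f} sens={sens:.3f} ppv={ppv:.3f} "
      f"spec={spec:.3f} npv={npv:.3f}")
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on class prevalence in the test set, which is why all five are reported together.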
Object identification for autonomous vehicles based on machine learning
Autonomous driving is one of the most actively researched fields in artificial
intelligence. Autonomous vehicles are expected to significantly reduce road
accidents and casualties once they become a sufficiently mature transport
option. Currently, much effort is focused on proving the concept of autonomous
vehicles based on a suite of sensors that observe their surroundings. In
particular, the camera and LiDAR are researched as an efficient combination of
sensors for on-line object identification on the road.
2D object identification is an already established field in Computer Vision.
The successful application of Deep Learning techniques has led to 2D vision
with human-level accuracy. However, for improved safety, more advanced
approaches suggest that the vehicle should not rely on a single class of
sensors. LiDAR has been proposed as an additional sensor, particularly due to
its 3D vision capability. 3D vision relies on LiDAR-captured data to recognize
objects in 3D. However, in contrast to 2D object identification, 3D object
detection is a relatively immature field that still has many challenges to
overcome. In addition, LiDARs are expensive sensors, which makes the
acquisition of data required for training 3D object recognition techniques an
expensive task as well.
In this context, the major goal of this Master's thesis is to further
facilitate 3D object identification for autonomous vehicles based on Deep
Learning (DL). The specific contributions of the present work are the
following.
First, a comprehensive overview of state-of-the-art Deep Learning
architectures for 3D object identification based on point clouds. The purpose
of this overview is to understand how best to approach such a problem in the
context of autonomous driving.
Second, synthetic but realistic LiDAR data was generated in the GTA V virtual
environment. Tools were developed to convert the generated data into the KITTI
dataset format, which has become a standard for 3D object detection techniques
in autonomous driving.
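To illustrate the conversion target: a KITTI object label is one text line of 15 space-separated fields per object. A minimal sketch of serializing one object is below; the field order follows the KITTI object-detection devkit, and the example values are invented:

```python
# Sketch of writing one object in the KITTI label format (15 fields per line).
# Field order per the KITTI object-detection devkit; example values invented.

def kitti_label_line(obj_type, truncated, occluded, alpha,
                     bbox, dims_hwl, loc_xyz, rotation_y):
    # bbox:     (left, top, right, bottom) in image pixels
    # dims_hwl: (height, width, length) in metres
    # loc_xyz:  object centre in camera coordinates, metres
    fields = [obj_type, f"{truncated:.2f}", str(occluded), f"{alpha:.2f}",
              *(f"{v:.2f}" for v in bbox),
              *(f"{v:.2f}" for v in dims_hwl),
              *(f"{v:.2f}" for v in loc_xyz),
              f"{rotation_y:.2f}"]
    return " ".join(fields)

line = kitti_label_line("Car", 0.0, 0, -1.57,
                        bbox=(614.2, 181.8, 727.3, 284.0),
                        dims_hwl=(1.57, 1.73, 4.15),
                        loc_xyz=(1.0, 1.75, 13.2),
                        rotation_y=-1.62)
print(line)
```

Matching this plain-text layout exactly is what lets synthetic GTA V data be consumed by tooling written for the original KITTI benchmark.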
Third, some of the reviewed 3D object identification DL architectures were
evaluated with the generated data. Though their performance on the generated
data was worse than on the original KITTI data, the models were still able to
correctly process the synthetic data without being retrained. The future
benefit of this work is that the models can be further trained with home-made
data and tested in varied scenarios.
The implemented GTA V mod has proved capable of providing rich,
well-structured datasets compatible with state-of-the-art 3D object
identification architectures.
The developed tool is publicly available and we hope it will be useful in
advancing 3D object identification for autonomous driving, as it removes the
dependency on datasets provided by a third party.
Mestrado em Engenharia de Computadores e Telemática
Investigating Scene Understanding for Robotic Grasping: From Pose Estimation to Explainable AI
In the rapidly evolving field of robotics, the ability to accurately grasp and manipulate objects—known as robotic grasping—is a cornerstone of autonomous operation. This capability is pivotal across a multitude of applications, from industrial manufacturing automation to supply chain management, and is a key determinant of a robot's ability to interact effectively with its environment. Central to this capability is the concept of scene understanding, a complex task that involves interpreting the robot's environment to facilitate decision-making and action planning. This thesis presents a comprehensive exploration of scene understanding for robotic grasping, with a particular emphasis on pose estimation, a critical aspect of scene understanding.
Pose estimation, the process of determining the position and orientation of objects within the robot's environment, is a crucial component of robotic grasping. It provides the robot with the spatial information about objects in the scene that it needs to plan and execute grasping actions effectively. However, many current methods estimate pose relative to a reference 3D model, a representation that is not descriptive on its own without access to that model. This thesis explores the use of keypoints and superquadrics as more general and descriptive representations of an object's pose. These approaches address the limitations of model-relative methods and enhance the generalizability and descriptiveness of pose estimation, thereby improving the overall effectiveness of robotic grasping.
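For intuition on why superquadrics are compact yet descriptive: a superquadric captures an object's shape with just five parameters (three scales and two exponents) plus a rigid pose, via its standard inside-outside function. The sketch below uses generic parameter names, not the exact formulation of the papers in this thesis:

```python
import numpy as np

# Standard superquadric inside-outside function: F < 1 inside the surface,
# F == 1 on it, F > 1 outside. Parameter names here are generic placeholders.

def superquadric_F(points, scale, eps1, eps2):
    """points: (N, 3) array already expressed in the superquadric's own frame.
    scale: (3,) semi-axis lengths (a1, a2, a3); eps1, eps2: shape exponents."""
    x, y, z = (np.abs(points) / scale).T
    return ((x ** (2 / eps2) + y ** (2 / eps2)) ** (eps2 / eps1)
            + z ** (2 / eps1))

# eps1 = eps2 = 1 with unit scales reduces to an ellipsoid test:
pts = np.array([[0.0, 0.0, 0.0],   # centre: inside
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
print(superquadric_F(pts, scale=np.array([1.0, 1.0, 1.0]), eps1=1.0, eps2=1.0))
```

Varying the two exponents morphs the same parametric family from ellipsoids toward box-like and pinched shapes, which is what makes a fitted superquadric a self-contained description of both pose and coarse geometry, with no external 3D model required.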
In addition to pose estimation, this thesis briefly touches upon the importance of uncertainty estimation and explainable AI in the context of robotic grasping. It introduces the concept of multimodal consistency for uncertainty estimation, providing a reliable measure of uncertainty that can enhance decision-making in human-in-the-loop situations. Furthermore, it explores the realm of explainable AI, presenting a method for gaining deeper insights into deep learning models, thereby enhancing their transparency and interpretability.
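The consistency idea can be reduced to a simple illustration: when independent estimators (for example, one per sensing modality) disagree, the prediction is flagged as uncertain and can be deferred to a human in the loop. The sketch below is purely illustrative, with invented names and a pairwise-distance reduction that stands in for, and is not, the specific method of the thesis:

```python
import numpy as np

# Illustrative reduction of multimodal consistency as an uncertainty proxy:
# disagreement between per-modality predictions is read as high uncertainty.
# Not the thesis method; names and thresholding strategy are invented.

def consistency_uncertainty(estimates):
    """estimates: (M, D) array, one D-dimensional prediction per modality.
    Returns the mean pairwise Euclidean distance between predictions."""
    est = np.asarray(estimates, dtype=float)
    m = len(est)
    dists = [np.linalg.norm(est[i] - est[j])
             for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(dists))

# Two modalities agreeing closely vs. disagreeing strongly:
agree = consistency_uncertainty([[0.10, 0.20, 0.30], [0.11, 0.19, 0.30]])
disagree = consistency_uncertainty([[0.10, 0.20, 0.30], [0.50, -0.10, 0.90]])
print(agree < disagree)  # low disagreement -> low uncertainty
```

A grasp planner could compare such a score against a threshold to decide when to act autonomously and when to ask for operator confirmation.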
In summary, this thesis presents a comprehensive approach to scene understanding for robotic grasping, with a particular emphasis on pose estimation. It addresses key challenges and advances the state of the art in this critical area of robotics research. The research is structured around five published papers, each contributing to a unique aspect of the overall study.