SurgMAE: Masked Autoencoders for Long Surgical Video Analysis
There has been growing interest in using deep learning models to process
long surgical videos, in order to automatically detect clinical/operational
activities and extract metrics that can enable workflow efficiency tools and
applications. However, training such models requires vast amounts of labeled
data, which is costly and not scalable. Recently, self-supervised learning has
been explored in the computer vision community to reduce the annotation
burden. Masked autoencoders (MAE) have gained attention in the self-supervised
paradigm for Vision Transformers (ViTs) by predicting randomly masked regions
given the visible patches of an image or a video clip, and have shown superior
performance on benchmark datasets. However, the application of MAE to surgical
data remains unexplored. In this paper, we first investigate whether MAE can
learn transferable representations in the surgical video domain. We propose
SurgMAE, a novel architecture with a masking strategy for MAE based on
sampling high-information spatio-temporal tokens. We provide an empirical
study of SurgMAE on two large-scale long surgical video datasets and find that
our method outperforms several baselines in the low-data regime. We conduct
extensive ablation studies to show the efficacy of our approach and also
demonstrate its superior performance on UCF-101 to prove its generalizability
to non-surgical datasets as well.
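The core MAE idea the abstract describes, hiding most spatio-temporal tokens and training the model to reconstruct them from the visible remainder, can be illustrated with a minimal masking sketch. This is a generic random-masking baseline in NumPy, not SurgMAE's actual high-information sampling strategy; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def mask_video_tokens(num_frames, tokens_per_frame, mask_ratio, seed=None):
    """Randomly choose which spatio-temporal tokens of a video clip to mask.

    Returns a boolean array of shape (num_frames * tokens_per_frame,) where
    True marks a masked token; only the visible (False) tokens would be fed
    to the MAE encoder, and the decoder reconstructs the masked ones.
    """
    rng = np.random.default_rng(seed)
    total = num_frames * tokens_per_frame
    num_masked = int(total * mask_ratio)          # e.g. 90% of tokens hidden
    mask = np.zeros(total, dtype=bool)
    masked_idx = rng.choice(total, size=num_masked, replace=False)
    mask[masked_idx] = True
    return mask

# 8 frames of 14x14 = 196 patch tokens each, with a 90% mask ratio.
mask = mask_video_tokens(num_frames=8, tokens_per_frame=196, mask_ratio=0.9)
print(mask.sum(), mask.size)  # 1411 of 1568 tokens masked, 157 visible
```

With such high mask ratios the encoder only sees a small fraction of the clip, which is what makes MAE pretraining cheap relative to full-clip processing.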
Deployment of Image Analysis Algorithms under Prevalence Shifts
Domain gaps are among the most relevant roadblocks in the clinical
translation of machine learning (ML)-based solutions for medical image
analysis. While current research focuses on new training paradigms and network
architectures, little attention is given to the specific effect of prevalence
shifts on an algorithm deployed in practice. Such discrepancies between class
frequencies in the data used for a method's development/validation and those in
its deployment environment(s) are of great importance, for example in the
context of artificial intelligence (AI) democratization, as disease prevalences
may vary widely across time and location. Our contribution is twofold. First,
we empirically demonstrate the potentially severe consequences of missing
prevalence handling by analyzing (i) the extent of miscalibration, (ii) the
deviation of the decision threshold from the optimum, and (iii) the ability of
validation metrics to reflect neural network performance on the deployment
population as a function of the discrepancy between development and deployment
prevalence. Second, we propose a workflow for prevalence-aware image
classification that uses estimated deployment prevalences to adjust a trained
classifier to a new environment, without requiring additional annotated
deployment data. Comprehensive experiments based on a diverse set of 30 medical
classification tasks showcase the benefit of the proposed workflow in
generating better classifier decisions and more reliable performance estimates
compared to current practice.
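The kind of prevalence-aware adjustment the abstract's workflow performs can be sketched with the standard Bayes-rule correction for prior (prevalence) shift: re-weight the trained model's posteriors by the ratio of deployment to development class priors, then renormalize. This is the textbook correction, offered here as an illustration; the paper's actual workflow may differ in how it estimates deployment prevalences and sets thresholds.

```python
import numpy as np

def adjust_for_prevalence(probs, dev_prev, dep_prev):
    """Re-weight predicted class probabilities for a new class prevalence.

    Bayes' rule under prior shift: p_new(y|x) ∝ p_dev(y|x) * dep_prev[y] / dev_prev[y].
    probs: (n_samples, n_classes) posteriors from the trained classifier.
    dev_prev / dep_prev: class priors at development and deployment time.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(dep_prev, dtype=float) / np.asarray(dev_prev, dtype=float)
    adjusted = probs * weights                      # re-weight each class column
    return adjusted / adjusted.sum(axis=1, keepdims=True)  # renormalize rows

# A model developed on balanced classes, deployed where disease prevalence is 10%:
probs = np.array([[0.6, 0.4]])
print(adjust_for_prevalence(probs, dev_prev=[0.5, 0.5], dep_prev=[0.9, 0.1]))
# the positive-class posterior drops from 0.4 to roughly 0.07
```

No additional annotated deployment data is needed, only an estimate of the deployment prevalences, which matches the workflow's stated requirement.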
Minimum Relevant Features to Obtain Explainable Systems for Predicting Cardiovascular Disease Using the Statlog Data Set
Learning systems have been focused on creating models capable of obtaining the best results in error metrics. Recently, the focus has shifted to improving the interpretation and explanation of the results. The need for interpretation is greater when these models are used to support decision making. In some areas, this becomes an indispensable requirement, such as in medicine. The goal of this study was to define a simple process to construct a system that could be easily interpreted, based on two principles: (1) reducing attributes without degrading the performance of the prediction systems and (2) selecting a technique to interpret the final prediction system. To describe this process, we selected a problem, predicting cardiovascular disease, by analyzing the well-known Statlog (Heart) data set from the University of California, Irvine (UCI) Machine Learning Repository. We analyzed the cost of making predictions easier to interpret by reducing the number of features that explain the classification of health status versus the cost in accuracy. We performed an analysis on a large set of classification techniques and performance metrics, demonstrating that it is possible to construct explainable and reliable models that provide high-quality predictive performance.
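The first principle above, reducing attributes without degrading predictive performance, can be sketched with a simple filter-style feature ranking. The sketch below ranks features by absolute Pearson correlation with the label and keeps the top k; it is a minimal NumPy stand-in for the heavier selection methods a full study would compare, and the synthetic data and function name are illustrative assumptions.

```python
import numpy as np

def select_top_features(X, y, k):
    """Rank features by |Pearson correlation| with a binary label and
    return the sorted indices of the k most relevant features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                 # center features
    yc = y - y.mean()                       # center label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc / denom)        # |correlation| per feature
    keep = np.argsort(corr)[::-1][:k]       # k highest-scoring features
    return np.sort(keep)

# Synthetic example: four noise features plus two that carry the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
noise = rng.normal(size=(200, 4))
signal = y[:, None] + 0.1 * rng.normal(size=(200, 2))
X = np.hstack([noise, signal])
print(select_top_features(X, y, k=2))  # → [4 5], the label-carrying features
```

A real pipeline would then retrain the classifier on the reduced feature set and check that accuracy has not degraded, which is exactly the trade-off the study quantifies.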
Enhancing Video Recommendation Using Multimedia Content
Video recordings are complex media types. When we watch a movie, we can effortlessly register a lot of details conveyed to us (by the author) through different multimedia channels, in particular the audio and visual modalities. To date, the majority of movie recommender systems use collaborative filtering (CF) models or content-based filtering (CBF) relying on metadata (e.g., editorial metadata such as genre, or wisdom-of-the-crowd metadata such as user-generated tags) at their core, since these are human-generated and are assumed to cover the 'content semantics' of movies to a great degree. The information obtained from multimedia content and learning from multi-modal sources (e.g., audio, visual, and metadata), on the other hand, offers the possibility of uncovering relationships between modalities and obtaining an in-depth understanding of natural phenomena occurring in a video. These discerning characteristics of heterogeneous feature sets meet users' differing information needs. In the context of this Ph.D. thesis [9], which is briefly summarized in the current extended abstract, approaches to the automated extraction of multimedia information from videos and their integration with video recommender systems have been elaborated, implemented, and analyzed. A variety of tasks related to movie recommendation using multimedia content have been studied. The results of this thesis demonstrate that recommender system research can benefit from the knowledge in multimedia signal processing and machine learning established over the last decades for solving various recommendation tasks.
Improving cataract surgery procedure using machine learning and thick data analysis
Cataract surgery is one of the most frequent and safest surgical operations
performed globally, with approximately 16 million surgeries conducted each
year. The entire operation is carried out under microscopic supervision.
Even though ophthalmic surgeries are similar in some ways to endoscopic
surgeries, the way they are set up is very different. Endoscopic surgery operations are shown on a big screen so that a trainee surgeon can see them.
Cataract surgery, on the other hand, is done under a microscope, so that
only the operating surgeon and one additional trainee can see it through
extra oculars. Since the surgery video is recorded for future reference, the
trainee surgeon watches the full video again for learning purposes. My proposed framework could help trainee surgeons better understand
the cataract surgery workflow. The framework is made up of three assistive
parts: figuring out how serious the cataract is; if surgery is needed, what
phases need to be performed; and what problems could happen during the surgery. In this framework, three training
models have been used with different datasets to answer these questions.
The training models include models that help teach technical skills as well
as thick data heuristics to provide non-technical training skills. For video
analysis, big data and deep learning are used in many studies of cataract
surgery. Deep learning requires lots of data to train a model, while thick
data requires only a small amount of data to reach a result. We have used thick
data and expert heuristics to develop our proposed framework. Thick data
analysis reduced the need for large amounts of data and also allowed us to understand
the qualitative nature of the data in order to shape a proposed cataract surgery
workflow framework.
Toward efficient indexing structure for scalable content-based music retrieval
We intend to problematize art and madness. We begin by discussing the experience of the researcher in relation to images of the world, to witnessing and to the figure of the insane, and then inevitably to the outside they evoke. Subsequently, we stand before a wall, a limit situation in which madness as catastrophe and art as poetics compose a threshold, an absence which Blanchot transposes to language to bring other possible constellations into view, both as words and as their unnamable others. Finally, with Walter Benjamin, we read the history of madness against the grain: immersed in the Writing Workshop at the São Pedro Psychiatric Hospital, in Porto Alegre, Brazil, we reveal that, in relation to madness, art can become the essential language of the perilous passage towards experience, transposing the lived experience of this horrific state to bring another sense to the world, recognizing other modes of existence which may come to be other poetics of life.