    Application for video analysis based on machine learning and computer vision algorithms

    An application for video data analysis based on computer vision methods is presented. The proposed system consists of five consecutive stages: face detection, face tracking, gender recognition, age classification, and statistical analysis. An AdaBoost classifier is used for face detection, and a modification of the Lucas-Kanade algorithm is introduced at the tracking stage. Novel gender and age classifiers based on adaptive features and support vector machines are proposed. All stages are integrated into a single audience analysis system. The proposed software can be applied in a range of areas, from digital signage and video surveillance to automatic accident prevention systems and intelligent human-computer interfaces.
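The tracking stage builds on the Lucas-Kanade algorithm. The abstract does not describe the authors' modification, so the sketch below shows only the classical single-scale, single-window Lucas-Kanade step it starts from: assuming brightness constancy and constant motion inside a small window, the displacement is the least-squares solution of a 2x2 linear system built from spatial and temporal image gradients. The function name `lucas_kanade_step` and the default window size are illustrative assumptions.

```python
import numpy as np

def lucas_kanade_step(prev, curr, center, half_window=7):
    """Estimate the (dx, dy) motion of the patch around `center` = (row, col).

    Classical single-scale Lucas-Kanade (NOT the paper's modified variant):
    the constraint I_x*dx + I_y*dy + I_t = 0 is imposed on every pixel of the
    window and solved in the least-squares sense.
    """
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    iy, ix = np.gradient(prev)   # derivatives along rows (y) and columns (x)
    it = curr - prev             # temporal derivative between the two frames
    r, c = center
    win = (slice(r - half_window, r + half_window + 1),
           slice(c - half_window, c + half_window + 1))
    gx, gy, gt = ix[win].ravel(), iy[win].ravel(), it[win].ravel()
    # Normal equations: [sum gx*gx, sum gx*gy; sum gx*gy, sum gy*gy] [dx, dy]
    #                   = -[sum gx*gt, sum gy*gt]
    a = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])
    b = -np.array([gx @ gt, gy @ gt])
    # Raises LinAlgError if the window has no texture (singular matrix).
    dx, dy = np.linalg.solve(a, b)
    return dx, dy
```

Because the step relies on a first-order linearization, a practical tracker would typically iterate it and run it over an image pyramid to recover displacements larger than about a pixel; OpenCV's `cv2.calcOpticalFlowPyrLK` provides that pyramidal form.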


    This thesis explores the integration of artificial intelligence (AI) into Otolaryngology – Head and Neck Surgery, focusing on advances in computer vision for endoscopy and surgical procedures. It begins with a comprehensive review of the state of the art of AI and computer vision in this field, identifying areas for further development. The primary aim was to develop a computer vision system for analyzing endoscopic images and videos. The research involved designing tools for detecting and segmenting neoplasms in the upper aerodigestive tract (UADT) and for assessing vocal fold motility, which is crucial in laryngeal cancer staging. The study also examines the potential of vision foundation models, such as vision transformers trained with self-supervised learning, to reduce the need for expert annotation, an approach particularly beneficial in fields with limited data. In addition, the research includes the development of a web application that improves and speeds up the annotation of UADT endoscopy, within the broader framework of Machine Learning Operations (MLOps). The thesis covers several phases of research, starting with the definition of the conceptual framework and methodology, termed "Videomics". It includes a literature review on AI in clinical endoscopy, focusing on Narrow Band Imaging (NBI) and convolutional neural networks (CNNs). The research then progresses from the quality assessment of endoscopic images to the in-depth characterization of neoplastic lesions. It also addresses the need for reporting standards in medical computer vision studies and evaluates the application of AI in dynamic settings such as vocal fold motility. A significant part of the research investigates the use of general-purpose vision algorithms ("foundation models") and the commoditization of machine learning algorithms, using nasal polyps and oropharyngeal cancer as case studies. Finally, the thesis discusses the development of ENDO-CLOUD, a cloud-based system for videolaryngoscopy analysis, highlighting the challenges and solutions in data management and in the large-scale deployment of AI models in medical imaging.

    Fourteenth Biennial Status Report: March 2017 - February 2019

    Learning to Transform Time Series with a Few Examples

    We describe a semi-supervised regression algorithm that learns to transform one time series into another, given examples of the transformation. The algorithm is applied to tracking, where a time series of sensor observations is transformed into a time series describing the pose of a target. Instead of defining and implementing such a transformation separately for each tracking task, our algorithm learns a memoryless transformation of time series from a few example input-output mappings. It searches for a smooth function that fits the training examples and, when applied to the input time series, produces a time series that evolves according to assumed dynamics. The learning procedure is fast and admits a closed-form solution. It is closely related to nonlinear system identification and manifold learning techniques. We demonstrate the algorithm on tracking RFID tags from signal strength measurements and on recovering the pose of rigid objects, deformable bodies, and articulated bodies from video sequences. For these tasks, the algorithm requires significantly fewer examples than fully supervised regression algorithms or semi-supervised learning algorithms that do not take the dynamics of the output time series into account.
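The abstract does not state the exact objective, so the following is only a minimal sketch of the idea under explicit assumptions: the output at step t is f(x_t) for a kernel expansion f(x) = Σ_s c_s k(x, x_s) over all input samples (labeled and unlabeled), k is a Gaussian kernel, and the "assumed dynamics" are a constant-velocity prior encoded as a penalty on second differences of the output series. The bandwidth `sigma`, the weights `lam_dyn` and `lam_rkhs`, and both function names are hypothetical. Because every term is quadratic in the coefficients c, setting the gradient to zero gives a single linear system, which illustrates how such a procedure can have a closed-form solution.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_time_series_transform(X, labeled_idx, y_labeled,
                              sigma=0.05, lam_dyn=10.0, lam_rkhs=1e-6):
    """Hypothetical semi-supervised regression with a dynamics prior.

    X: (T, d) input time series (e.g. sensor readings).
    labeled_idx, y_labeled: the few time steps whose outputs are known.
    Returns the estimated output f(x_t) for every time step t.
    """
    T = X.shape[0]
    K = gaussian_kernel(X, X, sigma)        # f(x_t) = sum_s c_s k(x_t, x_s)
    M = np.zeros((T, T))
    M[labeled_idx, labeled_idx] = 1.0       # diagonal mask of labeled steps
    y = np.zeros(T)
    y[labeled_idx] = y_labeled
    D = np.diff(np.eye(T), n=2, axis=0)     # second differences (assumed constant-velocity dynamics)
    # Zero gradient of
    #   (Kc - y)^T M (Kc - y) + lam_dyn ||D K c||^2 + lam_rkhs c^T K c
    # gives (M K + lam_dyn D^T D K + lam_rkhs I) c = M y  (K is invertible).
    A = M @ K + lam_dyn * (D.T @ D) @ K + lam_rkhs * np.eye(T)
    c = np.linalg.solve(A, M @ y)
    return K @ c
```

On a toy problem where a hidden 1-D pose moves at constant speed and is observed only through a nonlinear 2-D reading, three labeled steps are enough for the dynamics term to fill in the unlabeled steps; with `lam_dyn=0.0` the unlabeled predictions instead fall toward zero away from the labeled points, which is the gap between plain supervised fitting and using the output dynamics.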
