71 research outputs found

    InceptionCaps: A Performant Glaucoma Classification Model for Data-scarce Environment

    Full text link
    Glaucoma is an irreversible ocular disease and the second leading cause of visual disability worldwide. Slow vision loss and the asymptomatic nature of the disease make its diagnosis challenging. Early detection is crucial for preventing irreversible blindness. Ophthalmologists primarily use retinal fundus images as a non-invasive screening method. Convolutional neural networks (CNNs) have demonstrated high accuracy in the classification of medical images. Nevertheless, the translation-invariant nature of CNNs, their inability to handle part-whole relationships between objects, and the large number of labelled images they require for training make their direct application unsuitable for glaucomatous fundus image classification. This work reviews existing state-of-the-art models and proposes InceptionCaps, a novel capsule network (CapsNet) based deep learning model with a pre-trained InceptionV3 convolutional base, for automatic glaucoma classification. InceptionCaps achieved an accuracy of 0.956, a specificity of 0.96, and an AUC of 0.9556 on the RIM-ONE v2 dataset, surpassing several state-of-the-art deep learning models. These results demonstrate the robustness of the proposed model. Comment: 8 pages
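    The capsule layers that such a CapsNet-based model stacks on top of a convolutional base rely on the squash non-linearity introduced with the original capsule networks, which rescales each capsule's output vector so that its length can be read as a probability while its direction is preserved. A minimal numpy sketch of that standard operation (an illustration, not the authors' implementation):

    ```python
    import numpy as np

    def squash(s, axis=-1, eps=1e-8):
        """CapsNet squash non-linearity: shrinks short vectors toward zero
        and pushes long vectors toward unit length, keeping direction."""
        sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
        scale = sq_norm / (1.0 + sq_norm)
        return scale * s / np.sqrt(sq_norm + eps)

    # Example: 10 capsules, each a 16-dimensional pose vector
    caps = np.random.randn(10, 16)
    out = squash(caps)  # every output vector has length strictly below 1
    ```

    Because the scaling factor is always below 1, the output norms are valid as activation probabilities regardless of the input scale.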

    Climate variability and human livelihoods in western India: 1780-1860

    Get PDF
    This thesis presents a unique exploration of societal vulnerability to climate variability through an analysis of the historical climatology of colonial western India between 1780 and 1860. It utilises a range of historical documentary sources, most notably English-language newspapers alongside materials written by officials of the British East India Company and by British and American missionaries. Information from these sources is used to reconstruct past rainfall variability, and the resulting climatic chronology serves as a backdrop against which to examine societal responses to climate. The study adopts a content analysis methodology to reconstruct monsoon intensity from 1780 to 1860. This is calibrated against the instrumental rainfall record for western India, which extends back to 1847; the reconstruction therefore represents a 67-year extension of the monsoon record for western India. The extended chronology is compared with existing reconstructions of climatic forcings related to monsoon rainfall, including the strength of the Somali jet and indices of the El Niño Southern Oscillation. These comparisons suggest stationarity in the relationship between these forcings and monsoon rainfall during and after the study period, indicating that the reconstruction methodology is robust. The analysis of societal vulnerability to climate focuses upon severe drought episodes identified through the rainfall reconstruction. Eight such episodes are identified, in all of which drought was widespread across the study area; five occurred after previous years of deficient monsoon rainfall. Vulnerability at the local level appears to have been driven predominantly by indebtedness and a lack of government accountability, coupled with limited markets. Institutional adaptation policy changed significantly with the shift from Maratha to British rule in 1818 through the adoption of laissez-faire drought remediation.
Evidence suggests that this did not significantly affect vulnerability during the study period, as the widespread acceptance of the doctrine amongst the colonial community avoided institutional inertia. However, it may have served to increase vulnerability to droughts in the later part of the nineteenth century.

    Two-Stream Action Recognition Network with Spatio-temporal Attention

    Get PDF
    Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, 2019. 8. Advisor: 전화숙. Thanks to active research on deep neural networks and advances in data storage and processing, recognition problems on large volumes of temporal data such as video, and not only still images, are attracting ever more attention. Among the proposed approaches, the two-stream network has been the mainstream architecture for video action recognition ever since learned features first outperformed hand-crafted features. This thesis extends that architecture and proposes a design that adds spatio-temporal attention to two independently trained streams for action recognition in video; cross attention induces complementary learning between the otherwise independent streams and thereby improves performance. The architecture was evaluated on the standard HMDB-51 video action recognition benchmark, where it achieved improved performance over the existing architecture. Two-stream architecture has been mainstream since the success of [1], but the two sources of information are processed independently and do not interact until the late fusion. We investigate a different spatio-temporal attention architecture based on two separate recognition streams (spatial and temporal), which interact with each other through cross attention. The spatial stream performs action recognition from still video frames, whilst the temporal stream is trained to recognise action from motion in the form of dense optical flow. Both streams convey their learned knowledge to the other stream in the form of attention maps. Cross attention allows us to exploit the availability of supplemental information and enhance the learning of the streams. To demonstrate the benefits of our proposed cross-stream spatio-temporal attention architecture, it has been evaluated on two standard action recognition benchmarks, where it boosts the previous performance.
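    The cross attention described above can be illustrated with a toy example in which an attention map derived from one stream re-weights the features of the other. This is a sketch under the assumption of a simple sigmoid-gated, channel-averaged spatial attention map; the thesis's actual attention computation may differ:

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cross_attention(feat_a, feat_b):
        """One direction of cross-stream attention: a spatial attention
        map derived from stream B re-weights the features of stream A.
        feat_a, feat_b: (C, H, W) feature maps from the two streams."""
        # collapse B's channels into a single-channel spatial map in (0, 1)
        attn = sigmoid(feat_b.mean(axis=0, keepdims=True))  # (1, H, W)
        return feat_a * attn  # broadcast the map over A's channels

    rgb_feat = np.random.randn(64, 7, 7)   # spatial (appearance) stream
    flow_feat = np.random.randn(64, 7, 7)  # temporal (motion) stream
    attended = cross_attention(rgb_feat, flow_feat)
    ```

    In a full two-stream setup the same operation would be applied in the opposite direction as well, so that each stream modulates the other.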

    Evaluation of fringe projection and laser scanning for 3d reconstruction of dental pieces

    Get PDF
    The rapid prototyping and copying of real 3D objects play a key role in some industries. Both applications rely on the generation of appropriate computer-aided manufacturing (CAM) files. These files represent a set of coordinates of an object and can be understood by a computer numerically controlled (CNC) machine. Non-contact techniques, such as laser scanning and fringe projection, are among the possibilities for obtaining such CAM files. In this work, a comparison between the two aforementioned non-contact techniques is presented. The comparison is based on their performance as candidates for generating CAM files of objects of high reflectivity with maximum lateral dimensions of the order of 15 mm. The parameters tested are the quality of the 3D reconstruction, the processing time, and their suitability for implementation in industrial scenarios, among others. Under the scope of these parameters, it is concluded that laser scanning offers superior performance for the kind of objects considered here. The techniques are evaluated on dental pieces in order to validate these methodologies for the rapid prototyping and copying of teeth.

    Eight Biennial Report : April 2005 – March 2007

    No full text

    MGMAE: Motion Guided Masking for Video Masked Autoencoding

    Full text link
    Masked autoencoding has shown excellent performance on self-supervised video representation learning. Temporal redundancy has led to a high masking ratio and a customized masking strategy in VideoMAE. In this paper, we aim to further improve the performance of video masked autoencoding by introducing a motion guided masking strategy. Our key insight is that motion is a general and unique prior in video, which should be taken into account during masked pre-training. Our motion guided masking explicitly incorporates motion information to build a temporally consistent masking volume. Based on this masking volume, we can track the unmasked tokens in time and sample a set of temporally consistent cubes from videos. These temporally aligned unmasked tokens further relieve the information leakage issue in time and encourage MGMAE to learn more useful structural information. We implement MGMAE with an online efficient optical flow estimator and a backward masking map warping strategy. We perform experiments on the Something-Something V2 and Kinetics-400 datasets, demonstrating the superior performance of MGMAE over the original VideoMAE. In addition, we provide a visualization analysis illustrating that MGMAE samples temporally consistent cubes in a motion-adaptive manner for more effective video pre-training. Comment: ICCV 2023 camera-ready version
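    The backward masking map warping mentioned in the abstract can be sketched as follows: given an optical flow field, the token mask of one frame is warped so that masked positions follow the motion into the next frame, keeping the mask temporally consistent. A minimal numpy illustration using nearest-neighbour backward warping (the paper's actual implementation details may differ):

    ```python
    import numpy as np

    def warp_mask(mask, flow):
        """Backward-warp a binary mask with an optical flow field so that
        masked positions follow the motion into the next frame.
        mask: (H, W) bool; flow: (H, W, 2) displacements as (dy, dx)."""
        H, W = mask.shape
        ys, xs = np.mgrid[0:H, 0:W]
        # each target pixel looks up its source location in the previous frame
        src_y = np.clip(np.round(ys - flow[..., 0]).astype(int), 0, H - 1)
        src_x = np.clip(np.round(xs - flow[..., 1]).astype(int), 0, W - 1)
        return mask[src_y, src_x]

    mask = np.zeros((8, 8), dtype=bool)
    mask[2, 2] = True                 # one masked token position
    flow = np.ones((8, 8, 2))         # uniform motion: one pixel down-right
    warped = warp_mask(mask, flow)    # the masked position moves to (3, 3)
    ```

    Applying this warp frame by frame yields a masking volume in which the same underlying content stays masked (or unmasked) over time, which is the property the abstract credits with reducing temporal information leakage.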

    Segmentation of Medical Image Data and Image-Guided Intraoperative Navigation

    Get PDF
    The development of algorithms for the automatic or semi-automatic processing of medical image data has become increasingly important in recent years. This is due on the one hand to ever-improving medical imaging modalities, which can render the human body virtually in ever finer detail, and on the other hand to improved computer hardware, which allows data volumes that partly reach the gigabyte range to be processed algorithmically in a reasonable time. The goal of this habilitation thesis is the development and evaluation of algorithms for medical image processing. The thesis consists of a series of publications organised into three overarching topic areas: segmentation of medical image data using template-based algorithms; experimental evaluation of open-source segmentation methods under medical operating conditions; and navigation to support intraoperative therapies. In the area of segmentation of medical image data using template-based algorithms, various graph-based algorithms in 2D and 3D were developed that construct a directed graph by means of a template. This includes an algorithm for the segmentation of vertebrae in 2D and 3D: a rectangular template in 2D and a cube-shaped template in 3D are used to build the graph and compute the segmentation result. In addition, a graph-based segmentation of prostate glands using a spherical template is presented for the automatic determination of the boundaries between the prostate glands and surrounding organs. Building on the template-based algorithms, an interactive segmentation algorithm was designed and implemented that shows the user the segmentation result in real time. The algorithm uses the various templates for segmentation but requires only a single seed point from the user.
In a further approach, the user can interactively refine the segmentation with additional seed points. This makes it possible to bring semi-automatic segmentation to a satisfactory result even in difficult cases. In the area of evaluation of open-source segmentation methods under medical operating conditions, various freely available segmentation algorithms were tested on patient data from clinical routine. This included the evaluation of the semi-automatic segmentation of brain tumours, for example pituitary adenomas and glioblastomas, with the freely available open-source platform 3D Slicer. It could thereby be shown how a purely manual slice-by-slice measurement of the tumour volume can be supported and accelerated in practice. Furthermore, the segmentation of language-related fibre tracts in medical images of brain tumour patients was evaluated on various platforms. In the area of navigation to support intraoperative therapies, software modules were developed to accompany intraoperative interventions in the different phases of a treatment (therapy planning, execution, monitoring). This includes the first integration of the OpenIGTLink network protocol into the medical prototyping platform MeVisLab, which was evaluated with an NDI navigation system. In addition, the design and implementation of a medical software prototype to support intraoperative gynaecological brachytherapy was presented here for the first time. The software prototype also contained a module for advanced visualisation in MR-guided interstitial gynaecological brachytherapy, which among other things enabled the registration of a gynaecological brachytherapy instrument into an intraoperative patient dataset.
The individual modules led to the presentation of a comprehensive image-guided system for gynaecological brachytherapy in a multimodal operating room. This system covers the pre-, intra-, and postoperative treatment phases of interstitial gynaecological brachytherapy.
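    The template-based graph construction in the first topic area can be illustrated schematically: graph nodes are sampled along rays cast from a user-supplied seed point, following the shape of the template (rectangle, cube, or sphere), and a minimum s-t cut over edges between neighbouring rays then yields the segmentation boundary. A minimal 2D numpy sketch of the node sampling step only (function name and parameters are illustrative, not taken from the thesis):

    ```python
    import numpy as np

    def template_graph_nodes(seed, n_rays=8, n_per_ray=5, radius=10.0):
        """Sample graph nodes along rays cast from a seed point using a
        circular 2D template; a min-cut over edges between neighbouring
        rays would then select one node per ray as the object boundary."""
        sy, sx = seed
        angles = np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False)
        steps = np.linspace(radius / n_per_ray, radius, n_per_ray)
        nodes = np.stack([
            np.array([sy + r * np.sin(a), sx + r * np.cos(a)])
            for a in angles for r in steps
        ])
        return nodes.reshape(n_rays, n_per_ray, 2)  # (ray, step, yx)

    nodes = template_graph_nodes((50, 50))  # template centred on the seed
    ```

    For a 3D cube or sphere template the same idea applies, with rays distributed over the template surface instead of a circle.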