
    Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries

    With advanced image editing tools, one can easily alter the semantic meaning of an image through manipulation techniques such as copy-clone, object splicing, and removal, misleading viewers. Identifying these manipulations, however, is a very challenging task because manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture which utilizes resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder network to segment manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts such as JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency-domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions by incorporating the encoder and LSTM networks. Finally, a decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With the predicted mask provided by the final (softmax) layer of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at the pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.
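    The final stage the abstract describes — a softmax layer producing a pixel-wise tamper mask trained against ground-truth masks — can be sketched in miniature. The toy logits and flattened 2x2 "image" below are hypothetical, not the authors' implementation:

    ```python
    import math

    def softmax(logits):
        """Numerically stable softmax over a list of class logits."""
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def pixelwise_mask_and_loss(logit_map, gt_mask):
        """For each pixel, logit_map holds [non-manipulated, manipulated]
        logits. Returns the predicted binary mask (argmax) and the mean
        cross-entropy against the ground-truth mask, the quantity that
        end-to-end training would back-propagate."""
        pred, total = [], 0.0
        for logits, y in zip(logit_map, gt_mask):
            probs = softmax(logits)
            pred.append(1 if probs[1] > probs[0] else 0)
            total += -math.log(probs[y])  # cross-entropy for true class y
        return pred, total / len(gt_mask)

    # Toy 2x2 image flattened to 4 pixels; pixels 1 and 3 are manipulated.
    logit_map = [[2.0, -1.0], [-1.5, 3.0], [0.5, 0.1], [-2.0, 2.5]]
    gt_mask = [0, 1, 0, 1]
    mask, loss = pixelwise_mask_and_loss(logit_map, gt_mask)  # mask == gt_mask
    ```

    In the real architecture these logits would come from the decoder's upsampled feature maps rather than hand-written lists.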

    Information extraction from multimedia web documents: an open-source platform and testbed

    The LivingKnowledge project aimed to enhance the state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques has been integrated into a single but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval.

    Diverse Cotraining Makes Strong Semi-Supervised Segmentor

    Deep co-training has been introduced to semi-supervised segmentation and achieves impressive results, yet few studies have explored the working mechanism behind it. In this work, we revisit the core assumption that supports co-training: multiple compatible and conditionally independent views. By theoretically deriving the generalization upper bound, we prove that the prediction similarity between two models negatively impacts the models' generalization ability. However, most current co-training models are tightly coupled and violate this assumption. Such coupling leads to homogenization of the networks and confirmation bias, which consequently limits performance. To this end, we explore different dimensions of co-training and systematically increase diversity in input domains, augmentations and model architectures to counteract homogenization. Our Diverse Co-training outperforms state-of-the-art (SOTA) methods by a large margin across different evaluation protocols on Pascal and Cityscapes. For example, we achieve the best mIoU of 76.2%, 77.7% and 80.2% on Pascal with only 92, 183 and 366 labeled images, surpassing the previous best results by more than 5%.
    Comment: ICCV 2023, camera-ready version. Code: https://github.com/williamium3000/diverse-cotraining
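    The quantity at the heart of the abstract's argument is the prediction similarity between the two co-trained models: high agreement signals homogenization, which the derived bound links to weaker generalization. A minimal sketch of measuring that agreement (hypothetical per-pixel predictions, not the paper's code):

    ```python
    def prediction_similarity(preds_a, preds_b):
        """Fraction of pixels on which two co-trained models agree.
        Values near 1.0 indicate homogenized networks; diverse
        co-training aims to drive this similarity down."""
        assert len(preds_a) == len(preds_b)
        agree = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
        return agree / len(preds_a)

    # Hypothetical per-pixel class predictions from models A and B.
    model_a = [0, 1, 1, 2, 0, 1, 2, 2]
    model_b = [0, 1, 2, 2, 0, 0, 2, 1]
    sim = prediction_similarity(model_a, model_b)  # 5/8 agreement
    ```

    In practice this would be computed over dense segmentation maps; the list form only illustrates the statistic.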

    Highly efficient low-level feature extraction for video representation and retrieval.

    Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current content-based video indexing and retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval, to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from compressed-domain features and robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall on the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.
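    The thesis extracts its temporal features directly from the compressed domain; as a simplified pixel-domain stand-in, key-frame extraction via temporal analysis can be illustrated by histogram differencing between consecutive frames (the frames, bin count and threshold below are illustrative assumptions):

    ```python
    def histogram(frame, bins=4, max_val=256):
        """Coarse intensity histogram of a frame (a list of pixel values)."""
        h = [0] * bins
        width = max_val // bins
        for p in frame:
            h[min(p // width, bins - 1)] += 1
        return h

    def key_frames(frames, threshold=0.5):
        """Mark frame i as a key-frame when its normalized histogram
        difference from frame i-1 exceeds threshold (an abrupt shot
        change). Frame 0 always opens the first shot."""
        keys = [0]
        prev = histogram(frames[0])
        for i, f in enumerate(frames[1:], start=1):
            h = histogram(f)
            diff = sum(abs(a - b) for a, b in zip(prev, h)) / (2 * len(f))
            if diff > threshold:
                keys.append(i)
            prev = h
        return keys

    # Two dark frames, then an abrupt cut to two bright frames.
    dark = [10, 20, 30, 15]
    bright = [220, 240, 250, 230]
    kf = key_frames([dark, dark, bright, bright])  # → [0, 2]
    ```

    The real algorithm replaces raw histograms with prediction information already present in the compressed bitstream, which is what makes it fast enough for real time.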

    Weakly supervised segmentation of polyps on colonoscopy images

    Colorectal cancer (CRC) is one of the leading causes of death worldwide and continues to pose a critical public health challenge, demanding precise early detection and intervention. Colonoscopy, the diagnostic examination aimed at exploring the inner walls of the colon to discover any tumour masses, is an effective method to decrease mortality incidence. Emerging techniques, such as advanced image analysis driven by neural networks, hold promise for accurate diagnosis. However, studies have reported that, for various reasons, a certain percentage of polyps are not correctly detected during colonoscopy. One of the most important obstacles is the dependency on pixel-level annotations, which requires substantial computational resources, making innovative solutions necessary. This thesis introduces strategies for improving polyp identification. For this purpose, the main techniques involve so-called Explainable AI tools for analyzing saliency maps and activation maps, through several state-of-the-art visual saliency detection algorithms and Gradient-weighted Class Activation Mapping (Grad-CAM). In addition, a neural network for segmentation with a DeepLabV3+ architecture is used, in which bounding boxes are provided on the training images, within a weakly supervised framework.
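    The weakly supervised setting the abstract describes replaces pixel-level annotations with bounding boxes on the training images. One common way to feed such boxes to a segmentation network is to rasterize them into coarse masks; a minimal sketch (the inclusive-corner box format and sizes are assumptions, not the thesis's pipeline):

    ```python
    def bbox_to_weak_mask(height, width, boxes):
        """Rasterize bounding boxes into a binary mask: pixels inside any
        box become foreground (polyp), all others background. Box-derived
        masks like this can stand in for pixel-level annotations when
        training a segmentation network under weak supervision."""
        mask = [[0] * width for _ in range(height)]
        for (x0, y0, x1, y1) in boxes:  # inclusive corners, hypothetical format
            for y in range(max(0, y0), min(height, y1 + 1)):
                for x in range(max(0, x0), min(width, x1 + 1)):
                    mask[y][x] = 1
        return mask

    # A 6x8 image with one hypothetical polyp box spanning 3x3 pixels.
    mask = bbox_to_weak_mask(6, 8, [(2, 1, 4, 3)])
    foreground = sum(sum(row) for row in mask)  # 9 foreground pixels
    ```

    Saliency maps or Grad-CAM activations can then refine these coarse box masks toward the true polyp outline, which is the role Explainable AI tools play in the thesis.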