Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries
With advanced image journaling tools, one can easily alter the semantic
meaning of an image by exploiting certain manipulation techniques such as
copy-clone, object splicing, and removal, which mislead viewers. In
contrast, identifying these manipulations is a very challenging task, as
manipulated regions are not visually apparent. This paper proposes a
high-confidence manipulation localization architecture that utilizes
resampling features, Long Short-Term Memory (LSTM) cells, and an
encoder-decoder network to segment manipulated regions from non-manipulated
ones. Resampling features are used to capture artifacts such as JPEG quality
loss, upsampling, downsampling, rotation, and shearing. The proposed network
exploits larger receptive fields (spatial maps) and frequency-domain
correlations to analyze the discriminative characteristics between
manipulated and non-manipulated regions by incorporating the encoder and
LSTM network. Finally, the decoder network learns the mapping from
low-resolution feature maps to pixel-wise predictions for image tamper
localization. With the predicted mask provided by the final (softmax) layer
of the proposed architecture, end-to-end training is performed to learn the
network parameters through back-propagation using ground-truth masks.
Furthermore, a large image splicing dataset is introduced to guide the
training process. The proposed method is capable of localizing image
manipulations at the pixel level with high precision, which is demonstrated
through rigorous experimentation on three diverse datasets.
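A minimal sketch of the kind of frequency-domain resampling cue such a detector can feed to its LSTM/encoder stage: a radial histogram of a patch's FFT magnitude spectrum, since resampling operations (up/downsampling, rotation, shearing) leave periodic correlations that appear as spectral peaks. This is an illustrative simplification, not the paper's exact resampling features:

```python
import numpy as np

def resampling_features(patch, radial_bins=8):
    """Summarize a grayscale patch's magnitude spectrum as a radial
    histogram. Resampling leaves periodic correlations that show up
    as peaks in the spectrum; binning by radius is one simple way to
    capture them. (Hypothetical simplification of the paper's features.)"""
    f = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
    h, w = f.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)  # radius of each frequency bin
    edges = np.linspace(0, r.max() + 1e-9, radial_bins + 1)
    return np.array([f[(r >= lo) & (r < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

patch = np.random.default_rng(0).random((32, 32))
feats = resampling_features(patch)  # one 8-dim feature vector per patch
```

In the paper's pipeline, one such feature vector per patch would be fed, patch by patch, into the LSTM cells alongside the spatial encoder features.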
Information extraction from multimedia web documents: an open-source platform and testbed
The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques has been integrated into a single but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval.
Diverse Cotraining Makes Strong Semi-Supervised Segmentor
Deep co-training has been introduced to semi-supervised segmentation and
achieves impressive results, yet few studies have explored the working
mechanism behind it. In this work, we revisit the core assumption that supports
co-training: multiple compatible and conditionally independent views. By
theoretically deriving the generalization upper bound, we prove that the
prediction similarity between two models negatively impacts each model's
generalization ability. However, most current co-training models are tightly coupled together
and violate this assumption. Such coupling leads to the homogenization of
networks and confirmation bias which consequently limits the performance. To
this end, we explore different dimensions of co-training and systematically
increase the diversity from the aspects of input domains, different
augmentations and model architectures to counteract homogenization. Our Diverse
Co-training outperforms the state-of-the-art (SOTA) methods by a large margin
across different evaluation protocols on Pascal and Cityscapes. For
example, we achieve a best mIoU of 76.2%, 77.7% and 80.2% on Pascal with only
92, 183 and 366 labeled images, surpassing the previous best results by more
than 5%.
Comment: ICCV 2023, camera-ready version. Code:
https://github.com/williamium3000/diverse-cotraining
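The cross pseudo-labelling idea behind co-training can be sketched on a toy problem: two deliberately diverse models each train on pseudo-labels supplied by the other, with a confidence filter to reduce noise. Everything here (nearest-centroid "models", the toy input transforms, the distance-based confidence proxy) is an illustrative stand-in for the paper's deep segmentation networks and its diversity in domains, augmentations and architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class problem: a few labeled points, a large unlabeled pool
# drawn from the same two clusters (centred at (0,0) and (4,4)).
x_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
offsets = rng.choice([0.0, 4.0], size=(200, 1))
x_unl = rng.normal(size=(200, 2)) + offsets
y_true = (offsets[:, 0] == 4.0).astype(int)  # held out, never used to train

def fit_centroids(x, y):
    return np.stack([x[y == c].mean(0) for c in (0, 1)])

def predict(cent, x):
    d = ((x[:, None] - cent[None]) ** 2).sum(-1)
    return d.argmin(1), -d.min(1)  # labels and a confidence proxy

# Two deliberately diverse "views" of the input: identity vs. a
# flipped-and-rescaled transform (standing in for diverse
# augmentations/architectures, which counteracts homogenization).
views = [lambda x: x, lambda x: x[:, ::-1] * 1.5]
cents = [fit_centroids(v(x_lab), y_lab) for v in views]

for _ in range(5):  # cross pseudo-labelling rounds
    new = []
    for i, v in enumerate(views):
        peer = 1 - i  # the *other* model supplies pseudo-labels
        pl, conf = predict(cents[peer], views[peer](x_unl))
        keep = conf > np.median(conf)  # keep only the more confident half
        new.append(fit_centroids(
            np.concatenate([v(x_lab), v(x_unl[keep])]),
            np.concatenate([y_lab, pl[keep]]),
        ))
    cents = new

acc = (predict(cents[0], x_unl)[0] == y_true).mean()
```

The key point the abstract makes is encoded in `views`: if both views were identical, the two models would converge to the same predictions (homogenization) and the peer's pseudo-labels would only amplify confirmation bias.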
Highly efficient low-level feature extraction for video representation and retrieval.
Witnessing the omnipresence of digital video media, the research community has
raised the question of its meaningful use and management. Stored in immense
multimedia databases, digital videos need to be retrieved and structured in an
intelligent way, relying on the content and the rich semantics involved. Current
Content Based Video Indexing and Retrieval systems face the problem of the semantic
gap between the simplicity of the available visual features and the richness of user
semantics.
This work focuses on the issues of efficiency and scalability in video indexing and
retrieval to facilitate a video representation model capable of semantic annotation. A
highly efficient algorithm for temporal analysis and key-frame extraction is developed.
It is based on the prediction information extracted directly from the compressed domain
features and the robust scalable analysis in the temporal domain. Furthermore,
a hierarchical quantisation of the colour features in the descriptor space is presented.
Derived from the extracted set of low-level features, a video representation model that
enables semantic annotation and contextual genre classification is designed.
Results demonstrate the efficiency and robustness of the temporal analysis algorithm
that runs in real time maintaining the high precision and recall of the detection task.
Adaptive key-frame extraction and summarisation achieve a good overview of the
visual content, while the colour quantisation algorithm efficiently creates a
hierarchical set of descriptors. Finally, the video representation model,
supported by the genre classification algorithm, achieves excellent results in
an automatic annotation system by linking the video clips with a limited
lexicon of related keywords.
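The temporal-analysis step can be illustrated with a toy shot-boundary detector and key-frame picker. The raw pixel differences and middle-of-shot heuristic below are illustrative stand-ins; the thesis operates on prediction information extracted directly from compressed-domain features:

```python
import numpy as np

def detect_shots(frames, threshold=0.25):
    """Toy shot-boundary detector: declare a cut when the mean absolute
    difference between consecutive frames exceeds `threshold`."""
    cuts = [0]
    for i in range(1, len(frames)):
        if np.abs(frames[i] - frames[i - 1]).mean() > threshold:
            cuts.append(i)
    return cuts

def key_frames(frames, cuts):
    """Pick the middle frame of each shot as its key-frame
    (a simple stand-in for adaptive key-frame selection)."""
    bounds = cuts + [len(frames)]
    return [(lo + hi) // 2 for lo, hi in zip(bounds[:-1], bounds[1:])]

# Synthetic "video": two constant shots with a hard cut at frame 5.
frames = [np.zeros((8, 8))] * 5 + [np.ones((8, 8))] * 5
cuts = detect_shots(frames)        # [0, 5]
keys = key_frames(frames, cuts)    # [2, 7]
```

Working on compressed-domain motion/prediction data instead of decoded pixels is what lets the real algorithm run in real time: the cut decision is made from features the codec has already computed.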
Weakly supervised segmentation of polyps on colonoscopy images
Colorectal cancer (CRC) is one of the leading causes of death worldwide and continues to pose a critical public health challenge, demanding precise early detection and intervention. Colonoscopy, the diagnostic examination aimed at exploring the inner walls of the colon to discover any tumour masses, is an effective method to decrease mortality incidence. Emerging techniques, such as advanced image analysis driven by neural networks, hold promise for accurate diagnosis.
However, studies have reported that, for various reasons, a certain percentage of polyps are not correctly detected during colonoscopy. One of the most important reasons is the dependency on pixel-level annotations, which requires substantial resources, making innovative solutions necessary. This thesis introduces strategies for improving polyp identification. To this end, the main techniques involve so-called Explainable AI tools for analyzing saliency and activation maps, through several state-of-the-art visual saliency detection algorithms and Gradient-weighted Class Activation Mapping (Grad-CAM). In addition, a neural network for segmentation with the DeepLabV3+ architecture is used, in which bounding boxes are provided on the training images, within a weakly supervised framework.
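One way box-level weak supervision enters such a pipeline is by rasterizing bounding-box annotations into coarse training masks for the segmentation network. The sketch below is a hypothetical simplification of that step; the thesis additionally refines supervision with saliency and Grad-CAM cues:

```python
import numpy as np

def boxes_to_weak_mask(shape, boxes):
    """Turn bounding boxes into a coarse (noisy) training mask:
    pixels inside any box become foreground (1), the rest background (0).
    Boxes are (x0, y0, x1, y1) with exclusive upper bounds.
    Illustrative stand-in for the weak labels fed to DeepLabV3+."""
    mask = np.zeros(shape, dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1  # note: rows are y, columns are x
    return mask

# Two hypothetical polyp boxes on a 64x64 frame.
mask = boxes_to_weak_mask((64, 64), [(10, 10, 30, 25), (40, 40, 60, 60)])
```

The resulting mask over-labels background pixels inside each box as foreground, which is exactly the annotation noise that the saliency/Grad-CAM refinement is meant to correct.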