10 research outputs found
Analysis of User Interaction on Social Media for Preventing Hoax Videos, and a High-Level Detection Architecture Model
The spread of hoax news built on recycled video content is a striking phenomenon on social media, appearing not only among adult users but across all age groups. Its most tangible effect is division in society, because footage that has aired or existed before is taken as strong evidence validating the content being viewed. It is therefore important to detect hoax news with recycled video content and stop its negative effects on individuals and society. This study introduces a high-level detection architecture model for a system that analyzes hoax news with reused or repeated video content on social media; the architecture is designed using deep-learning video processing, speech-to-text, and several content-based and context-based features. The spread of hoax content with recycled video is expected to be preventable if it can be filtered before it appears in the timeline. The architecture model is intended as a reference for building a real system.
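The abstract names the signals the architecture combines (video frames, a speech-to-text transcript, content-based and context-based features) without specifying interfaces. A minimal sketch of such a pipeline, with every component name hypothetical and passed in as a callable, might look like:

```python
def detect_hoax_video(video, extract_frames, speech_to_text,
                      content_features, context_features, classifier):
    # Hypothetical pipeline skeleton, not the paper's actual design:
    # combine visual, transcript, and contextual signals, then classify
    # so a flagged post can be filtered before reaching the timeline.
    frames = extract_frames(video)
    transcript = speech_to_text(video)
    features = content_features(frames, transcript) + context_features(video)
    return classifier(features)  # True -> filter the post
```

Each callable would be backed by a real model (e.g. a deep video encoder, an ASR system); here they are stubs to show how the stages compose.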
Graph Based Video Sequence Matching & BoF Method for Video Copy detection
In this paper we propose a video copy detection method that uses Bag-of-Features and displays an acyclic graph of matching video frames. The method uses both local (line, texture, color) and global (Scale Invariant Feature Transform, i.e. SIFT) features. The video is first divided into small frames using a dual-threshold method, which eliminates redundant frames and selects unique key frames. Binary features, known as a Bag of Features (BoF), are then extracted from each key frame and stored in the database in matrix form. When a query video is uploaded, the same features are extracted and compared against the stored database to detect a copied video. If the video is detected as a copy, the actual matched sequence between key frames is displayed as an acyclic graph using the Graph Based Sequence Matching method.
DOI: 10.17762/ijritcc2321-8169.15067
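The dual-threshold keyframe step in the abstract can be illustrated with a simplified sketch: a frame-to-frame difference above a high threshold marks an abrupt cut, while smaller differences above a low threshold are accumulated so gradual transitions also produce a key frame. The thresholds and the accumulation rule here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def select_keyframes(frames, low=5.0, high=20.0):
    # Simplified dual-threshold selection: frames are grayscale arrays.
    # diff > high     -> abrupt cut, keep the frame as a key frame
    # low < diff      -> accumulate; a large accumulated change also
    #                    yields a key frame (gradual transition)
    # diff <= low     -> redundant frame, reset the accumulator
    keyframes, acc = [0], 0.0
    for i in range(1, len(frames)):
        diff = float(np.mean(np.abs(
            frames[i].astype(np.int16) - frames[i - 1].astype(np.int16))))
        if diff > high or (diff > low and (acc := acc + diff) > high):
            keyframes.append(i)
            acc = 0.0
        elif diff <= low:
            acc = 0.0
    return keyframes
```

In the full method, SIFT-based binary features would then be extracted from each selected key frame and stored as the BoF matrix.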
TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision
Video copy localization aims to precisely localize all the copied segments
within a pair of untrimmed videos in video retrieval applications. Previous
methods typically start from frame-to-frame similarity matrix generated by
cosine similarity between frame-level features of the input video pair, and
then detect and refine the boundaries of copied segments on similarity matrix
under temporal constraints. In this paper, we propose TransVCL: an
attention-enhanced video copy localization network, which is optimized directly
from initial frame-level features and trained end-to-end with three main
components: a customized Transformer for feature enhancement, a correlation and
softmax layer for similarity matrix generation, and a temporal alignment module
for copied segment localization. In contrast to previous methods, which demand a
handcrafted similarity matrix, TransVCL incorporates long-range temporal
information between the feature sequence pair using self- and cross-attention
layers. With the joint design and optimization of three components, the
similarity matrix can be learned to present more discriminative copied
patterns, leading to significant improvements over previous methods on
segment-level labeled datasets (VCSL and VCDB). Besides the state-of-the-art
performance in fully supervised setting, the attention architecture facilitates
TransVCL to further exploit unlabeled or simply video-level labeled data.
Additional experiments of supplementing video-level labeled datasets including
SVD and FIVR reveal the high flexibility of TransVCL from full supervision to
semi-supervision (with or without video-level annotation). Code is publicly
available at https://github.com/transvcl/TransVCL. Accepted by the
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023).
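The frame-to-frame similarity matrix that previous methods start from (and that TransVCL learns to produce through its correlation and softmax layer) is just the cosine similarity between every pair of frame-level feature vectors. A minimal sketch of that baseline matrix:

```python
import numpy as np

def similarity_matrix(a, b):
    # a: (N, D) frame features of video 1; b: (M, D) features of video 2.
    # L2-normalize rows, then cosine similarity reduces to a dot product,
    # giving the (N, M) frame-to-frame similarity matrix.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T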
Perceptual Video Hashing for Content Identification and Authentication
Perceptual hashing has been broadly used in the literature to identify similar contents for video copy detection. It has also been adopted to detect malicious manipulations for video authentication. However, targeting both applications with a single system using the same hash would be highly desirable, as this saves storage space and reduces computational complexity. This paper proposes a perceptual video hashing system for content identification and authentication. The objective is to design a hash extraction technique that can withstand signal processing operations on the one hand and detect malicious attacks on the other. The proposed system relies on a new signal calibration technique for extracting the hash using the discrete cosine transform (DCT) and the discrete sine transform (DST). This consists of determining the number of samples, called the normalizing shift, that is required for shifting a digital signal so that the shifted version matches a certain pattern according to DCT/DST coefficients. The rationale for the calibration idea is that the normalizing shift resists signal processing operations while it exhibits sensitivity to local tampering (i.e., replacing a small portion of the signal with a different one). While the same hash serves both applications, two different similarity measures have been proposed for video identification and authentication, respectively. Through intensive experiments with various types of video distortions and manipulations, the proposed system has been shown to outperform related state-of-the-art video hashing techniques in terms of identification and authentication, with the advantageous ability to locate tampered regions.
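The normalizing-shift idea can be sketched as follows: cyclically shift the signal until its transform coefficients satisfy some fixed pattern, and record that shift. The criterion below (maximize the first AC DCT coefficient) is a hypothetical stand-in for the paper's actual DCT/DST pattern-matching rule; the key property it illustrates is that cyclic shifts of the same signal are mapped to the same canonical version.

```python
import numpy as np

def dct2(x):
    # Naive DCT-II computed from the definition (fine for short signals):
    # X_k = sum_n x_n * cos(pi * k * (2n + 1) / (2N))
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

def normalizing_shift(signal):
    # Hypothetical criterion: pick the cyclic shift that maximizes the
    # first AC DCT coefficient of the shifted signal.
    best, best_val = 0, -np.inf
    for s in range(len(signal)):
        c = dct2(np.roll(signal, s))
        if c[1] > best_val:
            best, best_val = s, c[1]
    return best
```

Because the criterion depends only on the signal's content, shifted copies normalize to the same version, which is what makes the shift robust to global operations yet sensitive to local tampering.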
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval
In this paper, we address the problem of high performance and computationally
efficient content-based video retrieval in large-scale datasets. Current
methods typically propose either: (i) fine-grained approaches employing
spatio-temporal representations and similarity calculations, achieving high
performance at a high computational cost or (ii) coarse-grained approaches
representing/indexing videos as global vectors, where the spatio-temporal
structure is lost, providing low performance but also having low computational
cost. In this work, we propose a Knowledge Distillation framework, which we
call Distill-and-Select (DnS), that starting from a well-performing
fine-grained Teacher Network learns: a) Student Networks at different retrieval
performance and computational efficiency trade-offs and b) a Selection Network
that at test time rapidly directs samples to the appropriate student to
maintain both high retrieval performance and high computational efficiency. We
train several students with different architectures and arrive at different
trade-offs of performance and efficiency, i.e., speed and storage requirements,
including fine-grained students that store index videos using binary
representations. Importantly, the proposed scheme allows Knowledge Distillation
in large, unlabelled datasets -- this leads to good students. We evaluate DnS
on five public datasets on three different video retrieval tasks and
demonstrate a) that our students achieve state-of-the-art performance in
several cases and b) that our DnS framework provides an excellent trade-off
between retrieval performance, computational speed, and storage space. In
specific configurations, our method achieves similar mAP with the teacher but
is 20 times faster and requires 240 times less storage space. Our collected
dataset and implementation are publicly available:
https://github.com/mever-team/distill-and-select
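The selection step in DnS can be sketched as a simple router: score every pair cheaply with the coarse-grained student, and let the selection network decide which pairs are worth re-scoring with an expensive fine-grained student. The threshold and the callables below are illustrative assumptions, not the paper's trained networks.

```python
def distill_and_select(query, target, selector, coarse_student,
                       fine_student, threshold=0.5):
    # Cheap coarse similarity first; the selector decides whether this
    # pair needs re-scoring by the expensive fine-grained student.
    sim = coarse_student(query, target)
    if selector(query, target) > threshold:
        sim = fine_student(query, target)
    return sim
```

The efficiency gain comes from the fine-grained student only ever seeing the small fraction of pairs the selector routes to it.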
An image-based approach to video copy detection with spatio-temporal post-filtering
This paper introduces a video copy detection system which efficiently matches individual frames and then verifies their spatio-temporal consistency. The approach for matching frames relies on a recent local feature indexing method, which is at the same time robust to significant video transformations and efficient in terms of memory usage and computation time. We match either keyframes or uniformly sampled frames. To further improve the results, a verification step robustly estimates a spatio-temporal model between the query video and the potentially corresponding video segments. Experimental results evaluate the different parameters of our system and measure the trade-off between accuracy and efficiency. We show that our system obtains excellent results for the TRECVID 2008 copy detection task.
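One classic way to implement the temporal part of such a verification step (a simplified stand-in for the paper's full spatio-temporal model) is offset voting: each frame match votes for the time offset between query and database video, and a dominant offset indicates a temporally consistent copy.

```python
from collections import Counter

def temporal_offset(matches):
    # matches: list of (query_time, db_time) pairs from frame matching.
    # Each pair votes for an offset; the winning offset and the fraction
    # of matches supporting it measure temporal consistency.
    votes = Counter(db_t - q_t for q_t, db_t in matches)
    offset, count = votes.most_common(1)[0]
    return offset, count / len(matches)
```

A low supporting fraction would reject the candidate segment as a spurious set of frame matches.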
R-Forest for Approximate Nearest Neighbor Queries in High Dimensional Space
Searching high-dimensional space has been a challenge and an area of intense research for many years. The dimensionality curse has rendered most existing index methods all but useless, causing people to research other techniques. In my dissertation I will try to resurrect one of the best-known index structures, the R-Tree, which most have given up on as a viable method of answering high-dimensional queries. I have pointed out the various advantages of the R-Tree as a method for answering approximate nearest neighbor queries, and the advantages of locality sensitive hashing and the locality sensitive B-Tree, which are the most successful methods today. I started by looking at improving the maintenance of the R-Tree through bulk loading and insertion. I proposed and implemented a new method that bulk loads the index, which was an improvement over the standard method. I then turned my attention to nearest neighbor queries, which is a much more challenging problem, especially in high-dimensional space. Initially I developed a set of heuristics, easily implemented in an R-Tree, which improved the efficiency of high-dimensional approximate nearest neighbor queries. To further refine my method I took another approach, developing a new model, known as R-Forest, which takes advantage of space partitioning while still using the R-Tree as its index structure. With this new approach I was able to implement new heuristics and can show that R-Forest, comprised of a set of R-Trees, is a viable solution to high-dimensional approximate nearest neighbor queries when compared to established methods.
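The space-partitioning idea behind R-Forest can be illustrated with a toy sketch: split the points into partitions (each of which would hold its own R-Tree in the real structure) and answer a query by searching only the partition the query falls into. The random-projection partitioning rule and brute-force per-partition search below are illustrative assumptions, not the dissertation's actual heuristics.

```python
import numpy as np

def build_forest(points, n_parts=4, seed=0):
    # Partition points by quantiles of a random 1-D projection; each
    # partition stands in for one R-Tree of the forest (brute force here).
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=points.shape[1])
    proj = points @ direction
    edges = np.quantile(proj, np.linspace(0, 1, n_parts + 1)[1:-1])
    labels = np.searchsorted(edges, proj)
    return direction, edges, labels

def approx_nn(query, points, forest):
    # Approximate: only the query's own partition is searched, so the
    # true nearest neighbor may be missed if it lies across a boundary.
    direction, edges, labels = forest
    part = np.searchsorted(edges, query @ direction)
    idx = np.where(labels == part)[0]
    return idx[np.argmin(np.linalg.norm(points[idx] - query, axis=1))]
```

The trade-off is the usual one for partition-based ANN: searching one partition is cheap, at the cost of occasionally missing neighbors near partition boundaries.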