
    Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor

    We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive, MB pixel decoding (a.k.a. MB texture decoding). This in turn allows for the derivation of spatio-temporal video activity regions at extremely high speed in comparison to conventional full-frame decoding followed by optical flow estimation. In order to evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores to derive the final classification of each test video. Evaluation on two standard datasets shows that the proposed approach is competitive with the best two-stream video classification approaches found in the literature. At the same time: (i) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; (ii) selective decoding is up to 12 times faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods from the literature. Comment: Accepted in IEEE Transactions on Circuits and Systems for Video Technology. Extension of ICIP 2017 conference paper.
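    As a rough illustration of the late-fusion step described above, the following is a minimal sketch (not the authors' code) of combining softmax scores from a hypothetical MB-texture CNN and a motion-vector CNN; the stream weighting and class count are placeholders.

```python
import numpy as np

def fuse_two_stream_scores(texture_logits, mv_logits, w_texture=0.5):
    """Late fusion of per-class scores from two streams.

    texture_logits, mv_logits: (num_classes,) raw scores from the
    MB-texture and motion-vector CNNs (hypothetical model outputs).
    """
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    probs = w_texture * softmax(texture_logits) \
            + (1.0 - w_texture) * softmax(mv_logits)
    return int(np.argmax(probs)), probs

# Example with random scores for a 5-class problem:
label, probs = fuse_two_stream_scores(np.random.randn(5), np.random.randn(5))
```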

    Escaping endpoints explode

    In 1988, Mayer proved the remarkable fact that infinity is an explosion point for the set of endpoints of the Julia set of an exponential map that has an attracting fixed point. That is, the set is totally separated (in particular, it does not have any nontrivial connected subsets), but its union with the point at infinity is connected. Answering a question of Schleicher, we extend this result to the set of "escaping endpoints" in the sense of Schleicher and Zimmer, for any exponential map for which the singular value belongs to an attracting or parabolic basin, has a finite orbit, or escapes to infinity under iteration (as well as many other classes of parameters). Furthermore, we extend one direction of the theorem to much greater generality, by proving that the set of escaping endpoints joined with infinity is connected for any transcendental entire function of finite order with bounded singular set. We also discuss corresponding results for *all* endpoints in the case of exponential maps; in order to do so, we establish a version of Thurston's "no wandering triangles" theorem. Comment: 35 pages. To appear in Comput. Methods Funct. Theory. V2: Authors' final accepted manuscript. Revisions and clarifications have been made throughout from V1. This includes improvements in the proofs of Proposition 6.11 and Theorem 8.1, as well as corrections in Remarks 7.1 and 7.3 concerning differing definitions of escaping endpoints in greater generality.
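    For intuition about the "escapes to infinity under iteration" condition, here is a minimal numerical sketch, assuming the standard exponential family E(z) = λe^z; the iteration cap and escape radius are arbitrary numerical choices, not part of the paper's analysis.

```python
import cmath

def escapes(z, lam, max_iter=100, radius=700.0):
    """Numerically test whether z appears to escape to infinity under
    iteration of E(z) = lam * exp(z). Re(z) > 700 is treated as escaped,
    since exp would overflow double precision beyond that point."""
    for _ in range(max_iter):
        z = lam * cmath.exp(z)
        if z.real > radius:  # escape happens along Re(z) -> +infinity
            return True
    return False

print(escapes(50 + 0j, lam=1.0))  # True: the orbit blows up immediately
print(escapes(0 + 0j, lam=0.2))   # False: attracted to the real fixed point
```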

    Vectors of Locally Aggregated Centers for Compact Video Representation

    We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state of the art in visual search is formed by the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim of increasing robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video segment similarity detection. Experimentation using a video dataset comprising more than 1000 minutes of content from the Open Video Project shows that VLAC obtains substantial gains in terms of mean Average Precision (mAP) against VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions. Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME 2015, Torino, Italy.
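    The two-level aggregation can be sketched as follows; this is an illustrative reconstruction, assuming scikit-learn's KMeans for the clustering step, and not the authors' implementation. Here `clfc` stands for the centers of local feature centers learned offline on a training set.

```python
import numpy as np
from sklearn.cluster import KMeans

def vlac_descriptor(sift_descriptors, clfc, n_lfc=16):
    """VLAC-style encoding of one video segment (illustrative sketch).

    sift_descriptors: (N, 128) SIFT vectors pooled over the segment.
    clfc: (K, 128) centers of local feature centers from training.
    """
    # Step 1: cluster the segment's descriptors into local feature centers.
    lfc = KMeans(n_clusters=n_lfc, n_init=10).fit(sift_descriptors).cluster_centers_

    # Step 2: aggregate each LFC's residual against its nearest CLFC.
    agg = np.zeros_like(clfc)
    for c in lfc:
        k = np.argmin(np.linalg.norm(clfc - c, axis=1))
        agg[k] += c - clfc[k]  # sum of differences

    v = agg.ravel()
    return v / (np.linalg.norm(v) + 1e-12)  # L2-normalized compact descriptor
```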

    Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks

    Advanced video classification systems decode video frames to derive the necessary texture and motion representations for ingestion and analysis by spatio-temporal deep convolutional neural networks (CNNs). However, when considering visual Internet-of-Things applications, surveillance systems and semantic crawlers of large video repositories, the video capture and the CNN-based semantic analysis parts do not tend to be co-located. This necessitates the transport of compressed video over networks and incurs significant overhead in bandwidth and energy consumption, thereby significantly undermining the deployment potential of such systems. In this paper, we investigate the trade-off between the encoding bitrate and the achievable accuracy of CNN-based video classification models that directly ingest AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video bitstreams and applying complex optical flow calculations prior to CNN processing, we only retain motion vectors and selected texture information at significantly reduced bitrates and apply no additional processing prior to CNN ingestion. Based on three CNN architectures and two action recognition datasets, we achieve 11%–94% savings in bitrate with marginal effect on classification accuracy. A model-based selection between multiple CNNs increases these savings further, to the point where, if up to 7% loss of accuracy can be tolerated, video classification can take place with as little as 3 kbps for the transport of the required compressed video information to the system implementing the CNN models.
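    The model-based selection between multiple CNNs can be illustrated with a small sketch; the operating points below are hypothetical numbers standing in for bitrate/accuracy profiles that would be measured on a validation set.

```python
# Hypothetical (bitrate in kbps, accuracy) operating points per CNN.
operating_points = {
    "cnn_small":  (3.0, 0.81),
    "cnn_medium": (15.0, 0.86),
    "cnn_large":  (60.0, 0.88),
}

def select_model(bitrate_budget_kbps):
    """Pick the most accurate CNN whose required input bitrate fits the budget."""
    feasible = {name: acc for name, (rate, acc) in operating_points.items()
                if rate <= bitrate_budget_kbps}
    return max(feasible, key=feasible.get) if feasible else None

print(select_model(20.0))  # -> "cnn_medium" under these assumed numbers
```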

    Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity

    When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever the aforementioned limitations exist. Our contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and evaluate our proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum amount of elements necessary for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred on optical flow derived from the resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate degrees of variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the required temporal extent for accurate classification prior to processing via learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured over others. We finally note that our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data.
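    Contribution (iii), inferring the required temporal extent before full processing, might look roughly like the sketch below; the per-frame probability source and the confidence threshold are assumptions made for illustration.

```python
import numpy as np

def classify_with_minimal_extent(frame_probs, threshold=0.9):
    """Consume frames only until the running class-probability average is
    confident enough (illustrative threshold). Assumes at least one frame.

    frame_probs: iterable of (num_classes,) probability vectors, e.g. from
    a lightweight per-frame model (hypothetical).
    """
    running, frames_used = None, 0
    for p in frame_probs:
        frames_used += 1
        running = p if running is None else running + (p - running) / frames_used
        if running.max() >= threshold:
            break  # early termination: enough temporal extent seen
    return int(np.argmax(running)), frames_used
```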

    Comparison of Deep Learning, Naïve Bayes and Random Forest Methods for Heart Disease Prediction

    The heart is an organ that plays an important role in human survival, as its function is to distribute blood from the lungs to all parts of the body; this blood carries a great deal of oxygen, supporting metabolic processes in the human body. Many activities in the human body cannot be predicted in a general form. A heart attack is one of them: a very serious event in the human body that causes death. Although barely visible under normal conditions, it occurs suddenly, making it one of the most unexpected events in the human body. With advances in technology, several data mining algorithms have been developed to predict heart attacks. Accordingly, different data mining algorithms with machine learning approaches are able to predict the occurrence of a heart attack in the human body. This is a typical diagnosis task, but it must be accomplished accurately and efficiently with the help of machine learning. This study is an attempt to model and solve the heart attack prediction problem. Different machine learning algorithms, namely Deep Learning, Naïve Bayes and Random Forest, are used here to build the models in this study; the machine learning approach is a good approach for predicting the occurrence of heart attacks. The dataset was taken from Kaggle, titled "Heart Attack Analysis and Prediction Dataset". The highest accuracy achieved was with the deep learning algorithm, which produced an accuracy of 83.49%.
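    A minimal sketch of such a three-way comparison using scikit-learn is shown below; the file name, label column, and model hyperparameters are assumptions based on the common layout of the Kaggle dataset, not the study's exact setup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# "heart.csv" and the "output" label column follow the common layout of
# the Kaggle heart attack dataset; adjust to the actual files used.
df = pd.read_csv("heart.csv")
X, y = df.drop(columns=["output"]), df["output"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Deep Learning (MLP)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200),
}
for name, model in models.items():
    print(name, model.fit(X_tr, y_tr).score(X_te, y_te))  # test accuracy
```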

    The Shia Migration from Southwestern Iran to Kuwait: Push-Pull Factors during the Late Nineteenth and Early Twentieth Centuries

    This study explores the “push-pull” dynamics of Shia migration from southwestern Iran (Fars, Khuzestan and the Persian Gulf coast) to Kuwait during the late nineteenth and early twentieth centuries. Although Shias nowadays constitute thirty-five percent of the Kuwaiti population and their historical role in building the state of Kuwait has been substantial, no individual study has delved into the causes of Shia migration from Iran to Kuwait. By analyzing the internal political, economic, and social conditions of both regions in the context of the Gulf sheikhdoms, the British and Ottoman empires, and other great powers interested in dominating the Gulf region, my thesis examines why Shia migrants, such as merchants, artisans and laborers, left southwestern Iran and chose Kuwait as their final destination to settle. The two-way trade between southwest Iran and Kuwait provided a pathway for the Shia migrants and settlers into Kuwait. Moreover, by highlighting the economic roles of the Shia community in Kuwait, my thesis enhances our understanding of the foundation and contributions of the Shia community in Kuwait. Thus, it fills a significant gap in Kuwaiti historiography. The research for this thesis draws from a variety of primary sources, including British government documents, the writings of western travelers, the Almatrook business archive, and oral-history interviews with descendants of Shia immigrants to Kuwait.

    Computer Vision for a Camel-Vehicle Collision Mitigation System

    As the population grows and more land is used for urbanization, ecosystems are disrupted by our roads and cars. This expansion of infrastructure cuts through wildlife territories, leading to many instances of Wildlife-Vehicle Collision (WVC). WVC is a global issue with significant socio-economic impact, resulting in billions of dollars in property damage and, at times, fatalities for vehicle occupants. The issue is similar in Saudi Arabia, where instances of Camel-Vehicle Collision (CVC) are particularly deadly due to the large size of camels, resulting in a 25% fatality rate [4]. The focus of this work is to test different object detection models on the task of detecting camels on the road. The Deep Learning (DL) object detection models used in the experiments are: CenterNet, EfficientDet, Faster R-CNN, and SSD. Results of the experiments show that CenterNet performed the best in terms of accuracy and was the most efficient in training. In the future, the plan is to expand on this work by developing a system to make countryside roads safer.
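    For illustration, one of the listed detectors (Faster R-CNN) can be run off the shelf via torchvision, as sketched below; this uses generic COCO weights and an assumed image path, and does not reproduce the paper's camel-specific training or evaluation.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Off-the-shelf Faster R-CNN with COCO weights as a stand-in detector.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = to_tensor(Image.open("road_scene.jpg").convert("RGB"))  # assumed path
with torch.no_grad():
    pred = model([img])[0]  # dict with "boxes", "labels", "scores"

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.5:
        print(int(label), float(score), box.tolist())
```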

    Biased Mixtures Of Experts: Enabling Computer Vision Inference Under Data Transfer Limitations

    We propose a novel mixture-of-experts class to optimize computer vision models in accordance with data transfer limitations at test time. Our approach postulates that the minimum acceptable amount of data allowing for highly-accurate results can vary for different input space partitions. Therefore, we consider mixtures where experts require different amounts of data, and train a sparse gating function to divide the input space for each expert. By appropriate hyperparameter selection, our approach is able to bias mixtures of experts towards selecting specific experts over others. In this way, we show that the data transfer optimization between visual sensing and processing can be solved as a convex optimization problem. To demonstrate the relation between data availability and performance, we evaluate biased mixtures on a range of mainstream computer vision problems, namely: (i) single-shot detection, (ii) image super-resolution, and (iii) real-time video action classification. For all cases, and when experts constitute modified baselines that meet different limits on allowed data utility, biased mixtures significantly outperform previous work optimized to meet the same constraints on available data.
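    The biasing idea can be sketched as a cost penalty subtracted from the gate scores before top-1 expert selection; the logits, costs, and bias strength below are illustrative, not the trained gating function from the paper.

```python
import numpy as np

def biased_gate(gate_logits, expert_data_cost, bias_strength=1.0):
    """Top-1 gating biased towards experts that need less input data.

    gate_logits: (num_experts,) scores from a learned gating function.
    expert_data_cost: (num_experts,) relative input-data cost per expert.
    bias_strength is the hyperparameter trading accuracy for data transfer.
    """
    biased = np.asarray(gate_logits) - bias_strength * np.asarray(expert_data_cost)
    return int(np.argmax(biased))

# Three experts requiring increasing amounts of input data: the cheapest
# expert wins unless a pricier one is much more confident.
print(biased_gate([1.0, 1.2, 1.3], [0.1, 0.5, 1.0], bias_strength=1.0))  # -> 0
```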

    Shadow Removal Based on Gamma Decoding Method for Moving Object Images

    Shadows are cast by objects exposed to light, so shadow pixels in an image have a darker intensity than the corresponding object pixels. With the development of digital technology, shadows have become a source of noise in digital images and digital video, making the information in the image inaccurate. One application of digital images and video is in intelligent transportation systems that use digital video-based CCTV cameras, but the use of digital video faces several problems, including the presence of shadows. A method to eliminate shadows is therefore necessary. In this paper, we use a gamma decoding method to distinguish object pixels from shadow pixels based on the illumination of the object, so that the shadow pixels can be eliminated. The result of this research is images without shadows.
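    A minimal sketch of the gamma-decoding idea, assuming a static background frame is available: intensities are gamma-decoded and pixels that fall well below the background's decoded illumination are marked as shadow. The gamma value and the ratio threshold are assumptions, not the paper's parameters.

```python
import numpy as np

def gamma_decode(gray_u8, gamma=2.2):
    """Gamma-decode an 8-bit grayscale image to linear-like intensities."""
    return (gray_u8.astype(np.float32) / 255.0) ** gamma

def shadow_mask(frame_u8, background_u8, gamma=2.2, ratio=0.6):
    """Mark pixels as shadow where the decoded frame intensity drops well
    below the decoded background intensity (illustrative ratio test)."""
    f = gamma_decode(frame_u8, gamma)
    b = gamma_decode(background_u8, gamma)
    return (f < ratio * b) & (b > 0.05)  # ignore near-black background
```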