Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor
We investigate video classification via a two-stream convolutional neural
network (CNN) design that directly ingests information extracted from
compressed video bitstreams. Our approach begins with the observation that all
modern video codecs divide the input frames into macroblocks (MBs). We
demonstrate that selective access to MB motion vector (MV) information within
compressed video bitstreams can also provide for selective, motion-adaptive, MB
pixel decoding (a.k.a., MB texture decoding). This in turn allows for the
derivation of spatio-temporal video activity regions at extremely high speed in
comparison to conventional full-frame decoding followed by optical flow
estimation. In order to evaluate the accuracy of a video classification
framework based on such activity data, we independently train two CNN
architectures on MB texture and MV correspondences and then fuse their scores
to derive the final classification of each test video. Evaluation on two
standard datasets shows that the proposed approach is competitive with the best
two-stream video classification approaches found in the literature. At the same
time: (i) a CPU-based realization of our MV extraction is over 977 times faster
than GPU-based optical flow methods; (ii) selective decoding is up to 12 times
faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs
perform inference at 5 to 49 times lower cloud computing cost than the fastest
methods from the literature.
Comment: Accepted in IEEE Transactions on Circuits and Systems for Video Technology. Extension of ICIP 2017 conference paper.
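To make the selective bitstream access concrete, below is a minimal Python sketch of pulling macroblock motion vectors out of a compressed stream without any optical flow computation. It assumes a PyAV build whose underlying FFmpeg supports the `+export_mvs` decoder flag; the file name is a placeholder, and this is a sketch of the general technique rather than the authors' released code.

```python
import av  # PyAV: Python bindings for FFmpeg

def motion_vectors(path):
    """Yield per-frame motion-vector side data taken directly from the
    compressed bitstream, avoiding full-frame decoding plus optical flow."""
    container = av.open(path)
    stream = container.streams.video[0]
    stream.codec_context.options = {"flags2": "+export_mvs"}
    for frame in container.decode(stream):
        mvs = frame.side_data.get("MOTION_VECTORS")
        if mvs is not None:
            # Structured array: one record per block, with source/destination
            # coordinates and block size (macroblock-level motion).
            yield frame.pts, mvs.to_ndarray()

for pts, mv in motion_vectors("test_video.mp4"):
    print(pts, mv.shape)
```

Motion fields of this kind would feed the temporal (MV) CNN stream, while the motion-adaptive, selectively decoded MB textures would feed the spatial stream; the two streams' softmax score vectors are then fused, for example by a weighted average, to classify each test video.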
Escaping endpoints explode
In 1988, Mayer proved the remarkable fact that infinity is an explosion point
for the set of endpoints of the Julia set of an exponential map that has an
attracting fixed point. That is, the set is totally separated (in particular,
it does not have any nontrivial connected subsets), but its union with the
point at infinity is connected. Answering a question of Schleicher, we extend
this result to the set of "escaping endpoints" in the sense of Schleicher and
Zimmer, for any exponential map for which the singular value belongs to an
attracting or parabolic basin, has a finite orbit, or escapes to infinity under
iteration (as well as many other classes of parameters).
Furthermore, we extend one direction of the theorem to much greater
generality, by proving that the set of escaping endpoints joined with infinity
is connected for any transcendental entire function of finite order with
bounded singular set. We also discuss corresponding results for *all* endpoints
in the case of exponential maps; in order to do so, we establish a version of
Thurston's "no wandering triangles" theorem.
Comment: 35 pages. To appear in Comput. Methods Funct. Theory. V2: Authors'
final accepted manuscript. Revisions and clarifications have been made
throughout from V1. This includes improvements in the proof of Proposition
6.11 and Theorem 8.1, as well as corrections in Remarks 7.1 and 7.3
concerning differing definitions of escaping endpoints in greater generality.
Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim of increasing robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sums of differences between the LFCs
and the CLFCs are aggregated to generate an extremely compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions.
Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME 2015, Torino, Italy.
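As a rough illustration of the two-level aggregation described above, here is a NumPy/scikit-learn sketch, with KMeans standing in for the paper's clustering step; the descriptor dimension, cluster counts, and toy data are illustrative assumptions. It follows the familiar VLAD recipe, but applied to local feature centers rather than raw SIFT descriptors.

```python
import numpy as np
from sklearn.cluster import KMeans

def vlac_encode(sift_descriptors, clfc_centers, n_lfc=64):
    """VLAC sketch: cluster one segment's SIFT descriptors into local
    feature centers (LFCs), then aggregate LFC-minus-CLFC residuals in
    each CLFC cell -- VLAD-style, but one clustering level up."""
    lfcs = KMeans(n_clusters=n_lfc, n_init=4).fit(sift_descriptors).cluster_centers_
    # Assign each LFC to its nearest center-of-centers (CLFC).
    dists = ((lfcs[:, None, :] - clfc_centers[None, :, :]) ** 2).sum(axis=-1)
    assign = dists.argmin(axis=1)
    desc = np.zeros_like(clfc_centers)
    for k in range(clfc_centers.shape[0]):
        members = lfcs[assign == k]
        if len(members):
            desc[k] = (members - clfc_centers[k]).sum(axis=0)  # sum of differences
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)  # L2-normalise, as in VLAD

# Toy usage: 5000 synthetic 128-D "SIFT" vectors, 16 CLFCs learned elsewhere.
rng = np.random.default_rng(0)
clfcs = rng.normal(size=(16, 128))
print(vlac_encode(rng.normal(size=(5000, 128)), clfcs).shape)  # (2048,)
```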
Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks
Advanced video classification systems decode video frames to derive the
necessary texture and motion representations for ingestion and analysis by
spatio-temporal deep convolutional neural networks (CNNs). However, when
considering visual Internet-of-Things applications, surveillance systems and
semantic crawlers of large video repositories, the video capture and the
CNN-based semantic analysis parts do not tend to be co-located. This
necessitates the transport of compressed video over networks and incurs
significant overhead in bandwidth and energy consumption, thereby
undermining the deployment potential of such systems. In this paper, we
investigate the trade-off between the encoding bitrate and the achievable
accuracy of CNN-based video classification models that directly ingest
AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video
bitstreams and applying complex optical flow calculations prior to CNN
processing, we retain only motion vectors and selected texture information at
significantly reduced bitrates, and apply no additional processing prior to CNN
ingestion. Based on three CNN architectures and two action recognition
datasets, we achieve 11%-94% saving in bitrate with marginal effect on
classification accuracy. A model-based selection between multiple CNNs
increases these savings further, to the point where, if up to 7% loss of
accuracy can be tolerated, video classification can take place with as little
as 3 kbps for the transport of the required compressed video information to the
system implementing the CNN models.
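The model-based selection can be pictured as picking the cheapest operating point whose accuracy stays within a tolerated drop from the best available model. The sketch below uses hypothetical model names, bitrates, and accuracies purely for illustration.

```python
def select_model(operating_points, max_accuracy_drop=0.07):
    """operating_points: (name, kbps, accuracy) triples for candidate CNNs.
    Returns the lowest-bitrate model within the tolerated accuracy drop."""
    best_acc = max(acc for _, _, acc in operating_points)
    feasible = [(kbps, name, acc) for name, kbps, acc in operating_points
                if best_acc - acc <= max_accuracy_drop]
    kbps, name, acc = min(feasible)  # cheapest feasible bitrate
    return name, kbps, acc

# Hypothetical operating points (bitrate in kbps, classification accuracy):
points = [("texture+MV CNN", 200, 0.91), ("MV-only CNN", 12, 0.88),
          ("MV-only CNN, low rate", 3, 0.85)]
print(select_model(points))  # -> ('MV-only CNN, low rate', 3, 0.85)
```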
Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity
When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever such limitations exist. Our contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and we evaluate our proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum set of elements necessary for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred on optical flow derived from the resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the temporal extent required for accurate classification prior to processing by learnable machines. (iv) We extend the Mixture-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured over others. Finally, we note that our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data.
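Contribution (iii) can be caricatured as a cheap probe-then-decide step: a small learnable model maps early motion statistics of a clip to the shortest temporal extent that was found sufficient offline. Everything below (the probe features, the candidate extents, the choice of logistic regression) is an assumption for illustration, not the thesis implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical probe: statistics of motion-vector magnitude over a few
# leading frames; labels are the shortest extent (in frames) that sufficed
# for correct classification in an offline pass over the training set.
EXTENTS = [8, 16, 32]

def train_extent_predictor(probe_features, sufficient_extents):
    y = np.array([EXTENTS.index(e) for e in sufficient_extents])
    return LogisticRegression(max_iter=1000).fit(probe_features, y)

def choose_extent(model, probe):
    """Decide how many frames to decode and process for one clip."""
    return EXTENTS[int(model.predict(probe.reshape(1, -1))[0])]
```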
Comparison of Deep Learning, Naïve Bayes and Random Forest Methods for Heart Disease Prediction
The heart is an organ that plays an important role in human survival, since its function is to distribute blood from the lungs to all parts of the body; this blood carries a great deal of oxygen and thereby supports the metabolic processes of the human body. Many activities in the human body cannot be predicted in general form. A heart attack is one of them: it is a very serious event in the human body and a cause of human death. Although barely visible under normal conditions, it strikes suddenly, making it one of the most unexpected events in the human body. With advances in technology, several data mining algorithms have been developed to predict heart attacks. Building on this, various data mining algorithms with machine learning approaches are able to predict the occurrence of a heart attack in the human body. This is a typical diagnosis task, but it must be accomplished accurately and efficiently with the help of machine learning. This study is an attempt to model and solve the heart attack prediction problem. Different machine learning algorithms, namely Deep Learning, Naïve Bayes and Random Forest, are used to build the models in this study; the machine learning approach is a good approach for predicting the occurrence of a heart attack. The dataset was taken from the Kaggle page titled "Heart Attack Analysis and Prediction Dataset". The highest accuracy achieved was obtained with the deep learning algorithm, which produced an accuracy of 83.49%.
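For orientation, a comparison along these lines can be reproduced in a few lines of scikit-learn, with an MLP standing in for the deep learning model; the `heart.csv` file name and the `output` label column follow the Kaggle dataset's published layout, but should be verified against the actual download.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Kaggle "Heart Attack Analysis & Prediction" data: tabular features plus a
# binary "output" label (file/column names per the published dataset).
df = pd.read_csv("heart.csv")
X, y = df.drop(columns=["output"]), df["output"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Deep Learning (MLP)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```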
The Shia Migration from Southwestern Iran to Kuwait: Push-Pull Factors during the Late Nineteenth and Early Twentieth Centuries
This study explores the “push-pull” dynamics of Shia migration from southwestern Iran (Fars, Khuzestan and the Persian Gulf coast) to Kuwait during the late nineteenth and early twentieth centuries. Although Shias nowadays constitute thirty-five percent of the Kuwaiti population and their historical role in building the state of Kuwait has been substantial, no individual study has delved into the causes of Shia migration from Iran to Kuwait. By analyzing the internal political, economic, and social conditions of both regions in the context of the Gulf sheikhdoms, the British and Ottoman empires, and other great powers interested in dominating the Gulf region, my thesis examines why Shia migrants, such as merchants, artisans and laborers, left southwestern Iran and chose Kuwait as their final destination. The two-way trade between southwest Iran and Kuwait provided a pathway for the Shia migrants and settlers into Kuwait. Moreover, by highlighting the economic roles of the Shia community in Kuwait, my thesis enhances our understanding of the foundation and contributions of the Shia community in Kuwait, thus filling a significant gap in Kuwaiti historiography. The research for this thesis draws on a variety of primary sources, including British government documents, the writings of Western travelers, the Almatrook business archive, and oral-history interviews with descendants of Shia immigrants to Kuwait.
Computer Vision for a Camel-Vehicle Collision Mitigation System
As the population grows and more land is being used for urbanization,
ecosystems are disrupted by our roads and cars. This expansion of
infrastructure cuts through wildlife territories, leading to many instances of
Wildlife-Vehicle Collision (WVC). WVC is a global issue with substantial
socio-economic impact, resulting in billions of dollars
in property damage and, at times, fatalities for vehicle occupants. In Saudi
Arabia, this issue is similar, with instances of Camel-Vehicle Collision (CVC)
being particularly deadly due to the large size of camels, which results in a
25% fatality rate [4]. The focus of this work is to test different object
detection models on the task of detecting camels on the road. The Deep Learning
(DL) object detection models used in the experiments are: CenterNet,
EfficientDet, Faster R-CNN, and SSD. Results of the experiments show that
CenterNet performed the best in terms of accuracy and was the most efficient in
training. In the future, the plan is to expand on this work by developing a
system to make countryside roads safer.
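For context, the inference loop for one of the four compared detector families (Faster R-CNN) looks roughly like the torchvision sketch below. Note that the COCO-pretrained weights used here define no camel class, so a real CVC system would be fine-tuned on camel imagery; the image path and confidence threshold are placeholders.

```python
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Off-the-shelf COCO-pretrained Faster R-CNN, used only to show the loop.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = to_tensor(Image.open("road_scene.jpg").convert("RGB"))
with torch.no_grad():
    det = model([image])[0]  # dict with "boxes", "labels", "scores"

keep = det["scores"] > 0.5  # confidence threshold (illustrative)
print(det["boxes"][keep], det["labels"][keep])
```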
Biased Mixtures Of Experts: Enabling Computer Vision Inference Under Data Transfer Limitations
We propose a novel mixture-of-experts class to optimize computer vision
models in accordance with data transfer limitations at test time. Our approach
postulates that the minimum acceptable amount of data allowing for
highly accurate results can vary for different input space partitions.
Therefore, we consider mixtures where experts require different amounts of
data, and train a sparse gating function to divide the input space for each
expert. By appropriate hyperparameter selection, our approach is able to bias
mixtures of experts towards selecting specific experts over others. In this
way, we show that the data transfer optimization between visual sensing and
processing can be solved as a convex optimization problem. To demonstrate the
relation between data availability and performance, we evaluate biased mixtures
on a range of mainstream computer vision problems, namely: (i) single-shot
detection, (ii) image super-resolution, and (iii) real-time video action
classification. For all cases, and when experts constitute modified baselines
to meet different limits on allowed data utility, biased mixtures significantly
outperform previous work optimized to meet the same constraints on available
data.
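A minimal PyTorch sketch of such a biased sparse gate is given below, assuming softmax routing with a fixed additive penalty proportional to each expert's data cost; the cost values, the penalty weight `lam`, and top-1 routing are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedTop1Gate(nn.Module):
    """Sparse gate whose router logits are penalised by each expert's
    (normalised) data cost, so cheaper experts are favoured; `lam` sets
    how aggressive that bias is."""
    def __init__(self, in_dim, expert_costs, lam=1.0):
        super().__init__()
        self.router = nn.Linear(in_dim, len(expert_costs))
        costs = torch.tensor(expert_costs, dtype=torch.float32)
        self.register_buffer("cost_bias", lam * costs / costs.max())

    def forward(self, x):
        logits = self.router(x) - self.cost_bias  # bias against costly experts
        probs = F.softmax(logits, dim=-1)
        return probs.argmax(dim=-1), probs        # top-1 expert per sample

# Three experts with increasing input-data cost (e.g., frames consumed).
gate = BiasedTop1Gate(in_dim=128, expert_costs=[1.0, 4.0, 16.0])
choice, probs = gate(torch.randn(8, 128))
```

Because the penalty is subtracted from the router logits, ties between experts of similar predicted utility resolve toward the expert that needs the least data.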
Shadow Removal Based on Gamma Decoding Method for Moving Object Images
Shadows are cast where objects block light, so shadow pixels have darker intensity than the pixels of the object itself. With the development of digital technology, shadows have become a source of noise in digital images and digital video, making the information in the image inaccurate. One application of digital video is intelligent transportation systems based on digital CCTV cameras, but the use of digital video faces several problems, including the presence of shadows. A method to eliminate shadows is therefore needed. In this paper, we use a gamma decoding method to distinguish object pixels from shadow pixels based on the illumination of the object, so that the shadow pixels can be eliminated. The result of this research is shadow-free images.
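A minimal sketch of the gamma-decoding idea, assuming an 8-bit RGB frame: gamma expansion maps pixel values back toward linear light, which widens the gap between dim shadow pixels and lit object pixels so that a simple threshold can flag shadow candidates. The gamma value and threshold below are illustrative, and this is a crude stand-in for the paper's full pipeline.

```python
import numpy as np

def gamma_decode(image_u8, gamma=2.2):
    """Gamma decoding (expansion): map 8-bit sRGB-like values back toward
    linear light, stretching dark shadow intensities away from lit pixels."""
    return (image_u8.astype(np.float32) / 255.0) ** gamma

def shadow_mask(frame_u8, threshold=0.08):
    """Flag pixels whose gamma-decoded luminance falls below a threshold
    as shadow candidates."""
    gray = gamma_decode(frame_u8).mean(axis=-1)  # simple luminance proxy
    return gray < threshold
```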