Inexpensive fusion methods for enhancing feature detection
Recent successful approaches to high-level feature detection in image and video data have treated the problem as a pattern classification task. These typically leverage techniques from statistical machine learning, coupled with ensemble architectures that create multiple feature detection models. Once created, co-occurrence between learned features can be captured to further boost performance. At multiple stages throughout these frameworks, various pieces of evidence can be fused together to boost performance. These approaches, whilst very successful, are computationally expensive and, depending on the task, require significant computational resources. In this paper we propose two fusion methods that aim to combine the output of an initial basic statistical machine learning approach with a lower-quality information source, in order to gain diversity in the classified results whilst requiring only modest computing resources. Our approaches, validated experimentally on TRECVid data, are designed to be complementary to existing frameworks and can be regarded as possible replacements for the more computationally expensive combination strategies used elsewhere.
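To make the idea of an inexpensive fusion step concrete, below is a minimal sketch of one common low-cost late-fusion strategy: a weighted linear combination of normalised per-shot scores from a primary classifier and a cheaper, lower-quality source. The function name, the weight alpha, and the example inputs are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def weighted_late_fusion(primary_scores, secondary_scores, alpha=0.8):
    """Fuse per-shot confidence scores from two detectors.

    alpha weights the primary (stronger) detector; 1 - alpha weights
    the cheaper, lower-quality source. Both inputs are min-max
    normalised first so the scores are comparable.
    """
    def normalise(s):
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

    return alpha * normalise(primary_scores) + (1 - alpha) * normalise(secondary_scores)

# Example: fuse SVM confidences with, say, text-metadata match scores.
fused = weighted_late_fusion([0.2, 0.9, 0.4], [0.5, 0.3, 0.8])
print(fused)  # fused scores used to re-rank the detected shots
```

Because the secondary source contributes a different error profile, even a fixed-weight combination like this can add diversity to the ranked results at negligible computational cost.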
Multimedia information technology and the annotation of video
The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, the overload of data will outstrip annotation capacity; on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.
ELVIS: Entertainment-led video summaries
© ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Multimedia Computing, Communications, and Applications, 6(3): Article no. 17 (2010), http://doi.acm.org/10.1145/1823746.1823751

Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
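The abstract does not spell out how the five measures are combined, so the following is only a hedged sketch of the general idea: z-normalise each physiological trace, average them, and pick the highest-scoring time windows. The function name, window length, and equal-weight combination are illustrative assumptions, not the published ELVIS algorithm.

```python
import numpy as np

def top_subsegments(signals, window=150, k=3):
    """Locate candidate 'most entertaining' windows in a video.

    signals: dict of equally sampled physiological traces
             (e.g. EDR, HR, BVP, RR, RA) aligned to the video timeline.
    window:  window length in samples.
    k:       number of subsegments to return.
    """
    # z-normalise each trace so no single measure dominates the score
    z = [(s - np.mean(s)) / (np.std(s) + 1e-9) for s in signals.values()]
    combined = np.mean(z, axis=0)

    # score each non-overlapping window by its mean combined response
    n = len(combined) // window
    scores = [combined[i * window:(i + 1) * window].mean() for i in range(n)]
    best = np.argsort(scores)[::-1][:k]
    return sorted(int(i) * window for i in best)  # window start offsets

# Synthetic example with five random traces standing in for real sensor data.
rng = np.random.default_rng(0)
traces = {m: rng.normal(size=3000) for m in ("EDR", "HR", "BVP", "RR", "RA")}
print(top_subsegments(traces))
```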
Advances in Teaching & Learning Day Abstracts 2005
Proceedings of the Advances in Teaching & Learning Day Regional Conference held at The University of Texas Health Science Center at Houston in 2005.
Cross-Modal Health State Estimation
Individuals create and consume more diverse data about themselves today than at any time in history. Sources of this data include wearable devices, images, social media, geospatial information and more. A tremendous opportunity rests within cross-modal data analysis that leverages existing domain knowledge methods to understand and guide human health. Especially in chronic diseases, current medical practice uses a combination of sparse hospital-based biological metrics (blood tests, expensive imaging, etc.) to understand the evolving health status of an individual. Future health systems must integrate data created at the individual level to better understand health status perpetually, especially in a cybernetic framework. In this work we fuse multiple user-created and open-source data streams along with established biomedical domain knowledge to give two types of quantitative state estimates of cardiovascular health. First, we use wearable devices to calculate cardiorespiratory fitness (CRF), a known quantitative leading predictor of heart disease which is not routinely collected in clinical settings. Second, we estimate inherent genetic traits, living environmental risks, circadian rhythm, and biological metrics from a diverse dataset. Our experimental results on 24 subjects demonstrate how multi-modal data can provide personalized health insight. Understanding the dynamic nature of health status will pave the way for better health-based recommendation engines, better clinical decision making and positive lifestyle changes.

Comment: Accepted to ACM Multimedia 2018 Conference - Brave New Ideas, Seoul, Korea, ACM ISBN 978-1-4503-5665-7/18/1
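The abstract mentions computing CRF from wearable data without giving the estimator. As a hedged illustration only, one widely used non-exercise proxy is the heart-rate-ratio method of Uth et al. (2004), VO2max ≈ 15.3 × HRmax / HRrest; the sketch below assumes the wearable supplies resting and maximum heart rate, and the paper's actual estimator may differ.

```python
def estimate_vo2max(hr_max_bpm, hr_rest_bpm):
    """Heart-rate-ratio estimate of VO2max in ml/kg/min.

    Implements the Uth et al. (2004) relation
    VO2max = 15.3 * HRmax / HRrest, a simple proxy for
    cardiorespiratory fitness; not necessarily the paper's method.
    """
    if hr_rest_bpm <= 0:
        raise ValueError("resting heart rate must be positive")
    return 15.3 * hr_max_bpm / hr_rest_bpm

# Example: values a wrist wearable might report over a day.
print(round(estimate_vo2max(185, 58), 1))  # ~48.8 ml/kg/min
```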