Who is the director of this movie? Automatic style recognition based on shot features
We show how low-level formal features, such as shot duration, meant as length
of camera takes, and shot scale, i.e. the distance between the camera and the
subject, are distinctive of a director's style in art movies. Until now, such
features were thought to lack enough variety to be distinctive of
an author. However, our investigation of the full filmographies of six different
authors (Scorsese, Godard, Tarr, Fellini, Antonioni, and Bergman), a total
of 120 movies analysed second by second, confirms that these
shot-related features do not appear as random patterns in movies from the same
director. For feature extraction we adopt methods based on both conventional
and deep learning techniques. Our findings suggest that feature sequential
patterns, i.e. how features evolve in time, are at least as important as the
related feature distributions. To the best of our knowledge this is the first
study dealing with automatic attribution of movie authorship, which opens up
interesting lines of cross-disciplinary research on the impact of style on the
aesthetic and emotional effects on the viewers.
6 Seconds of Sound and Vision: Creativity in Micro-Videos
The notion of creativity, as opposed to related concepts such as beauty or
interestingness, has not been studied from the perspective of automatic
analysis of multimedia content. Meanwhile, short online videos shared on social
media platforms, or micro-videos, have arisen as a new medium for creative
expression. In this paper we study creative micro-videos in an effort to
understand the features that make a video creative, and to address the problem
of automatic detection of creative content. Defining creative videos as those
that are novel and have aesthetic value, we conduct a crowdsourcing experiment
to create a dataset of over 3,800 micro-videos labelled as creative and
non-creative. We propose a set of computational features that we map to the
components of our definition of creativity, and conduct an analysis to
determine which of these features correlate most with creative video. Finally,
we evaluate a supervised approach to automatically detect creative video, with
promising results, showing that it is necessary to model both aesthetic value
and novelty to achieve optimal classification accuracy.
Comment: 8 pages, 1 figure, conference IEEE CVPR 201
A review of research into the development of radiologic expertise: Implications for computer-based training
Rationale and Objectives. Studies of radiologic error reveal high levels of variation between radiologists. Although it is known that experts outperform novices, we have only limited knowledge about radiologic expertise and how it is acquired. Materials and Methods. This review identifies three areas of research: studies of the impact of experience and related factors on the accuracy of decision-making; studies of the organization of expert knowledge; and studies of radiologists' perceptual processes. Results and Conclusion. Interpreting evidence from these three paradigms in the light of recent research into perceptual learning and studies of the visual pathway leads to a number of conclusions for the training of radiologists, particularly for the design of computer-based learning programs that are able to illustrate the similarities and differences between diagnoses, to give access to large numbers of cases and to help identify weaknesses in the way trainees build up a global representation from fixated regions.
Pre-corneal tear film thickness in humans measured with a novel technique.
Purpose. The purpose of this work was to gather preliminary data in normals and dry eye subjects, using a new, non-invasive imaging platform to measure the thickness of pre-corneal tear film. Methods. Human subjects were screened for dry eye and classified as dry or normal. Tear film thickness over the inferior paracentral cornea was measured using laser illumination and a complementary metal-oxide-semiconductor (CMOS) camera. A previously developed mathematical model was used to calculate the thickness of the tear film by applying the principle of spatial auto-correlation function (ACF). Results. Mean tear film thickness values (±SD) were 3.05 μm (0.20) and 2.48 μm (0.32) on the initial visit for normals (n=18) and dry eye subjects (n=22), respectively, and were significantly different (p<0.001, 2-sample t-test). Repeatability was good between visit 1 and 2 for normals (intraclass correlation coefficient [ICC]=0.935) and dry eye subjects (ICC=0.950). Tear film thickness increased above baseline for the dry eye subjects following viscous drop instillation and remained significantly elevated for up to approximately 32 min (n=20; p<0.05 until 32 min; general linear mixed model and Dunnett's tests). Conclusions. This technique for imaging the ocular surface appears to provide tear thickness values in agreement with other non-invasive methods. Moreover, the technique can differentiate between normal and dry eye patient types.
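As a rough illustration of the group comparison reported above, the sketch below recomputes a two-sample t statistic from the published summary values only (means, SDs, and group sizes). The abstract does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the function name `welch_t` is hypothetical.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (assumed variant, see lead-in)."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df

# Summary values quoted in the abstract: normals vs. dry eye subjects (in μm).
t_stat, dof = welch_t(3.05, 0.20, 18, 2.48, 0.32, 22)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The resulting statistic can be checked against a t distribution table to see whether it is compatible with the reported p<0.001.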
Audio-Visual Sentiment Analysis for Learning Emotional Arcs in Movies
Stories can have tremendous power -- not only useful for entertainment, they
can activate our interests and mobilize our actions. The degree to which a
story resonates with its audience may be in part reflected in the emotional
journey it takes the audience upon. In this paper, we use machine learning
methods to construct emotional arcs in movies, calculate families of arcs, and
demonstrate the ability for certain arcs to predict audience engagement. The
system is applied to Hollywood films and high quality shorts found on the web.
We begin by using deep convolutional neural networks for audio and visual
sentiment analysis. These models are trained on both new and existing
large-scale datasets, after which they can be used to compute separate audio
and visual emotional arcs. We then crowdsource annotations for 30-second video
clips extracted from highs and lows in the arcs in order to assess the
micro-level precision of the system, with precision measured in terms of
agreement in polarity between the system's predictions and annotators' ratings.
These annotations are also used to combine the audio and visual predictions.
Next, we look at macro-level characterizations of movies by investigating
whether there exist `universal shapes' of emotional arcs. In particular, we
develop a clustering approach to discover distinct classes of emotional arcs.
Finally, we show on a sample corpus of short web videos that certain emotional
arcs are statistically significant predictors of the number of comments a video
receives. These results suggest that the emotional arcs learned by our approach
successfully represent macroscopic aspects of a video story that drive audience
engagement. Such machine understanding could be used to predict audience
reactions to video stories, ultimately improving our ability as storytellers to
communicate with each other.
Comment: Data Mining (ICDM), 2017 IEEE 17th International Conference o
An affect-based video retrieval system with open vocabulary querying
Content-based video retrieval systems (CBVR) are creating new search and browse capabilities using metadata describing significant features of the data. An often overlooked aspect of human interpretation of multimedia data is the affective dimension. Incorporating affective information into multimedia metadata can potentially enable search using this alternative interpretation of multimedia content. Recent work has described methods to automatically assign affective labels to multimedia data using various approaches. However, the subjective and imprecise nature of affective labels makes it difficult to bridge the semantic gap between system-detected labels and user expression of information requirements in multimedia retrieval. We present a novel affect-based video retrieval system incorporating an open-vocabulary query stage based on WordNet, enabling search using an unrestricted query vocabulary. The system performs automatic annotation of video data with labels of well-defined affective terms. In retrieval, annotated documents are ranked using the standard Okapi retrieval model based on open-vocabulary text queries. We present experimental results examining the behaviour of the system for retrieval of a collection of automatically annotated feature films of different genres. Our results indicate that affective annotation can potentially provide useful augmentation to more traditional objective content description in multimedia retrieval.
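The ranking step mentioned above can be sketched with Okapi BM25, the best-known form of the Okapi model. This is a minimal illustration, not the system described in the abstract: the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF form, and the toy label lists are all assumptions, and the WordNet-based query expansion stage is not modelled.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of affective labels) against the query
    with Okapi BM25. k1 and b are common defaults, assumed here."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each label.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical automatic annotations for three films.
docs = [
    ["fear", "tension", "fear", "sadness"],
    ["joy", "excitement", "joy"],
    ["sadness", "calm"],
]
scores = bm25_scores(["fear", "sadness"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Treating each film's affective labels as a bag of terms is what lets a text retrieval model be reused unchanged for affect-based search.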
Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos
When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modality of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently of the chosen representation.
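The four target classes mentioned above correspond to the quadrants of the Valence-Arousal plane. A minimal sketch of that labelling follows; it assumes ratings on a 1 to 9 scale split at the midpoint 5, and the label names and the function `va_quadrant` are hypothetical. The abstract does not state how the authors thresholded the ratings.

```python
def va_quadrant(valence, arousal, midpoint=5.0):
    """Map a (valence, arousal) rating pair to one of four quadrant labels.
    The midpoint threshold is an assumption, not taken from the abstract."""
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint
    if high_valence and high_arousal:
        return "HVHA"  # high valence, high arousal
    if high_valence:
        return "HVLA"  # high valence, low arousal
    if high_arousal:
        return "LVHA"  # low valence, high arousal
    return "LVLA"      # low valence, low arousal

# Hypothetical ratings for four clips.
clips = [(7.2, 6.8), (6.5, 3.1), (2.4, 7.9), (3.0, 2.2)]
labels = [va_quadrant(v, a) for v, a in clips]
print(labels)
```

Labels of this kind would then serve as the class targets for a multi-class classifier such as the SVMs named in the abstract.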