3,491 research outputs found
Cluster-based Deep Ensemble Learning for Emotion Classification in Internet Memes
Memes have gained popularity as a means to share visual ideas through the
Internet and social media by mixing text, images and videos, often for humorous
purposes. Research enabling automated analysis of memes has gained attention in
recent years, including among others the task of classifying the emotion
expressed in memes. In this paper, we propose a novel model, cluster-based deep
ensemble learning (CDEL), for emotion classification in memes. CDEL is a hybrid
model that leverages the benefits of a deep learning model in combination with
a clustering algorithm, which enhances the model with additional information
after clustering memes with similar facial features. We evaluate the
performance of CDEL on a benchmark dataset for emotion classification, proving
its effectiveness by outperforming a wide range of baseline models and
achieving state-of-the-art performance. Further evaluation through ablated
models demonstrates the effectiveness of the different components of CDEL
Learning from Multiple Sources for Video Summarisation
Many visual surveillance tasks, e.g.video summarisation, is conventionally
accomplished through analysing imagerybased features. Relying solely on visual
cues for public surveillance video understanding is unreliable, since visual
observations obtained from public space CCTV video data are often not
sufficiently trustworthy and events of interest can be subtle. On the other
hand, non-visual data sources such as weather reports and traffic sensory
signals are readily accessible but are not explored jointly to complement
visual data for video content analysis and summarisation. In this paper, we
present a novel unsupervised framework to learn jointly from both visual and
independently-drawn non-visual data sources for discovering meaningful latent
structure of surveillance video data. In particular, we investigate ways to
cope with discrepant dimension and representation whist associating these
heterogeneous data sources, and derive effective mechanism to tolerate with
missing and incomplete data from different sources. We show that the proposed
multi-source learning framework not only achieves better video content
clustering than state-of-the-art methods, but also is capable of accurately
inferring missing non-visual semantics from previously unseen videos. In
addition, a comprehensive user study is conducted to validate the quality of
video summarisation generated using the proposed multi-source model
Smartphone picture organization: a hierarchical approach
We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to a tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which typically are pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and because smartphones are ubiquitous, they present a larger variability compared to pictures captured by a digital camera. To solve the need of organizing large smartphone photo collections automatically, we propose here a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach successfully estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neuronal Networks. To validate our approach, we ensemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results demonstrate major user satisfaction with respect to state of the art solutions in terms of organization.Peer ReviewedPreprin
- …