Is This a Joke? Detecting Humor in Spanish Tweets
While humor has been historically studied from a psychological, cognitive and
linguistic standpoint, its study from a computational perspective is an area
yet to be explored in Computational Linguistics. There exist some previous
works, but a characterization of humor that allows its automatic recognition
and generation is far from being specified. In this work we build a
crowdsourced corpus of labeled tweets, annotated according to their humor value,
letting the annotators subjectively decide which are humorous. A humor
classifier for Spanish tweets is assembled based on supervised learning,
reaching a precision of 84% and a recall of 69%.
Comment: Preprint version, without referra
Detecting Singleton Review Spammers Using Semantic Similarity
Online reviews have increasingly become a very important resource for
consumers when making purchases. However, it is becoming more and more difficult
for people to make well-informed buying decisions without being deceived by
fake reviews. Prior works on the opinion spam problem mostly considered
classifying fake reviews using behavioral user patterns. They focused on
prolific users who write more than a couple of reviews, discarding one-time
reviewers. The number of singleton reviewers, however, is expected to be high for
many review websites. While behavioral patterns are effective when dealing with
elite users, for one-time reviewers, the review text needs to be exploited. In
this paper we tackle the problem of detecting fake reviews written by the same
person using multiple names, posting each review under a different name. We
propose two methods to detect similar reviews and show the results generally
outperform the vectorial similarity measures used in prior works. The first
method extends the semantic similarity between words to the review level. The
second method is based on topic modeling and exploits the similarity of the
reviews' topic distributions using two models: bag-of-words and
bag-of-opinion-phrases. The experiments were conducted on reviews from three
different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott dataset
(800 reviews).
Comment: 6 pages, WWW 201
Exploratory Analysis of Highly Heterogeneous Document Collections
We present an effective multifaceted system for exploratory analysis of
highly heterogeneous document collections. Our system is based on intelligently
tagging individual documents in a purely automated fashion and exploiting these
tags in a powerful faceted browsing framework. Tagging strategies employed
include both unsupervised and supervised approaches based on machine learning
and natural language processing. As one of our key tagging strategies, we
introduce the KERA algorithm (Keyword Extraction for Reports and Articles).
KERA extracts topic-representative terms from individual documents in a purely
unsupervised fashion and is revealed to be significantly more effective than
state-of-the-art methods. Finally, we evaluate our system in its ability to
help users locate documents pertaining to military critical technologies buried
deep in a large heterogeneous sea of information.
Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining
Three-Dimensional Analysis of Wakefields Generated by Flat Electron Beams in Planar Dielectric-Loaded Structures
An electron bunch passing through a dielectric-lined waveguide generates
Čerenkov radiation that can result in a high-peak axial electric field
suitable for acceleration of a subsequent bunch. Axial fields beyond a
gigavolt per meter are attainable in structures with sub-mm sizes depending on
the achievement of suitable electron bunch parameters. A promising
configuration consists of using a planar dielectric structure driven by flat
electron bunches. In this paper we present a three-dimensional analysis of
wakefields produced by flat beams in planar dielectric structures thereby
extending the work of Reference [A. Tremaine, J. Rosenzweig, and P. Schoessow,
Phys. Rev. E 56, No. 6, 7204 (1997)] on the topic. We especially provide
closed-form expressions for the normal frequencies and field amplitudes of the
excited modes and benchmark these analytical results with finite-difference
time-domain particle-in-cell numerical simulations. Finally, we implement a
semi-analytical algorithm into a popular particle tracking program thereby
enabling start-to-end high-fidelity modeling of linear accelerators based on
dielectric-lined planar waveguides.
Comment: 12 pages, 2 tables, 10 figures
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Every culture and language is unique. Our work expressly focuses on the
uniqueness of culture and language in relation to human affect, specifically
sentiment and emotion semantics, and how they manifest in social multimedia. We
develop sets of sentiment- and emotion-polarized visual concepts by adapting
semantic structures called adjective-noun pairs, originally introduced by Borth
et al. (2013), but in a multilingual context. We propose a new
language-dependent method for automatic discovery of these adjective-noun
constructs. We show how this pipeline can be applied on a social multimedia
platform for the creation of a large-scale multilingual visual sentiment
concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
unified ontology is organized hierarchically by multilingual clusters of
visually detectable nouns and subclusters of emotionally biased versions of
these nouns. In addition, we present an image-based prediction task to show how
generalizable language-specific models are in a multilingual context. A new,
publicly available dataset of >15.6K sentiment-biased visual concepts across 12
languages with language-specific detector banks, >7.36M images and their
metadata is also released.
Comment: 11 pages, to appear at ACM MM'1
Commission des Communautes Europeennes: Groupe du Porte-Parole = Commission of European Communities: Spokesman Group. Spokesman Service Note to National Offices Bio No. (81) 276, 8 July 1981
This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly-supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of pre-training such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse – but still acceptable – performance when compared to the single language model, while benefiting from better generalization properties across languages.
Towards environments that have a sense of humor
Humans have humorous conversations and interactions. Nowadays our real life existence is integrated with our life in social media, videogames, mixed reality and physical environments that sense our activities and that can adapt their appearance and properties in response to our activities. There are other inhabitants in these environments, not only human, but also virtual agents and social robots with which we interact and who decide about their participation in activities. In this paper we look at designing humor and humor opportunities in such environments, providing them with a sense of humor and the ability to recognize opportunities to generate humorous interactions or events on the fly. Opportunities, made possible by introducing incongruities, can be exploited by the environment itself, or they can be communicated to its inhabitants.
Diamond deposition on modified silicon substrates: Making diamond atomic force microscopy tips for nanofriction experiments
Fine-crystalline diamond particles are grown on standard Si atomic force microscopy tips, using hot filament-assisted chemical vapor deposition. To optimize the conditions for diamond deposition, first a series of experiments is carried out using silicon substrates covered by point-topped pyramids as obtained by wet chemical etching. The apexes and the edges of the silicon pyramids provide favorable sites for diamond nucleation and growth. The investigation of the deposited polycrystallites is done by means of optical microscopy, scanning electron microscopy and micro-Raman spectroscopy. The resulting diamond-terminated tips are tested in ultra high vacuum using a contact-mode atomic force microscope on a stepped surface of sapphire, showing high stability, sharpness, and hardness.