A fine-grained approach to scene text script identification
This paper focuses on the problem of script identification in unconstrained
scenarios. Script identification is an important prerequisite to recognition,
and an indispensable condition for automatic text understanding systems
designed for multi-language environments. Although widely studied for document
images and handwritten documents, it remains an almost unexplored territory for
scene text images.
We detail a novel method for script identification in natural images that
combines convolutional features and the Naive-Bayes Nearest Neighbor
classifier. The proposed framework efficiently exploits the discriminative
power of small stroke-parts, in a fine-grained classification framework.
In addition, we propose a new public benchmark dataset for the evaluation of
joint text detection and script identification in natural scenes. Experiments
on this new dataset demonstrate that the proposed method yields
state-of-the-art results while generalizing well to different datasets and to a
variable number of scripts. The evidence provided shows that multi-lingual
scene text recognition in the wild is a viable proposition. Source code of the
proposed method is made available online.
Reverse-Engineering Satire, or "Paper on Computational Humor Accepted Despite Making Serious Advances"
Humor is an essential human trait. Efforts to understand humor have called
out links between humor and the foundations of cognition, as well as the
importance of humor in social engagement. As such, it is a promising and
important subject of study, with relevance for artificial intelligence and
human-computer interaction. Previous computational work on humor has mostly
operated at a coarse level of granularity, e.g., predicting whether an entire
sentence, paragraph, document, etc., is humorous. As a step toward deep
understanding of humor, we seek fine-grained models of attributes that make a
given text humorous. Starting from the observation that satirical news
headlines tend to resemble serious news headlines, we build and analyze a
corpus of satirical headlines paired with nearly identical but serious
headlines. The corpus is constructed via Unfun.me, an online game that
incentivizes players to make minimal edits to satirical headlines with the goal
of making other players believe the results are serious headlines. The edit
operations used to successfully remove humor pinpoint the words and concepts
that play a key role in making the original, satirical headline funny. Our
analysis reveals that the humor tends to reside toward the end of headlines,
and primarily in noun phrases, and that most satirical headlines follow a
certain logical pattern, which we term false analogy. Overall, this paper
deepens our understanding of the syntactic and semantic structure of satirical
news headlines and provides insights for building humor-producing systems.
Comment: Proceedings of the 33rd AAAI Conference on Artificial Intelligence,
201
Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants
Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNSs is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research.
Code choice and code-switching in Swiss-German internet relay chat rooms
In the German-speaking regions of Switzerland, dialect is spoken by all social groups in most communicative situations, Standard German being used only when prescribed. Swiss dialects rarely appeared in written form before the 1980s, apart from the genre of dialect literature. Due to the growing acceptance of informal writing styles in many European languages, dialect is increasingly employed for written personal communication, in particular in computer-mediated communication (CMC). In Swiss Internet Relay Chat (IRC) rooms, varieties of German are used side by side as all chatters have a command of both standard and dialectal varieties. Depending on the channel, the proportion of dialectal contributions can be as high as 90 percent. The choice of a particular variety depends both on individual preference and on the predominant variety used within a specific thread. In this paper I take a quantitative approach to language variation in IRC and demonstrate how such an approach can help embed qualitative research on code-switching in CMC.
"Who are you?" - Learning person specific classifiers from video
We investigate the problem of automatically labelling
faces of characters in TV or movie material with their
names, using only weak supervision from automatically aligned
subtitle and script text. Our previous work (Everingham
et al. [8]) demonstrated promising results on the
task, but the coverage of the method (proportion of video
labelled) and generalization was limited by a restriction to
frontal faces and nearest neighbour classification.
In this paper we build on that method, extending the coverage
greatly by the detection and recognition of characters
in profile views. In addition, we make the following contributions:
(i) seamless tracking, integration and recognition
of profile and frontal detections, and (ii) a character specific
multiple kernel classifier which is able to learn the features
best able to discriminate between the characters.
We report results on seven episodes of the TV series
"Buffy the Vampire Slayer", demonstrating significantly increased
coverage and performance with respect to previous
methods on this material.