LoGAN: Generating Logos with a Generative Adversarial Neural Network Conditioned on color
Designing a logo is a long, complicated, and expensive process for any
designer. However, recent advancements in generative algorithms provide models
that could offer a possible solution. Logos are multi-modal, have very few
categorical properties, and do not have a continuous latent space. Yet,
conditional generative adversarial networks can be used to generate logos that
could help designers in their creative process. We propose LoGAN: an improved
auxiliary classifier Wasserstein generative adversarial neural network (with
gradient penalty) that is able to generate logos conditioned on twelve
different colors. In 768 generated instances (12 classes and 64 logos per
class), when looking at the most prominent color, the conditional generation
part of the model has an overall precision and recall of 0.8 and 0.7
respectively. LoGAN's results offer a first glance at how artificial
intelligence can be used to assist designers in their creative process and open
promising future directions, such as including more descriptive labels which
will provide a more exhaustive and easy-to-use system. Comment: 6 pages, ICMLA1
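The precision and recall figures quoted in the abstract above can be illustrated with a minimal sketch. It assumes each generated logo has a conditioning color (the "true" label) and a most prominent color extracted from the image (the "predicted" label); the function name and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {class: (precision, recall)} for paired label lists.

    Hypothetical helper: true_labels are the conditioning colors,
    predicted_labels are the most prominent colors found in the outputs.
    """
    true_positives = Counter()
    predicted_count = Counter(predicted_labels)
    true_count = Counter(true_labels)
    for true, predicted in zip(true_labels, predicted_labels):
        if true == predicted:
            true_positives[true] += 1
    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        # Precision: correct predictions of this class / all predictions of it.
        precision = (true_positives[label] / predicted_count[label]
                     if predicted_count[label] else 0.0)
        # Recall: correct predictions of this class / all true members of it.
        recall = (true_positives[label] / true_count[label]
                  if true_count[label] else 0.0)
        scores[label] = (precision, recall)
    return scores


# Toy data (made up for illustration): four logos, two color classes.
conditioned_on = ["red", "red", "blue", "blue"]
most_prominent = ["red", "blue", "blue", "blue"]
scores = per_class_precision_recall(conditioned_on, most_prominent)
```

Averaging the per-class values would give overall figures of the kind reported; whether the paper uses a macro or micro average is not stated in the abstract.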
Massive Open Online Courses Temporal Profiling for Dropout Prediction
Massive Open Online Courses (MOOCs) are attracting the attention of people
all over the world. Regardless of the platform, the numbers of registrants for
online courses are impressive but, at the same time, completion rates are
disappointing. Understanding the mechanisms of dropping out based on the
learner profile arises as a crucial task in MOOCs, since it will allow
intervening at the right moment in order to assist the learner in completing
the course. In this paper, the dropout behaviour of learners in a MOOC is
thoroughly studied by first extracting features that describe the behavior of
learners within the course and then by comparing three classifiers (Logistic
Regression, Random Forest and AdaBoost) in two tasks: predicting which users
will have dropped out by a certain week and predicting which users will drop
out on a specific week. The former has proven to be considerably easier, with
all three classifiers performing equally well. However, the accuracy for the
second task is lower, and Logistic Regression tends to perform slightly better
than the other two algorithms. We found that features that reflect an active
attitude of the user towards the MOOC, such as submitting their assignment,
posting on the Forum and filling in their Profile, are strong indicators of
persistence. Comment: 8 pages, ICTAI1
Adapting End-to-End Speech Recognition for Readable Subtitles
Automatic speech recognition (ASR) systems are primarily evaluated on
transcription accuracy. However, in some use cases such as subtitling, verbatim
transcription would reduce output readability given limited screen size and
reading time. Therefore, this work focuses on ASR with output compression, a
task challenging for supervised approaches due to the scarcity of training
data. We first investigate a cascaded system, where an unsupervised compression
model is used to post-edit the transcribed speech. We then compare several
methods of end-to-end speech recognition under output length constraints. The
experiments show that with limited data far less than needed for training a
model from scratch, we can adapt a Transformer-based ASR model to incorporate
both transcription and compression capabilities. Furthermore, the best
performance in terms of WER and ROUGE scores is achieved by explicitly modeling
the length constraints within the end-to-end ASR system. Comment: IWSLT 202
A retrieval-based dialogue system utilizing utterance and context embeddings
Finding semantically rich and computer-understandable representations for
textual dialogues, utterances and words is crucial for dialogue systems (or
conversational agents), as their performance mostly depends on understanding
the context of conversations. Recent research aims at finding distributed
vector representations (embeddings) for words, such that semantically similar
words are relatively close within the vector-space. Encoding the "meaning" of
text into vectors is a current trend, and text can range from words, phrases
and documents to actual human-to-human conversations. In recent research
approaches, responses have been generated utilizing a decoder architecture,
given the vector representation of the current conversation. In this paper, the
utilization of embeddings for answer retrieval is explored by using
Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor
(ANN) model, to find similar conversations in a corpus and rank possible
candidates. Experimental results on the well-known Ubuntu Corpus (in English)
and a customer service chat dataset (in Dutch) show that, in combination with a
candidate selection method, retrieval-based approaches outperform generative
ones and reveal promising future research directions towards the usability of
such a system. Comment: A shorter version is accepted at ICMLA2017 conference;
acknowledgement added; typos correcte
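The retrieval idea in the abstract above (embed the current conversation, find similar conversations in a corpus, rank candidate responses) can be sketched as follows. This is only an illustration: it swaps the paper's LSH Forest for an exact, brute-force cosine-similarity search, and the embeddings, corpus entries, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_responses(query_embedding, corpus, top_k=2):
    """Rank stored responses by similarity of their context to the query.

    corpus is a list of (context_embedding, response) pairs. An exact scan
    is used here; an approximate index such as LSH Forest would replace
    this loop on a large corpus.
    """
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_embedding, item[0]),
        reverse=True,
    )
    return [response for _, response in ranked[:top_k]]


# Toy two-dimensional "embeddings" (made up for illustration).
corpus = [
    ([1.0, 0.0], "Try restarting the service."),
    ([0.0, 1.0], "Your order has shipped."),
    ([0.9, 0.1], "Check the log file."),
]
candidates = retrieve_responses([1.0, 0.05], corpus, top_k=2)
```

The exact scan costs time linear in the corpus size per query, which is the cost an approximate nearest neighbor structure is meant to avoid.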
Accumulated Gradient Normalization
This work addresses the instability in asynchronous data parallel
optimization. It does so by introducing a novel distributed optimizer which is
able to efficiently optimize a centralized model under communication
constraints. The optimizer achieves this by pushing a normalized sequence of
first-order gradients to a parameter server. This implies that the magnitude of
a worker delta is smaller compared to an accumulated gradient, and provides a
better direction towards a minimum compared to first-order gradients, which in
turn also forces possible implicit momentum fluctuations to be more aligned
since we make the assumption that all workers contribute towards a single
minimum. As a result, our approach mitigates the parameter staleness problem
more effectively since staleness in asynchrony induces (implicit) momentum, and
achieves a better convergence rate compared to other optimizers such as
asynchronous EASGD and DynSGD, which we show empirically. Comment: 16 pages, 12 figures, ACML201
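One way to read the "normalized sequence of first-order gradients" described above is sketched below. It assumes that a worker takes several local gradient steps, sums them, and divides the sum by the number of steps before sending it to the parameter server; the function names, the learning rate, and the quadratic test objective are hypothetical and are not taken from the paper.

```python
def normalized_worker_delta(center_params, grad_fn, learning_rate, num_local_steps):
    """Accumulate local first-order updates, then normalize by their count.

    Assumed reading of the abstract: the returned delta is the sum of
    num_local_steps local updates divided by num_local_steps.
    """
    local_params = list(center_params)
    accumulated = [0.0] * len(center_params)
    for _ in range(num_local_steps):
        gradients = grad_fn(local_params)
        for i, gradient in enumerate(gradients):
            step = -learning_rate * gradient
            local_params[i] += step   # the worker keeps exploring locally
            accumulated[i] += step    # running sum of the local updates
    # Normalization: the pushed delta is smaller than the raw accumulation.
    return [value / num_local_steps for value in accumulated]


def quadratic_grad(params):
    """Gradient of the toy objective f(x) = sum(x_i ** 2)."""
    return [2.0 * p for p in params]


# The parameter server would add the normalized delta to the central model.
center = [1.0, -2.0]
delta = normalized_worker_delta(center, quadratic_grad, learning_rate=0.1,
                                num_local_steps=4)
updated_center = [c + d for c, d in zip(center, delta)]
```

With a single local step the delta reduces to an ordinary gradient step; with more steps it points along the averaged local trajectory while keeping a magnitude comparable to one step.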
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
Encoder-decoder models provide a generic architecture for
sequence-to-sequence tasks such as speech recognition and translation. While
offline systems are often evaluated on quality metrics like word error rates
(WER) and BLEU, latency is also a crucial factor in many practical use-cases.
We propose three latency reduction techniques for chunk-based incremental
inference and evaluate their efficiency in terms of accuracy-latency trade-off.
On the 300-hour How2 dataset, we reduce latency by 83% to 0.8 second by
sacrificing 1% WER (6% rel.) compared to offline transcription. Although our
experiments use the Transformer, the hypothesis selection strategies are
applicable to other encoder-decoder models. To avoid expensive re-computation,
we use a unidirectionally-attending encoder. After an adaptation procedure to
partial sequences, the unidirectional model performs on-par with the original
model. We further show that our approach is also applicable to low-latency
speech translation. On How2 English-Portuguese speech translation, we reduce
latency to 0.7 second (-84% rel.) while incurring a loss of 2.4 BLEU points (5%
rel.) compared to the offline system
Visual perception of colourful petals reminds us of classical fragments
Colour has attracted the interest and attention of many of the most gifted intellects of all time. Ideas of early thinkers were not (and could not have been) grasped on a scientific level without knowledge of a kind that lay far in the future. One feature under consideration is the colourful surface of living tissues, which could hardly have been visualized without a corresponding reference to the microscale parallel. Millions of years before man manipulated synthetic structures, biological systems were using nanoscale architecture to produce striking optical effects. Here we show the microsculpture of the adaxial surface of flower petals from the asphodel, the Stork's-bill and the common poppy by using optical, scanning electron and atomic force microscopy. Microsculpture has been studied in leaves and pollen grains of higher plants. To the best of our knowledge, imaging and nanoscale morphometry of petals have not been reported hitherto. Our findings on flower petals' microsculpture may be linked with aspects of colour revealed in ancient literature
Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset
In this work, we compare two simple methods of tagging scientific
publications with labels reflecting their content. Wikipedia is employed as the
first source of labels; the second label set is constructed from the noun phrases
occurring in the analyzed corpus. We examine the statistical properties and the
effectiveness of both approaches on the dataset consisting of abstracts from
0.7 million scientific documents deposited in the ArXiv preprint collection.
We believe that the obtained tags can later be applied as useful document
features in various machine learning tasks (document similarity, clustering,
topic modelling, etc.)
Social Emotion Mining Techniques for Facebook Posts Reaction Prediction
As of February 2016 Facebook allows users to express their experienced
emotions about a post by using five so-called 'reactions'. This research paper
proposes and evaluates alternative methods for predicting these reactions to
user posts on public pages of firms/companies (like supermarket chains). For
this purpose, we collected posts (and their reactions) from Facebook pages of
large supermarket chains and constructed a dataset which is available to other
researchers. In order to predict the distribution of reactions of a new post,
neural network architectures (convolutional and recurrent neural networks) were
tested using pretrained word embeddings. Results of the neural networks were
improved by introducing a bootstrapping approach for sentiment and emotion
mining on the comments for each post. The final model (a combination of neural
network and a baseline emotion miner) is able to predict the reaction
distribution on Facebook posts with a mean squared error (or misclassification
rate) of 0.135. Comment: 10 pages, 13 figures and accepted at ICAART 2018. (Dataset:
https://github.com/jerryspan/FacebookR
