SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
This paper describes SentencePiece, a language-independent subword tokenizer
and detokenizer designed for Neural-based text processing, including Neural
Machine Translation. It provides open-source C++ and Python implementations for
subword units. While existing subword segmentation tools assume that the input
is pre-tokenized into word sequences, SentencePiece can train subword models
directly from raw sentences, which allows us to make a purely end-to-end and
language independent system. We perform a validation experiment of NMT on
English-Japanese machine translation, and find that it is possible to achieve
comparable accuracy to direct subword training from raw sentences. We also
compare the performance of subword training and segmentation with various
configurations. SentencePiece is available under the Apache 2 license at
https://github.com/google/sentencepiece.
Comment: Accepted as a demo paper at EMNLP201
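The idea of segmenting raw sentences without pre-tokenization can be illustrated with a toy sketch. It assumes that whitespace is escaped into an ordinary symbol (U+2581) before segmentation, so that detokenization is a plain string join. The greedy longest-match rule and the hand-written vocabulary are illustrative assumptions only; this is not the SentencePiece implementation or its API, and a real model learns its vocabulary from data.

```python
WS = "\u2581"  # meta symbol standing in for a space


def tokenize(text, vocab):
    """Greedy longest-match segmentation of raw text (toy stand-in for a trained model)."""
    escaped = WS + text.replace(" ", WS)
    pieces, i = [], 0
    while i < len(escaped):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(escaped), i, -1):
            if escaped[i:j] in vocab or j == i + 1:
                pieces.append(escaped[i:j])
                i = j
                break
    return pieces


def detokenize(pieces):
    """Invert tokenize(): join the pieces, restore spaces, drop the leading marker."""
    return "".join(pieces).replace(WS, " ")[1:]


# Hypothetical vocabulary, written by hand for illustration.
toy_vocab = {WS + "Hello", WS + "wor", "ld", "."}
pieces = tokenize("Hello world.", toy_vocab)
restored = detokenize(pieces)
```

Because spaces survive as ordinary symbols, `restored` equals the input string without any language-specific rules. Text that already contains U+2581 would not round-trip in this sketch.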
Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative
Accurately predicting conversions in advertisements is generally a
challenging task, because such conversions do not occur frequently. In this
paper, we propose a new framework to support creating high-performing ad
creatives, including the accurate prediction of ad creative text conversions
before delivering to the consumer. The proposed framework includes three key
ideas: multi-task learning, conditional attention, and attention highlighting.
Multi-task learning is an idea for improving the prediction accuracy of
conversion, which predicts clicks and conversions simultaneously, to solve the
difficulty of data imbalance. Furthermore, conditional attention focuses
attention of each ad creative with the consideration of its genre and target
gender, thus improving conversion prediction accuracy. Attention highlighting
visualizes important words and/or phrases based on conditional attention. We
evaluated the proposed framework with actual delivery history data (14,000
creatives displayed more than a certain number of times from Gunosy Inc.), and
confirmed that these ideas improve the prediction performance of conversions,
and visualize noteworthy words according to the creatives' attributes.
Comment: 9 pages, 6 figures. Accepted at The 25th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD 2019) as an applied data science
paper
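The multi-task idea in the abstract above, predicting clicks and conversions together, can be sketched as a joint loss. The abstract does not state the loss function, so the weighted sum of two binary cross-entropy terms below, along with the names `bce`, `multitask_loss`, and `conv_weight`, is an assumption made for illustration and not the authors' formulation.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(p_click, y_click, p_conv, y_conv, conv_weight=1.0):
    """Assumed joint objective: click loss plus a weighted conversion loss."""
    return bce(p_click, y_click) + conv_weight * bce(p_conv, y_conv)


# One ad impression that was clicked but did not convert.
example_loss = multitask_loss(p_click=0.5, y_click=1, p_conv=0.5, y_conv=0)
```

Under this assumption, the frequent click signal contributes a gradient on every example, which is one way a shared model could still learn when conversions are rare.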
Causal relationship between eWOM topics and profit of rural tourism at Japanese Roadside Stations "MICHINOEKI"
Affected by urbanization, centralization and the decrease of overall
population, Japan has been making efforts to revitalize the rural areas across
the country. One particular effort is to increase tourism to these rural areas
via regional branding, using local farm products as tourist attractions across
Japan. Particularly, a program subsidized by the government called Michinoeki,
which stands for 'roadside station', was created 20 years ago and it strives to
provide a safe and comfortable space for cultural interaction between road
travelers and the local community, as well as offering refreshment, and
relevant information to travelers. However, despite its importance in the
revitalization of the Japanese economy, studies with newer technologies and
methodologies are lacking. Using sales data from establishments in the Kyushu
area of Japan, we used Support Vector Machines to classify content from Twitter into
relevant topics and studied their causal relationship to the sales for each
establishment using LiNGAM, a linear non-gaussian acyclic model built for
causal structure analysis, to perform an improved market analysis considering
more than just correlation. Under the hypotheses stated by the LiNGAM model, we
discovered a positive causal relationship between the number of tweets
mentioning those establishments, especially those mentioning desserts, a need for
better access and traffic options, and a potentially untapped customer base in
motorcycle biker groups
Product gas analysis of laminar premixed ammonia-methane flames in stagnation flows
Ammonia is a promising hydrogen energy vector and a carbon-free fuel; hence the use of ammonia-hydrocarbon fuel blends can be viewed as an intermediate step towards a hydrogen economy. The characterization of methane-ammonia emissions is essential for designing combustors for a broader range of fuels while fulfilling strict NOx emission requirements and global warming targets. The product gas trends of laminar premixed ammonia-methane flames at atmospheric pressure were studied for 0.1 to 0.6 ammonia heat ratios at the operable range of equivalence ratios. Gases including NO, N2O, NO2, HCN, CO and NH3 were measured using the dual dilution gas method and compared against numerical predictions. Experimental results showed the highest NO emissions at approximately 8,000 ppm for the 0.3 and 0.4 ammonia heat ratios, reducing twofold at the extreme heat ratio conditions. The optimal condition for reducing NOx emissions while maintaining low unburnt NH3 was found to occur at a 1.20 equivalence ratio for higher ammonia ratios, moving incrementally closer towards 1.35 as the methane ratio was increased. These results can aid a further reaction model analysis due to the availability of strain stabilised stagnation flame models in numerical software
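For readers unfamiliar with the equivalence ratios quoted above (1.20, 1.35), the following is a minimal sketch of how an equivalence ratio could be computed for an ammonia/methane/air mixture. It assumes complete-combustion stoichiometry with nitrogen from ammonia leaving as N2, and air containing 21% O2 by mole; the function name and inputs are hypothetical, and the abstract does not describe how the authors set their mixtures.

```python
O2_PER_CH4 = 2.0   # CH4 + 2 O2 -> CO2 + 2 H2O
O2_PER_NH3 = 0.75  # 4 NH3 + 3 O2 -> 2 N2 + 6 H2O
O2_IN_AIR = 0.21   # assumed mole fraction of O2 in air


def equivalence_ratio(n_nh3, n_ch4, n_air):
    """Ratio of the O2 needed for complete combustion to the O2 supplied.

    Values above 1 are fuel-rich, values below 1 are fuel-lean.
    Inputs are mole amounts (or mole flow rates) in consistent units.
    """
    o2_needed = O2_PER_NH3 * n_nh3 + O2_PER_CH4 * n_ch4
    o2_supplied = O2_IN_AIR * n_air
    return o2_needed / o2_supplied


# Hypothetical mixture: equal moles of NH3 and CH4 with 11 moles of air.
phi = equivalence_ratio(n_nh3=1.0, n_ch4=1.0, n_air=11.0)
```

With these assumed inputs the mixture is fuel-rich, since 2.75 mol of O2 would be needed and only 2.31 mol is supplied.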
Numerical and experimental study of product gas characteristics in premixed ammonia/methane/air laminar flames stabilised in a stagnation flow
The adoption of ammonia/hydrocarbon fuel blends can be viewed as an intermediate step towards a hydrogen economy, hence the characterization of methane/ammonia flame product gas trends is essential for designing combustors for a broader range of low-carbon fuel blends while fulfilling strict NOx requirements. This paper describes the product gas content of laminar premixed ammonia/methane flames for a range of equivalence ratios and ammonia heat ratios ranging from 10% to 60%, using a strain stabilized burner at atmospheric pressure and room temperature. The optimal condition to reduce NOx emissions while maintaining below 100 ppm of unburnt NH3 emissions was found to be at an equivalence ratio of 1.20 for higher ammonia ratios, moving incrementally closer to 1.35 as the methane fuel content was increased. Meanwhile, the highest measured NO values were ∼6,950 ppm at an equivalence ratio of 0.9, peaking at heat ratios of 30% to 40% at this equivalence ratio. Detailed reaction mechanisms were evaluated against the experimental data and rate constants of NO production/consumption steps featuring both NH and HNO intermediates and thermal NOx reactions were updated for Okafor's mechanism. Changes in reaction rate constants improved the mechanism accuracy for NO emissions in lean to stoichiometric flames. Meanwhile, in the rich region, modelled NO values were less responsive to changes in reaction constants, suggesting the need for an alternative approach to improve NO predictions for rich, high methane content flames. However, N2O performance in the rich region could be improved, highlighting the significance of the HNO+CO⇌NH+CO2 reaction
GREEK-BERT: The Greeks visiting Sesame Street
Transformer-based language models, such as BERT and its variants, have
achieved state-of-the-art performance in several downstream natural language
processing (NLP) tasks on generic benchmark datasets (e.g., GLUE, SQUAD, RACE).
However, these models have mostly been applied to the resource-rich English
language. In this paper, we present GREEK-BERT, a monolingual BERT-based
language model for modern Greek. We evaluate its performance in three NLP
tasks, i.e., part-of-speech tagging, named entity recognition, and natural
language inference, obtaining state-of-the-art performance. Interestingly, in
two of the benchmarks GREEK-BERT outperforms two multilingual Transformer-based
models (M-BERT, XLM-R), as well as shallower neural baselines operating on
pre-trained word embeddings, by a large margin (5%-10%). Most importantly, we
make both GREEK-BERT and our training code publicly available, along with code
illustrating how GREEK-BERT can be fine-tuned for downstream NLP tasks. We
expect these resources to boost NLP research and applications for modern Greek.
Comment: 8 pages, 1 figure, 11th Hellenic Conference on Artificial
Intelligence (SETN 2020)
Revisiting Low Resource Status of Indian Languages in Machine Translation
Indian language machine translation performance is hampered due to the lack
of large scale multi-lingual sentence aligned corpora and robust benchmarks.
Through this paper, we provide and analyse an automated framework to obtain
such a corpus for Indian language neural machine translation (NMT) systems. Our
pipeline consists of a baseline NMT system, a retrieval module, and an
alignment module that is used to work with publicly available websites such as
press releases by the government. The main contribution towards this effort is
to obtain an incremental method that uses the above pipeline to iteratively
improve the size of the corpus as well as improve each of the components of our
system. Through our work, we also evaluate the design choices such as the
choice of pivoting language and the effect of iterative incremental increase in
corpus size. Our work in addition to providing an automated framework also
results in generating a relatively larger corpus as compared to existing
corpora that are available for Indian languages. This corpus helps us obtain
substantially improved results on the publicly available WAT evaluation
benchmark and other standard evaluation benchmarks.
Comment: 10 pages, few figures, Preprint under review
Study on N2O production mechanisms of lean ammonia/hydrogen/air premixed laminar flames
The application of ammonia as a fuel is a potential candidate for achieving carbon neutrality. As the laminar burning velocity of ammonia is low, hydrogen addition is also considered to improve combustion characteristics with no carbon emission. In this study, we experimentally investigated product gas characteristics of strain stabilized ammonia/hydrogen/air premixed laminar flames under atmospheric pressure for various equivalence ratios. Under lean conditions, a large amount of N2O production was observed. To clarify N2O production mechanisms, numerical simulations were conducted using a reaction mechanism developed by Gotama et al. In the Gotama reaction mechanism, the major N2O production path was NH+NO=N2O+H and the major N2O consumption paths were N2O+H=N2+OH and N2O(+M)=N2+O(+M). It was clarified that a decrease in N2O consumption via N2O(+M)=N2+O(+M) increases N2O emission under lean and strained conditions