112 research outputs found
QuesNet: A Unified Representation for Heterogeneous Test Questions
Understanding learning materials (e.g., test questions) is a crucial issue in
online learning systems and can support many applications in the education
domain. Unfortunately, many supervised approaches suffer from scarce
human-labeled data, while abundant unlabeled resources remain highly
underutilized. An effective way to alleviate this problem is to use
pre-trained representations for question understanding. However, existing
pre-training methods in NLP are ill-suited to learning test question
representations because of several domain-specific characteristics of
education. First, questions usually comprise heterogeneous data, including
content text, images, and side information. Second, they carry both basic
linguistic information and domain logic and knowledge. To this end, in this
paper, we propose a novel pre-training method, QuesNet, for comprehensively
learning question representations. Specifically, we first design a unified
framework that aggregates a question's heterogeneous inputs into a
comprehensive vector. We then propose a two-level hierarchical pre-training
algorithm that learns a better understanding of test questions in an
unsupervised way: a novel holed language model objective extracts low-level
linguistic features, and a domain-oriented objective learns high-level logic
and knowledge. Moreover, we show that QuesNet can be effectively fine-tuned
for many question-based tasks. We conduct extensive experiments on large-scale
real-world question data, and the results clearly demonstrate the
effectiveness of QuesNet for question understanding as well as its superior
applicability.
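The abstract does not spell out the holed language model objective; one plausible reading is that each token must be predicted from its full two-sided context (a "hole"), unlike a left-to-right LM. A minimal sketch under that assumption, with hypothetical names:

```python
def holed_lm_examples(tokens, hole="<hole>"):
    """Build (context, target) training pairs for a holed-LM-style
    objective: each position is replaced by a hole token and must be
    predicted from the full left and right context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[:i] + [hole] + tokens[i + 1:]
        examples.append((context, target))
    return examples

# every position of a 5-token question yields one training pair
pairs = holed_lm_examples(["what", "is", "2", "+", "3"])
```

A real implementation would feed each holed context through the question encoder and score the target token, but the pair construction above captures the objective's shape.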
Crystallographic and Fluorescence Studies of the Interaction of Haloalkane Dehalogenase with Halide Ions. Studies with Halide Compounds Reveal a Halide Binding Site in the Active Site
Haloalkane dehalogenase from Xanthobacter autotrophicus GJ10 catalyzes the conversion of 1,2-dichloroethane to 2-chloroethanol and chloride without use of oxygen or cofactors. The active site is situated in an internal cavity, which is accessible from the solvent, even in the crystal. Crystal structures of the dehalogenase enzyme complexed with iodoacetamide, chloroacetamide, iodide, and chloride at pH 6.2 and 8.2 revealed a halide binding site between the ring N-H groups of two tryptophan residues, Trp-125 and Trp-175, located in the active site. The halide ion lies on the intersection of the planes of the rings of the tryptophans. The binding of iodide and chloride to haloalkane dehalogenase caused a strong decrease in protein fluorescence. The decrease could be fitted to a modified form of the Stern-Volmer equation, indicating the presence of fluorophores of different accessibilities. Halide binding was much stronger at pH 6.0 than at pH 8.2. Assuming ligand binding to Trp-125 and Trp-175 as the sole cause of fluorescence quenching, dissociation constants at pH 6.0 with chloride and iodide were calculated to be 0.49 +/- 0.04 and 0.074 +/- 0.007 mM, respectively. Detailed structural investigation showed that the halide binding site probably stabilizes the halide product as well as the negatively charged transition state occurring during the formation of the covalent intermediate.
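The "modified form of the Stern-Volmer equation" for fluorophores of different accessibilities is commonly the Lehrer form, given here for reference (the paper's exact fitting expression may differ):

```latex
% Classic Stern-Volmer quenching ([Q] = quencher concentration):
\frac{F_0}{F} = 1 + K_{SV}[Q]
% Modified (Lehrer) form when only a fraction f_a of the
% fluorophores is accessible to the quencher:
\frac{F_0}{\Delta F} = \frac{1}{f_a K_{SV}[Q]} + \frac{1}{f_a}
```

Plotting $F_0/\Delta F$ against $1/[Q]$ yields $f_a$ from the intercept and $K_{SV}$ from the slope, which is how accessibility differences between fluorophores are resolved.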
Towards Knowledge-Based Personalized Product Description Generation in E-commerce
Quality product descriptions are critical for providing a competitive
customer experience on an e-commerce platform. An accurate and attractive
description not only helps customers make an informed decision but also
improves the likelihood of purchase. However, crafting a successful product
description is tedious and highly time-consuming. Due to its importance,
automating product description generation has attracted considerable interest
from both the research and industrial communities. Existing methods mainly
use templates or statistical methods, and their performance can be rather
limited. In this paper, we explore a new way to generate personalized product
descriptions by combining the power of neural networks and a knowledge base.
Specifically, we propose a KnOwledge Based pErsonalized (KOBE) product
description generation model in the context of e-commerce. In KOBE, we extend
the encoder-decoder framework, the Transformer, to a sequence modeling
formulation using self-attention. To make the description both informative
and personalized, KOBE considers a variety of important factors during text
generation, including product aspects, user categories, and the knowledge
base. Experiments on real-world datasets demonstrate that the proposed method
outperforms the baselines on various metrics; KOBE achieves an improvement of
9.7% over the state of the art in terms of BLEU. We also present several case
studies as anecdotal evidence of the effectiveness of the proposed approach.
The framework has been deployed in Taobao, the largest online e-commerce
platform in China. Comment: KDD 2019 Camera-ready. Website:
https://sites.google.com/view/kobe201
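The abstract does not specify how the attributes enter the model; one common way to condition a self-attention encoder-decoder on such factors is to prepend attribute tokens to the input sequence. A sketch under that assumption (all names illustrative, not KOBE's actual format):

```python
def build_conditioned_input(title_tokens, aspects, user_category,
                            knowledge_facts, sep="<sep>"):
    """Assemble one input sequence for an attribute-conditioned
    encoder-decoder: special attribute tokens are prepended so that
    self-attention can condition every generated word on them."""
    attrs = [f"<aspect:{a}>" for a in aspects] + [f"<user:{user_category}>"]
    return attrs + title_tokens + [sep] + knowledge_facts

seq = build_conditioned_input(["wireless", "earbuds"], ["battery"],
                              "athlete", ["bluetooth", "5.0"])
```

The decoder then attends over this sequence, so the same product title can yield different descriptions for different aspect/user combinations.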
What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation
Heavily pre-trained transformer models such as BERT have recently been shown
to be remarkably powerful at language modelling, achieving impressive results
on numerous downstream tasks. It has also been shown that they can implicitly
store factual knowledge in their parameters after pre-training. Understanding
what the pre-training procedure of LMs actually learns is a crucial step for
using and improving them for Conversational Recommender Systems (CRS). We
first study how much an off-the-shelf pre-trained BERT "knows" about
recommendation items such as books, movies, and music. To analyze the
knowledge stored in BERT's parameters, we use probes that require different
types of knowledge to solve, namely content-based and collaborative-based.
Content-based knowledge requires the model to match the titles of items with
their content information, such as textual descriptions and genres. In
contrast, collaborative-based knowledge requires the model to match items
with similar ones, according to community interactions such as ratings. We
use BERT's Masked Language Modelling head to probe its knowledge about the
genre of items with cloze-style prompts. In addition, we employ BERT's Next
Sentence Prediction head and representation similarity to compare relevant
and non-relevant search and recommendation query-document inputs, exploring
whether BERT can, without any fine-tuning, rank relevant items first.
Finally, we study how BERT performs in a conversational recommendation
downstream task. Overall, our analyses and experiments show that: (i) BERT
has knowledge stored in its parameters about the content of books, movies,
and music; (ii) it has more content-based than collaborative-based knowledge;
and (iii) it fails on conversational recommendation when faced with
adversarial data. Comment: Accepted for publication at RecSys'2
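A cloze-style MLM probe of the kind described above amounts to templating a sentence with the model's mask token and reading off the head's prediction. A minimal sketch of the prompt construction (the paper's exact template may differ):

```python
def genre_cloze_prompt(item_title, item_kind="movie", mask_token="[MASK]"):
    """Build a cloze prompt for probing an MLM head about an item's
    genre; the model's top prediction for the mask slot is taken as
    its 'belief' about the genre."""
    return f"{item_title} is a {mask_token} {item_kind}."

prompt = genre_cloze_prompt("The Godfather")
```

Feeding such prompts to a fill-mask model and comparing the top predictions against ground-truth genres gives a fine-tuning-free measure of content knowledge.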
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Modeling visual search not only offers an opportunity to predict the
usability of an interface before actually testing it on real users, but also
advances scientific understanding of human behavior. In this work, we first
conduct a set of analyses on a large-scale dataset of visual search tasks on
realistic webpages. We then present a deep neural network that learns to
predict the scannability of webpage content, i.e., how easy it is for a user
to find a specific target. Our model leverages both heuristic-based features,
such as target size, and unstructured features, such as raw image pixels.
This approach allows us to model complex interactions involved in a realistic
visual search task that cannot be easily captured by traditional analytical
models. We analyze the model's behavior to offer insights into how the
salience map learned by the model aligns with human intuition and how the
learned semantic representation of each target type relates to its visual
search performance. Comment: the 2020 CHI Conference on Human Factors in
Computing System
Multistability and dynamic transitions of intracellular Min protein patterns
Cells owe their internal organization to self-organized protein patterns, which originate and adapt to growth and external stimuli via a process that is as complex as it is little understood. Here, we study the emergence, stability, and state transitions of multistable Min protein oscillation patterns in live Escherichia coli bacteria during growth up to defined large dimensions. De novo formation of patterns from homogeneous starting conditions is observed and studied both experimentally and in simulations. A new theoretical approach is developed for probing pattern stability under perturbations. Quantitative experiments and simulations show that, once established, Min oscillations tolerate a large degree of intracellular heterogeneity, allowing distinctly different patterns to persist in different cells with the same geometry. Min patterns maintain their axes for hours in experiments, despite imperfections, expansion, and changes in cell shape during continuous cell growth. Transitions between multistable Min patterns are found to be rare events induced by strong intracellular perturbations. The instances of multistability studied here are the combined outcome of boundary growth and strongly nonlinear kinetics, which are characteristic of the reaction-diffusion patterns that pervade biology at many scales.
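The "strongly nonlinear kinetics" invoked above belong to the general class of two-component reaction-diffusion systems; their generic form, for reference (the specific Min attachment/detachment kinetics are richer than this template):

```latex
% Generic two-component reaction-diffusion system; u, v are protein
% densities, D_u, D_v diffusion constants, and f, g the nonlinear
% reaction kinetics whose form sets the pattern-forming behavior:
\partial_t u = D_u \nabla^2 u + f(u, v), \qquad
\partial_t v = D_v \nabla^2 v + g(u, v)
```

Multistability arises when such a system, on a growing bounded domain, admits several stable oscillation modes (e.g., pole-to-pole vs. transverse) for the same geometry.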
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
We study the open-domain named entity recognition (NER) problem under distant
supervision. Distant supervision, though it does not require large amounts of
manual annotation, yields highly incomplete and noisy distant labels via
external knowledge bases. To address this challenge, we propose a new
computational framework, BOND, which leverages the power of pre-trained
language models (e.g., BERT and RoBERTa) to improve the prediction
performance of NER models. Specifically, we propose a two-stage training
algorithm: in the first stage, we adapt the pre-trained language model to the
NER task using the distant labels, which can significantly improve recall and
precision; in the second stage, we drop the distant labels and propose a
self-training approach to further improve model performance. Thorough
experiments on 5 benchmark datasets demonstrate the superiority of BOND over
existing distantly supervised NER methods. The code and distantly labeled
data have been released at https://github.com/cliang1453/BOND. Comment:
Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data
Mining (KDD '20
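The second-stage self-training described above is commonly implemented by letting the current model pseudo-label the data and keeping only confident predictions. A simplified sketch of that selection step (BOND's actual criterion may differ):

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Confidence-based pseudo-label selection for self-training:
    for each token's predicted label distribution, keep the argmax
    label only if its probability clears the threshold; low-confidence
    tokens are dropped from the next training round."""
    selected = []
    for i, dist in enumerate(probs):
        label = max(range(len(dist)), key=lambda k: dist[k])
        if dist[label] >= threshold:
            selected.append((i, label))
    return selected

# token 0 is confidently class 1; token 1 is uncertain and is dropped
pseudo = select_pseudo_labels([[0.05, 0.95], [0.6, 0.4]])
```

Iterating this (retrain on the selected pseudo-labels, re-predict, re-select) is what lets the model move beyond the noisy distant labels of stage one.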
Specificity and kinetics of haloalkane dehalogenase
Haloalkane dehalogenase converts halogenated alkanes to their corresponding alcohols. The active site is buried inside the protein and lined with hydrophobic residues. The reaction proceeds via a covalent substrate-enzyme complex. This paper describes a steady-state and pre-steady-state kinetic analysis of the conversion of a number of substrates of the dehalogenase. The kinetic mechanisms for the "natural" substrate 1,2-dichloroethane and for the brominated analog and nematocide 1,2-dibromoethane are given. In general, brominated substrates had a lower K_m but a similar k_cat compared with the chlorinated analogs. The rate of C-Br bond cleavage was higher than the rate of C-Cl bond cleavage, which is in agreement with the leaving-group abilities of these halogens. The lower K_m for brominated compounds therefore originates both from the higher rate of C-Br bond cleavage and from tighter binding of the bromo-compounds. However, the rate-determining step in the conversion (k_cat) of 1,2-dibromoethane and 1,2-dichloroethane was found to be release of the charged halide ion from the active site cavity, explaining the different K_m but similar k_cat values for these compounds. The study provides a basis for the analysis of rate-determining steps in the hydrolysis of various environmentally important substrates.
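The K_m/k_cat comparisons above follow standard steady-state Michaelis-Menten kinetics, given here for reference:

```latex
% Steady-state rate law; [E]_0 = total enzyme, [S] = substrate:
v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]}
```

When a late step such as halide release limits turnover, it sets k_cat for both substrates, while the earlier binding and bond-cleavage steps can still differ between them and so produce different K_m values, which is the pattern reported above.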
Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus
The ability to ask questions is important in both human and machine
intelligence. Learning to ask questions helps knowledge acquisition, improves
question-answering and machine reading comprehension tasks, and helps a
chatbot keep a conversation flowing with a human. Existing question
generation models are ineffective at generating large numbers of high-quality
question-answer pairs from unstructured text, since, given an answer and an
input passage, question generation is inherently a one-to-many mapping. In
this paper, we propose Answer-Clue-Style-aware Question Generation (ACS-QG),
which aims at automatically generating high-quality and diverse
question-answer pairs from an unlabeled text corpus at scale by imitating the
way a human asks questions. Our system consists of: i) an information
extractor, which samples multiple types of assistive information from the
text to guide question generation; ii) neural question generators, which
generate diverse and controllable questions by leveraging the extracted
assistive information; and iii) a neural quality controller, which removes
low-quality generated data based on text entailment. We compare our question
generation models with existing approaches and resort to voluntary human
evaluation to assess the quality of the generated question-answer pairs. The
evaluation results suggest that our system dramatically outperforms
state-of-the-art neural question generation models in generation quality
while remaining scalable. With models trained on a relatively small amount of
data, we can generate 2.8 million quality-assured question-answer pairs from
a million sentences found in Wikipedia. Comment: Accepted by The Web
Conference 2020 (WWW 2020) as full paper (oral presentation
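The one-to-many mapping noted above becomes tractable once the extra conditioning inputs (answer, clue, style) are made explicit: each distinct tuple steers the generator toward a different question. An illustrative sketch (names hypothetical, not the paper's API):

```python
from itertools import product

def acs_inputs(answer, clues, styles):
    """Enumerate (answer, clue, style) conditioning tuples for one
    passage: fixing the answer, every clue/style combination is a
    separate, fully specified generation input, so the generator
    faces a one-to-one rather than one-to-many problem."""
    return [(answer, c, s) for c, s in product(clues, styles)]

tuples = acs_inputs("1989", ["the Berlin Wall fell"], ["when", "which year"])
```

Sampling such tuples is the extractor's job; the quality controller then prunes pairs whose question is not entailed by the passage.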
Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning
Learning-based adaptive bitrate (ABR) methods, which aim to learn strong
strategies without any presumptions, have become one of the research hotspots
in adaptive streaming. However, they typically suffer from several issues,
i.e., low sample efficiency and a lack of awareness of video quality
information. In this paper, we propose Comyco, a video quality-aware ABR
approach that greatly improves on learning-based methods by tackling the
above issues. Comyco trains its policy by imitating expert trajectories given
by an instant solver, which not only avoids redundant exploration but also
makes better use of the collected samples. Meanwhile, Comyco attempts to pick
the chunk with higher perceptual video quality rather than higher video
bitrate. To achieve this, we construct Comyco's neural network architecture,
video datasets, and QoE metrics with video quality features. Using
trace-driven and real-world experiments, we demonstrate significant
improvements in Comyco's sample efficiency compared with prior work: a 1700x
improvement in the number of samples required and a 16x improvement in
training time. Moreover, the results show that Comyco outperforms previously
proposed methods, with improvements in average QoE of 7.5%-16.79%. In
particular, Comyco also surpasses the state-of-the-art approach Pensieve by
7.37% in average video quality under the same rebuffering time. Comment: ACM
Multimedia 201
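The imitation-learning setup described above reduces to supervised learning on (state, expert_action) pairs: instead of exploring, the learner regresses onto the solver's choices. A toy sketch of the data-collection step (the real state, solver, and action space are much richer):

```python
def collect_imitation_data(states, expert_policy):
    """Build a supervised dataset of (state, expert_action) pairs;
    the policy network is then trained to reproduce the expert's
    action in each state (e.g., with a cross-entropy loss)."""
    return [(s, expert_policy(s)) for s in states]

def toy_expert(state):
    """Toy stand-in for the instant solver: pick the highest-quality
    chunk that fits the available bandwidth (chunk sizes are assumed
    sorted by increasing quality)."""
    bandwidth, chunk_sizes = state
    feasible = [i for i, size in enumerate(chunk_sizes) if size <= bandwidth]
    return max(feasible) if feasible else 0

data = collect_imitation_data([(3.0, [1.0, 2.5, 4.0])], toy_expert)
```

Because every sample comes with a known-good label, each one contributes directly to training, which is where the sample-efficiency gain over exploration-based RL comes from.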