8,389 research outputs found
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolutio
Two-layer classification and distinguished representations of users and documents for grouping and authorship identification
Most studies on authorship identification reported a drop in the identification result when the number of authors exceeds 20-25. In this paper, we introduce a new user representation to address this problem and split classification across two layers. There are at least 3 novelties in this paper. First, the two-layer approach allows applying authorship identification over larger number of authors (tested over 100 authors), and it is extendable. The authors are divided into groups that contain smaller number of authors. Given an anonymous document, the primary layer detects the group to which the document belongs. Then, the secondary layer determines the particular author inside the selected group. In order to extract the groups linking similar authors, clustering is applied over users rather than documents. Hence, the second novelty of this paper is introducing a new user representation that is different from document representation. Without the proposed user representation, the clustering over documents will result in documents of author(s) distributed over several clusters, instead of a single cluster membership for each author. Third, the extracted clusters are descriptive and meaningful of their users as the dimensions have psychological backgrounds. For authorship identification, the documents are labelled with the extracted groups and fed into machine learning to build classification models that predicts the group and author of a given document. The results show that the documents are highly correlated with the extracted corresponding groups, and the proposed model can be accurately trained to determine the group and the author identity
Interacting Attention-gated Recurrent Networks for Recommendation
Capturing the temporal dynamics of user preferences over items is important
for recommendation. Existing methods mainly assume that all time steps in
user-item interaction history are equally relevant to recommendation, which
however does not apply in real-world scenarios where user-item interactions can
often happen accidentally. More importantly, they learn user and item dynamics
separately, thus failing to capture their joint effects on user-item
interactions. To better model user and item dynamics, we present the
Interacting Attention-gated Recurrent Network (IARN) which adopts the attention
model to measure the relevance of each time step. In particular, we propose a
novel attention scheme to learn the attention scores of user and item history
in an interacting way, thus to account for the dependencies between user and
item dynamics in shaping user-item interactions. By doing so, IARN can
selectively memorize different time steps of a user's history when predicting
her preferences over different items. Our model can therefore provide
meaningful interpretations for recommendation results, which could be further
enhanced by auxiliary features. Extensive validation on real-world datasets
shows that IARN consistently outperforms state-of-the-art methods.Comment: Accepted by ACM International Conference on Information and Knowledge
Management (CIKM), 201
CHR(PRISM)-based Probabilistic Logic Learning
PRISM is an extension of Prolog with probabilistic predicates and built-in
support for expectation-maximization learning. Constraint Handling Rules (CHR)
is a high-level programming language based on multi-headed multiset rewrite
rules.
In this paper, we introduce a new probabilistic logic formalism, called
CHRiSM, based on a combination of CHR and PRISM. It can be used for high-level
rapid prototyping of complex statistical models by means of "chance rules". The
underlying PRISM system can then be used for several probabilistic inference
tasks, including probability computation and parameter learning. We define the
CHRiSM language in terms of syntax and operational semantics, and illustrate it
with examples. We define the notion of ambiguous programs and define a
distribution semantics for unambiguous programs. Next, we describe an
implementation of CHRiSM, based on CHR(PRISM). We discuss the relation between
CHRiSM and other probabilistic logic programming languages, in particular PCHR.
Finally we identify potential application domains
VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback
Modern recommender systems model people and items by discovering or `teasing
apart' the underlying dimensions that encode the properties of items and users'
preferences toward them. Critically, such dimensions are uncovered based on
user feedback, often in implicit form (such as purchase histories, browsing
logs, etc.); in addition, some recommender systems make use of side
information, such as product attributes, temporal information, or review text.
However one important feature that is typically ignored by existing
personalized recommendation and ranking methods is the visual appearance of the
items being considered. In this paper we propose a scalable factorization model
to incorporate visual signals into predictors of people's opinions, which we
apply to a selection of large, real-world datasets. We make use of visual
features extracted from product images using (pre-trained) deep networks, on
top of which we learn an additional layer that uncovers the visual dimensions
that best explain the variation in people's feedback. This not only leads to
significantly more accurate personalized ranking methods, but also helps to
alleviate cold start issues, and qualitatively to analyze the visual dimensions
that influence people's opinions.Comment: AAAI'1
- ā¦