Large Margin Random Forests On Mixed Type Data
Incorporating various sources of biological information is important for biological discovery. For example, genes have a multi-view representation. They can be represented by features such as sequence length and physical-chemical properties. They can also be represented by pairwise similarities, gene expression levels, and phylogenetic position. Hence, the data types range from numerical features to categorical features. An efficient way of learning from observations with a multi-view representation of mixed-type data is thus important. We propose a large margin random forests classification approach based on random forests proximity. Random forests accommodate mixed data types naturally. Large margin classifiers are obtained from the random forests proximity kernel or its derivative kernels. We test the approach on four biological datasets. The performance is promising compared with other state-of-the-art methods, including support vector machines (SVMs) and random forests classifiers. It demonstrates high potential in the discovery of functional roles of genes and proteins. We also examine the effects of mixed-type data on the algorithms used.
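As an illustration of the proximity kernel mentioned in this abstract, the sketch below uses the standard random forests proximity (the fraction of trees in which two samples land in the same leaf). This definition is an assumption, since the abstract does not spell out the exact kernel or its derivative kernels. The leaf assignments and the function name `proximity_kernel` are hypothetical; with scikit-learn, the leaf indices could come from `RandomForestClassifier.apply` and the resulting matrix could be passed to `SVC(kernel="precomputed")`.

```python
from typing import List


def proximity_kernel(leaf_ids: List[List[int]]) -> List[List[float]]:
    """Random forests proximity between all pairs of samples.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    The proximity of samples i and j is the fraction of trees in which
    they fall into the same leaf.
    """
    n_samples = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    kernel = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            shared = sum(
                1 for t in range(n_trees) if leaf_ids[i][t] == leaf_ids[j][t]
            )
            kernel[i][j] = shared / n_trees
    return kernel


# Hypothetical leaf assignments for 4 samples in a forest of 3 trees.
leaf_ids = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 2, 3],
    [1, 0, 0],
]
K = proximity_kernel(leaf_ids)
```

Because the kernel depends only on which leaves samples share, it is indifferent to whether the splits were made on numerical or categorical features, which is why it suits mixed-type data.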
Bayesian inference on group differences in multivariate categorical data
Multivariate categorical data are common in many fields. We are motivated by
election poll studies assessing evidence of changes in voters' opinions with
their candidate preferences in the 2016 United States Presidential primaries
or caucuses. Similar goals arise routinely in several applications, but the current
literature lacks a general methodology which combines flexibility, efficiency,
and tractability in testing for group differences in multivariate categorical
data at different, potentially complex, scales. We address this goal by
leveraging a Bayesian representation which factorizes the joint probability
mass function for the group variable and the multivariate categorical data as
the product of the marginal probabilities for the groups, and the conditional
probability mass function of the multivariate categorical data, given the group
membership. To enhance flexibility, we define the conditional probability mass
function of the multivariate categorical data via a group-dependent mixture of
tensor factorizations, thus facilitating dimensionality reduction and borrowing
of information, while providing tractable procedures for computation, and
accurate tests assessing global and local group differences. We compare our
methods with popular competitors, and discuss improved performance in
simulations and in American election poll studies.
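The factorization described in this abstract can be written out as follows. The notation is an assumption introduced here for illustration: $y$ is the group variable, $x = (x_1, \dots, x_p)$ are the categorical variables, $H$ is the number of mixture components, $\nu_{gh}$ are mixing weights for group $g$, and $\psi_{hj}$ is the probability mass function of variable $j$ in component $h$. Placing the group dependence in the weights is one common choice and may differ from the authors' exact specification.

```latex
\begin{align}
  \Pr(y = g,\; x = c) &= \Pr(y = g)\,\Pr(x = c \mid y = g), \\
  \Pr(x_1 = c_1, \dots, x_p = c_p \mid y = g)
    &= \sum_{h=1}^{H} \nu_{gh} \prod_{j=1}^{p} \psi_{hj}(c_j),
  \qquad \sum_{h=1}^{H} \nu_{gh} = 1 .
\end{align}
```

Under this form, the groups share the component-specific kernels $\psi_{hj}$ (borrowing of information), and testing for group differences amounts to asking whether the weights $\nu_{gh}$ vary with $g$.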
COTA: Improving the Speed and Accuracy of Customer Support through Ranking and Deep Networks
For a company looking to provide delightful user experiences, it is of
paramount importance to take care of any customer issues. This paper proposes
COTA, a system to improve speed and reliability of customer support for end
users through automated ticket classification and answer selection for support
representatives. Two machine learning and natural language processing
techniques are demonstrated: one relying on feature engineering (COTA v1) and
the other exploiting raw signals through deep learning architectures (COTA v2).
COTA v1 employs a new approach that converts the multi-class classification task into
a ranking problem, demonstrating significantly better performance in the case
of thousands of classes. For COTA v2, we propose an Encoder-Combiner-Decoder, a
novel deep learning architecture that allows for heterogeneous input and output
feature types and injection of prior knowledge through network architecture
choices. This paper compares these models and their variants on the task of
ticket classification and answer selection, showing that COTA v2 outperforms
COTA v1, and analyzes their inner workings and shortcomings. Finally, an A/B
test is conducted in a production setting validating the real-world impact of
COTA in reducing issue resolution time by 10 percent without reducing customer
satisfaction.
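The idea of recasting multi-class classification as ranking can be sketched as scoring every (ticket, class) pair and sorting the classes by score, rather than predicting a single label. The example below is a minimal illustration under stated assumptions: it uses bag-of-words cosine similarity as the score, which is not claimed to be the COTA v1 feature set or model, and all names (`rank_classes`, `class_examples`) and the sample tickets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokenization into word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_classes(
    ticket: str, class_examples: Dict[str, List[str]], top_k: int = 3
) -> List[str]:
    """Score each (ticket, class) pair and return the top_k class labels."""
    ticket_vec = bag_of_words(ticket)
    scored = [
        (cosine(ticket_vec, bag_of_words(" ".join(examples))), label)
        for label, examples in class_examples.items()
    ]
    # Highest score first; ties are broken alphabetically by label.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored[:top_k]]


# Hypothetical classes, each described by a few past tickets.
class_examples = {
    "payment": ["charged twice for my trip", "refund my payment"],
    "lost_item": ["left my phone in the car", "lost item in vehicle"],
    "account": ["cannot log in to my account", "reset my password"],
}
ranking = rank_classes(
    "I was charged twice and want a refund", class_examples, top_k=2
)
```

A ranked shortlist of this kind fits a setting with thousands of classes, where a support representative can pick from a few top suggestions instead of relying on one hard prediction.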
Bayesian modeling of networks in complex business intelligence problems
Complex network data problems are increasingly common in many fields of
application. Our motivation is drawn from strategic marketing studies
monitoring customer choices of specific products, along with co-subscription
networks encoding multiple purchasing behavior. Data are available for several
agencies within the same insurance company, and our goal is to efficiently
exploit co-subscription networks to inform targeted advertising of cross-sell
strategies to currently mono-product customers. We address this goal by
developing a Bayesian hierarchical model, which clusters agencies according to
common mono-product customer choices and co-subscription networks. Within each
cluster, we efficiently model customer behavior via a cluster-dependent mixture
of latent eigenmodels. This formulation provides key information on
mono-product customer choices and multiple purchasing behavior within each
cluster, informing targeted cross-sell strategies. We develop simple algorithms
for tractable inference, and assess performance in simulations and an
application to business intelligence.
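For readers unfamiliar with latent eigenmodels, a generic form is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $A_{ij}$ indicates an edge between products $i$ and $j$ in a co-subscription network, $\mu^{(h)}$ is a baseline log-odds, $x_i^{(h)}$ is a latent coordinate vector, $\Lambda^{(h)}$ is a diagonal weight matrix, and $\nu_{kh}$ are mixing weights for agency cluster $k$.

```latex
\begin{align}
  \operatorname{logit} \pi_{ij}^{(h)}
    &= \mu^{(h)} + {x_i^{(h)}}^{\top} \Lambda^{(h)} x_j^{(h)}, \\
  \Pr(A \mid \text{cluster } k)
    &= \sum_{h=1}^{H} \nu_{kh} \prod_{i < j}
       \bigl(\pi_{ij}^{(h)}\bigr)^{A_{ij}}
       \bigl(1 - \pi_{ij}^{(h)}\bigr)^{1 - A_{ij}} .
\end{align}
```

In a formulation of this kind, cluster dependence enters through the weights $\nu_{kh}$, so agencies in the same cluster share the same distribution over co-subscription networks.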
Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms
Many, if not most, network analysis algorithms have been designed specifically
for single-relational networks; that is, networks in which all edges are of the
same type. For example, edges may either represent "friendship," "kinship," or
"collaboration," but not all of them together. In contrast, a multi-relational
network is a network with a heterogeneous set of edge labels which can
represent relationships of various types in a single data structure. While
multi-relational networks are more expressive in terms of the variety of
relationships they can capture, there is a need for a general framework for
transferring the many single-relational network analysis algorithms to the
multi-relational domain. It is not sufficient to execute a single-relational
network analysis algorithm on a multi-relational network by simply ignoring
edge labels. This article presents an algebra for mapping multi-relational
networks to single-relational networks, thereby exposing them to
single-relational network analysis algorithms.
Comment: ISSN:1751-157
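One simple way to derive a single-relational network from a multi-relational one is to store each edge label as its own adjacency matrix and compose relations by matrix multiplication. The sketch below illustrates that general idea only; it is not claimed to reproduce the operators defined in the article, and the node names, relations, and helper `mat_mul` are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


people = ["alice", "bob", "carol"]

# One adjacency matrix per edge label (rows are sources, columns are targets).
friendship: Matrix = [
    [0, 1, 0],  # alice -friendship-> bob
    [0, 0, 0],
    [0, 0, 0],
]
kinship: Matrix = [
    [0, 0, 0],
    [0, 0, 1],  # bob -kinship-> carol
    [0, 0, 0],
]

# Derived single-relational network: "is a friend of someone who is kin to".
# Entry [i][j] counts the friendship-then-kinship paths from i to j.
kin_of_friend = mat_mul(friendship, kinship)
```

The derived matrix has a single edge, from alice to carol, and can be handed to any single-relational algorithm. Simply merging the two matrices while ignoring labels would instead yield the edges alice to bob and bob to carol, with no record of what each edge means.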