
48,904 research outputs found

Large Margin Random Forests On Mixed Type Data

Author: Liu Sheng
Publication venue: eGrove
Publication date: 01/01/2011

Incorporating various sources of biological information is important for biological discovery. For example, genes have a multi-view representation. They can be represented by features such as sequence length and physical-chemical properties. They can also be represented by pairwise similarities, gene expression levels, and phylogenetic position. Hence, the data types range from numerical to categorical features. An efficient way of learning from observations with a multi-view representation of mixed-type data is thus important. We propose a large margin random forests classification approach based on random forests proximity. Random forests accommodate mixed data types naturally. Large margin classifiers are obtained from the random forests proximity kernel or its derivative kernels. We test the approach on four biological datasets. The performance is promising compared with other state-of-the-art methods, including support vector machines (SVMs) and random forests classifiers. It demonstrates high potential in the discovery of functional roles of genes and proteins. We also examine the effects of mixed-type data on the algorithms used.
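The random forests proximity mentioned in the abstract is commonly defined as the fraction of trees in which two samples fall into the same leaf. The following is a minimal sketch of that computation only, assuming the leaf assignments are already known; the `leaf_ids` values and the function name `proximity_matrix` are hypothetical and are not taken from the paper.

```python
def proximity_matrix(leaf_ids):
    """Return the pairwise proximity matrix for a forest.

    leaf_ids[t][i] is the leaf that sample i reaches in tree t.
    Proximity of (i, j) = share of trees where i and j share a leaf.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for leaves in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    # Normalise the co-occurrence counts by the number of trees.
    return [[c / n_trees for c in row] for row in counts]


# Hypothetical leaf assignments: 3 trees, 4 samples (made up for illustration).
leaf_ids = [
    [0, 0, 1, 1],
    [2, 2, 2, 3],
    [5, 4, 4, 4],
]
prox = proximity_matrix(leaf_ids)
```

A matrix of this kind is symmetric with ones on the diagonal, which is what allows it to be handed to a kernel-based large margin classifier as a precomputed kernel.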

eGrove (Univ. of Mississippi)

Bayesian inference on group differences in multivariate categorical data

Author: Durante Daniele, Russo Massimiliano, Scarpa Bruno
Publication date: 09/08/2017

Multivariate categorical data are common in many fields. We are motivated by election poll studies assessing evidence of changes in voters' opinions with their candidate preferences in the 2016 United States Presidential primaries or caucuses. Similar goals arise routinely in several applications, but the current literature lacks a general methodology that combines flexibility, efficiency, and tractability in testing for group differences in multivariate categorical data at different, potentially complex, scales. We address this goal by leveraging a Bayesian representation which factorizes the joint probability mass function for the group variable and the multivariate categorical data as the product of the marginal probabilities for the groups and the conditional probability mass function of the multivariate categorical data, given the group membership. To enhance flexibility, we define the conditional probability mass function of the multivariate categorical data via a group-dependent mixture of tensor factorizations, thus facilitating dimensionality reduction and borrowing of information, while providing tractable procedures for computation and accurate tests assessing global and local group differences. We compare our methods with popular competitors, and discuss improved performance in simulations and in American election poll studies.
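The factorization described in the abstract can be written out as follows. This is one plausible reading, not the paper's own notation: the symbols $x$ (group), $y_1,\dots,y_p$ (categorical variables), $H$ (number of mixture components), $\nu_{hx}$ (group-dependent mixing weights) and $\psi_{hj}$ (component-specific probability mass functions shared across groups) are all assumptions introduced for illustration.

```latex
\begin{align}
  p(x, y_1, \dots, y_p) &= p(x)\, p(y_1, \dots, y_p \mid x), \\
  p(y_1, \dots, y_p \mid x) &= \sum_{h=1}^{H} \nu_{hx} \prod_{j=1}^{p} \psi_{hj}(y_j),
  \qquad \sum_{h=1}^{H} \nu_{hx} = 1 .
\end{align}
```

Under this reading, the groups differ only through the weights $\nu_{hx}$, so no group difference would correspond to weights that do not vary with $x$.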

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

Archivio istituzionale della ricerca - Università di Padova

COTA: Improving the Speed and Accuracy of Customer Support through Ranking and Deep Networks

Author: Bahdanau Dzmitry, Diederik, Hakkani-Tür Dilek, Ioffe Sergey, Liang Chen, McCulloh Ian, Rocktäschel Tim, Sarikaya R., Sutskever Ilya, van der Maaten Laurens, Zhang Xiang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 03/07/2018

For a company looking to provide delightful user experiences, it is of paramount importance to take care of any customer issues. This paper proposes COTA, a system to improve the speed and reliability of customer support for end users through automated ticket classification and answer selection for support representatives. Two machine learning and natural language processing techniques are demonstrated: one relying on feature engineering (COTA v1) and the other exploiting raw signals through deep learning architectures (COTA v2). COTA v1 employs a new approach that converts the multi-classification task into a ranking problem, demonstrating significantly better performance in the case of thousands of classes. For COTA v2, we propose an Encoder-Combiner-Decoder, a novel deep learning architecture that allows for heterogeneous input and output feature types and injection of prior knowledge through network architecture choices. This paper compares these models and their variants on the task of ticket classification and answer selection, showing that COTA v2 outperforms COTA v1, and analyzes their inner workings and shortcomings. Finally, an A/B test conducted in a production setting validates the real-world impact of COTA in reducing issue resolution time by 10 percent without reducing customer satisfaction.

arXiv.org e-Print Archive

Crossref

Bayesian modeling of networks in complex business intelligence problems

Author: Airoldi, Aldous, Azzalini, Banerjee, Bhattacharya, Bigelow, Dunson, Escobar, Fruchterman, Gershman, Griffiths, Hjort, Hoff, Kaishev, Kamakura, Lau, Matiş, Medvedovic, Neal, Nowicki, Polson, Rousseau, Stephens, Thuring, Verhoef
Publication venue: Wiley
Publication date: 28/03/2016

Complex network data problems are increasingly common in many fields of application. Our motivation is drawn from strategic marketing studies monitoring customer choices of specific products, along with co-subscription networks encoding multiple purchasing behavior. Data are available for several agencies within the same insurance company, and our goal is to efficiently exploit co-subscription networks to inform targeted advertising of cross-sell strategies to currently mono-product customers. We address this goal by developing a Bayesian hierarchical model, which clusters agencies according to common mono-product customer choices and co-subscription networks. Within each cluster, we efficiently model customer behavior via a cluster-dependent mixture of latent eigenmodels. This formulation provides key information on mono-product customer choices and multiple purchasing behavior within each cluster, informing targeted cross-sell strategies. We develop simple algorithms for tractable inference, and assess performance in simulations and in an application to business intelligence.

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

Crossref

Archivio istituzionale della ricerca - Università di Padova

Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms

Author: Anderson, Bavelas, Bland, Bonacich, Brin, Carre, Chung, Cohen, Collins, Crestani, Dijkstra, Harary, Joshua Shinavier, Marko A. Rodriguez, McPherson, Miller, Sowa
Publication venue: Elsevier BV
Publication date: 09/12/2009

Many, if not most, network analysis algorithms have been designed specifically for single-relational networks; that is, networks in which all edges are of the same type. For example, edges may represent "friendship," "kinship," or "collaboration," but not all of them together. In contrast, a multi-relational network is a network with a heterogeneous set of edge labels, which can represent relationships of various types in a single data structure. While multi-relational networks are more expressive in terms of the variety of relationships they can capture, there is a need for a general framework for transferring the many single-relational network analysis algorithms to the multi-relational domain. It is not sufficient to execute a single-relational network analysis algorithm on a multi-relational network by simply ignoring edge labels. This article presents an algebra for mapping multi-relational networks to single-relational networks, thereby exposing them to single-relational network analysis algorithms. Comment: ISSN:1751-157
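As a rough illustration of mapping a multi-relational network to a single-relational one, the sketch below stores labelled edges as (source, label, target) triples and derives an unlabelled edge set by composing two relations along paths. This set-based composition is a simplified stand-in chosen for illustration, not the algebra presented in the article; the helper names `relation_edges` and `compose` and the example data are hypothetical.

```python
def relation_edges(triples, label):
    """Extract the unlabelled (source, target) edges carrying one label."""
    return {(s, d) for (s, l, d) in triples if l == label}


def compose(edges_a, edges_b):
    """Path composition: yield (u, w) whenever u -a-> v and v -b-> w."""
    targets_by_source = {}
    for v, w in edges_b:
        targets_by_source.setdefault(v, set()).add(w)
    return {
        (u, w)
        for (u, v) in edges_a
        for w in targets_by_source.get(v, ())
    }


# Hypothetical multi-relational network (made up for illustration).
triples = {
    ("alice", "friend", "bob"),
    ("bob", "collaborator", "carol"),
    ("bob", "collaborator", "dave"),
    ("alice", "kin", "erin"),
    ("erin", "collaborator", "frank"),
}

friend = relation_edges(triples, "friend")
collab = relation_edges(triples, "collaborator")

# Single-relational network: "collaborators of my friends".
friend_then_collab = compose(friend, collab)
```

The derived edge set carries no labels, so any single-relational algorithm can run on it while the edge semantics stay explicit in how the set was constructed, rather than being lost by ignoring labels.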

arXiv.org e-Print Archive

Crossref