848 research outputs found
Optimal client recommendation for market makers in illiquid financial products
The process of liquidity provision in financial markets can result in
prolonged exposure to illiquid instruments for market makers. In this case,
where a proprietary position is not desired, pro-actively targeting the right
client who is likely to be interested can be an effective means to offset this
position, rather than relying on commensurate interest arising through natural
demand. In this paper, we consider the inference of a client profile for the
purpose of corporate bond recommendation, based on typical recorded information
available to the market maker. Given a historical record of corporate bond
transactions and bond meta-data, we use a topic-modelling analogy to develop a
probabilistic technique for compiling a curated list of client recommendations
for a particular bond that needs to be traded, ranked by probability of
interest. We show that a model based on Latent Dirichlet Allocation offers
promising performance to deliver relevant recommendations for sales traders.Comment: 12 pages, 3 figures, 1 tabl
Detecting Large Concept Extensions for Conceptual Analysis
When performing a conceptual analysis of a concept, philosophers are
interested in all forms of expression of a concept in a text---be it direct or
indirect, explicit or implicit. In this paper, we experiment with topic-based
methods of automating the detection of concept expressions in order to
facilitate philosophical conceptual analysis. We propose six methods based on
LDA, and evaluate them on a new corpus of court decision that we had annotated
by experts and non-experts. Our results indicate that these methods can yield
important improvements over the keyword heuristic, which is often used as a
concept detection heuristic in many contexts. While more work remains to be
done, this indicates that detecting concepts through topics can serve as a
general-purpose method for at least some forms of concept expression that are
not captured using naive keyword approaches
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data
This paper analyzes consumer choices over lunchtime restaurants using data
from a sample of several thousand anonymous mobile phone users in the San
Francisco Bay Area. The data is used to identify users' approximate typical
morning location, as well as their choices of lunchtime restaurants. We build a
model where restaurants have latent characteristics (whose distribution may
depend on restaurant observables, such as star ratings, food category, and
price range), each user has preferences for these latent characteristics, and
these preferences are heterogeneous across users. Similarly, each item has
latent characteristics that describe users' willingness to travel to the
restaurant, and each user has individual-specific preferences for those latent
characteristics. Thus, both users' willingness to travel and their base utility
for each restaurant vary across user-restaurant pairs. We use a Bayesian
approach to estimation. To make the estimation computationally feasible, we
rely on variational inference to approximate the posterior distribution, as
well as stochastic gradient descent as a computational approach. Our model
performs better than more standard competing models such as multinomial logit
and nested logit models, in part due to the personalization of the estimates.
We analyze how consumers re-allocate their demand after a restaurant closes to
nearby restaurants versus more distant restaurants with similar
characteristics, and we compare our predictions to actual outcomes. Finally, we
show how the model can be used to analyze counterfactual questions such as what
type of restaurant would attract the most consumers in a given location.Marie Curie Fellowship from the European Commission (H2020 programme, grant agreement 706760)
Klink-2: integrating multiple web sources to generate semantic topic networks
The amount of scholarly data available on the web is steadily increasing, enabling different types of analytics which can provide important insights into the research activity. In order to make sense of and explore this large-scale body of knowledge we need an accurate, comprehensive and up-to-date ontology of research topics. Unfortunately, human crafted classifications do not satisfy these criteria, as they evolve too slowly and tend to be too coarse-grained. Current automated methods for generating ontologies of research areas also present a number of limitations, such as: i) they do not consider the rich amount of indirect statistical and semantic relationships, which can help to understand the relation between two topics – e.g., the fact that two research areas are associated with a similar set of venues or technologies; ii) they do not distinguish between different kinds of hierarchical relationships; and iii) they are not able to handle effectively ambiguous topics characterized by a noisy set of relationships. In this paper we present Klink-2, a novel approach which improves on our earlier work on automatic generation of semantic topic networks and addresses the aforementioned limitations by taking advantage of a variety of knowledge sources available on the web. In particular, Klink-2 analyses networks of research entities (including papers, authors, venues, and technologies) to infer three kinds of semantic relationships between topics. It also identifies ambiguous keywords (e.g., “ontology”) and separates them into the appropriate distinct topics – e.g., “ontology/philosophy” vs. “ontology/semantic web”. Our experimental evaluation shows that the ability of Klink-2 to integrate a high number of data sources and to generate topics with accurate contextual meaning yields significant improvements over other algorithms in terms of both precision and recall
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 26K topics and 226K semantic relationships. It was created by applying the Klink-2 algorithm on a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO we have developed the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO at different levels. Users can use the portal to rate topics and relationships, suggest missing relationships, and visualise sections of the ontology. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various communities engaged with scholarly data
Nonparametric Hierarchical Clustering of Functional Data
In this paper, we deal with the problem of curves clustering. We propose a
nonparametric method which partitions the curves into clusters and discretizes
the dimensions of the curve points into intervals. The cross-product of these
partitions forms a data-grid which is obtained using a Bayesian model selection
approach while making no assumptions regarding the curves. Finally, a
post-processing technique, aiming at reducing the number of clusters in order
to improve the interpretability of the clustering, is proposed. It consists in
optimally merging the clusters step by step, which corresponds to an
agglomerative hierarchical classification whose dissimilarity measure is the
variation of the criterion. Interestingly this measure is none other than the
sum of the Kullback-Leibler divergences between clusters distributions before
and after the merges. The practical interest of the approach for functional
data exploratory analysis is presented and compared with an alternative
approach on an artificial and a real world data set
Skew-Unfolding the Skorokhod Reflection of a Continuous Semimartingale
The Skorokhod reflection of a continuous semimartingale is unfolded, in a
possibly skewed manner, into another continuous semimartingale on an enlarged
probability space according to the excursion-theoretic methodology of Prokaj
(2009). This is done in terms of a skew version of the Tanaka equation, whose
properties are studied in some detail. The result is used to construct a system
of two diffusive particles with rank-based characteristics and skew-elastic
collisions. Unfoldings of conventional reflections are also discussed, as are
examples involving skew Brownian Motions and skew Bessel processes.Comment: 20 pages. typos corrected, added a remark after Proposition 2.3,
simplified the last part of Example 2.
Temporal Cross-Media Retrieval with Soft-Smoothing
Multimedia information have strong temporal correlations that shape the way
modalities co-occur over time. In this paper we study the dynamic nature of
multimedia and social-media information, where the temporal dimension emerges
as a strong source of evidence for learning the temporal correlations across
visual and textual modalities. So far, cross-media retrieval models, explored
the correlations between different modalities (e.g. text and image) to learn a
common subspace, in which semantically similar instances lie in the same
neighbourhood. Building on such knowledge, we propose a novel temporal
cross-media neural architecture, that departs from standard cross-media
methods, by explicitly accounting for the temporal dimension through temporal
subspace learning. The model is softly-constrained with temporal and
inter-modality constraints that guide the new subspace learning task by
favouring temporal correlations between semantically similar and temporally
close instances. Experiments on three distinct datasets show that accounting
for time turns out to be important for cross-media retrieval. Namely, the
proposed method outperforms a set of baselines on the task of temporal
cross-media retrieval, demonstrating its effectiveness for performing temporal
subspace learning.Comment: To appear in ACM MM 201
Location Dependent Dirichlet Processes
Dirichlet processes (DP) are widely applied in Bayesian nonparametric
modeling. However, in their basic form they do not directly integrate
dependency information among data arising from space and time. In this paper,
we propose location dependent Dirichlet processes (LDDP) which incorporate
nonparametric Gaussian processes in the DP modeling framework to model such
dependencies. We develop the LDDP in the context of mixture modeling, and
develop a mean field variational inference algorithm for this mixture model.
The effectiveness of the proposed modeling framework is shown on an image
segmentation task
- …