Addressing Item-Cold Start Problem in Recommendation Systems using Model Based Approach and Deep Learning
Traditional recommendation systems rely on past usage data in order to
generate new recommendations. Those approaches fail to generate sensible
recommendations for new users and items entering the system due to missing
information about their past interactions. In this paper, we propose a solution
to the item cold-start problem that uses a model-based
approach and recent advances in deep learning. In particular, we use a latent
factor model for recommendation, and predict the latent factors from items'
descriptions using a convolutional neural network when they cannot be obtained
from usage data. Latent factors obtained by applying matrix factorization to
the available usage data are used as ground truth to train the convolutional
neural network. To create latent factor representations for the new items, the
convolutional neural network uses their textual description. The results from
the experiments reveal that the proposed approach significantly outperforms
several baseline estimators.
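The pipeline described in this abstract (factorize usage data, learn a map from item content to latent factors, apply it to unseen items) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: the data are synthetic and noise-free, a truncated SVD stands in for the matrix factorization, and a linear least-squares map stands in for the convolutional neural network. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data (assumption): 6 users, 8 known items, 3 content features.
n_users, n_items, n_feats, k = 6, 8, 3, 2
item_feats = rng.normal(size=(n_items, n_feats))
true_map = rng.normal(size=(n_feats, k))
user_factors_true = rng.normal(size=(n_users, k))
usage = user_factors_true @ (item_feats @ true_map).T  # users x items

# Step 1: factorize the usage matrix (truncated SVD as a stand-in for MF).
U, s, Vt = np.linalg.svd(usage, full_matrices=False)
user_factors = U[:, :k] * s[:k]   # users x k
item_factors = Vt[:k].T           # items x k, treated as ground truth

# Step 2: learn a map from item content to latent factors.
# The abstract uses a CNN over text; a linear least-squares fit is used here.
W, *_ = np.linalg.lstsq(item_feats, item_factors, rcond=None)

# Step 3: a new item has no usage data, so predict its factors from content.
new_item_feats = rng.normal(size=(1, n_feats))
new_item_factors = new_item_feats @ W                 # 1 x k
predicted_scores = user_factors @ new_item_factors.T  # users x 1
```

Because the toy usage matrix is exactly low-rank and linear in the item features, the linear map recovers the new item's scores here; real usage data are noisy and the content-to-factor relation is generally nonlinear, which is where a learned text model becomes relevant.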
Enhancing Sensitivity Classification with Semantic Features using Word Embeddings
Government documents must be reviewed to identify any sensitive information
they may contain, before they can be released to the public. However,
traditional paper-based sensitivity review processes are not practical for reviewing
born-digital documents. Therefore, there is a timely need for automatic sensitivity
classification techniques, to assist the digital sensitivity review process.
Sensitivity, however, is typically a product of the relations between combinations
of terms, such as who said what about whom; automatic sensitivity
classification is therefore a difficult task. Vector representations of terms, such as word
embeddings, have been shown to be effective at encoding latent term features
that preserve semantic relations between terms, which can also be beneficial to
sensitivity classification. In this work, we present a thorough evaluation of the
effectiveness of semantic word embedding features, along with term and grammatical
features, for sensitivity classification. On a test collection of government
documents containing real sensitivities, we show that extending text classification
with semantic features and additional term n-grams results in significant improvements
in classification effectiveness, correctly classifying 9.99% more sensitive
documents compared to the text classification baseline.
SNE: Signed Network Embedding
Several network embedding models have been developed for unsigned networks.
However, these models based on skip-gram cannot be applied to signed networks
because they can only deal with one type of link. In this paper, we present our
signed network embedding model called SNE. Our SNE adopts the log-bilinear
model, uses node representations of all nodes along a given path, and further
incorporates two signed-type vectors to capture the positive or negative
relationship of each edge along the path. We conduct two experiments, node
classification and link prediction, on both directed and undirected signed
networks and compare with four baselines including a matrix factorization
method and three state-of-the-art unsigned network embedding models. The
experimental results demonstrate the effectiveness of our signed network
embedding.
Comment: To appear in PAKDD 201
Tornado Detection with Support Vector Machines
The National Weather Service (NWS) Mesocyclone Detection Algorithms (MDA) use empirical rules to process velocity data from the Weather Surveillance Radar 1988 Doppler (WSR-88D). In this study, Support Vector Machines (SVMs) are applied to mesocyclone detection. Comparison with other classification methods, such as neural networks and radial basis function networks, shows that SVMs are more effective in mesocyclone/tornado detection.
A Principled Approach to Analyze Expressiveness and Accuracy of Graph Neural Networks
Graph neural networks (GNNs) have seen increasing success recently, with many GNN variants achieving state-of-the-art results on node and graph classification tasks. The proposed GNNs, however, often implement complex node and graph embedding schemes, which makes it challenging to explain their performance. In this paper, we investigate the link between a GNN's expressiveness, that is, its ability to map different graphs to different representations, and its generalization performance in a graph classification setting. In particular, we propose a principled experimental procedure where we (i) define a practical measure for expressiveness, (ii) introduce an expressiveness-based loss function that we use to train a simple yet practical GNN that is permutation-invariant, and (iii) illustrate our procedure on benchmark graph classification problems and on an original real-world application. Our results reveal that expressiveness alone does not guarantee better performance, and that a powerful GNN should be able to produce graph representations that are well separated with respect to the class of the corresponding graphs.
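The permutation invariance mentioned in this abstract can be illustrated with a small sketch. This is not the paper's GNN or its expressiveness measure: the layer below is a generic sum-aggregation layer with a sum readout, chosen only because both operations ignore node order. The function name, graph, and weights are hypothetical.

```python
import numpy as np

def simple_gnn_embedding(adj, feats, weight, n_layers=2):
    """Sum-aggregate neighbour features, then sum-pool nodes into a graph vector."""
    h = feats
    for _ in range(n_layers):
        # Each node adds its neighbours' features to its own, then applies
        # a shared linear map followed by a ReLU.
        h = np.maximum((adj @ h + h) @ weight, 0.0)
    return h.sum(axis=0)  # sum readout does not depend on node order

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 3))
weight = rng.normal(size=(3, 3))

# Relabel the nodes with a permutation matrix P.
perm = np.array([2, 0, 3, 1])
P = np.eye(4)[perm]
adj_perm = P @ adj @ P.T
feats_perm = P @ feats

emb = simple_gnn_embedding(adj, feats, weight)
emb_perm = simple_gnn_embedding(adj_perm, feats_perm, weight)
```

The two embeddings agree up to floating-point error, since relabelling nodes only reorders the rows that the final sum adds together.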
The supernova rate in local galaxy clusters
We report a measurement of the supernova (SN) rates (Ia and core-collapse) in
galaxy clusters based on the 136 SNe of the sample described in Cappellaro et
al. (1999) and Mannucci et al. (2005).
Early-type cluster galaxies show a type Ia SN rate (0.066 SNuM) similar to
that obtained by Sharon et al. (2007) and more than 3 times larger than that in
field early-type galaxies (0.019 SNuM). This difference has a 98% statistical
confidence level. We examine many possible observational biases which could
affect the rate determination, and conclude that none of them is likely to
significantly alter the results. We investigate how the rate is related to
several properties of the parent galaxies, and find that cluster membership,
morphology and radio power all affect the SN rate, while galaxy mass has no
measurable effect. The increased rate may be due to galaxy interactions in
clusters, inducing either the formation of young stars or a different evolution
of the progenitor binary systems.
We present the first measurement of the core-collapse SN rate in cluster
late-type galaxies, which turns out to be comparable to the rate in field
galaxies. This suggests that no large systematic difference in the initial mass
function exists between the two environments.
Comment: MNRAS, revised version after referee's comment
Stellar Populations of Bulges in 14 Cluster Disc Galaxies
‘The definitive version is available at www.blackwell-synergy.com.’ Copyright Blackwell Publishing / RAS. DOI: 10.1111/j.1365-2966.2008.13566.x
Peer reviewed
Examining the Classification Accuracy of TSVMs with Feature Selection in Comparison with the GLAD Algorithm
Gene expression data sets are used to classify and predict patient diagnostic categories. Labelled gene expression examples, however, are extremely difficult and expensive to obtain. Moreover, conventional supervised approaches such as Support Vector Machines (SVMs) cannot function properly when labelled data (training examples) are insufficient. Therefore, in this paper, we propose Transductive Support Vector Machines (TSVMs) as semi-supervised learning algorithms that learn from both labelled and unlabelled samples to classify microarray data. To prune superfluous genes and samples, we used a feature selection method called Recursive Feature Elimination (RFE), which is intended to improve classification output and avoid the local optimization problem. We compared the classification accuracy of the TSVM-RFE algorithm with that of the Genetic Learning Across Datasets (GLAD) algorithm, as both are semi-supervised learning methods. We found that TSVM-RFE surpassed both an SVM using RFE and GLAD.
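The elimination loop behind Recursive Feature Elimination can be sketched as follows. This is a minimal illustration under stated assumptions, not the TSVM-RFE method of the abstract: a ridge-regularised least-squares fit stands in for the SVM, the data are synthetic, and the function name `rfe_select` is hypothetical. The ranking rule (drop the feature with the smallest squared weight) is the one used in SVM-based RFE.

```python
import numpy as np

def rfe_select(X, y, n_keep, ridge=1e-3):
    """Recursive feature elimination with a linear least-squares scorer."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        Xr = X[:, remaining]
        # Refit linear weights on the surviving features only.
        gram = Xr.T @ Xr + ridge * np.eye(len(remaining))
        w = np.linalg.solve(gram, Xr.T @ y)
        # Remove the feature with the smallest squared weight.
        worst = int(np.argmin(w ** 2))
        del remaining[worst]
    return remaining

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 10
X = rng.normal(size=(n_samples, n_genes))
# Synthetic labels (assumption): only genes 2 and 7 carry signal.
y = np.sign(3.0 * X[:, 2] - 3.0 * X[:, 7])

selected = rfe_select(X, y, n_keep=2)
```

Refitting after every removal is the point of the recursive scheme: a feature's weight can change once a correlated feature has been dropped, so a single ranking pass would not be equivalent.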