134 research outputs found

    Addressing Item-Cold Start Problem in Recommendation Systems using Model Based Approach and Deep Learning

    Traditional recommendation systems rely on past usage data to generate new recommendations. These approaches fail to generate sensible recommendations for users and items that are new to the system, because information about their past interactions is missing. In this paper, we propose a solution to the item cold-start problem that combines a model-based approach with recent advances in deep learning. In particular, we use a latent factor model for recommendation, and when latent factors cannot be obtained from usage data, we predict them from item descriptions using a convolutional neural network. Latent factors obtained by applying matrix factorization to the available usage data serve as ground truth for training the convolutional neural network. To create latent factor representations for new items, the convolutional neural network uses their textual descriptions. Experimental results show that the proposed approach significantly outperforms several baseline estimators.
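The pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy ratings, the descriptions, the hyperparameters, and all function names are assumptions, and a linear bag-of-words regressor stands in for the convolutional neural network to keep the example short and dependency-free.

```python
import random

random.seed(0)
K = 2  # number of latent factors (illustrative choice)

# Toy usage data (assumed): ratings[user][item] for three "warm" items.
ratings = [
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
]
warm_descriptions = ["space robot adventure", "robot space war", "romantic paris drama"]
new_description = "paris romantic comedy"  # cold-start item: no usage data

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def factorize(r, k, epochs=2000, lr=0.01, reg=0.02):
    """SGD matrix factorization: r[u][i] is approximated by dot(P[u], Q[i])."""
    P = [[random.uniform(0.0, 0.5) for _ in range(k)] for _ in r]
    Q = [[random.uniform(0.0, 0.5) for _ in range(k)] for _ in r[0]]
    for _ in range(epochs):
        for u, row in enumerate(r):
            for i, value in enumerate(row):
                err = value - dot(P[u], Q[i])
                for f in range(k):
                    pu, qi = P[u][f], Q[i][f]
                    P[u][f] += lr * (err * qi - reg * pu)
                    Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

vocab = sorted({w for d in warm_descriptions + [new_description] for w in d.split()})

def featurize(text):
    """Bag-of-words count vector over the toy vocabulary."""
    words = text.split()
    return [float(words.count(w)) for w in vocab]

def fit_text_model(X, Y, epochs=2000, lr=0.05):
    """Linear map from text features to item factors (stand-in for the CNN)."""
    W = [[0.0] * len(X[0]) for _ in Y[0]]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            for f, w_row in enumerate(W):
                err = y[f] - dot(w_row, x)
                for j, xj in enumerate(x):
                    w_row[j] += lr * err * xj
    return W

# 1) Latent factors from usage data act as regression targets.
user_factors, item_factors = factorize(ratings, K)
# 2) Train the text model to predict item factors from descriptions.
W = fit_text_model([featurize(d) for d in warm_descriptions], item_factors)
# 3) Predict factors for the new item and score it for every user.
new_item_factors = [dot(w_row, featurize(new_description)) for w_row in W]
scores = [dot(p, new_item_factors) for p in user_factors]
print(scores)
```

In this toy setup the new item shares words only with the third warm item, so the third user, who rated that item highly, should receive the highest predicted score.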

    Enhancing Sensitivity Classification with Semantic Features using Word Embeddings

    Government documents must be reviewed to identify any sensitive information they may contain before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. There is therefore a timely need for automatic sensitivity classification techniques to assist the digital sensitivity review process. Sensitivity is typically a product of the relations between combinations of terms, such as who said what about whom, which makes automatic sensitivity classification a difficult task. Vector representations of terms, such as word embeddings, have been shown to be effective at encoding latent term features that preserve semantic relations between terms, which can also benefit sensitivity classification. In this work, we present a thorough evaluation of the effectiveness of semantic word embedding features, along with term and grammatical features, for sensitivity classification. On a test collection of government documents containing real sensitivities, we show that extending text classification with semantic features and additional term n-grams results in significant improvements in classification effectiveness, correctly classifying 9.99% more sensitive documents than the text classification baseline.

    SNE: Signed Network Embedding

    Several network embedding models have been developed for unsigned networks. However, these skip-gram-based models cannot be applied to signed networks because they can only deal with one type of link. In this paper, we present our signed network embedding model, called SNE. SNE adopts the log-bilinear model, uses the representations of all nodes along a given path, and further incorporates two signed-type vectors to capture the positive or negative relationship of each edge along the path. We conduct two experiments, node classification and link prediction, on both directed and undirected signed networks, and compare with four baselines: a matrix factorization method and three state-of-the-art unsigned network embedding models. The experimental results demonstrate the effectiveness of our signed network embedding. Comment: To appear in PAKDD 201

    Tornado Detection with Support Vector Machines

    The National Weather Service (NWS) Mesocyclone Detection Algorithms (MDA) use empirical rules to process velocity data from the Weather Surveillance Radar 1988 Doppler (WSR-88D). In this study, Support Vector Machines (SVM) are applied to mesocyclone detection. Comparison with other classification methods, such as neural networks and radial basis function networks, shows that SVM are more effective in mesocyclone/tornado detection.

    A Principled Approach to Analyze Expressiveness and Accuracy of Graph Neural Networks

    Graph neural networks (GNNs) have seen increasing success recently, with many GNN variants achieving state-of-the-art results on node and graph classification tasks. The proposed GNNs, however, often implement complex node and graph embedding schemes, which makes it challenging to explain their performance. In this paper, we investigate the link between a GNN's expressiveness, that is, its ability to map different graphs to different representations, and its generalization performance in a graph classification setting. In particular, we propose a principled experimental procedure in which we (i) define a practical measure for expressiveness, (ii) introduce an expressiveness-based loss function that we use to train a simple yet practical GNN that is permutation-invariant, and (iii) illustrate our procedure on benchmark graph classification problems and on an original real-world application. Our results reveal that expressiveness alone does not guarantee better performance, and that a powerful GNN should be able to produce graph representations that are well separated with respect to the class of the corresponding graphs.
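The two properties named in this abstract, permutation invariance and the ability to map different graphs to different representations, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the weight-free sum-aggregation update, the sorted-multiset readout, and the `distinct_fraction` score are hypothetical stand-ins, not the measure or the GNN defined in the paper.

```python
from itertools import combinations

def embed(adj, rounds=2):
    """Toy message passing: each node adds its neighbours' values to its own.

    The readout is the sorted tuple of node values, which does not depend
    on how the nodes are labelled (permutation-invariant).
    """
    h = {v: len(nbrs) for v, nbrs in adj.items()}  # initial feature: degree
    for _ in range(rounds):
        h = {v: h[v] + sum(h[u] for u in adj[v]) for v in adj}
    return tuple(sorted(h.values()))

def distinct_fraction(graphs):
    """Hypothetical expressiveness score: share of graph pairs that
    receive different representations."""
    pairs = list(combinations(graphs, 2))
    return sum(embed(a) != embed(b) for a, b in pairs) / len(pairs)

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
path_relabelled = {0: [1, 2], 1: [0], 2: [0]}  # same path, centre renamed
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(embed(path) == embed(path_relabelled))
print(embed(hexagon) == embed(two_triangles))
print(distinct_fraction([triangle, path, hexagon, two_triangles]))
```

The hexagon and the pair of disjoint triangles are not isomorphic, yet every node in both has degree 2, so this simple scheme assigns them the same representation. That is the kind of expressiveness gap such a measure is meant to expose.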

    The supernova rate in local galaxy clusters

    We report a measurement of the supernova (SN) rates (Ia and core-collapse) in galaxy clusters based on the 136 SNe of the sample described in Cappellaro et al. (1999) and Mannucci et al. (2005). Early-type cluster galaxies show a type Ia SN rate (0.066 SNuM) similar to that obtained by Sharon et al. (2007) and more than 3 times larger than that in field early-type galaxies (0.019 SNuM). This difference has a 98% statistical confidence level. We examine many possible observational biases which could affect the rate determination, and conclude that none of them is likely to significantly alter the results. We investigate how the rate is related to several properties of the parent galaxies, and find that cluster membership, morphology and radio power all affect the SN rate, while galaxy mass has no measurable effect. The increased rate may be due to galaxy interactions in clusters, inducing either the formation of young stars or a different evolution of the progenitor binary systems. We present the first measurement of the core-collapse SN rate in cluster late-type galaxies, which turns out to be comparable to the rate in field galaxies. This suggests that no large systematic difference in the initial mass function exists between the two environments. Comment: MNRAS, revised version after referee's comment
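The "more than 3 times larger" statement can be checked directly from the two rates quoted in the abstract. The short calculation below uses only those two numbers; the variable names are illustrative.

```python
# Type Ia SN rates quoted in the abstract, both in SNuM.
cluster_early_type_rate = 0.066
field_early_type_rate = 0.019

# Ratio of the cluster rate to the field rate.
ratio = cluster_early_type_rate / field_early_type_rate
print(f"cluster / field rate ratio: {ratio:.2f}")
```

The ratio comes out at roughly 3.5, consistent with the wording in the abstract.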

    Examining the Classification Accuracy of TSVMs with Feature Selection in Comparison with the GLAD Algorithm

    Gene expression data sets are used to classify and predict patient diagnostic categories. It is extremely difficult and expensive to obtain labelled gene expression examples. Moreover, conventional supervised approaches such as Support Vector Machine (SVM) algorithms cannot function properly when labelled data (training examples) are insufficient. Therefore, in this paper, we suggest Transductive Support Vector Machines (TSVMs) as semi-supervised learning algorithms that learn from both labelled and unlabelled samples to classify microarray data. To prune superfluous genes and samples, we used a feature selection method called Recursive Feature Elimination (RFE), which is intended to enhance classification output and avoid the local optimization problem. We examined the classification prediction accuracy of the TSVM-RFE algorithm in comparison with the Genetic Learning Across Datasets (GLAD) algorithm, as both are semi-supervised learning methods. Comparing these two methods, we found that TSVM-RFE surpassed both an SVM using RFE and GLAD.