27 research outputs found

    A deep learning-based hybrid model for recommendation generation and ranking

A recommender system plays a vital role in information filtering and retrieval, and its application is omnipresent in many domains. However, drawbacks such as the cold-start and data sparsity problems affect the performance of recommender models. Various studies have substantially improved the performance of recommender systems through unique methods, from the traditional approach of matrix factorization (MF) to, in recent years, deep learning (DL) techniques. By using DL in the recommender system, we can overcome the difficulties of collaborative filtering. Current DL approaches, however, focus mainly on modeling content descriptions, and those models ignore the main factor of user–item interaction. The proposed hybrid Bayesian stacked auto-denoising encoder (HBSADE) model recognizes the latent interests of the user and analyzes contextual reviews through the MF method. The objective of the model is to identify the user's points of interest and to recommend products/services based on the user's latent interests. The proposed novel two-stage hybrid deep learning-based collaborative filtering method explores the user's points of interest, captures the interactions between items and users, and provides better recommendations in a personalized way. We used a multilayer neural network to model the nonlinearities of user–item interactions from data. Experiments show that HBSADE outperforms existing methodologies on the Amazon-b and Book-Crossing datasets.
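The paper's exact architecture is not reproduced here, but the core move of replacing matrix factorization's inner product with a multilayer network over user and item embeddings can be sketched as follows. This is a minimal PyTorch illustration; the class name, layer sizes, and output head are assumptions, not the HBSADE specification:

```python
import torch
import torch.nn as nn

class HybridCF(nn.Module):
    """User and item embeddings are concatenated and passed through a
    multilayer network to capture nonlinear user-item interactions."""
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, users, items):      # index tensors of equal length
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        return self.mlp(x).squeeze(-1)    # predicted relevance score
```

Training such a model would minimize, for example, a rating regression or ranking loss over observed user–item pairs.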

    Efficient Learning Framework for Training Deep Learning Models with Limited Supervision

In recent years, deep learning has shown tremendous success in different applications; however, these models mostly need a large labeled dataset for training their parameters. In this work, we aim to explore the potential of efficient learning frameworks for training deep models on different problems in the case of limited supervision or noisy labels. For the image clustering problem, we introduce a new deep convolutional autoencoder with an unsupervised learning framework. We employ relative entropy minimization as the clustering objective, regularized by the frequency of cluster assignments and a reconstruction loss. In the case of noisy labels obtained from crowdsourcing platforms, we propose a novel deep hybrid model for sentiment analysis of text data such as tweets based on noisy crowd labels. The proposed model consists of a crowdsourcing aggregation model and a deep text autoencoder. We combine these sub-models in a probabilistic framework rather than a heuristic way, and derive an efficient optimization algorithm to jointly solve the corresponding problem. In order to improve the performance of unsupervised deep hash functions for image similarity search in big datasets, we adopt generative adversarial networks to propose a new deep image retrieval model, where the adversarial loss is employed as a data-dependent regularization in our objective function. We also introduce a balanced self-paced learning algorithm for training a GAN-based model for image clustering, where the input samples are gradually included into training from easy to difficult, while the diversity of selected samples across all clusters is also considered. In addition, we explore discriminative approaches for unsupervised visual representation learning rather than generative algorithms, such as maximizing the mutual information between an input image and its representation, and using a contrastive loss to decrease the distance between the representations of original and augmented image data.
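The clustering objective described above combines three terms: confident (low-entropy) per-sample assignments, balanced cluster usage, and autoencoder reconstruction. Below is a minimal sketch, assuming a softmax clustering head on the encoder and mean-squared reconstruction error; the weights and exact formulation are illustrative, not the thesis's:

```python
import torch
import torch.nn.functional as F

def clustering_loss(logits, x, x_recon, balance_w=1.0, recon_w=1.0):
    """Relative-entropy clustering objective: confident per-sample
    assignments, balanced cluster usage, and a reconstruction term."""
    p = F.softmax(logits, dim=1)                       # soft assignments
    sample_entropy = -(p * torch.log(p + 1e-8)).sum(1).mean()
    p_mean = p.mean(0)                                 # assignment frequencies
    neg_balance = (p_mean * torch.log(p_mean + 1e-8)).sum()
    recon = F.mse_loss(x_recon, x)                     # autoencoder loss
    return sample_entropy + balance_w * neg_balance + recon_w * recon
```

Minimizing the negative entropy of the mean assignment frequencies pushes the model to spread samples across clusters instead of collapsing to one.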

    Disentangling low-dimensional vector space representations of text documents

In contrast to traditional document representations such as bags-of-words, the kind of vector space representations that are currently most popular tend to be lower-dimensional. This has important advantages, e.g. making the representation of a given document less dependent on the exact words that are used. However, this also comes at an important cost, namely that the features of the representation are entangled, i.e. each feature is not individually meaningful. The main aim of this thesis is to address this problem by disentangling vector spaces into representations that are composed of meaningful features that are closely aligned with natural categories from the given domain. For instance, in the domain of IMDB movie reviews, where each document is a review, a disentangled feature representation would be separated into features that describe how ("Scary", "Romantic", ..., "Comedic") the movie is. This thesis builds on an initial approach introduced by Derrac and Schockaert [21], which derives features from low-dimensional vector spaces. The method begins by using a linear classifier to find a hyperplane that separates documents that contain a term from those that do not. Then, from each hyperplane, the direction of the orthogonal vector is taken to induce a ranking from documents that are least related to the word (those furthest from the hyperplane on the negative side) to documents that are most related to it (those furthest from the hyperplane on the positive side). To identify which of these words describe semantically important features, they are scored by how well the linear classifier performs on a standard classification metric, which approximates how linearly separable the documents containing the term are in the vector space. The assumption is that the more separable a term is, the better modelled it is in the space. The highest-scoring terms are selected to be used as features, and documents are ranked by calculating the dot product between the vector orthogonal to the hyperplane and each document vector. This results in a ranking of documents by how strongly each feature is expressed, e.g. movies could be ranked by how "Scary" they are. Only the direction of this orthogonal vector is considered in this work, as our concern is to obtain document rankings. Derrac and Schockaert [21] obtained semantic features from Multi-Dimensional Scaling (MDS) document embeddings and validated their work by classifying documents using a rule-based classifier (FOIL), resulting in rules of the form "IF x is more scary than most horror films THEN x is a horror film." Their work was focused on showing the feasibility of learning disentangled representations, but it did not make clear which components of their method were essential. The first main contribution of this thesis therefore consists in a thorough investigation of variants of their method, with a quantitative analysis of different document representations (as opposed to only MDS) and different term scoring functions (as opposed to only the Kappa score), and a revisiting of the proposed clustering method. This extensive evaluation spans a variety of new domains and compares the method to stronger baselines. To quantitatively analyse the impact of these design choices, the use of low-depth decision trees that classify natural categories in the domain is proposed. A qualitative analysis of the discovered features is also presented.
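A minimal sketch of this feature-direction procedure, assuming scikit-learn's logistic regression as the linear classifier and Cohen's Kappa as the scoring metric (the function name and the choice of classifier are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score

def feature_direction(doc_vectors, has_term):
    """Fit a hyperplane separating documents that contain a term from the
    rest, score its separability, and rank documents along the normal."""
    clf = LogisticRegression(max_iter=1000).fit(doc_vectors, has_term)
    score = cohen_kappa_score(has_term, clf.predict(doc_vectors))
    direction = clf.coef_[0]           # vector orthogonal to the hyperplane
    ranking = doc_vectors @ direction  # higher = feature more strongly expressed
    return score, direction, ranking
```

Sorting documents by `ranking` then orders them from least to most strongly expressing the feature, e.g. from least to most "Scary"; terms with high `score` are kept as features.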
Neural network architectures have advanced the state of the art in many tasks. The second main contribution of the thesis follows the idea that the hidden layers of a neural network can be viewed as vector space embeddings. Specifically, in our setting, meaningful features to describe documents can be derived from the hidden layers of neural networks. In particular, to test the potential of using neural networks to discover features that cannot be discovered using standard document embedding methods, feed-forward neural networks and stacked auto-encoders are quantitatively and qualitatively investigated. Auto-encoders are stacked by using the hidden layer of one as the input to another. We find that meaningful features can indeed be derived from the hidden layers of the considered neural network architectures. We quantitatively assess how predictive these features are compared to those of the input embeddings. Qualitatively, we find that feedforward networks tend to select and refine features that were already modelled in the input embedding. In contrast, stacked autoencoders tend to model increasingly more abstract features as additional hidden layers are added. For example, in the initial auto-encoder layers, features like "Horror" and "Comedy" can be separated well by the linear classifier, while features like "Society" and "Relationships" are more separable in later layers. After identifying directions that model important features of documents in each stacked auto-encoder, symbolic rules are induced that relate specific features to more general ones. These rules can clarify the nature of the transformations that are learned by the neural network, for example: IF Emotions AND Journey THEN Adventure. The third contribution of this thesis is the introduction of an additional post-processing step for improving disentangled feature representations. This is done by fine-tuning the original embedding such that rankings of documents induced by our disentangled features agree with rankings induced by Pointwise Mutual Information scores. The motivation for this contribution stems from the fact that methods for learning document embeddings are mostly aimed at modelling similarity. It is found that there is an inherent trade-off between capturing similarity and faithfully modelling features as directions. Following this observation, a simple method to fine-tune document embeddings is proposed, with the aim of improving the quality of the feature directions obtained from them. This method is also unsupervised, requiring only a bag-of-words representation of the documents as input. In particular, clusters of terms are identified that refer to semantically meaningful and important features of the considered domain, and a simple neural network model is used to learn a new representation in which each of these features is more faithfully modelled as a direction. It is found that in most cases this method improves the ranking of documents, and results in increased performance when disentangled feature representations are used as input to classifiers. Overall, this thesis quantitatively and qualitatively confirms that disentangled representations of meaningful features can be derived from low-dimensional vector spaces of documents, across a variety of domains and document embedding models.
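Returning to the second contribution, the greedy layerwise stacking of auto-encoders can be sketched as follows. This is a simplified PyTorch version; the linear encoders, tanh activation, and training loop are assumptions, not the thesis's exact setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_stacked_autoencoders(X, layer_dims, epochs=50, lr=1e-3):
    """Greedy layerwise stacking: each auto-encoder learns to reconstruct
    its input, and its hidden codes become the input to the next one."""
    codes, encoders = X, []               # X: (n_docs, dim) float tensor
    for dim in layer_dims:
        enc = nn.Linear(codes.shape[1], dim)
        dec = nn.Linear(dim, codes.shape[1])
        opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=lr)
        for _ in range(epochs):
            hidden = torch.tanh(enc(codes))
            loss = F.mse_loss(dec(hidden), codes)
            opt.zero_grad(); loss.backward(); opt.step()
        codes = torch.tanh(enc(codes)).detach()   # next layer's input
        encoders.append(enc)
    return encoders, codes
```

Feature directions can then be sought in the `codes` of each layer, exactly as in the input embedding.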

    Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification

National Research Foundation (NRF) Singapore under the International Research Centres in Singapore Funding Initiative.

    Artificial Intelligence for Multimedia Signal Processing

Artificial intelligence technologies are being actively applied to broadcasting and multimedia processing. A great deal of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and in the past two to three years these efforts have aimed at improving the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, and editing, and the creation of scenarios, are very important areas of research in multimedia processing and engineering. This book contains a collection of topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.

    Machine Learning Approaches for Traffic Flow Forecasting

Intelligent Transport Systems (ITS) have emerged quite rapidly as a field in recent years. A competitive solution, coupled with the big data gathered for ITS applications, needs the latest AI to drive ITS towards smart and effective public transport planning and management. There is a strong need for ITS applications such as Advanced Route Planning (ARP) and Traffic Control Systems (TCS) to take charge while requiring the minimum possible human intervention. This thesis develops models that can predict traffic link flows at a junction level, such as road traffic flows for a freeway or highway, under all traffic conditions. The research first reviews state-of-the-art time series prediction techniques, with a deep focus on the field of transport engineering, along with the existing statistical and machine learning methods and their applications for freeway traffic flow prediction. This review established a firm foundation for examining whether individual statistical or machine learning models are superior to one another in terms of prediction performance. Detailed theoretical attention has been given to the structure and working of the individual chosen prediction models in relation to the traffic flow data. In modelling the traffic flows from a real-world dataset gathered from Highways England (HE), a traffic flow objective function for highway road prediction models is proposed in a 3-stage framework, covering the topological breakdown of the traffic network into virtual patches, further into nodes, and down to the estimation of basic link flow profile behaviour. The proposed objective function is tested with ten different prediction models, including statistical, shallow, and deep learning constructed hybrid models, for bi-directional link flow prediction. The proposed objective function greatly enhances the accuracy of traffic flow prediction, regardless of the machine learning model used. The proposed objective-function-based framework gives a new approach to modelling the traffic network so as to better understand unknown traffic flow waves and the resulting congestion caused at a junction level. In addition, the results of the applied machine learning models indicate that RNN-variant LSTM-based models, in conjunction with neural networks and deep CNNs, outperform the other chosen machine learning methods for link flow prediction when applied through the proposed objective function. The practical findings from the experimentation reveal that to arrive at an efficient, robust, offline, and accurate prediction model, apart from feeding the ML model with the correct representation of the network data, attention should be paid to the deep learning model structure, data pre-processing (i.e. normalisation), and the error metrics used for data behavioural learning. The proposed framework can in future be utilised to address one of the main aims of smart transport systems, i.e. to reduce the error rates in network-wide congestion predictions and the inflicted general traffic travel time delays, in real time.
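The thesis's 3-stage objective function is not reproduced here, but the kind of LSTM link-flow predictor it feeds can be sketched generically. This is a single-link PyTorch example; the class name, window handling, and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class LinkFlowLSTM(nn.Module):
    """Predict the next traffic flow value for one link from a window
    of past flow observations."""
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # flow at the next time step
```

As the findings above stress, pre-processing matters: flows would typically be normalised (e.g. scaled to [0, 1]) before training, with the scaling inverted when reporting predictions.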

    Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

[...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...]
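The class-imbalance finding in the lifelong learning framework suggests weighting the binary cross-entropy loss. One common weighting, by inverse positive-class frequency, can be sketched as follows; this is an illustrative assumption, not the thesis's exact scheme:

```python
import torch
import torch.nn as nn

def weighted_bce(logits, targets):
    """Weight binary cross-entropy by inverse positive-class frequency,
    so rare (e.g. newly emerging) classes are not drowned out."""
    pos_frac = targets.float().mean(dim=0).clamp(1e-6, 1 - 1e-6)
    pos_weight = (1 - pos_frac) / pos_frac    # rare classes get larger weight
    return nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets.float())
```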

Community Detection in Graphs

Thesis (Ph.D.) - Indiana University, Luddy School of Informatics, Computing, and Engineering/University Graduate School, 2020.
Community detection has always been one of the fundamental research topics in graph mining. As a type of unsupervised or semi-supervised approach, community detection aims to explore high-order node closeness by leveraging the graph's topological structure. By grouping similar nodes or edges into the same community while separating dissimilar ones into different communities, graph structure can be revealed at a coarser resolution. This can benefit numerous applications, such as user shopping recommendation and advertisement in e-commerce, protein-protein interaction prediction in bioinformatics, and literature recommendation or scholar collaboration in citation analysis. However, identifying communities is an ill-defined problem. Due to the No Free Lunch theorem [1], there is neither a gold standard representing a perfect community partition nor a universal method able to detect satisfactory communities for all tasks on various types of graphs. To give a global view of this research topic, I summarize state-of-the-art community detection methods by categorizing them based on graph types, research tasks, and methodology frameworks. As academic exploration of community detection has grown rapidly in recent years, I particularly focus on the state-of-the-art works published in the latest decade, which may leave out some classic models published decades ago. Meanwhile, three subtle community detection tasks are proposed and assessed in this dissertation as well. First, apart from general models which consider only graph structure, personalized community detection uses user needs as auxiliary information to guide community detection, yielding fine-grained communities for the nodes that best match user needs and coarser-resolution communities for the remaining, less relevant nodes. Second, graphs often suffer from sparse connectivity. Applying conventional models directly to such graphs may badly distort the quality of the generated communities. To tackle this problem, cross-graph techniques are used to propagate external graph information to support community detection on the target graph. Third, graph community structure supports a natural language processing (NLP) task that depicts nodes' intrinsic characteristics by generating node summarizations via a text generative model. The contribution of this dissertation is threefold. First, a substantial body of research is reviewed and summarized under a well-defined taxonomy; existing works on methods, evaluation, and applications are all addressed in the literature review. Second, three novel community detection tasks are demonstrated, and associated models are proposed and evaluated against state-of-the-art baselines on various datasets. Third, the limitations of current works are pointed out, and promising future research directions are discussed.
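None of the dissertation's proposed models are reproduced here, but the basic task of grouping densely connected nodes while separating sparsely connected ones can be illustrated with a classic modularity-based baseline in networkx:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Greedy modularity maximization on a standard example graph: nodes
# within a community are densely linked, nodes across communities sparsely.
G = nx.karate_club_graph()
communities = greedy_modularity_communities(G)
for i, nodes in enumerate(communities):
    print(f"community {i}: {sorted(nodes)}")
```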

    Unsupervised Pretraining of Neural Networks with Multiple Targets using Siamese Architectures

A model's response to a given input pattern depends on the patterns seen in the training data. The larger the amount of training data, the more likely it is that edge cases are covered during training. However, the more complex the input patterns are, the larger the model has to be. For very simple use cases, a relatively small model can achieve very high test accuracy in a matter of minutes. On the other hand, a large model has to be trained for multiple days. The actual time to develop a model of that size can be considered even greater, since often many different architecture types and hyper-parameter configurations have to be tried. An extreme case of a large model is the recently released GPT-3 model. This model consists of 175 billion parameters and was trained using 45 terabytes of text data. The model was trained to generate text and is able to write news articles and source code based only on a rough description. However, a model like this can only be created by researchers with access to special hardware or immense amounts of data. Thus, it is desirable to find less resource-intensive training approaches to enable other researchers to create well-performing models. This thesis investigates the use of pre-trained models. If a model has been trained on one dataset and is then trained on other, similar data, it learns to adjust to similar patterns faster than a model that has not yet seen any of the task's patterns. Thus, the lessons learned from one training are transferred to another task. During pre-training, the model is trained to solve a specific task, like predicting the next word in a sequence or first encoding an input image before decoding it. Such models contain an encoder and a decoder part. When transferring such a model to another task, parts of its layers will be removed. Having to discard fewer weights results in faster training, since less time has to be spent on training parts of a model that are only needed to solve an auxiliary task. Throughout this thesis, the concept of siamese architectures is discussed, since with that architecture no parameters have to be discarded when transferring a model trained with that approach to another task. Thus, the siamese pre-training approach reduces the need for resources like time and energy and drives the development of new models in the direction of Green AI. The models trained with this approach are evaluated by comparing them to models trained with other pre-training approaches as well as to large existing models. It is shown that the models trained for the tasks in this thesis perform as well as externally pre-trained models, given the right choice of data and training targets; it is also shown that the number and type of training targets during pre-training impact a model's performance on transfer learning tasks. The use cases presented in this thesis cover data from different domains to show that the siamese training approach is widely applicable. Consequently, researchers are motivated to create their own pre-trained models for data domains for which no pre-trained models exist.

A model's prediction depends on which patterns are present in the data used during training. The larger the amount of training data, the more likely it is that edge cases occur in the data. The larger the number of patterns to be learned, however, the larger the model has to be. For simple use cases it is possible to train a small model within a few minutes and already obtain good results on test data. For complex use cases, a correspondingly large model can take up to several days to become sufficiently good. An extreme case of a large model is the recently released model named GPT-3, which consists of 175 billion parameters and was trained with training data on the order of 45 terabytes. The model was trained to generate text and is able to generate news articles based on a rough initial description. Only researchers with access to the corresponding hardware and amounts of data can develop such a model. It is therefore of interest to improve training procedures so that models for complex use cases can be trained even with few available resources. This thesis deals with the pre-training of neural networks. When a neural network has been trained on one dataset and is then trained further on a second dataset, it learns the characteristics of the second dataset faster, since it does not have to learn patterns from scratch but can draw on what it has already learned; the knowledge is said to be transferred. During pre-training, a model is often given a task such as, in the case of image data, first compressing the training data and then reconstructing it. With text data, a model could be pre-trained by receiving a sentence as input and having to predict the next sentence from the source document. Such models accordingly consist of an encoder and a decoder. The drawback of this approach is that the decoder is needed only for pre-training, while for the later use case only the encoder is needed. A central component of this thesis is therefore the investigation of the advantages and disadvantages of the siamese model architecture. This architecture consists only of an encoder, which makes pre-training cheaper, since fewer weights have to be trained. The main scientific contribution lies in an extensive comparison of the siamese architecture with comparable approaches. Certain disadvantages are identified, for example that the choice of a similarity function or the composition of the training data has a large effect on model training. The thesis works out which similarity function is recommended in which contexts, and how other disadvantages of the siamese architecture can be compensated for by adapting the training targets. The corresponding experiments are run on data from different domains to show that the approach is universally applicable. The results from concrete use cases also show that the models developed within this thesis perform about as well as externally available models that were trained at great resource expense. This shows that carefully designed architectures can reduce the required resources.
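A siamese pre-training setup of the kind discussed, in which a single shared encoder embeds both inputs of a pair so that nothing has to be discarded at transfer time, can be sketched as follows. This is a minimal PyTorch illustration; the cosine similarity and mean-squared pair loss are one choice among several, and, as the thesis notes, the choice of similarity function matters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """One shared encoder embeds both inputs of a pair; only this encoder
    is kept when transferring the model to a downstream task."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, a, b):
        za, zb = self.encoder(a), self.encoder(b)
        return F.cosine_similarity(za, zb)    # similarity of the pair

def pair_loss(sim, label):
    # label: 1.0 for related pairs, 0.0 for unrelated ones
    return F.mse_loss(sim, label)
```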