    Text Categorization and Sorting of Web Search Results

    With the Internet facing the growing problem of information overload, the large volume, weak structure, and noisiness of Web data make it amenable to the application of machine learning techniques. After providing an overview of several topics in text categorization, including document representation, feature selection, and the choice of classifiers, the paper presents experimental results on the performance and effects of different transformations of the bag-of-words document representation and of feature selection, using texts extracted from the dmoz Open Directory of Web pages. Finally, the paper describes the primary motivation for the experiments: CatS, a new meta-search engine that uses text categorization to enhance the presentation of search results obtained from a major Web search engine.
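    As a rough illustration of the pipeline this abstract outlines (bag-of-words representation, a transformation such as tf-idf, feature selection, and a classifier), the scikit-learn snippet below wires these stages together. The toy documents, category labels, and parameter choices are invented for the example; this is not the paper's CatS system or its dmoz experiments.

```python
# Minimal sketch of a bag-of-words -> transformation -> feature selection ->
# classifier pipeline, in the spirit of the experiments described above.
# Documents, labels, and k are placeholder choices for illustration only.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

docs = [
    "open source operating system kernel release",
    "stock market shares fall amid inflation fears",
    "new graphics card benchmark results published",
    "central bank raises interest rates again",
]
labels = ["computers", "business", "computers", "business"]

pipeline = Pipeline([
    ("bow", CountVectorizer()),            # bag-of-words counts
    ("tfidf", TfidfTransformer()),         # one possible BoW transformation
    ("select", SelectKBest(chi2, k=10)),   # feature selection
    ("clf", MultinomialNB()),              # classifier
])
pipeline.fit(docs, labels)
print(pipeline.predict(["interest rates and market prices"]))
```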

    Hierarchical Attention Network for Visually-aware Food Recommendation

    Food recommender systems play an important role in helping users identify the food they want to eat. Deciding what to eat is a complex and multi-faceted process influenced by many factors, such as a recipe's ingredients and appearance, the user's personal food preferences, and various contexts such as what has been eaten in past meals. In this work, we formulate the food recommendation problem as predicting user preference for recipes based on three key factors that determine a user's choice of food, namely: 1) the user's (and other users') history; 2) the ingredients of a recipe; and 3) the descriptive image of a recipe. To address this challenging problem, we develop a dedicated neural-network-based solution, Hierarchical Attention based Food Recommendation (HAFR), which is capable of: 1) capturing the collaborative filtering effect, i.e., what similar users tend to eat; 2) inferring a user's preference at the ingredient level; and 3) learning user preference from a recipe's visual images. To evaluate the proposed method, we construct a large-scale dataset consisting of millions of ratings from AllRecipes.com. Extensive experiments show that our method outperforms several competing recommender solutions, such as Factorization Machines and Visual Bayesian Personalized Ranking, with an average improvement of 12%, offering promising results in predicting user preference for food. Code and the dataset will be released upon acceptance.
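    The abstract gives no implementation details, but a minimal PyTorch sketch of the general idea, a user embedding attending over ingredient embeddings and fused with a precomputed image feature to score a user-recipe pair, might look as follows. The dimensions, fusion layers, and toy inputs are assumptions for illustration and do not reproduce the authors' HAFR architecture or training objective.

```python
# Sketch of hierarchical, ingredient-level attention for user-recipe scoring.
# All layer sizes and the fusion scheme are assumed, not taken from HAFR.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FoodScorer(nn.Module):
    def __init__(self, n_users, n_ingredients, dim=64, img_dim=512):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.ingr_emb = nn.Embedding(n_ingredients, dim, padding_idx=0)
        self.img_proj = nn.Linear(img_dim, dim)   # project a precomputed CNN image feature
        self.attn = nn.Linear(2 * dim, 1)         # ingredient-level attention score
        self.out = nn.Linear(3 * dim, 1)          # fuse user / ingredients / image

    def forward(self, user_ids, ingredient_ids, image_feat):
        u = self.user_emb(user_ids)                        # (B, d)
        ing = self.ingr_emb(ingredient_ids)                # (B, L, d)
        u_exp = u.unsqueeze(1).expand_as(ing)              # (B, L, d)
        scores = self.attn(torch.cat([u_exp, ing], -1))    # (B, L, 1)
        mask = (ingredient_ids == 0).unsqueeze(-1)         # ignore padding
        scores = scores.masked_fill(mask, float("-inf"))
        weights = F.softmax(scores, dim=1)                 # attention over ingredients
        recipe = (weights * ing).sum(dim=1)                # (B, d) attended recipe vector
        img = self.img_proj(image_feat)                    # (B, d)
        return self.out(torch.cat([u, recipe, img], -1)).squeeze(-1)

# Toy forward pass with random data: one user, one recipe with two ingredients.
model = FoodScorer(n_users=100, n_ingredients=500)
score = model(torch.tensor([3]), torch.tensor([[12, 7, 0, 0]]), torch.randn(1, 512))
print(score.shape)  # torch.Size([1])
```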

    Are Deep Learning Approaches Suitable for Natural Language Processing?

    In recent years, Deep Learning (DL) techniques have gained much attention from the Artificial Intelligence (AI) and Natural Language Processing (NLP) research communities because these approaches can often learn features from data without the need for human design or engineering interventions, and they have achieved some remarkable results. In this paper, we survey major recent contributions that use DL techniques for NLP tasks. The reviewed work is limited to contributions to text understanding, such as sentence modelling, sentiment classification, semantic role labelling, and question answering. We provide an overview of deep learning architectures based on Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), and Recursive Neural Networks (RNNs).
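    As a concrete example of one architecture family covered by such surveys, the PyTorch sketch below shows an LSTM sentence encoder with a linear output layer for sentiment classification. The vocabulary size, dimensions, and two-class output are placeholder assumptions, not drawn from any specific surveyed system.

```python
# Minimal LSTM sentence classifier: embed tokens, encode with an LSTM,
# and classify from the final hidden state. Hyperparameters are placeholders.
import torch
import torch.nn as nn

class LSTMSentimentClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (B, T, E)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, B, H), final hidden state
        return self.fc(h_n[-1])            # (B, n_classes) class logits

model = LSTMSentimentClassifier()
logits = model(torch.randint(1, 10000, (4, 12)))  # batch of 4 sentences, 12 tokens each
print(logits.shape)  # torch.Size([4, 2])
```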

    Wikipedia-based hybrid document representation for textual news classification

    The sheer number of news items published every day makes the task of automating their classification worthwhile. The common approach is to represent news items by the frequencies of the words they contain and to use supervised learning algorithms to train a classifier. This bag-of-words (BoW) approach is oblivious to three aspects of natural language: synonymy, polysemy, and multiword terms. More sophisticated representations based on concepts, or units of meaning, have been proposed, following the intuition that document representations that better capture the semantics of text will lead to higher performance in automatic classification tasks. In reality, when classifying news items, the BoW representation has proven to be very strong, with several studies reporting it to perform above different ‘flavours’ of bag of concepts (BoC). In this paper, we propose a hybrid classifier that enriches the traditional BoW representation with concepts extracted from text, leveraging Wikipedia as background knowledge for the semantic analysis of text (WikiBoC). We benchmarked the proposed classifier against BoW and several BoC approaches: Latent Dirichlet Allocation (LDA), Explicit Semantic Analysis, and word embeddings (doc2vec). We used two corpora: the well-known Reuters-21578, composed of newswire items, and a new corpus created ex professo for this study, the Reuters-27000. Results show that (1) the performance of concept-based classifiers is very sensitive to the corpus used, being higher on the more “concept-friendly” Reuters-27000; (2) the proposed Hybrid-WikiBoC approach offers performance increases over BoW of up to 4.12% and 49.35% when classifying the Reuters-21578 and Reuters-27000 corpora, respectively; and (3) in average performance, the proposed Hybrid-WikiBoC outperforms all the other classifiers, achieving a performance increase of 15.56% over the best state-of-the-art approach (LDA) for the largest training sequence. Results indicate that concepts extracted with the help of Wikipedia add useful information that improves classification performance for news items.
    Funding: Atlantic Research Center for Information and Communication Technologies; Xunta de Galicia | Ref. R2014/034 (RedPlir); Xunta de Galicia | Ref. R2014/029 (TELGalicia).
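    A minimal sketch of the hybrid idea, assuming concept annotations are already available per document, is to concatenate a bag-of-words matrix with a bag-of-concepts matrix and train one classifier on the combined representation. The scikit-learn code below uses hand-written placeholder concept IDs rather than the paper's Wikipedia-based semantic analysis step, which is not reproduced here.

```python
# Hybrid BoW + bag-of-concepts representation: stack the two sparse feature
# matrices side by side and train a single classifier on the result.
# The texts, concept annotations, and labels are invented placeholders.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.svm import LinearSVC

texts = [
    "shares in the carmaker fell after the earnings report",
    "the striker scored twice in the cup final",
]
concepts = [
    "Stock_market Automotive_industry Earnings",   # Wikipedia-style concept IDs (assumed)
    "Association_football Cup_competition",
]
labels = ["business", "sport"]

bow = TfidfVectorizer()
boc = CountVectorizer(token_pattern=r"\S+")   # treat each concept ID as one token

X = hstack([bow.fit_transform(texts), boc.fit_transform(concepts)])
clf = LinearSVC().fit(X, labels)

X_new = hstack([bow.transform(["the goalkeeper saved a penalty in the final"]),
                boc.transform(["Association_football Penalty_kick"])])
print(clf.predict(X_new))
```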