12 research outputs found

    A Survey on Cross-domain Recommendation: Taxonomies, Methods, and Future Directions

    Full text link
    Traditional recommendation systems are faced with two long-standing obstacles, namely, data sparsity and cold-start problems, which promote the emergence and development of Cross-Domain Recommendation (CDR). The core idea of CDR is to leverage information collected from other domains to alleviate the two problems in one domain. Over the last decade, many efforts have been engaged for cross-domain recommendation. Recently, with the development of deep learning and neural networks, a large number of methods have emerged. However, there is a limited number of systematic surveys on CDR, especially regarding the latest proposed methods as well as the recommendation scenarios and recommendation tasks they address. In this survey paper, we first proposed a two-level taxonomy of cross-domain recommendation which classifies different recommendation scenarios and recommendation tasks. We then introduce and summarize existing cross-domain recommendation approaches under different recommendation scenarios in a structured manner. We also organize datasets commonly used. We conclude this survey by providing several potential research directions about this field

    StyloThai: A scalable framework for stylometric authorship identification of Thai documents

    Get PDF
    This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in January 2020, available online: https://doi.org/10.1145/3365832 The accepted version of the publication may differ from the final published version.© 2020 Association for Computing Machinery. All rights reserved. Authorship identification helps to identify the true author of a given anonymous document from a set of candidate authors. The applications of this task can be found in several domains, such as law enforcement agencies and information retrieval. These application domains are not limited to a specific language, community, or ethnicity. However, most of the existing solutions are designed for English, and a little attention has been paid to Thai. These existing solutions are not directly applicable to Thai due to the linguistic differences between these two languages. Moreover, the existing solution designed for Thai is unable to (i) handle outliers in the dataset, (ii) scale when the size of the candidate authors set increases, and (iii) perform well when the number of writing samples for each candidate author is low.We identify a stylometric feature space for the Thai authorship identification task. Based on our feature space, we present an authorship identification solution that uses the probabilistic k nearest neighbors classifier by transforming each document into a collection of point sets. Specifically, this document transformation allows us to (i) use set distance measures associated with an outlier handling mechanism, (ii) capture stylistic variations within a document, and (iii) produce multiple predictions for a query document. We create a new Thai authorship identification corpus containing 547 documents from 200 authors, which is significantly larger than the corpus used by the existing study (an increase of 32 folds in terms of the number of candidate authors). The experimental results show that our solution can overcome the limitations of the existing solution and outperforms all competitors with an accuracy level of 91.02%. Moreover, we investigate the effectiveness of each stylometric features category with the help of an ablation study. We found that combining all categories of the stylometric features outperforms the other combinations. Finally, we cross compare the feature spaces and classification methods of all solutions. We found that (i) our solution can scale as the number of candidate authors increases, (ii) our method outperforms all the competitors, and (iii) our feature space provides better performance than the feature space used by the existing study.The research was partially supported by the Digital Economy Promotion Agency (project# MP-62- 0003); and Thailand Research Fund and Office of the Higher Education Commission (MRG6180266).Published versio

    Native language identification of fluent and advanced non-native writers

    Get PDF
    This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in April 2020, available online: https://doi.org/10.1145/3383202 The accepted version of the publication may differ from the final published version.Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require the learner corpora. This article performs NLI in a challenging context of the user-generated-content (UGC) where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on the content-specific/social-network features and may not be generalizable to other domains and datasets, (ii) are unable to capture the variations of the language-usage-patterns within a text sample, and (iii) are not associated with any outlier handling mechanism. Moreover, since there is a sizable number of people who have acquired non-English second languages due to the economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on our feature space, we present a solution that mitigates the effect of outliers in the data and helps capture the variations of the language-usage-patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply the probabilistic k nearest neighbors’ classifier on the identified top-k SSTs to predict the native languages of the authors. To conduct experiments, we create three new corpora where each corpus is written in a different language, namely, English, French, and German. Our experimental studies show that our solution outperforms competitive methods and reports more than 80% accuracy across languages.Research funded by Higher Education Commission, and Grants for Development of New Faculty Staff at Chulalongkorn University | Digital Economy Promotion Agency (# MP-62-0003) | Thailand Research Funds (MRG6180266 and MRG6280175).Published versio

    A survey of context-aware recommendation schemes in event-based social networks

    Full text link
    © 2020 by the authors. Licensee MDPI, Basel, Switzerland. In recent years, Event-based social network (EBSN) applications, such as Meetup and DoubanEvent, have received popularity and rapid growth. They provide convenient online platforms for users to create, publish, and organize social events, which will be held in physical places. Additionally, they not only support typical online social networking facilities (e.g., sharing comments and photos), but also promote face-to-face offline social interactions. To provide better service for users, Context-Aware Recommender Systems (CARS) in EBSNs have recently been singled out as a fascinating area of research. CARS in EBSNs provide the suitable recommendation to target users by incorporating the contextual factors into the recommendation process. This paper provides an overview on the development of CARS in EBSNs. We begin by illustrating the concept of the term context and the paradigms of conventional context-aware recommendation process. Subsequently, we introduce the formal definition of an EBSN, the characteristics of EBSNs, the challenges that are faced by CARS in EBSNs, and the implementation process of CARS in EBSNs. We also investigate which contextual factors are considered and how they are represented in the recommendation process. Next, we focus on the state-of-the-art computational techniques regarding CARS in EBSNs. We also overview the datasets and evaluation metrics for evaluation in this research area, and discuss the applications of context-aware recommendation in EBSNs. Finally, we point out research opportunities for the research community
    corecore