32 research outputs found

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    An efficient parallel method for mining frequent closed sequential patterns

    Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention because it is an important task in sequence mining. Recently, many studies have focused on FCSPs because such patterns are more compact than frequent sequential patterns while still carrying their full information. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), which uses multi-core processor architectures to mine FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage and performs closure checking of prefix sequences early to shorten mining time. The approach overcomes the usual problems of parallel mining, such as communication overhead, synchronization, and data replication, and it addresses load balancing across processors with a dynamic mechanism that redistributes work when some processes run out of tasks, minimizing idle CPU time.
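
    The bit-vector idea named in the title is easy to picture. Below is a minimal, hypothetical sketch (treating each input sequence as a set of items for brevity, which ignores ordering): each item keeps a bit vector with one bit per database sequence, and the support of a pattern extension is the popcount of the bitwise AND of its items' vectors. The paper's actual dynamic bit vectors and closure checks are more involved.

        # Illustrative sketch only; not the authors' pDBV-FCSP implementation.
        sequences = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]

        def bit_vector(item):
            """Bit i is set iff database sequence i contains the item."""
            bv = 0
            for i, seq in enumerate(sequences):
                if item in seq:
                    bv |= 1 << i
            return bv

        def support(items):
            """Sequences containing every item = popcount of the ANDed vectors."""
            bv = (1 << len(sequences)) - 1  # start with all sequences present
            for item in items:
                bv &= bit_vector(item)
            return bin(bv).count("1")

        print(support(["a", "c"]))  # -> 3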

    AN EFFICIENT ALGORITHM FOR MINING HIGH UTILITY ASSOCIATION RULES FROM LATTICE

    In business, most companies focus on growing their profits. Besides the profit from each individual product, they also study the relationships among products to support effective decision making, gain more profit, and attract customers, e.g. through shelf arrangement, product displays, or product marketing. Several approaches to mining high utility association rules have been proposed; however, they consume much memory and require long processing times. This paper proposes LHAR (Lattice-based mining of High utility Association Rules), an algorithm that mines high utility association rules from a lattice of high utility itemsets. LHAR generates the rules while the lattice is being built, and therefore needs less memory and runtime.
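
    To make the setting concrete, here is a hedged toy sketch of the utility computation that high-utility rule miners build on. The item names, profit table, and minimum-utility threshold are invented for illustration, and LHAR's lattice construction itself is not reproduced.

        # Toy high-utility setting: transactions map item -> quantity,
        # `profit` gives unit profit, and an itemset's utility in a
        # transaction is sum(quantity * unit profit) over its items.
        profit = {"bread": 1, "milk": 2, "cheese": 5}
        transactions = [
            {"bread": 2, "milk": 1},
            {"bread": 1, "milk": 3, "cheese": 1},
            {"milk": 2, "cheese": 2},
        ]

        def utility(itemset, txn):
            if not all(i in txn for i in itemset):
                return 0
            return sum(txn[i] * profit[i] for i in itemset)

        def total_utility(itemset):
            """Utility of the itemset summed over transactions containing it."""
            return sum(utility(itemset, t) for t in transactions)

        # A rule X -> Y qualifies when the utility of X ∪ Y clears a
        # user-defined threshold (an assumed, simplified criterion):
        min_util = 10
        rule_util = total_utility({"milk", "cheese"})
        print(rule_util, rule_util >= min_util)  # -> 25 True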

    Novel GIS based machine learning algorithms for shallow landslide susceptibility mapping

    The main objective of this research was to introduce a novel machine learning approach that combines the alternating decision tree (ADTree) with the multiboost (MB), bagging (BA), rotation forest (RF), and random subspace (RS) ensemble algorithms, under two scenarios of different sample sizes and raster resolutions, for spatial prediction of shallow landslides around Bijar City, Kurdistan Province, Iran. The modeling process was evaluated with several statistical measures and the area under the receiver operating characteristic curve (AUROC). Results show that the RS model achieved high goodness-of-fit and prediction accuracy for sample sizes of 60%/40% and 70%/30% at a raster resolution of 10 m, while the MB model did so for 80%/20% and 90%/10% at 20 m. The RS-ADTree and MB-ADTree ensemble models outperformed the plain ADTree model in both scenarios. Overall, MB-ADTree had the highest prediction accuracy with a sample size of 80%/20% at a 20 m resolution (area under the curve (AUC) = 0.942) and the lowest with 60%/40% at 10 m (AUC = 0.845). The findings confirm that the newly proposed models are promising tools to assist planners and decision makers in managing landslide-prone areas.
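
    Scikit-learn has no ADTree, multiboost, or rotation forest implementation, so the sketch below substitutes a plain decision tree with standard bagging and boosting wrappers purely to illustrate the evaluation pattern the study uses (train/test split plus AUROC). The data are synthetic stand-ins, not the Bijar landslide inventory.

        # Hedged sketch of ensemble-vs-base-learner comparison by AUROC.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
        # test_size=0.4 mirrors the study's 60%/40% split scenario.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

        base = DecisionTreeClassifier(max_depth=3, random_state=0)
        models = {
            "tree": base,
            "bagging": BaggingClassifier(base, n_estimators=50, random_state=0),
            "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
        }
        for name, model in models.items():
            model.fit(X_tr, y_tr)
            auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
            print(f"{name}: AUC = {auc:.3f}")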

    Bittm: A core biterms-based topic model for targeted analysis

    While most existing topic models perform a full analysis of a document collection to discover all topics, it has recently been observed that users are often interested only in fine-grained topics related to specific aspects. Targeted (or focused) analysis has been proposed to address this need: given a corpus from a broad area, it discovers only the topics related to user-interested aspects, expressed through a set of user-provided query keywords. Existing approaches to targeted analysis suffer from topic loss and topic suppression because of their inherent assumptions and strategies, and they are not designed for computational efficiency, even though targeted analysis is expected to answer user queries as quickly as possible. In this paper, we propose a core BiTerms-based Topic Model (BiTTM). By modelling topics from core biterms that are potentially relevant to the target query, BiTTM captures context information across documents, alleviating topic loss and suppression, while enabling efficient modelling of topics related to specific aspects. Experiments on nine real-world datasets demonstrate that BiTTM outperforms existing approaches in both effectiveness and efficiency.
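
    The building block of the model is the biterm: an unordered pair of words co-occurring in a short context. The sketch below extracts biterms and keeps those touching a query keyword; this keyword-overlap filter is only a stand-in for the paper's core-biterm selection, and the window size is an invented parameter.

        # Hedged sketch of biterm extraction with a simple query filter.
        from itertools import combinations

        def biterms(doc_tokens, window=15):
            """All unordered word pairs co-occurring within a short context."""
            window_tokens = doc_tokens[:window]
            return {tuple(sorted(p)) for p in combinations(set(window_tokens), 2)}

        def core_biterms(docs, query_keywords):
            """Keep only biterms that touch at least one query keyword."""
            keep = set()
            for doc in docs:
                for b in biterms(doc.lower().split()):
                    if any(w in query_keywords for w in b):
                        keep.add(b)
            return keep

        docs = ["apple releases new phone", "phone sales rise as apple grows"]
        print(core_biterms(docs, {"phone"}))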

    Activity-partner recommendation

    LNCS v. 9077: Advances in Knowledge Discovery and Data Mining: 19th Pacific-Asia Conference, PAKDD 2015 ... Proceedings, Part 1
    In many activities, such as watching movies or having dinner, people prefer to find partners before participating. Therefore, when recommending activity items (e.g., movie tickets) to users, it makes sense to also recommend suitable activity partners. This way, (i) users save the time of finding activity partners, (ii) the item recommendation becomes more effective (users may value activity items more if they can find suitable partners), and (iii) recommender systems become more engaging and spark users' social enthusiasm. In this paper, we identify the usefulness of suggesting activity partners together with items in recommender systems. In addition, we propose and compare several methods for activity-partner recommendation. Our study includes experiments that test the practical value of activity-partner recommendation and evaluate the effectiveness of all suggested methods as well as some alternative strategies.
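
    One natural baseline, sketched below under invented data, ranks candidate partners by the similarity of their taste profiles to the target user's. This is an illustrative assumption, not necessarily one of the methods the paper proposes or compares.

        # Hedged sketch: rank partners by cosine similarity of preferences.
        import math

        prefs = {  # toy user -> genre-preference vectors
            "alice": [5, 1, 0], "bob": [4, 2, 1], "carol": [0, 5, 4],
        }

        def cosine(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            return dot / (math.hypot(*u) * math.hypot(*v))

        def rank_partners(user, candidates):
            """Most similar candidates first."""
            return sorted(candidates, key=lambda c: -cosine(prefs[user], prefs[c]))

        print(rank_partners("alice", ["bob", "carol"]))  # bob ranks first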

    Online Active Learning of Reject Option Classifiers

    Active learning is an important technique for reducing the number of labeled examples needed in supervised learning. Active learning for binary classification has been well studied in machine learning; however, active learning of reject option classifiers has remained unaddressed. In this paper, we propose novel algorithms for active learning of reject option classifiers. We develop an active learning algorithm using the double ramp loss function and provide mistake bounds for it. We also propose a new loss function for the reject option, the double sigmoid loss, with a corresponding active learning algorithm for which we offer a convergence guarantee. Extensive experimental results show the effectiveness of the proposed algorithms, which efficiently reduce the number of labeled examples required.
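
    A reject option classifier abstains on low-margin inputs instead of forcing a prediction. The sketch below shows the generic decision rule and one simple margin-based query condition for the online active-learning setting; the band width rho, the slack, and the linear model are illustrative assumptions, and the paper's double ramp and double sigmoid losses are not implemented here.

        # Hedged sketch of reject-option prediction plus a query rule.
        import numpy as np

        def predict_with_reject(w, x, rho=0.5):
            """Return +1/-1, or 0 to abstain when the score is inside the band."""
            score = float(np.dot(w, x))
            if abs(score) <= rho:
                return 0  # reject: abstain / defer to a human
            return 1 if score > 0 else -1

        def should_query(w, x, rho=0.5, slack=0.2):
            """Ask for the label when the point lies near the rejection band."""
            return abs(float(np.dot(w, x))) <= rho + slack

        w = np.array([1.0, -0.5])
        print(predict_with_reject(w, np.array([0.2, 0.1])))  # -> 0 (reject)
        print(should_query(w, np.array([0.2, 0.1])))         # -> True (query)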

    Clustering Data of Mixed Categorical and Numerical Type with Unsupervised Feature Learning

    Mixed-type categorical and numerical data are a challenge in many applications, and mixed-type data remain a frontier where computational intelligence approaches are often brittle compared with the capabilities of living creatures. In this paper, unsupervised feature learning (UFL) is applied to mixed-type data to achieve a sparse representation, which makes it easier for clustering algorithms to separate the data. Unlike other UFL methods that work with homogeneous data, such as image and video data, the presented UFL works with mixed-type data using fuzzy adaptive resonance theory (ART). UFL with fuzzy ART (UFLA) obtains a better clustering result by removing the differences in how categorical and numeric features are treated. The advantages of doing this are demonstrated on several real-world data sets with ground truth, including heart disease, teaching assistant evaluation, and credit approval, as well as on noisy, mixed-type petroleum industry data, with UFLA compared against several alternative methods. To the best of our knowledge, this is the first time UFL has been extended to accomplish the fusion of mixed data types.
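
    The key preprocessing idea, putting categorical and numeric features on equal footing before fuzzy ART, can be sketched as follows; the exact encoding used by UFLA is assumed, not quoted. One-hot encode categories, min-max scale numerics to [0, 1], then complement-code the result, since fuzzy ART conventionally takes inputs of the form [x, 1 - x].

        # Hedged sketch of mixed-type encoding for fuzzy ART.
        def encode(row, num_ranges, cat_values):
            vec = []
            for value, (lo, hi) in zip(row["num"], num_ranges):
                vec.append((value - lo) / (hi - lo))        # scale numeric to [0,1]
            for value, choices in zip(row["cat"], cat_values):
                vec.extend(1.0 if value == c else 0.0 for c in choices)  # one-hot
            return vec + [1.0 - v for v in vec]             # complement coding

        row = {"num": [40.0], "cat": ["blue"]}
        print(encode(row, num_ranges=[(0.0, 80.0)], cat_values=[["red", "blue"]]))
        # -> [0.5, 0.0, 1.0, 0.5, 1.0, 0.0]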

    Named Entity Recognition and Text Compression

    In recent years, social networks have become very popular, and it is easy for users to share their data on them. Since data on social networks are idiomatic, irregular, brief, and full of acronyms and spelling errors, dealing with such data is more challenging than dealing with news or other formal texts. Given the huge volume of posts each day, effective extraction and processing of these data will greatly benefit information extraction applications. This thesis proposes a method to normalize Vietnamese informal text from social networks. The method identifies and normalizes informal text based on the structure of Vietnamese words, Vietnamese syllable rules, and a trigram model. After normalization, the data are processed by a named entity recognition (NER) model that uses six different types of features to recognize named entities in three predefined classes: Person (PER), Location (LOC), and Organization (ORG). Social network data are very large and grow daily, which raises the challenge of reducing their size; moreover, the trigram dictionary used for normalization is itself quite big and must be reduced as well. To address this challenge, the thesis proposes three methods for compressing text files, especially Vietnamese text. The first is a syllable-based method relying on the structure of Vietnamese morphosyllables, consonants, syllables, and vowels. The second is trigram-based Vietnamese text compression using a trigram dictionary. The third is based on an n-gram sliding window, using five dictionaries for unigrams, bigrams, trigrams, four-grams, and five-grams; it achieves a promising compression ratio of around 90% and can be used on text files of any size.
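
    A toy version of the dictionary-based compression idea is sketched below; the codes, dictionary format, and escaping are simplified assumptions, and the thesis' actual encoders for Vietnamese are richer. Frequent word trigrams are replaced by short indices into a shared dictionary, and decompression is the reverse lookup.

        # Hedged sketch of trigram-dictionary text compression.
        def compress(words, trigram_dict):
            out, i = [], 0
            while i < len(words):
                tri = tuple(words[i:i + 3])
                if tri in trigram_dict:
                    out.append(("T", trigram_dict[tri]))  # emit trigram code
                    i += 3
                else:
                    out.append(("W", words[i]))           # emit literal word
                    i += 1
            return out

        def decompress(tokens, trigram_dict):
            inv = {v: k for k, v in trigram_dict.items()}
            words = []
            for kind, val in tokens:
                words.extend(inv[val] if kind == "T" else [val])
            return words

        d = {("xin", "chao", "ban"): 0}
        text = "xin chao ban toi".split()
        assert decompress(compress(text, d), d) == text  # round-trips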