108,730 research outputs found

    Online feature selection for mining big data

    Get PDF
    Ministry of Education, Singapore under its Academic Research Funding Tier

    A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy

    Get PDF
    Text classification, the task of metadata to documents, needs a person to take significant time and effort. Since online-generated contents are explosively growing, it becomes a challenge for manually annotating with large scale and unstructured data. Recently, various state-or-art text mining methods have been applied to classification process based on the keywords extraction. However, when using these keywords as features in the classification task, it is common that the number of feature dimensions is large. In addition, how to select keywords from documents as features in the classification task is a big challenge. Especially, when using traditional machine learning algorithms in big data, the computation time is very long. On the other hand, about 80% of real data is unstructured and non-labeled in the real world. The conventional supervised feature selection methods cannot be directly used in selecting entities from massive data. Usually, statistical strategies are utilized to extract features from unlabeled data for classification tasks according to their importance scores. We propose a novel method to extract key features effectively before feeding them into the classification assignment. Another challenge in the text classification is the multi-label problem, the assignment of multiple non-exclusive labels to documents. This problem makes text classification more complicated compared with a single label classification. For the above issues, we develop a framework for extracting data and reducing data dimension to solve the multi-label problem on labeled and unlabeled datasets. In order to reduce data dimension, we develop a hybrid feature selection method that extracts meaningful features according to the importance of each feature. The Word2Vec is applied to represent each document by a feature vector for the document categorization for the big dataset. The unsupervised approach is used to extract features from real online-generated data for text classification. Our unsupervised feature selection method is applied to extract depression symptoms from social media such as Twitter. In the future, these depression symptoms will be used for depression self-screening and diagnosis

    What attracts vehicle consumers’ buying:A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?

    Get PDF
    Purpose: The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forum. The purpose of this paper is to probe into the vehicle consumer consumption behavior and make recommendations for potential consumers from textual comments viewpoint. Design/methodology/approach: A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from online perspective. To reduce subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to analyze the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain specific sentimental value of attribute class, contributing to the multi-grade sentiment classification. To achieve the intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint. Findings: The big data analytics argue that “cost-effectiveness” characteristic is the most important factor that vehicle consumers care, and the data mining results enable automakers to better understand consumer consumption behavior. Research limitations/implications: The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation. Originality/value: Researches of consumer consumption behavior are usually based on survey-based methods, and mostly previous studies about comments analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective

    FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification

    Full text link
    This paper introduces a novel real-time Fuzzy Supervised Learning with Binary Meta-Feature (FSL-BM) for big data classification task. The study of real-time algorithms addresses several major concerns, which are namely: accuracy, memory consumption, and ability to stretch assumptions and time complexity. Attaining a fast computational model providing fuzzy logic and supervised learning is one of the main challenges in the machine learning. In this research paper, we present FSL-BM algorithm as an efficient solution of supervised learning with fuzzy logic processing using binary meta-feature representation using Hamming Distance and Hash function to relax assumptions. While many studies focused on reducing time complexity and increasing accuracy during the last decade, the novel contribution of this proposed solution comes through integration of Hamming Distance, Hash function, binary meta-features, binary classification to provide real time supervised method. Hash Tables (HT) component gives a fast access to existing indices; and therefore, the generation of new indices in a constant time complexity, which supersedes existing fuzzy supervised algorithms with better or comparable results. To summarize, the main contribution of this technique for real-time Fuzzy Supervised Learning is to represent hypothesis through binary input as meta-feature space and creating the Fuzzy Supervised Hash table to train and validate model.Comment: FICC201

    The Role of Text Pre-processing in Sentiment Analysis

    Get PDF
    It is challenging to understand the latest trends and summarise the state or general opinions about products due to the big diversity and size of social media data, and this creates the need of automated and real time opinion extraction and mining. Mining online opinion is a form of sentiment analysis that is treated as a difficult text classification task. In this paper, we explore the role of text pre-processing in sentiment analysis, and report on experimental results that demonstrate that with appropriate feature selection and representation, sentiment analysis accuracies using support vector machines (SVM) in this area may be significantly improved. The level of accuracy achieved is shown to be comparable to the ones achieved in topic categorisation although sentiment analysis is considered to be a much harder problem in the literature
    • …
    corecore