672 research outputs found

CUBOS: An Internal Cluster Validity Index for Categorical Data

Author: Sen Wu
Xiaonan Gao
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2019

An internal cluster validity index is a powerful tool for evaluating clustering performance. Research on internal cluster validity indices for categorical data has been challenging because of the difficulty of measuring distance between categorical attribute values. While some efforts have been made, they ignore the relationship between different categorical attribute values and the detailed distribution information between data objects. To solve these problems, we propose a novel index called Categorical data cluster Utility Based On Silhouette (CUBOS). Specifically, we first clarify the superiority of the Silhouette index paradigm in exploring the details of clustering results. Then, we propose the Improved Distance metric for Categorical data (IDC), inspired by Category Distance, to measure distance between categorical data accurately. Finally, the Silhouette index paradigm and IDC are combined to construct CUBOS, which can overcome the aforementioned shortcomings and produce more accurate evaluation results than other baselines, as shown by experimental results on several UCI datasets.
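As a rough illustration of the Silhouette paradigm applied to categorical records, the sketch below computes the mean silhouette width with a pluggable distance function. The abstract does not define IDC, so a simple Hamming (mismatch) distance is used here as a stand-in; the function names and the toy records are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def hamming(x, y):
    """Fraction of attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def silhouette(records, labels, dist=hamming):
    """Mean silhouette width; `dist` is a stand-in for the paper's IDC metric."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the record's own cluster
        a = sum(dist(records[i], records[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(records[i], records[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two attributes per record.
records = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]
good = silhouette(records, [0, 0, 1, 1])  # clusters match the natural grouping
bad = silhouette(records, [0, 1, 0, 1])   # clusters cut across the natural grouping
```

A higher value indicates a partition whose clusters are compact and well separated under the chosen distance; swapping in a different `dist` changes only how record-to-record distance is measured, not the index structure.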

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Auto Insurance Business Analytics Approach for Customer Segmentation Using Multiple Mixed-Type Data Clustering Algorithms

Author: Kai Zhuang
Sen Wu
Xiaonan Gao
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2018

Customer segmentation is critical for auto insurance companies to gain competitive advantage by mining useful customer-related information. While some efforts have been made to use customer segmentation to support auto insurance decision-making, their segmentation results tend to be affected by the characteristics of the single algorithm used and lack cross-validation across multiple algorithms. To this end, we propose an auto insurance business analytics approach that segments customers using three mixed-type data clustering algorithms: k-prototypes, improved k-prototypes, and similarity-based agglomerative clustering. The customer segmentation results of these algorithms can complement and reinforce each other and provide as much information as possible to support decision-making. To confirm its practical value, the proposed approach extracts seven rules for an auto insurance company that may help the company make customer-related decisions and develop insurance products.
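To make the mixed-type idea concrete, the sketch below shows the dissimilarity commonly used by the standard k-prototypes algorithm: squared Euclidean distance on numeric attributes plus a weight gamma times the number of categorical mismatches. It covers only the assignment step, not the improved variant or the agglomerative method mentioned in the abstract, and the customer records, prototypes, and gamma value are hypothetical.

```python
def mixed_distance(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """k-prototypes style dissimilarity between a record and a prototype."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * categorical


def assign(customers, prototypes, gamma=1.0):
    """Assign each (numeric, categorical) record to its nearest prototype."""
    labels = []
    for num, cat in customers:
        dists = [
            mixed_distance(num, cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes
        ]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical, pre-scaled customer records: (numeric part, categorical part).
customers = [
    ((0.2, 0.1), ("urban", "sedan")),
    ((0.9, 0.8), ("rural", "truck")),
    ((0.3, 0.2), ("urban", "truck")),
]
prototypes = [
    ((0.25, 0.15), ("urban", "sedan")),
    ((0.9, 0.8), ("rural", "truck")),
]
labels = assign(customers, prototypes, gamma=0.5)
```

The weight gamma controls how much a categorical mismatch counts relative to numeric distance, so numeric attributes are usually scaled before clustering.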

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

A finite-difference method for the one-dimensional time-dependent Schrödinger equation on unbounded domain

Author: Han Houde
Jin Jicheng
Wu Xiaonan
Publication venue: Published by Elsevier Ltd.
Publication date: 30/11/2005

A finite-difference scheme is proposed for the one-dimensional time-dependent Schrödinger equation. We introduce an artificial boundary condition to reduce the original problem into an initial-boundary value problem in a finite-computational domain, and then construct a finite-difference scheme by the method of reduction of order to solve this reduced problem. This scheme has been proved to be uniquely solvable, unconditionally stable, and convergent. Some numerical examples are given to show the effectiveness of the scheme.
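For orientation, the sketch below shows a generic Crank-Nicolson finite-difference step for the equation in the assumed form i u_t = -u_xx + V(x) u on a truncated interval. It is not the scheme of the paper: it uses plain zero (Dirichlet) boundary values instead of the artificial boundary condition, and it does not use the reduction-of-order construction. The grid, time step, and initial wave packet are hypothetical choices.

```python
import cmath


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for constant off-diagonals `lower` and `upper`."""
    n = len(diag)
    c_prime = [0j] * n
    d_prime = [0j] * n
    c_prime[0] = upper / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower * c_prime[k - 1]
        c_prime[k] = upper / denom
        d_prime[k] = (rhs[k] - lower * d_prime[k - 1]) / denom
    x = [0j] * n
    x[-1] = d_prime[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d_prime[k] - c_prime[k] * x[k + 1]
    return x


def crank_nicolson_step(psi, potential, h, dt):
    """Advance interior values one time step; boundary values are fixed at 0."""
    r = 1j * dt / (2 * h * h)
    n = len(psi)
    diag = [1 + 2 * r + 0.5j * dt * v for v in potential]
    rhs = []
    for k in range(n):
        left = psi[k - 1] if k > 0 else 0j
        right = psi[k + 1] if k < n - 1 else 0j
        rhs.append((1 - 2 * r - 0.5j * dt * potential[k]) * psi[k] + r * (left + right))
    return solve_tridiagonal(-r, diag, -r, rhs)


def discrete_norm(psi, h):
    """Discrete L2 norm squared of the wave function."""
    return h * sum(abs(value) ** 2 for value in psi)


# Hypothetical setup: free particle (V = 0) on [-10, 10], Gaussian wave packet.
h, dt = 0.1, 0.01
xs = [-10 + h * (k + 1) for k in range(199)]
potential = [0.0] * len(xs)
psi = [cmath.exp(-x * x + 2j * x) for x in xs]

norm_before = discrete_norm(psi, h)
for _ in range(50):
    psi = crank_nicolson_step(psi, potential, h, dt)
norm_after = discrete_norm(psi, h)
```

With a real potential this update conserves the discrete norm up to rounding error, which is a convenient sanity check. Zero boundary values reflect the wave once it reaches the edge of the grid, which is the kind of spurious reflection an artificial boundary condition is typically designed to avoid.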

Elsevier - Publisher Connector

Design and Development of Variable Pitch Quadcopter for Long Endurance Flight

Author: Wu Xiaonan
Publication date: 01/01/2018

The variable pitch quadrotor is not a new concept but has been largely ignored in small unmanned aircraft, unlike the fixed pitch quadcopter, which is controlled only by changing the RPM of the motors and has only about 30 minutes of total flight time. The variable pitch quadrotor can be controlled by changing the motor RPM, the rotor blade pitch angle, or a combination of both. This gives the variable pitch quadrotor potential advantages in payload, maneuverability, and long endurance flight. This research focuses on the design methodology for a variable pitch quadrotor using a single motor, with potential applications for long endurance flight. This variable pitch quadcopter uses a single power plant to power all four rotors through a power transmission system. All four rotors have the same RPM but vary the blade pitch angle to control the vehicle's attitude in the air. A proof-of-concept variable pitch quadcopter is developed to test the drivetrain mechanism and to evaluate the vehicle's performance through a number of tests. Mechanical and Aerospace Engineerin

ProQuest OAI Repository

SHAREOK repository

Understanding the Evaluation Abilities of External Cluster Validity Indices to Internal Ones

Author: Falong Fan
Guiying Wei
Sen Wu
Xiaonan Gao
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2020

Evaluating an internal Cluster Validity Index (CVI) is a critical task in clustering research. Existing studies mainly employ the number of clusters (NC-based method) or external CVIs (external CVIs-based method) to evaluate internal CVIs, which is not reasonable in all scenarios. Additionally, there is no guideline for choosing appropriate methods to evaluate internal CVIs in different cases. In this paper, we focus on the ability of external CVIs to evaluate internal CVIs, and propose a novel approach, named external CVI's evaluation Ability MEasurement approach through Ranking consistency (CAMER), to measure the evaluation abilities of external CVIs quantitatively and so assist in selecting appropriate external CVIs to evaluate internal CVIs. Specifically, we formulate the evaluation ability measurement problem as a ranking consistency task, measuring the consistency between the evaluation results of external CVIs on internal CVIs and the ground truth performance of internal CVIs. Then, the superiority of CAMER is validated through a real-world case. Moreover, the evaluation abilities of seven popular external CVIs in six different scenarios are explored by CAMER. Finally, these explored evaluation abilities are validated on four real-world datasets, demonstrating the effectiveness of CAMER.
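One common way to quantify ranking consistency is Kendall's tau, sketched below. The abstract does not say which consistency measure CAMER uses, so this is only an illustrative stand-in; the score lists are hypothetical numbers, one entry per internal CVI being ranked.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four internal CVIs.
ground_truth = [0.9, 0.7, 0.5, 0.2]  # assumed true performance
external_a = [0.8, 0.6, 0.4, 0.1]    # ranks the four CVIs in the same order
external_b = [0.3, 0.6, 0.4, 0.1]    # misplaces the best-performing CVI

tau_a = kendall_tau(ground_truth, external_a)
tau_b = kendall_tau(ground_truth, external_b)
```

Under this stand-in measure, an external CVI whose ranking agrees more closely with the ground truth (a tau nearer 1) would be judged to have the stronger evaluation ability.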

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Clustering Algorithm Based on Sparse Feature Vector without Specifying Parameter

Author: Guiying Wei
Huixia He
Sen Wu
Xiaonan Gao
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2020

Parameter setting is an essential factor affecting algorithm performance in data mining techniques. CABOSFV is an efficient clustering algorithm that can cluster binary data with sparse features, but its threshold parameter is challenging to specify. To ease this parameter decision, a clustering algorithm based on sparse feature vector without specifying parameter (CASP) is proposed in this paper. A method for calculating an upper limit of the threshold is first defined to determine the threshold range. Furthermore, we use the sparseness index to sort the data and conduct the clustering process based on the adjusted sparse feature vector after data sorting. An interval search strategy is adopted to find a suitable threshold within the defined range, and the clustering result obtained with the selected parameter is the outcome. Experiments on 7 UCI datasets demonstrate that the clustering results of the CASP algorithm are superior to other baselines in terms of both effectiveness and efficiency. CASP not only simplifies the parameter decision process but also obtains desirable clustering results quickly and stably, which shows the practicability of the algorithm.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Emerging Synergies Between Large Language Models and Machine Learning in Ecommerce Recommendations

Author: He Yuhang
Liang Penghao
Wang Han
Wu Yichao
Xu Xiaonan
Publication date: 12/03/2024

With the boom of e-commerce and web applications, recommender systems have become an important part of our daily lives, providing personalized recommendations based on the user's preferences. Although deep neural networks (DNNs) have made significant progress in improving recommendation systems by simulating the interaction between users and items and incorporating their textual information, these DNN-based approaches still have limitations, such as the difficulty of effectively understanding users' interests and capturing textual information. They also cannot generalize to different seen/unseen recommendation scenarios or reason about their predictions. At the same time, the emergence of large language models (LLMs), represented by ChatGPT and GPT-4, has revolutionized the fields of natural language processing (NLP) and artificial intelligence (AI) due to their superior capabilities in the basic tasks of language understanding and generation, and their impressive generalization and reasoning capabilities. As a result, recent research has sought to harness the power of LLMs to improve recommendation systems. Given the rapid development of this research direction, there is an urgent need for a systematic review of existing LLM-driven recommendation systems that gives researchers and practitioners in related fields insight into the area. More specifically, we first introduce a representative approach to learning user and item representations using an LLM as a feature encoder. We then review the latest advances in LLM techniques for collaborative filtering enhanced recommendation systems from the three paradigms of pre-training, fine-tuning, and prompting. Finally, we give a comprehensive discussion of the future direction of this emerging field.

arXiv.org e-Print Archive