672 research outputs found
CUBOS: An Internal Cluster Validity Index for Categorical Data
An internal cluster validity index is a powerful tool for evaluating clustering performance. Research on internal cluster validity indices for categorical data has been challenging because the distance between categorical attribute values is difficult to measure. While some efforts have been made, they ignore the relationship between different categorical attribute values and the detailed distribution information between data objects. To solve these problems, we propose a novel index called Categorical data cluster Utility Based On Silhouette (CUBOS). Specifically, we first establish the superiority of the Silhouette index paradigm in exploring the details of clustering results. Then, we propose the Improved Distance metric for Categorical data (IDC), inspired by Category Distance, to measure the distance between categorical data accurately. Finally, the Silhouette index paradigm and IDC are combined to construct CUBOS, which overcomes the aforementioned shortcomings and produces more accurate evaluation results than other baselines, as shown by experimental results on several UCI datasets.
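The Silhouette paradigm that the abstract builds on can be sketched as follows. This is only an illustration: the abstract does not define IDC, so the sketch substitutes a plain simple-matching (Hamming) distance, and the function names `hamming` and `silhouette` and the toy data are assumptions, not part of CUBOS.

```python
from typing import Callable, Hashable, Sequence

Row = Sequence[Hashable]


def hamming(a: Row, b: Row) -> float:
    """Fraction of attributes on which two categorical objects differ.

    Stand-in distance only; the paper's IDC metric is NOT reproduced here.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def silhouette(data: Sequence[Row], labels: Sequence[int],
               dist: Callable[[Row, Row], float] = hamming) -> float:
    """Mean silhouette width over all objects (singleton clusters score 0)."""
    clusters: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the object's own cluster
        a = sum(dist(data[i], data[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy categorical data (hypothetical): two obvious groups.
data = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]
good = silhouette(data, [0, 0, 1, 1])  # partition that matches the groups
bad = silhouette(data, [0, 1, 0, 1])   # partition that splits both groups
```

On this toy data the matching partition scores higher than the split one, which is the behaviour an internal index is expected to show. Passing a different `dist` function is where a categorical metric such as IDC would plug in.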
Auto Insurance Business Analytics Approach for Customer Segmentation Using Multiple Mixed-Type Data Clustering Algorithms
Customer segmentation is critical for auto insurance companies to gain competitive advantage by mining useful customer-related information. While some efforts have been made to apply customer segmentation in support of auto insurance decision-making, their segmentation results tend to be affected by the characteristics of the algorithm used and lack cross-validation across multiple algorithms. To this end, we propose an auto insurance business analytics approach that segments customers using three mixed-type data clustering algorithms: k-prototypes, improved k-prototypes and similarity-based agglomerative clustering. The customer segmentation results of these algorithms can complement and reinforce each other and provide as much information as possible to support decision-making. To confirm its practical value, the proposed approach extracts seven rules for an auto insurance company that may help the company make customer-related decisions and develop insurance products.
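For readers unfamiliar with mixed-type clustering, the dissimilarity used by the standard k-prototypes algorithm can be sketched as below: squared Euclidean distance on the numeric attributes plus a weighted count of mismatches on the categorical ones. The function name, the weight `gamma`, and the customer attributes are illustrative assumptions; the abstract does not describe the improved k-prototypes variant or the data actually used.

```python
from typing import Hashable, Sequence


def k_prototypes_dissimilarity(
    num_x: Sequence[float],
    cat_x: Sequence[Hashable],
    num_proto: Sequence[float],
    cat_proto: Sequence[Hashable],
    gamma: float = 1.0,
) -> float:
    """Mixed-type dissimilarity between an object and a cluster prototype.

    numeric part:     squared Euclidean distance
    categorical part: number of mismatching attributes, weighted by gamma
    """
    numeric = sum((a - b) ** 2 for a, b in zip(num_x, num_proto))
    categorical = sum(a != b for a, b in zip(cat_x, cat_proto))
    return numeric + gamma * categorical


# Hypothetical customer: (scaled age, scaled premium) and (vehicle type, region).
customer_num, customer_cat = (0.5, 0.25), ("sedan", "urban")
proto_num, proto_cat = (0.25, 0.75), ("sedan", "rural")

d = k_prototypes_dissimilarity(
    customer_num, customer_cat, proto_num, proto_cat, gamma=0.5
)
```

The weight `gamma` balances the two parts: a larger value makes categorical mismatches dominate the assignment of customers to segments.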
A finite-difference method for the one-dimensional time-dependent Schrödinger equation on unbounded domain
A finite-difference scheme is proposed for the one-dimensional time-dependent Schrödinger equation. We introduce an artificial boundary condition to reduce the original problem to an initial-boundary value problem in a finite computational domain, and then construct a finite-difference scheme by the method of reduction of order to solve this reduced problem. The scheme is proved to be uniquely solvable, unconditionally stable, and convergent. Some numerical examples are given to show the effectiveness of the scheme.
Design and Development of Variable Pitch Quadcopter for Long Endurance Flight
The variable pitch quadrotor is not a new concept but has been largely ignored in small unmanned aircraft, unlike the fixed pitch quadcopter, which is controlled only by changing the RPM of the motors and has only about 30 minutes of total flight time. The variable pitch quadrotor can be controlled by changing the motor RPM, the rotor blade pitch angle, or a combination of both. This gives the variable pitch quadrotor potential advantages in payload, maneuverability and long endurance flight. This research focuses on the design methodology for a variable pitch quadrotor using a single motor, with potential applications in long endurance flight. This variable pitch quadcopter uses a single power plant to drive all four rotors through a power transmission system. All four rotors turn at the same RPM, and the blade pitch angle of each is varied to control the vehicle's attitude in the air. A proof-of-concept variable pitch quadcopter is developed to test the drivetrain mechanism on the vehicle and to evaluate the vehicle's performance through a number of tests. Mechanical and Aerospace Engineerin
Understanding the Evaluation Abilities of External Cluster Validity Indices to Internal Ones
Evaluating internal Cluster Validity Indices (CVIs) is a critical task in clustering research. Existing studies mainly employ the number of clusters (NC-based method) or external CVIs (external CVIs-based method) to evaluate internal CVIs, which is not reasonable in all scenarios. Additionally, there is no guideline for choosing appropriate methods to evaluate internal CVIs in different cases. In this paper, we focus on the ability of external CVIs to evaluate internal CVIs, and propose a novel approach, named external CVI's evaluation Ability MEasurement approach through Ranking consistency (CAMER), to measure the evaluation abilities of external CVIs quantitatively and so assist in selecting appropriate external CVIs to evaluate internal CVIs. Specifically, we formulate the evaluation ability measurement problem as a ranking consistency task, measuring the consistency between the evaluation results that external CVIs give for internal CVIs and the ground truth performance of those internal CVIs. Then, the superiority of CAMER is validated through a real-world case. Moreover, the evaluation abilities of seven popular external CVIs in six different scenarios are explored by CAMER. Finally, these explored evaluation abilities are validated on four real-world datasets, demonstrating the effectiveness of CAMER.
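The ranking consistency idea can be illustrated with a rank correlation between the ground truth performance of several internal CVIs and the scores an external CVI assigns to them. The abstract does not say which consistency measure CAMER uses, so the Kendall tau-a coefficient below is an assumption chosen for illustration, and all scores are made up.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ground truth performance of four internal CVIs.
ground_truth = [0.9, 0.7, 0.4, 0.2]
# Hypothetical scores two external CVIs assign to the same four internal CVIs.
external_a = [0.8, 0.6, 0.5, 0.1]  # same ordering as the ground truth
external_b = [0.3, 0.6, 0.5, 0.1]  # misplaces the best internal CVI

tau_a = kendall_tau(ground_truth, external_a)
tau_b = kendall_tau(ground_truth, external_b)
```

Under this sketch, the external CVI whose ranking agrees more closely with the ground truth (here `external_a`) would be judged to have the stronger evaluation ability.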
Clustering Algorithm Based on Sparse Feature Vector without Specifying Parameter
Parameter setting is an essential factor affecting algorithm performance in data mining techniques. CABOSFV is an efficient clustering algorithm that can cluster binary data with sparse features, but its threshold parameter is difficult to specify. To address this difficulty, a clustering algorithm based on sparse feature vector without specifying parameter (CASP) is proposed in this paper. A method for calculating an upper limit of the threshold is first defined to determine the threshold range. Next, we use the sparseness index to sort the data and conduct the clustering process based on the adjusted sparse feature vector after sorting. An interval search strategy is adopted to find a suitable threshold within the defined range, and the clustering result obtained with the selected threshold is the outcome. Experiments on 7 UCI datasets demonstrate that the clustering results of the CASP algorithm are superior to those of other baselines in terms of both effectiveness and efficiency. CASP not only simplifies the parameter decision process but also obtains desirable clustering results quickly and stably, which shows the practicability of the algorithm.
Emerging Synergies Between Large Language Models and Machine Learning in Ecommerce Recommendations
With the boom of e-commerce and web applications, recommender systems have
become an important part of our daily lives, providing personalized
recommendations based on the user's preferences. Although deep neural networks
(DNNs) have made significant progress in improving recommendation systems by
simulating the interaction between users and items and incorporating their
textual information, these DNN-based approaches still have some limitations,
such as difficulty in effectively understanding users' interests and
capturing textual information, and an inability to generalize to different
seen/unseen recommendation scenarios or to reason about their predictions. At the
same time, the emergence of large language models (LLMs), represented by
ChatGPT and GPT-4, has revolutionized the fields of natural language processing
(NLP) and artificial intelligence (AI) due to their superior capabilities in
the basic tasks of language understanding and generation, and their impressive
generalization and reasoning capabilities. As a result, recent research has
sought to harness the power of LLMs to improve recommendation systems. Given the
rapid development of this research direction in the field of recommendation
systems, there is an urgent need for a systematic review of existing LLM-driven
recommendation systems that gives researchers and practitioners in related
fields insight into them. More specifically, we first introduce a representative
approach to learning user and item representations using LLMs as a feature
encoder. We then review the latest advances in LLM techniques for
collaborative filtering enhanced recommendation systems from the three
paradigms of pre-training, fine-tuning, and prompting. Finally, we give a
comprehensive discussion of the future direction of this emerging field.