
    A fuzzy set preference model for market share analysis

    Consumer preference models are widely used in new product design, marketing management, pricing, and market segmentation. The success of new products depends on accurate market share prediction and design decisions based on consumer preferences. The vague linguistic nature of consumer preferences and product attributes, combined with the substantial differences between individuals, creates a formidable challenge to marketing models. The most widely used methodology is conjoint analysis. Conjoint models, as currently implemented, represent linguistic preferences as ratio or interval-scaled numbers, use only numeric product attributes, and require aggregation of individuals for estimation purposes. It is not surprising that these models are costly to implement, are inflexible, and have a predictive validity that is not substantially better than chance. This affects the accuracy of market share estimates. A fuzzy set preference model can easily represent linguistic variables either in consumer preferences or product attributes with minimal measurement requirements (ordinal scales), while still estimating overall preferences suitable for market share prediction. This approach results in flexible individual-level conjoint models which can provide more accurate market share estimates from a smaller number of more meaningful consumer ratings. Fuzzy sets can be incorporated within existing preference model structures, such as a linear combination, using the techniques developed for conjoint analysis and market share estimation. The purpose of this article is to develop and fully test a fuzzy set preference model which can represent linguistic variables in individual-level models implemented in parallel with existing conjoint models. The potential improvements in market share prediction and predictive validity can substantially improve management decisions about what to make (product design), for whom to make it (market segmentation), and how much to make (market share prediction).
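
    To make the idea concrete, the following is a minimal sketch of one way linguistic ratings can be represented as fuzzy sets and combined linearly into an overall preference score. The linguistic terms, membership parameters, attribute names, weights, and defuzzification choice are illustrative assumptions, not the model developed in the article.

```python
# A minimal sketch: triangular fuzzy sets for linguistic preference terms,
# combined with a linear (conjoint-style) weighting of attributes.
# All terms, parameters, and weights below are illustrative assumptions.

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets for the linguistic variable "preference",
# defined over an ordinal 1-7 rating scale.
LINGUISTIC_TERMS = {
    "low":    (1.0, 1.0, 4.0),
    "medium": (2.5, 4.0, 5.5),
    "high":   (4.0, 7.0, 7.0),
}

def fuzzify(rating):
    """Membership degree of an ordinal rating in each linguistic term."""
    return {term: triangular(rating, *abc) for term, abc in LINGUISTIC_TERMS.items()}

def overall_preference(attribute_ratings, weights):
    """Weighted linear combination of defuzzified attribute-level preferences."""
    score = 0.0
    for attr, rating in attribute_ratings.items():
        mu = fuzzify(rating)
        # Simple defuzzification: membership-weighted centroid of the term peaks.
        peaks = {t: LINGUISTIC_TERMS[t][1] for t in mu}
        denom = sum(mu.values()) or 1.0
        crisp = sum(mu[t] * peaks[t] for t in mu) / denom
        score += weights[attr] * crisp
    return score

# Example: one consumer's ordinal ratings of a product concept.
ratings = {"price": 6, "quality": 4, "brand": 2}
weights = {"price": 0.5, "quality": 0.3, "brand": 0.2}
print(overall_preference(ratings, weights))
```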

    Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation

    With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution for its advantages in scalability and cost saving. However, some data may be sensitive enough that the data owner does not want to move it to the cloud unless data confidentiality and query privacy are guaranteed. On the other hand, a secure query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We propose the RASP data perturbation method to provide secure and efficient range query and kNN query services for protected data in the cloud. The RASP data perturbation method combines order preserving encryption, dimensionality expansion, random noise injection, and random projection to provide strong resilience to attacks on the perturbed data and queries. It also preserves multidimensional ranges, which allows existing indexing techniques to be applied to speed up range query processing. The kNN-R algorithm is designed to work with the RASP range query algorithm to process kNN queries. We have carefully analyzed the attacks on data and queries under a precisely defined threat model and realistic security assumptions. Extensive experiments have been conducted to show the advantages of this approach in efficiency and security. Comment: 18 pages, to appear in IEEE TKDE, accepted in December 201
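
    The sketch below illustrates the general flavor of perturbation by dimensionality expansion, random noise injection, and a secret random projection. It omits the order-preserving encryption step and uses illustrative matrix and noise choices, so it is not the paper's exact RASP construction.

```python
# A minimal sketch (not the paper's construction): expand each record with a
# noise dimension and a constant, then project it with a secret invertible matrix.
import numpy as np

rng = np.random.default_rng(0)

def keygen(d):
    """Secret key: an invertible (d+2) x (d+2) random matrix."""
    while True:
        A = rng.normal(size=(d + 2, d + 2))
        if abs(np.linalg.det(A)) > 1e-6:   # ensure invertibility
            return A

def perturb(x, A):
    """Dimensionality expansion (fresh noise + constant 1), then random projection."""
    v = rng.uniform(0.1, 1.0)              # fresh random noise per record
    expanded = np.concatenate([x, [v, 1.0]])
    return A @ expanded

# The data owner keeps A secret; the cloud only sees perturbed vectors.
d = 3
A = keygen(d)
record = np.array([5.0, 2.1, 7.3])
y = perturb(record, A)
print(y)

# The owner can recover the original attributes with the inverse of the key.
recovered = np.linalg.solve(A, y)[:d]
print(np.allclose(recovered, record))
```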

    Cache Conscious Data Layouting for In-Memory Databases

    Many applications with manually implemented data management exhibit a data storage pattern in which semantically related data items are stored closer in memory than unrelated data items. The strong semantic relationship between these data items commonly induces accesses to them that occur close together in time. This is called the principle of data locality and has been recognized by hardware vendors. It is commonly exploited to improve the performance of hardware. General Purpose Database Management Systems (DBMSs), whose main goal is to simplify optimal data storage and processing, generally fall short of this goal because the usage pattern of the stored data cannot be anticipated when designing the system. The current interest in column oriented databases indicates that one strategy does not fit all applications. A DBMS that automatically adapts its storage strategy to the workload of the database promises a significant performance increase by maximizing the benefit of hardware optimizations that are based on the principle of data locality. This thesis gives an overview of optimizations that are based on the principle of data locality and the effect they have on the data access performance of applications. Based on the findings, a model is introduced that allows an estimation of the costs of data accesses based on the arrangement of the data in main memory. This model is evaluated through a series of experiments and incorporated into an automatic layouting component for a DBMS. This layouting component allows the calculation of an analytically optimal storage layout. The performance benefits brought by this component are evaluated in an application benchmark.
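
    The following is a minimal sketch of the kind of locality-based cost estimate described above: it approximates how many cache lines a scan of one attribute touches under a row-oriented versus a column-oriented arrangement. Cache-line size, row width, and row count are illustrative assumptions rather than parameters from the thesis.

```python
# A minimal sketch of a data-locality cost estimate: count the cache lines
# touched when scanning a single attribute under two memory layouts.
CACHE_LINE_BYTES = 64

def cache_lines_touched(num_rows, accessed_bytes, stride_bytes):
    """Cache lines loaded when reading `accessed_bytes` per row at a fixed stride."""
    if stride_bytes <= accessed_bytes:
        # Accessed data is contiguous: cache lines are shared across rows.
        total_bytes = num_rows * accessed_bytes
        return -(-total_bytes // CACHE_LINE_BYTES)      # ceiling division
    # Each row's access may land on separate cache lines.
    lines_per_access = -(-accessed_bytes // CACHE_LINE_BYTES)
    return num_rows * max(1, lines_per_access)

num_rows = 1_000_000
attr_bytes = 4        # one 4-byte integer attribute is scanned
row_bytes = 128       # assumed total row width in the row-store layout

row_store_cost = cache_lines_touched(num_rows, attr_bytes, stride_bytes=row_bytes)
col_store_cost = cache_lines_touched(num_rows, attr_bytes, stride_bytes=attr_bytes)

print(f"row store:    ~{row_store_cost:,} cache lines")
print(f"column store: ~{col_store_cost:,} cache lines")
```

    Under these assumptions the column arrangement touches roughly one sixteenth as many cache lines for the scan, which is the kind of gap a workload-aware layouting component can exploit.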

    Data Mining Feature Subset Weighting and Selection Using Genetic Algorithms

    We present a simple genetic algorithm (sGA), developed under the Genetic Rule and Classifier Construction Environment (GRaCCE), to solve the feature subset selection and weighting problem and improve the classification accuracy of the k-nearest neighbor (KNN) algorithm. Our hypotheses are that weighting the features will affect the performance of the KNN algorithm and will yield a better classification accuracy rate than binary feature selection. The weighted-sGA algorithm uses real-valued chromosomes to find the weights for features, and the binary-sGA uses integer-valued chromosomes to select a subset of features from the original feature set. A repair algorithm is developed for the weighted-sGA to guarantee the feasibility of chromosomes; by feasibility we mean that the gene values in a chromosome must sum to 1. To calculate the fitness value of each chromosome in the population, we use the KNN algorithm as our fitness function: the Euclidean distance from one individual to the other individuals is calculated in the d-dimensional feature space to classify an unknown instance. GRaCCE searches for good feature subsets and their associated weights. These feature weights are then multiplied with the normalized feature values, and the resulting values are used to calculate the distances between instances.
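
    The sketch below illustrates two of the pieces described above: a repair step that rescales a real-valued chromosome so its gene values sum to 1, and a leave-one-out KNN fitness computed over weighted Euclidean distances. The toy data, value of k, and population handling are illustrative assumptions; this is not GRaCCE itself.

```python
# A minimal sketch: chromosome repair (weights sum to 1) and a KNN-based
# fitness function over feature values scaled by the chromosome's weights.
import numpy as np

def repair(chromosome):
    """Force feasibility: gene values must be non-negative and sum to 1."""
    w = np.clip(chromosome, 0.0, None)
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))

def knn_fitness(weights, X, y, k=3):
    """Leave-one-out KNN accuracy on features scaled by the feature weights."""
    Xw = X * weights                     # weights multiply normalized feature values
    correct = 0
    for i in range(len(Xw)):
        d = np.linalg.norm(Xw - Xw[i], axis=1)   # Euclidean distances to all others
        d[i] = np.inf                    # exclude the instance itself
        neighbors = y[np.argsort(d)[:k]]
        pred = np.bincount(neighbors).argmax()   # majority vote
        correct += int(pred == y[i])
    return correct / len(Xw)

# Toy example with random data standing in for a real, normalized dataset.
rng = np.random.default_rng(42)
X = rng.random((60, 5))
y = rng.integers(0, 2, size=60)

chromosome = rng.random(5)               # real-valued genes, one per feature
weights = repair(chromosome)
print(weights.sum(), knn_fitness(weights, X, y))
```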

    Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis

    Data mining is a technique for finding hidden information in a database in order to discover interesting patterns. In the health sector, data mining can be used to diagnose a disease from a patient's medical record data. This research used a Chronic Kidney Disease (CKD) dataset obtained from the UCI machine learning repository. In this dataset, almost half of the attributes are continuous numeric types. Continuous attributes can lower accuracy because they can take an unlimited number of distinct values, so they need to be transformed into discrete attributes. In certain cases, using all attributes can produce a low level of accuracy because some are irrelevant and have no correlation with the target class, so attributes need to be selected in advance to get more accurate results. Classification is one technique in data mining, and one of its algorithms is C4.5. The purpose of this study is to increase the accuracy of the C4.5 algorithm by applying discretization and Correlation-Based Feature Selection (CFS) for chronic kidney disease diagnosis. Discretization is used to handle continuous values, while CFS is used for attribute selection. Experiments were conducted with WEKA (Waikato Environment for Knowledge Analysis). Applying discretization and CFS to C4.5 shows an accuracy increase of 0.5%: the baseline C4.5 has an accuracy of 97%, C4.5 with discretization reaches 97.25%, and C4.5 with discretization and CFS reaches 97.5%.
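
    As a rough illustration of the pipeline (not the WEKA experiment itself), the sketch below uses Python stand-ins: KBinsDiscretizer for discretization, a simple correlation filter as a simplification of CFS, and scikit-learn's DecisionTreeClassifier with the entropy criterion as a stand-in for C4.5. The synthetic data replaces the UCI CKD dataset, so the resulting accuracies will differ from those reported above.

```python
# A minimal sketch of the discretize -> select -> classify pipeline using
# Python stand-ins for the WEKA components named in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CKD dataset (400 records, 24 attributes).
X, y = make_classification(n_samples=400, n_features=24, n_informative=8,
                           random_state=0)

# Step 1: discretize continuous attributes into a few ordinal bins.
discretizer = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
Xd = discretizer.fit_transform(X)

# Step 2: a crude correlation-based filter: keep attributes whose absolute
# correlation with the class exceeds a threshold. (CFS proper also penalizes
# inter-feature correlation; this is a simplification.)
corr = np.array([abs(np.corrcoef(Xd[:, j], y)[0, 1]) for j in range(Xd.shape[1])])
selected = corr > 0.1
Xs = Xd[:, selected]

# Step 3: decision tree with the entropy (information gain) criterion,
# used here as a stand-in for C4.5.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
print("features kept:", selected.sum())
print("10-fold CV accuracy:", cross_val_score(tree, Xs, y, cv=10).mean())
```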