Evolving Neural-Symbolic Systems Guided by Adaptive Training Schemes: Applications in Finance.

Author: Athanasios Tsakonas
Clark P.
Fahrmeir L.
Georgios Dounias
Koza J.
Koza J. R.
Lavrac N.
Quinlan J. R.
Singleton A.
Teh H.-H.
Publication venue: 'Informa UK Limited'
Publication date: 14/08/2007

Crossref

Bournemouth University Research Online

Intelligent Financial Fraud Detection Practices: An Investigation

Author: B Bai
B Hoogs
C Holton
C Whitrow
CA Paasch
D Sánchez
D Zhang
E Duman
E Kirkos
E Ngai
FH Glancy
HC Koh
I Yeh
J Pinquet
JE Sohl
JT Quah
L Bermúdez
M Cecchini
M Jans
P Ravisankar
S Bhattacharyya
S Panigrahi
S Viaene
SL Humpherys
V Vatsa
W Zhou
W-S Yang
Publication date: 24/10/2015

Financial fraud is an issue with far-reaching consequences in the finance industry, government, corporate sectors, and for ordinary consumers. Increasing dependence on new technologies such as cloud and mobile computing in recent years has compounded the problem. Traditional methods of detection involve extensive use of auditing, where a trained individual manually observes reports or transactions in an attempt to discover fraudulent behaviour. This method is not only time-consuming, expensive and inaccurate, but in the age of big data it is also impractical. Not surprisingly, financial institutions have turned to automated processes using statistical and computational methods. This paper presents a comprehensive investigation of financial fraud detection practices using such data mining methods, with a particular focus on computational intelligence-based techniques. The practices are classified by key aspects such as the detection algorithm used, the fraud type investigated, and the success rate. Issues and challenges associated with current practices and potential future directions of research are also identified. Comment: Proceedings of the 10th International Conference on Security and Privacy in Communication Networks (SecureComm 2014)

arXiv.org e-Print Archive

Crossref

A computational model of evolution: haploidy versus diploidy

Author: Berlanga de Jesús Antonio
Isasi Pedro
Molina López José Manuel
Sanchis de Miguel María Araceli
Publication venue: Institute of Informatics. Slovak Academy of Sciences
Publication date: 01/01/1999

In this paper, diploidy is studied as an important mechanism for memory reinforcement in artificial environments where adaptation is very important. The individuals of this ecosystem are able to genetically "learn" the best behaviour for survival. Critical changes in the environmental conditions require the presence of diploidy to ensure the survival of species. New gene-dominance configurations provide a way to shield individuals from erroneous selection. These two concepts appear to be important elements for artificial systems that have to evolve in environments with some degree of instability.
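The abstract does not spell out its dominance scheme, so the following is only a minimal sketch of the general idea, assuming binary alleles and a per-locus dominance map (the names `express`, `fitness`, and `dominance` are hypothetical). It shows how a diploid individual can keep a currently unexpressed chromosome as "memory" and recover an older behaviour when the dominance configuration changes, without any mutation.

```python
def express(chrom_a, chrom_b, dominance):
    """Phenotype of a diploid individual with binary alleles.

    dominance[i] is the allele value (0 or 1) assumed to be dominant at
    locus i. Homozygous loci express their shared allele; heterozygous
    loci express the dominant one.
    """
    return [a if a == b else dom
            for a, b, dom in zip(chrom_a, chrom_b, dominance)]


def fitness(phenotype, target):
    """Number of loci at which the phenotype matches the environment's target."""
    return sum(p == t for p, t in zip(phenotype, target))


# Two environments with opposite optimal behaviours (illustrative data).
env_a = [1, 1, 1, 1]
env_b = [0, 0, 0, 0]

# A diploid individual carrying one chromosome suited to each environment.
chrom_a = [1, 1, 1, 1]
chrom_b = [0, 0, 0, 0]

# With allele 1 dominant everywhere, the individual behaves as env_a requires.
pheno_a = express(chrom_a, chrom_b, dominance=[1, 1, 1, 1])

# After the environment changes, switching the dominance configuration
# exposes the stored alleles and restores full fitness in env_b.
pheno_b = express(chrom_a, chrom_b, dominance=[0, 0, 0, 0])

print(fitness(pheno_a, env_a), fitness(pheno_b, env_b))
```

A haploid individual holding only `chrom_a` would have to rediscover `env_b`'s optimum through mutation, which is the contrast the title points to.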

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Genetic programming for mining DNA chip data from cancer patients

Author: Buxton BF
Langdon WB
Publication date: 01/01/2004

In machine learning terms, DNA (gene) chip data is unusual in having thousands of attributes (the gene expression values) but few (<100) records (the patients). A GP-based method for both feature selection and generating simple models based on a few genes is demonstrated on cancer data.

CiteSeerX

UCL Discovery

A Probabilistic One-Step Approach to the Optimal Product Line Design Problem Using Conjoint and Cost Data

Author: Harald Hruschka
Winfried Steiner

Designing and pricing new products is one of the most critical activities for a firm, and it is well-known that taking into account consumer preferences for design decisions is essential for products later to be successful in a competitive environment (e.g., Urban and Hauser 1993). Consequently, measuring consumer preferences among multiattribute alternatives has been a primary concern in marketing research as well, and among many methodologies developed, conjoint analysis (Green and Rao 1971) has turned out to be one of the most widely used preference-based techniques for identifying and evaluating new product concepts. Moreover, a number of conjoint-based models with special focus on mathematical programming techniques for optimal product (line) design have been proposed (e.g., Zufryden 1977, 1982, Green and Krieger 1985, 1987b, 1992, Kohli and Krishnamurti 1987, Kohli and Sukumar 1990, Dobson and Kalish 1988, 1993, Balakrishnan and Jacob 1996, Chen and Hausman 2000). These models are directed at determining optimal product concepts using consumers' idiosyncratic or segment level part-worth preference functions estimated previously within a conjoint framework. Recently, Balakrishnan and Jacob (1996) have proposed the use of Genetic Algorithms (GA) to solve the problem of identifying a share maximizing single product design using conjoint data. In this paper, we follow Balakrishnan and Jacob's idea and employ and evaluate the GA approach with regard to the problem of optimal product line design. Similar to the approaches of Kohli and Sukumar (1990) and Nair et al. (1995), product lines are constructed directly from part-worths data obtained by conjoint analysis, which can be characterized as a one-step approach to product line design. 
In contrast, a two-step approach would start by first reducing the total set of feasible product profiles to a smaller set of promising items (reference set of candidate items) from which the products that constitute a product line are selected in a second step. Two-step approaches or partial models for either the first or second stage in this context have been proposed by Green and Krieger (1985, 1987a, 1987b, 1989), McBride and Zufryden (1988), Dobson and Kalish (1988, 1993) and, more recently, by Chen and Hausman (2000). Heretofore, with the only exception of Chen and Hausman's (2000) probabilistic model, all contributors to the literature on conjoint-based product line design have employed a deterministic, first-choice model of idiosyncratic preferences. Accordingly, a consumer is assumed to choose from her/his choice set the product with maximum perceived utility with certainty. However, the first-choice rule seems to be too rigid an assumption for many product categories and individual choice situations, as the analyst often won't be in a position to control for all relevant variables influencing consumer behavior (e.g., situational factors). Therefore, in agreement with Chen and Hausman (2000), we incorporate a probabilistic choice rule to provide a more flexible representation of the consumer decision making process and start from segment-specific conjoint models of the conditional multinomial logit type. Favoring the multinomial logit (MNL) model doesn't imply rejection of the widespread max-utility rule, as the MNL includes the option of mimicking this first-choice rule. We further consider profit as a firm's economic criterion to evaluate decisions and introduce fixed and variable costs for each product profile. However, the proposed methodology is flexible enough to accommodate other goals like market share (as well as any other probabilistic choice rule).
This model flexibility is provided by the implemented Genetic Algorithm as the underlying solver for the resulting nonlinear integer programming problem. Genetic Algorithms merely use objective function information (in the present context on expected profits of feasible product line solutions) and are easily adjustable to different objectives without the need for major algorithmic modifications. To assess the performance of the GA methodology for the product line design problem, we employ sensitivity analysis and Monte Carlo simulation. Sensitivity analysis is carried out to study the performance of the Genetic Algorithm w.r.t. varying GA parameter values (population size, crossover probability, mutation rate) and to fine-tune these values in order to provide near-optimal solutions. Based on more than 1500 sensitivity runs applied to different problem sizes ranging from 12,650 to 10,586,800 feasible product line candidate solutions, we can recommend: (a) as expected, that a larger problem size be accompanied by a larger population size, with a minimum popsize of 130 for small problems and a minimum popsize of 250 for large problems, (b) a crossover probability of at least 0.9 and (c) an unexpectedly high mutation rate of 0.05 for small/medium-sized problems and a mutation rate in the order of 0.01 for large problem sizes. Following the results of the sensitivity analysis, we evaluated the GA performance for a large set of systematically varying market scenarios and associated problem sizes. We generated problems using a 4-factorial experimental design which varied the number of attributes, the number of levels in each attribute, the number of items to be introduced by a new seller and the number of competing firms other than the new seller. The results of the Monte Carlo study, in which a total of 276 data sets were analyzed, show that the GA is efficient both in providing near-optimal product line solutions and in CPU time.
In particular, (a) the worst-case performance ratio of the GA observed in a single run was 96.66%, indicating that the profit of the best product line solution found by the GA was never less than 96.66% of the profit of the optimal product line, (b) the hit ratio of identifying the optimal solution was 84.78% (234 out of 276 cases) and (c) it took at most 30 seconds for the GA to converge. The option of rerunning a Genetic Algorithm with (slightly) changed parameter settings and/or different initial populations (as opposed to many other heuristics) further improves the chances of finding the optimal solution.
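The one-step approach described in this abstract can be illustrated with a toy sketch: a Genetic Algorithm searches product lines directly, scoring each by expected profit under a segment-level multinomial logit choice rule. This is not the authors' implementation. The operators (tournament selection, one-point crossover, elitism), the `scale` parameter, the cost structure, and all data are assumptions made for illustration; only the population size, crossover probability, and mutation rate defaults follow the values recommended above for small problems.

```python
import itertools
import math
import random

def mnl_shares(utilities, scale=1.0):
    """Multinomial logit choice probabilities; a large scale approaches first choice."""
    top = max(utilities)
    weights = [math.exp(scale * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def utility(partworths, profile):
    """Sum the part-worths of the chosen level of each attribute."""
    return sum(partworths[a][lvl] for a, lvl in enumerate(profile))

def expected_profit(line, segments, competitors, margin, fixed_cost):
    """Expected profit of a product line under segment-level MNL choice."""
    own = sorted(set(line))                    # duplicate profiles count once
    offered = own + list(competitors)
    profit = -fixed_cost * len(own)
    for size, partworths in segments:
        shares = mnl_shares([utility(partworths, p) for p in offered])
        profit += sum(size * s * margin(p) for p, s in zip(own, shares))
    return profit

def run_ga(fitness, n_items, levels, pop_size=130, p_cross=0.9,
           p_mut=0.05, generations=60, seed=0):
    """Toy GA: one gene per attribute of each item in the line."""
    rng = random.Random(seed)
    k = len(levels)
    n_genes = n_items * k

    def decode(chrom):
        return tuple(tuple(chrom[i * k:(i + 1) * k]) for i in range(n_items))

    pop = [[rng.randrange(levels[g % k]) for g in range(n_genes)]
           for _ in range(pop_size)]
    best = max(pop, key=lambda c: fitness(decode(c)))
    for _ in range(generations):
        scored = [(fitness(decode(c)), c) for c in pop]

        def tournament():
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        nxt = [best[:]]                        # elitism: keep the best so far
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            child = p1[:]
            if rng.random() < p_cross:         # one-point crossover
                cut = rng.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            for g in range(n_genes):           # per-gene mutation
                if rng.random() < p_mut:
                    child[g] = rng.randrange(levels[g % k])
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda c: fitness(decode(c)))
    return decode(best), fitness(decode(best))

# Hypothetical market: two attributes with two levels each, two segments,
# one competing product, and a two-item line to be introduced.
levels = [2, 2]
segments = [(100.0, [[0.0, 1.0], [0.0, 0.5]]),
            (60.0, [[0.8, 0.0], [0.0, 1.2]])]
competitors = [(0, 0)]

def margin(profile):
    return 5.0 - 1.0 * profile[0] - 1.5 * profile[1]

def fitness(line):
    return expected_profit(line, segments, competitors, margin, fixed_cost=20.0)

best_line, best_profit = run_ga(fitness, n_items=2, levels=levels)

# The toy problem is small enough to enumerate, which gives the optimum
# needed for a performance ratio of the kind reported in the abstract.
profiles = list(itertools.product(*(range(n) for n in levels)))
optimum = max(fitness(line) for line in itertools.product(profiles, repeat=2))
print(best_line, round(best_profit / optimum, 4))
```

Because the GA only needs `fitness`, swapping the objective (for example, market share instead of profit) means replacing that one function, which is the flexibility the abstract attributes to the method.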

Research Papers in Economics

A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

Author: Adibi N
Ahmadzadeh MR
Barati E
Mohammadi A
Saraee MH
Publication venue: Cyber Journals
Publication date: 01/03/2011

Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. Methods used for this purpose include Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges which exist in mining skin data.

University of Salford Institutional Repository