
    Multiobjective Evolutionary Optimization of Type-2 Fuzzy Rule-Based Systems for Financial Data Classification

    Get PDF
    Classification techniques are becoming essential in the financial world for reducing risks and possible disasters. Managers are interested not only in high accuracy, but also in interpretability and transparency. It is now widely accepted that understanding how inputs and outputs are related to each other is crucial for making operational and strategic decisions. Furthermore, inputs are often affected by contextual factors and characterized by a high level of uncertainty. In addition, financial data are usually highly skewed toward the majority class. With the aim of achieving high accuracy, preserving interpretability, and managing uncertain and unbalanced data, this paper presents a novel method for financial data classification that adopts type-2 fuzzy rule-based classifiers (FRBCs) generated from data by a multiobjective evolutionary algorithm (MOEA). The classifiers employ an approach, denoted as scaled dominance, for defining rule weights in such a way as to help minority classes be correctly classified. In particular, we have extended PAES-RCS, an MOEA-based approach that learns the rule and data bases of FRBCs concurrently, to manage both interval type-2 fuzzy sets and unbalanced datasets. To the best of our knowledge, this is the first work that generates type-2 FRBCs by concurrently maximizing accuracy and minimizing the number of rules and the rule length, with the objective of producing interpretable models of real-world skewed and incomplete financial datasets. The rule bases are generated by exploiting a rule and condition selection (RCS) approach, which, during the evolutionary process, selects a reduced number of rules from a heuristically generated rule base and a reduced number of conditions for each selected rule. The weight associated with each rule is scaled, following the scaled dominance approach, by the fuzzy frequency of the output class, so as to give a higher weight to the minority class. As regards data base learning, the membership function parameters of the interval type-2 fuzzy sets used in the rules are learned concurrently with the application of RCS. Unbalanced datasets are managed by using selectivity and specificity, in addition to complexity, as objectives of the MOEA rather than only the classification rate. We tested our approach, named IT2-PAES-RCS, on 11 financial datasets and compared our results with the ones obtained by the original PAES-RCS with three objectives, with and without scaled dominance, by the fuzzy association rule-based classification model for high-dimensional datasets (FARC-HD) and the fuzzy unordered rule induction algorithm (FURIA), and by the classical C4.5 decision tree algorithm and its cost-sensitive version. Using nonparametric statistical tests, we show that IT2-PAES-RCS generates FRBCs with, on average, accuracy statistically comparable to, and complexity lower than, those generated by the two versions of the original PAES-RCS. Further, the FRBCs generated by FARC-HD and FURIA and the decision trees computed by C4.5 and its cost-sensitive version, despite their higher complexity, prove to be less accurate than the FRBCs generated by IT2-PAES-RCS. Finally, we highlight how easily interpretable these FRBCs are by showing and discussing one of them.
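    The scaled-dominance weighting can be illustrated with a minimal sketch: a rule's confidence is divided by the fuzzy frequency of its consequent class, so rules predicting the minority class are boosted. The concrete formula and all names below are assumptions for illustration, not the paper's exact definitions.

    ```python
    def scaled_dominance_weights(matching, labels, rule_classes, eps=1e-12):
        """Rule weights scaled by the fuzzy frequency of each rule's
        output class, boosting rules that predict minority classes.

        matching[r][i] : degree to which rule r fires on training sample i
        labels[i]      : class label of sample i
        rule_classes[r]: consequent class of rule r
        (Illustrative formula, not the paper's exact definition.)
        """
        grand_total = sum(sum(row) for row in matching) + eps
        weights = []
        for row, c in zip(matching, rule_classes):
            total = sum(row)  # total activation of the rule
            own = sum(m for m, y in zip(row, labels) if y == c)  # own-class activation
            # fuzzy frequency of class c over the whole training set
            freq = sum(m for r in matching
                       for m, y in zip(r, labels) if y == c) / grand_total
            weights.append(0.0 if total == 0 else (own / total) / (freq + eps))
        return weights
    ```

    With three majority-class samples and one minority-class sample, the rule covering the minority class receives the larger weight.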

    Dataset for multimodal fake news detection and verification tasks

    Get PDF
    The proliferation of online disinformation and fake news, particularly in the context of breaking news events, demands the development of effective detection mechanisms. While textual content remains the predominant medium for disseminating misleading information, the contribution of other modalities is increasingly emerging within online outlets and social media platforms. However, multimodal datasets, which incorporate diverse modalities such as text and images, are still uncommon, especially in low-resource languages. This study addresses this gap by releasing a dataset tailored for multimodal fake news detection in the Italian language, originally employed in a shared task on Italian. The dataset is divided into two subsets, each corresponding to a distinct sub-task. In sub-task 1, the goal is to assess the effectiveness of multimodal fake news detection systems. Sub-task 2 aims to delve into the interplay between text and images, specifically analyzing how these modalities mutually influence the interpretation of content when distinguishing between fake and real news. Both sub-tasks were framed as classification problems. The dataset consists of social media posts and news articles; after collection, it was labeled via crowdsourcing. Annotators were provided with external knowledge about the topic of each news item to be labeled, enhancing their ability to discriminate between fake and real news. The data subsets for sub-task 1 and sub-task 2 consist of 913 and 1350 items, respectively, encompassing newspaper articles and tweets.

    Delaying Inconsistency Resolution Using Fuzzy Logic

    No full text
    While developing complex systems, software engineers generally have to deal with various kinds of inconsistencies. Certain kinds of inconsistencies are inevitable, for instance when multiple people work independently of each other within the same project. Some inconsistencies are desirable when, for instance, alternative solutions exist for the same problem and these solutions have to be preserved to allow further refinements along the development process. Current software development methods do not provide adequate means to model such desired inconsistencies and therefore aim to resolve inconsistencies whenever they are detected. Although early resolution of inconsistencies reduces the complexity of the design by eliminating possible alternatives, it results in loss of information and excessive restriction of the design space. This paper aims to enhance current methods by modelling and controlling the desired inconsistencies through the application of fuzzy logic.

    Feature Selection based on a Modified Fuzzy C-means Algorithm with Supervision

    No full text
    In this paper we propose a new approach to feature selection based on a modified fuzzy C-means algorithm with supervision (MFCMS). MFCMS complements the unsupervised learning of classical fuzzy C-means with labeled patterns. The labeled patterns allow MFCMS to accurately model the shape of each cluster and consequently to highlight the features that prove particularly effective in characterizing a cluster. These features are distinguished by a low variance of their values over the patterns with a high membership degree to the cluster. If, with respect to these features, the distance between the prototype of the cluster and the prototypes of the other clusters is high, then these features have the property of discriminating between the cluster and the other clusters. To take these two aspects into account, for each cluster and each feature we introduce a purposely defined index: the higher the value of the index, the higher the discrimination capability of the feature for the cluster. We execute MFCMS on the training set, considering all patterns as labeled. Then, we retain the features which are associated, for at least one cluster, with an index larger than a threshold τ. We applied MFCMS to several real-world pattern classification benchmarks, using the well-known k-nearest neighbors as the learning algorithm. We show that the feature selection performed by MFCMS achieved an improvement in generalization on all datasets.
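    An index in this spirit can be sketched as follows: the concrete formula (prototype separation divided by membership-weighted standard deviation) and the function names are illustrative assumptions, not the paper's definition.

    ```python
    def discrimination_index(data, memberships, prototypes, cluster, feature, eps=1e-9):
        """High when the feature's values have low membership-weighted
        variance inside the cluster and the cluster's prototype lies far
        from the other prototypes along that feature (illustrative)."""
        u = [m[cluster] for m in memberships]   # membership degrees to the cluster
        x = [row[feature] for row in data]      # feature values
        mean = sum(ui * xi for ui, xi in zip(u, x)) / (sum(u) + eps)
        var = sum(ui * (xi - mean) ** 2 for ui, xi in zip(u, x)) / (sum(u) + eps)
        # distance from this cluster's prototype to the nearest other prototype
        sep = min(abs(prototypes[cluster][feature] - p[feature])
                  for j, p in enumerate(prototypes) if j != cluster)
        return sep / (var ** 0.5 + eps)

    def select_features(data, memberships, prototypes, tau):
        """Retain a feature if its index exceeds tau for at least one cluster."""
        n_feat, n_clu = len(data[0]), len(prototypes)
        return [f for f in range(n_feat)
                if any(discrimination_index(data, memberships, prototypes, c, f) > tau
                       for c in range(n_clu))]
    ```

    On a toy set where feature 0 separates two clusters and feature 1 is constant, only feature 0 survives the threshold.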

    Automating Software Development Process Using Fuzzy Logic

    No full text

    k-NN algorithm based on Neural Similarity

    No full text
    The aim of this paper is to present a k-nearest neighbour (k-NN) classifier based on a neural model of the similarity measure between data. After a preliminary phase of supervised learning for similarity determination, we use the neural similarity measure to guide the k-NN rule. Experiments on both synthetic and real-world data show that the similarity-based k-NN rule outperforms the Euclidean distance-based k-NN rule.
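    The k-NN rule itself is independent of how the similarity is obtained; a minimal sketch with a pluggable similarity callable (a trained neural model in the paper, any stand-in function here):

    ```python
    from collections import Counter

    def knn_predict(query, train_x, train_y, sim, k=3):
        """k-NN rule driven by an arbitrary similarity function `sim`:
        rank training samples by similarity to the query, take the k
        most similar, and return the majority class among them."""
        ranked = sorted(range(len(train_x)),
                        key=lambda i: sim(query, train_x[i]),
                        reverse=True)             # most similar first
        return Counter(train_y[i] for i in ranked[:k]).most_common(1)[0][0]

    # Stand-in for a learned similarity: negative squared Euclidean distance.
    euclid_sim = lambda a, b: -sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    ```

    Swapping `euclid_sim` for a trained network's output turns this into the similarity-based rule described above.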

    A Simple Algorithm for Data Compression in Wireless Sensor Networks

    No full text
    Power saving is a critical issue in wireless sensor networks (WSNs), since sensor nodes are powered by batteries that generally cannot be replaced or recharged. As radio communication is often the main cause of energy consumption, extension of sensor node lifetime is generally achieved by reducing data transmissions and receptions, for instance through data compression. Exploiting the natural correlation that exists in the data typically collected by WSNs and the principles of entropy compression, in this Letter we propose a simple and efficient data compression algorithm particularly suited to commercially available WSN nodes, where energy, memory, and computational resources are very limited. Experimental results are shown and discussed, including comparisons with, to the best of our knowledge, the only lossless compression algorithm previously proposed in the literature for embedding in sensor nodes, as well as with two well-known compression algorithms.
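    The general idea, exploiting temporal correlation so that an entropy coder sees mostly small values, can be sketched as a delta-encoding front end followed by a variable-length cost model. This is an illustrative pipeline under those assumptions, not the Letter's exact algorithm.

    ```python
    def delta_encode(samples):
        """Send the first reading, then only the differences between
        consecutive readings; correlated sensor data yields small deltas
        that an entropy coder can pack into very few bits."""
        return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

    def delta_decode(deltas):
        """Lossless inverse: rebuild readings by cumulative summation."""
        out = [deltas[0]]
        for d in deltas[1:]:
            out.append(out[-1] + d)
        return out

    def code_bits(value):
        """Rough cost of a signed value under a simple variable-length
        (Elias/Golomb-style) code: one sign bit plus magnitude bits."""
        return 1 + max(abs(value), 1).bit_length()
    ```

    Round-tripping a slowly varying series is lossless, and the deltas cost far fewer bits than the raw readings, which is what makes the scheme attractive on memory- and energy-constrained nodes.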