939 research outputs found

    EHHR: an efficient evolutionary hyper-heuristic based recommender framework for short-text classifier selection

    With the variety of machine learning heuristics available, it becomes difficult to choose an appropriate heuristic to classify short text emerging from social media sources in the form of tweets and reviews. The No Free Lunch theorem asserts that no heuristic applies to all problems indiscriminately. Regardless of their success, the available classifier recommendation algorithms only deal with numeric data. To address these limitations, an umbrella classifier recommender must determine the best heuristic for short-text data. This paper presents an efficient reminisce-enabled classifier recommender framework to recommend a heuristic for new short-text data classification. The proposed framework, “Efficient Evolutionary Hyper-heuristic based Recommender Framework for Short-text Classifier Selection (EHHR),” reuses previous solutions to predict the performance of various heuristics on an unseen problem. The Hybrid Adaptive Genetic Algorithm (HAGA) in EHHR facilitates dataset-level feature optimization and performance prediction. HAGA reveals that the influential features for recommending the best short-text heuristic are average entropy, mean word-string length, adjective variation, verb variation II, and average hard examples. The experimental results show that HAGA is 80% more accurate than the standard Genetic Algorithm (GA). Additionally, EHHR clusters datasets and ranks heuristics cluster-wise, correctly clustering 9 out of 10 problems.
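Two of the influential meta-features named above, average entropy and mean word-string length, can be sketched for a toy short-text dataset. This is a minimal illustration only; the function names and the sample tweets are invented, not taken from the paper, and the paper's exact feature definitions may differ.

```python
from collections import Counter
from math import log2

def mean_word_length(texts):
    """Mean length of the word strings across a short-text dataset."""
    words = [w for t in texts for w in t.split()]
    return sum(len(w) for w in words) / len(words)

def average_entropy(texts):
    """Mean Shannon entropy (in bits) of the word distribution of each text."""
    entropies = []
    for t in texts:
        counts = Counter(t.split())
        n = sum(counts.values())
        entropies.append(-sum((c / n) * log2(c / n) for c in counts.values()))
    return sum(entropies) / len(entropies)

# Illustrative short-text dataset (e.g., tweets or reviews).
tweets = ["great phone battery", "battery dies fast", "great great screen"]
features = (mean_word_length(tweets), average_entropy(tweets))
```

Meta-features like these summarize a whole dataset in a few numbers, which is what lets a recommender compare a new problem against previously solved ones.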

    Sparse data embedding and prediction by tropical matrix factorization

    Background: Matrix factorization methods are linear models, with limited capability to model complex relations. In our work, we use the tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values in sparse data. Results: We evaluate the efficiency of the STMF method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that the STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes a normal distribution and tends toward the mean value, STMF can better fit extreme values and distributions. Conclusion: STMF is the first method to use the tropical semiring on sparse data. We show that in certain cases semirings are useful because they capture structure that is different from, and simpler to interpret than, that of standard linear algebra. This work is supported by the Slovene Research Agency, Young Researcher Grant (52096) awarded to AO, and research core funding (P1-0222 to PO and P2-0209 to TC).
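The non-linearity the abstract refers to comes from replacing the usual matrix product with its tropical (max-plus) counterpart. The sketch below shows only that product, not the STMF fitting procedure itself; the variable names and the rank-1 example are illustrative.

```python
def tropical_matmul(A, B):
    """Max-plus (tropical) matrix product:
    (A ⊗ B)[i][j] = max_k (A[i][k] + B[k][j]).
    Ordinary multiplication becomes +, ordinary summation becomes max,
    which is what makes a tropical factorization model non-linear."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# A rank-1 tropical factorization: factors U (2x1) and V (1x2)
# combine into an approximation of a 2x2 data matrix.
U = [[0.0], [2.0]]
V = [[1.0, 3.0]]
X = tropical_matmul(U, V)  # [[1.0, 3.0], [3.0, 5.0]]
```

Because every entry of X is a max over sums, the model tracks extreme values rather than averaging toward the mean, matching the contrast with NMF drawn above.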

    Adaptive ML-based technique for renewable energy system power forecasting in hybrid PV-Wind farms power conversion systems

    Large-scale integration of renewable energy systems with classical electrical power generation requires a precise balance to maintain and optimize supply–demand constraints in power-grid operations. For this purpose, accurate forecasting is needed from wind energy conversion systems (WECS) and solar power plants (SPPs). This daunting task is limited, for both long- and short-term precise forecasting, by the highly random nature of environmental conditions. This paper offers a hybrid variational decomposition model (HVDM) as a revolutionary composite deep learning-based evolutionary technique for accurate power production forecasting in microgrid farms. The objective is to obtain precise short-term forecasting in five development steps. An improvised dynamic group-based cooperative search (IDGC) mechanism with an IDGC-Radial Basis Function Neural Network (IDGC-RBFNN) is proposed for enhanced short-term power forecasting accuracy. For this purpose, meteorological time-series data are utilized, with SCADA data providing the measured values to the system. The improvisation is made to the metaheuristic algorithm, and an enhanced training mechanism is designed for the short-term wind forecasting (STWF) problem. The results are compared with two different neural network topologies and three heuristic algorithms: particle swarm optimization (PSO), IDGC, and dynamic group cooperation optimization (DGCO). A 24 h ahead horizon is studied in the experimental simulations, and the analysis uses seasonal behavior for year-round performance assessment. The prediction accuracy achieved by the proposed hybrid model is strong, and statistical comparison with existing works in the literature shows highly effective accuracy at a lower computational burden. Three seasonal results are compared graphically and statistically.

    PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES

    Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations. The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that addresses these privacy concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation. The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves accuracy comparable to traditional POI recommender systems that do not consider privacy, while providing significant privacy guarantees for users. The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems, and it offers a general privacy-preserving framework for different recommender systems.
    The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems.
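The differential-privacy perturbation step mentioned above is commonly realized with the Laplace mechanism. The sketch below is a minimal, generic illustration of that mechanism applied to a coordinate pair; the function name, the sensitivity value, and the sample location are illustrative assumptions, not the dissertation's actual parameters.

```python
import math
import random

def perturb_location(lat, lon, epsilon, sensitivity=0.01, rng=None):
    """Add Laplace(0, sensitivity/epsilon) noise to each coordinate:
    the classic Laplace mechanism for epsilon-differential privacy.
    Smaller epsilon means more noise and stronger privacy."""
    rng = rng or random.Random()
    def laplace():
        # Inverse-CDF sampling of a Laplace distribution.
        u = rng.random() - 0.5
        scale = sensitivity / epsilon
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return lat + laplace(), lon + laplace()

# Perturb a (hypothetical) user location before it reaches the recommender.
rng = random.Random(0)
noisy = perturb_location(40.7128, -74.0060, epsilon=1.0, rng=rng)
```

The recommender then only ever sees the noisy coordinates, which is what yields the accuracy/privacy trade-off the abstract describes.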

    From statistical- to machine learning-based network traffic prediction

    Nowadays, due to the exponential and continuous expansion of new paradigms such as the Internet of Things (IoT), the Internet of Vehicles (IoV), and 6G, the world is witnessing a tremendous and sharp increase in network traffic. In such large-scale, heterogeneous, and complex networks, the volume of transferred data, as big data, is a challenge causing various networking inefficiencies. To overcome these challenges, various techniques, collectively called Network Traffic Monitoring and Analysis (NTMA), are introduced to monitor network performance. Network Traffic Prediction (NTP) is a significant subfield of NTMA, focused mainly on predicting future network load and its behavior. NTP techniques can generally be realized in two ways: statistical and Machine Learning (ML)-based. In this paper, we provide a study of existing NTP techniques by reviewing, investigating, and classifying the recent relevant works conducted in this field. Additionally, we discuss the challenges and future directions of NTP, showing how ML and statistical techniques can be used to solve them.
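The two families of NTP techniques contrasted above can be illustrated with a minimal self-contained sketch: exponential smoothing as a statistical baseline, and a least-squares fit on lagged samples as a simple ML-style learner. The function names and the toy traffic series are illustrative, not from the survey.

```python
def exp_smoothing_forecast(series, alpha=0.5):
    """Statistical baseline: simple exponential smoothing, one-step forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def lag1_regression_forecast(series):
    """ML-style baseline: fit y_t = a * y_{t-1} + b by least squares,
    then predict one step ahead from the last observation."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a * series[-1] + b

# Toy hourly traffic volumes (arbitrary units).
traffic = [10.0, 12.0, 14.0, 16.0, 18.0]
stat_pred = exp_smoothing_forecast(traffic)
ml_pred = lag1_regression_forecast(traffic)
```

On a trending series the smoothing baseline lags behind while the fitted model extrapolates the trend, a small instance of the statistical-vs-ML trade-off the survey classifies.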

    A novel IoT intrusion detection framework using Decisive Red Fox optimization and descriptive back propagated radial basis function models.

    The Internet of Things (IoT) is extensively used in modern-day life, such as in smart homes, intelligent transportation, etc. However, the present security measures cannot fully protect the IoT due to its vulnerability to malicious assaults. As a security tool, intrusion detection can protect IoT devices from the most harmful attacks. Nevertheless, the time and detection efficiency of conventional intrusion detection methods remain insufficient. The main contribution of this paper is to develop a simple yet intelligent security framework for protecting the IoT from cyber-attacks. For this purpose, a combination of Decisive Red Fox (DRF) Optimization and Descriptive Back Propagated Radial Basis Function (DBRF) classification is developed in the proposed work. The novelty of this work is that a recently developed DRF optimization methodology is incorporated with a machine learning algorithm to maximize the security level of IoT systems. First, data preprocessing and normalization operations are performed to generate a balanced IoT dataset and improve the detection accuracy of classification. Then, the DRF optimization algorithm is applied to optimally tune the features required for accurate intrusion detection and classification. It also helps increase the training speed and reduce the error rate of the classifier. Moreover, the DBRF classification model is deployed to categorize normal and attacking data flows using the optimized features. The proposed DRF-DBRF security model's performance is validated and tested using five different, popular IoT benchmarking datasets. Finally, the results are compared with previous anomaly detection approaches using various evaluation parameters.

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has long aimed to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each set containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different set-comparison approaches to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we discovered that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer up to 80% and 90% similarity scores, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
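The Jaccard and cosine comparisons used above can be sketched as follows. For self-containment, cosine is computed here over raw term-frequency vectors, whereas the paper compares pre-trained embeddings (SpaCy, FastText); the example term sets are invented.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard index of two key-term sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cosine(a, b):
    """Cosine similarity of term-frequency vectors built from two term lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm

# Hypothetical key terms extracted from a system's code and its docs.
code_terms = ["parser", "token", "cache", "index"]
doc_terms = ["parser", "token", "overview", "index"]
sim = (jaccard(code_terms, doc_terms), cosine(code_terms, doc_terms))
```

A low score from either measure on a real system would signal the kind of code-documentation drift the paper sets out to detect.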

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning, among others. This is partly because the classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond local optimality, a capability that traditional computation methods lack. This topic series has collected quality papers proposing cutting-edge methodologies and innovative applications which drive the advances of AMC.
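As a toy illustration of that division of labor (the landscape, names, and restart points below are invented, not from the editorial): a greedy low-level heuristic stalls at a local optimum, while a metaheuristic layer that restarts it from several points searches beyond a single basin of attraction.

```python
# A 1-D landscape with a local optimum at x=1 and the global optimum at x=5.
landscape = [0, 3, 2, 1, 4, 9, 4, 1, 0]

def hill_climb(start):
    """Greedy low-level heuristic: move to a better neighbor until stuck."""
    x = start
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < len(landscape)]
        best = max(neighbors, key=lambda n: landscape[n])
        if landscape[best] <= landscape[x]:
            return x  # no better neighbor: a (possibly local) optimum
        x = best

def multi_start(starts=(0, 4, 8)):
    """Metaheuristic layer: run the heuristic from several starting points
    and keep the best result, escaping any single basin of attraction."""
    return max((hill_climb(s) for s in starts), key=lambda x: landscape[x])
```

Starting only from x=0, the greedy climber halts at the local optimum x=1; the multi-start wrapper reaches the global optimum at x=5.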

    Ensemble and continual federated learning for classification tasks

    Federated learning is the state-of-the-art paradigm for training a learning model collaboratively across multiple distributed devices while ensuring data privacy. Under this framework, different algorithms have been developed in recent years and have been successfully applied to real use cases. The vast majority of work in federated learning assumes static datasets and relies on the use of deep neural networks. However, in real-world problems it is common to have a continual data stream, which may be non-stationary, leading to phenomena such as concept drift. Besides, there are many multi-device applications where other, non-deep strategies are more suitable, due to their simplicity, explainability, or generalizability, among other reasons. In this paper we present Ensemble and Continual Federated Learning, a federated architecture based on ensemble techniques for solving continual classification tasks. We propose the global federated model to be an ensemble, consisting of several independent learners, which are locally trained. Thus, we enable a flexible aggregation of heterogeneous client models, which may differ in size, structure, or even algorithmic family. This ensemble-based approach, together with drift detection and adaptation mechanisms, also allows for continual adaptation in situations where data distribution changes over time. In order to test our proposal and illustrate how it works, we have evaluated it in different tasks related to human activity recognition using smartphones.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has received financial support from AEI/FEDER (European Union) Grant Number PID2020-119367RB-I00, as well as the Consellería de Cultura, Educación e Universidade of Galicia (accreditation ED431G-2019/04, ED431G2019/01, and ED431C2018/29), and the European Regional Development Fund (ERDF). It has also been supported by the Ministerio de Universidades of Spain in the FPU 2017 program (FPU17/04154).
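The ensemble-style aggregation described above can be sketched minimally as follows, assuming hypothetical client-model classes: the global model collects independently trained client models of different algorithmic families and aggregates their predictions by majority vote rather than by averaging weights. Drift detection and adaptation are omitted; all names are illustrative.

```python
from collections import Counter

class FederatedEnsemble:
    """Global federated model as an ensemble of heterogeneous clients.
    Any object exposing .predict(x) can join, regardless of its family."""
    def __init__(self):
        self.clients = []

    def add_client_model(self, model):
        # In a real deployment this model arrives locally trained from a device.
        self.clients.append(model)

    def predict(self, x):
        # Aggregate by majority vote over the clients' class predictions.
        votes = Counter(m.predict(x) for m in self.clients)
        return votes.most_common(1)[0][0]

# Trivial stand-ins for locally trained models of different families.
class ThresholdRule:
    def __init__(self, t): self.t = t
    def predict(self, x): return "walking" if x > self.t else "sitting"

class ConstantModel:
    def predict(self, x): return "sitting"

ens = FederatedEnsemble()
ens.add_client_model(ThresholdRule(0.5))
ens.add_client_model(ThresholdRule(0.7))
ens.add_client_model(ConstantModel())
```

Because aggregation happens at the prediction level, clients never need matching architectures, which is the flexibility the abstract highlights.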