10 research outputs found

    Citation Count Prediction of Academic Papers (Bilimsel Makalelerin Atıf Sayısı Tahmini)

    Even though measuring the impact of scientific papers is not a straightforward process, their citation counts play a significant role in this determination. The citation count of a paper, however, is not available until the paper is published and a substantial amount of time passes while it spreads through the community. To overcome this issue, we relax the problem by building a deep learning model that predicts whether a paper will receive at least one citation within one year of its publication. Our model employs Long Short-Term Memory (LSTM) to capture the relationships between word sequences. In our study, we also analyze the effect of using the abstract versus the full text of papers on performance. We utilize publicly available datasets in our experiments: Kaggle for the full text of papers, and Microsoft Academic Graph for extracting the abstracts, metadata features, and first-year citation counts of papers. Our results show that using the full text leads to higher accuracy, yet with an enormous trade-off in training time. Additionally, paper abstracts are easier to access than full texts. Finally, our model predicts that this paper will receive at least one citation during its first year of publication.
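    The gating mechanism that lets an LSTM carry information across a word sequence can be sketched in a few lines. Below is a minimal, illustrative single-step LSTM cell in plain Python (scalar states, made-up weights), not the paper's actual model; the final sigmoid mirrors the binary "at least one citation" decision.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # One LSTM time step for scalar inputs/states (illustrative only).
    # w holds per-gate weights: (input weight, recurrent weight, bias).
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])   # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])   # input gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate cell
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])   # output gate
    c = f * c_prev + i * g   # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c

# Run a toy "word sequence" (already embedded as scalars) through the cell,
# then squash the final hidden state into a citation probability.
weights = {k: (0.5, 0.4, 0.1) for k in ("f", "i", "g", "o")}
h, c = 0.0, 0.0
for token in [0.2, -0.1, 0.7]:   # stand-in for embedded abstract words
    h, c = lstm_step(token, h, c, weights)
p_cited = sigmoid(h)             # P(at least one citation in year 1)
```

    In the real model the states and weights are vectors and matrices learned from data; the gating arithmetic is the same.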

    On initial population generation in feature subset selection

    The performance of evolutionary algorithms depends on many factors, such as population size, number of generations, and crossover or mutation probability. Generating the initial population is one of the important steps in evolutionary algorithms. A poor initial population may unnecessarily increase the number of searches, or it may cause the algorithm to converge to local optima. In this study, we aim to find a promising method for generating the initial population in the Feature Subset Selection (FSS) domain. FSS is not an expert system by itself, yet it constitutes a significant step in many expert systems. It eliminates redundancy in data, which decreases training time and improves solution quality. To achieve our goal, we compare a total of five initial population generation methods: Information Gain Ranking (IGR), a greedy approach, and three types of random approaches. We evaluate these methods using a specialized Teaching-Learning-Based Optimization search algorithm (MTLBO-MD) and three supervised learning classifiers: Logistic Regression, Support Vector Machines, and Extreme Learning Machine. In our experiments, we employ 12 publicly available datasets, mostly obtained from the well-known UCI Machine Learning Repository. According to their feature sizes and instance counts, we manually classify these datasets as small, medium, or large. Experimental results indicate that all tested methods achieve similar solutions on small datasets. For medium and large datasets, however, the IGR method provides a better starting point in terms of execution time and learning performance. Finally, when compared with other studies in the literature, the IGR method proves to be a viable option for initial population generation.
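    A sketch of the IGR idea (not the paper's exact implementation): rank features by information gain, then bias each candidate in the initial population toward including high-gain features. The toy dataset and the 0.5 + gain/2 selection probability below are illustrative assumptions.

```python
import math, random

def entropy(labels):
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(feature_col, labels):
    # IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v), for a discrete feature.
    base = entropy(labels)
    groups = {}
    for x, y in zip(feature_col, labels):
        groups.setdefault(x, []).append(y)
    cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - cond

def igr_population(X, y, pop_size, seed=0):
    # Rank features by information gain, then bias each initial candidate
    # (a 0/1 feature mask) toward including higher-ranked features.
    rng = random.Random(seed)
    gains = [info_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    total = sum(gains) or 1.0
    probs = [g / total for g in gains]
    return [[1 if rng.random() < 0.5 + probs[j] / 2 else 0
             for j in range(len(X[0]))] for _ in range(pop_size)]

# Toy data: feature 0 predicts the label perfectly, feature 1 is noise.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1]
pop = igr_population(X, y, pop_size=10)
```

    With this data, feature 0 has information gain 1 bit and feature 1 has 0 bits, so every candidate includes feature 0 while feature 1 stays a coin flip.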

    Boosting initial population in multiobjective feature selection with knowledge-based partitioning

    The quality of features is one of the main factors that affect classification performance. Feature selection aims to remove irrelevant and redundant features from data in order to increase classification accuracy. However, identifying these features is not a trivial task due to the large search space. Evolutionary algorithms have been proven effective in many optimization problems, including feature selection. These algorithms require an initial population to start their search mechanism, and a poor initial population may cause the search to get stuck in local optima. Diversifying the initial population is known to be an effective approach to overcome this issue; yet, it may not suffice, as the search space grows exponentially with increasing feature sizes. In this study, we propose an enhanced initial population strategy to boost the performance of the feature selection task. In our proposed method, we ensure the diversity of the initial population by partitioning the candidate solutions according to their number of selected features. In addition, we adjust the chance of each feature being selected into a candidate solution according to its information gain value, which enables wise selection of features from a vast search space. We conduct extensive experiments on many benchmark datasets retrieved from the UCI Machine Learning Repository. Moreover, we apply our algorithm to a real-world, large-scale dataset, the Stanford Sentiment Treebank. We observe significant improvements in comparisons with three off-the-shelf initialization strategies.
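    A minimal sketch of the partitioning idea, under stated assumptions (the cycling subset sizes and the example gain weights are illustrative, not the paper's exact scheme): each candidate is assigned a partition by its number of selected features, and features are drawn with probability proportional to a precomputed gain score.

```python
import random

def partitioned_population(n_features, pop_size, feature_weights, seed=0):
    # Partition candidates by their number of selected features so the
    # initial population spans small to large subsets, then pick each
    # candidate's features with probability proportional to its weight
    # (e.g., a precomputed information-gain score per feature).
    rng = random.Random(seed)
    population = []
    for i in range(pop_size):
        # Target subset size cycles through 1..n_features (the "partitions").
        k = 1 + i % n_features
        chosen = set()
        while len(chosen) < k:
            # Weighted draw without replacement.
            pick = rng.choices(range(n_features), weights=feature_weights)[0]
            chosen.add(pick)
        population.append([1 if j in chosen else 0 for j in range(n_features)])
    return population

weights = [0.9, 0.5, 0.4, 0.1, 0.05]   # assumed per-feature gain scores
pop = partitioned_population(5, 10, weights)
```

    The population thus covers every subset size while still favoring informative features within each partition.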

    On base station localization in wireless sensor networks

    Wireless sensor networks (WSNs) have been a prominent topic for the past decade. WSNs consist of multiple sensor nodes, which collect and convey data to the base station(s). Sensor nodes are expected to run on batteries, which makes energy a scarce resource for them. The energy expenditure of a sensor node mainly depends on data transmission, which is exponentially affected by transmission distance. Consequently, if sensor nodes forward their data to the base station directly, distant sensor nodes exhaust their batteries quickly. On the contrary, minimizing the transmission distance for each sensor node, i.e., having each node transmit its data to the closest sensor node on its path to the base station, quickly depletes the energy of the sensor nodes closest to the base station. As a result, the flow balance in the network must be optimized. In this study, we investigate the effect of optimizing the base station location along with the flow balance. For this purpose, we compare five localization methods on different topologies: three statically located linear programming approaches, a dynamically located nonlinear programming approach, and a heuristic-based hybrid approach. Experimental results indicate that a lifetime improvement of up to 42% is possible in selected scenarios.
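    The effect of base station placement on lifetime can be illustrated with a toy model. The sketch below assumes energy ∝ distance², direct transmission, and a brute-force grid search; none of this is the paper's LP/NLP formulation. It picks the candidate location that minimizes the worst per-node transmission cost, since the costliest node dies first.

```python
def network_lifetime_cost(base, nodes, alpha=2):
    # With direct transmission, each node's per-message energy grows with
    # distance^alpha to the base station; the node with the highest cost
    # dies first, so the max over nodes is a lifetime proxy.
    return max(((x - base[0]) ** 2 + (y - base[1]) ** 2) ** (alpha / 2)
               for x, y in nodes)

def best_base_station(nodes, grid=20):
    # Brute-force grid search over candidate base station locations
    # inside the bounding box of the sensor field.
    xs = [x for x, _ in nodes]
    ys = [y for _, y in nodes]
    best, best_cost = None, float("inf")
    for i in range(grid + 1):
        for j in range(grid + 1):
            cand = (min(xs) + (max(xs) - min(xs)) * i / grid,
                    min(ys) + (max(ys) - min(ys)) * j / grid)
            cost = network_lifetime_cost(cand, nodes)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost

nodes = [(0, 0), (4, 0), (0, 4), (4, 4)]   # a square sensor field
loc, cost = best_base_station(nodes)
```

    For the square field the search settles on the center, as expected; the actual study additionally optimizes per-link flow, not just placement.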

    A Closer Look at Pure-Text Human-Interaction Proofs

    Human-interaction proofs (HIPs) are used to mitigate automated attacks. Security and usability have always been critical problems for HIPs, especially when "accessibility" is a system requirement. Pure-text HIPs are more favorable from the usability perspective, but they are not secure. Audio HIPs usually cannot reliably distinguish attacks from legitimate use; they are either easy, and can be solved automatically, or hard, even for humans. In this study, we first compare the usability of a currently used pure-text HIP service, textCAPTCHA, against Google's reCAPTCHA. After analyzing the results, we propose a new HIP system (SMARTCHA), in which we generate around 21,000 HIP tests using human computation. We conduct a user study among 31 visually impaired users to compare SMARTCHA against the latest version of audio reCAPTCHA HIPs. The study results show that SMARTCHA takes less time and is more enjoyable to solve, which suggests that pure-text HIPs could be a promising solution for secure, usable, and accessible HIPs.

    Leveraging human computation for pure-text Human Interaction Proofs

    Even though purely text-based Human Interaction Proofs (HIPs) have desirable usability and accessibility attributes, they have not yet overcome their security problems. Given that fully automated techniques for securely generating pure-text HIPs do not exist, we propose leveraging human computation for this purpose. We design and implement a system called SMARTCHA, which comprises a security engine that performs automated proactive checks on the security of human-generated HIPs and a module that combines human computation with automation to increase the number of HIP questions. In our work, we employ HIP operators who generate around 22,000 questions in total for the SMARTCHA system. With a user study of 372 participants, we evaluate the usability of the SMARTCHA system and observe that users find solving its pure-text HIPs significantly more enjoyable than solving reCAPTCHA visual HIPs. (C) 2016 Elsevier Ltd. All rights reserved.

    A Comprehensive Survey on Recent Metaheuristics for Feature Selection

    Feature selection has become an indispensable machine learning process for data preprocessing due to the ever-increasing size of real-world data. Many solution methods have been proposed for feature selection since the 1970s. For the last two decades, we have witnessed the superiority of metaheuristic feature selection algorithms, and tens of new ones are proposed every year. This survey focuses on the most outstanding recent metaheuristic feature selection algorithms of the last two decades in terms of their exploration/exploitation operators, selection methods, transfer functions, fitness value evaluation, and parameter setting techniques. Current challenges of metaheuristic feature selection algorithms and possible future research topics are also examined and brought to the attention of researchers.
    Keywords: Feature selection, Survey, Metaheuristic algorithms, Machine learning, Classification
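    One of the surveyed components, the transfer function, can be illustrated briefly. The sketch below shows a common S-shaped (sigmoid) transfer function that turns a continuous metaheuristic position into a probabilistic 0/1 feature mask; the example values are illustrative.

```python
import math, random

def s_shaped_transfer(velocity):
    # S-shaped transfer function: maps a continuous metaheuristic position
    # or velocity into a probability of setting the feature bit to 1.
    return 1.0 / (1.0 + math.exp(-velocity))

def binarize(positions, rng):
    # Turn a continuous candidate solution into a 0/1 feature mask by
    # sampling each bit with its transfer-function probability.
    return [1 if rng.random() < s_shaped_transfer(v) else 0 for v in positions]

rng = random.Random(42)
mask = binarize([-6.0, 0.0, 6.0], rng)
```

    Strongly negative positions almost always yield 0, strongly positive ones almost always yield 1, and positions near zero stay undecided; V-shaped and other transfer families surveyed in the paper trade off this exploration/exploitation balance differently.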

    Predicting the severity of COVID-19 patients using a multi-threaded evolutionary feature selection algorithm

    The COVID-19 pandemic has had huge effects on the global community and placed an extreme burden on health systems, with more than 185 million confirmed cases and 4 million deaths as of July 2021. The exponential rise in COVID-19 cases also requires quick prediction of patient severity for better treatment. In this study, we propose a Multi-threaded Genetic feature selection algorithm combined with Extreme Learning Machines (MG-ELM) to predict the severity level of COVID-19 patients. We conduct a set of experiments on a recently published real-world dataset. We preprocess the dataset via feature construction to improve the learning performance of the algorithm. Based on comprehensive experiments, we report the most impactful features and symptoms for predicting patient severity. Moreover, we investigate the effects of the multi-threaded implementation with statistical analysis. To verify the efficiency of MG-ELM, we compare our results with traditional and state-of-the-art techniques. The proposed algorithm outperforms the other algorithms in terms of prediction accuracy.
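    The ELM component can be sketched compactly: a random, untrained hidden layer plus a closed-form least-squares solve for the output weights. The toy "severity" data below is purely illustrative, and this is not the paper's MG-ELM implementation (no genetic feature selection or multi-threading).

```python
import numpy as np

def elm_train(X, y, n_hidden=50, seed=0):
    # Extreme Learning Machine: a random, untrained hidden layer followed
    # by a least-squares solve (pseudoinverse) for the output weights.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)        # random hidden-layer features
    beta = np.linalg.pinv(H) @ y  # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    # Threshold the real-valued output into a binary severity label.
    return (np.tanh(X @ W + b) @ beta > 0.5).astype(int)

# Toy data: label 1 ("severe") when the two feature values sum above 1.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
y = (X.sum(axis=1) > 1).astype(float)
W, b, beta = elm_train(X, y)
acc = (elm_predict(X, W, b, beta) == y).mean()
```

    Because the hidden layer is never trained, fitting reduces to one linear solve, which is what makes ELMs cheap enough to sit inside a genetic search loop.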

    Novel multiobjective TLBO algorithms for the feature subset selection problem

    Teaching-Learning-Based Optimization (TLBO) is a new metaheuristic that has been successfully applied to several intractable optimization problems in recent years. In this study, we propose a set of novel multiobjective TLBO algorithms, combined with supervised machine learning techniques, for the solution of Feature Subset Selection in Binary Classification Problems (FSS-BCP). Selecting the minimum number of features without compromising the accuracy of the results in FSS-BCP is a multiobjective optimization problem. We propose TLBO as an FSS mechanism and utilize its algorithm-specific parameterless concept, which does not require any parameters to be tuned during optimization. Most classical metaheuristics, such as Genetic Algorithms and Particle Swarm Optimization, need additional effort for tuning their parameters (crossover ratio, mutation ratio, particle velocity, inertia weight, etc.), which may adversely influence their performance. Comprehensive experiments are carried out on well-known datasets from the UCI Machine Learning Repository, and significant improvements are observed when the proposed multiobjective TLBO algorithms are compared with state-of-the-art NSGA-II, Particle Swarm Optimization, Tabu Search, Greedy Search, and Scatter Search algorithms.
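    The multiobjective trade-off here (fewer features vs. lower error) reduces to Pareto dominance. A minimal sketch, with made-up candidate subsets, of how the nondominated front is extracted:

```python
def dominates(a, b):
    # a and b are (num_selected_features, classification_error); both
    # objectives are minimized. a dominates b if it is no worse in both
    # objectives and strictly better in at least one.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    # Keep the nondominated feature subsets, as a multiobjective TLBO
    # (or NSGA-II) would report them.
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Hypothetical candidates: (number of features used, error rate).
candidates = [(2, 0.10), (3, 0.08), (5, 0.08), (2, 0.15), (8, 0.05)]
front = pareto_front(candidates)
```

    Here (5, 0.08) is dominated by (3, 0.08) and (2, 0.15) by (2, 0.10), so only three trade-off points survive; the searcher's job is to fill this front with good subsets.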

    Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques

    This study investigates the success of a multiobjective genetic algorithm (GA) combined with state-of-the-art machine learning (ML) techniques for feature subset selection (FSS) in binary classification problems (BCP). Recent studies have focused on improving the accuracy of BCP by including all of the features, neglecting to determine the best-performing subset of features. However, for some problems the number of features may reach thousands, causing too much computation power to be consumed during the feature evaluation and classification phases, and possibly reducing the accuracy of the results. Therefore, selecting the minimum number of features while preserving or increasing accuracy becomes an important issue for achieving fast and accurate binary classification. Our multiobjective evolutionary algorithm includes two phases: FSS using a GA, and applying ML techniques for the BCP. Since exhaustively investigating all feature subsets is intractable, a GA is preferred for the first phase to intelligently detect the most appropriate feature subset. The GA uses multiobjective crossover and mutation operators to improve a population of individuals (each representing a selected feature subset) and obtain (near-)optimal solutions through generations. In the second phase, the fitness of the selected subset is decided using state-of-the-art ML techniques: Logistic Regression, Support Vector Machines, Extreme Learning Machine, K-means, and Affinity Propagation. The performance of the multiobjective evolutionary algorithm (and the ML techniques) is evaluated with comprehensive experiments and compared with state-of-the-art algorithms: Greedy Search, Particle Swarm Optimization, Tabu Search, and Scatter Search. The proposed algorithm proved to be robust and performed better than the existing methods on most of the datasets.
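    The GA operators on feature masks can be sketched directly; the uniform crossover and bit-flip mutation below are generic illustrations (the paper's specific multiobjective operators may differ):

```python
import random

def uniform_crossover(p1, p2, rng):
    # Each child bit is taken from one of the two parent feature masks.
    return [rng.choice((a, b)) for a, b in zip(p1, p2)]

def bit_flip_mutation(mask, rate, rng):
    # Flip each bit (include/exclude a feature) with a small probability.
    return [1 - bit if rng.random() < rate else bit for bit in mask]

rng = random.Random(7)
parent1 = [1, 0, 1, 1, 0, 0, 1, 0]   # each bit = feature selected or not
parent2 = [0, 0, 1, 0, 1, 1, 0, 0]
child = bit_flip_mutation(uniform_crossover(parent1, parent2, rng), 0.05, rng)
```

    The child is then scored by training one of the ML classifiers on its selected features, and the population evolves toward subsets that are both small and accurate.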