7 research outputs found

    ADR-Miner: An Ant-based data reduction algorithm for classification

    Classification is a central problem in the fields of data mining and machine learning. Using a training set of labeled instances, the task is to build a model (classifier) that can be used to predict the class of new unlabeled instances. Data preparation is crucial to the data mining process, and its focus is to improve the fitness of the training data so that the learning algorithms produce more effective classifiers. Two widely applied data preparation methods are feature selection and instance selection, which fall under the umbrella of data reduction. For my research I propose ADR-Miner, a novel data reduction algorithm that utilizes ant colony optimization (ACO). ADR-Miner is designed to perform instance selection to improve the predictive effectiveness of the constructed classification models. Two versions of ADR-Miner are developed: a base version that uses a single classification algorithm during both training and testing, and an extended version that uses separate classification algorithms for each phase. The base version of the ADR-Miner algorithm is evaluated against 20 data sets using three classification algorithms, and the results are compared to a benchmark data reduction algorithm. The non-parametric Wilcoxon signed-ranks test is employed to gauge the statistical significance of the results obtained. The extended version of ADR-Miner is evaluated against 37 data sets using pairings from five classification algorithms, and these results are benchmarked against the performance of the same classification algorithms without reduction applied as pre-processing. Keywords: Ant Colony Optimization (ACO), Data Mining, Classification, Data Reduction
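The abstract above mentions the Wilcoxon signed-ranks test for comparing paired results across data sets. As a minimal sketch of how the test statistic is formed (not the thesis's own code), the function below ranks the absolute paired differences and sums the ranks by sign. The function name `wilcoxon_signed_rank` and the accuracy values are hypothetical and purely illustrative; a library routine would normally be used to obtain a p-value.

```python
def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus, n) for paired samples a and b.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")

    # Signed differences, ignoring pairs with no difference.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ordered = sorted(diffs, key=abs)

    # Assign 1-based ranks, averaging over runs of tied |difference|.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus, len(ordered)


# Illustrative (made-up) accuracies in whole percent on five data sets,
# with and without a data reduction step.
with_reduction = [85, 90, 78, 92, 88]
without_reduction = [80, 91, 75, 92, 84]
print(wilcoxon_signed_rank(with_reduction, without_reduction))  # (9.0, 1.0, 4)
```

The smaller of the two rank sums is then compared against a critical value for the given n. Integer inputs are used here so that ties in the absolute differences are detected exactly.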

    Automatic categorization of e-mail messages using social networks and ant colony algorithms (Automatyczna kategoryzacja wiadomości elektronicznych z zastosowaniem sieci społecznych oraz algorytmów mrowiskowych)

    The dissertation deals with methods that use Ant Colony Optimization algorithms and social networks to solve the problem of automatically categorizing e-mails into folders. The main aim of this work is to create an algorithm that improves the classification of e-mails into folders and can also suggest the creation of new folders. To meet these objectives, the Enron E-mail data set was thoroughly analyzed, cleaned up, adapted to the analyzed problem and transformed into an appropriate structure. Next, a social network was created based on the contacts between the senders and recipients of e-mail messages. Based on analysis and observation of this social network, groups of users with a similar social structure were identified. Mailboxes of users belonging to selected groups were transformed into decision tables. From these tables, a classifier was built using Ant Colony Optimization algorithms, which make it possible to search a larger space of solutions and find alternative solutions. The study used classic classifiers, ensembles of classifiers, and the Ant Colony Decision Tree and Ant Colony Decision Forest algorithms. Analysis of the results obtained contributed to the design of an original algorithm for automatically categorizing e-mails into folders. The proposed algorithm was also used to develop a mechanism for suggesting the creation of new folders, based on the folder structure of other users in a given group. All the aims of the dissertation were fully achieved. The theoretical analysis and the experiments carried out completely confirmed the thesis

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment and facility layout planning. This is partly because the classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, by contrast, guides the course of low-level heuristics to search beyond local optima, which are what impair the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC

    River flow monitoring: LS-PIV technique, an image-based method to assess discharge

    The measurement of river discharge within a natural or artificial channel is still one of the most challenging tasks for hydrologists and the scientific community. Although discharge is a physical quantity that theoretically can be measured with very high accuracy, since the volume of water flows in a well-defined domain, there are numerous critical issues in obtaining a reliable value. Discharge cannot be measured directly, so its value is obtained by coupling a measurement of a quantity related to the volume of flowing water with the area of a channel cross-section. Direct measurements of current velocity are made, traditionally with instruments such as current meters. Although measurements with current meters are sufficiently accurate, and although there are universally recognized standards for the application of such instruments, they are often unusable under specific flow conditions. In flood conditions, for example, personnel would need to enter the watercourse, so it is impossible to ensure adequate safety conditions for operators carrying out flow measurements. The critical issues arising from the use of current meters have been partially addressed thanks to technological development and the adoption of acoustic sensors. In particular, with the advent of Acoustic Doppler Current Profilers (ADCPs), flow measurements can take place without personnel having direct contact with the flow, performing measurements either from a bridge or from the banks. This made it possible to extend the available range of discharge measurements. However, the flood conditions of a watercourse also limit ADCP technology. Introducing the instrument into a current with high velocities and turbulence would put the instrument itself at serious risk, making it vulnerable and exposed to damage. In the most critical case, the instrument could be torn away by the turbulent current.
On the other hand, for smaller discharges, both current meters and ADCPs are technologically limited because water levels are too low for the devices to be used. The difficulty in obtaining information on the lowest and highest values of discharge has important implications for how to define the relationships linking flows to water levels. The stage-discharge relationship is one of the tools through which it is possible to monitor the flow in a specific section of a watercourse. Through this curve, a discharge value can be obtained from the known water stage. Curves are site-specific and must be continuously updated to account for changes in geometry that the sections for which they are defined may experience over time. They are determined by making simultaneous discharge and stage measurements. Since instruments such as current meters and ADCPs are traditionally used, stage-discharge curves suffer from the same instrumental limitations. Rating curves are therefore usually obtained by interpolating field-measured data and extrapolating them to the highest and lowest discharge values, with a consequent reduction in accuracy. This thesis aims to identify a valid alternative to traditional flow measurements and to show the advantages of using new monitoring methods to support traditional techniques, or to replace them. Optical techniques represent the best solution for overcoming the difficulties arising from the adoption of a traditional approach to flow measurement. Among these, the most widely used techniques are Large-Scale Particle Image Velocimetry (LS-PIV) and Large-Scale Particle Tracking Velocimetry. They are able to estimate surface velocity fields by processing images of a moving tracer, suitably dispersed on the liquid surface. By coupling velocity data obtained from optical techniques with the geometry of a cross-section, a discharge value can easily be calculated.
In this thesis, the LS-PIV technique was studied in depth, analysing its performance and the physical and environmental parameters and factors on which the optical results depend. As the LS-PIV technique is relatively new, there are no recognized standards available for its proper application. A preliminary numerical analysis was conducted to identify the factors on which the technique is significantly dependent. The results of these analyses enabled the development of specific guidelines through which the LS-PIV technique could subsequently be applied in the field during flow measurement campaigns in Sicily. In this way it was possible to observe experimentally the critical issues involved in applying the technique to real cases. These measurement campaigns provided the opportunity to carry out analyses on field case studies and to structure an automatic procedure for optimising the LS-PIV technique. In all case studies it was possible to observe that turbulence worsens the output results of the LS-PIV technique. A final numerical analysis was therefore performed to understand the influence of turbulence on the performance of the technique. The results obtained represent an important step for future development of the topic
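The abstract above notes that a discharge value follows from coupling surface velocities with the cross-section geometry. A minimal sketch of one common way to do this, the mid-section velocity-area method, is given below. It is not the thesis's procedure: the function name `discharge_mid_section`, the sample numbers, and the coefficient `alpha` that converts surface velocity to depth-averaged velocity (0.85 is a frequently used default, assumed here) are all illustrative assumptions.

```python
def discharge_mid_section(stations, depths, surface_velocities, alpha=0.85):
    """Estimate discharge (m^3/s) with the mid-section velocity-area method.

    stations           -- distances across the channel (m), increasing
    depths             -- water depth at each station (m)
    surface_velocities -- surface velocity at each station (m/s),
                          e.g. from an optical technique
    alpha              -- assumed ratio of depth-averaged to surface velocity
    """
    n = len(stations)
    if not (n == len(depths) == len(surface_velocities)):
        raise ValueError("all inputs must have the same length")
    if n < 2:
        raise ValueError("at least two stations are required")

    total = 0.0
    for i in range(n):
        # Each station represents the strip reaching halfway to its neighbours.
        left = stations[i - 1] if i > 0 else stations[i]
        right = stations[i + 1] if i < n - 1 else stations[i]
        width = (right - left) / 2.0
        total += alpha * surface_velocities[i] * depths[i] * width
    return total


# Illustrative (made-up) cross-section: 4 m wide, deepest in the middle.
stations = [0.0, 1.0, 2.0, 3.0, 4.0]
depths = [0.0, 1.0, 2.0, 1.0, 0.0]
velocities = [0.0, 1.0, 2.0, 1.0, 0.0]
print(discharge_mid_section(stations, depths, velocities, alpha=1.0))  # 6.0
```

With `alpha=1.0` the surface velocity is used unchanged. In practice the coefficient depends on the site and flow conditions, so it should be treated as a calibration parameter rather than a constant.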

    Mass Transfer in Multiphase Systems and its Applications

    This book covers a number of developing topics in mass transfer processes in multiphase systems for a variety of applications. The book blends theoretical, numerical, modeling and experimental aspects of mass transfer in multiphase systems that are encountered in many research areas, such as chemical, reactor, environmental and petroleum engineering. From biological and chemical reactors to the paper and wood industry and all the way to thin films, the 31 chapters of this book serve as an important reference for any researcher or engineer working in the field of mass transfer and related topics