1,425 research outputs found
High performance latent dirichlet allocation for text mining
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Latent Dirichlet Allocation (LDA), a total probability generative model, is a three-tier Bayesian model. LDA computes the latent topic structure of the data and extracts the significant information from documents. However, traditional LDA has several limitations in practical applications. Because it is an unsupervised learning model, LDA cannot be used directly for classification; it needs to be embedded into an appropriate classification algorithm. As a generative model, LDA normally also generates latent topics in categories to which the target documents do not belong, which introduces deviation into the computation and reduces classification accuracy. The number of topics in LDA greatly influences the learning of the model parameters. Noise samples in the training data also affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge. This thesis presents a text classification method that combines the LDA model with the Support Vector Machine (SVM) classification algorithm to improve classification accuracy while reducing the dimension of the datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise data reduction scheme to process noise data. When the noise ratio is large in the training data set, the noise reduction scheme can always produce a high level of accuracy in classification.
Finally, the thesis parallelizes LDA using the MapReduce model, which is the de facto computing standard for supporting data-intensive applications. A genetic-algorithm-based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster, where the computers have a variety of computing resources in terms of CPU speed, memory space and hard disk space.
A Micro-Genetic Algorithm Approach for Soft Constraint Satisfaction Problem in University Course Scheduling
A university course timetabling problem is a combination of optimization problems. The problems are more challenging when a set of events needs to be scheduled into time slots and allocated to suitable rooms, subject to several sets of hard and soft constraints. All these constraints, which exist as regulations on each resource used by an event, need to be fulfilled in order to achieve an optimal timetable. In addition, the design of course timetables for universities is a very difficult task because it is an NP-hard (non-deterministic polynomial-time hard) problem. This difficulty can be reduced by using a Micro Genetic Algorithm approach. In this approach, the chromosome representation is one of the key elements in ensuring that the number of infeasible individual chromosomes produced is minimized. Thus, this study proposes a chromosome encoding based on one-dimensional arrays to improve the Micro Genetic Algorithm approach to soft constraint problems in university course scheduling. The research contribution of this study is the development of effective and feasible timetabling software that uses the Micro Genetic Algorithm approach to minimize the production of infeasible individual chromosomes compared with existing optimization algorithms for university course timetabling, with UNITAR International University used as the data sample. The proposed Micro Genetic Algorithm has been tested against the Standard Genetic Algorithm and the Guided Search Genetic Algorithm as benchmarks. The results showed that the proposed algorithm is able to generate a minimum number of infeasible individual chromosomes. The results of the experiment also demonstrated that the Micro Genetic Algorithm is capable of producing the best course schedule for UNITAR International University.
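The abstract does not say how the one-dimensional chromosome is laid out. As a rough illustration only, the sketch below assumes one gene per event, where each gene packs a time slot and a room into a single integer. The function names `decode` and `room_clashes` and the packing rule are hypothetical and are not taken from the study.

```python
def decode(chromosome, n_rooms):
    """Turn each gene into a (time_slot, room) pair.

    Assumed packing (hypothetical): gene = time_slot * n_rooms + room.
    """
    return [divmod(gene, n_rooms) for gene in chromosome]


def room_clashes(chromosome):
    """Count events that reuse a (time_slot, room) pair already taken.

    A count of zero means this hard constraint is satisfied.
    """
    seen = set()
    clashes = 0
    for gene in chromosome:
        if gene in seen:
            clashes += 1
        seen.add(gene)
    return clashes


# Four events, two rooms: events 1 and 3 share slot 1, room 1.
chromosome = [0, 3, 5, 3]
assignments = decode(chromosome, n_rooms=2)
n_clashes = room_clashes(chromosome)
```

With a flat array like this, a room clash shows up as a repeated gene value, so an infeasible individual can be detected in a single pass.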
Random Keys Genetic Algorithms Scheduling and Rescheduling Systems for Common Production Systems
The majority of scheduling research deals with problems in specific production environments with specific objective functions. However, in many cases, more than one problem type and/or objective function exists, resulting in the need for a more generic and flexible system to generate schedules. Furthermore, most of the published scheduling research focuses on creating an optimal or near optimal initial schedule during the planning phase. However, after production processes start, circumstances like machine breakdowns, urgent jobs, and other unplanned events may render the schedule suboptimal, obsolete or even infeasible resulting in a rescheduling problem, which is typically also addressed for a specific production environment, constraints, and objective functions.
This dissertation introduces a generic framework consisting of models and algorithms based on Random Keys Genetic Algorithms (RKGA) to handle both the scheduling and rescheduling problems in the most common production environments and for various types of objective functions. The Scheduling system produces predictive (initial) schedules for environments including single machines, flow shops, job shops and parallel machine production systems to optimize regular objective functions such as the Makespan and the Total Tardiness as well as non-regular objective functions such as the Total Earliness and Tardiness.
To deal with the rescheduling problem, and using as a basis the same RKGA, a reactive Rescheduling system capable of repairing initial schedules after the occurrence of unexpected events is introduced. The reactive Rescheduling system was designed not only to optimize regular and non-regular objective functions but also to minimize the instability, a very important aspect in rescheduling to avoid shop chaos due to disruptions. Minimizing both schedule inefficiency and instability, however, turns the problem into a multi-objective optimization problem, which is even more difficult to solve.
The computational experiments for the predictive model show that it is able to produce optimal or near optimal schedules for benchmark problems in different production environments and for different objective functions. Additional computational experiments conducted to test the reactive Rescheduling system under two types of unexpected events, machine breakdowns and the arrival of a rush job, show that the proposed framework and algorithms are robust in handling various problem types and computationally reasonable.
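The abstract does not detail the encoding used in the dissertation. In the standard random-keys representation, a chromosome is a vector of real numbers, one per job, and sorting the jobs by key yields a job sequence. The sketch below applies that decoding to a single-machine Total Tardiness objective. The function names and the sample data are illustrative assumptions, not the dissertation's code.

```python
def decode_random_keys(keys):
    """Return the job order obtained by sorting job indices by their key."""
    return sorted(range(len(keys)), key=lambda job: keys[job])


def total_tardiness(sequence, proc_times, due_dates):
    """Total tardiness when one machine runs the jobs in the given order."""
    clock = 0
    tardiness = 0
    for job in sequence:
        clock += proc_times[job]  # completion time of this job
        tardiness += max(0, clock - due_dates[job])
    return tardiness


# Illustrative data: four jobs with made-up processing times and due dates.
keys = [0.46, 0.91, 0.33, 0.75]
sequence = decode_random_keys(keys)  # [2, 0, 3, 1]
cost = total_tardiness(sequence, proc_times=[3, 2, 4, 1], due_dates=[5, 6, 4, 9])
```

Because any vector of keys decodes to a valid permutation, crossover and mutation on the keys cannot produce an infeasible sequence, which is the usual motivation for this representation.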
Development of simulation-based genetic algorithms model for crew allocation in the precast industry
Lattice QCD Thermodynamics on the Grid
We describe how we have used simultaneously nodes of the
EGEE Grid, accumulating ca. 300 CPU-years in 2-3 months, to determine an
important property of Quantum Chromodynamics. We explain how Grid resources
were exploited efficiently and with ease, using a user-level overlay based on
the Ganga and DIANE tools on top of the standard Grid software stack. Application-specific
scheduling and resource selection based on simple but powerful heuristics
made it possible to improve the efficiency of the processing and to obtain the desired scientific
results by a specified deadline. This is also a demonstration of combined use
of supercomputers, to calculate the initial state of the QCD system, and Grids,
to perform the subsequent massively distributed simulations. The QCD simulation
was performed on a lattice. Keeping the strange quark mass at
its physical value, we reduced the masses of the up and down quarks until,
under an increase of temperature, the system underwent a second-order phase
transition to a quark-gluon plasma. Then we measured the response of this
system to an increase in the quark density. We find that the transition is
smoothened rather than sharpened. If confirmed on a finer lattice, this finding
makes it unlikely for ongoing experimental searches to find a QCD critical
point at small chemical potential
Energy aware hybrid flow shop scheduling
'Only if humanity acts quickly and resolutely can we limit global warming', conclude more than 25,000 academics with the statement of SCIENTISTS FOR FUTURE. The concern about global warming and the extinction of species has steadily increased in recent years.
Examination timetabling at the University of Cape Town: a tabu search approach to automation
With the rise of schedules and scheduling problems, solutions proposed in literature have expanded yet the disconnect between research and reality remains. The University of Cape Town's (UCT) Examinations Office currently produces their schedules manually with software relegated to error-checking status. While they have requested automation, this study is the first attempt to integrate optimisation techniques into the examination timetabling process. Tabu search and Nelder-Mead methodologies were tested on the UCT November 2014 examination timetabling data with tabu search proving to be more effective, capable of producing feasible solutions from randomised initial solutions. To make this research more accessible, a user-friendly app was developed which showcased the optimisation techniques in a more digestible format. The app includes data cleaning specific to UCT's data management system and was presented to the UCT Examinations Office where they expressed support for further development: in its current form, the app would be used as a secondary tool after an initial solution has been manually obtained
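The abstract reports that tabu search produced feasible solutions from randomised initial solutions but gives no implementation details. The following is a minimal sketch of a generic tabu search on a toy exam-to-slot assignment, assuming the only cost is the number of conflicting exam pairs placed in the same slot. The function names, the tenure value and the toy data are assumptions and do not reflect the UCT study or its app.

```python
import random


def count_clashes(assignment, conflicts):
    """Number of conflicting exam pairs that share a time slot."""
    return sum(1 for a, b in conflicts if assignment[a] == assignment[b])


def tabu_search(n_exams, n_slots, conflicts, iterations=200, tenure=5, seed=0):
    """Move one exam at a time; recently vacated slots are tabu for a while."""
    rng = random.Random(seed)
    current = [rng.randrange(n_slots) for _ in range(n_exams)]
    best, best_cost = current[:], count_clashes(current, conflicts)
    tabu = {}  # (exam, slot) -> first iteration at which the move is allowed again
    for it in range(iterations):
        if best_cost == 0:
            break
        move, move_cost = None, None
        for exam in range(n_exams):
            for slot in range(n_slots):
                if slot == current[exam]:
                    continue
                neighbour = current[:]
                neighbour[exam] = slot
                cost = count_clashes(neighbour, conflicts)
                is_tabu = tabu.get((exam, slot), 0) > it
                # Aspiration: a tabu move is accepted only if it beats the best so far.
                if is_tabu and cost >= best_cost:
                    continue
                if move_cost is None or cost < move_cost:
                    move, move_cost = (exam, slot), cost
        if move is None:
            break
        exam, slot = move
        tabu[(exam, current[exam])] = it + tenure  # forbid moving straight back
        current[exam] = slot
        if move_cost < best_cost:
            best, best_cost = current[:], move_cost
    return best, best_cost


# Toy instance: three mutually conflicting exams and three slots.
conflicts = [(0, 1), (1, 2), (0, 2)]
timetable, clashes = tabu_search(n_exams=3, n_slots=3, conflicts=conflicts)
```

The search always takes the best non-tabu neighbour, even when it is worse than the current solution, which is what lets it leave local optima that a plain descent would get stuck in.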