Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval for which local search methods are worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in information retrieval (IR), to show how previous research in the field is piecemeal, bereft of structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide to the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which the IR practitioner can use to examine the use of local search in IR.
Updating, Upgrading, Refining, Calibration and Implementation of Trade-Off Analysis Methodology Developed for INDOT
As part of the ongoing evolution towards integrated highway asset management, the Indiana Department of Transportation (INDOT), through SPR studies in 2004 and 2010, sponsored research that developed an overall framework for asset management. This was intended to foster decision support for alternative investments across the program areas on the basis of a broad range of performance measures and against the background of the various alternative actions or spending amounts that could be applied to the several different asset types in the different program areas. The 2010 study also developed theoretical constructs for scaling and amalgamating the different performance measures, and for analyzing the different kinds of trade-offs. The research products from the present study include this technical report, which shows how the theoretical underpinnings of the methodology developed for INDOT in 2010 have been updated, upgraded, and refined. The report also includes a case study that shows how the trade-off analysis framework has been calibrated using available data. Supplemental to the report is Trade-IN Version 1.0, a set of flexible and easy-to-use spreadsheets that implement the trade-off framework. With this framework and current or future data, INDOT’s asset managers are placed in a better position to quantify and comprehend the relationships between budget levels and system-wide performance, the relationships between different pairs of conflicting or non-conflicting performance measures under a given budget limit, and the consequences, in terms of system-wide performance, of funding shifts across the management systems or program areas.
Genetic Programming + Unfolding Embryology in Automated Layout Planning
Automated layout planning aims to implement computational methods for generating and optimizing floor plans, considering the spatial configuration and the assignment of activities. Sophisticated strategies such as Genetic Algorithms have been implemented as heuristics for finding good solutions. However, the generative forces that derive from social structures have often been neglected. This research aims to illustrate that the data that encode the layout’s social and cultural generative forces can be implemented within an evolutionary system for the design of residential layouts. For that purpose a co-operative system was created, composed of a Genetic Programming (GP) algorithm and an agent-based unfolding embryology procedure that assigns activities to the spaces generated by the GP algorithm. The assignment of activities is a recursive process which follows instructions encoded as permeability graphs. Furthermore, the Ranking Sum Fitness evaluation method is proposed and applied to achieve multi-objective optimization. Its efficiency is tested against the Weighted-Sum Fitness function. The system’s results, both numerical and spatial, are compared to the results of a conventional evolutionary approach. This comparison showed that, in general, the proposed system can yield better solutions.
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in software defect predictions. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict for quality. The same pattern of improvement was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.
Comment: 10 pages + 2 references. Accepted to the International Conference on
Software Engineering (ICSE), 201
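For readers unfamiliar with the underlying technique, the following is a minimal sketch of plain SMOTE, which creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. It is not SMOTUNED and not the authors' code: the function name `smote` and the parameters `n_new`, `k` and `seed` are assumptions chosen for illustration. A self-tuning variant would presumably search over settings such as `k` and the number of synthetic samples, but the abstract does not say how.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from the minority-class points.

    Illustrative sketch of basic SMOTE; all parameter names are hypothetical.
    `minority` is a list of equal-length numeric tuples (at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Because each synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies.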
Automatic online algorithm selection for optimization in cyber-physical production systems
Shrinking product lifecycles, progressing market penetration of innovative product technologies, and increasing demand for product individualization lead to frequent adjustments of production processes and thus to an increasing demand for frequent optimization of production processes. Offline solutions are not always available, and even the optimization problem class itself may have changed in terms of the value landscape of the objective function: Parameters may have been added, the locations of optimal values and the values themselves may have changed. This thesis develops an automatic solution to the algorithm selection problem for continuous optimization. Furthermore, based on the evaluation of three different real-world use cases and a review of well-known architectures from the field of automation and cognitive science, a system architecture suitable for use in large data scenarios was developed. The developed architecture has been implemented and evaluated on two real-world problems: A Versatile Production System (VPS) and Injection Molding Optimization (IM). The developed solution for the VPS was able to automatically tune the feasible algorithms and select the most promising candidate, which significantly outperformed the competitors. This was evaluated by applying statistical tests based on the generated test instances using the process data and by performing benchmark experiments. This solution was extended to the area of multi-objective optimization for the IM use case by specifying an appropriate algorithm portfolio and selecting a suitable performance metric to automatically compare the algorithms. This allows the automatic optimization of three largely uncorrelated objectives: cycle time, average volume shrinkage, and maximum warpage of the parts to be produced. The extension to multi-objective handling for IM optimization showed a huge benefit in terms of manual implementation effort, as most of the work could be done by configuration. 
The implementation effort was reduced to selecting optimizers and hypervolume computation.
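The hypervolume indicator mentioned above measures the region of objective space dominated by a set of solutions and bounded by a reference point; a larger value indicates a better front. The sketch below covers only the two-objective minimisation case for illustration, whereas the thesis optimizes three objectives. The function name `hypervolume_2d` and the reference point are assumptions and are not taken from the thesis.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimised).

    Illustrative two-objective sketch; names are hypothetical.
    """
    # Keep only points that improve on the reference point in both objectives.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]
    for x, y in pts:  # ascending first objective
        if y < best_y:  # point is not dominated by an earlier one
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


front = [(1, 3), (2, 2), (3, 1)]
print(hypervolume_2d(front, ref=(4, 4)))  # 6.0
```

Comparing optimizers by a single scalar such as this is one way an algorithm portfolio can be ranked automatically without hand-written per-objective comparisons.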