Meta-QSAR: a large-scale application of meta-learning to drug design and discovery.
We investigate the learning of quantitative structure activity relationships (QSARs) as a case-study of meta-learning. This application area is of the highest societal importance, as it is a key step in the development of new medicines. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small molecules) with associated bioactivities (e.g. inhibition of the target), learn a predictive mapping from molecular representation to activity. Although almost every type of machine learning method has been applied to QSAR learning, there is no agreed single best way of learning QSARs, and therefore the problem area is well-suited to meta-learning. We first carried out the most comprehensive ever comparison of machine learning methods for QSAR learning: 18 regression methods, 3 molecular representations, applied to more than 2700 QSAR problems. (These results have been made publicly available on OpenML and represent a valuable resource for testing novel meta-learning methods.) We then investigated the utility of algorithm selection for QSAR problems. We found that this meta-learning approach outperformed the best individual QSAR learning method (random forests using a molecular fingerprint representation) by up to 13%, on average. We conclude that meta-learning outperforms base-learning methods for QSAR learning, and as this investigation is one of the most extensive comparisons of base and meta-learning methods ever made, it provides evidence for the general effectiveness of meta-learning over base-learning.
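As a rough illustration of the algorithm-selection idea described in this abstract, the sketch below recommends a base learner for a new problem by looking up the most similar past problems. The abstract does not say which meta-learner or meta-features the authors used; the nearest-neighbour vote, the two meta-features (number of compounds, number of descriptors), the learner names, and every number in `META_DATASET` are hypothetical and chosen only for illustration.

```python
from math import dist

# Hypothetical meta-dataset: one entry per previously solved QSAR problem.
# Each entry pairs illustrative meta-features (n_compounds, n_descriptors)
# with the base learner assumed to have performed best on that problem.
META_DATASET = [
    ((50.0, 1024.0), "random_forest"),
    ((60.0, 1024.0), "random_forest"),
    ((2000.0, 40.0), "svm"),
    ((1800.0, 45.0), "svm"),
    ((300.0, 200.0), "ridge"),
]


def select_algorithm(meta_features, meta_dataset=META_DATASET, k=1):
    """Recommend a base learner by majority vote of the k nearest past problems."""
    # Rank past problems by Euclidean distance in meta-feature space.
    neighbours = sorted(meta_dataset, key=lambda entry: dist(entry[0], meta_features))[:k]
    # Count how often each learner was best among those neighbours.
    votes = {}
    for _, best_learner in neighbours:
        votes[best_learner] = votes.get(best_learner, 0) + 1
    return max(votes, key=votes.get)
```

Calling `select_algorithm((55.0, 1000.0))` looks up the closest past problem and returns its best learner. In practice the meta-features would usually be scaled first, since raw Euclidean distance lets the largest-valued feature dominate.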
Parametric study for optimizing fiber-reinforced concrete properties
Concrete with fiber reinforcement is stronger and more ductile than concrete without reinforcement. Significant efforts have been made to demonstrate the properties and enhancements of concrete after reinforcement with various types and shapes of fibers. However, the question of how to optimize the reinforcement process remains unanswered. No study currently available in the literature pinpoints the ideal fiber type, quantity, and shape or, more crucially, the overall technical viability of the reinforcement. The parametric analysis in this study determines the ideal shape, size, and proportion of fibers. The input and output parameters were separated from the optimization design variables. Input parameters included assessment of samples of fresh and mechanical concrete properties and the influence of type, length, and percentage of fiber on concrete performance. The aim was to establish the most efficient relationship between fiber dose and dimension to optimize the combined responses of workability and splitting tensile, flexural, and compressive strength. The mechanical and fresh properties of concrete reinforced with four different fibers, PFRC-1, PFRC-2, SFRC-1, and SFRC-2, were tested. The analysis showed that SFRC-2-20 mm-1%, with compressive, split tensile, flexural, and workability values of 44.7 MPa, 3.64 MPa, 5.3 MPa, and 6.5 cm respectively, was the most effective combination among the materials investigated. The optimization technique employed in this study offers new, important insights into how input and output parameters relate to one another.
A rule-based approach for recognition of chemical structure diagrams
In the chemical literature, much information is given in the form of diagrams depicting chemical structures. In order to access this information electronically, diagrams have to be recognized and translated into a processable format. Although a number of approaches have been proposed for the recognition of molecule diagrams in the literature, they traditionally employ procedural methods with limited flexibility and extensibility. This thesis presents a novel approach that models the principal recognition steps for molecule diagrams in a strictly rule-based system. We develop a framework that enables the definition of a set of rules for the recognition of different bond types and arrangements as well as for resolving possible ambiguities. This allows us to view the diagram recognition problem as a process of rewriting an initial set of geometric artefacts into a graph representation of a chemical diagram without the need to adhere to a rigid procedure. We demonstrate the flexibility of the approach by extending it to capture new bond types and compositions. In an experimental evaluation, we show that an implementation of our approach outperforms the currently available leading open source system. Finally, we discuss how our framework could be applied to other automatic diagram recognition tasks.
IntelliRehabDS (IRDS)—A Dataset of Physical Rehabilitation Movements
In this article, we present a dataset that comprises different physical rehabilitation movements. The dataset was captured as part of a research project intended to provide automatic feedback on the execution of rehabilitation exercises, even in the absence of a physiotherapist. A Kinect motion sensor camera was used to record gestures. The dataset contains repetitions of nine gestures performed by 29 subjects, of whom 15 were patients and 14 were healthy controls. The data are presented in an easily accessible format, provided as 3D coordinates of 25 body joints along with the corresponding depth map for each frame. Each movement was annotated with the gesture type, the position of the person performing the gesture (sitting or standing), and a correctness label. The data are publicly available and were released to provide a comprehensive dataset that can be used for assessing the performance of different patients while performing simple movements in a rehabilitation setting and for comparing these movements with a control group of healthy individuals.
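The record layout described in this abstract (per-frame 3D coordinates for 25 joints, plus gesture type, position, and correctness annotations) could be held in memory roughly as sketched below. This is not the dataset's actual file format or loader: the class name `Recording`, its field names, and the integer encoding of gesture types are hypothetical, and the depth maps are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_JOINTS = 25  # stated in the abstract: 25 body joints per frame

Joint = Tuple[float, float, float]  # (x, y, z) coordinates of one joint


@dataclass
class Recording:
    """One annotated movement; field names are illustrative assumptions."""
    gesture_id: int            # hypothetical integer code for one of the nine gestures
    position: str              # "sitting" or "standing"
    correct: bool              # correctness label
    frames: List[List[Joint]]  # one list of 25 joints per frame

    def __post_init__(self):
        # Reject annotations and frames that do not match the described layout.
        if self.position not in ("sitting", "standing"):
            raise ValueError(f"unexpected position: {self.position!r}")
        for index, frame in enumerate(self.frames):
            if len(frame) != NUM_JOINTS:
                raise ValueError(f"frame {index} has {len(frame)} joints, expected {NUM_JOINTS}")
```

Validating the joint count at construction time is one way to catch truncated or misparsed frames early, before they reach any downstream analysis.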