
    Sequential Symbolic Regression with Genetic Programming

    This chapter describes the Sequential Symbolic Regression (SSR) method, a new strategy for function approximation in symbolic regression. SSR is inspired by the sequential covering strategy from machine learning, but instead of sequentially reducing the size of the problem being solved, it sequentially transforms the original problem into potentially simpler problems. This transformation is performed according to the semantic distances between the desired and obtained outputs and a geometric semantic operator. The rationale behind SSR is that, after a suboptimal function f has been generated via symbolic regression, its output errors can be approximated by another function in a subsequent iteration. The method was tested on eight polynomial functions and compared with canonical genetic programming (GP) and geometric semantic genetic programming (SGP). The results showed that SSR significantly outperforms SGP and shows no statistically significant difference from GP. More importantly, they show the potential of the proposed strategy: an effective way of applying geometric semantic operators to combine different (partial) solutions while avoiding the exponential growth problem that arises from the use of these operators.
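
The idea that a later iteration can model the errors left by an earlier one can be illustrated with a residual-fitting loop. The Python sketch below is a deliberately simplified stand-in, not SSR itself: instead of running GP and combining partial solutions with a geometric semantic operator, each stage fits a single polynomial term c*x**d by least squares to the current residuals, and the stage models are combined by plain summation. The names fit_single_term, sequential_residual_fit and predict are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_single_term(xs: Sequence[float], residuals: Sequence[float], degree: int) -> float:
    """Least-squares coefficient c minimising sum((r - c * x**degree) ** 2)."""
    basis = [x ** degree for x in xs]
    denom = sum(b * b for b in basis)
    if denom == 0.0:
        return 0.0  # basis is identically zero on these inputs; contribute nothing
    return sum(r * b for r, b in zip(residuals, basis)) / denom


def sequential_residual_fit(
    xs: Sequence[float], ys: Sequence[float], stages: int
) -> Tuple[List[float], List[float]]:
    """Fit one simple model per stage to the errors left by the previous stages.

    Returns the per-stage coefficients and the sum of squared errors after each
    stage (errors[0] is the error of the empty model, i.e. predicting 0).
    Stand-in for SSR: no GP and no geometric semantic operator are used here.
    """
    residuals = list(ys)
    coeffs: List[float] = []
    errors = [sum(r * r for r in residuals)]
    for degree in range(stages):
        c = fit_single_term(xs, residuals, degree)
        coeffs.append(c)
        # The next stage's target is whatever the combined model still gets wrong.
        residuals = [r - c * x ** degree for r, x in zip(residuals, xs)]
        errors.append(sum(r * r for r in residuals))
    return coeffs, errors


def predict(coeffs: Sequence[float], x: float) -> float:
    """Combined model: the sum of all stage models."""
    return sum(c * x ** d for d, c in enumerate(coeffs))
```

Because every stage may choose c = 0, the training error in this sketch can never increase from one stage to the next; that property belongs to the sketch and is not a claim about SSR.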

    Impact of UV radiation on the physical properties of polypropylene floating row covers

    In intensive horticulture, various forms of protected cultivation are used to grow seedlings and cultivate vegetables in all seasons. The simplest and cheapest form of protection is agrotextile, which can be laid directly over vegetable crops (row cover). Agrotextiles are nonwovens manufactured from textile fibres, usually of chemical origin. Textiles used as agrotextiles require suitable tensile strength and good permeability, with no significant deterioration under the influence of weather changes and UV radiation. The properties of agrotextiles depend on the fibres they are made of and on the type and conditions of production. The purpose of this study was to analyse the influence of simulated sunlight (xenon lamp) on the physical properties of a polypropylene (PP) nonwoven material used for the production of agrotextiles. The research showed that the properties of the row cover change under UV irradiation. Tensile, tearing and bursting properties deteriorate after irradiation, while air permeability and water vapour permeability increase slightly. These changes are a consequence of changes in the molecular and supermolecular structure of the fibres, which alter the fibre properties and, consequently, the properties of the nonwoven.
    Key words: Agrotextile, polypropylene, nonwovens, UV radiation, properties

    Randomized Reference Classifier with Gaussian Distribution and Soft Confusion Matrix Applied to the Improving Weak Classifiers

    In this paper, we address the issue of building the Randomized Reference Classifier (RRC) model using probability distributions other than the beta distribution. More precisely, we propose to build the RRC model using the truncated normal distribution. Heuristic procedures for determining the expected value and the variance of the truncated normal distribution are also proposed. The proposed approach is evaluated with a model based on the soft confusion matrix (SCM) in order to examine the consequences of applying the truncated normal distribution in the RRC model. The experimental evaluation is performed using four different base classifiers and seven quality measures. The results show that the proposed approach is comparable to the RRC model built using the beta distribution. Moreover, for some base classifiers, the truncated-normal-based SCM algorithm turned out to be better at discovering objects from minority classes. Comment: arXiv admin note: text overlap with arXiv:1901.0882
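
Building the RRC on a truncated normal distribution requires that distribution's moments. As a hedged sketch, the Python function below computes the standard closed-form mean and variance of a normal N(mu, sigma^2) truncated to [a, b], defaulting to [0, 1] on the assumption that classifier support values lie in that range. It does not reproduce the heuristic procedures proposed in the paper, and the function name is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def truncated_normal_moments(
    mu: float, sigma: float, a: float = 0.0, b: float = 1.0
) -> Tuple[float, float]:
    """Mean and variance of N(mu, sigma^2) truncated to [a, b].

    With alpha = (a - mu) / sigma, beta = (b - mu) / sigma and
    Z = Phi(beta) - Phi(alpha):
      mean = mu + sigma * (phi(alpha) - phi(beta)) / Z
      var  = sigma^2 * (1 + (alpha*phi(alpha) - beta*phi(beta)) / Z
                          - ((phi(alpha) - phi(beta)) / Z)^2)
    """
    if sigma <= 0.0 or a >= b:
        raise ValueError("need sigma > 0 and a < b")
    std = NormalDist()  # standard normal N(0, 1): pdf is phi, cdf is Phi
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = std.cdf(beta) - std.cdf(alpha)  # probability mass inside [a, b]
    if z <= 0.0:
        raise ValueError("[a, b] has (numerically) zero probability mass")
    pa, pb = std.pdf(alpha), std.pdf(beta)
    mean = mu + sigma * (pa - pb) / z
    variance = sigma ** 2 * (1.0 + (alpha * pa - beta * pb) / z - ((pa - pb) / z) ** 2)
    return mean, variance
```

Truncating always shrinks the variance below sigma^2, and for a very wide sigma the result approaches the uniform distribution on [a, b], which is a convenient sanity check.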

    Classification of time series by shapelet transformation

    Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A shapelet is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best k shapelets, which are used to produce a transformed dataset in which each of the k features represents the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce results equivalent to information gain in less time. Finally, we show that post-transform clustering of shapelets can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository and 12 that we provide ourselves.
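
The transform step can be sketched in a few lines of Python. This minimal version assumes the k shapelets have already been selected and measures the distance between a series and a shapelet as the smallest mean squared difference over all windows of the shapelet's length. It omits the z-normalisation of subsequences and any speed-up techniques, and the function names are hypothetical.

```python
from typing import List, Sequence


def subsequence_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest mean squared difference between the shapelet and any same-length window."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = float("inf")
    for start in range(len(series) - m + 1):
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)) / m
        best = min(best, d)
    return best


def shapelet_transform(
    dataset: Sequence[Sequence[float]], shapelets: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Map each series to a k-dimensional vector of distances, one per shapelet."""
    return [[subsequence_distance(s, sh) for sh in shapelets] for s in dataset]
```

Each output row is an ordinary fixed-length feature vector, which is why the transformed data can be passed to any standard classifier.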

    Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules

    Exploiting dependencies between labels is considered crucial for multi-label classification. Rules are able to expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations that must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically use properties such as anti-monotonicity or decomposability to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation metrics satisfy these properties and are therefore suitable for pruning the search space for multi-label heads. Comment: Preprint version. To appear in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2018. See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3074 for further information. arXiv admin note: text overlap with arXiv:1812.0005
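
The pruning idea can be illustrated with a level-wise (Apriori-style) search over candidate label heads. The Python sketch below uses support, the number of examples in which all head labels are relevant, which is anti-monotone: a head can be no more frequent than any of its subsets, so every superset of an infrequent head can be discarded without being evaluated. Support is a stand-in chosen for illustration; it is not one of the multi-label evaluation measures analysed in the paper, and the function name is hypothetical.

```python
from typing import FrozenSet, List, Set


def frequent_label_heads(label_sets: List[Set[str]], min_support: int) -> List[FrozenSet[str]]:
    """Level-wise search for label heads covered by at least `min_support` examples."""

    def support(head: FrozenSet[str]) -> int:
        # Number of examples in which every label of the head is relevant.
        return sum(1 for labels in label_sets if head <= labels)

    all_labels = sorted(set().union(*label_sets))
    level = [frozenset([lab]) for lab in all_labels if support(frozenset([lab])) >= min_support]
    frequent: List[FrozenSet[str]] = []
    while level:
        frequent.extend(level)
        current = set(level)
        candidates = set()
        for h1 in level:
            for h2 in level:
                union = h1 | h2
                # Extend by exactly one label; by anti-monotonicity a candidate can
                # only qualify if all of its one-smaller subsets already qualified.
                if len(union) == len(h1) + 1 and all(union - {x} in current for x in union):
                    candidates.add(union)
        level = [c for c in candidates if support(c) >= min_support]
    return frequent
```

With the data used in the check below, the head {a, b, c} is never counted because its subset {b, c} already fails the threshold.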

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data and, in addition, provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications and use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices of ontology engineering, is fully interoperable with many domain resources and is easy to extend.

    Combination of linear classifiers using score function -- analysis of possible combination strategies

    In this work, we address the issue of combining linear classifiers using their score functions. The value of the score function depends on the distance from the decision boundary. Two score functions were tested and four different combination strategies were investigated. In the experimental study, the proposed approach was applied to a heterogeneous ensemble and compared with two reference methods: majority voting and model averaging. The comparison was made in terms of seven different quality criteria. The results show that the strategies based on the simple average and the trimmed average are the best of the geometrical combination strategies.
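
As a minimal sketch of score-based combination, the Python code below assumes that the score of a linear classifier w·x + b is its signed Euclidean distance to the decision boundary, (w·x + b)/‖w‖, and combines the scores of an ensemble by either the simple mean or a trimmed mean before thresholding at zero. The paper's two score functions are not specified in the abstract, so this score definition, the binary 0/1 labelling and all names are assumptions.

```python
import math
from typing import Sequence, Tuple

# A linear model is (weights w, bias b); its decision rule is w·x + b >= 0.
LinearModel = Tuple[Sequence[float], float]


def signed_distance(model: LinearModel, x: Sequence[float]) -> float:
    """Assumed score: signed Euclidean distance from x to the hyperplane w·x + b = 0."""
    w, b = model
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def trimmed_mean(values: Sequence[float], trim: float) -> float:
    """Mean after dropping the `trim` fraction of values from each end."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def combine(models: Sequence[LinearModel], x: Sequence[float],
            strategy: str = "mean", trim: float = 0.2) -> int:
    """Combine the ensemble's scores and return class 1 if the result is >= 0, else 0."""
    scores = [signed_distance(m, x) for m in models]
    if strategy == "mean":
        combined = sum(scores) / len(scores)
    elif strategy == "trimmed":
        combined = trimmed_mean(scores, trim)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return 1 if combined >= 0.0 else 0
```

Trimming discards the most extreme scores, so a single badly placed boundary cannot dominate the combined decision, whereas it can under the plain mean.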

    Iso-osmotic regulation of nitrate accumulation in lettuce (Lactuca sativa L.)

    Concerns about possible health hazards arising from human consumption of lettuce and other edible vegetable crops with high concentrations of nitrate have generated demands for a greater understanding of the processes involved in its uptake and accumulation, in order to devise more sustainable strategies for its control. This paper evaluates a proposed iso-osmotic mechanism for the regulation of nitrate accumulation in lettuce (Lactuca sativa L.) heads. This mechanism assumes that changes in the concentrations of nitrate and all other endogenous osmotica (including anions, cations and neutral solutes) are continually adjusted in tandem to minimise differences in the osmotic potential of the shoot sap during growth, with these changes occurring independently of any variations in external water potential. The hypothesis was tested using data from six new experiments, carried out hydroponically in a glasshouse with a butterhead lettuce variety, each with a single unique treatment comprising a separate combination of light intensity, N source (nitrate with or without ammonium) and nitrate concentration. Repeat measurements of plant weights and estimates of all of the main soluble constituents (nitrate, potassium, calcium, magnesium, organic anions, chloride, phosphate, sulphate and soluble carbohydrates) in the shoot sap were made at intervals from about 2 weeks after transplanting until commercial maturity, and the data were used to calculate changes in average osmotic potential in the shoot. Results showed that nitrate concentrations in the sap increased when average light levels were reduced by between 30 and 49% and (to a lesser extent) when nitrate was supplied at a supra-optimal concentration, and declined with partial replacement of nitrate by ammonium in the external nutrient supply.
    The associated changes in the proportions of other endogenous osmotica, in combination with the adjustment of shoot water content, maintained the total solute concentrations in shoot sap approximately constant and minimised differences in osmotic potential between treatments at each sampling date. There was, however, a gradual increase in osmotic potential (i.e. a decline in total solute concentration) over time, largely caused by increases in shoot water content associated with the physiological and morphological development of the plants. Regression analysis using normalised data (to correct for these time trends) showed that the results were consistent with a 1:1 exchange between the concentrations of nitrate and the sum of all other endogenous osmotica throughout growth, providing evidence that an iso-osmotic mechanism (incorporating both concentration and volume regulation) was involved in controlling nitrate concentrations in the shoot.

    Improving the k-Nearest Neighbour Rule by an Evolutionary Voting Approach

    This work presents an evolutionary approach to modifying the voting system of the k-Nearest Neighbours (kNN) classifier. The main novelty of this article lies in optimizing the voting process regardless of each neighbour's distance. The real-valued vector calculated through the evolutionary process can be seen as the relative contribution of each neighbour to selecting the label of an unclassified example. We have tested our approach on 30 datasets from the UCI repository and compared the results with those obtained from six other variants of the kNN predictor, obtaining a realistic, statistically supported improvement.
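
A hedged Python sketch of rank-weighted kNN voting: the r-th nearest neighbour contributes a weight weights[r] to its class, whatever its actual distance, and the weight vector is tuned to maximise leave-one-out accuracy on the training set. The abstract does not describe the evolutionary algorithm, so a simple (1+1) evolution strategy with Gaussian mutation is used as a stand-in; the fitness function, mutation scale and all names are assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], Hashable]  # (feature vector, class label)


def nearest(train: Sequence[Example], x: Sequence[float], k: int,
            skip: Optional[int] = None) -> List[int]:
    """Indices of the k training points closest to x (Euclidean), optionally skipping one."""
    ranked = sorted((math.dist(p, x), i) for i, (p, _) in enumerate(train) if i != skip)
    return [i for _, i in ranked[:k]]


def vote(train: Sequence[Example], x: Sequence[float], weights: Sequence[float],
         skip: Optional[int] = None) -> Hashable:
    """Rank-weighted vote: the r-th nearest neighbour adds weights[r] to its class."""
    scores = defaultdict(float)
    for rank, i in enumerate(nearest(train, x, len(weights), skip)):
        scores[train[i][1]] += weights[rank]
    return max(scores, key=scores.get)


def loo_accuracy(train: Sequence[Example], weights: Sequence[float]) -> float:
    """Leave-one-out accuracy of the weighted vote on the training set (the fitness)."""
    hits = sum(vote(train, p, weights, skip=i) == y for i, (p, y) in enumerate(train))
    return hits / len(train)


def evolve_weights(train: Sequence[Example], k: int, generations: int = 50,
                   sigma: float = 0.3, seed: int = 0) -> Tuple[List[float], float]:
    """(1+1) evolution strategy: keep a mutated weight vector if it is not worse."""
    rng = random.Random(seed)
    best = [1.0] * k  # start from the plain, unweighted majority vote
    best_fit = loo_accuracy(train, best)
    for _ in range(generations):
        child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in best]
        fit = loo_accuracy(train, child)
        if fit >= best_fit:
            best, best_fit = child, fit
    return best, best_fit
```

Since a mutant replaces the parent only when it is at least as fit, the returned weights never score below the unweighted vote on the training set; generalisation to unseen data is not guaranteed by this sketch.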

    Evolving an optimal decision template for combining classifiers

    In this paper, we aim to develop an effective combining algorithm for ensemble learning systems. The Decision Template method, one of the most popular combining algorithms for ensemble systems, does not perform well on certain datasets, such as those with imbalanced data. Moreover, the point estimate obtained by averaging the outputs of the base classifiers in the Decision Template method is sometimes not a good representation, especially for skewed datasets. Here, we propose to search for an optimal decision template in the combining algorithm for a heterogeneous ensemble. To do this, we first generate the base classifiers by training the pre-selected learning algorithms on the given training set. The meta-data of the training set is then generated via cross-validation. Using the Artificial Bee Colony algorithm, we search for the optimal template that minimizes the empirical 0–1 loss function on the training set. The class label is assigned to an unlabeled sample based on the maximum similarity between the optimal decision template and the sample's meta-data. Experiments conducted on UCI datasets demonstrated the superiority of the proposed method over several benchmark algorithms.
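
For reference, the classic Decision Template baseline that the paper sets out to improve can be sketched as follows: each class template is the average of the decision profiles (meta-data) of the training samples of that class, and a sample is assigned to the class whose template is most similar to its profile. This Python sketch assumes flattened profiles and uses negative squared Euclidean distance as the similarity; it does not implement the Artificial Bee Colony search, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def fit_templates(profiles: Sequence[Sequence[float]],
                  labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Classic Decision Template: per-class average of the training decision profiles.

    A profile is assumed to be the concatenated class-probability outputs of all
    base classifiers for one sample (obtained, e.g., via cross-validation).
    """
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        sums[label] = [s + v for s, v in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def similarity(template: Sequence[float], profile: Sequence[float]) -> float:
    """Assumed similarity: negative squared Euclidean distance (larger = more similar)."""
    return -sum((t - p) ** 2 for t, p in zip(template, profile))


def predict(templates: Dict[Hashable, List[float]], profile: Sequence[float]) -> Hashable:
    """Assign the class whose template is most similar to the sample's profile."""
    return max(templates, key=lambda label: similarity(templates[label], profile))
```

The proposed method keeps the same assignment rule but replaces the averaged templates with ones found by searching for the template that minimizes the empirical 0–1 loss on the training set.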