10 research outputs found

    Exploiting monotonicity constraints for active learning in ordinal classification

    No full text
    We consider ordinal classification and instance ranking problems where each attribute is known to have an increasing or decreasing relation with the class label or rank. For example, it stands to reason that the number of query terms occurring in a document has a positive influence on its relevance to the query. We aim to exploit such monotonicity constraints by using labeled attribute vectors to draw conclusions about the class labels of order-related unlabeled ones. Assuming we have a pool of unlabeled attribute vectors, and an oracle that can be queried for class labels, the central problem is to choose a query point whose label is expected to provide the most information. We evaluate different query strategies by comparing the number of inferred labels after some limited number of queries, as well as by comparing the prediction errors of models trained on the points whose labels have been determined so far. We present an efficient algorithm to determine the query point preferred by the well-known active learning strategy generalized binary search. This algorithm can be applied to binary classification on incomplete matrix orders. For non-binary classification, we propose to include attribute vectors in the training set whose class labels have not been uniquely determined yet. We perform experiments on artificial and real data.
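    The label-inference step described above can be sketched concretely. Under monotonicity, a labeled attribute vector bounds the labels of every pool point that dominates it or is dominated by it. The sketch below (hypothetical function names; it assumes all attributes are increasing and shows only this inference step, not the paper's generalized-binary-search query selection) computes the label bounds a pool point inherits from the labeled set; when the bounds coincide, the label is inferred without querying the oracle.

```python
def dominates(a, b):
    """True if a <= b componentwise, i.e. b dominates a
    (assuming every attribute has an increasing relation with the label)."""
    return all(ai <= bi for ai, bi in zip(a, b))

def infer_bounds(x, labeled, n_classes):
    """Label bounds for pool point x given labeled (vector, label) pairs.

    If some labeled v with label y is dominated by x, then label(x) >= y;
    if v dominates x, then label(x) <= y. Returns (lower, upper);
    equal bounds mean the label of x is fully inferred.
    """
    lo, hi = 0, n_classes - 1
    for v, y in labeled:
        if dominates(v, x):   # x dominates v: label can only be >= y
            lo = max(lo, y)
        if dominates(x, v):   # v dominates x: label can only be <= y
            hi = min(hi, y)
    return lo, hi
```

For example, with labeled points ((1, 2), 0) and ((3, 4), 2) and three classes, the point (4, 5) dominates both, so its label is pinned to 2 with no query needed, whereas (2, 3) only gains the trivial bounds (0, 2).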

    Exploiting Monotonicity Constraints in Active Learning for Ordinal Classification

    No full text
    We consider ordinal classification and instance ranking problems where each attribute is known to have an increasing or decreasing relation with the class label or rank. For example, it stands to reason that the number of query terms occurring in a document has a positive influence on its relevance to the query. We aim to exploit such monotonicity constraints by using labeled attribute vectors to draw conclusions about the class labels of order-related unlabeled ones. Assuming we have a pool of unlabeled attribute vectors, and an oracle that can be queried for class labels, the central problem is to choose a query point whose label is expected to provide the most information. We evaluate different query strategies by comparing the number of inferred labels after some limited number of queries, as well as by comparing the prediction errors of models trained on the points whose labels have been determined so far. We present an efficient algorithm to determine the query point preferred by the well-known active learning strategy generalized binary search. This algorithm can be applied to binary classification on incomplete matrix orders. For non-binary classification, we propose to include attribute vectors in the training set whose class labels have not been uniquely determined yet. We perform experiments on artificial and real data.

    A Quantitative Comparison of Semantic Web Page Segmentation Approaches

    No full text

    Real-Time Adaptive Residual Calculation for Detecting Trend Deviations in Systems with Natural Variability

    No full text
    Real-time detection of potential problems from animal production data is challenging, since these data do not just include chance fluctuations but reflect natural variability as well. This variability makes future observations from a specific instance of the production process hard to predict, even though a general trend may be known. Given the importance of well-established residuals for reliable detection of trend deviations, we present a new method for real-time residual calculation which aims at reducing the effects of natural variability and hence results in residuals reflecting chance fluctuations mostly. The basic idea is to exploit prior knowledge about the general expected data trend and to adapt this trend to the instance of the production process at hand as real data becomes available. We study the behavioural performance of our method by means of artificially generated and real-world data, and compare it against Bayesian linear regression.
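    The adapt-then-compare idea can be sketched minimally: the known general trend is shifted toward the instance at hand as observations arrive, and residuals are taken against the adapted trend rather than the generic one. Exponential smoothing of an instance-specific offset is a stand-in assumption here, not necessarily the paper's adaptation scheme; the function name and `alpha` are hypothetical.

```python
def adaptive_residuals(observations, expected_trend, alpha=0.2):
    """Residuals against a general trend adapted to this instance.

    The instance-specific offset is tracked by exponential smoothing of
    past deviations (a simplifying assumption, not the paper's exact
    method). alpha controls how fast the trend adapts; the residuals
    are meant to reflect chance fluctuation rather than the instance's
    systematic deviation from the general trend.
    """
    offset = 0.0
    residuals = []
    for obs, trend in zip(observations, expected_trend):
        adapted = trend + offset   # instance-adapted prediction
        r = obs - adapted          # residual after removing systematic shift
        residuals.append(r)
        offset += alpha * r        # adapt the offset in real time
    return residuals
```

For an instance that runs at a constant shift above the general trend, the first residual equals the full shift and subsequent residuals decay toward zero as the offset is absorbed into the adapted trend.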

    Efficient algorithms for finding optimal binary features in numeric and nominal labeled data

    No full text
    An important subproblem in supervised tasks such as decision tree induction and subgroup discovery is finding an interesting binary feature (such as a node split or a subgroup refinement) based on a numeric or nominal attribute, with respect to some discrete or continuous target variable. Often one is faced with a trade-off between the expressiveness of such features on the one hand and the ability to efficiently traverse the feature search space on the other hand. In this article, we present efficient algorithms to mine binary features that optimize a given convex quality measure. For numeric attributes, we propose an algorithm that finds an optimal interval, whereas for nominal attributes, we give an algorithm that finds an optimal value set. By restricting the search to features that lie on a convex hull in a coverage space, we can significantly reduce computation time. We present some general theoretical results on the cardinality of convex hulls in coverage spaces of arbitrary dimensions and perform a complexity analysis of our algorithms. In the important case of a binary target, we show that these algorithms have linear runtime in the number of examples. We further provide algorithms for additive quality measures, which have linear runtime regardless of the target type. Additive measures are particularly relevant to feature discovery in subgroup discovery. Our algorithms are shown to perform well through experimentation and furthermore provide additional expressive power leading to higher-quality results.
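    For the additive case, the linear-time claim is easy to see once examples are sorted by the numeric attribute: each example contributes a fixed amount to the quality, so the optimal interval is a maximum-sum run of consecutive sorted values, and the scan after sorting is linear. The sketch below uses Kadane's maximum-subarray scan to illustrate that additive case only; it does not implement the paper's convex-hull algorithm for general convex measures, and the function name is hypothetical.

```python
def optimal_interval(attr_values, contributions):
    """Optimal interval over a numeric attribute for an additive measure.

    attr_values[i] is example i's attribute value; contributions[i] is
    its (possibly negative) additive contribution to the quality. After
    sorting by attribute value, the best interval is the maximum-sum run
    of consecutive examples (Kadane's scan). Returns ((lo, hi), quality).
    """
    order = sorted(range(len(attr_values)), key=lambda i: attr_values[i])
    best_sum, best = float('-inf'), (None, None)
    cur_sum, start = 0.0, 0
    for pos, i in enumerate(order):
        if cur_sum <= 0:           # a negative prefix never helps: restart
            cur_sum, start = 0.0, pos
        cur_sum += contributions[i]
        if cur_sum > best_sum:
            best_sum = cur_sum
            best = (attr_values[order[start]], attr_values[order[pos]])
    return best, best_sum
```

For instance, with attribute values 1..5 and contributions [-1, 2, 3, -5, 4], the optimal interval is [2, 3] with quality 5: extending it either way would absorb a negative contribution larger than what the extension gains.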

    Real-time adaptive problem detection in poultry

    No full text
    Real-time identification of unexpected values upon monitoring the production parameters of egg laying hens is quite challenging, as the collected data includes natural variability in addition to chance fluctuation. We present an adaptive method for calculating residuals that reflect the latter type of fluctuation only, and thereby provide for more accurate detection of potential problems. We report on the application of our method to real-world poultry data.

    Exceptional Model Mining

    No full text
    Finding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is a classical Data Mining task. In traditional local pattern mining methods, such deviations are measured in terms of a relatively high occurrence (frequent itemset mining), or an unusual distribution for one designated target attribute (common use of subgroup discovery). These, however, do not encompass all forms of “interesting”. To capture a more general notion of interestingness in subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised local pattern mining framework, where several target attributes are selected, and a model over these targets is chosen to be the target concept. Then, we strive to find subgroups: subsets of the dataset that can be described by a few conditions on single attributes. Such subgroups are deemed interesting when the model over the targets on the subgroup is substantially different from the model on the whole dataset. For instance, we can find subgroups where two target attributes have an unusual correlation, a classifier has a deviating predictive performance, or a Bayesian network fitted on several target attributes has an exceptional structure. We give an algorithmic solution for the EMM framework, and analyze its computational complexity. We also discuss some illustrative applications of EMM instances, including using the Bayesian network model to identify meteorological conditions under which food chains are displaced, and using a regression model to find the subset of households in the Chinese province of Hunan that do not follow the general economic law of demand.
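    The correlation instance of EMM mentioned above can be made concrete: a subgroup's quality is how far the correlation between two target attributes inside the subgroup lies from the correlation on the whole dataset. A minimal sketch (hypothetical names; plain Pearson correlation, without the significance corrections a full EMM implementation would apply to guard against tiny subgroups):

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences
    (assumes both have nonzero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def emm_correlation_score(dataset, subgroup_mask, t1, t2):
    """EMM quality with a correlation model: absolute difference between
    the targets' correlation on the subgroup and on the whole dataset."""
    all1 = [row[t1] for row in dataset]
    all2 = [row[t2] for row in dataset]
    sub1 = [row[t1] for row, m in zip(dataset, subgroup_mask) if m]
    sub2 = [row[t2] for row, m in zip(dataset, subgroup_mask) if m]
    return abs(pearson(sub1, sub2) - pearson(all1, all2))
```

A search over candidate subgroup descriptions (conditions on single attributes) would rank subgroups by this score, surfacing those where the two targets behave unusually relative to the dataset as a whole.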
