    Proceedings of the Third Computing Women Congress (CWC 2008): Student papers

    The Third Computing Women Congress was held at the University of Waikato, Hamilton, New Zealand from February 11th to 13th, 2008. The Computing Women Congress (CWC) is a Summer University for women in Computer Science: a meeting-place for female students, academics and professionals who study or work in Information Technology. CWC provides a forum to learn about and share the latest ideas on computing-related topics in a supportive environment, and an open, explorative learning and teaching environment. Experimentation with new styles of learning is encouraged, with an emphasis on hands-on experience and engaging participatory techniques.

    Machine learning algorithms for the prediction of conception success to a given insemination in lactating dairy cows

    The ability to accurately predict the conception outcome of a future mating would be of considerable benefit to producers in deciding what mating plan (i.e., expensive semen or less expensive semen) to implement for a given cow. The objective of the present study was to use herd- and cow-level factors to predict the likelihood of conception success to a given insemination (i.e., conception outcome not including embryo loss); of particular interest was the usefulness of milk mid-infrared (MIR) spectral data in augmenting the accuracy of the prediction model. A total of 4,341 insemination records with conception outcome information from 2,874 lactations on 1,789 cows from 7 research herds for the years 2009 to 2014 were available. The data set was separated into a calibration data set and a validation data set using either of 2 approaches: (1) the calibration data set contained records from all 7 farms for the years 2009 to 2011, inclusive, and the validation data set included data from the 7 farms for the years 2012 to 2014, inclusive, or (2) the calibration data set contained records from 5 farms for all 6 yr and the validation data set contained information from the other 2 farms for all 6 yr. Prediction models were developed with 8 different machine learning algorithms in the calibration data set using standard 10-times 10-fold cross-validation, and were also evaluated on the validation data set. The area under the receiver operating characteristic curve (AUC) varied from 0.487 to 0.675 across the different algorithms and scenarios investigated. Logistic regression was generally the best-performing algorithm. The AUC was generally inferior for the external validation data sets compared with the calibration data sets. The inclusion of milk MIR data in the prediction model generally did not improve the accuracy of prediction. Despite the fair AUC for predicting conception outcome under the different scenarios investigated, the model provided a reasonable prediction of the likelihood of conception success when high-predicted-probability instances were considered: a conception rate of 85% was evident in the top 10% of inseminations ranked on predicted probability of conception success in the validation data set.
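
    To make the validation strategy concrete, the sketch below mimics approach (1), the split by year, using scikit-learn. It is a minimal illustration rather than the study's model: the file name, feature columns and outcome column are hypothetical stand-ins for the herd- and cow-level predictors described above.

        # Hedged sketch of approach (1): calibrate on 2009-2011, validate on 2012-2014.
        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        df = pd.read_csv("inseminations.csv")          # hypothetical file
        features = ["parity", "days_in_milk", "bcs"]   # stand-ins for the real predictors

        calib = df[df["year"].between(2009, 2011)]
        valid = df[df["year"].between(2012, 2014)]

        model = LogisticRegression(max_iter=1000)      # best-performing algorithm in the study
        model.fit(calib[features], calib["conceived"])

        p = model.predict_proba(valid[features])[:, 1]
        print("validation AUC:", roc_auc_score(valid["conceived"], p))

        # Conception rate in the top 10% of inseminations ranked on predicted
        # probability (the study reports ~85% for this group).
        top = valid.assign(p=p).nlargest(int(0.10 * len(valid)), "p")
        print("top-decile conception rate:", top["conceived"].mean())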

    Continuous Typist Verification using Machine Learning

    A keyboard is a simple input device. Its function is to send keystroke information to the computer (or other device) to which it is attached. Normally this information is employed solely to produce text, but it can also be utilized as part of an authentication system. Typist verification exploits a typist's patterns to check whether they are who they say they are, even after standard authentication schemes have confirmed their identity. This thesis investigates whether typists behave in a sufficiently unique yet consistent manner to enable an effective level of verification based on their typing patterns. Typist verification depends on more than the typist's behaviour: the quality of the patterns and the algorithms used to compare them also determine how accurately verification is performed. This thesis sheds light on all technical aspects of the problem, including data collection, feature identification and extraction, and sample classification. A dataset has been collected that is comparable in size, timing accuracy and content to others in the field, with one important exception: it is derived …
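
    As a flavour of the feature-extraction step, the sketch below computes two standard keystroke-timing features, key hold times and digraph latencies, from raw press/release events. The event format and feature set are assumptions for illustration; the thesis's actual features are richer.

        # Toy keystroke-timing features, assuming events of the form
        # (key, press_time_ms, release_time_ms) in chronological order.
        def keystroke_features(events):
            feats = {}
            for i, (key, press, release) in enumerate(events):
                # Hold time: how long the key stays down.
                feats.setdefault(("hold", key), []).append(release - press)
                if i > 0:
                    prev_key, prev_press, _ = events[i - 1]
                    # Digraph latency: gap between successive key presses.
                    feats.setdefault(("latency", prev_key, key), []).append(press - prev_press)
            # Average repeated observations into a simple typing profile.
            return {k: sum(v) / len(v) for k, v in feats.items()}

        sample = [("t", 0, 80), ("h", 120, 190), ("e", 260, 330)]
        print(keystroke_features(sample))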

    One-class Classification by Combining Density and Class Probability Estimation

    One-class classification has important applications such as outlier and novelty detection. It is commonly tackled using density estimation techniques or by adapting a standard classification algorithm to the problem of carving out a decision boundary that describes the location of the target data. In this paper we investigate a simple method for one-class classification that combines the application of a density estimator, used to form a reference distribution, with the induction of a standard model for class probability estimation. In this method, the reference distribution is used to generate artificial data that is employed to form a second, artificial class. In conjunction with the target class, this artificial class is the basis for a standard two-class learning problem. We explain how the density function of the reference distribution can be combined with the class probability estimates obtained in this way to form an adjusted estimate of the density function of the target class. Using UCI datasets, and data from a typist recognition problem, we show that the combined model, consisting of both a density estimator and a class probability estimator, can improve on using either component technique alone when used for one-class classification. We also compare the method to one-class classification using support vector machines.
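
    A minimal sketch of the combination described above, under the assumption of equally sized target and artificial classes (so the class priors cancel): a reference density p_ref is fitted to the target data, an artificial class is sampled from it, a class probability estimator is trained on the resulting two-class problem, and the adjusted target density becomes p_ref(x) · P(target|x) / (1 − P(target|x)). The Gaussian reference and logistic regression learner are illustrative choices, not necessarily those of the paper.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        target = rng.normal([0, 0], [1.0, 0.3], size=(500, 2))   # toy target class

        # 1. Fit a reference distribution to the target data.
        ref = GaussianMixture(n_components=1, random_state=0).fit(target)

        # 2. Sample an equally sized artificial class from it.
        artificial, _ = ref.sample(len(target))

        # 3. Train a class probability estimator: target (1) vs artificial (0).
        X = np.vstack([target, artificial])
        y = np.r_[np.ones(len(target)), np.zeros(len(artificial))]
        clf = LogisticRegression(max_iter=1000).fit(X, y)

        # 4. Combine: log p_target(x) = log p_ref(x) + log-odds of the target class.
        def target_log_density(x):
            p = clf.predict_proba(x)[:, 1].clip(1e-9, 1 - 1e-9)
            return ref.score_samples(x) + np.log(p) - np.log(1 - p)

        print(target_log_density(np.array([[0.0, 0.0], [4.0, 4.0]])))  # inlier vs outlier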

    Failure Prediction – An Application in the Railway Industry


    On the pattern recognition and classification of stochastically episodic events

    Researchers in the field of Pattern Recognition (PR) have traditionally presumed the availability of a representative set of data drawn from the classes of interest, say ω1 and ω2 in a 2-class problem. These samples are typically utilized in the development of the system's discriminant function. It is, however, widely recognized that there exists a particularly challenging class of PR problems for which a representative set is not available for the second class, which has motivated a great deal of research into the so-called domain of One-Class (OC) classification. In this paper, we extend the frontiers of novelty detection by the introduction of a new field of problems open for analysis. In particular, we note that this new realm deviates from the standard set of OC problems based on the presence of three characteristics, which ultimately amplify the classification challenge: the temporal nature of the appearance of the data, the fact that the data from the classes are "interwoven", and the fact that a labelling procedure is not merely impractical, it is almost, by definition, impossible. As a first attempt to tackle these problems, we present two specialized classification strategies denoted by Scenarios S1 and S2, respectively. In Scenario S1, the data is such that standard binary and one-class classifiers can be applied. Alternatively, in Scenario S2, the labelling challenge prevents the application of binary classifiers and instead dictates the novel application of one-class classifiers. The validity of these scenarios has been demonstrated for the exemplary domain involving the Comprehensive Nuclear-Test-Ban Treaty (CTBT), for which our research endeavour has also developed a simulation model. As far as we know, our research in this field is of a pioneering sort, and the results presented here are novel.
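
    To make Scenario S2 concrete, the sketch below trains a one-class model on background observations only and scores an interwoven stream as it arrives, flagging negative scores as candidate episodic events. The one-class SVM, the toy features and the simulated stream are illustrative assumptions, not the paper's CTBT simulation model.

        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(1)
        background = rng.lognormal(0.0, 0.5, size=(1000, 4))   # toy background measurements

        # Only the background class can be characterized, so train a one-class model.
        detector = OneClassSVM(nu=0.01, gamma="scale").fit(background)

        # An interwoven stream: mostly background, with one rare episodic event.
        stream = np.vstack([rng.lognormal(0.0, 0.5, (5, 4)),
                            rng.lognormal(2.0, 0.5, (1, 4))])
        for i, s in enumerate(detector.decision_function(stream)):
            print(f"observation {i}: {'episodic candidate' if s < 0 else 'background'}")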

    Clustering Based One-Class Classification for Compliance Verification of the Comprehensive Nuclear-Test-Ban Treaty

    Monitoring the levels of radioxenon isotopes in the atmosphere has been proposed as a means of verifying the Comprehensive Nuclear-Test-Ban Treaty (CTBT). This translates into a classification problem, whereby the measured concentrations belong either to an explosion class or to a background class. Instances drawn from the explosion class are extremely rare, if not non-existent. Therefore, the resulting dataset is extremely imbalanced, and inherently suited to one-class classification. Further exacerbating the problem is the fact that the background distribution can be extremely complex, and thus modelling it using one-class learning is difficult. In order to improve upon previous classification results, we investigate the augmentation of one-class learning methods with clustering. The purpose of clustering is to convert a complex distribution into simpler distributions, the clusters, over which more effective models can be built. The resulting model, built from one-class learners trained over the clusters, performs more effectively than a model built over the original distribution. The approach is empirically tested on three different data domains: a number of artificial datasets, datasets from the UCI repository, and data modelled after the extremely challenging CTBT. The results support the claim that clustering improves the performance of one-class classification on complex distributions.
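
    The sketch below illustrates the clustering-augmented idea on a toy multimodal background: k-means carves the complex distribution into simpler clusters, a one-class learner is fitted per cluster, and a point is accepted if any cluster model accepts it (the maximum of the per-cluster scores). The choice of k-means, one-class SVMs and the max-combination rule is an assumption for illustration.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(2)
        # Toy complex background: two well-separated modes.
        background = np.vstack([rng.normal(0, 1, (300, 2)),
                                rng.normal(8, 1, (300, 2))])

        # 1. Split the complex distribution into simpler clusters.
        k = 2
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(background)

        # 2. Fit one one-class learner per cluster.
        models = [OneClassSVM(nu=0.05, gamma="scale").fit(background[labels == c])
                  for c in range(k)]

        # 3. Combine: accept a point if any local model accepts it.
        def score(x):
            return np.max([m.decision_function(x) for m in models], axis=0)

        test = np.array([[0.0, 0.0], [8.0, 8.0], [4.0, 4.0]])  # two inliers, one gap point
        print(score(test) > 0)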