12 research outputs found
Stream-based active learning with linear models
The proliferation of automated data collection schemes and the advances in
sensorics are increasing the amount of data we are able to monitor in
real-time. However, given the high annotation costs and the time required by
quality inspections, data is often available in an unlabeled form. This is
fostering the use of active learning for the development of soft sensors and
predictive models. In production, instead of performing random inspections to
obtain product information, labels are collected by evaluating the information
content of the unlabeled data. Several query strategy frameworks for regression
have been proposed in the literature but most of the focus has been dedicated
to the static pool-based scenario. In this work, we propose a new strategy for
the stream-based scenario, where instances are sequentially offered to the
learner, which must instantaneously decide whether to perform the quality check
to obtain the label or discard the instance. The approach is inspired by the
optimal experimental design theory and the iterative aspect of the
decision-making process is tackled by setting a threshold on the
informativeness of the unlabeled data points. The proposed approach is
evaluated using numerical simulations and the Tennessee Eastman Process
simulator. The results confirm that selecting the examples suggested by the
proposed algorithm allows for a faster reduction in the prediction error.
Comment: Published in Knowledge-Based Systems (2022)
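The thresholding idea in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact informativeness criterion, so the score used here, x^T A^{-1} x with A the (ridge-regularized) information matrix of the already-labeled points, is an assumption borrowed from D-optimal design, and the names `stream_select`, `threshold`, and `ridge` are hypothetical.

```python
def quad_form(a_inv, x):
    """Return x^T A^{-1} x, used here as the informativeness score."""
    n = len(x)
    v = [sum(a_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * v[i] for i in range(n))


def sherman_morrison(a_inv, x):
    """Return (A + x x^T)^{-1} from A^{-1} (A is assumed symmetric)."""
    n = len(x)
    v = [sum(a_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * v[i] for i in range(n))
    return [[a_inv[i][j] - v[i] * v[j] / denom for j in range(n)]
            for i in range(n)]


def stream_select(stream, dim, threshold, ridge=1.0):
    """Decide, one instance at a time, whether to request its label.

    An instance is queried when its score exceeds `threshold`;
    otherwise it is discarded and never revisited.
    Returns the stream indices of the queried instances.
    """
    # Start from A = ridge * I so that A is invertible before any label arrives.
    a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
             for i in range(dim)]
    queried = []
    for t, x in enumerate(stream):
        if quad_form(a_inv, x) > threshold:
            queried.append(t)
            a_inv = sherman_morrison(a_inv, x)  # the labeled point joins the design
    return queried


if __name__ == "__main__":
    stream = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(stream_select(stream, dim=2, threshold=0.6))
```

In this toy stream the second and fourth points repeat a direction that has already been labeled, so their score drops below the threshold and they are discarded.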
Can Students' Attitudes and Behaviors be Changed by Educational Interventions? A Comparative Case Study
This study examined engineering students' attitudes and behaviors in a first-year Calculus course. Not surprisingly, High School mathematics and physics grades correlated closely with self-reported Calculus grades, and a student survey conducted four years apart demonstrated almost identical attitudes and behaviors despite the introduction of a range of measures aimed to enhance learning. The better the grades, the fairer students deemed it to be, and the less of in-depth learning, the poorer the grades. The higher the ambitions, and the more active and hardworking, the better the grades. Academic success factors included an ability to keep pace with progression, and a commitment to advance learning. The minimal impact of interventions appears as surprising; however, this study brings perspectives to make sense of such data, also capable of producing greater future successes.
Introducing Statistical Design of Experiments to SPARQL Endpoint Evaluation
This paper argues that the common practice of benchmarking is inadequate as a scientific evaluation methodology. It further attempts to introduce the empirical tradition of the physical sciences by using techniques from Statistical Design of Experiments applied to the example of SPARQL endpoint performance evaluation. It does so by studying full as well as fractional factorial experiments designed to evaluate an assertion that some change introduced in a system has improved performance. This paper does not present a finished experimental design, rather its main focus is didactical, to shift the focus of the community away from benchmarking towards higher scientific rigor.
The Semantic Web – ISWC 2013. Lecture Notes in Computer Science, Volume 8219, 2013, pp. 360-375. The final publication is available at Springe
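For readers unfamiliar with the designs named in this abstract, the following is a minimal sketch of a two-level full factorial design, a half fraction of it, and a main-effect estimate. It is a generic illustration, not the paper's experimental setup; the function names and the toy response are hypothetical.

```python
from itertools import product


def full_factorial(k):
    """All 2^k runs of k two-level factors, coded as -1 / +1."""
    return [tuple(run) for run in product((-1, 1), repeat=k)]


def half_fraction(k):
    """A 2^(k-1) fractional design: the last factor is set to the
    product of the other k-1 factors (defining relation I = AB...K)."""
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


def main_effect(design, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, responses) if run[factor] == 1]
    low = [y for run, y in zip(design, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


if __name__ == "__main__":
    design = full_factorial(3)
    # Toy, noise-free response: factor 0 helps, factor 1 hurts, factor 2 is inert.
    y = [10 + 2 * a - 1 * b for a, b, c in design]
    print([main_effect(design, y, f) for f in range(3)])
    print(half_fraction(3))
```

The half fraction needs only half the runs, at the price of aliasing the last factor's main effect with the interaction of the others.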
Split-plot designs for multistage experimentation
Most of today's complex systems and processes involve several stages through which input or the raw material has to go before the final product is obtained. Also in many cases factors at different stages interact. Therefore, a holistic approach for experimentation that considers all stages at the same time will be more efficient. However, there have been only a few attempts in the literature to provide an adequate and easy-to-use approach for this problem. In this paper, we present a novel methodology for constructing two-level split-plot and multistage experiments. The methodology is based on the Kronecker product representation of orthogonal designs and can be used for any number of stages, for various numbers of subplots and for different numbers of subplots for each stage. The procedure is demonstrated on both regular and nonregular designs and provides the maximum number of factors that can be accommodated in each stage. Furthermore, split-plot designs for multistage experiments with good projective properties are also provided.
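The abstract does not spell out the construction, so the sketch below only illustrates the underlying building block under that caveat: the Kronecker product of two orthogonal two-level arrays is again an orthogonal two-level array. The helper names `kron` and `is_orthogonal` are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def is_orthogonal(m):
    """True if every pair of distinct columns has a zero inner product."""
    n_cols = len(m[0])
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if sum(row[i] * row[j] for row in m) != 0:
                return False
    return True


if __name__ == "__main__":
    h2 = [[1, 1],
          [1, -1]]          # smallest orthogonal two-level array
    h4 = kron(h2, h2)       # 4 runs, 4 mutually orthogonal columns
    for row in h4:
        print(row)
    print(is_orthogonal(h4))
```

Roughly speaking, one factor of the product can be read as the whole-plot structure and the other as the subplot structure, although how the paper assigns columns to stages is not stated here.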
Assessing some aspects of factor screening with nonnormal responses
Nonnormally distributed response values, such as count data, create challenges for factor screening. One problem is that variances may vary from run to run. Another is the choice of screening design for such responses. In this paper, we assess some screening performances for three popular screening designs: a definitive screening design, a minimum resolution IV design, and a Plackett-Burman design. Four distributions, two binomials, one gamma, and one Poisson, are chosen for the response values. For each distribution, we test whether it is best to use the raw data, a variance-stabilizing transformation of the data, or generalized linear modeling, assuming three factors are active. From our investigations, two-level nonregular designs gave the highest success rate in identifying the subset of active factors, and a variance-stabilizing transformation turned out to perform equally well or better than generalized linear modeling in most cases.
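As a brief reminder of what a variance-stabilizing transformation does, the standard delta-method argument is sketched below for the Poisson case. The abstract does not say which transformations were used, so the square root is given only as the textbook example.

```latex
% Delta method: for a smooth g and a response Y with mean \mu,
\operatorname{Var}[g(Y)] \approx \bigl(g'(\mu)\bigr)^{2}\,\operatorname{Var}(Y).

% Poisson response: \operatorname{Var}(Y) = \mu. Taking g(y) = \sqrt{y} gives
% g'(\mu) = \frac{1}{2\sqrt{\mu}}, hence
\operatorname{Var}\bigl[\sqrt{Y}\bigr] \approx \frac{1}{4\mu}\,\mu = \frac{1}{4},
% which no longer depends on \mu, so the variance is approximately
% the same from run to run.
```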
Robust online active learning
In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.