4,065 research outputs found

    Concept drift region identification via competence-based discrepancy distribution estimation

    © 2017 IEEE. Real-world data analytics often involves cumulative data. While such data contains valuable information, the pattern or concept underlying these data may change over time, a phenomenon known as concept drift. When learning under concept drift, it is essential to know when, how and where the context has evolved. Most existing drift detection methods focus only on triggering a signal when drift is detected, and little research has endeavored to explain how and where the data changes. To address this issue, we introduce kernel density estimation into a competence-based drift detection method, and invent competence-based discrepancy distribution estimation to identify specific regions in the data feature space where drift has occurred. Two experiments demonstrate that our proposed approach, competence-based discrepancy density estimation, can quantitatively highlight drift regions throughout the data feature space, and produce results that are very close to preset drift regions.

    Accumulating regional density dissimilarity for concept drift detection in data streams

    © 2017 Elsevier Ltd. In a non-stationary environment, newly received data may have different knowledge patterns from the data used to train learning models. As time passes, a learning model's performance may become increasingly unreliable. This problem is known as concept drift and is a common issue in real-world domains. Concept drift detection has attracted increasing attention in recent years. However, very few existing methods pay attention to small regional drifts, and their accuracy may vary due to differing statistical significance tests. This paper presents a novel concept drift detection method, based on regional-density estimation, named nearest neighbor-based density variation identification (NN-DVI). It consists of three components. The first is a k-nearest neighbor-based space-partitioning schema (NNPS), which transforms unmeasurable discrete data instances into a set of shared subspaces for density estimation. The second is a distance function that accumulates the density discrepancies in these subspaces and quantifies the overall differences. The third component is a tailored statistical significance test by which the confidence interval of a concept drift can be accurately determined. The distance applied in NN-DVI is sensitive to regional drift and has been proven to follow a normal distribution. As a result, the NN-DVI's accuracy and false-alarm rate are statistically guaranteed. Additionally, several benchmarks have been used to evaluate the method, including both synthetic and real-world datasets. The overall results show that NN-DVI has better performance in terms of addressing problems related to concept drift detection.
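    The three-part structure described in this abstract (partition the space, accumulate regional density differences, test for significance) can be illustrated with a deliberately simplified sketch. It is not NN-DVI itself: fixed-width bins stand in for the k-nearest neighbor partitioning (NNPS), and a generic permutation test stands in for the paper's tailored significance test. The function names, the one-dimensional data, and the assumption that values lie in [0, 1] are all hypothetical choices made for illustration.

```python
import random


def regional_density_distance(ref, new, bins=10):
    """Half the summed absolute difference in per-bin relative frequency.

    Assumes every value lies in [0, 1]; fixed-width bins are a stand-in
    for NNPS, not the partitioning used in the paper.
    """
    def histogram(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int(x * bins), bins - 1)] += 1
        return [c / len(sample) for c in counts]

    h_ref, h_new = histogram(ref), histogram(new)
    return 0.5 * sum(abs(a - b) for a, b in zip(h_ref, h_new))


def permutation_p_value(ref, new, n_perm=200, seed=0):
    """Estimate how often a random relabelling gives a distance at least
    as large as the observed one (a generic permutation test)."""
    rng = random.Random(seed)
    observed = regional_density_distance(ref, new)
    pooled = list(ref) + list(new)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = regional_density_distance(pooled[:len(ref)], pooled[len(ref):])
        if d >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Reference window spread over [0, 1]; new window concentrated in [0.5, 1].
reference = [(i + 0.5) / 100 for i in range(100)]
shifted = [0.5 + (i + 0.5) / 200 for i in range(100)]
```

    Calling `permutation_p_value(reference, shifted)` on the two windows above should return a small value, while comparing `reference` with itself gives a distance of zero. Because the distance is a sum over regions, a change confined to a few bins still contributes to the total, which is the intuition behind being sensitive to regional drift.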

    A paired learner-based approach for concept drift detection and adaptation in software defect prediction

    The early and accurate prediction of defects helps in testing software and therefore leads to an overall higher-quality product. Due to drift in software defect data, prediction model performance may degrade over time. Very few earlier works have investigated the significance of concept drift (CD) in software-defect prediction (SDP). Their results have shown that CD is present in software defect data and that it has a significant impact on the performance of defect prediction. Motivated by this observation, this paper presents a paired learner-based drift detection and adaptation approach in SDP that dynamically adapts to the varying concepts by updating one of the learners in the pair. For a given defect dataset, a subset of data modules is analyzed at a time by both learners based on their learning experience from the past. A difference in the accuracies of the two is used to detect drift in the data. We perform an evaluation of the presented study using defect datasets collected from the SEACraft and PROMISE data repositories. The experimentation results show that the presented approach successfully detects the concept drift points and performs better compared to existing methods, as is evident from the comparative analysis performed using various performance parameters such as number of drift points, ROC-AUC score, accuracy, and statistical analysis using Wilcoxon signed rank test. Keywords: concept drift; naive Bayes; random forest; software defect prediction; software quality assurance.
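    The paired-learner idea in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: both learners are simple majority-class predictors rather than naive Bayes or random forest models, the "stable" learner remembers everything since the last drift while the "reactive" learner remembers only the previous batch, and the `threshold` parameter and all function names are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list (toy stand-in for a classifier)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(prediction, labels):
    """Fraction of labels in the batch that match a constant prediction."""
    return sum(1 for y in labels if y == prediction) / len(labels)


def paired_learner_drift(batches, threshold=0.2):
    """Return the indices of batches at which drift is signalled.

    Drift is flagged when the reactive learner beats the stable learner
    on the current batch by more than `threshold` (an assumed parameter).
    """
    stable_memory = []    # labels seen since the last detected drift
    reactive_memory = []  # labels from the previous batch only
    drift_points = []
    for i, batch in enumerate(batches):
        if stable_memory:
            stable_acc = accuracy(majority(stable_memory), batch)
            reactive_acc = accuracy(majority(reactive_memory), batch)
            if reactive_acc - stable_acc > threshold:
                drift_points.append(i)
                # Adapt: the stable learner restarts from recent data.
                stable_memory = list(reactive_memory)
        stable_memory.extend(batch)
        reactive_memory = list(batch)
    return drift_points


# Defect labels per batch (1 = defective); the concept flips at batch 2.
batches = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 0, 1]]
print(paired_learner_drift(batches))  # prints [3]
```

    In this toy run the change begins at batch 2 but is flagged at batch 3, because the reactive learner needs one batch of the new concept before it can outperform the stable one.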

    Explainable adaptation of time series forecasting

    A time series is a collection of data points captured over time, commonly found in many fields such as healthcare, manufacturing, and transportation. Accurately predicting the future behavior of a time series is crucial for decision-making, and several Machine Learning (ML) models have been applied to solve this task. However, changes in the time series, known as concept drift, can affect model generalization to future data, thus requiring online adaptive forecasting methods. This thesis aims to extend the State-of-the-Art (SoA) in the ML literature for time series forecasting by developing novel online adaptive methods. The first part focuses on online time series forecasting, including a framework for selecting time series variables and developing ensemble models that are adaptive to changes in time series data and model performance. Empirical results show the usefulness and competitiveness of the developed methods and their contribution to the explainability of both model selection and ensemble pruning processes. In the second part, the thesis contributes to the literature on online ML model-based quality prediction for three Industry 4.0 applications: NC-milling, bolt installation in the automotive industry, and Surface Mount Technology (SMT) in electronics manufacturing. The thesis shows how process simulation can be used to generate additional knowledge and how such knowledge can be integrated efficiently into the ML process. The thesis also presents two applications of explainable model-based quality prediction and their impact on smart industry practices.

    Annual Research Report 2021


    Dispersal and connectivity of northeastern Atlantic patellid limpets: a multidisciplinary approach

    Dispersal and connectivity of patellid limpets (Patella spp.) in the eastern North Atlantic have been examined by addressing reproductive biology, larval development, population genetics and physical modelling of dispersal. The reproductive cycles of four limpet species were assessed on the northern and central Portuguese coast, to determine spawning periods. This information was incorporated into dispersal models. The results showed that P. depressa and P. ulyssiponensis have almost year-round breeding, with a brief resting phase in the early summer. Conversely, the two other species displayed much shorter spawning periods, with gamete release taking place between December and March in P. vulgata and between September and December in P. rustica. The relationship between temperature and planktonic periods in P. depressa, P. ulyssiponensis, and P. vulgata was investigated with laboratory rearing experiments. Average duration of precompetent periods varied inversely with temperature, ranging between 3.7-14.0 days in P. depressa, 2.8-13.7 days in P. ulyssiponensis and 5.7-14.6 days in P. vulgata, whilst delay periods ranged between 15.8-25.4 days in P. depressa, 14.5-27 days in P. ulyssiponensis and 16.5-25 days in P. vulgata. Population genetic structure was examined on a range-wide scale in P. depressa and along the Iberian coast in P. rustica using microsatellite markers, plus one mtDNA locus in P. rustica. Results suggested high levels of gene flow throughout the study area and widespread lack of population differentiation in both species. A biophysical model of dispersal has been developed to assess the degree of demographic connectivity over ecological and evolutionary time frames, and to identify possible barriers to dispersal for P. depressa and P. rustica. The model predicted high levels of connectivity through most of the study area in both species, but in P. depressa simulations identified two large extensions of adult habitat discontinuity as barriers to larval dispersal. The model also showed that despite the potential for long-distance dispersal, most of the larvae released at one given location settle within much shorter distances. These results illustrate the need to view the study of marine dispersal as a multidisciplinary task, and suggest that relying on just one line of evidence may produce misleading results.

    A Design Process Centric Application of State Space Modeling as a Function of Communications and Cognitive Skills Assessments.

    Humans have a reliable basic probabilistic intuition. We utilize our probabilistic intuition in many day-to-day activities such as driving. In fact, any interaction that occurs in the presence of other independent actors requires some probabilistic assessment. While we are good at sorting between rare and common events, determining if these events are statistically significant is always subject to scrutiny. Quite often the bounds of statistical significance are at odds with the ‘common sense’ expectation. While our probabilistic intuition is good for first moment effects such as driving a car, throwing a football and understanding simplistic mathematical models, our probabilistic intuition fails when we need to evaluate secondary effects such as high speed turns, playing golf or understanding complex mathematical models. When our probabilistic intuition is challenged, misinterpretation of results and skewed perspectives of possible outcomes will occur. The work presented in this dissertation provides a mathematical formulation that will provide a guide to when our probabilistic intuition will be challenged. This dissertation will discuss the development of the Process Failure Estimation Technique (ProFET). A multitude of potential team parameters could have been selected; interpersonal communication effectiveness and cognitive skill assessments seemed the most obvious first steps. This is due to the prolific discussion on communication and the general acceptance of cognitive testing as an indicator of performance potential. The team's skill set must be variable with respect to time in order to accomplish the required objectives of each phase of the design process. ProFET develops a metric for the design process that is sensitive to the team composition and structure. This metric is applied to a domain that is traditionally devoid of objective scoring. With the use of ProFET, more informed decisions on team structure and composition can be made at critical junctions of the design process. Specifically, ProFET looks at how variability propagates through the design activities as opposed to attempting to quantify the actual values of design activities, which is the focus of the majority of other design research.
    PhD, Naval Architecture and Marine Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116679/1/jdstrick_1.pd