32 research outputs found

    Predicting and Evaluating Software Model Growth in the Automotive Industry

    The size of a software artifact influences software quality and impacts the development process. In industry, when software size exceeds certain thresholds, memory errors accumulate and development tools might no longer cope, resulting in lengthy program start-up times, failing builds, or memory problems at unpredictable times. Foreseeing critical growth in software modules is therefore in high demand in industrial practice. Predicting when the size will grow to the level where maintenance is needed prevents unexpected effort and helps to spot problematic artifacts before they become critical. Although the number of prediction approaches in the literature is vast, it is unclear how well they fit the prerequisites and expectations of practice. In this paper, we perform an industrial case study at an automotive manufacturer to explore the applicability and usability of prediction approaches in practice. In a first step, we collect the most relevant prediction approaches from the literature, including both statistical and machine-learning approaches. Furthermore, we elicit practitioners' expectations towards predictions using a survey and stakeholder workshops. At the same time, we measure the software size of 48 software artifacts by mining four years of revision history, resulting in 4,547 data points. In a last step, we assess the applicability of state-of-the-art prediction approaches on the collected data by systematically analyzing how well they fulfill the practitioners' expectations. Our main contribution is a comparison of commonly used prediction approaches in a real-world industrial setting while considering stakeholder expectations. We show that the approaches yield significantly different prediction accuracy and that the statistical approaches fit our data best.
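
    The core prediction task, extrapolating a size time series to find when it crosses a critical threshold, can be sketched with a simple statistical linear-trend model. The function and data below are illustrative and not taken from the paper:

    ```python
    import numpy as np

    def predict_threshold_crossing(times, sizes, threshold):
        """Fit a linear trend to measured artifact sizes and estimate
        when the trend line crosses a critical size threshold.
        Returns None if the fitted growth rate is not positive."""
        slope, intercept = np.polyfit(times, sizes, deg=1)
        if slope <= 0:
            return None  # size not growing; no crossing ahead
        return (threshold - intercept) / slope

    # Example: weekly size samples growing roughly linearly with small noise
    weeks = np.arange(10)
    noise = np.array([0.2, -0.1, 0.3, 0.0, -0.2, 0.1, 0.0, 0.2, -0.1, 0.1])
    sizes = 100 + 5 * weeks + noise
    t_cross = predict_threshold_crossing(weeks, sizes, threshold=200)
    ```

    More elaborate statistical or machine-learning predictors would replace the linear fit, but the threshold-crossing logic stays the same.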

    A novel approach of gait recognition through fusion with footstep information

    R. Vera-Rodríguez, J. Fiérrez, J. S. D. Mason, J. Ortega-García, "A novel approach of gait recognition through fusion with footstep information", International Conference on Biometrics (ICB), Madrid (Spain), 2013, 1-6. This paper focuses on two closely related biometric modes: gait and footstep biometrics. Footstep recognition is a relatively new biometric based on signals extracted from floor sensors, while gait recognition has been researched more extensively and is based on video sequences of people walking. This paper reports a directly comparative assessment of both biometrics using the same database (SFootBD) and experimental protocols. A fusion of the two modes leads to enhanced gait recognition performance, as the information from the two modes comes from different capturing devices and is not strongly correlated. This fusion could find application in indoor scenarios where a gait recognition system is present, such as security access (e.g. security gates at airports) or smart homes. The gait and footstep systems achieve 8.4% and 10.7% EER respectively, which improves significantly to 4.8% EER with fusion at the score level into a walking biometric. This work has been partially supported by projects Bio-Shield (TEC2012-34881), Contexts (S2009/TIC-1485), TeraSense (CSD2008-00068) and "Cátedra UAM-Telefónica".
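
    Score-level fusion of this kind is typically a weighted sum of normalized matcher scores. A minimal sketch, assuming min-max normalization and an illustrative equal weighting (the abstract does not specify these details):

    ```python
    import numpy as np

    def min_max_normalize(scores):
        """Map raw matcher scores to [0, 1] so the two modalities are comparable."""
        s = np.asarray(scores, dtype=float)
        return (s - s.min()) / (s.max() - s.min())

    def fuse_scores(gait_scores, footstep_scores, w_gait=0.5):
        """Weighted-sum fusion of two score sets at the score level."""
        g = min_max_normalize(gait_scores)
        f = min_max_normalize(footstep_scores)
        return w_gait * g + (1.0 - w_gait) * f

    # Hypothetical matcher scores for three probe samples
    fused = fuse_scores([0.2, 0.9, 0.4], [0.1, 0.8, 0.7])
    ```

    In practice the weight would be tuned on development data, and the fused score fed to the usual threshold-based decision that the EER figures summarize.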

    Marginalized iterative ensemble smoothers for data assimilation

    Data assimilation is an important tool in many geophysical applications. A key element of data assimilation algorithms is the measurement error, which determines the weighting of the data in the cost function to be minimized. Although data assimilation algorithms treat the measurement uncertainty as known, in many cases it is estimated or set based on expert opinion. Here we treat the measurement uncertainty as a hyperparameter in a fully Bayesian hierarchical model and derive a new class of iterative ensemble methods for data assimilation in which the measurement uncertainty is integrated out. The proposed algorithms are compared with the standard iterative ensemble smoother on a 2D synthetic reservoir model.
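
    The role of the measurement-error variance as a weighting term can be seen in a basic, non-marginalized ensemble smoother update, sketched below on a toy linear problem; the paper's marginalized methods integrate this variance out rather than fixing it as done here:

    ```python
    import numpy as np

    def ensemble_smoother_update(X, Y, d, r_var, rng):
        """One ensemble smoother update step.

        X: (n_par, n_ens) parameter ensemble
        Y: (n_obs, n_ens) predicted-data ensemble
        d: (n_obs,) observed data
        r_var: measurement-error variance, the hyperparameter the
               marginalized schemes treat as uncertain instead of fixed
        """
        n_ens = X.shape[1]
        Xa = X - X.mean(axis=1, keepdims=True)
        Ya = Y - Y.mean(axis=1, keepdims=True)
        C_xy = Xa @ Ya.T / (n_ens - 1)                      # cross-covariance
        C_yy = Ya @ Ya.T / (n_ens - 1) + r_var * np.eye(Y.shape[0])
        # Perturb observations so the updated ensemble keeps posterior spread
        D = d[:, None] + np.sqrt(r_var) * rng.standard_normal(Y.shape)
        return X + C_xy @ np.linalg.solve(C_yy, D - Y)

    # Toy linear problem: y = 2x, truth x = 3, observed as d = 6
    rng = np.random.default_rng(1)
    X = 2.0 * rng.standard_normal((1, 200))                 # prior around 0
    X_post = ensemble_smoother_update(X, 2.0 * X, np.array([6.0]), 0.1, rng)
    ```

    Larger `r_var` down-weights the data and pulls the update less far from the prior, which is exactly why getting this quantity wrong matters.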

    Statistical Analysis and Deep Learning Associated Modeling for Early-Stage Detection of Carcinoma

    The high death rate and overall complexity of the cancer epidemic constitute a global health crisis. Progress in cancer prediction based on gene expression has accelerated thanks to modern high-throughput sequencing methods and a wide range of machine learning techniques, bringing insights into efficient and precise treatment decision-making. It is therefore of significant interest to create machine learning systems that accurately distinguish cancer patients from healthy people. Although several classification systems have been applied to cancer prediction, no single strategy has proven superior. This research shows how to apply deep learning to an optimization method that combines numerous machine learning models. Statistical analysis is used to choose informative genes, which are fed to five different classification models. The results from the five classifiers are then ensembled using a deep learning technique. Three common types of adenocarcinoma are considered: lung, stomach, and breast. The suggested deep learning-based inter-ensemble model was tested against deep learning-based algorithms on carcinoma data. The tests show that, relative to using only one set of classifiers or a simple consensus algorithm, it improves the precision of cancer prognosis in every analyzed carcinoma dataset. The suggested deep learning-based inter-ensemble approach is thus demonstrated to be reliable and efficient for cancer diagnosis by fully exploiting diverse classifiers.
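
    The pipeline described is essentially stacking: base classifiers produce probabilities that a trained combiner merges. A minimal sketch with a simple logistic meta-learner standing in for the paper's deep-learning combiner (the classifiers, data, and learner below are all illustrative):

    ```python
    import numpy as np

    def stack_predictions(prob_list):
        """Concatenate per-classifier class probabilities into meta-features."""
        return np.hstack(prob_list)

    def train_meta_learner(Z, y, lr=0.5, epochs=500):
        """Tiny logistic meta-learner trained by gradient descent, standing in
        for the paper's deep-learning combiner."""
        w = np.zeros(Z.shape[1])
        b = 0.0
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))   # sigmoid predictions
            g = p - y                                 # logistic-loss gradient
            w -= lr * Z.T @ g / len(y)
            b -= lr * g.mean()
        return w, b

    # Three hypothetical base classifiers' cancer-vs-healthy probabilities
    p1 = np.array([[0.9], [0.2], [0.8], [0.3]])
    p2 = np.array([[0.7], [0.4], [0.6], [0.1]])
    p3 = np.array([[0.8], [0.3], [0.9], [0.2]])
    y = np.array([1, 0, 1, 0])

    Z = stack_predictions([p1, p2, p3])
    w, b = train_meta_learner(Z, y)
    pred = (1.0 / (1.0 + np.exp(-(Z @ w + b))) > 0.5).astype(int)
    ```

    A consensus (majority-vote) baseline would simply threshold the average of the three probability columns; the trained combiner can instead learn which classifiers to trust.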

    Efficient Bayesian inference for stochastic volatility models with ensemble MCMC methods

    In this paper, we introduce efficient ensemble Markov chain Monte Carlo (MCMC) sampling methods for Bayesian computation in the univariate stochastic volatility model. We compare the performance of our ensemble MCMC methods with an improved version of a recent sampler of Kastner and Frühwirth-Schnatter (2014). We show that the ensemble samplers are more efficient than this state-of-the-art sampler by a factor of about 3.1 on a data set simulated from the stochastic volatility model. This performance gain is achieved without the ensemble MCMC sampler relying on the assumption that the latent process is linear and Gaussian, unlike the sampler of Kastner and Frühwirth-Schnatter.
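
    Ensemble MCMC samplers of this kind update a population of walkers using moves defined by other walkers, as in the Goodman and Weare stretch move. A minimal sketch on a toy one-dimensional Gaussian target (the actual stochastic volatility posterior would replace `log_prob`):

    ```python
    import numpy as np

    def stretch_move_sweep(walkers, log_prob, a=2.0, rng=None):
        """One sweep of the affine-invariant 'stretch move': each walker is
        moved along the line through itself and a randomly chosen partner."""
        rng = rng or np.random.default_rng()
        n, dim = walkers.shape
        for k in range(n):
            j = rng.integers(n - 1)
            if j >= k:
                j += 1                                    # partner walker != k
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a  # stretch factor
            prop = walkers[j] + z * (walkers[k] - walkers[j])
            log_r = (dim - 1) * np.log(z) + log_prob(prop) - log_prob(walkers[k])
            if np.log(rng.random()) < log_r:
                walkers[k] = prop                          # accept in place
        return walkers

    # Toy target: standard normal, started far from the mode
    log_prob = lambda x: -0.5 * float(x @ x)
    rng = np.random.default_rng(0)
    walkers = rng.standard_normal((32, 1)) + 5.0
    for _ in range(500):
        walkers = stretch_move_sweep(walkers, log_prob, rng=rng)
    ```

    Because the move only uses ratios of target densities, no linear-Gaussian structure in the latent process is assumed, which is the point made in the abstract.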

    Preserving the knowledge of long clinical texts using aggregated ensembles of large language models

    Clinical texts, such as admission notes, discharge summaries, and progress notes, contain rich and valuable information that can be used for various clinical outcome prediction tasks. However, applying large language models, such as BERT-based models, to clinical texts poses two major challenges: the limitation of input length and the diversity of data sources. This paper proposes a novel method to preserve the knowledge of long clinical texts using aggregated ensembles of large language models. Unlike previous studies, which use model ensembling or text aggregation methods separately, we combine ensemble learning with text aggregation and train multiple large language models on two clinical outcome tasks: mortality prediction and length-of-stay prediction. We show that our method achieves better results than baselines, ensembling, and aggregation individually, and can improve the performance of large language models while handling long inputs and diverse datasets. We conduct extensive experiments on admission notes from the MIMIC-III clinical database, combining multiple unstructured and high-dimensional datasets, and demonstrate our method's effectiveness and superiority over existing approaches. We also provide a comprehensive analysis and discussion of our results, highlighting our method's applications and limitations for future research in the domain of clinical healthcare. The results and analysis support our method's potential to assist clinical healthcare systems by enabling clinical decision-making with robust performance, overcoming the challenges of long text inputs and varied datasets.
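
    The text-aggregation half of such a method addresses the fixed input length of BERT-style encoders by chunking the document and combining per-chunk predictions. A minimal sketch, with a hypothetical stand-in scorer in place of a trained language model:

    ```python
    def chunk_tokens(tokens, max_len=512, stride=256):
        """Split a long token sequence into overlapping windows that fit a
        fixed-length encoder (512 tokens is the usual BERT limit)."""
        chunks = []
        for start in range(0, max(len(tokens) - stride, 1), stride):
            chunks.append(tokens[start:start + max_len])
        return chunks

    def aggregate_chunk_probs(chunk_probs, how="mean"):
        """Combine per-chunk prediction probabilities into one document score."""
        if how == "mean":
            return sum(chunk_probs) / len(chunk_probs)
        return max(chunk_probs)

    tokens = list(range(1200))            # stand-in for a tokenized admission note
    chunks = chunk_tokens(tokens)
    fake_model = lambda c: min(1.0, len(c) / 512)   # hypothetical chunk scorer
    score = aggregate_chunk_probs([fake_model(c) for c in chunks])
    ```

    The ensembling half would repeat this with several models and combine their document-level scores in the same spirit.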

    Improving the prediction of air pollution peak episodes generated by urban transport networks

    This paper illustrates early results of ongoing research developing novel methods to analyse and simulate the relationship between transport-related air pollutant concentrations and easily accessible explanatory variables. The final aim is to integrate the new models into traditional traffic management support systems for a sustainable mobility of road vehicles in urban areas. This first stage concerns the relationship between the hourly mean concentration of nitrogen dioxide (NO2) and explanatory factors reflecting the NO2 mean level one hour back, along with traffic and weather conditions. Particular attention is given to the prediction of pollution peaks, defined as exceedances of normative concentration limits. Two model frameworks are explored: the Artificial Neural Network approach and the ARIMAX model. Furthermore, the benefit of a synergic use of both models for air quality forecasting is investigated. The analysis of findings shows that the prediction of extreme concentrations is best performed by integrating the two models into an ensemble. The neural network is outperformed by the ARIMAX model in foreseeing peaks, but gives a more realistic representation of the concentration's dependency upon wind characteristics. The Neural Network can thus be exploited to highlight the involved functional forms and improve the ARIMAX model specification. Finally, the study shows that the ability to forecast exceedances of legal pollution limits can be enhanced by triggering traffic management actions when the predicted concentration exceeds a threshold lower than the normative one.
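
    The closing idea, combining the two forecasters and alerting below the legal limit, can be sketched as follows; the weight and margin are illustrative, with the EU hourly NO2 limit value of 200 µg/m³ used for concreteness:

    ```python
    def ensemble_forecast(arimax_pred, ann_pred, w=0.7):
        """Weighted combination of the ARIMAX and neural-network forecasts
        (the weight is a hypothetical choice, not from the paper)."""
        return w * arimax_pred + (1 - w) * ann_pred

    def peak_alert(predicted_no2, legal_limit=200.0, margin=0.8):
        """Trigger traffic-management action at a threshold below the legal
        limit, trading a few false alarms for fewer missed exceedances."""
        return predicted_no2 >= margin * legal_limit

    # Hypothetical hourly forecasts (µg/m³)
    pred = ensemble_forecast(arimax_pred=185.0, ann_pred=150.0)
    alert = peak_alert(pred)
    ```

    Here the combined forecast of 174.5 µg/m³ stays under the 200 µg/m³ limit yet still raises an alert, which is exactly the early-action behaviour the study argues for.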