83 research outputs found

    Classifying the Correctness of Generated White-Box Tests: An Exploratory Study

    White-box test generator tools rely only on the code under test to select test inputs, and they capture the implementation's output as assertions. If there is a fault in the implementation, it can become encoded in the generated tests. Tool evaluations usually measure fault-detection capability by counting such fault-encoding tests. However, these faults are only detected if the developer can recognize that the encoded behavior is faulty. We designed an exploratory study to investigate how developers perform in classifying generated white-box tests as faulty or correct. We carried out the study in a laboratory setting with 54 graduate students. The tests were generated for two open-source projects with the help of the IntelliTest tool. The performance of the participants was analyzed using binary classification metrics and by coding their observed activities. The results showed that participants incorrectly classified a large number of both fault-encoding and correct tests (with median misclassification rates of 33% and 25%, respectively). Thus the real fault-detection capability of test generators could be much lower than typically reported, and we suggest taking this human factor into account when evaluating generated white-box tests. Comment: 13 pages, 7 figures
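The per-class misclassification rates reported above can be illustrated with a minimal sketch; the labels below are hypothetical examples, not data from the study:

```python
# Sketch: per-class misclassification rate for generated tests.
# 1 = fault-encoding test, 0 = correct test (hypothetical labels).

def misclassification_rate(truth, predicted, target_class):
    """Fraction of items of `target_class` that were classified incorrectly."""
    relevant = [(t, p) for t, p in zip(truth, predicted) if t == target_class]
    if not relevant:
        return 0.0
    wrong = sum(1 for t, p in relevant if t != p)
    return wrong / len(relevant)

truth     = [1, 1, 1, 0, 0, 0, 0, 0]   # ground truth for eight generated tests
predicted = [1, 0, 1, 0, 0, 1, 0, 0]   # a developer's classifications

fault_miss   = misclassification_rate(truth, predicted, target_class=1)  # 1/3
correct_miss = misclassification_rate(truth, predicted, target_class=0)  # 1/5
```

A fault-encoding test misclassified as correct is exactly the case where the generated test "detects" a fault on paper but not in practice.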

    AN EXHAUSTIVE COEFFICIENT OF RANK CORRELATION

    Rank association is a fundamental tool for expressing dependence in cases in which data are arranged in order. Measures of rank correlation have accumulated in several contexts for more than a century, and we were able to cite more than thirty of these coefficients, from simple ones to relatively complicated definitions invoking one or more systems of weights. However, only a few of these can actually be considered admissible substitutes for Pearson's correlation. The main drawback of the vast majority of coefficients is their “resistance to change”, which appears to be of limited value for the purposes of rank comparisons that are intrinsically robust. In this article, a new nonparametric correlation coefficient is defined that is based on the principle of maximization of a ratio of two ranks. In comparison with existing rank correlations, it was found to have extremely high sensitivity to permutation patterns. We illustrate the potential improvement that our index can provide in economic contexts by comparing published results with those obtained through the use of this new index. Our success suggests that the index may have important applications wherever the discriminatory power of the rank correlation coefficient needs to be particularly strong. Keywords: Ordinal data, Nonparametric agreement, Economic applications
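For orientation, a sketch of Spearman's rho, one of the classical coefficients the article surveys. The article's new "ratio of two ranks" index is not reproduced here, since its full definition is not given in the abstract:

```python
# Spearman's rank correlation via the classical formula
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), for distinct values (no ties).

def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman's rho between two equal-length samples."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0: identical orderings
print(spearman([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0: reversed orderings
```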

    Analysis of Microarray Data using Machine Learning Techniques on Scalable Platforms

    Microarray-based gene expression profiling has emerged as an efficient technique for the classification, diagnosis, prognosis, and treatment of cancer. Frequent changes in the behavior of this disease generate a huge volume of data. The data retrieved from microarrays carry their own veracity concerns, and they change over time (velocity). Moreover, microarray data are high-dimensional, with far more features than samples, so analyzing such datasets in a short period is essential. They often contain a huge amount of data, only a fraction of which comprises significantly expressed genes. Identifying the precise and interesting genes responsible for the cause of cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process: feature selection/extraction followed by classification. Our investigation starts with the analysis of microarray data using kernel-based classifiers, together with feature selection using the statistical t-test. In this work, various kernel-based classifiers, such as the Extreme learning machine (ELM), the Relevance vector machine (RVM), and a newly proposed method called the kernel fuzzy inference system (KFIS), are implemented. The proposed models are investigated using three microarray datasets: Leukemia, Breast, and Ovarian cancer. Finally, the performance of these classifiers is measured and compared with the Support vector machine (SVM). The results reveal that the proposed models classify the datasets efficiently, with performance comparable to the existing kernel-based classifiers. As the data size increases, handling and processing these datasets becomes a bottleneck. Hence, a distributed and scalable cluster such as Hadoop is needed for storing (HDFS) and processing (MapReduce as well as Spark) the datasets in an efficient way.
The next contribution in this thesis is the implementation of feature selection methods that can process data in a distributed manner. Various statistical tests, such as the ANOVA, Kruskal-Wallis, and Friedman tests, are implemented using the MapReduce and Spark frameworks and executed on top of a Hadoop cluster. The performance of these scalable models is measured and compared with the conventional system. The results show that the proposed scalable models efficiently process data of larger dimensions (GBs, TBs, etc.), which is not possible with traditional implementations of those algorithms. After selecting the relevant features, the next contribution of this thesis is the scalable implementation of the proximal support vector machine classifier, an efficient variant of SVM. The proposed classifier is implemented on two scalable frameworks, MapReduce and Spark, and executed on the Hadoop cluster. The obtained results are compared with those of the conventional system, and it is observed that the scalable cluster is well suited for Big data. Furthermore, Spark proves more efficient than MapReduce because of its intelligent handling of datasets through the Resilient distributed dataset (RDD) abstraction and its in-memory processing. Therefore, the next contribution of the thesis is the implementation of various scalable classifiers based on Spark. In this work, classifiers such as Logistic regression (LR), Support vector machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Radial basis function network (RBFN), the last with two variants (hybrid and gradient-descent learning algorithms), are proposed and implemented using the Spark framework. The proposed scalable models are executed on the Hadoop cluster as well as on the conventional system, and the results are investigated.
The obtained results show that the scalable algorithms process Big datasets far more efficiently than the conventional system. The efficacy of the proposed scalable algorithms in handling Big datasets is investigated and compared with the conventional system (where data are not distributed, but kept on a standalone machine and processed in a traditional manner). The comparative analysis shows that Big datasets are processed much more efficiently on the Hadoop cluster than on the conventional system
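The t-test filter step described above can be sketched on a single machine; the toy expression matrix below is hypothetical, and the thesis runs this kind of filter at scale via MapReduce/Spark rather than in plain Python:

```python
# Sketch: rank genes by the absolute two-sample (Welch) t statistic between
# two classes, then keep the top k. Toy data, not a real microarray dataset.
import math

def welch_t(xs, ys):
    """Welch's t statistic for one gene's expression values in two classes."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((v - mx) ** 2 for v in xs) / (nx - 1)
    vy = sum((v - my) ** 2 for v in ys) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

def top_genes(expr, labels, k):
    """expr: rows = genes, columns = samples; labels: 0/1 class per sample."""
    scores = []
    for g, row in enumerate(expr):
        xs = [v for v, c in zip(row, labels) if c == 0]
        ys = [v for v, c in zip(row, labels) if c == 1]
        scores.append((abs(welch_t(xs, ys)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

expr = [
    [1.0, 1.1, 0.9, 9.0, 9.1, 8.9],  # gene 0: strongly differential
    [5.0, 4.0, 6.0, 5.0, 6.0, 4.0],  # gene 1: no class difference
]
labels = [0, 0, 0, 1, 1, 1]
selected = top_genes(expr, labels, k=1)  # [0]
```

In a MapReduce/Spark setting, each gene's statistic is computed independently, which is what makes this filter embarrassingly parallel.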

    GROWTH ANALYSIS OF Cineraria maritima PLANTS IN GREEN FAÇADE SYSTEMS: NORTHEASTERN ROMANIA CLIMATE STUDY

    Green facades are gradually gaining popularity and may become a modern architectural solution for higher microclimate quality and better urban comfort in densely populated urban areas. This study aimed to monitor the behaviour of Cineraria maritima planted in green facade systems oriented towards four cardinal points in the specific climatic conditions of northeastern Romania in order to test its adaptability and growth in this system. Comparisons were made of its behaviour between the facades of the experimental structure, and between the facades of the experimental structure and the traditional ‘planted in soil’ variant (control variant). Cineraria maritima exhibited good adaptability to vertical cultivation, maintaining its aesthetic properties throughout the growing season. All specimens that overwintered on the facades successfully survived the cold season of 2021–2022 without requiring any cutting or protection measures

    ASSESSMENT AND PREDICTION OF CARDIOVASCULAR STATUS DURING CARDIAC ARREST THROUGH MACHINE LEARNING AND DYNAMICAL TIME-SERIES ANALYSIS

    In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction, and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, nonlinear-dynamics, and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision support in clinical settings. The approach, which focuses on the integration of multiple features through machine learning techniques, is well suited to the inclusion of multiple physiologic signals. Ventricular Fibrillation (VF) is a common presenting dysrhythmia in the setting of cardiac arrest; its main treatment is defibrillation through direct-current countershock to achieve return of spontaneous circulation. However, defibrillation is often unsuccessful and may even lead to the transition of VF to more nefarious rhythms such as asystole or pulseless electrical activity. Multiple methods have been proposed for predicting defibrillation success based on examination of the VF waveform. To date, however, no analytical technique has been widely accepted. For a given desired sensitivity, the proposed model provides significantly higher accuracy and specificity than the state of the art. Notably, within the 80-90% sensitivity range, the method provides about 40% higher specificity. This means that when trained to the same level of sensitivity, the model will yield far fewer false positives (unnecessary shocks). Also introduced is a new model that predicts recurrence of arrest after a successful countershock is delivered; to date, no other work has sought to build such a model.
I validate the method by reporting multiple performance metrics calculated on (blind) test sets
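The sensitivity/specificity trade-off discussed above can be made concrete with a small sketch; the predictions below are hypothetical, not data from the reported experiments:

```python
# Sensitivity and specificity from binary shock-success predictions.
# 1 = defibrillation would succeed, 0 = it would fail (hypothetical labels).

def sensitivity_specificity(truth, pred):
    """Return (sensitivity, specificity) for paired 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred  = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, pred)  # 0.75, 2/3
```

At a fixed sensitivity, raising specificity lowers the false-positive count, which is exactly the "fewer unnecessary shocks" argument made above.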

    Assessing probabilistic reasoning in verbal-numerical and graphical-pictorial formats: An evaluation of the psychometric properties of an instrument

    Research on the graphical facilitation of probabilistic reasoning has been characterised by the effort expended to identify valid assessment tools. The authors developed an assessment instrument to compare reasoning performance when problems were presented in verbal-numerical and graphical-pictorial formats. A sample of undergraduate psychology students (n=676) who had not yet developed statistical skills solved problems requiring probabilistic reasoning. They attended universities in Spain (n=127; f=71.7%) and Italy (n=549; f=72.9%). In Italy, 173 undergraduates solved these problems under time pressure; the remaining students solved them without time limits. Classical Test Theory (CTT) and Item Response Theory (IRT) were applied to assess the effect of the two formats and to evaluate criterion and discriminant validity. The instrument showed acceptable psychometric properties, providing preliminary evidence of validity

    Learning programming via worked-examples: the effects of cognitive load and learning styles

    This research explored strategies for learning programming via worked examples that promote schema acquisition and transfer. Learning style is a factor in how much effort learners are willing to expend on understanding worked examples, with active learners tending to be more impatient with them than reflective learners. It was hypothesised that these two learning styles might also interact with learners' cognitive load. The research proposed a worked-example format, called the Paired-method strategy, that combines a Structure-emphasising strategy with a Completion strategy. An experiment was conducted to compare the effects of the three worked-example strategies on cognitive load measures and on learning performance. The experiment also examined the degree to which individual learning style influenced the learning process and performance. Overall, the results of the experiment were inconsistent. In comparing the effects of the three strategies, there were significant differences in reported difficulty and effort during the learning phase, with difficulty (but not effort) favouring the Completion strategy. However, no significant differences were detected in reported mental effort during the post-tests in the transfer phase; this was also the case for performance on the post-tests. Concerning efficiency measures, the results revealed significant differences between the three strategy groups in terms of the learning process and task involvement, with the learning process favouring the Completion strategy. Unexpectedly, no significant differences were observed in learning-outcome efficiencies. Despite this, a trend in the data suggested a partial reversal effect for the Completion strategy. Moreover, the results partially replicated earlier findings on the explanation effect.
In comparing the effects of the two learning styles, there were no significant differences between active and reflective learners in the three strategy groups on cognitive load measures or on learning performance (nor between reflective learners in the Paired-method strategy and the other strategies). Finally, concerning efficiency measures, there was a significant difference between active learners in the three strategy groups on task involvement. Despite all this, effect sizes ranging from medium to large suggested that learning styles might have interacted with learners' cognitive load
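The "efficiency measures" mentioned above are often computed in the cognitive-load literature as E = (z_performance − z_effort) / √2 (the Paas-style instructional-efficiency index). The sketch below assumes that style of measure; the thesis may define its own variants (e.g. for task involvement), and the scores here are invented:

```python
# Sketch of a Paas-style instructional-efficiency index:
# E = (z_performance - z_effort) / sqrt(2), per learner.
import math

def zscores(xs):
    """Standardize a list of scores (sample standard deviation)."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]

def efficiency(performance, effort):
    """Positive E: high performance for low effort; negative: the reverse."""
    return [(zp - ze) / math.sqrt(2)
            for zp, ze in zip(zscores(performance), zscores(effort))]

scores = efficiency([55, 70, 85], [8, 6, 4])  # hypothetical learner data
```

The third learner (high performance, low effort) gets the highest efficiency, the first the lowest.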

    Topics in Complex and Large-scale Data Analysis

    The past few decades have witnessed the rapid development of modern technologies. As a result, the data collected from these technologies are evolving toward more complicated structures and larger scales, driving traditional data analysis methods to develop and adapt. In this dissertation, we study three statistical issues arising in data with complicated structure and/or at large scale. In Chapter 2, we propose a Bayesian framework via exponential random graph models (ERGM) to estimate the model parameters and network structures for networks with measurement errors; in Chapter 3, we design a novel network sampling algorithm for large-scale networks with community structure; in Chapter 4, we introduce a proper framework for conducting a discrete large-scale hypothesis testing procedure based on the local false discovery rate (FDR). The performance of our procedures is evaluated through various simulations and real applications, and the necessary theoretical properties are carefully studied as well
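For context on the large-scale testing theme, here is the classical Benjamini-Hochberg step-up procedure, a standard FDR baseline. The dissertation's discrete local-FDR procedure is not reproduced here, since its definition is not given in the abstract:

```python
# Classical Benjamini-Hochberg step-up procedure: reject the hypotheses with
# the k smallest p-values, where k is the largest rank r such that
# p_(r) <= alpha * r / m.

def benjamini_hochberg(pvals, alpha=0.05):
    """Return the (sorted) indices of rejected hypotheses at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])

rejected = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
# rejected == [0, 1]: only the two smallest p-values clear their thresholds
```

Local-FDR methods refine this idea by estimating, per observed statistic, the posterior probability that the hypothesis is null.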