
    Data reliability in complex directed networks

    The availability of data from many different sources and fields of science has made it possible to map out an increasing number of networks of contacts and interactions. However, quantifying how reliable these data are remains an open problem. From biology to sociology and economics, the identification of false positives and missing interactions has become a problem that calls for a solution. In this work we extend one of the newest, best-performing models, due to Guimera and Sales-Pardo (2009), to directed networks. The new methodology is able to identify missing and spurious directed interactions, which makes it particularly useful for analyzing data reliability in systems such as trophic webs, gene regulatory networks, communication patterns and social systems. We also show, using real-world networks, how the method can be employed to help search for new interactions in an efficient way.
    Comment: Submitted for publication
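
    As a rough illustration of the kind of link-reliability scoring described above, the sketch below scores directed node pairs with a simplified stochastic block model in which the node partition is assumed to be given in advance; the published Guimera-Sales-Pardo approach instead samples over many partitions. The toy network, the group labels, and the Beta(1,1) smoothing are assumptions made only for this example.

        # Minimal sketch of block-model-based link reliability for a directed network.
        # Assumes a *fixed* node partition; the published method samples many
        # partitions, so this only illustrates the scoring idea.
        from itertools import product

        edges = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 3)}      # observed directed links (toy data)
        partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}   # hypothetical group labels

        nodes = sorted(partition)
        groups = sorted(set(partition.values()))

        # Count observed links and possible ordered pairs between every pair of groups.
        links = {(r, s): 0 for r, s in product(groups, groups)}
        pairs = {(r, s): 0 for r, s in product(groups, groups)}
        for i, j in product(nodes, nodes):
            if i == j:
                continue
            key = (partition[i], partition[j])
            pairs[key] += 1
            links[key] += (i, j) in edges

        def reliability(i, j):
            """Posterior mean link probability between i's and j's groups (Beta(1,1) prior)."""
            r, s = partition[i], partition[j]
            return (links[(r, s)] + 1) / (pairs[(r, s)] + 2)

        # Low scores on observed links suggest spurious interactions;
        # high scores on absent links suggest missing interactions.
        for i, j in product(nodes, nodes):
            if i != j:
                status = "observed" if (i, j) in edges else "absent"
                print(f"{i}->{j} ({status}): reliability {reliability(i, j):.2f}")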

    Reliability of recall in agricultural data

    Despite the importance of agriculture to economic development, and a vast accompanying literature on the subject, little research has been done on the quality of the underlying data. Due to survey logistics, agricultural data are usually collected by asking respondents to recall the details of events from past agricultural seasons, often a number of months prior to the interview. This time gap can lead to recall bias in reported data on agricultural activities. The problem is further complicated when interviews are conducted over the course of several months, leading to recall periods of variable length. To test for such recall bias, the length of time between harvest and interview is examined for three African countries with respect to several common agricultural input and harvest measures. The analysis shows little evidence of recall bias affecting data quality. There is some indication that more salient events are less subject to recall decay. Overall, the results allay some concerns about the quality of some types of agricultural data collected through recall over lengthy periods.
    Topics: Crops & Crop Management Systems, Educational Sciences, Rural Development Knowledge & Information Systems, Regional Economic Development, Rural Poverty Reduction
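
    A hedged sketch of the sort of recall-bias check alluded to above: regress a reported harvest measure on the number of months between harvest and interview and inspect the slope. The synthetic data, variable names, and the plain OLS specification are illustrative assumptions, not the paper's actual estimation strategy.

        # Illustrative recall-bias check: does the reported value drift with recall lag?
        # Synthetic data and a plain OLS fit; the study's actual specification differs.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 500
        recall_months = rng.integers(1, 13, size=n)         # months between harvest and interview
        true_log_harvest = rng.normal(6.0, 0.5, size=n)      # latent "true" log harvest (kg)
        # Hypothetical decay: each extra month of recall shaves a little off the report.
        reported_log_harvest = true_log_harvest - 0.01 * recall_months + rng.normal(0, 0.2, size=n)

        # OLS of reported value on recall length (intercept + slope).
        X = np.column_stack([np.ones(n), recall_months])
        beta, *_ = np.linalg.lstsq(X, reported_log_harvest, rcond=None)
        resid = reported_log_harvest - X @ beta
        se = np.sqrt(np.sum(resid**2) / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])

        print(f"recall-length slope: {beta[1]:.4f} (s.e. {se:.4f})")
        # A slope indistinguishable from zero is consistent with "little evidence of recall bias".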

    Assessment of SVM Reliability for Microarray Data Analysis

    The goal of our research is to provide techniques that can assess and validate the results of SVM-based analysis of microarray data. We present preliminary results on the effect of mislabeled training samples. We conducted several systematic experiments on artificial and real medical data using SVMs, systematically flipping the labels of a fraction of the training data. We show that a relatively small number of mislabeled examples can dramatically decrease the performance, as visualized on ROC graphs. This phenomenon persists even if the dimensionality of the input space is drastically decreased, for example by using feature selection. Moreover, we show that for SVM recursive feature elimination, even a small fraction of mislabeled samples can completely change the resulting set of genes. This work is an extended version of the previous paper [MBN04].
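
    The following sketch reproduces the label-flipping experiment in spirit: train an SVM on synthetic data with a growing fraction of flipped training labels and track ROC AUC on a clean test set. The dataset parameters, kernel choice, and flip fractions are arbitrary choices made for illustration, not the settings used in the paper.

        # Sketch of the label-flipping experiment: how does SVM test performance
        # degrade as a fraction of training labels is corrupted? (synthetic data)
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(42)
        X, y = make_classification(n_samples=400, n_features=200, n_informative=20, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

        for flip_frac in (0.0, 0.05, 0.10, 0.20, 0.30):
            y_noisy = y_tr.copy()
            idx = rng.choice(len(y_noisy), size=int(flip_frac * len(y_noisy)), replace=False)
            y_noisy[idx] = 1 - y_noisy[idx]                   # flip the selected training labels

            clf = SVC(kernel="linear").fit(X_tr, y_noisy)
            auc = roc_auc_score(y_te, clf.decision_function(X_te))  # ROC AUC on the clean test set
            print(f"flipped {flip_frac:4.0%} of training labels -> test AUC {auc:.3f}")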

    Big Data and Reliability Applications: The Complexity Dimension

    Big data features not only large volumes of data but also data with complicated structures. Complexity imposes unique challenges in big data analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an extensive discussion of the opportunities and challenges in big data and reliability, and described engineering systems that can generate big data suitable for reliability analysis. Meeker and Hong (2014) focused on large-scale system operating and environment data (i.e., high-frequency multivariate time series data), and provided examples of how to link such data as covariates to traditional reliability responses such as time to failure, time to recurrence of events, and degradation measurements. This paper extends that discussion by focusing on how to use data with complicated structures in reliability analysis. Such data types include high-dimensional sensor data, functional curve data, and image streams. We first review recent developments in these directions, and then discuss how analytical methods can be developed to tackle the challenging aspects that arise from the complexity of big data in reliability applications. The use of modern statistical methods such as variable selection, functional data analysis, scalar-on-image regression, spatio-temporal data models, and machine learning techniques is also discussed.
    Comment: 28 pages, 7 figures
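
    As a very small illustration of linking high-frequency operating data to a reliability response, the sketch below compresses each unit's sensor stream into summary features and fits a lognormal-style regression of failure time on those features. The data generation, the choice of features, and the least-squares link are assumptions made only for this example, not methods taken from the paper.

        # Toy example: summarize each unit's sensor time series into features and
        # regress log failure time on them (a crude lognormal-AFT-style link).
        import numpy as np

        rng = np.random.default_rng(1)
        n_units, n_obs = 50, 1000

        # Simulated high-frequency load signal per unit; heavier average load fails sooner.
        loads = rng.gamma(shape=2.0, scale=1.0, size=(n_units, n_obs))
        mean_load = loads.mean(axis=1)
        peak_load = loads.max(axis=1)
        log_ttf = 8.0 - 1.5 * mean_load - 0.1 * peak_load + rng.normal(0, 0.3, size=n_units)

        # Least-squares fit of log(time to failure) on the two summary covariates.
        X = np.column_stack([np.ones(n_units), mean_load, peak_load])
        beta, *_ = np.linalg.lstsq(X, log_ttf, rcond=None)
        print("intercept, mean-load, peak-load coefficients:", np.round(beta, 3))

        # Predicted median time to failure for a new unit with a hypothetical load summary.
        x_new = np.array([1.0, 2.2, 9.0])
        print("predicted median TTF:", np.exp(x_new @ beta))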

    Software reliability experiments data analysis and investigation

    The objectives are to investigate the fundamental reasons why independently developed software programs fail dependently, and to examine fault-tolerant software structures which maximize reliability gain in the presence of such dependent failure behavior. The authors used 20 redundant programs from a software reliability experiment to analyze the software errors causing coincident failures, to compare the reliability of N-version and recovery block structures composed of these programs, and to examine the impact of diversity on software reliability using subpopulations of these programs. The results indicate that both conceptually related and unrelated errors can cause coincident failures, and that recovery block structures offer more reliability gain than N-version structures if acceptance checks that fail independently of the software components are available. The authors present a theory of general program checkers that have potential application to acceptance tests.
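
    To make the comparison concrete, here is a hedged Monte Carlo sketch contrasting a 3-version majority vote with a simplified recovery block (primary, acceptance test, backup) whose acceptance test fails independently of the components. The failure probabilities and independence assumptions are illustrative placeholders; the paper's analysis is instead based on observed coincident failures in 20 real programs.

        # Monte Carlo sketch: 3-version majority voting vs. a recovery block
        # (primary + acceptance test + backup), under assumed failure probabilities.
        import numpy as np

        rng = np.random.default_rng(7)
        trials = 200_000
        p_version = 0.05          # assumed per-version failure probability (independent here)
        p_accept_miss = 0.02      # acceptance test wrongly accepts a faulty result
        p_accept_reject = 0.02    # acceptance test wrongly rejects a correct result

        # N-version programming: the system fails when at least 2 of 3 versions fail together.
        fails = rng.random((trials, 3)) < p_version
        nvp_fail = (fails.sum(axis=1) >= 2).mean()

        # Recovery block: the primary runs and is checked; on rejection the backup's
        # result is used (simplification: the backup is not re-checked here).
        primary_bad = rng.random(trials) < p_version
        backup_bad = rng.random(trials) < p_version
        accept_primary = np.where(primary_bad,
                                  rng.random(trials) < p_accept_miss,     # faulty result slips through
                                  rng.random(trials) >= p_accept_reject)  # correct result accepted
        rb_fail = np.where(accept_primary, primary_bad, backup_bad).mean()

        print(f"N-version (2-of-3) failure rate: {nvp_fail:.4f}")
        print(f"Recovery block failure rate:     {rb_fail:.4f}")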

    Probabilistic estimation of microarray data reliability and underlying gene expression

    Background: The availability of high-throughput methods for measurement of mRNA concentrations makes the reliability of conclusions drawn from the data and global quality control of samples and hybridization important issues. We address these issues by an information-theoretic approach, applied to discretized expression values in replicated gene expression data. Results: Our approach yields a quantitative measure of two important parameter classes: first, the probability $P(\sigma | S)$ that a gene is in the biological state $\sigma$ in a certain variety, given its observed expression $S$ in the samples of that variety; second, sample-specific error probabilities which serve as consistency indicators of the measured samples of each variety. The method and its limitations are tested on gene expression data for developing murine B-cells, and a t-test is used as reference. On a set of known genes it performs better than the t-test despite the crude discretization into only two expression levels. The consistency indicators, i.e. the error probabilities, correlate well with variations in the biological material and thus prove efficient. Conclusions: The proposed method is effective in determining differential gene expression and sample reliability in replicated microarray data. Already at two discrete expression levels in each sample, it gives a good explanation of the data and is comparable to standard techniques.
    Comment: 11 pages, 4 figures
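
    A minimal numerical sketch of the kind of quantity described: with binary discretized expression calls and assumed per-sample error probabilities, compute the posterior $P(\sigma | S)$ that a gene is truly "up" given its observed calls across replicates. The prior, the error rates, and the toy calls are placeholders; the paper infers the sample-specific error probabilities from the data itself rather than assuming them.

        # Sketch: posterior probability that a gene is in state sigma = "up"
        # given binary calls S across replicate samples, with per-sample error rates.
        import numpy as np

        calls = np.array([1, 1, 0, 1])              # observed discretized expression in 4 replicates
        error = np.array([0.05, 0.10, 0.30, 0.05])  # assumed per-sample error probabilities
        prior_up = 0.5                              # flat prior on the biological state

        # Likelihood of the observed calls under each true state (independent errors per sample).
        lik_up = np.prod(np.where(calls == 1, 1 - error, error))
        lik_down = np.prod(np.where(calls == 0, 1 - error, error))

        posterior_up = prior_up * lik_up / (prior_up * lik_up + (1 - prior_up) * lik_down)
        print(f"P(sigma = up | S) = {posterior_up:.3f}")
        # A sample with a large error probability (here the third replicate, 0.30) contributes
        # less evidence, mirroring the role of the sample-consistency indicators described above.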