Data reliability in complex directed networks
The availability of data from many different sources and fields of science
has made it possible to map out an increasing number of networks of contacts
and interactions. However, quantifying how reliable these data are remains an
open problem. From biology to sociology and economics, the identification of
false positives and missing interactions has become a problem that calls for a
solution. In this work we extend one of the newest, best-performing models,
due to Guimerà and Sales-Pardo (2009), to directed networks. The new
methodology is able to
identify missing and spurious directed interactions, which renders it
particularly useful to analyze data reliability in systems like trophic webs,
gene regulatory networks, communication patterns and social systems. We also
show, using real-world networks, how the method can be employed to help
search for new interactions in an efficient way.
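As an illustration of the underlying idea (not the authors' implementation), the following minimal Python sketch scores directed links under a single fixed node partition; the full Guimerà and Sales-Pardo approach averages such block-model estimates over many partitions sampled by MCMC. The adjacency matrix `A` and the partition `groups` are assumed inputs.

```python
import numpy as np

def link_reliability(A, groups):
    """Single-partition approximation of block-model link reliability.

    A      : directed adjacency matrix (n x n), entries 0/1
    groups : array of group labels, one per node

    The full method averages over partitions sampled by MCMC;
    here one fixed partition is used for brevity.
    """
    n = A.shape[0]
    R = np.zeros((n, n))
    for a in np.unique(groups):
        for b in np.unique(groups):
            src = np.where(groups == a)[0]
            dst = np.where(groups == b)[0]
            links = A[np.ix_(src, dst)].sum()
            possible = len(src) * len(dst)
            if a == b:  # exclude self-loops within a group
                links -= np.trace(A[np.ix_(src, src)])
                possible = len(src) * (len(src) - 1)
            # Laplace-smoothed directed connection density a -> b
            R[np.ix_(src, dst)] = (links + 1) / (possible + 2)
    np.fill_diagonal(R, 0.0)
    return R  # R[i, j]: estimated probability that link i -> j exists

# Toy usage: a low R[i, j] where A[i, j] == 1 flags a candidate spurious
# link; a high R[i, j] where A[i, j] == 0 flags a candidate missing link.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
print(link_reliability(A, np.array([0, 0, 1, 1])))
```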
Reliability of recall in agricultural data
Despite the importance of agriculture to economic development, and a vast accompanying literature on the subject, little research has been done on the quality of the underlying data. Due to survey logistics, agricultural data are usually collected by asking respondents to recall the details of events that occurred during past agricultural seasons, a number of months prior to the interview. This gap can lead to recall bias in reported data on agricultural activities. The problem is further complicated when interviews are conducted over the course of several months, leading to recall periods of variable length. To test for such recall bias, the length of time between harvest and interview is examined for three African countries with respect to several common agricultural input and harvest measures. The analysis shows little evidence of recall bias affecting data quality. There is some indication that more salient events are less subject to recall decay. Overall, the results allay some concerns about the quality of some types of agricultural data collected through recall over lengthy periods.
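A minimal sketch of the kind of test described, using synthetic data and hypothetical variable names (`harvest_kg`, `recall_months`, `plot_ha`), not the paper's survey data: regress a reported measure on the length of the recall period with a control; a coefficient on recall length near zero is consistent with the paper's finding of little recall bias.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
# Hypothetical survey: reported harvest (kg), months elapsed between
# harvest and interview, and plot size as a control.
df = pd.DataFrame({
    "plot_ha": rng.uniform(0.2, 3.0, n),
    "recall_months": rng.integers(1, 12, n),
})
# Simulated harvest does not depend on recall length (no true bias).
df["harvest_kg"] = 800 * df["plot_ha"] + rng.normal(0, 150, n)

# If recall decay biased reports, the recall_months coefficient would
# differ significantly from zero after controlling for plot size.
fit = smf.ols("harvest_kg ~ recall_months + plot_ha", data=df).fit()
print(fit.params["recall_months"], fit.pvalues["recall_months"])
```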
Assessment of SVM Reliability for Microarray Data Analysis
The goal of our research is to provide techniques that can assess and validate the results of SVM-based analysis of microarray data. We present preliminary results on the effect of mislabeled training samples. We conducted several systematic experiments on artificial and real medical data using SVMs, systematically flipping the labels of a fraction of the training data. We show that a relatively small number of mislabeled examples can dramatically decrease performance, as visualized on ROC graphs. This phenomenon persists even if the dimensionality of the input space is drastically reduced, for example by feature selection. Moreover, we show that for SVM recursive feature elimination, even a small fraction of mislabeled samples can completely change the resulting set of genes. This work is an extended version of the previous paper [MBN04].
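The label-flipping experiment is straightforward to reproduce in outline. The sketch below uses synthetic high-dimensional data rather than the paper's microarray sets: it flips a growing fraction of training labels and tracks the test ROC AUC of a linear SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Synthetic data loosely mimicking a microarray setting:
# few samples, many features, few of them informative.
X, y = make_classification(n_samples=200, n_features=2000,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

for frac in [0.0, 0.05, 0.10, 0.20]:
    y_noisy = y_tr.copy()
    flip = rng.choice(len(y_tr), size=int(frac * len(y_tr)),
                      replace=False)
    y_noisy[flip] = 1 - y_noisy[flip]  # mislabel a fraction of training data
    clf = SVC(kernel="linear").fit(X_tr, y_noisy)
    auc = roc_auc_score(y_te, clf.decision_function(X_te))
    print(f"flipped {frac:.0%}: test ROC AUC = {auc:.3f}")
```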
Big Data and Reliability Applications: The Complexity Dimension
Big data features not only large volumes of data but also data with
complicated structures. Complexity imposes unique challenges in big data
analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an
extensive discussion of the opportunities and challenges in big data and
reliability, and described engineering systems that can generate big data that
can be used in reliability analysis. Meeker and Hong (2014) focused on large
scale system operating and environment data (i.e., high-frequency multivariate
time series data), and provided examples on how to link such data as covariates
to traditional reliability responses such as time to failure, time to
recurrence of events, and degradation measurements. This paper intends to
extend that discussion by focusing on how to use data with complicated
structures to do reliability analysis. Such data types include high-dimensional
sensor data, functional curve data, and image streams. We first review recent
developments in those directions, and then discuss how analytical methods can
be developed to tackle the challenging aspects that arise from the complexity
of big data in reliability applications. The use of modern statistical methods
such as variable selection, functional data analysis, scalar-on-image
regression, spatio-temporal data models, and machine learning techniques is
also discussed.
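As one concrete instance of the variable-selection theme, the following hypothetical sketch uses the lasso to pick out, from high-dimensional sensor summaries, the few covariates that drive (log) time to failure. The data and the sensor structure are simulated assumptions, not from the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n_units, n_sensors = 300, 50
# Hypothetical per-unit summaries (e.g., means of high-frequency
# multivariate operating and environment time series).
X = rng.normal(size=(n_units, n_sensors))

# Assume only a few sensors actually drive log time to failure.
beta = np.zeros(n_sensors)
beta[:3] = [1.5, -1.0, 0.8]
log_ttf = 5.0 + X @ beta + rng.normal(0, 0.5, n_units)

# The lasso performs variable selection: most coefficients shrink to
# zero, leaving the covariates most predictive of failure time.
model = LassoCV(cv=5).fit(X, log_ttf)
print("selected sensor indices:", np.flatnonzero(model.coef_))
```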
Software reliability experiments data analysis and investigation
The objectives are to investigate the fundamental reasons why independently developed software programs fail dependently, and to examine fault-tolerant software structures that maximize the reliability gain in the presence of such dependent failure behavior. The authors used 20 redundant programs from a software reliability experiment to analyze the software errors causing coincident failures, to compare the reliability of N-version and recovery-block structures composed of these programs, and to examine the impact of diversity on software reliability using subpopulations of these programs. The results indicate that both conceptually related and unrelated errors can cause coincident failures, and that recovery-block structures offer more reliability gain than N-version structures if acceptance checks that fail independently of the software components are available. The authors also present a theory of general program checkers that have potential application to acceptance tests.
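A toy simulation can make the comparison concrete. The sketch below uses assumed failure rates rather than the experiment's measured ones: a common-cause mode injects coincident failures, and a 2-of-3 N-version vote is compared against a recovery block whose acceptance test fails independently of the components.

```python
import numpy as np

rng = np.random.default_rng(7)
trials, p_fail, p_common, p_acc_miss = 50_000, 0.05, 0.01, 0.02

# Each of 3 versions fails independently, plus a common-cause mode
# that produces coincident failures (conceptually related faults).
indep = rng.random((trials, 3)) < p_fail
common = rng.random((trials, 1)) < p_common
fails = indep | common
missed = rng.random((trials, 3)) < p_acc_miss  # acceptance-test misses

# N-version programming: the system fails when a majority (2 of 3)
# of versions fail, since the vote then selects a wrong output.
nvp_fail = fails.sum(axis=1) >= 2

def rb_fail(bad, miss):
    """Recovery block: run versions in order until one is accepted."""
    for b, m in zip(bad, miss):
        if not b:
            return False  # good output accepted: success
        if m:
            return True   # bad output slips past the check: failure
    return True           # every version rejected: failure

rb = np.array([rb_fail(b, m) for b, m in zip(fails, missed)])
print(f"N-version failure rate:      {nvp_fail.mean():.4f}")
print(f"Recovery-block failure rate: {rb.mean():.4f}")
```

With an acceptance test that rarely misses, the recovery block survives most coincident two-version failures that defeat the majority vote, in line with the abstract's conclusion.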
Probabilistic estimation of microarray data reliability and underlying gene expression
Background: The availability of high-throughput methods for measuring mRNA
concentrations makes the reliability of conclusions drawn from the data, and
the global quality control of samples and hybridizations, important issues. We
address these issues with an information-theoretic approach applied to
discretized expression values in replicated gene expression data.
Results: Our approach yields a quantitative measure of two important
classes of parameters. First, the probability that a gene is in a given
biological state in a certain variety, given its observed expression in the
samples of that variety. Second, sample-specific error probabilities, which
serve as consistency indicators for the measured samples of each variety.
The method and its limitations are tested on gene expression data for
developing murine B-cells, with a t-test used as a reference. On a set of
known genes the method performs better than the t-test despite the crude
discretization into only two expression levels. The consistency indicators,
i.e. the error probabilities, correlate well with variations in the biological
material and thus prove effective.
Conclusions: The proposed method is effective in determining differential
gene expression and sample reliability in replicated microarray data. Already
at two discrete expression levels per sample, it gives a good explanation
of the data and is comparable to standard techniques.
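For intuition, here is a minimal sketch of the Bayesian step implied by the abstract, with the per-sample error probabilities taken as given (the paper estimates them from the replicated data itself): binary expression calls across replicates are combined into a posterior probability that the gene is in the "on" state.

```python
import numpy as np

def posterior_on(calls, err, prior_on=0.5):
    """P(gene is in the 'on' state | discretized replicate calls).

    calls : binary expression calls (1 = above threshold) across
            replicate samples of one variety
    err   : per-sample error probabilities (a call flips w.p. err[s]);
            assumed known here, estimated from the data in the paper
    """
    calls, err = np.asarray(calls), np.asarray(err)
    # Likelihood of the observed calls under each underlying state.
    like_on = np.prod(np.where(calls == 1, 1 - err, err))
    like_off = np.prod(np.where(calls == 0, 1 - err, err))
    return (prior_on * like_on /
            (prior_on * like_on + (1 - prior_on) * like_off))

# Three replicates agree except one call from a noisier sample, so the
# posterior stays close to 1 despite the disagreement.
print(posterior_on([1, 1, 0], err=[0.05, 0.05, 0.30]))
```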