Partial Covering Arrays: Algorithms and Asymptotics
A covering array $\mathsf{CA}(N;t,k,v)$ is an $N \times k$ array with entries
in $\{1, 2, \ldots, v\}$, for which every $N \times t$ subarray contains each
$t$-tuple of $\{1, 2, \ldots, v\}^t$ among its rows. Covering arrays find
application in interaction testing, including software and hardware testing,
advanced materials development, and biological systems. A central question is
to determine or bound $\mathsf{CAN}(t,k,v)$, the minimum number $N$ of rows of
a $\mathsf{CA}(N;t,k,v)$. The well known bound
$\mathsf{CAN}(t,k,v) = O((t-1) v^t \log k)$ is not too far from being
asymptotically optimal. Sensible relaxations of the covering requirement arise
when (1) the set $\{1, 2, \ldots, v\}^t$ need only be contained among the rows
of at least $(1-\epsilon)\binom{k}{t}$ of the $N \times t$ subarrays and (2) the
rows of every $N \times t$ subarray need only contain a (large) subset of
$\{1, 2, \ldots, v\}^t$. In this paper, using probabilistic methods, significant
improvements on the covering array upper bound are established for both
relaxations, and for the conjunction of the two. In each case, a randomized
algorithm constructs such arrays in expected polynomial time.
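As an illustration of the definition (not taken from the paper), the following minimal Python sketch checks the covering property and measures the fraction of column subsets that are fully covered, which is the quantity relaxed in (1). The function names and the use of symbols 0..v-1 instead of 1..v are assumptions made for this example.

```python
from itertools import combinations

def coverage_fraction(array, t, v):
    """Fraction of t-column subsets whose rows contain all v**t tuples.

    Assumes every entry of `array` is an integer in range(v).
    """
    k = len(array[0])
    col_sets = list(combinations(range(k), t))
    fully_covered = 0
    for cols in col_sets:
        # Distinct t-tuples that appear in the chosen columns.
        seen = {tuple(row[c] for c in cols) for row in array}
        if len(seen) == v ** t:
            fully_covered += 1
    return fully_covered / len(col_sets)

def is_covering_array(array, t, v):
    """True if every t-column subset is fully covered (strength t)."""
    return coverage_fraction(array, t, v) == 1.0

# A 4-row binary array on 3 columns in which every pair of columns
# shows all four value pairs.
ca = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]
print(is_covering_array(ca, t=2, v=2))
print(coverage_fraction(ca[:3], t=2, v=2))
```

A partial covering array in the sense of relaxation (1) would only require `coverage_fraction` to be at least 1 - epsilon rather than exactly 1.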
Constructing interaction test suites with greedy algorithms
Combinatorial approaches to testing are used in several fields, and have recently gained momentum in the field of software testing through software interaction testing. One-test-at-a-time greedy algorithms are used to automatically construct such test suites. This paper discusses basic criteria of why greedy algorithms have been appropriate for this test generation problem in the past and then expands upon how greedy algorithms can be utilized to address test suite prioritization
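A one-test-at-a-time greedy construction can be sketched as follows for pairwise (2-way) interaction coverage. This is an illustrative toy, not the algorithm from the paper: it assumes the factor domains are small enough to enumerate every possible test as a candidate, and the names `covers` and `greedy_pairwise_suite` are hypothetical.

```python
from itertools import combinations, product

def covers(test, pair):
    """True if the test assigns both (factor, value) settings in the pair."""
    (i, a), (j, b) = pair
    return test[i] == a and test[j] == b

def greedy_pairwise_suite(levels):
    """Build a pairwise test suite one test at a time.

    levels[i] is the number of values factor i can take.
    """
    k = len(levels)
    # Every value pair for every pair of factors must be covered once.
    uncovered = {
        ((i, a), (j, b))
        for i, j in combinations(range(k), 2)
        for a in range(levels[i])
        for b in range(levels[j])
    }
    # Assumption: exhaustive candidate enumeration is feasible.
    candidates = list(product(*(range(n) for n in levels)))
    suite = []
    while uncovered:
        # Greedy step: pick the test covering the most uncovered pairs.
        best = max(candidates,
                   key=lambda t: sum(covers(t, p) for p in uncovered))
        suite.append(best)
        uncovered = {p for p in uncovered if not covers(best, p)}
    return suite

suite = greedy_pairwise_suite([3, 2, 2])
print(len(suite), suite)
```

Because each new test is the one that covers the most remaining pairs, the order of the resulting suite is also a natural prioritization: earlier tests cover more interactions.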
AID/APOBEC-network reconstruction identifies pathways associated with survival in ovarian cancer
Background: Building up of pathway-/disease-relevant signatures provides a
persuasive tool for understanding the functional relevance of gene alterations
and gene network associations in multifactorial human diseases. Ovarian cancer
is a highly complex heterogeneous malignancy in respect of tumor anatomy,
tumor microenvironment including pro-/antitumor immunity and inflammation;
still, it is generally treated as single disease. Thus, further approaches to
investigate novel aspects of ovarian cancer pathogenesis aiming to provide a
personalized strategy to clinical decision making are of high priority. Herein
we assessed the contribution of the AID/APOBEC family and their associated
genes given the remarkable ability of AID and APOBECs to edit DNA/RNA, and as
such, providing tools for genetic and epigenetic alterations potentially
leading to reprogramming of tumor cells, stroma and immune cells. Results: We
structured the study by three consecutive analytical modules, which include
the multigene-based expression profiling in a cohort of patients with primary
serous ovarian cancer using a self-created AID/APOBEC-associated gene
signature, building up of multivariable survival models with high predictive
accuracy and nomination of top-ranked candidate/target genes according to
their prognostic impact, and systems biology-based reconstruction of the
AID/APOBEC-driven disease-relevant mechanisms using transcriptomics data from
ovarian cancer samples. We demonstrated that inclusion of the AID/APOBEC
signature-based variables significantly improves the clinicopathological
variables-based survival prognostication allowing significant patient
stratification. Furthermore, several of the profiling-derived variables such
as ID3, PTPRC/CD45, AID, APOBEC3G, and ID2 exceed the prognostic impact of
some clinicopathological variables. We next extended the signature-/modeling-
based knowledge by extracting top genes co-regulated with target molecules in
ovarian cancer tissues and dissected potential networks/pathways/regulators
contributing to pathomechanisms. We thereby revealed that the AID/APOBEC-
related network in ovarian cancer is particularly associated with
remodeling/fibrotic pathways, altered immune response, and autoimmune
disorders with inflammatory background. Conclusions: The present study is, to
our knowledge, the first one linking expression of the entire AID/APOBEC family and
interacting genes with clinical outcome with respect to survival of cancer
patients. Overall, the data propose a novel AID/APOBEC-derived survival model for
patient risk assessment and reconstitute mapping to molecular pathways. The
established study algorithm can be applied further for any biologically
relevant signature and any type of diseased tissue
From Correlation to Causality: Does Network Information improve Cancer Outcome Prediction?
Motivation:
Disease progression in cancer can vary substantially between patients. Yet, patients often receive the same treatment. Recently, there has been much work on predicting disease progression and patient outcome variables from gene expression in order to personalize treatment options. A widely used approach relies on high-throughput experiments to find predictive signature genes that indicate the clinical outcome of a disease. Microarray data analysis helps to reveal underlying biological mechanisms of tumor progression, metastasis, and drug resistance in cancer studies. Although the first diagnostic kits are on the market, open problems remain, such as random gene signatures and noisy expression data. Experimental or computational noise in the data and the limited number of tissue samples collected from patients may further reduce the predictive power and biological interpretability of such signature genes. Moreover, signature genes predicted by different studies generally show little similarity, even for the same type of cancer.
Integration of network information with gene expression data could provide more efficient signatures for outcome prediction in cancer studies. One approach to deal with these problems employs gene-gene relationships and ranks genes using the random surfer model of Google's PageRank algorithm. Unfortunately, the majority of published network-based approaches tested their methods on only a small number of datasets, which calls into question the general applicability of network-based methods for outcome prediction.
Methods:
In this thesis, I provide a comprehensive and systematic evaluation of a network-based outcome prediction approach (NetRank, a PageRank derivative) applied to several types of gene expression cancer data and four different types of networks. The algorithm identifies a signature gene set for a specific cancer type by incorporating gene network information with given expression data. To assess the performance of NetRank, I created a benchmark dataset collection comprising 25 cancer outcome prediction datasets from the literature and one in-house dataset.
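The general idea of a PageRank-style gene ranking can be sketched as below. This is not the thesis implementation: it assumes an undirected gene network, a per-gene relevance score (for example, the absolute correlation of expression with outcome) used as the restart distribution, and a damping parameter `d`. All names and the toy data are hypothetical.

```python
def network_rank(adjacency, scores, d=0.5, iterations=100):
    """Personalized PageRank-style ranking on an undirected network.

    adjacency: dict gene -> list of neighbouring genes (symmetric).
    scores:    dict gene -> non-negative relevance score.
    d:         weight given to the network relative to the scores.
    """
    genes = list(adjacency)
    total = sum(scores[g] for g in genes)
    # Normalised scores act as the random surfer's restart distribution.
    prior = {g: scores[g] / total for g in genes}
    rank = dict(prior)
    for _ in range(iterations):
        new_rank = {}
        for g in genes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[n] / len(adjacency[n]) for n in adjacency[g])
            new_rank[g] = (1 - d) * prior[g] + d * incoming
        rank = new_rank
    return rank

# Toy network: hub gene A is linked to B, C and D.
adjacency = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
scores = {"A": 0.1, "B": 0.5, "C": 0.3, "D": 0.1}
ranks = network_rank(adjacency, scores, d=0.5)
signature = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(ranks, signature)
```

In this toy example the hub gene gains rank from its neighbours even though its own score is low, which is the effect a network-based ranking is meant to capture.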
Results:
NetRank performs significantly better than classical methods such as fold change or t-test, improving the prediction performance by 7% on average. In addition, the approach comes close to the accuracy level of the authors' signatures while applying a relatively unbiased but fully automated process for biomarker discovery. Despite an order of magnitude difference in network size, a regulatory, a protein-protein interaction and two predicted networks perform equally well.
Signatures as published by the authors and the signatures generated with classical methods do not overlap, not even for the same cancer type, whereas the network-based signatures strongly overlap. I analyze and discuss these overlapping genes in terms of the Hallmarks of cancer and in particular single out six transcription factors and seven proteins and discuss their specific role in cancer progression. Furthermore, several tests are conducted for the identification of a Universal Cancer Signature. No Universal Cancer Signature could be identified so far, but a cancer-specific combination of general master regulators with specific cancer genes could be discovered that achieves the best results for all cancer types.
Because NetRank offers great value for cancer outcome prediction, first steps towards its secure use in a public cloud are described.
Conclusion:
Experimental evaluation of network-based methods on a gene expression benchmark dataset suggests that these methods are especially suited for outcome prediction as they overcome the problems of random gene signatures and noisy expression data. Through the combination of network information with gene expression data, network-based methods identify highly similar signatures over all cancer types, in contrast to classical methods that fail to identify highly common gene sets across the same cancer types.
In general, the integration of additional information into gene expression analysis allows the identification of more reliable, accurate and reproducible biomarkers and provides a deeper understanding of processes occurring in cancer development and progression.
1 Definition of Open Problems
2 Introduction
2.1 Problems in cancer outcome prediction
2.2 Network-based cancer outcome prediction
2.3 Universal Cancer Signature
3 Methods
3.1 NetRank algorithm
3.2 Preprocessing and filtering of the microarray data
3.3 Accuracy
3.4 Signature similarity
3.5 Classical approaches
3.6 Random signatures
3.7 Networks
3.8 Direct neighbor method
3.9 Dataset extraction
4 Performance of NetRank
4.1 Benchmark dataset for evaluation
4.2 The influence of NetRank parameters
4.3 Evaluation of NetRank
4.4 General findings
4.5 Computational complexity of NetRank
4.6 Discussion
5 Universal Cancer Signature
5.1 Signature overlap – a sign for Universal Cancer Signature
5.2 NetRank genes are highly connected and confirmed in literature
5.3 Hallmarks of Cancer
5.4 Testing possible Universal Cancer Signatures
5.5 Conclusion
6 Cloud-based Biomarker Discovery
6.1 Introduction to secure Cloud computing
6.2 Cancer outcome prediction
6.3 Security analysis
6.4 Conclusion
7 Contributions and Conclusion
Boundary Images
How are images made, and how should we understand the capacities of digital images? This book investigates images as well as the technologies that host them. Its three chapters discuss the boundaries that images cross and blur between humans, machines, and nature and the ways in which images are political, material, and visual. Exploring these boundaries of images, this book places itself at the limits of the visual and beyond what can be seen, understanding these as starting points for the production of new and radically different ways of knowing about the world and its becomings
Coastal aquaculture and resources management in the Mecoacan estuary, Tabasco, Mexico
By dealing with aspects of coastal aquaculture and resources management, an analysis is herein presented at the macro-scale using GIS techniques for the coastal zone of Tabasco state, and at the micro-scale with the description of the characteristics of a coastal community located in the Mecoacan estuary.
Transfer of appropriate aquaculture technologies and introduction of sustainable farming systems are major challenges. The total area identified for aquaculture development through the GIS modelling accounted for 23 462 ha, 80% of which were located in the Centla Biosphere Reserve (Centla and Macuspana). The suitable area identified through the multi-criteria evaluation provided a structure in which requirements for aquaculture development could be met.
An analysis of the farming systems in the Mecoacan estuary was carried out to understand local attitudes, capabilities and processes and evaluate whether the potential identified by the GIS modelling can be realised. The results from participatory assessments showed that conditions within Mecoacan cooperatives have deteriorated and increasing interest in restructuring the organisations is regarded as a means of integrating employment and income generation alternatives such as aquaculture practices, to support and improve current levels of fisheries production, and to achieve gains in market development.
The analysis of the economics of Mecoacan fishermen suggests that rural problems have not yet been engaged in progressive policies. It seems that previous forms of governance have been maintained to shore up power instead of laying the groundwork for viable rural production, as it is clear that some fishermen are competitive while others are not, regardless of whether or not they are associated in cooperatives.
The large-scale exploitation of resources, degradation of the environment and increased conflict over resources in coastal communities suggest the need for an integrated multi-sectoral approach. A strategy towards integrated coastal management for the Tabasco coastal zone is discussed, including aspects related to aquaculture development
Advances in Evolutionary Algorithms
With the recent trends towards massive data sets and significant computational power, combined with evolutionary algorithmic advances, evolutionary computation is becoming much more relevant to practice. The aim of the book is to present recent improvements, innovative ideas and concepts in a part of the huge EA field
Earth Observation Open Science and Innovation
geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; science in society; data science
The Effectiveness of t-Way Test Data Generation
Modern society is increasingly dependent on the correct functioning of software and increasingly so in areas that are considered safety related or safety critical. Therefore, there is an increasing need to be able to verify and validate that the software is in fact correct and will perform its intended function. Many approaches to this problem have been proposed; however, none seems likely to supplant the role of testing in the near future.
If we accept that there is, and will be, a continuing need to be able to test software then the question becomes one of how can this be done effectively, both in terms of ability to detect errors and in terms of cost. One avenue of research that offers prospects of improving both of these aspects is the automatic generation of test data.
There has recently been a large amount of work conducted in this area. One particularly promising direction has been the application of ideas from the field of experimental design and in particular, the field of t-way adequate factorial designs.
The area, however, is not without issues; there is evidence that the technique is capable of detecting errors but that evidence is not unequivocal. Moreover, as with almost all work in the area of automatic test generation, there has been very little work comparing the technique with other test data generation techniques. Worse, there has been effectively no work done that compares any automatic test data generation technique with the effectiveness of tests generated by humans. Another major issue with the technique is the number of tests that applying the technique can result in. This implies that there is a need for an automated oracle if the technique is to be successfully applied. The flaw with this is of course that in most situations the oracle is the human that is conducting the tests, a point often ignored in testing research.
The work presented here addresses both of these points. To do this I have used a code base taken from an industrial engine control system that has an existing set of high quality unit tests developed by hand. To complement this, several other techniques for automatically generating test data have been applied, namely random testing, random experimental designs and a technique for generating single factor experiments. To address the issue of being able to compare the error detection ability of all of the sets of test vectors, rather than relying on the usual effectiveness surrogate of code coverage, I have used mutation analysis on the code base to directly measure the ability of each set of test vectors to discover common coding errors. The results presented here show that test data generation techniques based on t-way factorial designs are at least as effective as hand-generated tests and superior to random testing and the single factor experimental technique.
The oracle problem associated with the factorial design techniques was addressed using a test set minimisation approach. The mutation tool monitored which vectors could “kill” which code mutants. After a subset of the test vectors had been run, the most effective vectors were retained and the rest discarded. Likewise, mutants that were killed were removed from further consideration and the process repeated. Experimental results show that this minimisation procedure is effective at reducing computational overhead and is capable of producing final sets of test vectors that are comparable in size with the sets of hand-generated tests and so amenable to final hand checking
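The minimisation loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the tool used in the work: the kill information is given up front as a dictionary, vectors are processed in fixed-size batches, and a vector is retained only if it kills at least one mutant that is still alive. The names `minimise_test_set`, `kills` and `batch_size` are hypothetical.

```python
def minimise_test_set(kills, batch_size=3):
    """Greedy, batch-wise test set minimisation driven by mutant kills.

    kills: dict mapping a test vector id to the set of mutant ids it kills.
    Returns the retained vector ids and the set of killed mutants.
    """
    vectors = list(kills)
    killed = set()      # mutants removed from further consideration
    retained = []       # vectors kept for final hand checking
    for start in range(0, len(vectors), batch_size):
        batch = vectors[start:start + batch_size]
        while batch:
            # Most effective vector against mutants that are still alive.
            best = max(batch, key=lambda v: len(kills[v] - killed))
            if not kills[best] - killed:
                break   # the rest of the batch adds nothing and is discarded
            retained.append(best)
            killed |= kills[best]
            batch.remove(best)
    return retained, killed

# Toy kill data: six test vectors, four mutants.
kills = {
    "v1": {1, 2},
    "v2": {2},
    "v3": {1, 2, 3},
    "v4": {3},
    "v5": {4},
    "v6": set(),
}
retained, killed = minimise_test_set(kills, batch_size=3)
print(retained, killed)
```

On this toy data only two of the six vectors are retained while all four mutants are still killed, which mirrors the goal of producing a set small enough for hand checking.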