86 research outputs found
Three-way optimisation of response time, subtask dispersion and energy consumption in split-merge systems
This paper investigates various ways in which the triple trade-off metrics between task response time, subtask dispersion and energy can be improved in split-merge queueing systems. Four ideas, namely dynamic subtask dispersion reduction, state-dependent service times, multiple redundant subtask service servers and restarting subtask service, are examined in the paper. It transpires that all four techniques can be used to improve the triple trade-off, while combinations of the techniques are not necessarily beneficial
A survey of parallel request-processing systems
This paper is the first in a series of two articles reviewing "fork-join" queueing systems (in the Western classification), also known as systems with splitting of incoming queries. Such a system is a natural model for many real systems. The article describes the construction of the fork-join queueing model and its main characteristics. Special attention is paid to methods for analysing the response time of the system. Since an exact expression for the mean response time is known only for the case of two servers, the article gives a detailed description of the approach used to obtain it. For more than two servers, only approximations of the mean response time are available, obtained by various methods; the difficulty stems from the dependence between the subquery queues caused by their common arrival moments. The paper presents several methods of approximate analysis: various forms of empirical approximation, i.e. methods that refine the obtained characteristics using the results of simulation modelling, and interpolation methods that use limiting values of the system load when the arrival flow and service time distributions are not exponential
A survey of parallel request-processing systems. Part II
This paper continues the survey of "fork-join" queueing systems (in the Western classification), also known as systems with splitting of queries. Interest in such systems is explained by the wide range of problems they can help solve, since they essentially model the parallel processing of data and its applications. Examples include the analysis of disk arrays, cloud computing, high-performance services and even order picking in a warehouse. The first part of the survey introduced the main features of the model (and of related systems) and its construction. It also gave a detailed description of the approach to obtaining an exact expression for the mean response time in the case of two devices, together with several methods of approximate analysis of this characteristic when the number of devices is greater than two. This part of the survey describes other existing methods for approximating the mean response time. In particular, the approaches considered are the matrix-geometric method and analysis based on order statistics for various types of distribution of the subquery service time
Corridor Location: Generating Competitive and Efficient Route Alternatives
The problem of transmission line corridor location can be considered, at best, a "wicked" public systems decision problem. It requires the consideration of numerous objectives while balancing the priorities of a variety of stakeholders, and designers should be prepared to develop diverse non-inferior route alternatives that must be defensible under the scrutiny of a public forum. Political elements aside, the underlying geographical computational problems that must be solved to provide a set of high-quality alternatives are no easier, as they require solving difficult spatial optimization problems on massive GIS terrain-based raster data sets.
Transmission line siting methodologies have previously been developed to guide designers in this endeavor, but close scrutiny of these methodologies shows that their approaches have many shortcomings. The main goal of this dissertation is to take a fresh look at the process of corridor location and to develop a set of algorithms that compute path alternatives on a foundation of solid geographical theory, in order to offer designers better tools for developing quality alternatives that consider the entire spectrum of viable solutions. Just as importantly, as data sets become increasingly massive and present challenging computational elements, algorithms must be efficient and able to take advantage of parallel computing resources.
A common approach to simplifying a problem with numerous objectives is to combine the cost layers into a composite, a priori weighted, single-objective raster grid. This dissertation examines new methods for determining a spatially diverse set of near-optimal alternatives, and develops parallel computing techniques for brute-force near-optimal path enumeration, as well as more elegant methods that take advantage of the hierarchical structure of the underlying path-tree computation to select sets of spatially diverse near-optimal paths.
Another approach to corridor location is to consider all objectives simultaneously and determine the set of Pareto-optimal solutions between them. This amounts to solving a discrete multi-objective shortest path problem, which is considered NP-hard when the full set of non-inferior solutions must be computed. Given the difficulty of solving for the complete Pareto-optimal set, this dissertation develops an approximation heuristic that computes nearly optimal path sets in a fraction of the time required by exact algorithms. This method is then applied as an upper bound to an exact enumerative approach, resulting in significant performance speedups. As analytic computing continues to move toward distributed clusters, it is important to optimize algorithms to take full advantage of parallel computing. To that end, this dissertation develops a scalable parallel framework that efficiently solves for the supported/convex solutions of a biobjective shortest path problem. This framework is equally applicable to other biobjective network optimization problems, providing a powerful tool for solving the next generation of location analysis and geographical optimization models
Combining automated processing and customized analysis for large-scale sequencing data
Extensive application of high-throughput methods in life sciences has brought substantial new challenges for data analysis. Often many different steps have to be applied to a large number of samples. Here, workflow management systems support scientists through the automated execution of corresponding large analysis workflows. The first part of this cumulative dissertation concentrates on the development of Watchdog, a novel workflow management system for the automated analysis of large-scale experimental data. Watchdog's main features include straightforward processing of replicate data, support for distributed computer systems, customizable error detection and manual intervention into workflow execution. A graphical user interface enables workflow construction using a pre-defined toolset without programming experience, and a community sharing platform allows scientists to share toolsets and workflows efficiently. Furthermore, we implemented methods for resuming execution of interrupted or partially modified workflows and for automated deployment of software using package managers and container virtualization.
Using Watchdog, we implemented default analysis workflows for typical types of large-scale biological experiments, such as RNA-seq and ChIP-seq. Although they can be easily applied to new datasets of the same type, at some point such standard workflows reach their limit and customized methods are required to resolve specific questions. Hence, the second part of this dissertation focuses on combining standard analysis workflows with the development of application-specific novel bioinformatics approaches to address questions of interest to our biological collaboration partners. The first study concentrates on identifying the binding motif of the ZNF768 transcription factor, which consists of two anchor regions connected by a variable linker region. As standard motif finding methods detected only the anchors of the motifs separately, a custom method was developed for determining the spaced motif with the linker region. The second study focused on the effect of CDK12 inhibition on transcription. Results obtained from standard RNA-seq analysis indicated substantial transcript shortening upon CDK12 inhibition. We thus developed a new measure to quantify the degree of transcript shortening. In addition, a customized meta-gene analysis framework was developed to model RNA polymerase II progression using ChIP-seq data. This revealed that CDK12 inhibition causes an RNA polymerase II processivity defect resulting in the detected transcript shortening.
In summary, the methods developed in this thesis both represent general contributions to large-scale sequencing data analysis and serve to resolve specific questions regarding transcription factor binding and regulation of elongating RNA Polymerase II
Mapping the Genomic Context of Mutagenesis
The accumulation of genomic mutations leads to the formation of cancer. For this reason, many efforts have been undertaken to characterise mutational processes in terms of their genomic imprints. A particularly successful approach is matrix-based mutational signature analysis, which identifies prototypical mutation patterns by applying non-negative matrix factorisation to catalogues of single nucleotide variants and other mutation types. However, mutagenesis is a multifaceted event that is affected by the genomic organisation of DNA and by cellular processes such as transcription, replication, and DNA repair. Moreover, since many mutational processes also generate characteristic multi-nucleotide variants, insertions and deletions, and structural variants, it appears valuable to jointly deconvolve broader mutational catalogues to better understand the complex nature of mutagenesis.
In this thesis, I present TensorSignatures, an algorithm to learn mutational signatures jointly across different variant categories as well as their genomic localisation and properties. The analysis of 2,778 primary and 3,824 metastatic cancer genomes of the PCAWG consortium and the HMF cohort shows that practically all signatures operate dynamically in response to various genomic and epigenomic states. The analysis pins differential spectra of UV mutagenesis found in active and inactive chromatin to global genome nucleotide excision repair. TensorSignatures accurately characterises transcription-associated mutagenesis, which is detected in 7 different cancer types. The algorithm also extracts distinct signatures of replication- and double-strand break repair-driven mutagenesis by APOBEC3A and 3B with differential numbers and length of mutation clusters. As a fourth example, TensorSignatures reproduces a signature of somatic hypermutation generating highly clustered variants around the transcription start sites of active genes in lymphoid leukaemia, distinct from a more general and less clustered signature of Polη-driven translesion synthesis found in a broad range of cancer types. Finally, I demonstrate TensorSignatures' utility by applying it to multiple datasets in various collaboration projects.
Taken together, TensorSignatures adds great detail and refines mutational signature analysis by jointly learning mutation patterns and their genomic determinants. This sheds light on the manifold influences that underlie mutagenesis and helps to pinpoint mutagenic influences which cannot easily be distinguished based on the mutation spectra alone. As mutational signature analysis is an essential element of the cancer genome analysis toolkit, TensorSignatures may help make the growing catalogues of mutational signatures more insightful by highlighting mutagenic mechanisms, or hypotheses thereof, to be investigated in greater depth
Nuclear Fusion Programme: Annual Report of the Association Karlsruhe Institute of Technology/EURATOM ; January 2013 - December 2013 (KIT Scientific Reports ; 7671)
The Karlsruhe Institute of Technology (KIT) is working in the framework of the European Fusion Programme on key technologies in the areas of superconducting magnets, microwave heating systems (Electron-Cyclotron-Resonance-Heating, ECRH), the deuterium-tritium fuel cycle, He-cooled breeding blankets, a He-cooled divertor and structural materials, as well as refractory metals for high heat flux applications including a major participation in the preparation of the international IFMIF project