7 research outputs found

    Selection and Implementation of a Scientific Workflow Management System for the Analysis of Next-Generation Sequencing Data

    In the transregional collaborative research center SFB/TRR 77, scientists from Heidelberg and Hannover are investigating the mechanisms underlying the development of hepatocellular carcinoma, one of the deadliest tumor diseases of our time, as well as new therapeutic approaches. The IT platform Pelican, which is part of project area Z2, is intended to provide the research consortium with software-supported analysis and sustainable provision of liver cancer research data [Ganzinger et al. 2011]. One part of Pelican is to offer a shared information platform that integrates the biomedical data of the various medical and biological projects and provides the participating project groups with biostatistical programs and cross-project analyses. The integration of tissue, molecular, genetic, and clinical data into a common platform enables data preservation and comprehensive analyses. By linking the different research projects of the SFB/TRR 77, this integrated analysis addresses the challenges posed by the multidisciplinarity of clinical and genetic research. Through cost reductions and immense time savings, next-generation DNA sequencing has made DNA sequencing accessible to a broad spectrum of scientists and has moved sequencing competence from central facilities into the hands of many individual researchers [Shendure and Ji 2008, Ding et al. 2010, Wetterstrand 2011]. The combination of these highly developed genetic technologies with computer-based tools allows biological questions to be answered far more comprehensively than was previously possible [Shaer et al. 2013]. The rapid development of next-generation sequencing also entails the construction of new approaches to bioinformatic data analysis, without which no gain in information, such as the discovery of gene variants, would be possible. The knowledge gained in this way can lead to considerable advances in cancer research, for example when it comes to identifying the genomic alterations of a tumor cell [Ding et al. 2010]. Instead of performing sequencing on a small scale, researchers can now carry out sequencing on a far larger scale, in which information from multiple genes and genomes can be measured, documented, and stored in databases. After sequencing, the DNA sequences are analyzed and processed in a chain of many processing steps, a bioinformatic pipeline. For the individual steps, such as alignment or the removal of duplicates, there are often many alternatives.
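    The closing sentences describe a typical bioinformatic pipeline in which alignment and duplicate removal are individual, exchangeable steps. The following is a minimal Python sketch of such a pipeline, assuming bwa and samtools are installed; the file names, the ngs_pipeline helper, and the choice of tools are illustrative only, and alternatives (e.g., Bowtie 2 for alignment or Picard MarkDuplicates for deduplication) could be substituted at each step.

        import subprocess
        from pathlib import Path

        def run(cmd):
            """Run one pipeline step, aborting the whole pipeline on failure."""
            print("->", " ".join(cmd))
            subprocess.run(cmd, check=True)

        def ngs_pipeline(reference, reads, workdir="out"):
            """Chain alignment, sorting, and duplicate removal sequentially."""
            out = Path(workdir)
            out.mkdir(exist_ok=True)
            sam = out / "aligned.sam"
            namesorted = out / "namesorted.bam"
            fixed = out / "fixmate.bam"
            possorted = out / "possorted.bam"
            dedup = out / "dedup.bam"

            # Step 1: alignment; bwa mem writes SAM to stdout.
            with open(sam, "w") as fh:
                subprocess.run(["bwa", "mem", reference, reads], stdout=fh, check=True)

            # Steps 2-5: the samtools duplicate-removal workflow: name sort,
            # fixmate -m (adds mate score tags), position sort, markdup -r.
            run(["samtools", "sort", "-n", "-o", str(namesorted), str(sam)])
            run(["samtools", "fixmate", "-m", str(namesorted), str(fixed)])
            run(["samtools", "sort", "-o", str(possorted), str(fixed)])
            run(["samtools", "markdup", "-r", str(possorted), str(dedup)])
            return dedup

        if __name__ == "__main__":
            ngs_pipeline("ref.fa", "reads.fq")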

    AVENTIS - An architecture for event data analysis

    Time-stamped event data is being generated at an exponential rate from various sources (sensor networks, e-markets, etc.), stored in event logs, and made available to researchers. Despite this data deluge and the evolution of a plethora of tools and technologies, the science behind exploratory analysis and knowledge discovery lags behind. There are several reasons for this. In conducting event data analysis, researchers typically detect a pattern or trend in the data by computing time-series measures and feeding the computed measures into several mathematical models to glean information from the data. This is a complex and time-consuming process covering a range of activities, from data capture (across a broad array of data sources) to the interpretation and dissemination of experimental results, forming a pipeline of activities. Furthermore, data analysis is conducted by domain users, who are typically not IT experts, while data processing tools and applications are largely developed by application developers. End users not only lack the critical skills to build a structured analysis pipeline, but are also perplexed by the number of different ways available to derive the necessary information. Consequently, this thesis proposes AVENTIS (Architecture for eVENT Data analysIS), a novel framework to guide the design of analytic solutions that facilitate time-series analysis of event data, tailored to the needs of domain users. The framework comprises three components: a knowledge base, a model-driven analytic methodology, and an accompanying software architecture that provides the necessary technical and operational requirements. Specifically, the research contribution lies first in the ability of the framework to express analysis requirements at a level of abstraction consistent with the domain users and to readily make available the information sought, without the users having to build the analysis process themselves. Secondly, the framework provides an abstract design space in which domain experts can build conceptual models of their experiments as sequences of structured tasks in a technology-neutral manner and transparently translate these abstract process models into executable implementations. To evaluate the AVENTIS framework, a prototype based on AVENTIS is implemented and tested with case studies taken from the financial research domain.
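    The framework's central idea, letting domain users state an analysis as an abstract sequence of tasks that is then translated into an executable implementation, can be illustrated with a minimal sketch. It assumes pandas; the task names (resample_1min, moving_average), the TASKS registry, and the run_pipeline helper are hypothetical stand-ins, not part of the actual AVENTIS architecture.

        import pandas as pd

        # Registry mapping abstract task names to concrete implementations,
        # so that an analysis can be specified as data rather than as code.
        TASKS = {
            "resample_1min": lambda s: s.resample("1min").count(),
            "moving_average": lambda s: s.rolling(window=5, min_periods=1).mean(),
        }

        def run_pipeline(series, steps):
            """Translate an abstract task sequence into an executed pipeline."""
            for step in steps:
                series = TASKS[step](series)
            return series

        # Hypothetical event log: time stamps of trades on an e-market.
        events = pd.Series(
            1,
            index=pd.to_datetime([
                "2012-01-02 09:30:05", "2012-01-02 09:30:40",
                "2012-01-02 09:31:10", "2012-01-02 09:33:55",
            ]),
        )

        # Abstract process model: count events per minute, then smooth the trend.
        print(run_pipeline(events, ["resample_1min", "moving_average"]))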

    Efficient Point Clustering for Visualization

    The visualization of large spatial point data sets poses problems with respect to runtime and quality. A visualization of raw data often leads to occlusion and clutter and thus to a loss of information. Furthermore, mobile devices in particular struggle to display millions of data items. Thinning via sampling is often not the optimal choice, because users want to see distributional patterns, cardinalities, and outliers. For visual analytics in particular, an aggregation of this type of data is very valuable for providing an interactive user experience. This thesis defines the problem of visual point clustering, which leads to proportional circle maps. It furthermore introduces a set of quality measures that assess different aspects of the resulting circle representations. The Circle Merging Quadtree constitutes a novel and efficient method for producing visual point clusterings via aggregation. It outperforms comparable methods in terms of runtime and also when evaluated with the aforementioned quality measures. Moreover, the introduction of a preprocessing step leads to further substantial performance improvements and guarantees the stability of the Circle Merging Quadtree. This thesis furthermore addresses the incorporation of miscellaneous attributes into the aggregation. It discusses means of providing statistical values for numerical and textual attributes that are suitable for side views such as plots and data tables. The incorporation of multiple data sets, or of data sets that contain class attributes, poses another problem for aggregation and visualization. This thesis provides methods for extending the Circle Merging Quadtree to output pie chart maps or maps that contain circle packings. For the latter variant, this thesis reports the results of a user study that investigates the methods and the introduced quality criteria. In the context of providing methods for interactive data visualization, this thesis finally presents the VAT System, where VAT stands for visualization, analysis, and transformation. This system constitutes an exploratory geographical information system that implements principles of visual analytics for working with spatio-temporal data. The thesis details the user interface concept for facilitating exploratory analysis and provides the results of two user studies that assess the approach.
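    The thesis itself defines the Circle Merging Quadtree; as a rough illustration of visual point clustering in general, the following simplified sketch aggregates points on a uniform grid instead of a quadtree and emits circles whose area is proportional to the number of absorbed points. All names are hypothetical and the method is a stand-in, not the thesis's algorithm.

        import math
        from collections import defaultdict

        def aggregate_points(points, cell_size, scale=1.0):
            """Aggregate 2D points into proportional circles on a uniform grid.

            Each non-empty cell becomes one circle placed at the centroid of
            its points, with area proportional to the point count so that
            cardinalities remain readable in the map.
            """
            cells = defaultdict(list)
            for x, y in points:
                cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

            circles = []
            for members in cells.values():
                n = len(members)
                cx = sum(p[0] for p in members) / n
                cy = sum(p[1] for p in members) / n
                radius = scale * math.sqrt(n)  # circle area grows linearly with n
                circles.append((cx, cy, radius, n))
            return circles

        points = [(1.2, 3.4), (1.5, 3.1), (8.0, 8.2), (8.3, 8.4), (8.1, 8.0)]
        for cx, cy, r, n in aggregate_points(points, cell_size=2.0):
            print(f"circle at ({cx:.2f}, {cy:.2f}), r={r:.2f}, {n} points")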

    XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

    The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357.