Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.
Data visualization is an essential component of genomic data analysis. However, the size and diversity of the data sets produced by today's sequencing and array-based profiling methods present major challenges to visualization tools. The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution. A key characteristic of IGV is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data. Although IGV is often used to view genomic data from public sources, its primary emphasis is to support researchers who wish to visualize and explore their own data sets or those from colleagues. To that end, IGV supports flexible loading of local and remote data sets, and is optimized to provide high-performance data visualization and exploration on standard desktop systems. IGV is freely available for download from http://www.broadinstitute.org/igv, under a GNU LGPL open-source license
ROOT - A C++ Framework for Petabyte Data Storage, Statistical Analysis and Visualization
ROOT is an object-oriented C++ framework conceived in the high-energy physics
(HEP) community, designed for storing and analyzing petabytes of data in an
efficient way. Any instance of a C++ class can be stored into a ROOT file in a
machine-independent compressed binary format. In ROOT the TTree object
container is optimized for statistical data analysis over very large data sets
by using vertical data storage techniques. These containers can span a large
number of files on local disks, the web, or a number of different shared file
systems. In order to analyze this data, the user can choose from a wide set of
mathematical and statistical functions, including linear algebra classes,
numerical algorithms such as integration and minimization, and various methods
for performing regression analysis (fitting). In particular, ROOT offers
packages for complex data modeling and fitting, as well as multivariate
classification based on machine learning techniques. Central to these
analysis tools are the histogram classes, which provide binning of one- and
multi-dimensional data. Results can be saved in high-quality graphical formats
like Postscript and PDF or in bitmap formats like JPG or GIF. The result can
also be stored into ROOT macros that allow a full recreation and rework of the
graphics. Users typically create their analysis macros step by step, making use
of the interactive C++ interpreter CINT, while running over small data samples.
Once the development is finished, they can run these macros at full compiled
speed over large data sets, using on-the-fly compilation, or by creating a
stand-alone batch program. Finally, if processing farms are available, the user
can reduce the execution time of intrinsically parallel tasks - e.g. data
mining in HEP - by using PROOF, which will take care of optimally distributing
the work over the available resources in a transparent way
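
As a rough illustration of the "vertical data storage" idea mentioned in the abstract above, the following Python sketch contrasts a row-wise layout with a column-wise one. It does not use ROOT or its TTree interface; the toy data and the names `rows`, `to_columns`, and `column_mean` are hypothetical and chosen for illustration only.

```python
from statistics import mean

# Row-wise layout: one record per event (hypothetical toy data).
rows = [
    {"energy": 10.0, "charge": 1, "ntracks": 3},
    {"energy": 20.0, "charge": -1, "ntracks": 5},
    {"energy": 30.0, "charge": 1, "ntracks": 2},
]


def to_columns(records):
    """Transpose a list of records into one list per field (column-wise layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def column_mean(columns, name):
    """Average a single field, reading only that field's column."""
    return mean(columns[name])


columns = to_columns(rows)
print(column_mean(columns, "energy"))
```

In a column-wise layout, a statistic over one field reads only that field's values, which is the property that makes this kind of storage attractive for analyses that scan a few variables across many events.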
Tangos: the agile numerical galaxy organization system
We present Tangos, a Python framework and web interface for database-driven
analysis of numerical structure formation simulations. To understand the role
that such a tool can play, consider constructing a history for the absolute
magnitude of each galaxy within a simulation. The magnitudes must first be
calculated for all halos at all timesteps and then linked using a merger tree;
folding the required information into a final analysis can entail significant
effort. Tangos is a generic solution to this information organization problem,
aiming to free users from the details of data management. At the querying
stage, our example of gathering properties over history is reduced to a few
clicks or a simple, single-line Python command. The framework is highly
extensible; in particular, users are expected to define their own properties
which tangos will write into the database. A variety of parallelization options
are available and the raw simulation data can be read using existing libraries
such as pynbody or yt. Finally, tangos-based databases and analysis pipelines
can easily be shared with collaborators or the broader community to ensure
reproducibility. User documentation is provided separately.
Comment: Clarified various points and further improved code performance;
accepted for publication in ApJS. Tutorials (including video) at
http://tiny.cc/tango
IAC user manual
The User Manual for the Integrated Analysis Capability (IAC) Level 1 system is presented. The IAC system currently supports the thermal, structures, controls and system dynamics technologies, and its development is influenced by the requirements for design/analysis of large space systems. The system has many features which make it applicable to general problems in engineering, and to management of data and software. Information includes basic IAC operation, executive commands, modules, solution paths, data organization and storage, IAC utilities, and module implementation
Event views and graph reductions for understanding system level C code
Concurrent processing, runtime bindings and an extensive use of aggregate data structures make system level C codes difficult to understand. We propose event views and graph reductions as techniques to facilitate program comprehension. Starting with some domain knowledge, a user can apply these techniques to quickly identify and analyze exactly those parts of the program that are relevant to a given concern. We have built a tool called CVision to demonstrate applicability of the proposed techniques. CVision is an interactive tool that allows the user to: (a) quickly get to the relevant parts of the code, (b) graphically visualize relationships between program elements, (c) interactively apply different graph reductions to eliminate irrelevant relationships. Using these capabilities, the user can quickly distill a large body of code and extract meaningful views of runtime events that capture the user's concern. The proposed program comprehension techniques are demonstrated through two case studies based on Linux and XINU operating systems
Software Infrastructure for Natural Language Processing
We classify and review current approaches to software infrastructure for
research, development and delivery of NLP systems. The task is motivated by a
discussion of current trends in the field of NLP and Language Engineering. We
describe a system called GATE (a General Architecture for Text Engineering)
that provides a software infrastructure on top of which heterogeneous NLP
processing modules may be evaluated and refined individually, or may be
combined into larger application systems. GATE aims to support both researchers
and developers working on component technologies (e.g. parsing, tagging,
morphological analysis) and those working on developing end-user applications
(e.g. information extraction, text summarisation, document generation, machine
translation, and second language learning). GATE promotes reuse of component
technology, permits specialisation and collaboration in large-scale projects,
and allows for the comparison and evaluation of alternative technologies. The
first release of GATE is now available - see
http://www.dcs.shef.ac.uk/research/groups/nlp/gate/
Comment: LaTeX, uses aclap.sty, 8 page
Design and Implementation of a Tracer Driver: Easy and Efficient Dynamic Analyses of Constraint Logic Programs
Tracers provide users with useful information about program executions. In
this article, we propose a "tracer driver". From a single tracer, it provides
a powerful front-end enabling multiple dynamic analysis tools to be easily
implemented, while limiting the overhead of the trace generation. The relevant
execution events are specified by flexible event patterns and a large variety
of trace data can be given either systematically or "on demand". The proposed
tracer driver has been designed in the context of constraint logic programming;
experiments have been made within GNU-Prolog. Execution views provided by
existing tools have been easily emulated with a negligible overhead.
Experimental measures show that the flexibility and power of the described
architecture lead to good performance. The tracer driver overhead is inversely
proportional to the average time between two traced events. Whereas the
principles of the tracer driver are independent of the traced programming
language, it is best suited for high-level languages, such as constraint logic
programming, where each traced execution event encompasses numerous low-level
execution steps. Furthermore, constraint logic programming is especially hard
to debug. The current environments do not provide all the useful dynamic
analysis tools. They can significantly benefit from our tracer driver which
enables dynamic analyses to be integrated at a very low cost.
Comment: To appear in Theory and Practice of Logic Programming (TPLP),
Cambridge University Press. 30 pages
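
The idea of selecting execution events through patterns and feeding several analyses from one tracer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design described in the article: the `Event` fields, the `TracerDriver` class, and its `register` and `emit` methods are hypothetical names, and the patterns here are plain Python predicates rather than the article's pattern language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    """One execution event emitted by a (hypothetical) tracer."""
    port: str       # e.g. "call", "exit", "fail"
    predicate: str  # name of the traced predicate
    depth: int      # call depth at the time of the event


@dataclass
class TracerDriver:
    """Forwards each event only to the analyses whose pattern matches it."""
    handlers: List[Tuple[Callable[[Event], bool], Callable[[Event], None]]] = field(
        default_factory=list
    )

    def register(self, pattern, handler):
        """Attach an analysis (handler) to an event pattern (boolean predicate)."""
        self.handlers.append((pattern, handler))

    def emit(self, event):
        """Dispatch one event; return the number of analyses that received it."""
        matched = 0
        for pattern, handler in self.handlers:
            if pattern(event):
                handler(event)
                matched += 1
        return matched


# Two independent analyses fed by the same trace.
fail_counts: Dict[str, int] = {}
deep_calls: List[Event] = []


def count_fail(event):
    fail_counts[event.predicate] = fail_counts.get(event.predicate, 0) + 1


driver = TracerDriver()
driver.register(lambda e: e.port == "fail", count_fail)
driver.register(lambda e: e.port == "call" and e.depth > 2, deep_calls.append)

trace = [
    Event("call", "queens", 1),
    Event("call", "safe", 3),
    Event("fail", "safe", 3),
    Event("exit", "queens", 1),
]
for event in trace:
    driver.emit(event)

print(fail_counts, len(deep_calls))
```

Events that match no pattern cost only the pattern checks, which is the intuition behind filtering early instead of handing the full trace to every analysis.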