6,590 research outputs found

    Information visualization for DNA microarray data analysis: A critical review

    Graphical representation may provide an effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments, which monitor the expression patterns of thousands of genes simultaneously. The ability to use "abstract" graphical representations to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to the particular records they are interested in, and thereby gain deeper insight into the results of a microarray experiment. This paper starts by providing some background knowledge of microarray experiments, then explains how graphical representation can be applied in general to this problem domain, and goes on to explore the role of visualization in gene expression data analysis. Having set the problem scene, the paper examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each can be tabulated. Finally, several key problem areas, as well as possible solutions to them, are discussed as a source for future work.
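
    As context for the kind of multivariate view discussed here, a heatmap of a genes-by-conditions expression matrix is a common starting point. The sketch below is illustrative only: it uses synthetic stand-in data and matplotlib, and is not a technique or dataset taken from the paper.

        # Minimal heatmap sketch for a genes x conditions expression matrix.
        # The matrix is random stand-in data, not real microarray output.
        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)
        expression = rng.normal(size=(20, 6))      # 20 genes, 6 conditions (synthetic)
        genes = [f"gene_{i}" for i in range(20)]
        conditions = [f"cond_{j}" for j in range(6)]

        fig, ax = plt.subplots(figsize=(4, 6))
        im = ax.imshow(expression, aspect="auto", cmap="RdBu_r")  # diverging colour map
        ax.set_xticks(range(len(conditions)))
        ax.set_xticklabels(conditions, rotation=45)
        ax.set_yticks(range(len(genes)))
        ax.set_yticklabels(genes, fontsize=6)
        fig.colorbar(im, ax=ax, label="log2 expression ratio")
        plt.tight_layout()
        plt.show()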

    Compiler Support for Operator Overloading and Algorithmic Differentiation in C++

    Multiphysics software needs derivatives, e.g., for solving systems of non-linear equations, conducting model verification, or performing sensitivity studies. In C++, algorithmic differentiation (AD), based on operator overloading, can be used to calculate derivatives up to machine precision. To that end, the built-in floating-point type is replaced by a user-defined AD type. It overloads all required operators and calculates the original value and the corresponding derivative based on the chain rule of calculus. While changing the underlying type seems straightforward, several complications arise concerning software and performance engineering. These include (1) fundamental language restrictions of C++ with respect to user-defined types, (2) type correctness of distributed computations with the Message Passing Interface (MPI) library, and (3) identification and mitigation of AD-induced overheads. To handle these issues, AD experts may spend a significant amount of time enhancing a code with AD, verifying the derivatives, and ensuring optimal application performance. Hence, in this thesis, we propose a modern compiler-based tooling approach to support and accelerate the AD-enhancement process of C++ target codes. In particular, we make contributions to three aspects of AD.

    The initial type change: While the change to the AD type in a target code is conceptually straightforward, it often leads to a multitude of compiler error messages. This is due to the different treatment of built-in floating-point types and user-defined types by the C++ language standard. Previously legal code constructs in the target code violate the language standard once the built-in floating-point type is replaced with a user-defined AD type. We identify and classify these problematic code constructs, show their root causes, and propose solutions based on localized source transformations. To automate this rather mechanical process, we develop a static code analyser and source transformation tool, called OO-Lint, based on the Clang compiler framework. It flags instances of these problematic code constructs and applies source transformations to make the code compliant with the requirements of the language standard. To show the overall relevance of complications with user-defined types, OO-Lint is applied to several well-known scientific codes, some of which have already been AD-enhanced by others. In all of these applications, except the ones manually treated for AD overloading, problematic code constructs are detected.

    Type correctness of MPI communication: MPI is the de-facto standard for programming high-performance, distributed applications. At the same time, MPI has a complex interface whose usage can be error-prone. For instance, MPI derived data types require manual construction by specifying memory locations of the underlying data. Specifying wrong offsets can lead to subtle bugs that are hard to detect. In the context of AD, special libraries exist that handle the required derivative book-keeping by replacing the MPI communication calls with overloaded variants. However, on top of the AD type change, the MPI communication routines have to be changed manually. In addition, the AD type fundamentally changes memory layout assumptions, as it has a different extent than the built-in types. Previously legal layout assumptions therefore have to be re-verified.
    As a remedy, to detect any type-related errors, we develop a memory sanitizer tool, called TypeART, based on the LLVM compiler framework and the MPI correctness checker MUST. It tracks all memory allocations relevant to MPI communication, which allows checking the underlying type and extent of the typeless memory buffer address passed to any MPI routine. The overhead induced by TypeART on several target applications is manageable.

    AD domain-specific profiling: Applying AD in a black-box manner, without consideration of the target code structure, can have a significant impact on both runtime and memory consumption. An AD expert is usually required to apply further AD-related optimizations to reduce these induced overheads. Traditional profiling techniques are, however, insufficient, as they do not reveal any AD domain-specific metrics. Of interest for AD code optimization are, for example, specific code patterns, especially at the function level, that can be treated efficiently with AD. To that end, we develop a static profiling tool, called ProAD, based on the LLVM compiler framework. For each function, it generates the computational graph based on the static data flow of the floating-point variables. The framework supports pattern analysis on the computational graph to identify the optimal application of the chain rule. We show the potential of the optimal application of AD with two case studies. In both cases, significant runtime improvements can be achieved when the knowledge of the code structure, provided by our tool, is exploited. For instance, with a stencil code, a speedup factor of about 13 is achieved compared to a naive application of AD, and a factor of 1.2 compared to hand-written derivative code.
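
    To make the overloading mechanism concrete: the AD type wraps a value together with its derivative and reapplies the chain rule inside every overloaded operator. The thesis does this in C++; the sketch below is a minimal forward-mode dual-number illustration written in Python purely for brevity, and it does not represent the OO-Lint, TypeART, or ProAD tooling described above.

        # Minimal forward-mode AD via operator overloading: each Dual carries a
        # value and the derivative of that value w.r.t. one chosen input.
        import math

        class Dual:
            def __init__(self, value, deriv=0.0):
                self.value, self.deriv = value, deriv

            def __add__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.value + other.value, self.deriv + other.deriv)

            __radd__ = __add__

            def __mul__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                # product rule: (u*v)' = u'*v + u*v'
                return Dual(self.value * other.value,
                            self.deriv * other.value + self.value * other.deriv)

            __rmul__ = __mul__

        def sin(x):
            # chain rule: sin(u)' = cos(u) * u'
            return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

        # d/dx [ x*sin(x) + 3x ] at x = 2, seeded with derivative 1
        x = Dual(2.0, 1.0)
        y = x * sin(x) + 3 * x
        print(y.value, y.deriv)   # value and derivative up to machine precision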

    In conversation with simulation: The application of numerical simulation to the design of structural nodal connections

    The thesis explores methods for the integration of structural analysis, design and production in a digital design environment. The somewhat ambiguous title implies the ambition to make such integration in relation to the explorative phase of the design process, which Donald Schön describes as having a conversational character: a conversation between the designer and the representation, mediated by the tool. The tool is, in this context, a simulation, and instead of exploring the potential of automatic optimisation, the simulation is used for designer-driven exploration. The aim of the thesis is to give an overview of how this type of integration is currently being approached and to contribute new tools and methods to that pursuit. The motivation behind the work is to lower the threshold for the application of structural analysis in early-stage design, with an ambition of architectural qualities and resource efficiency in mind. An overview of the historical context is portrayed with broad brush strokes, followed by a more precise account of the mathematical and physical context, complemented by an attempt to describe how our tools and roles tend to interplay in the composition of the design process. Methods such as the finite element method, isogeometric analysis, smoothed particle hydrodynamics and peridynamics, together with their related geometrical representations, are introduced in relation to this context. A variety of production techniques are also discussed in relation to the mechanical properties of conventional building materials such as steel, concrete and wood.

    The method development is approached through numerical and physical experiments, which are applied to the design of material-efficient structural components, from a particular design-process perspective. The nodal connection is chosen as an application because it combines geometrical and structural complexity in an element that is of crucial importance for a holistic spatial setting, while often being produced in a material-inefficient way, with poor attention to detail.

    The three articles that are included follow a trajectory from large to small, from the holistic to the particular. The first article is a description of the computational design work on the roof for the new international airport of Mexico City. The second article aims to address one of the challenges faced in that project, material inefficiency of nodal connections, with a critical perspective on optimisation. The final article presents an extension/modification of peridynamics theory that enables variable particle sizes and an irregular particle distribution through the introduction of a concept called force flux density. The development is motivated by limitations found in the present theory through numerical experiments. The method enables simulation of phenomena such as brittle fracture, for which correlation with Griffith's theory of fracture is shown. Further work includes an extension of the force flux method from 2D to 3D, including calibration of a material model for 3D-printed steel. Other possibilities involve exploring how such a method can adapt to the various stages of the design process, where requirements on accuracy, speed and interactivity will vary.
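
    For orientation only: the force-flux-density extension described in the final article departs from the classical bond-based peridynamic equation of motion, in which the local divergence of stress is replaced by an integral of pairwise bond forces over a neighbourhood (the horizon) H_x. In the standard formulation (not the thesis's modified one),

        \rho(\mathbf{x})\,\ddot{\mathbf{u}}(\mathbf{x},t)
          = \int_{H_{\mathbf{x}}} \mathbf{f}\big(\mathbf{u}(\mathbf{x}',t)-\mathbf{u}(\mathbf{x},t),\ \mathbf{x}'-\mathbf{x}\big)\,\mathrm{d}V_{\mathbf{x}'}
          + \mathbf{b}(\mathbf{x},t),

    where \rho is the mass density, \mathbf{u} the displacement field, \mathbf{f} the pairwise bond force function, and \mathbf{b} the body force density.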

    Highly-cited papers in software engineering: The top 100

    Context: According to the search reported in this paper, as of this writing (May 2015), a very large number of papers (more than 70,000) have been published in the area of Software Engineering (SE) since its inception in 1968. Citations are crucial in any research area to position the work and to build on the work of others. Identification and characterization of highly-cited papers are common and are regularly reported in various disciplines. Objective: The objective of this study is to identify the papers in the area of SE that have influenced others the most, as measured by citation count. Studying highly-cited SE papers helps researchers see the types of approaches and research methods presented and applied in such papers, so that they can learn from them and write higher-quality papers that are more likely to be highly cited. Method: To achieve the above objective, we conducted a study, comprising five research questions, to identify and classify the top 100 highly-cited SE papers in terms of two metrics: total number of citations and average annual number of citations. Results: By total number of citations, the top paper is "A metrics suite for object-oriented design", cited 1,817 times and published in 1994. By average annual number of citations, the top paper is "QoS-aware middleware for Web services composition", cited 154.2 times on average annually and published in 2004. Conclusion: It is concluded that it is important to identify the highly-cited SE papers and also to characterize the overall citation landscape in the SE field. We hope that this paper will encourage further discussions in the SE community towards further analysis and formal characterization of the highly-cited SE papers.

    Vahid Garousi was partially supported by several internal grants provided by Hacettepe University. The authors would like to thank the anonymous reviewers for their insightful comments.
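
    As a worked reading of the two ranking metrics: the average annual citation count is presumably the total citation count divided by the number of years a paper has been in print. The denominator convention below (counting the publication year as a full year, with a 2015 census) is an assumption for illustration, not something stated in the abstract.

        # Hedged sketch of the two ranking metrics used in the study.
        # Assumed convention: years in print = census_year - publication_year + 1.
        def avg_annual_citations(total_citations, publication_year, census_year=2015):
            years_in_print = census_year - publication_year + 1
            return total_citations / years_in_print

        # Illustrative only: the 1994 top paper by total citations.
        print(avg_annual_citations(1817, 1994))   # ~82.6 citations per year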

    sEMG-based hand gesture recognition with deep learning

    Hand gesture recognition based on surface electromyographic (sEMG) signals is a promising approach for the development of Human-Machine Interfaces (HMIs) with natural control, such as intuitive robot interfaces or poly-articulated prostheses. However, real-world applications are limited by reliability problems due to motion artifacts, postural and temporal variability, and sensor re-positioning. This master thesis is the first application of deep learning to the Unibo-INAIL dataset, the first public sEMG dataset exploring the variability between subjects, sessions and arm postures, collected over 8 sessions for each of 7 able-bodied subjects executing 6 hand gestures in 4 arm postures. In the most recent studies, this variability is addressed with training strategies based on training-set composition, which improve the inter-posture and inter-day generalization of classical (i.e. non-deep) machine learning classifiers, among which the RBF-kernel SVM yields the highest accuracy. The deep architecture realized in this work is a 1d-CNN implemented in PyTorch, inspired by a 2d-CNN reported to perform well on other public benchmark databases. On this 1d-CNN, various training strategies based on training-set composition were implemented and tested. Multi-session training proves to yield higher inter-session validation accuracies than single-session training. Two-posture training proves to be the best postural training (showing the benefit of training on more than one posture) and yields 81.2% inter-posture test accuracy. Five-day training proves to be the best multi-day training and yields 75.9% inter-day test accuracy. All results are close to the baseline. Moreover, the results of the multi-day trainings highlight the phenomenon of user adaptation, indicating that training should also prioritize recent data. Though not better than the baseline, the achieved classification accuracies rightfully place the 1d-CNN among the candidates for further research.
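
    For concreteness, a minimal 1d-CNN of the kind described can be sketched in PyTorch as below. The channel count, window length, and layer sizes are placeholders, not the architecture or the Unibo-INAIL parameters used in the thesis: raw sEMG windows of shape (channels, samples) are convolved along time and mapped to the 6 gesture classes.

        # Hedged 1d-CNN sketch for sEMG gesture classification (PyTorch).
        # n_channels and window length are placeholders, not the Unibo-INAIL values.
        import torch
        import torch.nn as nn

        class EmgCnn1d(nn.Module):
            def __init__(self, n_channels=4, n_classes=6):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv1d(n_channels, 32, kernel_size=5, padding=2),
                    nn.BatchNorm1d(32),
                    nn.ReLU(),
                    nn.MaxPool1d(2),
                    nn.Conv1d(32, 64, kernel_size=5, padding=2),
                    nn.BatchNorm1d(64),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1),   # collapse the time axis
                )
                self.classifier = nn.Linear(64, n_classes)

            def forward(self, x):              # x: (batch, n_channels, window_len)
                z = self.features(x).squeeze(-1)
                return self.classifier(z)      # logits over the 6 gestures

        model = EmgCnn1d()
        dummy = torch.randn(8, 4, 150)         # batch of 8 windows, 150 samples each
        print(model(dummy).shape)              # torch.Size([8, 6])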

    Multi-Modal Interfaces for Sensemaking of Graph-Connected Datasets

    Hypothesized evolutionary processes are often visualized through phylogenetic trees. Given evolutionary data presented in one of several widely accepted formats, software exists to render these data into a tree diagram. However, software packages commonly in use by biologists today often do not provide means to dynamically adjust and customize these diagrams for studying new hypothetical relationships, or for illustration and publication purposes. Even where these options are available, there can be a lack of intuitiveness and ease of use. The goal of our research is, thus, to investigate more natural and effective means of sensemaking of the data with different user input modalities. To this end, we experimented with different input modalities, designing and running a series of prototype studies, ultimately focusing our attention on pen-and-touch. Through several iterations of feedback and revision provided with the help of biology experts and students, we developed a pen-and-touch phylogenetic tree browsing and editing application called PhyloPen. This application expands on the capabilities of existing software with visualization techniques such as overview+detail, linked data views, and new interaction and manipulation techniques using pen-and-touch. To determine its impact on phylogenetic tree sensemaking, we conducted a within-subject comparative summative study against the most comparable and commonly used state-of-the-art mouse-based software system, Mesquite. The study was conducted with biology majors at the University of Central Florida, each of whom used both software systems on a set number of exercise tasks of the same type. Measuring effectiveness by several dependent measures, the results show that PhyloPen was significantly better in terms of usefulness, satisfaction, ease of learning, ease of use, and cognitive load, and roughly equivalent in completion time. These results support an interaction paradigm that is superior to classic mouse-based interaction and that could be applied to other communities that employ graph-based representations of their problem domains.
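
    As background on the input side only (not on PhyloPen itself): Newick is one of the widely accepted formats for evolutionary data mentioned above, and a tree in that format can be parsed and rendered in a few lines with Biopython. The tree string here is invented for illustration.

        # Parse and render a toy Newick tree (format context only, not PhyloPen).
        from io import StringIO
        from Bio import Phylo

        newick = "((Human:0.3,Chimp:0.2):0.1,(Mouse:0.6,Rat:0.5):0.2);"
        tree = Phylo.read(StringIO(newick), "newick")
        Phylo.draw_ascii(tree)   # quick text rendering of the branching structure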

    Contributions to the cornerstones of interaction in visualization: strengthening the interaction of visualization

    Visualization has become an accepted means for data exploration and analysis. Although interaction is an important component of visualization approaches, current visualization research pays less attention to interaction than to aspects of the graphical representation. Therefore, the goal of this work is to strengthen the interaction side of visualization. To this end, we establish a unified view on interaction in visualization. This unified view covers four cornerstones: the data, the tasks, the technology, and the human.