11 research outputs found

    Interactive Network Exploration with Orange

    Network analysis is one of the most widely used techniques in many areas of modern science. Most existing tools for that purpose are limited to drawing networks and computing their basic general characteristics. The user is not able to interactively and graphically manipulate the networks, select and explore subgraphs using other statistical and data mining techniques, add and plot various other data within the graph, and so on. In this paper we present a tool that addresses these challenges, an add-on for exploration of networks within the general component-based environment Orange.

    FragViz: visualization of fragmented networks

    BACKGROUND: Researchers in systems biology use network visualization to summarize the results of their analysis. Such networks often include unconnected components, which popular network alignment algorithms place arbitrarily with respect to the rest of the network. This can lead to misinterpretations due to the proximity of otherwise unrelated elements. RESULTS: We propose a new network layout optimization technique called FragViz which can incorporate additional information on relations between unconnected network components. It uses a two-step approach by first arranging the nodes within each of the components and then placing the components so that their proximity in the network corresponds to their relatedness. In the experimental study with the leukemia gene networks we demonstrate that FragViz can obtain network layouts which are more interpretable and hold additional information that could not be exposed using classical network layout optimization algorithms. CONCLUSIONS: Network visualization relies on computational techniques for proper placement of objects under consideration. These algorithms need to be fast so that they can be incorporated in responsive interfaces required by the explorative data analysis environments. Our layout optimization technique FragViz meets these requirements and specifically addresses the visualization of fragmented networks, for which standard algorithms do not consider similarities between unconnected components. The experiments confirmed the claims on speed and accuracy of the proposed solution.
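The two-step idea in the abstract above (arrange nodes within each component, then place the components according to their relatedness) can be illustrated with a minimal sketch. This is not the FragViz algorithm: the circular within-component layout, the gradient-descent stress minimization, and every name here (`circle_layout`, `place_components`, `fragmented_layout`, `comp_dist_fn`) are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components as a list of sorted node lists."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            n = stack.pop()
            comp.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        comps.append(sorted(comp))
    return comps


def circle_layout(comp, radius=1.0):
    """Step 1 (stand-in): place a component's nodes evenly on a circle."""
    k = len(comp)
    if k == 1:
        return {comp[0]: (0.0, 0.0)}
    return {n: (radius * math.cos(2 * math.pi * i / k),
                radius * math.sin(2 * math.pi * i / k))
            for i, n in enumerate(comp)}


def place_components(dist, iters=500, lr=0.05):
    """Step 2 (stand-in): gradient descent on stress, so that distances
    between component centres approximate the target distances in `dist`."""
    k = len(dist)
    pos = [[math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k)]
           for i in range(k)]
    for _ in range(iters):
        for i in range(k):
            gx = gy = 0.0
            for j in range(k):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                err = d - dist[i][j]  # positive when the pair is too far apart
                gx += err * dx / d
                gy += err * dy / d
            pos[i][0] -= lr * gx
            pos[i][1] -= lr * gy
    return pos


def fragmented_layout(nodes, edges, comp_dist_fn):
    """Combine both steps; comp_dist_fn(a, b) is a hypothetical callback
    returning the desired distance between two components."""
    comps = connected_components(nodes, edges)
    dist = [[comp_dist_fn(a, b) for b in comps] for a in comps]
    centres = place_components(dist)
    layout = {}
    for comp, (cx, cy) in zip(comps, centres):
        for n, (x, y) in circle_layout(comp, radius=0.3).items():
            layout[n] = (cx + x, cy + y)
    return comps, layout
```

Here related components end up close together and unrelated ones far apart, so proximity between fragments carries meaning instead of being arbitrary.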

    Discovering patterns in networks with the help of visualization (Odkrivanje zakonitosti iz mrež s pomočjo vizualizacije)


    Visualization and analysis of the space of prediction models

    Data mining – a search for interesting patterns in the data – typically creates a number of different models. These models need to be simple enough to be graspable by the human expert. A tree consisting of hundreds of nodes or a scatter plot projecting dozens of variables into a two-dimensional plane may offer great classification accuracy or class separation, yet they may be impossible to interpret. But simple models cannot describe the complex data which may contain many interesting relations. The solution is to create a large number of simple models, where each of them offers insight into a small part of the problem domain, but together they present a complete picture. The problem, however, is the presentation of the big picture. While a computer can infer thousands of models, the human expert is incapable of reviewing them without assistance. We argue that a set of partial views can represent complex data. Views may include linear projections, each involving at most a couple of variables and showing a single, particular, and simplified perspective or relation. Similarly, good predictive models can be built using ensembles of simple models, such as random forests of small trees, each covering a part of the problem domain. These approaches lack techniques for manual exploration. Is it possible to select a limited number of visualizations which will provide a complete picture? Can we navigate through the random forest and observe the common properties of models in each region? We propose a method for creating maps of classification models, which are presented to the user to interactively explore the model space. The proposed technique can, besides organizing predictive models, rank some types of visualizations. We extend existing methods – that rank projections based on quality – to consider projection diversity, and show that our method yields more information when viewing the same number of projections.
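To make the idea of ranking projections by both quality and diversity concrete, here is a minimal sketch. It is not the method from the thesis: the greedy selection, the Jaccard overlap between variable sets, the `penalty` weight, and the function names are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of variables, from 0 (disjoint) to 1 (equal)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_with_diversity(projections, k, penalty=0.5):
    """Greedily pick k projections from a list of (variables, quality) pairs.

    Each candidate's quality is reduced by `penalty` times its largest overlap
    with the projections already chosen, so near-duplicates are pushed down.
    """
    remaining = list(projections)
    chosen = []
    while remaining and len(chosen) < k:
        def adjusted(p):
            overlap = max((jaccard(p[0], c[0]) for c in chosen), default=0.0)
            return p[1] - penalty * overlap

        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With `penalty=0` this reduces to plain quality ranking; a positive penalty lets a slightly lower-scoring projection on different variables displace one that largely repeats an earlier choice.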
We describe a model-map-based technique for the visualization and exploration of random forest, a prediction model that is assembled from random prediction trees. Those are normally small (up to 10 vertices), so each covers only a small part of the space of the decision problem. The random forest predicts remarkably well; however, it is prohibitively difficult to explain or visualize in comparison to a single prediction tree. Our method assists an expert with the random forest analysis, for example, in clustering similar trees together to emphasize the diverse ones. In addition to interactive exploration of a random forest, the model map can explain predictions of different classes. The final ingredient of the technique is a versatile interactive tool for exploration of maps and a new network layout technique, particularly suitable for handling the visualization of the model map networks. Most existing tools for network analysis are limited to drawing networks and computing their basic general characteristics. With these tools, it is impossible for the user to interactively and graphically manipulate the networks, select and explore sub-graphs using other statistical and data mining techniques, add and plot various other data within the graph, and so on. We developed tools that address these challenges, widgets and modules for exploration of networks and model maps within the general component-based environment Orange. We propose a network layout optimization algorithm which is designed to visualize fragmented networks: FragViz. Networks of prediction models are usually fragmented. They consist of unconnected components which popular network alignment algorithms place arbitrarily with respect to the rest of the network. This can lead to misinterpretations due to the proximity of otherwise unrelated elements. FragViz incorporates additional information on relations between unconnected network components. 
It uses a two-step approach by first arranging the nodes within each of the components and then placing the components so that their proximity in the network corresponds to their relatedness. In the experimental study we demonstrate that FragViz can obtain network layouts which are more interpretable and hold additional information that could not be exposed using classical network layout optimization algorithms.
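One way to picture a model map is as a network whose vertices are models and whose edges link similar models. The abstract does not say how similarity is measured, so the sketch below assumes a simple one: the fraction of reference examples on which two models predict the same class. The function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def agreement(p, q):
    """Fraction of examples on which two prediction lists give the same label."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def model_map(predictions, threshold=0.75):
    """Build the edge list of a model network.

    predictions: dict mapping a model name to its predicted labels on the
    same reference examples. Two models are linked when their agreement
    reaches `threshold`; each edge is (model_a, model_b, agreement).
    """
    edges = []
    for (m1, p1), (m2, p2) in combinations(predictions.items(), 2):
        s = agreement(p1, p2)
        if s >= threshold:
            edges.append((m1, m2, s))
    return edges
```

Models with no sufficiently similar partner stay unconnected, which is how a network built this way can become fragmented.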

    RNAnorm: RNA-seq data normalization in Python

    Add support for Python 3.12.

    RNAnorm: RNA-seq data normalization in Python

    Added:
    - Provide an example of gene lengths file in the README.rst
    Fixed:
    - Fix fit methods to match sklearn signature
    - Fix LibrarySize to always return np.array in private functions
    - Fix get_norm_factors to follow the set_output config
    - Remove leftovers of as_frame parameter

    Orange: data mining toolbox in Python

    Orange is a machine learning and data mining suite for data analysis through Python scripting and visual programming. Here we report on the scripting part, which features interactive data analysis and component-based assembly of data mining procedures. In the selection and design of components, we focus on the flexibility of their reuse: our principal intention is to let the user write simple and clear scripts in Python, which build upon C++ implementations of computationally-intensive tasks. Orange is intended both for experienced users and programmers, as well as for students of data mining.

    Polygenic analysis and targeted improvement of the complex trait of high acetic acid tolerance in the yeast Saccharomyces cerevisiae

    BACKGROUND: Acetic acid is one of the major inhibitors in lignocellulose hydrolysates used for the production of second-generation bioethanol. Although several genes have been identified in laboratory yeast strains that are required for tolerance to acetic acid, the genetic basis of the high acetic acid tolerance naturally present in some Saccharomyces cerevisiae strains is unknown. Identification of its polygenic basis may allow improvement of acetic acid tolerance in yeast strains used for second-generation bioethanol production by precise genome editing, minimizing the risk of negatively affecting other industrially important properties of the yeast. RESULTS: Haploid segregants of a strain with unusually high acetic acid tolerance and a reference industrial strain were used as superior and inferior parent strain, respectively. After crossing of the parent strains, QTL mapping using the SNP variant frequency determined by pooled-segregant whole-genome sequence analysis revealed two major QTLs. All F1 segregants were then submitted to multiple rounds of random inbreeding and the superior F7 segregants were submitted to the same analysis, further refined by sequencing of individual segregants and bioinformatics analysis taking into account the relative acetic acid tolerance of the segregants. This resulted in disappearance in the QTL mapping with the F7 segregants of a major F1 QTL, in which we identified HAA1, a known regulator of high acetic acid tolerance, as a true causative allele. Novel genes determining high acetic acid tolerance, GLO1, DOT5, CUP2, and a previously identified component, VMA7, were identified as causative alleles in the second major F1 QTL and in three newly appearing F7 QTLs, respectively. The superior HAA1 allele contained a unique single point mutation that significantly improved acetic acid tolerance under industrially relevant conditions when inserted into an industrial yeast strain for second-generation bioethanol production. 
CONCLUSIONS: This work reveals the polygenic basis of high acetic acid tolerance in S. cerevisiae in unprecedented detail. It also shows for the first time that a single strain can harbor different sets of causative genes able to establish the same polygenic trait. The superior alleles identified can be used successfully for improvement of acetic acid tolerance in industrial yeast strains.