
    Assessing 2D visual encoding of 3D spatial connectivity

    Introduction: When visualizing complex data, the choice of layout method can greatly affect the ability to identify outliers, spot incorrect modeling assumptions, or recognize unexpected patterns. Visual layout can also play a crucial role in communicating results to peers. Methods: In this paper, we compared the effectiveness of three visual layouts (the adjacency matrix, a half-matrix layout, and a circular layout) for visualizing spatial connectivity data, e.g., contacts derived from chromatin conformation capture experiments. To assess these layouts, we conducted a study with 150 participants recruited from Amazon’s Mechanical Turk, as well as a second expert study with 30 biomedical research scientists. Results: The Mechanical Turk study found that the circular layout was the most accurate and intuitive, while the expert study found that the circular and half-matrix layouts were more accurate than the matrix layout. Discussion: We concluded that the circular layout may be a good default choice for visualizing smaller datasets with relatively few spatial contacts, whereas, for larger datasets, the half-matrix layout may be a better choice. Our results also demonstrate how crowdsourcing methods can be used to determine which visual layouts are best suited to specific data challenges in bioinformatics.
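
    To make the difference between the full adjacency matrix and the half-matrix layout concrete, the Python sketch below builds both from a list of pairwise contacts. The contact-list format (bin index pairs with a contact count) and the function names are hypothetical illustrations, not taken from the study.

```python
from typing import List, Tuple

# Hypothetical contact format: (bin_i, bin_j, contact_count)
Contact = Tuple[int, int, int]


def adjacency_matrix(contacts: List[Contact], n_bins: int) -> List[List[int]]:
    """Full symmetric matrix: each off-diagonal contact appears at (i, j) and (j, i)."""
    m = [[0] * n_bins for _ in range(n_bins)]
    for i, j, count in contacts:
        m[i][j] += count
        if i != j:  # avoid double-counting self-contacts on the diagonal
            m[j][i] += count
    return m


def half_matrix(contacts: List[Contact], n_bins: int) -> List[List[int]]:
    """Upper-triangular matrix: each contact is stored once, at (min, max)."""
    m = [[0] * n_bins for _ in range(n_bins)]
    for i, j, count in contacts:
        lo, hi = min(i, j), max(i, j)
        m[lo][hi] += count
    return m


contacts = [(0, 2, 5), (1, 3, 2), (2, 2, 7)]
full = adjacency_matrix(contacts, 4)
half = half_matrix(contacts, 4)
print(full[2][0], full[0][2])  # 5 5
print(half[2][0], half[0][2])  # 0 5
```

    The half-matrix stores each undirected contact exactly once, removing the mirrored redundancy of the full matrix.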

    Martini: using literature keywords to compare gene sets

    Life scientists are often interested in comparing two gene sets to gain insight into differences between two distinct but related phenotypes or conditions. Several tools have been developed for comparing gene sets, most of which find Gene Ontology (GO) terms that are significantly over-represented in one gene set. However, such tools often return GO terms that are too generic, or too few, to be informative. Here, we present Martini, an easy-to-use tool for comparing gene sets. Martini is based not on GO but on keywords extracted from Medline abstracts; it also supports a much wider range of species than comparable tools. To evaluate Martini, we created a benchmark based on the human cell cycle and tested several comparable tools (CoPub, FatiGO, Marmite and ProfCom). Martini had the best benchmark performance, delivering a more detailed and accurate description of function. Martini also gave the best or equal-best performance on three other datasets (related to Arabidopsis, melanoma and ovarian cancer), suggesting that Martini represents an advance in the automated comparison of gene sets. In agreement with previous studies, our results further suggest that literature-derived keywords are a richer source of gene-function information than GO annotations. Martini is freely available at http://martini.embl.de
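
    The abstract does not describe Martini's statistical procedure. Purely as a point of reference, the sketch below shows one common way to score whether a literature keyword is over-represented in one gene set relative to another: a one-sided hypergeometric test on the pooled genes. The function name and the example counts are hypothetical, and this should not be read as Martini's actual method.

```python
from math import comb


def keyword_enrichment_p(k_a: int, n_a: int, k_b: int, n_b: int) -> float:
    """One-sided hypergeometric p-value that a keyword is over-represented
    in gene set A compared with gene set B (illustrative scheme only).

    k_a, k_b: number of genes in each set linked to the keyword
    n_a, n_b: total number of genes in each set
    """
    total = n_a + n_b    # pooled genes
    linked = k_a + k_b   # pooled keyword-linked genes
    denom = comb(total, n_a)
    # P(X >= k_a), where X counts keyword-linked genes among n_a draws
    upper = min(n_a, linked)
    p = sum(comb(linked, x) * comb(total - linked, n_a - x)
            for x in range(k_a, upper + 1))
    return p / denom


# Hypothetical counts: 8 of 10 genes in set A vs 1 of 10 in set B mention the keyword
p = keyword_enrichment_p(k_a=8, n_a=10, k_b=1, n_b=10)
print(f"{p:.4g}")
```

    A real comparison would score many keywords at once and would therefore also need a multiple-testing correction.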

    Single-cell transcriptomics reveals involution mimicry during the specification of the basal breast cancer subtype

    Basal breast cancer is associated with younger age, early relapse, and a high mortality rate. Here, we use unbiased droplet-based single-cell RNA sequencing (RNA-seq) to elucidate the cellular basis of tumor progression during the specification of the basal breast cancer subtype from the luminal progenitor population in the MMTV-PyMT (mouse mammary tumor virus-polyoma middle tumor-antigen) mammary tumor model. We find that basal-like cancer cells resemble the alveolar lineage that is specified upon pregnancy and encompass the acquisition of an aberrant post-lactation developmental program of involution that triggers remodeling of the tumor microenvironment and metastatic dissemination. This involution mimicry is characterized by a highly interactive multicellular network, with involution cancer-associated fibroblasts playing a pivotal role in extracellular matrix remodeling and immunosuppression. Our results may partially explain the increased risk and poor prognosis of breast cancer associated with childbirth.

    The Dark Proteome Database

    Background: Recently we surveyed the dark proteome, i.e., regions of proteins never observed by experimental structure determination and inaccessible to homology modelling. Surprisingly, we found that most of the dark proteome could not be accounted for by conventional explanations (e.g., intrinsic disorder, transmembrane domains, and compositional bias), and that nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. In this paper, we present the Dark Proteome Database (DPD) and associated web services that provide access to updated information about the dark proteome. Results: We assembled DPD from several external web resources (primarily Aquaria and Swiss-Prot) and stored it in a relational database currently containing ~10 million entries and occupying ~2 GB of disk space. This database comprises two key tables: one giving information on the ‘darkness’ of each protein, and a second that breaks each protein into dark and non-dark regions. In addition, a second version of the database was created that also uses information from the Protein Model Portal (PMP) to determine darkness. To provide access to DPD, we implemented a web server that gives access to all underlying data, as well as to functional analyses derived from these data. Conclusions: The availability of this database and its web service will help focus future structural and computational biology efforts on the dark proteome, thus providing a basis for understanding a wide variety of biological functions that currently remain unknown. Availability and implementation: DPD is available at http://darkproteome.ws, and the complete database is also available upon request. Data use is permitted under the Creative Commons Attribution-NonCommercial International license (http://creativecommons.org/licenses/by-nc/4.0/).
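
    The abstract describes two key tables: one giving per-protein darkness and one splitting each protein into dark and non-dark regions. The SQLite sketch below illustrates one way such a layout could be organized. The table names, column names, the placeholder accession `P_EXAMPLE`, and the definition of darkness as the fraction of residues in dark regions are all assumptions for illustration, not DPD's documented schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical per-protein table: overall 'darkness' of each protein
CREATE TABLE protein (
    accession TEXT PRIMARY KEY,
    length    INTEGER NOT NULL,
    darkness  REAL               -- assumed: fraction of residues in dark regions
);
-- Hypothetical region table: each protein split into dark / non-dark segments
CREATE TABLE region (
    accession TEXT REFERENCES protein(accession),
    start     INTEGER NOT NULL,  -- 1-based, inclusive
    stop      INTEGER NOT NULL,  -- inclusive
    is_dark   INTEGER NOT NULL   -- 1 = dark, 0 = non-dark
);
""")

conn.execute("INSERT INTO protein (accession, length) VALUES ('P_EXAMPLE', 100)")
conn.executemany(
    "INSERT INTO region VALUES ('P_EXAMPLE', ?, ?, ?)",
    [(1, 30, 0), (31, 100, 1)],  # residues 1-30 non-dark, 31-100 dark
)

# Derive per-protein darkness from the region table
conn.execute("""
UPDATE protein SET darkness = (
    SELECT CAST(SUM(stop - start + 1) AS REAL) / protein.length
    FROM region
    WHERE region.accession = protein.accession AND region.is_dark = 1
)
""")
darkness = conn.execute(
    "SELECT darkness FROM protein WHERE accession = 'P_EXAMPLE'"
).fetchone()[0]
print(darkness)
```

    Keeping regions in their own table lets the per-protein summary be recomputed whenever the underlying structural evidence changes.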

    Additional file 1 of The Dark Proteome Database

    An image file showing the validation tests used to check overall features and a range of individual proteins (ZIP, 1123 kb).

    Comparative eye-tracking evaluation of scatterplots and parallel coordinates

    We investigate task performance and reading characteristics for scatterplots (Cartesian coordinates) and parallel coordinates. In a controlled eye-tracking study, we asked 24 participants to assess the relative distance of points in multidimensional space, depending on the diagram type (parallel coordinates or a horizontal collection of scatterplots), the number of data dimensions (2, 4, 6, or 8), and the relative distance between points (15%, 20%, or 25%). Given a reference point and two target points, we instructed participants to choose the target point that was closer to the reference point in multidimensional space. We present a visual scanning model that describes different strategies for solving this retrieval task with both diagram types, and propose corresponding hypotheses that we test using task completion time, accuracy, and gaze positions as dependent variables. Our results show that scatterplots significantly outperform parallel coordinates in 2 dimensions; however, with 8 dimensions, the task was solved more quickly and more accurately with parallel coordinates. The eye-tracking data further show significant differences between Cartesian and parallel coordinates, as well as between different numbers of dimensions. For parallel coordinates, there is a clear trend toward shorter fixations and longer saccades as the number of dimensions increases. Using an area-of-interest (AOI) based approach, we identify different reading strategies for each diagram type: for parallel coordinates, the participants’ gaze frequently jumped back and forth between pairs of axes, whereas axes were rarely focused on when viewing Cartesian coordinates. We further found that participants’ attention is biased: toward the center of the whole plot for parallel coordinates, and toward the center/left side for Cartesian coordinates. We anticipate that these results may support the design of more effective visualizations for multidimensional data.
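
    The retrieval task asks which of two target points lies closer to a reference point in multidimensional space. The abstract does not name the distance metric, so the Python sketch below assumes Euclidean distance; the point values and function names are hypothetical. It also shows how a parallel-coordinates plot encodes the same n-D point as one vertex per vertical axis.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def closer_target(reference: Point, t1: Point, t2: Point) -> int:
    """Return 1 or 2 for the target nearer to the reference point.

    Assumes Euclidean distance (math.dist, Python 3.8+); ties go to target 1.
    """
    return 1 if math.dist(reference, t1) <= math.dist(reference, t2) else 2


def parallel_coordinates_polyline(point: Point) -> List[Tuple[int, float]]:
    """Map an n-D point to (axis index, value) vertices, one per vertical axis."""
    return [(axis, value) for axis, value in enumerate(point)]


# Hypothetical 4-D example
ref = [0.5, 0.5, 0.5, 0.5]
a = [0.6, 0.4, 0.5, 0.5]  # small offsets in two dimensions
b = [0.5, 0.5, 0.9, 0.5]  # one large offset in a single dimension
print(closer_target(ref, a, b))  # 1
print(parallel_coordinates_polyline(ref))
```

    In a scatterplot collection, the per-dimension offsets are spread across separate panels, while in parallel coordinates they appear as differences between polylines on each axis, which is consistent with the axis-to-axis gaze pattern reported above.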

    Comprehensive comparison of large-scale tissue expression datasets

    To carry out their functions, tissues rely on the right proteins being present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed, but previously undemonstrated, distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface.

    Versus—A tool for evaluating visualizations and image quality using a 2AFC methodology

    Novel visualization methods and strategies are needed in every scientific field to cope with the deluge of datasets, make discoveries, and find answers to previously unanswered questions. These methods and strategies should not only present scientific findings concisely as images but also be effective and expressive, qualities that often remain untested. Here, we present Versus, a tool that enables easy image quality assessment and image ranking, using a two-alternative forced choice (2AFC) methodology and an efficient ranking algorithm based on binary search. The tool provides a systematic way of setting up evaluation experiments via the web, without the need to install additional software or to have programming skills. Furthermore, Versus can easily interface with crowdsourcing platforms, such as Amazon’s Mechanical Turk, or can be used as a stand-alone system to carry out evaluations with experts. We demonstrate the use of Versus through an image evaluation study that aims to determine whether hue, saturation, brightness, and texture are good indicators of uncertainty in three-dimensional protein structures. Drawing on the power of crowdsourcing, we argue that there is demand for, and great potential in, this tool becoming a standard for simple and fast image evaluations aimed at testing the effectiveness and expressiveness of scientific visualizations. Keywords: Evaluation, Visualization, Visual analytics, Image comparison, Crowdsourcing, Evaluation methods, 2AFC, Image evaluation, Tool, Visualization evaluation
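
    The abstract states only that Versus ranks images with 2AFC judgments and a ranking algorithm based on binary search. One plausible reading is binary insertion: each new image is placed into an already ranked list by asking about log2(n) forced-choice questions rather than comparing it with every ranked image. The Python sketch below assumes that reading; `prefer` is a hypothetical stand-in for a participant's 2AFC answer, and the image names and quality scores are invented for the simulation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def rank_by_2afc(items: List[T], prefer: Callable[[T, T], bool]) -> List[T]:
    """Rank items best-first using binary insertion.

    prefer(a, b) returns True if the judge picks a over b in a
    two-alternative forced choice (hypothetical stand-in for a participant).
    """
    ranked: List[T] = []
    for item in items:
        lo, hi = 0, len(ranked)
        # Binary search for the insertion slot; each iteration asks one 2AFC question
        while lo < hi:
            mid = (lo + hi) // 2
            if prefer(item, ranked[mid]):
                hi = mid      # item beats ranked[mid]: search the better half
            else:
                lo = mid + 1  # ranked[mid] wins: search the worse half
        ranked.insert(lo, item)
    return ranked


# Simulated judge that prefers higher (hypothetical) quality scores
quality = {"img_a": 0.2, "img_b": 0.9, "img_c": 0.5, "img_d": 0.7}
questions = 0


def simulated_judge(a: str, b: str) -> bool:
    global questions
    questions += 1
    return quality[a] > quality[b]


result = rank_by_2afc(list(quality), simulated_judge)
print(result)     # ['img_b', 'img_d', 'img_c', 'img_a']
print(questions)  # 5
```

    With a consistent judge this needs 5 questions for 4 images, versus 6 for exhaustive pairwise comparison; the gap widens quickly as the image count grows. Real participants are not perfectly consistent, so a production tool would likely aggregate several judgments per pair.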