37 research outputs found

    DeepVenn -- a web application for the creation of area-proportional Venn diagrams using the deep learning framework Tensorflow.js

    Motivation: The Venn diagram is one of the most popular methods to visualize the overlap and differences between data sets. It is especially useful when it is 'area-proportional', i.e. the sizes of the circles and the overlaps are proportional to the sizes of the data sets. There are some tools available that can generate area-proportional Venn diagrams, but most of them are limited to two or three circles, and others are not available as a web application or accept only numbers and not lists of IDs as input. Some existing solutions also have limited accuracy because of outdated algorithms to calculate the optimal placement of the circles. The latest machine learning and deep learning frameworks can offer a solution to this problem. Results: The DeepVenn web application can create area-proportional Venn diagrams of up to ten sets. Because of an algorithm implemented with the deep learning framework Tensorflow.js, DeepVenn automatically finds the optimal solution in which the overlap between the circles corresponds to the sizes of the overlap as much as possible. The only required input is two to ten lists of IDs. Optional parameters include the main title, the subtitle, the set titles and colours of the circles and the background. The user can choose to display absolute numbers or percentages in the final diagram. The image can be saved as a PNG file by right-clicking on it and choosing "Save image as". The right side of the interface also shows the numbers and contents of all intersections. Availability: DeepVenn is available at https://www.deepvenn.com. Contact: [email protected]. 2 pages, 1 figure.
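To make the term 'area-proportional' concrete, here is a minimal Python sketch for the two-set case only. It is not DeepVenn's algorithm (the abstract says only that an optimisation is implemented with Tensorflow.js); it swaps in a plain bisection search, and the function names `lens_area` and `two_circle_layout` are hypothetical. Each circle gets an area equal to its set size, and the centre distance is adjusted until the overlap area equals the intersection size.

```python
import math

def lens_area(r1, r2, d):
    """Area of the intersection of two circles with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:                      # circles are disjoint
        return 0.0
    if d <= abs(r1 - r2):                 # smaller circle lies inside the larger one
        return math.pi * min(r1, r2) ** 2
    # Clamp the cosine arguments to guard against floating-point drift at the edges.
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    tri = 0.5 * math.sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2)
                                   * (d - r1 + r2) * (d + r1 + r2)))
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - tri

def two_circle_layout(size_a, size_b, size_ab, tol=1e-10):
    """Return (r1, r2, d) so that circle areas and overlap match the set sizes.

    Illustrative only: bisection on the centre distance, valid because the
    overlap area shrinks monotonically as the circles move apart.
    """
    r1 = math.sqrt(size_a / math.pi)
    r2 = math.sqrt(size_b / math.pi)
    lo, hi = abs(r1 - r2), r1 + r2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > size_ab:
            lo = mid                      # too much overlap: move circles apart
        else:
            hi = mid                      # too little overlap: move them closer
    return r1, r2, (lo + hi) / 2

# Made-up sizes: 100 and 60 IDs, 25 of them shared.
r1, r2, d = two_circle_layout(100, 60, 25)
```

With more than two circles an exact solution generally does not exist, which is why tools in this area fall back on numerical optimisation of a loss over all overlaps.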

    BioVenn – a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams

    BACKGROUND: In many genomics projects, numerous lists containing biological identifiers are produced. Often it is useful to see the overlap between different lists, enabling researchers to quickly observe similarities and differences between the data sets they are analyzing. One of the most popular methods to visualize the overlap and differences between data sets is the Venn diagram: a diagram consisting of two or more circles in which each circle corresponds to a data set, and the overlap between the circles corresponds to the overlap between the data sets. Venn diagrams are especially useful when they are 'area-proportional', i.e. the sizes of the circles and the overlaps correspond to the sizes of the data sets. Currently there are no programs available that can create area-proportional Venn diagrams connected to a wide range of biological databases. RESULTS: We designed a web application named BioVenn to summarize the overlap between two or three lists of identifiers, using area-proportional Venn diagrams. The user only needs to input these lists of identifiers in the textboxes and push the submit button. Parameters like colors and text size can be adjusted easily through the web interface. The position of the text can be adjusted by drag-and-drop. The output Venn diagram can be shown as an SVG or PNG image embedded in the web application, or as a standalone SVG or PNG image. The latter option is useful for batch queries. Besides the Venn diagram, BioVenn outputs lists of identifiers for each of the resulting subsets. If an identifier is recognized as belonging to one of the supported biological databases, the output is linked to that database. Finally, BioVenn can map Affymetrix and EntrezGene identifiers to Ensembl genes. CONCLUSION: BioVenn is an easy-to-use web application to generate area-proportional Venn diagrams from lists of biological identifiers. It supports a wide range of identifiers from the most used biological databases currently available. Its implementation on the World Wide Web makes it available for use on any computer with an internet connection, independent of operating system and without the need to install programs locally. BioVenn is freely accessible at http://www.cmbi.ru.nl/cdd/biovenn/
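The "lists of identifiers for each of the resulting subsets" can be illustrated with a short Python sketch. This is an assumption about what such a computation might look like, not BioVenn's code; the function name `venn_regions` and the identifiers `id1` to `id6` are made up for the example.

```python
def venn_regions(x, y, z):
    """Split three identifier lists into the seven disjoint regions of a Venn diagram."""
    x, y, z = set(x), set(y), set(z)
    return {
        "x_only": x - y - z,
        "y_only": y - x - z,
        "z_only": z - x - y,
        "xy_only": (x & y) - z,
        "xz_only": (x & z) - y,
        "yz_only": (y & z) - x,
        "xyz": x & y & z,
    }

# Placeholder identifiers, not real database IDs.
list_x = ["id1", "id2", "id3", "id4"]
list_y = ["id3", "id4", "id5"]
list_z = ["id4", "id5", "id6"]

regions = venn_regions(list_x, list_y, list_z)
for name, ids in regions.items():
    print(name, sorted(ids))
```

Because the seven regions are disjoint and cover the union, their sizes are the numbers an area-proportional diagram has to represent.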

    Variable allele frequency threshold

    The present invention relates to monitoring a patient's response to therapy. In order to improve the monitoring of a patient's response to therapy, a method is provided to set a plurality of allele frequency thresholds to account for variations among tumours and patients. As the multiple allele frequency thresholds take into account differences between genes, single-nucleotide polymorphisms (SNPs), and/or patients, the multiple allele frequency thresholds may provide significant value to improve personalized therapy selection, disease surveillance, and monitoring to improve patient outcomes.

    Benchmarking ortholog identification methods using functional genomics data

    BACKGROUND: The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods. RESULTS: To measure the similarity in function of proteins from different species we used functional genomics data, such as expression data and protein interaction data. We tested several of the most popular ortholog identification methods. In general, we observed a sensitivity/selectivity trade-off: the functional similarity scores per orthologous pair of sequences become higher when the number of proteins included in the ortholog groups decreases. CONCLUSION: By combining the sensitivity and the selectivity into an overall score, we show that the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins

    Microalgae and Phototrophic Purple Bacteria for Nutrient Recovery from Agri-Industrial Effluents; Influences on Plant Growth, Rhizosphere Bacteria, and Putative C & N Cycling Genes

    Microalgae (MA) and purple phototrophic bacteria (PPB) have the ability to remove and recover nutrients from digestate (anaerobic digestion effluent) and pre-settled pig manure that can be utilized as a bio-fertilizer. The objective of this study was to compare the effect of biologically recovered nutrients from MA and PPB in relation to plant growth and soil biological processes involved in nitrogen & carbon cycling

    The ReIMAGINE multimodal warehouse: using artificial intelligence for accurate risk stratification of prostate cancer

    Introduction. Prostate cancer (PCa) is the most frequent cancer diagnosis in men worldwide. Our ability to identify those men whose cancer will decrease their lifespan and/or quality of life remains poor. The ReIMAGINE Consortium has been established to improve PCa diagnosis. Materials and methods. MRI will likely become the future cornerstone of the risk-stratification process for men at risk of early prostate cancer. We will, for the first time, be able to combine the underlying molecular changes in PCa with the state-of-the-art imaging. ReIMAGINE Screening invites men for MRI and PSA evaluation. ReIMAGINE Risk includes men at risk of prostate cancer based on MRI, and includes biomarker testing. Results. Baseline clinical information, genomics, blood, urine, fresh prostate tissue samples, digital pathology and radiomics data will be analysed. Data will be de-identified, stored with correlated mpMRI disease endotypes and linked with long term follow-up outcomes in an instance of the Philips Clinical Data Lake, consisting of cloud-based software. The ReIMAGINE platform includes application programming interfaces and a user interface that allows users to browse data, select cohorts, manage users and access rights, query data, and more. Connection to analytics tools such as Python allows statistical and stratification method pipelines to run profiling regression analyses. Discussion. The ReIMAGINE Multimodal Warehouse comprises a unique data source for PCa research, to improve risk stratification for PCa and inform clinical practice. The de-identified dataset characterized by clinical, imaging, genomics and digital pathology PCa patient phenotypes will be a valuable resource for the scientific and medical community

    PhyloPat: phylogenetic pattern analysis of eukaryotic genes

    BACKGROUND: Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns. DESCRIPTION: PhyloPat is an easy-to-use webserver, which can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found in total 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using every one of the orthologies provided by Ensembl. PhyloPat provides the possibility of querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included. CONCLUSION: PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships within each new Ensembl release.
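The two ideas in this abstract, single linkage clustering of orthology pairs and regular-expression queries over presence/absence patterns, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than PhyloPat's implementation: the gene IDs, the three-species ordering and the helper names `single_linkage` and `pattern` are all hypothetical, and the clustering is done with a simple union-find.

```python
import re
from collections import defaultdict

def single_linkage(pairs):
    """Group genes into lineages: any two genes joined by a chain of orthology pairs cluster together."""
    parent = {}

    def find(g):
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]   # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                 # merge the two lineages

    clusters = defaultdict(set)
    for g in parent:
        clusters[find(g)].add(g)
    return list(clusters.values())

def pattern(cluster, species_of, species_order):
    """Binary phylogenetic pattern: '1' if the lineage has a gene in that species, else '0'."""
    present = {species_of[g] for g in cluster}
    return "".join("1" if s in present else "0" for s in species_order)

# Hypothetical toy data: three species, made-up gene IDs and orthology pairs.
species_order = ["human", "mouse", "fly"]
species_of = {"h1": "human", "m1": "mouse", "f1": "fly", "h2": "human", "m2": "mouse"}
orthologies = [("h1", "m1"), ("m1", "f1"), ("h2", "m2")]

lineages = single_linkage(orthologies)
patterns = {pattern(c, species_of, species_order): c for c in lineages}

# Regular-expression query: present in human, anything in mouse, absent in fly.
hits = [p for p in patterns if re.fullmatch(r"1.0", p)]
print(hits)
```

Note that `h1` and `f1` end up in the same lineage even though no direct pair links them; that transitive merging is what makes the clustering single linkage.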

    Testing statistical significance scores of sequence comparison methods with structure similarity

    BACKGROUND: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
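A Monte-Carlo Z-score of the kind mentioned above can be sketched as follows, which also shows why it is compute-intensive: every comparison requires many extra alignments against shuffled sequences. This is a minimal Python illustration, not the procedure used by any of the tools in the abstract. The scoring scheme (match 2, mismatch -1, linear gap -2) is an assumption chosen for brevity in place of a substitution matrix with affine gaps, and the names `sw_score` and `z_score` are hypothetical.

```python
import random
import statistics

def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

def z_score(a, b, n_shuffles=200, seed=0):
    """Z = (real score - mean of shuffled scores) / standard deviation of shuffled scores."""
    rng = random.Random(seed)            # fixed seed so the sketch is reproducible
    real = sw_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)             # keeps composition, destroys residue order
        shuffled_scores.append(sw_score(a, "".join(letters)))
    mu = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    return (real - mu) / sd if sd > 0 else 0.0

# Made-up example sequence compared with itself.
seq = "HEAGAWGHEEPAWHEAE"
print(sw_score(seq, seq), z_score(seq, seq))
```

Shuffling preserves the amino-acid composition of the second sequence, so the Z-score measures how far the real alignment score lies above what composition alone would produce.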