
    Applications of Machine Learning in Cancer Prediction and Prognosis

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

    Exact Minimum Eigenvalue Distribution of an Entangled Random Pure State

    A recent conjecture regarding the average of the minimum eigenvalue of the reduced density matrix of a random complex state is proved. In fact, the full distribution of the minimum eigenvalue is derived exactly for both the cases of a random real and a random complex state. Our results are relevant to the entanglement properties of eigenvectors of the orthogonal and unitary ensembles of random matrix theory and quantum chaotic systems. They also provide a rare exactly solvable case for the distribution of the minimum of a set of N strongly correlated random variables for all values of N (and not just for large N). Comment: 13 pages, 2 figures included; typos corrected; to appear in J. Stat. Phys.

    Multi-item sales forecasting with total and split exponential smoothing

    Efficient supply chain management relies on accurate demand forecasting. Typically, forecasts are required at frequent intervals for many items. Forecasting methods suitable for this application are those that can be relied upon to produce robust and accurate predictions when implemented within an automated procedure. Exponential smoothing methods are a common choice. In this empirical case study paper, we evaluate a recently proposed seasonal exponential smoothing method that has previously been considered only for forecasting daily supermarket sales. We term this method ‘total and split’ exponential smoothing, and apply it to monthly sales data from a publishing company. The resulting forecasts are compared against a variety of methods, including several available in the software currently used by the company. Our results show total and split exponential smoothing outperforming the other methods considered. The results were also impressive for a method that trims outliers and then applies simple exponential smoothing.
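
The last benchmark mentioned in the abstract above, trimming outliers and then applying simple exponential smoothing, can be illustrated with a minimal Python sketch. The abstract does not say how outliers are trimmed or how the smoothing parameter is chosen, so the clipping rule (median plus or minus `k` median absolute deviations), the value of `alpha`, the function names and the sample data are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import median


def trim_outliers(series, k=3.0):
    """Clip values further than k median absolute deviations from the median.

    The clipping rule is an assumption; the abstract does not specify one.
    """
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return list(series)  # no spread to measure against, leave unchanged
    lo, hi = med - k * mad, med + k * mad
    return [min(max(x, lo), hi) for x in series]


def simple_exponential_smoothing(series, alpha=0.3):
    """Return the smoothed level after the last observation.

    For simple exponential smoothing this level is also the forecast
    for every future period.
    """
    level = series[0]  # initialise the level with the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Hypothetical monthly sales with one obvious spike.
sales = [100, 102, 98, 101, 500, 99, 103]
forecast = simple_exponential_smoothing(trim_outliers(sales), alpha=0.3)
print(f"next-period forecast: {forecast:.1f}")
```

Clipping rather than deleting the spike keeps the series evenly spaced, which the smoothing recursion relies on.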

    PlasMapper: A web server for drawing and auto-annotating plasmid maps

    PlasMapper is a comprehensive web server that automatically generates and annotates high-quality circular plasmid maps. Taking only the plasmid/vector DNA sequence as input, PlasMapper uses sequence pattern matching and BLAST alignment to automatically identify and label common promoters, terminators, cloning sites, restriction sites, reporter genes, affinity tags, selectable marker genes, replication origins and open reading frames. PlasMapper then presents the identified features in textual form and as high resolution, multicolored graphical output. The appearance and contents of the output can be customized in numerous ways using several supplied options. Further, PlasMapper images can be rendered in both rasterized (PNG and JPG) and vector graphics (SVG) formats to accommodate a variety of user needs or preferences. The images and textual output are of sufficient quality that they may be used directly in publications or presentations. The PlasMapper web server is freely accessible at http://wishart.biology.ualberta.ca/PlasMapper

    PROTEUS2: A Web Server for Comprehensive Protein Structure Prediction and Structure-Based Annotation

    PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multicomponent neural nets and up-to-date databases of known secondary structure assignments, PROTEUS2 is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2 = 94%), membrane spanning helices (Q2 = 87%) and secondary structure (Q3 score of 81.3%). PROTEUS2’s homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL and 3D JigSaw (within 0.2 Å RMSD). The average PROTEUS2 prediction takes ~3 min per query sequence. The PROTEUS2 server along with source code for many of its modules is accessible at http://wishart.biology.ualberta.ca/proteus2
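
For readers unfamiliar with the Q3 figure quoted above: it is the percentage of residues whose three-state secondary structure label (helix, strand or coil) is predicted correctly. The following Python sketch shows the calculation on a made-up pair of label strings; the function name and the example sequences are hypothetical and do not come from PROTEUS2.

```python
def q3_score(predicted, observed):
    """Percentage of residues whose 3-state label (H, E or C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy per-residue labels: H = helix, E = strand, C = coil.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```

The same per-residue comparison gives a Q2 score when only two labels are used, for example membrane versus non-membrane.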

    PolySearch: A web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites

    A particular challenge in biomedical text mining is to find ways of handling ‘comprehensive’ or ‘associative’ queries such as ‘Find all genes associated with breast cancer’. Given that many queries in genomics, proteomics or metabolomics involve these kinds of comprehensive searches, we believe that a web-based tool that could support these searches would be quite useful. In response to this need, we have developed the PolySearch web server. PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is ‘Given X, find all Y’s’ where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences. PolySearch’s performance has been assessed in tasks such as gene synonym identification, protein–protein interaction identification and disease gene identification using a variety of manually assembled ‘gold standard’ text corpuses. Its f-measure on these tasks is 88, 81 and 79%, respectively. These values are between 5 and 50% better than other published tools. The server is freely available at http://wishart.biology.ualberta.ca/polysearc
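
The f-measure reported above combines precision and recall against a gold-standard answer set. The abstract does not state which variant was used, so the sketch below assumes the balanced form (the harmonic mean of precision and recall, often called F1). The function name and the gene lists are toy illustrations, not PolySearch output.

```python
def f_measure(predicted, gold):
    """Balanced F-measure of a predicted set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy 'Given X, find all Y' result scored against a hand-made gold standard.
gold = {"BRCA1", "BRCA2", "TP53", "PTEN"}
predicted = {"BRCA1", "BRCA2", "TP53", "EGFR", "MYC"}
print(f"F-measure = {100 * f_measure(predicted, gold):.0f}%")
```

Here three of the five predictions are correct (precision 0.60) and three of the four gold items are found (recall 0.75), so the score sits between the two.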

    Genomic sequence and activity of KS10, a transposable phage of the Burkholderia cepacia complex

    Background: The Burkholderia cepacia complex (BCC) is a versatile group of Gram-negative organisms that can be found throughout the environment in sources such as soil, water, and plants. While BCC bacteria can be involved in beneficial interactions with plants, they are also considered opportunistic pathogens, specifically in patients with cystic fibrosis and chronic granulomatous disease. These organisms also exhibit resistance to many antibiotics, making conventional treatment often unsuccessful. KS10 was isolated as a prophage of B. cenocepacia K56-2, a clinically relevant strain of the BCC. Our objective was to sequence the genome of this phage and also determine if this prophage encoded any virulence determinants. Results: KS10 is a 37,635 base pair (bp) transposable phage of the opportunistic pathogen Burkholderia cenocepacia. Genome sequence analysis and annotation of this phage reveal that KS10 shows the closest sequence homology to Mu and BcepMu. KS10 was found to be a prophage in three different strains of B. cenocepacia, including strains K56-2, J2315, and C5424, and seven tested clinical isolates of B. cenocepacia, but no other BCC species. A survey of 23 strains and 20 clinical isolates of the BCC revealed that KS10 is able to form plaques on lawns of B. ambifaria LMG 19467, B. cenocepacia PC184, and B. stabilis LMG 18870. Conclusion: KS10 is a novel phage with a genomic organization that differs from most phages in that its capsid genes are not aligned into one module but rather separated by approximately 11 kb, giving evidence of one or more prior genetic rearrangements. There were no potential virulence factors identified in KS10, though many hypothetical proteins were identified with no known function.

    DrugBank: A comprehensive resource for in silico drug discovery and exploration

    DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14 000 protein or drug target sequences are linked to these drug entries. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. Many data fields are hyperlinked to other databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and a variety of structure viewing applets. The database is fully searchable, supporting extensive text, sequence, chemical structure and relational query searches. Potential applications of DrugBank include in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. DrugBank is available at http://redpoll.pharmacy.ualberta.ca/drugbank/

    MiMeDB: the Human Microbial Metabolome Database

    The Human Microbial Metabolome Database (MiMeDB) (https://mimedb.org) is a comprehensive, multi-omic, microbiome resource that connects: (i) microbes to microbial genomes; (ii) microbial genomes to microbial metabolites; (iii) microbial metabolites to the human exposome and (iv) all of these 'omes' to human health. MiMeDB was established to consolidate the growing body of data connecting the human microbiome and the chemicals it produces to both health and disease. MiMeDB contains detailed taxonomic, microbiological and body-site location data on most known human microbes (bacteria and fungi). This microbial data is linked to extensive genomic and proteomic sequence data that is closely coupled to colourful interactive chromosomal maps. The database also houses detailed information about all the known metabolites generated by these microbes, their structural, chemical and spectral properties, the reactions and enzymes responsible for these metabolites and the primary exposome sources (food, drug, cosmetic, pollutant, etc.) that ultimately lead to the observed microbial metabolites in humans. Additional, extensively referenced data about the known or presumptive health effects, measured biosample concentrations and human protein targets for these compounds is provided. All of this information is housed in a richly annotated, highly interactive, visually pleasing database that has been designed to be easy to search, easy to browse and easy to navigate. Currently MiMeDB contains data on 626 health effects or bioactivities, 1904 microbes, 3112 references, 22 054 reactions, 24 254 metabolites or exposure chemicals, 648 861 MS and NMR spectra, 6.4 million genes and 7.6 billion DNA bases. We believe that MiMeDB represents the kind of integrated, multi-omic or systems biology database that is needed to enable comprehensive multi-omic integration.