642 research outputs found

    Evaluation of two interaction techniques for visualization of dynamic graphs

    Several techniques for visualization of dynamic graphs are based on different spatial arrangements of a temporal sequence of node-link diagrams. Many studies in the literature have investigated the importance of maintaining the user's mental map across this temporal sequence, but usually each layout is considered as a static graph drawing and the effect of user interaction is disregarded. We conducted a task-based controlled experiment to assess the effectiveness of two basic interaction techniques: the adjustment of the layout stability and the highlighting of adjacent nodes and edges. We found that generally both interaction techniques increase accuracy, sometimes at the cost of longer completion times, and that the highlighting outclasses the stability adjustment for many tasks except the most complex ones. Comment: Appears in the Proceedings of the 24th International Symposium on Graph Drawing and Network Visualization (GD 2016).

    Psychometric precision in phenotype definition is a useful step in molecular genetic investigation of psychiatric disorders

    Affective disorders are highly heritable, but few genetic risk variants have been consistently replicated in molecular genetic association studies. The common method of defining psychiatric phenotypes in molecular genetic research is either a summation of symptom scores or a binary threshold score representing the risk of diagnosis. Psychometric latent variable methods can improve the precision of psychiatric phenotypes, especially when the data structure is not straightforward. Using data from the British 1946 birth cohort, we compared summary scores with psychometric modeling based on the General Health Questionnaire (GHQ-28) scale for affective symptoms in an association analysis of 27 candidate genes (249 single-nucleotide polymorphisms (SNPs)). The psychometric method utilized a bi-factor model that partitioned the phenotype variances into five orthogonal latent variable factors, in accordance with the multidimensional data structure of the GHQ-28 involving somatic, social, anxiety and depression domains. Results showed that, compared with the summation approach, the affective symptoms defined by the bi-factor psychometric model had a higher number of associated SNPs of larger effect sizes. These results suggest that psychometrically defined mental health phenotypes can reflect the dimensions of complex phenotypes better than summation scores, and therefore offer a useful approach in genetic association investigations.

    PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Background: Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting structure prediction in a machine-readable format. To our knowledge, this represents the only publicly available SOAP interface for a protein secondary structure prediction service with a published WSDL interface definition. Conclusion: Relative to the 9 contemporary methods included in the comparison, cascaded PCI classifiers perform well; however, PCI finds greatest application as a consensus classifier. When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the overall error rate (Q3) is maintained while the rate of occurrence of a particularly detrimental error is reduced by up to 25%. This improvement in BAD score, combined with the machine-readable SOAP web service interface, makes PCI-SS particularly useful for inclusion in a tertiary structure prediction pipeline.
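    The decomposition of the three-state problem into three binary sub-problems can be illustrated with a minimal sketch. It assumes a one-vs-rest split (H vs. not-H, E vs. not-E, C vs. not-C) recombined by taking the highest-scoring state per residue; the abstract does not say which decomposition or combination rule PCI-SS actually uses, and the function names and scores below are hypothetical.

```python
# Hypothetical one-vs-rest sketch; not the PCI-SS implementation.
STATES = ("H", "E", "C")  # helix, strand, non-regular (coil)


def to_binary_targets(labels):
    """Turn a three-state label string into one 0/1 target list per state."""
    return {s: [1 if c == s else 0 for c in labels] for s in STATES}


def combine(scores):
    """Pick, for each residue, the state whose binary classifier scored highest.

    `scores` maps each state to a list of per-residue scores of equal length.
    """
    n = len(scores[STATES[0]])
    return "".join(max(STATES, key=lambda s: scores[s][i]) for i in range(n))


if __name__ == "__main__":
    print(to_binary_targets("HHEC"))
    # Made-up scores for a three-residue chain.
    print(combine({"H": [0.9, 0.2, 0.1],
                   "E": [0.05, 0.7, 0.3],
                   "C": [0.05, 0.1, 0.6]}))
```

    In a two-layer arrangement such as the one the abstract mentions, the per-state scores would come from the first layer and a second layer would replace the simple argmax used here.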

    Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction

    Background: We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks. Results: Results have shown that the best performing neural network is the one that uses input information comprising PSI-BLAST [1] profiles of residue pairs, pairwise distance and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested. Conclusion: This method is demonstrated to be viable, and furthermore evolutionary information is successfully used in the neural networks to improve decoy discrimination.

    Genomewide Association Scan of Suicidal Thoughts and Behaviour in Major Depression

    Background: Suicidal behaviour can be conceptualised as a continuum from suicidal ideation, to suicide attempts, to completed suicide. In this study we identify genes contributing to suicidal behaviour in the depression study RADIANT. Methodology/Principal Findings: A quantitative suicidality score was composed of two items from the SCAN interview. In addition, the 251 depression cases with a history of serious suicide attempts were classified to form a discrete trait. The quantitative trait was correlated with younger onset of depression and number of episodes of depression, but not with gender. A genome-wide association study of 2,023 depression cases was performed to identify genes that may contribute to suicidal behaviour. Two Munich depression studies were used as replication cohorts to test the most strongly associated SNPs. No SNP was associated at genome-wide significance level. For the quantitative trait, evidence of association was detected at GFRA1, a receptor for the neurotrophin GDNF (p = 2e-06). For the discrete trait of suicide attempt, SNPs in KIAA1244 and RGS18 attained p-values of <5e-6. None of these SNPs showed evidence for replication in the additional cohorts tested. Candidate gene analysis provided some support for a polymorphism in NTRK2, which was previously associated with suicidality. Conclusions/Significance: This study provides a genome-wide assessment of possible genetic contribution to suicidal behaviour in depression but indicates a genetic architecture of multiple genes with small effects. Large cohorts will be required to dissect this further.

    Meta-analytic approach to the accurate prediction of secreted virulence effectors in gram-negative bacteria

    Background: Many pathogens use a type III secretion system to translocate virulence proteins (called effectors) in order to adapt to the host environment. To date, many prediction tools for effector identification have been developed. However, these tools are insufficiently accurate for producing a list of putative effectors that can be applied directly for labor-intensive experimental verification. This also suggests that important features of effectors have yet to be fully characterized. Results: In this study, we have constructed an accurate approach to predicting secreted virulence effectors from Gram-negative bacteria. This consists of a support vector machine-based discriminant analysis followed by simple criteria-based filtering. The accuracy was assessed by estimating the average number of true positives in the top-20 ranking in the genome-wide screening. In the validation, 10 sets of 20 training and 20 testing examples were randomly selected from 40 known effectors of Salmonella enterica serovar Typhimurium LT2. On average, the SVM portion of our system predicted 9.7 true positives from 20 testing examples in the top-20 of the prediction. Removal of the N-terminal instability, codon adaptation index and ProtParam indices decreased the score to 7.6, 8.9 and 7.9, respectively. These discrimination features suggested that the following characteristics of effectors had been uncovered: unstable N-terminus, non-optimal codon usage, hydrophilic, and less aliphatic. The secondary filtering process represented by coexpression analysis and domain distribution analysis further refined the average true positive counts to 12.3. We further confirmed that our system can correctly predict known effectors of P. syringae DC3000, strongly indicating its feasibility. Conclusions: We have successfully developed an accurate prediction system for screening effectors on a genome-wide scale. We confirmed the accuracy of our system by external validation using known effectors of Salmonella and obtained an accurate list of putative effectors of the organism. The level of accuracy was sufficient to yield candidates for gene-directed experimental verification. Furthermore, new features of effectors were revealed: non-optimal codon usage and instability of the N-terminal region. From these findings, a new working hypothesis is proposed regarding mechanisms controlling the translocation of virulence effectors and determining the substrate specificity encoded in the secretion system.
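    The top-20 evaluation measure described above (the number of known effectors among the 20 highest-ranked genes) might be computed along the following lines. This is only a sketch: the function name, gene identifiers and scores are invented for illustration, and the abstract does not describe how the authors' ranking was actually implemented.

```python
# Hypothetical sketch of a "true positives in the top-k ranking" measure.
def true_positives_in_top_k(scores, known_positives, k=20):
    """Count known positives among the k highest-scoring items.

    `scores` maps an item identifier to its classifier score;
    `known_positives` is the set of identifiers held out as true effectors.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(1 for item in ranked[:k] if item in known_positives)


if __name__ == "__main__":
    # Made-up scores for four hypothetical genes.
    scores = {"geneA": 0.9, "geneB": 0.8, "geneC": 0.1, "geneD": 0.7}
    held_out = {"geneA", "geneC", "geneD"}
    print(true_positives_in_top_k(scores, held_out, k=3))
```

    Averaging this count over repeated random training and testing splits would give a figure comparable in kind to the averages reported in the abstract.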

    LipocalinPred: a SVM-based method for prediction of lipocalins

    Background: Functional annotation of rapidly amassing nucleotide and protein sequences presents a challenging task for modern bioinformatics. This is particularly true for protein families sharing extremely low sequence identity, as for lipocalins, a family of proteins with varied functions and great diversity at the sequence level, yet conserved structures. Results: In the present study we propose an SVM-based method for identification of lipocalin protein sequences. The SVM models were trained with the input features generated using amino acid, dipeptide and secondary structure compositions as well as PSSM profiles. The model derived using both PSSM and secondary structure emerged as the best model in the study. Apart from achieving a high prediction accuracy (>90% in leave-one-out), LipocalinPred correctly differentiates closely related fatty acid-binding proteins and triabins as non-lipocalins. Conclusion: The method offers a promising approach as a lipocalin prediction tool, complementing PROSITE, Pfam and homology modelling methods.

    EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation

    Background: Prediction of DNA-binding residues is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for the prediction, but most of them do not consider the relationships of evolutionary information between residues. Results: In this paper, we first propose a novel residue encoding method, referred to as the Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), to encode residues by utilizing the relationships of evolutionary information between residues. PDNA-62 and PDNA-224 are used to evaluate PSSM-RT and two existing PSSM encoding methods by five-fold cross-validation. Performance evaluations indicate that PSSM-RT is more effective than previous methods. This validates the point that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction. An ensemble learning classifier (EL_PSSM-RT) is also proposed by combining an ensemble learning model and PSSM-RT to better handle the imbalance between binding and non-binding residues in datasets. EL_PSSM-RT is evaluated by five-fold cross-validation using PDNA-62 and PDNA-224 as well as two independent datasets, TS-72 and TS-61. Performance comparisons with existing predictors on the four datasets demonstrate that EL_PSSM-RT is the best-performing method among all the predicting methods, with improvement between 0.02-0.07 for MCC, 4.18-21.47% for ST and 0.013-0.131 for AUC. Furthermore, we analyze the importance of the pair-relationships extracted by PSSM-RT and the results validate the usefulness of PSSM-RT for encoding DNA-binding residues. Conclusions: We propose a novel prediction method for the prediction of DNA-binding residues with the inclusion of relationship of evolutionary information and ensemble learning. Performance evaluation shows that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction and ensemble learning can be used to address the data imbalance issue between binding and non-binding residues. A web service of EL_PSSM-RT (http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/) is provided for free access to the biological research community.
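    Because binding residues are far rarer than non-binding ones, a measure such as the Matthews correlation coefficient (MCC) is more informative than raw accuracy. A minimal sketch of the standard MCC formula follows; the confusion-matrix counts are invented for illustration and are not taken from the paper, and the function name is hypothetical.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for
    the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up counts: 70 binding residues, 930 non-binding residues.
    print(round(matthews_corrcoef(tp=50, tn=900, fp=30, fn=20), 3))
```

    With these made-up counts the plain accuracy is 95%, while the MCC is noticeably lower, which is why imbalanced residue-level predictors are usually compared on MCC and AUC.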