31 research outputs found

    An analysis of single amino acid repeats as use case for application specific background models

    Background: Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. Results: Traditional Markov chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multivariate trends well, despite their low dimensionality and in contrast even to higher-order Markov predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine repeats in signal peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis. Conclusions: Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.

    Bayesian modelling of shared gene function

    Motivation: Biological assays are often carried out on tissues that contain many cell lineages and active pathways. Microarray data produced using such material therefore reflect superimpositions of biological processes. Analysing such data for shared gene function by means of well-matched assays may help to provide a better focus on specific cell types and processes. The identification of genes that behave similarly in different biological systems also has the potential to reveal new insights into preserved biological mechanisms. Results: In this article, we propose a hierarchical Bayesian model allowing integrated analysis of several microarray data sets for shared gene function. Each gene is associated with an indicator variable that selects whether binary class labels are predicted from expression values or by a classifier which is common to all genes. Each indicator selects the component models for all involved data sets simultaneously. A quantitative measure of shared gene function is obtained by inferring a probability measure over these indicators. Through experiments on synthetic data, we illustrate potential advantages of this Bayesian approach over a standard method. A shared analysis of matched microarray experiments covering (a) a cycle of mouse mammary gland development and (b) the process of in vitro endothelial cell apoptosis is proposed as a biological gold standard. Several useful sanity checks are introduced during data analysis, and we confirm the prior biological belief that shared apoptosis events occur in both systems. We conclude that a Bayesian analysis for shared gene function has the potential to reveal new biological insights, unobtainable by other means.

    Cognitive tasks for driving a brain computer interfacing system: a pilot study

    Different cognitive tasks were investigated for use with a brain-computer interface (BCI). The main aim was to evaluate which two of several candidate tasks lead to patterns of electroencephalographic (EEG) activity that could be differentiated most reliably and, therefore, produce the highest communication rate. An optimal signal processing method was also sought to enhance differentiation of EEG profiles across tasks. In ten normal subjects (five male), aged 29-54 years, EEG activity was recorded from four channels during cognitive tasks grouped in pairs, and performed alternately. Four imagery tasks were: spatial navigation around a familiar environment; auditory imagery of a familiar tune; and right and left motor imagery of opening and closing the hand. Signal processing methodology included autoregressive (AR) modeling and classification based on logistic regression and a nonlinear generative classifier. The highest communication rate was found using the navigation and auditory imagery tasks. In terms of classification performance and, hence, possible communication rate, these results were significantly better (p < 0.05) than those obtained with the classical pairing of motor tasks involving imaginary movements of the left and right hands. In terms of EEG data analysis, a nonlinear classification model provided more robust results than a linear model (p ≪ 0.01), and a lower AR model order than those used in previous work was found to be effective. These findings have implications for establishing appropriate methods to operate BCI systems, particularly for disabled people who may experience difficulty with motor tasks, even motor imagery.