356 research outputs found

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    Get PDF
    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics

    Predicting mental imagery based BCI performance from personality, cognitive profile and neurophysiological patterns

    Get PDF
    Mental-Imagery based Brain-Computer Interfaces (MI-BCIs) allow their users to send commands to a computer using their brain-activity alone (typically measured by ElectroEncephaloGraphy— EEG), which is processed while they perform specific mental tasks. While very promising, MI-BCIs remain barely used outside laboratories because of the difficulty encountered by users to control them. Indeed, although some users obtain good control performances after training, a substantial proportion remains unable to reliably control an MI-BCI. This huge variability in user-performance led the community to look for predictors of MI-BCI control ability. However, these predictors were only explored for motor-imagery based BCIs, and mostly for a single training session per subject. In this study, 18 participants were instructed to learn to control an EEG-based MI-BCI by performing 3 MI-tasks, 2 of which were non-motor tasks, across 6 training sessions, on 6 different days. Relationships between the participants’ BCI control performances and their personality, cognitive profile and neurophysiological markers were explored. While no relevant relationships with neurophysiological markers were found, strong correlations between MI-BCI performances and mental-rotation scores (reflecting spatial abilities) were revealed. Also, a predictive model of MI-BCI performance based on psychometric questionnaire scores was proposed. A leave-one-subject-out cross validation process revealed the stability and reliability of this model: it enabled to predict participants’ performance with a mean error of less than 3 points. This study determined how users’ profiles impact their MI-BCI control ability and thus clears the way for designing novel MI-BCI training protocols, adapted to the profile of each user

    SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

    Get PDF
    MOTIVATION: Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. RESULTS: The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with RBF kernel of SVM. CONCLUSION: We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at

    A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

    Get PDF
    BACKGROUND: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. METHODS: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. RESULTS: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogenous data. We also show that by combining different data sources, prediction accuracy can improve significantly. CONCLUSION: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference, enhance prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets

    Using machine learning to speed up manual image annotation: application to a 3D imaging protocol for measuring single cell gene expression in the developing C. elegans embryo

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Image analysis is an essential component in many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual <it>C. elegans </it>genes was developed that collects image samples of a developing embryo by 3-D time lapse microscopy. In this protocol, a program called StarryNite performs the automatic recognition of fluorescently labeled cells and traces their lineage. However, due to the amount of noise present in the data and due to the challenges introduced by increasing number of cells in later stages of development, this program is not error free. In the current version, the error correction (<it>i.e</it>., editing) is performed manually using a graphical interface tool named AceTree, which is specifically developed for this task. For a single experiment, this manual annotation task takes several hours.</p> <p>Results</p> <p>In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct or not. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error significantly. A new version of StarryNite that includes the trained SVM classifier is available at <url>http://starrynite.sourceforge.net</url>.</p> <p>Conclusions</p> <p>We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.</p

    High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions

    Get PDF
    Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large numbers of sites and produce an unreliable list of target genes. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high resolution data on in vitro TF binding specificities. PBM data has been analyzed either by estimating PSSMs or via rank statistics on probe intensities, so that individual sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned a list of thousands of scored sequence patterns. Meanwhile, high-resolution in vivo TF occupancy data from ChIP-seq experiments is also increasingly available. We have developed a flexible discriminative framework for learning TF binding preferences from high resolution in vitro and in vivo data. We first trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. We used a novel -mer based string kernel called the di-mismatch kernel to represent probe sequence similarities. The SVR models are more compact than E-scores, more expressive than PSSMs, and can be readily used to scan genomics regions to predict in vivo occupancy. Using a large data set of yeast and mouse TFs, we found that our SVR models can better predict probe intensity than the E-score method or PBM-derived PSSMs. Moreover, by using SVRs to score yeast, mouse, and human genomic regions, we were better able to predict genomic occupancy as measured by ChIP-chip and ChIP-seq experiments. Finally, we found that by training kernel-based models directly on ChIP-seq data, we greatly improved in vivo occupancy prediction, and by comparing a TF's in vitro and in vivo models, we could identify cofactors and disambiguate direct and indirect binding

    Physicochemical property distributions for accurate and rapid pairwise protein homology detection

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.</p> <p>Results</p> <p>We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores are assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost.</p> <p>Conclusions</p> <p>A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.</p

    Population‐based cohort study of outcomes following cholecystectomy for benign gallbladder diseases

    Get PDF
    Background The aim was to describe the management of benign gallbladder disease and identify characteristics associated with all‐cause 30‐day readmissions and complications in a prospective population‐based cohort. Methods Data were collected on consecutive patients undergoing cholecystectomy in acute UK and Irish hospitals between 1 March and 1 May 2014. Potential explanatory variables influencing all‐cause 30‐day readmissions and complications were analysed by means of multilevel, multivariable logistic regression modelling using a two‐level hierarchical structure with patients (level 1) nested within hospitals (level 2). Results Data were collected on 8909 patients undergoing cholecystectomy from 167 hospitals. Some 1451 cholecystectomies (16·3 per cent) were performed as an emergency, 4165 (46·8 per cent) as elective operations, and 3293 patients (37·0 per cent) had had at least one previous emergency admission, but had surgery on a delayed basis. The readmission and complication rates at 30 days were 7·1 per cent (633 of 8909) and 10·8 per cent (962 of 8909) respectively. Both readmissions and complications were independently associated with increasing ASA fitness grade, duration of surgery, and increasing numbers of emergency admissions with gallbladder disease before cholecystectomy. No identifiable hospital characteristics were linked to readmissions and complications. Conclusion Readmissions and complications following cholecystectomy are common and associated with patient and disease characteristics
    corecore