46 research outputs found

    PhyloMap: an algorithm for visualizing relationships of large sequence data sets and its application to the influenza A virus genome

    Get PDF
    Abstract. Background: Results of phylogenetic analysis are often visualized as phylogenetic trees. Such a tree can typically include only up to a few hundred sequences. When more than a few thousand sequences are to be included, analyzing the phylogenetic relationships among them becomes a challenging task. The recent frequent outbreaks of influenza A viruses have resulted in the rapid accumulation of corresponding genome sequences. Currently, there are more than 7,500 influenza A virus genomes in the database. There is no efficient way of representing this huge data set as a whole, which prevents a deeper understanding of the diversity of the influenza A virus genome. Results: Here we present a new algorithm, "PhyloMap", which combines ordination, vector quantization, and phylogenetic tree construction to give an elegant representation of a large sequence data set. Applying PhyloMap to influenza A virus genome sequences reveals phylogenetic relationships among the internal genes that cannot be seen when only a subset of the sequences is analyzed. Conclusions: The application of PhyloMap to influenza A virus genome data shows that it is a robust algorithm for analyzing large sequence data sets. It utilizes the entire data set, minimizes bias, and provides intuitive visualization. PhyloMap is implemented in Java, and the source code is freely available at http://www.biochem.uni-luebeck.de/public/software/phylomap.html
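PhyloMap itself is implemented in Java; as a rough illustration of two of the building blocks named in the abstract, the following Python sketch (hypothetical, not the authors' code) combines classical MDS ordination of a pairwise distance matrix with k-means vector quantization to pick representative points on which a tree could then be built:

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical MDS: embed an (n, n) pairwise distance matrix into k dims."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:k]             # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def vector_quantize(points, n_codes, iters=50, seed=0):
    """Plain k-means: choose representative 'codebook' points."""
    rng = np.random.default_rng(seed)
    codes = points[rng.choice(len(points), n_codes, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((points[:, None] - codes[None]) ** 2).sum(-1), axis=1)
        for c in range(n_codes):
            members = points[assign == c]
            if len(members):
                codes[c] = members.mean(axis=0)   # update centroid
    return codes, assign
```

In a PhyloMap-like workflow, the codebook points would correspond to representative sequences for which a conventional phylogenetic tree is then constructed and overlaid on the ordination.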

    A Novel Validation Algorithm Allows for Automated Cell Tracking and the Extraction of Biologically Meaningful Parameters

    Get PDF
    Automated microscopy is currently the only method to observe complex multi-cellular processes, such as cell migration, cell cycle, and cell differentiation, non-invasively and label-free. Extracting biological information from a time series of micrographs requires each cell to be recognized and followed through sequential microscopic snapshots. Although recent attempts to automate this process have resulted in ever-improving cell detection rates, manual identification of identical cells is still the most reliable technique. However, its tedious and subjective nature has prevented tracking from becoming a standardized tool for the investigation of cell cultures. Here, we present a novel method to accomplish automated cell tracking with a reliability comparable to manual tracking. Previously, automated cell tracking could not rival the reliability of manual tracking because, in contrast to the human way of solving this task, none of the algorithms had an independent quality-control mechanism; they lacked validation. Thus, instead of trying to improve the cell detection or tracking rates, we proceeded from the idea of automatically inspecting the tracking results and accepting only those of high trustworthiness, while rejecting all others. This validation algorithm works independently of the quality of cell detection and tracking through a systematic search for tracking errors. It is based only on very general assumptions about the spatiotemporal contiguity of cell paths. While traditional tracking often aims to yield genealogic information about single cells, the natural outcome of a validated cell-tracking algorithm turns out to be a set of complete, but often unconnected, cell paths, i.e., records of cells from mitosis to mitosis. This is a consequence of the fact that the validation algorithm takes complete paths as the unit of rejection/acceptance.
The resulting set of complete paths can be used to automatically extract important biological parameters with high reliability and statistical significance. These include the distributions of life-cycle times and cell areas, as well as the symmetry of cell divisions and motion analyses. The new algorithm thus allows for the quantification and parameterization of cell cultures with unprecedented accuracy. To evaluate our validation algorithm, two large reference data sets were created manually. These data sets comprise more than 320,000 unstained adult pancreatic stem cells from rat, including 2,592 mitotic events. The reference data sets specify every cell position and shape, and assign each cell to the correct branch of its genealogic tree. We provide these reference data sets for free use by others as a benchmark for the future improvement of automated tracking methods.
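As an illustration of the kind of contiguity check such a validation could perform, here is a minimal hypothetical sketch; the path format and the displacement threshold are assumptions for illustration, not taken from the paper:

```python
def validate_path(path, max_step=25.0):
    """Accept a cell path only if it is spatiotemporally contiguous:
    consecutive frame indices and a bounded displacement per frame.
    `path` is a list of (frame, x, y) detections; `max_step` (pixels)
    is an illustrative threshold.
    """
    for (f0, x0, y0), (f1, x1, y1) in zip(path, path[1:]):
        if f1 != f0 + 1:                       # gap in time -> reject
            return False
        if ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 > max_step:
            return False                       # implausible jump -> reject
    return True
```

Rejecting a path as a whole whenever any such check fails is what makes complete, validated paths (rather than genealogies) the natural unit of output.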

    Dimensional Complexity of the Resting Brain in Healthy Aging, Using a Normalized MPSE

    Get PDF
    Spontaneous fluctuations of resting-state functional connectivity have been studied in many ways, but grasping the complexity of brain activity has remained difficult. Dimensional complexity measures based on Eigenvalue (EV) spectrum analyses (e.g., Ω entropy) have been successfully applied to EEG data, but have not been fully evaluated on functional MRI recordings, because feasible temporal resolutions have only been reached with the recent introduction of fast multiband fMRI sequences. Combining the Eigenspectrum normalization of Ω entropy with the scalable architecture of the so-called Multivariate Principal Subspace Entropy (MPSE) leads to a new complexity measure, the normalized MPSE (nMPSE). It allows functional brain complexity analyses at varying levels of EV energy, independent of global shifts in data variance. In particular, restricting the EV spectrum to the first dimensions, which carry the most prominent data variance, can act as a filter that reveals the most discriminant factors of dependent variables. Here we examine the effects of healthy aging on the dimensional complexity of brain activity. We employ a large open-access dataset providing a great number of high-quality fast multiband recordings. Using nMPSE in whole-brain, regional, network, and searchlight approaches, we were able to find many age-related changes, e.g., in sensorimotor and right inferior frontal brain regions. Our results suggest that research on the dimensional complexity of functional MRI recordings promises to be a unique resource for understanding brain function and for the extraction of biomarkers.
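The general idea of an eigenspectrum-based, normalized complexity measure can be sketched as follows. This is an illustrative approximation of the concept described above, not the authors' exact definition of nMPSE:

```python
import numpy as np

def nmpse(data, n_dims=None):
    """Entropy of the normalized eigenvalue spectrum of the data covariance.
    `data`: array of shape (timepoints, channels). Restricting to the leading
    `n_dims` eigenvalues mimics the principal-subspace restriction described
    in the abstract. Normalizing the eigenvalues to sum to 1 makes the
    measure invariant to global shifts in data variance.
    """
    cov = np.cov(data, rowvar=False)
    ev = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending EV spectrum
    if n_dims is not None:
        ev = ev[:n_dims]                          # keep leading subspace only
    p = ev / ev.sum()                             # scale-invariant weights
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(p)))  # in [0, 1]
```

A value near 1 indicates variance spread evenly across dimensions (high dimensional complexity); a value near 0 indicates variance concentrated in few dimensions.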

    Learning to Predict Ischemic Stroke Growth on Acute CT Perfusion Data by Interpolating Low-Dimensional Shape Representations

    Get PDF
    Cerebrovascular diseases, in particular ischemic stroke, are among the leading causes of death in developed countries. Perfusion CT and/or MRI are ideal imaging modalities for characterizing affected ischemic tissue in the hyper-acute phase. If infarct growth over time could be predicted accurately from functional acute imaging protocols together with advanced machine-learning-based image analysis, the expected benefits of treatment options could be weighted better against potential risks. The quality of outcome prediction by convolutional neural networks (CNNs) is so far limited, which indicates that even highly complex deep-learning algorithms are not fully capable of directly learning physiological principles of tissue salvation through weak supervision, due to a lack of data (e.g., follow-up segmentations). In this work, we address these shortcomings by explicitly taking into account clinical expert knowledge in the form of segmentations of the infarct core and its surrounding penumbra in acute CT perfusion (CTP) images, which are trained to be represented in a low-dimensional non-linear shape space. Employing a multi-scale CNN (U-Net) together with a convolutional auto-encoder, we predict lesion tissue probabilities for new patients. The predictions are physiologically constrained to a shape embedding that encodes a continuous progression between the core and penumbra extents. Comparison to a simple interpolation in the original voxel space and to an unconstrained CNN shows that the use of such a shape space can be advantageous for predicting the time-dependent growth of stroke lesions on acute perfusion data, yielding a Dice score overlap of 0.46 for predictions from expert segmentations of core and penumbra.
Our interpolation method models monotone infarct growth robustly on a linear time scale to automatically predict clinically plausible tissue outcomes. These may serve as a basis for further clinical measures, such as the expected increase in lesion volume, and can support decision making on treatment options and triage.
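The core idea of interpolating between the core and penumbra in a learned embedding while enforcing monotone growth might be sketched as follows. Everything here is an illustrative assumption: `decode` stands in for the convolutional auto-encoder's decoder, and linear latent interpolation with a running maximum is one simple way to realize a monotone progression on a linear time scale:

```python
import numpy as np

def predict_growth(z_core, z_penumbra, decode, times):
    """Predict a monotone sequence of lesion maps on a linear time scale.
    `z_core`, `z_penumbra`: latent shape-space embeddings of the expert
    segmentations (t=0 and t=1). `decode` maps a latent vector to a
    tissue-probability map. Monotone infarct growth is enforced with a
    running element-wise maximum over decoded maps.
    """
    maps, current = [], None
    for t in times:
        z = (1.0 - t) * z_core + t * z_penumbra   # linear latent interpolation
        m = decode(z)
        current = m if current is None else np.maximum(current, m)
        maps.append(current.copy())
    return maps
```

The expected lesion-volume increase mentioned above could then be read off by thresholding and summing each predicted map.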

    A Knowledge-Model for AI-Driven Tutoring Systems

    No full text
    A powerful new complement to traditional synchronous teaching is emerging: intelligent tutoring systems. The narrative: a learner interacts with a digital agent. The agent reviews, selects, and proposes individually tailored educational resources and processes, i.e., a meaningful succession of instructions, tests, or group work. The aim is to make personally tutored learning the new norm in higher education, especially in groups with heterogeneous educational backgrounds. The challenge: today, there are no suitable data that allow computer agents to learn how to take reasonable decisions. Available educational resources cannot be addressed by computer logic because, up to now, they either have not been tagged with machine-readable information at all, or such tags have not been provided uniformly. And what is worse: there are no agreed conceptual and structured models of what we understand by "learning", how this model-to-be could be implemented in a computer algorithm, and what the explicit decisions are that a tutoring system could take. A prerequisite for any future digital agent is therefore a structured, computer-accessible model of "knowledge". This model is required to qualify and quantify individual learning, to allow the association of resources as learning objects, and to provide a basis for operationalizing learning for AI-based agents. We suggest a conceptual model of "knowledge" based on a variant of Bloom's taxonomy, transfer this concept of cognitive learning objectives into an ontology, and describe an implementation as a web-based database application. The approach has been employed to model the basics of abstract knowledge in engineering mechanics at university level. This paper addresses interdisciplinary aspects ranging from teaching methodology and the taxonomy of knowledge in cognitive science, through a database application for ontologies, to an implementation of this model in a Grails service.
We aim to deliver this web-based ontology, its user interfaces, and APIs into a research network that qualifies AI-based agents for competence-based tutoring.
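As a purely hypothetical sketch of what a machine-readable knowledge node in such an ontology might look like (the names, fields, and Bloom-level list below are illustrative assumptions, not the paper's actual model or its Grails implementation):

```python
from dataclasses import dataclass, field

# Cognitive levels from the revised Bloom taxonomy, used as tags for
# learning objectives (illustrative ordering).
BLOOM_LEVELS = ("remember", "understand", "apply",
                "analyze", "evaluate", "create")

@dataclass
class KnowledgeNode:
    """One unit of 'knowledge': a named objective with a Bloom level and
    prerequisite links, forming a directed graph a tutoring agent can walk."""
    name: str
    bloom_level: str
    prerequisites: list = field(default_factory=list)  # names of other nodes

    def is_reachable(self, mastered):
        """A learner may attempt this node once all prerequisites are mastered."""
        return all(p in mastered for p in self.prerequisites)
```

A tutoring agent could then select, from all currently reachable nodes, the one best matching the learner's profile.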

    On the efficiency of the genetic code after frameshift mutations

    No full text
    Statistical and biochemical studies of the standard genetic code (SGC) have found evidence that the impact of mistranslations is minimized in such a way that erroneous codons are either synonymous or code for an amino acid with a polarity similar to that of the originally coded amino acid. It has been quantified that the SGC is optimized to protect this specific chemical property as well as possible. In recent work, it has been speculated that the multilevel optimization of the genetic code stands in the wider context of overlapping codes. This work follows the systematic approach used for mistranslations and extends those analyses to the general effect of frameshift mutations on the polarity conservation of amino acids. We generated one million random codes and compared their average polarity change over all triplets and the whole set of possible frameshift mutations. While the natural code, just as for point mutations, appears to be competitively robust against frameshift mutations as well, we found that both optimizations appear to be independent of each other. For each criterion in isolation, better codes can be found, but it becomes significantly more difficult to find candidates that optimize all of these features at once, just as the SGC does. We conclude that the SGC is not only very efficient in minimizing the consequences of mistranslations, but is optimized for amino-acid polarity conservation under all three effects of code alteration, namely translational errors, point mutations, and frameshift mutations. In other words, our result demonstrates that the SGC appears to be much more than just "one in a million"
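The comparison described above can be illustrated with a toy computation of the mean polarity change under frameshifts. The code table, polarity values, and sequence used here are illustrative stand-ins over a two-letter alphabet, not the actual SGC analysis:

```python
import random

def mean_frameshift_polarity_change(code, polarity, seq):
    """Average |polarity change| when `seq` is read in the +1 and +2 frames
    instead of frame 0. `code` maps codons to amino acids; `polarity` maps
    amino acids to a polarity value.
    """
    def translate(s):
        return [code[s[i:i + 3]] for i in range(0, len(s) - 2, 3)]
    base = translate(seq)
    diffs = []
    for shift in (1, 2):
        for a0, a1 in zip(base, translate(seq[shift:])):
            diffs.append(abs(polarity[a0] - polarity[a1]))
    return sum(diffs) / len(diffs)

def random_code(code):
    """Shuffle which amino acid each codon encodes (one 'random code')."""
    aas = list(code.values())
    random.shuffle(aas)
    return dict(zip(code.keys(), aas))
```

Repeating this for a large sample of `random_code` variants and ranking the natural assignment among them mirrors the "one in a million" style of comparison.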

    Fast and Easy Computation of Approximate Smallest Enclosing Balls

    No full text
    The incremental Badoiu-Clarkson algorithm finds the smallest ball enclosing n points in d dimensions with a precision of at least O(1/√t) after t iteration steps. The extremely simple incremental step of the algorithm makes it very attractive both for theoreticians and for practitioners. A simplified proof of this convergence is given. This proof allows us to show that the precision in fact improves even as O(u/t) with the number of iteration steps. Computer experiments, though not yet a proof, suggest that u, which depends only on the data instance, is actually bounded by min{√(2d), √(2n)}. If this holds, then the algorithm finds the smallest enclosing ball with ε precision in at most O(nd·√(d_m)/ε) time, where d_m = min{d, n}. Key words: computational geometry, smallest enclosing ball, pattern recognition
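The incremental step really is simple enough to state in a few lines. The following Python sketch follows the standard formulation of the Badoiu-Clarkson iteration (move the current center a 1/(t+1) fraction toward the farthest point):

```python
import numpy as np

def approx_miniball(points, iters=1000):
    """Badoiu-Clarkson iteration for the approximate smallest enclosing ball.
    Start at any input point, then repeatedly step toward the currently
    farthest point with shrinking step size 1/(t+1). After t steps the
    center is within O(1/sqrt(t)) (empirically O(u/t)) of optimal.
    """
    c = points[0].astype(float)
    for t in range(1, iters + 1):
        far = points[np.argmax(((points - c) ** 2).sum(axis=1))]
        c += (far - c) / (t + 1)                  # shrinking step toward far
    r = np.sqrt(((points - c) ** 2).sum(axis=1).max())
    return c, r
```

Each step costs O(nd) for the farthest-point search, which is where the O(nd·√(d_m)/ε) total running time claimed above comes from.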