
    Semi-automated map generation for concept gaming

    Conventional learning games often have limited flexibility to address the individual needs of a learner. The concept gaming approach provides a framework for handling conceptual structures defined by a concept map. A single concept map can be used to create many alternative games, and these can be chosen so that personal learning goals are well taken into account. However, the workload of creating new concept maps and sharing them effectively can easily hinder the adoption of concept gaming. We propose a new semi-automated map generation method for concept gaming. Owing to the rapid increase in open-access knowledge available on the Web, articles of the Wikipedia encyclopedia were chosen as the source for concept map generation. Given an entry name, the proposed method produces hierarchical concept maps that can be freely explored and modified. Variants of this approach could be successfully implemented in a wide range of educational tasks. In addition, ideas for further development of concept gaming are proposed. Peer reviewed
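    The core idea — expanding a single entry name into a hierarchical concept map by following article links — can be sketched as a bounded breadth of recursive expansion. This is a minimal illustration, not the paper's implementation: the link source is injected as a function (in practice it might query the Wikipedia API), and the depth and fan-out limits are assumed parameters.

    ```python
    def build_concept_map(entry, get_links, max_depth=2, max_children=5):
        """Build a hierarchical concept map rooted at `entry`.

        `get_links(title)` is an assumed callable returning titles linked
        from an article (e.g. fetched from Wikipedia); it is injected so
        the sketch stays self-contained.
        """
        seen = {entry}  # avoid revisiting concepts along different paths

        def expand(title, depth):
            node = {"concept": title, "children": []}
            if depth < max_depth:
                for link in get_links(title)[:max_children]:
                    if link not in seen:
                        seen.add(link)
                        node["children"].append(expand(link, depth + 1))
            return node

        return expand(entry, 0)
    ```

    The resulting nested dictionaries form the hierarchy that a learner (or teacher) could then explore and prune by hand, matching the semi-automated workflow described above.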

    Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity

    Ultra-high-throughput sequencing is used to analyse the transcriptome or interactome at unprecedented depth on a genome-wide scale. These techniques yield short sequence reads that are then mapped onto a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as background distribution, sequence errors, and read length affect the prediction capacity of sequence census experiments. Here we suggest a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new clues on both methodological and biological issues. For instance, by analysing chromatin immunoprecipitation read sets, we estimate that 4.6% of reads are affected by SNPs. We show that, although the nucleotide error probability is low, it increases significantly with the position in the sequence. Choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions, while above 20 bp the number of uniquely mapped reads decreases. With our procedure, we obtain 0.6% false positives among genomic locations. Hence, even rare signatures should identify biologically relevant regions if they map to the genome. This indicates that digital transcriptomics may help to characterize the wealth of yet undiscovered, low-abundance transcripts.
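    The trade-off around read length — long enough that a read occurs at a unique genomic position, short enough not to lose mappable reads — can be illustrated with a toy model. This sketch (an assumption for illustration, not the paper's procedure) measures, for a given read length, the fraction of exact-match reads drawn from a genome that map to a unique position.

    ```python
    from collections import Counter


    def unique_mapping_rate(genome, read_length):
        """Fraction of read-length substrings of `genome` whose sequence
        occurs at exactly one genomic position (exact matching only)."""
        counts = Counter(genome[i:i + read_length]
                         for i in range(len(genome) - read_length + 1))
        total = sum(counts.values())  # one read per genomic position
        unique = sum(c for c in counts.values() if c == 1)
        return unique / total
    ```

    On a repetitive genome this rate rises with read length, mirroring the observation that very short tags cannot pinpoint unique sites; real assays would additionally model sequencing errors and SNPs.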

    Transcriptome annotation using tandem SAGE tags

    Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding that of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored at two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to apply this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict yet-unknown transcripts. A cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation.
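    Locating a tag-delimited genomic sequence amounts to finding an occurrence of the first tag followed, within a plausible transcript span, by the second tag, and reporting the spanning sequence. A minimal sketch of that search, assuming exact string matching and a `max_span` cutoff chosen for illustration (the published algorithm is more involved):

    ```python
    def locate_tdgs(genome, tag1, tag2, max_span=20000):
        """Find candidate tag-delimited genomic sequences: regions that
        start with tag1 and end with tag2 within max_span bases.
        Returns (start, end, spanning_sequence) tuples."""
        hits = []
        start = genome.find(tag1)
        while start != -1:
            end = genome.find(tag2, start + len(tag1))
            if end != -1 and end + len(tag2) - start <= max_span:
                hits.append((start, end + len(tag2),
                             genome[start:end + len(tag2)]))
            start = genome.find(tag1, start + 1)
        return hits
    ```

    Candidates produced this way would then be filtered against known annotation and, as in the abstract, cross-validated with tiling array data.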

    On compression of parse trees

    We consider methods for compressing parse trees, especially techniques based on statistical modeling. We regard the sequence of productions corresponding to a suffix of the path from the root of the tree to a node as the context of that node. The contexts are augmented with branching information about the nodes. By applying the text compression algorithm PPM (prediction by partial matching) to such contexts we achieve good compression results. We compare the PPM approach experimentally with other methods.
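    The contexts described above — a bounded suffix of the root-to-node production path, with each production augmented by its branching degree — can be extracted as follows. This is an illustrative sketch of the context construction only (the PPM coding stage is omitted), with trees represented as `(label, children)` pairs, a representation assumed here for simplicity.

    ```python
    def node_contexts(tree, max_context=3):
        """For each node, emit (context, production) where the context is
        the suffix (length <= max_context) of the production path from the
        root, and a production is (label, branching degree). These pairs
        would drive a PPM-style statistical model."""
        contexts = []

        def walk(node, path):
            label, children = node
            production = (label, len(children))  # augmented with branching
            contexts.append((tuple(path[-max_context:]), production))
            for child in children:
                walk(child, path + [production])

        walk(tree, [])
        return contexts
    ```

    A PPM coder would then predict each production from the distribution observed under its context, falling back to shorter context suffixes when a context is novel.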

    A greedy approximation algorithm for constructing shortest common superstrings. Theoretical Computer Science 57, 131–145

    An approximation algorithm for the shortest common superstring problem is developed, based on the Knuth-Morris-Pratt string matching procedure and on a greedy heuristic for finding longest Hamiltonian paths in weighted graphs. Given a set R of strings, the algorithm constructs a common superstring for R in O(mn) steps, where m is the number of strings in R and n is the total length of these strings. The performance of the algorithm is analyzed in terms of the compression achieved in the common superstring constructed, that is, in terms of n − k, where k is the length of the obtained superstring. We show that (n − k) ≥ (n − k_min) / 2, where k_min is the length of a shortest common superstring. Hence the compression achieved by the algorithm is at least half of the maximum compression. It also seems that the lengths always satisfy k ≤ 2·k_min, but proving this remains open.
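    The greedy strategy itself is simple: repeatedly merge the pair of strings with the largest suffix-prefix overlap until one string remains. The sketch below uses naive overlap computation for clarity; the paper's O(mn) bound relies on Knuth-Morris-Pratt matching instead.

    ```python
    def overlap(a, b):
        """Length of the longest suffix of a that is a prefix of b."""
        for k in range(min(len(a), len(b)), 0, -1):
            if a.endswith(b[:k]):
                return k
        return 0


    def greedy_superstring(strings):
        """Greedy approximation for the shortest common superstring:
        repeatedly merge the pair with the maximal overlap."""
        strings = list(strings)
        while len(strings) > 1:
            best = (0, 0, 1)  # (overlap length, i, j)
            for i in range(len(strings)):
                for j in range(len(strings)):
                    if i != j:
                        k = overlap(strings[i], strings[j])
                        if k > best[0]:
                            best = (k, i, j)
            k, i, j = best
            merged = strings[i] + strings[j][k:]
            strings = [s for idx, s in enumerate(strings)
                       if idx not in (i, j)]
            strings.append(merged)
        return strings[0]
    ```

    The compression bound above guarantees this merge order saves at least half as many characters as an optimal superstring would, even though the superstring it returns is not necessarily shortest.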