
    An efficient algorithm for the extended (l,d)-motif problem with unknown number of binding sites

    Finding common patterns, or motifs, in a set of DNA sequences is an important problem in molecular biology. Most motif-discovery algorithms and software require the length of the motif as input. Motivated by the fact that the motif's length is usually unknown in practice, Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), where the motif's length is not an input parameter. Unfortunately, the algorithm given by Styczynski et al. to solve EMP can take an unacceptably long time to run, e.g. over 3 months to discover a length-14 motif. This paper makes two main contributions. First, we eliminate another input parameter from EMP: the minimum number of binding sites in the DNA sequences. Fewer input parameters not only reduce the burden on the user but may also give more realistic and robust results, since restrictions on length or on the number of binding sites make little sense when the best motif may be neither the longest nor the one with the most binding sites. Second, we develop an efficient algorithm to solve our redefined problem. The algorithm is also a fast solution for EMP (without any sacrifice in accuracy), making EMP practical. © 2005 IEEE.

    Identifying projected clusters from gene expression profiles

    In microarray gene expression data, clusters may hide in subspaces. Traditional clustering algorithms that measure similarity in the full input space may fail to detect these clusters. In recent years a number of algorithms have been proposed to identify this kind of projected cluster, but many of them rely on critical parameters whose proper values are hard for users to determine. In this paper a new algorithm that dynamically adjusts its internal thresholds is proposed. It has a low dependency on user parameters while allowing users to input domain knowledge should it be available. Experimental results show that the algorithm is capable of identifying some interesting projected clusters from real microarray data.

    A Novel Approach to Mine for Genetic Markers via Comparing Class Frequency Distributions of Maximal Repeats Extracted from Tagged Whole Genomic Sequences

    The cost of extracting a single new biomarker from genomic sequences is very high. This chapter adopts a scalable approach, developed previously and based on the MapReduce programming model, to extract maximal repeats from a large number of tagged whole genomic sequences while also computing the similarities of sequences within the same class and the differences among the other classes, where the classes are derived from those tags. The work can be extended to any kind of genomic sequence data if one can divide the organisms into several disjoint classes according to one specific phenotype and then collect the whole genomes of those organisms. Patterns such as biomarkers that exist in only one class, or that have a distinctive class frequency distribution, can give biologists hints about the relationship between that phenotype and those genomic patterns. It is expected that this approach may provide a novel direction for biomarker extraction research via whole genomic sequence comparison in the post-genomic era.

    A Holter of low complexity design using mixed signal processor

    A low-power, portable, and easily implemented Holter recorder is necessary for patients and researchers working with electrocardiograms (ECG). In this paper, such a Holter recorder is realized with off-the-shelf components and a mixed signal processor (MSP). To decrease the complexity of the analog circuits and the interference of 60 Hz power-line noise, we use the MSP to implement a finite impulse response (FIR) filter with an equiripple design. We also combine a ring buffer for the input samples with the symmetry of the FIR filter coefficients to compute the convolution efficiently. The experimental results show that the PQRST features of the output ECG signal are easily distinguished. The ECG signal is recorded for 24 hr using an SD card. Furthermore, the ECG signal is transmitted to a smartphone via Bluetooth to decrease the burden on the Holter recorder.
    Conference type: international. Conference dates: 2005-10-19 to 2005-10-21. Publication format: print. Conference location: Minneapolis, MN, US
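
The ring-buffer and coefficient-symmetry idea mentioned in this abstract can be sketched roughly as follows. This is only an illustration in Python, not the authors' MSP firmware: the class name `SymmetricFIR`, the method `step`, and the 3-tap coefficients in the usage note are hypothetical, and the actual equiripple coefficients are not given in the abstract.

```python
class SymmetricFIR:
    """Sketch of a linear-phase FIR filter (hypothetical helper, not from the paper).

    Assumes symmetric taps, h[k] == h[n-1-k], so each pair of samples that
    shares a coefficient is added first and multiplied once.
    """

    def __init__(self, coeffs):
        n = len(coeffs)
        if any(abs(coeffs[k] - coeffs[n - 1 - k]) > 1e-12 for k in range(n // 2)):
            raise ValueError("coefficients must be symmetric")
        self.coeffs = list(coeffs)
        self.n = n
        self.buf = [0.0] * n  # ring buffer holding the last n input samples
        self.head = 0         # slot where the next sample will be written

    def step(self, x):
        """Push one input sample and return one filtered output sample."""
        self.buf[self.head] = x
        newest = self.head
        self.head = (self.head + 1) % self.n  # advance the write index, wrapping around
        acc = 0.0
        half = self.n // 2
        for k in range(half):
            # Samples delayed by k and by n-1-k use the same coefficient.
            a = self.buf[(newest - k) % self.n]
            b = self.buf[(newest - (self.n - 1 - k)) % self.n]
            acc += self.coeffs[k] * (a + b)
        if self.n % 2 == 1:
            # Odd length: the middle tap has no partner.
            acc += self.coeffs[half] * self.buf[(newest - half) % self.n]
        return acc
```

For example, feeding a unit impulse through `SymmetricFIR([0.25, 0.5, 0.25])` returns the coefficients themselves, one per call to `step`. Pairing the samples roughly halves the number of multiplications per output, which is the usual motivation for exploiting symmetry on a small processor.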

    A unifying framework for seed sensitivity and its application to subset seeds

    We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model) that are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach and can then be used in similarity search, producing better results than ordinary spaced seeds.
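
As background for the comparison with ordinary spaced seeds, the sketch below shows what it means for a spaced seed to hit an alignment. It does not implement subset seeds or the automaton construction from the paper. The function name `seed_hits` and the string encoding (`'1'` for a required match or a matching column, `'0'` for a don't-care position or a mismatch) are assumptions made for illustration.

```python
def seed_hits(seed, alignment):
    """Return the positions where a spaced seed hits an alignment.

    seed:      string over {'1', '0'}; '1' = position must match, '0' = don't care.
    alignment: string over {'1', '0'}; '1' = matching column, '0' = mismatch.
    Both encodings are illustrative assumptions, not the paper's notation.
    """
    span = len(seed)
    hits = []
    for pos in range(len(alignment) - span + 1):
        # The seed hits at pos if every required position falls on a match.
        if all(alignment[pos + k] == '1' for k in range(span) if seed[k] == '1'):
            hits.append(pos)
    return hits
```

On the alignment `"1101111"`, the spaced seed `"1101"` hits at positions 0 and 3, while the contiguous seed `"111"` of the same weight hits only at positions 3 and 4. Seed sensitivity is the probability, under a chosen alignment distribution, that at least one such hit exists.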

    Comparative Analysis of Computationally Accelerated NGS Alignment

    The Smith-Waterman (S-W) algorithm is the basis of most current sequence alignment technology, which can be used to identify similarities between sequences for cancer detection and treatment because it provides researchers with potential targets for early diagnosis and personalized treatment. The growing number of DNA and RNA sequences available for analysis necessitates faster alignment than is possible with current iterations of the S-W algorithm. This project aimed to identify the most effective and efficient methods for accelerating the S-W algorithm by investigating recent advances in sequence alignment. Of the 22 articles considered in this project, 17 had to be excluded from the study due to a lack of standardization in data reporting. Only one study, by Chen et al., contained enough information to compare accuracy and alignment speed. When accuracy was excluded from the criteria, five studies contained enough information to rank their efficiency. The method of Rucci et al. was the fastest at 268.83 Giga Cell Updates Per Second (GCUPS), and the method of Pérez-Serrano et al. came close at 229.93 GCUPS while testing larger sequences. It was determined that reporting standards in this field are not sufficient, and that the study by Chen et al. should set a benchmark for future reporting.
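
For readers unfamiliar with the quantities being compared, the following is a minimal, unaccelerated sketch of Smith-Waterman local alignment scoring together with the GCUPS metric. It is not any of the surveyed implementations. The scoring values (match 2, mismatch -1, linear gap -1) and the function names are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Scoring parameters are illustrative defaults, not taken from the study.
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second: matrix cells filled, divided by time and 1e9."""
    return query_len * target_len / seconds / 1e9
```

Every cell of the `len(a)` by `len(b)` matrix is updated once, which is why throughput is reported in cell updates per second. Under this definition, aligning a 1,000-residue query against 1,000,000 residues in one second corresponds to 1 GCUPS.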

    Estimation of brain dynamics under visuomotor task using functional connectivity analysis based on graph theory

    Network studies of brain connectivity have demonstrated that the highly connected area, or hub, is a vital feature of human functional and structural brain organization. Hubs indicate which regions play an important role in cognitive and sensorimotor tasks. In addition, learning a complex visuomotor skill causes specific changes in neuronal activation across brain regions. Accordingly, this study uses the hub as one of the features to map visuomotor learning tasks and their dynamic functional connectivity (dFC). Electroencephalogram (EEG) data recorded under three different behavioral conditions were investigated: motion only (MO), vision only (VO), and tracking (Tra). We used the phase locking value (PLV) with a sliding window (50 ms) to calculate the dFC in four distinct frequency bands: 8-12 Hz (alpha), 18-22 Hz (low beta), 26-30 Hz (high beta), and 38-42 Hz (gamma), and used eigenvector centrality to identify hubs. A Gaussian Mixture Model (GMM) was applied to investigate the dFC patterns. The results showed that the dFC patterns with the hub feature represent the characteristics of neuronal activity under visuomotor coordination.
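
The sliding-window PLV computation named in this abstract can be illustrated with the standard definition, PLV = |(1/N) Σ exp(i(φx − φy))|. The sketch below assumes the instantaneous phases of two band-filtered channels are already available (for example from a Hilbert transform, which is not shown). The function names and the window length in samples are assumptions: the abstract gives the window as 50 ms but not the sampling rate.

```python
import cmath


def plv(phase_x, phase_y):
    """Phase locking value of two equal-length phase series (radians)."""
    n = len(phase_x)
    # Average the unit phasors of the phase differences, then take the magnitude.
    total = sum(cmath.exp(1j * (px - py)) for px, py in zip(phase_x, phase_y))
    return abs(total) / n


def sliding_plv(phase_x, phase_y, window, step=1):
    """PLV in each window of `window` samples, advancing by `step` samples.

    `window` must be derived from the sampling rate (e.g. 50 ms times fs);
    that conversion is an assumption left to the caller.
    """
    last_start = len(phase_x) - window
    return [
        plv(phase_x[s:s + window], phase_y[s:s + window])
        for s in range(0, last_start + 1, step)
    ]
```

A PLV near 1 means the phase difference between the two channels stays nearly constant within the window, and a value near 0 means the phase differences are spread around the circle. Computing it for every channel pair in every window gives a time-varying connectivity matrix on which a centrality measure can then be evaluated.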

    Inferring causal relations from multivariate time series: a fast method for large-scale gene expression data

    Various multivariate time series analysis techniques have been developed with the aim of inferring causal relations between time series. Previously, these techniques have proved effective on economic and neurophysiological data, which normally consist of hundreds of samples. However, when they are applied to gene regulatory inference, the small sample size of gene expression time series poses an obstacle. In this paper, we describe some of the most commonly used multivariate inference techniques and show the potential challenges they face in gene expression analysis. In response, we propose a directed partial correlation (DPC) algorithm as an efficient and effective solution for inferring causal/regulatory relations from small-sample gene expression data. Comparative evaluations of the existing techniques and the proposed method are presented. To draw reliable conclusions, comprehensive benchmarking on data sets of various setups is essential. Three experiments are designed to assess these methods in a coherent manner. Detailed analysis of the experimental results not only reveals the good accuracy of the proposed DPC method in large-scale prediction but also gives much insight into all the methods under evaluation.