An efficient algorithm for the extended (l,d)-motif problem with unknown number of binding sites
Finding common patterns, or motifs, from a set of DNA sequences is an important problem in molecular biology. Most motif-discovering algorithms/software require the length of the motif as input. Motivated by the fact that the motif's length is usually unknown in practice, Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), where the motif's length is not an input parameter. Unfortunately, the algorithm given by Styczynski et al. to solve EMP can take an unacceptably long time to run, e.g. over 3 months to discover a length-14 motif. This paper makes two main contributions. First, we eliminate another input parameter from EMP: the minimum number of binding sites in the DNA sequences. Having fewer input parameters not only reduces the burden on the user, but may also give more realistic/robust results, since restrictions on length or on the number of binding sites make little sense when the best motif may be neither the longest nor the one with the largest number of binding sites. Second, we develop an efficient algorithm to solve our redefined problem. The algorithm is also a fast solution for EMP (without any sacrifice in accuracy), making EMP practical. © 2005 IEEE.
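To make the (l,d) terminology concrete, the sketch below counts the binding sites of a candidate motif, assuming the usual reading in which a length-l window is a binding site when it lies within Hamming distance d of the motif. The function names `hamming` and `binding_sites` are illustrative and are not taken from the paper; this is a brute-force check, not the efficient algorithm the abstract describes.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def binding_sites(motif, sequences, d):
    """Return (sequence index, offset) pairs for every window of
    len(motif) characters that is within Hamming distance d of motif."""
    l = len(motif)
    hits = []
    for i, seq in enumerate(sequences):
        for j in range(len(seq) - l + 1):
            if hamming(motif, seq[j:j + l]) <= d:
                hits.append((i, j))
    return hits


# Hypothetical toy input: two short DNA strings and a length-4 motif.
sites = binding_sites("ACGT", ["ACGTACGT", "TTACCTTT"], d=1)
```

With these toy inputs the motif has two exact occurrences in the first sequence and one occurrence with a single mismatch in the second.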
Identifying projected clusters from gene expression profiles
In microarray gene expression data, clusters may hide in subspaces. Traditional clustering algorithms that make use of similarity measurements in the full input space may fail to detect the clusters. In recent years a number of algorithms have been proposed to identify projected clusters of this kind, but many of them rely on critical parameters whose proper values are hard for users to determine. In this paper a new algorithm that dynamically adjusts its internal thresholds is proposed. It has a low dependency on user parameters while allowing users to input domain knowledge should it be available. Experimental results show that the algorithm is capable of identifying some interesting projected clusters from real microarray data.
A Novel Approach to Mine for Genetic Markers via Comparing Class Frequency Distributions of Maximal Repeats Extracted from Tagged Whole Genomic Sequences
The cost of extracting a new biomarker from genomic sequences is very high. This chapter adopts a previously developed, scalable approach based on the MapReduce programming model to extract maximal repeats from a large collection of tagged whole genomic sequences while computing the similarities among sequences within the same class and the differences from the other classes, where the classes are derived from the tags. The work can be extended to any kind of genomic sequence data, provided the organisms can be divided into several disjoint classes according to one specific phenotype and their whole genomes collected. Patterns, for example biomarkers, that exist in only one class with a distinctive class frequency distribution can give biologists hints about the relationship between that phenotype and those genomic patterns. It is expected that this approach may provide a novel direction in the research of biomarker extraction via whole genomic sequence comparison in the post-genomic era.
A Holter of low complexity design using mixed signal processor
A low-power, portable, and easily implemented Holter recorder is necessary for patients and for electrocardiogram (ECG) researchers. Such a Holter recorder is realized in this paper with off-the-shelf components and a mixed signal processor (MSP). To decrease the complexity of the analog circuits and the interference of 60 Hz noise from the power line, we use the MSP to implement a finite impulse response (FIR) filter with an equiripple design. We also combine a ring buffer for the input samples with the symmetry of the FIR filter coefficients to compute the convolution efficiently. The experimental results show that the PQRST features of the output ECG signal are easily distinguished. This ECG signal is recorded for 24 hr using an SD card. Furthermore, the ECG signal is transmitted to a smartphone via Bluetooth to decrease the burden on the Holter recorder.
Conference type: international; conference date: 2005-10-19 to 2005-10-21; format: print; conference location: Minneapolis, MN, US
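The combination of a ring buffer with symmetric FIR coefficients can be sketched as follows. This is a minimal illustration under stated assumptions, not the firmware from the paper: the class name `SymmetricFIR` and the toy taps are hypothetical, and a real equiripple design would supply the coefficients. Because the taps satisfy h[k] = h[N-1-k], each pair of samples that share a coefficient is added first, which roughly halves the number of multiplications per output sample.

```python
class SymmetricFIR:
    """Streaming FIR filter for symmetric (linear-phase) taps."""

    def __init__(self, taps):
        taps = list(taps)
        if taps != taps[::-1]:
            raise ValueError("taps must be symmetric")
        self.taps = taps
        self.n = len(taps)
        self.buf = [0.0] * self.n  # ring buffer of the last n samples
        self.head = 0              # index of the most recent sample

    def step(self, x):
        """Push one input sample and return one output sample."""
        n = self.n
        self.head = (self.head + 1) % n
        self.buf[self.head] = x  # overwrites the oldest sample
        acc = 0.0
        for k in range(n // 2):
            newer = self.buf[(self.head - k) % n]
            older = self.buf[(self.head - (n - 1 - k)) % n]
            acc += self.taps[k] * (newer + older)  # one multiply per pair
        if n % 2:  # unpaired middle tap when the length is odd
            acc += self.taps[n // 2] * self.buf[(self.head - n // 2) % n]
        return acc


# Feeding a unit impulse reproduces the taps (hypothetical 3-tap filter).
fir = SymmetricFIR([1, 2, 1])
impulse_response = [fir.step(x) for x in [1, 0, 0, 0]]
```

The ring buffer avoids shifting the sample history on every input, which matters on a small processor where memory moves are costly.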
A unifying framework for seed sensitivity and its application to subset seeds
We propose a general approach to computing seed sensitivity that can be
applied to different definitions of seeds. It treats separately three
components of the seed sensitivity problem -- a set of target alignments, an
associated probability distribution, and a seed model -- that are specified by
distinct finite automata. The approach is then applied to a new concept of
subset seeds for which we propose an efficient automaton construction.
Experimental results confirm that sensitive subset seeds can be efficiently
designed using our approach, and can then be used in similarity search
producing better results than ordinary spaced seeds.
Comparative Analysis of Computationally Accelerated NGS Alignment
The Smith-Waterman algorithm is the basis of most current sequence alignment technology, which can be used to identify similarities between sequences for cancer detection and treatment because it provides researchers with potential targets for early diagnosis and personalized treatment. The growing number of DNA and RNA sequences available to analyze necessitates faster alignment processes than are possible with current iterations of the Smith-Waterman (S-W) algorithm. This project aimed to identify the most effective and efficient methods for accelerating the S-W algorithm by investigating recent advances in sequence alignment. Of the 22 articles considered in this project, 17 had to be excluded from the study due to a lack of standardization in data reporting. Only one study, by Chen et al., contained enough information to compare accuracy and alignment speed. When accuracy was excluded from the criteria, five studies contained enough information to rank their efficiency. The study conducted by Rucci et al. was the fastest at 268.83 Giga Cell Updates Per Second (GCUPS), and the method by Pérez-Serrano et al. came close at 229.93 GCUPS while testing larger sequences. It was determined that reporting standards in this field are not sufficient, and that the study by Chen et al. should set a benchmark for future reporting.
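For readers unfamiliar with the metric, the sketch below shows an unaccelerated Smith-Waterman score computation and how a GCUPS figure is derived from it. The scoring values (match 2, mismatch -1, linear gap -1) and the function names are assumptions chosen for illustration; the surveyed studies use their own scoring schemes and hardware-specific implementations.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty.
    Keeps only two rows of the dynamic-programming matrix."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, database_len, seconds):
    """Giga cell updates per second: matrix cells filled, divided by
    elapsed time, in units of 10**9."""
    return query_len * database_len / seconds / 1e9


score = smith_waterman_score("GGACGTGG", "TTACGTTT")
```

Every (query residue, database residue) pair is one cell update, so a 1,000-residue query against a 1,000,000-residue database in one second corresponds to 1 GCUPS.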
Fast Computation of the Fitness Function for Protein Folding Prediction in a 2D Hydrophobic-Hydrophilic Model
Protein Folding Prediction (PFP) is essentially an energy minimization problem formalised by the definition of a fitness function. Several PFP models have been proposed including the Hydrophobic-Hydrophilic (HP) model, which is widely used as a test-bed for evaluating new algorithms. The calculation of the fitness is the major computational task in determining the native conformation of a protein in the HP model and this paper presents a new efficient search algorithm (ESA) for deriving the fitness value requiring only O(n) complexity in contrast to the full search approach, which takes O(n^2). The improved efficiency of ESA is achieved by exploiting some intrinsic properties of the HP model, with a resulting reduction of more than 50% in the overall time complexity when compared with the previously reported Caching Approach, with the added benefit that the additional space complexity is linear instead of quadratic.
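The abstract does not describe ESA itself, so the sketch below is not that algorithm. It only illustrates, under the standard 2D HP convention that fitness is minus the number of non-consecutive H-H lattice contacts, how a position lookup table brings the evaluation down to expected linear time: each residue checks a constant number of neighbouring lattice sites instead of being compared with every other residue. The function name `hp_energy` and the toy conformations are assumptions.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D HP conformation: -1 for each pair of H residues
    on adjacent lattice sites that are not neighbours in the chain.
    sequence is a string over {'H', 'P'}; coords is a list of (x, y)."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    index_at = {pos: i for i, pos in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every adjacent pair exactly once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# Hypothetical 4-residue chain folded into a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
energy = hp_energy("HHHH", square)
```

In the square conformation only residues 0 and 3 are lattice neighbours without being chain neighbours, so the fold has a single H-H contact.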
Estimation of brain dynamics under visuomotor task using functional connectivity analysis based on graph theory
Network studies of brain connectivity have demonstrated that the highly connected area, or hub, is a vital feature of human functional and structural brain organization. Hubs identify which regions play an important role in cognitive/sensorimotor tasks. In addition, a complex visuomotor learning skill causes specific changes of neuronal activation across brain regions. Accordingly, this study utilizes the hub as one of the features to map the visuomotor learning tasks and their dynamic functional connectivity (dFC). The electroencephalogram (EEG) data recorded under three different behavior conditions were investigated: motion only (MO), vision only (VO), and tracking (Tra) conditions. Here, we used the phase locking value (PLV) with a sliding window (50 ms) to calculate the dFC at four distinct frequency bands: 8-12 Hz (alpha), 18-22 Hz (low beta), 26-30 Hz (high beta) and 38-42 Hz (gamma), and the eigenvector centrality to identify hubs. The Gaussian Mixture Model (GMM) was applied to investigate the dFC patterns. The results showed that the dFC patterns with the hub feature represent the characteristics of neuronal activities under visuomotor coordination.
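As a hedged illustration of the connectivity measure named above, the sketch below computes the phase locking value in its common form, the magnitude of the mean of exp(i(φ_a - φ_b)) over a window, and slides that window along two phase series. It assumes the instantaneous phases have already been extracted for one frequency band; the function names, the window length in samples, and the toy data are illustrative and are not taken from the study.

```python
import cmath


def plv(phase_a, phase_b):
    """Phase locking value of two equal-length phase series (radians).
    1 means a constant phase difference; values near 0 mean no locking."""
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase series must be non-empty and equal in length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total / len(phase_a))


def sliding_plv(phase_a, phase_b, window, step=1):
    """PLV in each window of `window` samples, advancing by `step` samples."""
    return [
        plv(phase_a[s:s + window], phase_b[s:s + window])
        for s in range(0, len(phase_a) - window + 1, step)
    ]


# Toy phases with a constant 0.1 rad offset, so every window is fully locked.
a = [0.1, 0.5, 0.9, 1.3, 1.7]
b = [0.0, 0.4, 0.8, 1.2, 1.6]
windows = sliding_plv(a, b, window=3)
```

Computing one PLV per channel pair and window yields a time-varying connectivity matrix, on which a centrality measure can then be evaluated.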
Inferring causal relations from multivariate time series : a fast method for large-scale gene expression data
Various multivariate time series analysis techniques have been developed with the aim of inferring causal relations between time series. Previously, these techniques have proved their effectiveness on economic and neurophysiological data, which normally consist of hundreds of samples. However, in their applications to gene regulatory inference, the small sample size of gene expression time series poses an obstacle. In this paper, we describe some of the most commonly used multivariate inference techniques and show the potential challenge related to gene expression analysis. In response, we propose a directed partial correlation (DPC) algorithm as an efficient and effective solution to causal/regulatory relations inference on small sample gene expression data. Comparative evaluations of the existing techniques and the proposed method are presented. To draw reliable conclusions, a comprehensive benchmarking on data sets of various setups is essential. Three experiments are designed to assess these methods in a coherent manner. Detailed analysis of experimental results not only reveals good accuracy of the proposed DPC method in large-scale prediction, but also gives much insight into all methods under evaluation.