2,305 research outputs found

    A survey of DNA motif finding algorithms

    Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. More recent algorithms are designed to use phylogenetic footprinting or orthologous sequences, or an integrated approach that combines promoter sequences of coregulated genes with phylogenetic footprinting. All the algorithms studied have been reported to correctly detect motifs previously identified by laboratory experiments, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Comparing the performance of different motif finding tools and identifying the best ones has proven difficult, because the tools are based on diverse and complex algorithms and motif models, and because our incomplete understanding of the biology of regulatory mechanisms does not always allow adequate evaluation of the underlying algorithms over motif models. Peer reviewed. Computer Scienc
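
The idea of searching promoter sequences of coregulated genes for overrepresented motifs can be illustrated with a toy example. The sketch below is not any specific algorithm from the survey: it assumes exact k-mer matching and ranks k-mers by how many sequences contain them, whereas real tools typically use background models and tolerate mismatches. The function names and the `promoters` data are hypothetical.

```python
from collections import Counter


def kmer_sequence_support(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    support = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set ensures each k-mer is counted once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support


def top_candidates(sequences, k, n=3):
    """Return the n k-mers present in the most sequences (ties broken alphabetically)."""
    support = kmer_sequence_support(sequences, k)
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up toy sequences (assumption), not real promoter data.
promoters = ["ACGTTGACGTCA", "TTTGACGTGG", "CCATGACGTAA"]

if __name__ == "__main__":
    for kmer, count in top_candidates(promoters, k=6):
        print(kmer, count)
```

Counting sequence support rather than raw occurrences keeps one repeat-rich sequence from dominating the ranking; a statistical test against a background model would be the natural next step.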

    NATSA: A Near-Data Processing Accelerator for Time Series Analysis

    Time series analysis is a key technique for extracting and predicting events in domains as diverse as epidemiology, genomics, neuroscience, environmental sciences, economics, and more. Matrix profile, the state-of-the-art algorithm to perform time series analysis, computes the most similar subsequence for a given query subsequence within a sliced time series. Matrix profile has low arithmetic intensity, but it typically operates on large amounts of time series data. In current computing systems, this data needs to be moved between the off-chip memory units and the on-chip computation units for performing matrix profile. This causes a major performance bottleneck as data movement is extremely costly in terms of both execution time and energy. In this work, we present NATSA, the first Near-Data Processing accelerator for time series analysis. The key idea is to exploit modern 3D-stacked High Bandwidth Memory (HBM) to enable efficient and fast specialized matrix profile computation near memory, where time series data resides. NATSA provides three key benefits: 1) quickly computing the matrix profile for a wide range of applications by building specialized energy-efficient floating-point arithmetic processing units close to HBM, 2) improving the energy efficiency and execution time by reducing the need for data movement over slow and energy-hungry buses between the computation units and the memory units, and 3) analyzing time series data at scale by exploiting low-latency, high-bandwidth, and energy-efficient memory access provided by HBM. Our experimental evaluation shows that NATSA improves performance by up to 14.2x (9.9x on average) and reduces energy by up to 27.2x (19.4x on average), over the state-of-the-art multi-core implementation. NATSA also improves performance by 6.3x and reduces energy by 10.2x over a general-purpose NDP platform with 64 in-order cores. Comment: To appear in the 38th IEEE International Conference on Computer Design (ICCD 2020)
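
To make the matrix profile computation described above concrete, here is a minimal brute-force sketch. It assumes z-normalized Euclidean distance between subsequences and an exclusion zone of half the window length to skip trivial self-matches; both are common conventions rather than details stated in the abstract, and this quadratic loop is not NATSA's implementation. The names `znorm`, `naive_matrix_profile`, and `toy_series` are hypothetical.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """For each length-m subsequence, find the distance to and index of its
    nearest non-trivial neighbour (brute force, O(n^2 * m))."""
    n_sub = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n_sub)]
    exclusion = m // 2  # assumed exclusion zone around each subsequence
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= exclusion:
                continue  # skip trivial matches that overlap heavily with i
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(windows[i], windows[j])))
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


# Made-up series (assumption) in which the pattern 0, 1, 3, 1 repeats.
toy_series = [0, 1, 3, 1, 0, 2, 7, 4, 0, 1, 3, 1, 0]

if __name__ == "__main__":
    profile, index = naive_matrix_profile(toy_series, m=4)
    for i, (d, j) in enumerate(zip(profile, index)):
        print(i, round(d, 3), j)
```

Each distance needs only a few arithmetic operations per data element read, which illustrates the low arithmetic intensity and heavy data movement the abstract identifies as the bottleneck.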

    Big networks : a survey

    A network is a typical expressive form of representing complex systems in terms of vertices and links, in which the pattern of interactions amongst components of the network is intricate. A network can be static, meaning it does not change over time, or dynamic, meaning it evolves through time. Network analysis becomes more complicated as network size increases explosively. In this paper, we introduce a new network science concept called a big network. A big network is generally large-scale, with a complicated and higher-order inner structure. This paper proposes a guideline framework that gives an insight into the major topics in the area of network science from the viewpoint of a big network. We first introduce the structural characteristics of big networks at three levels: micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics of big network analysis. Big network models and related approaches, including ranking methods, partition approaches, and network embedding algorithms, are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, and recommendation. Moreover, we also pinpoint some critical open issues that need to be investigated further. © 2020 Elsevier Inc

    Architecture and dynamics of the jasmonic acid gene regulatory network

    Jasmonic acid (JA) is a critical hormonal regulator of plant growth and defense. To advance our understanding of the architecture and dynamic regulation of the JA gene regulatory network, we performed a high-resolution RNA-seq time series of methyl JA-treated Arabidopsis thaliana at 15 time points over a 16-h period. Computational analysis showed that methyl JA (MeJA) induces a burst of transcriptional activity, generating diverse expression patterns over time that partition into distinct sectors of the JA response targeting specific biological processes. The presence of transcription factor (TF) DNA binding motifs correlated with specific TF activity during temporal MeJA-induced transcriptional reprogramming. Insight into the underlying dynamic transcriptional regulation mechanisms was captured in a chronological model of the JA gene regulatory network. Several TFs, including MYB59 and bHLH27, were uncovered as early network components with a role in pathogen and insect resistance. Analysis of subnetworks surrounding the TFs ORA47, RAP2.6L, MYB59, and ANAC055, using transcriptome profiling of overexpressors and mutants, provided insights into their regulatory role in defined modules of the JA network. Collectively, our work illuminates the complexity of the JA gene regulatory network, pinpoints and validates previously unknown regulators, and provides a valuable resource for functional studies on JA signaling components in plant defense and development

    Learning and mining from personal digital archives

    Given the explosion of new sensing technologies, data storage has become significantly cheaper and, consequently, people increasingly rely on wearable devices to create personal digital archives. Lifelogging is the act of recording aspects of life in digital format for a variety of purposes such as aiding human memory, analysing human lifestyle and diet monitoring. In this dissertation we are concerned with Visual Lifelogging, a form of lifelogging based on the passive capture of photographs by a wearable camera. Cameras such as Microsoft's SenseCam can record up to 4,000 images per day as well as logging data from several incorporated sensors. Considering the volume, complexity and heterogeneous nature of such data collections, it is a significant challenge to interpret and extract knowledge for the practical use of lifeloggers and others. In this dissertation, time series analysis methods have been used to identify and extract useful information from temporal lifelogging image data, without benefit of prior knowledge. We focus, in particular, on three fundamental topics: noise reduction, structure and characterization of the raw data; the detection of multi-scale patterns; and the mining of important, previously unknown repeated patterns in the time series of lifelog image data. Firstly, we show that Detrended Fluctuation Analysis (DFA) highlights the feature of very high correlation in lifelogging image collections. Secondly, we show that study of the equal-time Cross-Correlation Matrix demonstrates atypical or non-stationary characteristics in these images. Next, noise reduction in the Cross-Correlation Matrix is addressed by Random Matrix Theory (RMT) before Wavelet multiscaling is used to characterize the 'most important' or 'unusual' events through analysis of the associated dynamics of the eigenspectrum. A motif discovery technique is explored for detection of recurring and recognizable episodes in an individual's image data. Finally, we apply these motif discovery techniques to two known lifelog data collections, All I Have Seen (AIHS) and NTCIR-12 Lifelog, in order to examine multivariate recurrent patterns of multiple lifelogging users

    Identifying gene regulatory networks common to multiple plant stress responses

    Stress responses in plants can be defined as a change that affects the homeostasis of pathways, resulting in a phenotype that may or may not be visible to the human eye, affecting the fitness of the plant. Crosstalk is believed to be the shared components of pathways of networks, and is widespread in plants, as shown by examples of crosstalk between transcriptional regulation pathways and hormone signalling. Crosstalk between stress responses is believed to exist, particularly crosstalk within the responses to biotic stress, and within the responses to abiotic stress. Certain hormone pathways are known to be involved in the crosstalk between the responses to both biotic and abiotic stresses, and can confer immunity or tolerance of Arabidopsis thaliana to these stresses. Transcriptional regulation has also been identified as an important factor in controlling tolerance and resistance to stresses. In this thesis, networks of regulation mediating the response to multiple stresses are studied. Firstly, co-regulation was predicted for genes differentially expressed in two or more stresses by development of a novel multi-clustering approach, Wigwams Identifies Genes Working Across Multiple Stresses (Wigwams). This approach finds groups of genes whose expression is correlated within stresses, but also identifies a strong statistical link between subsets of stresses. Wigwams identifies the known co-expression of genes encoding enzymes of metabolic and flavonoid biosynthesis pathways, and predicts novel clusters of co-expressed genes. Under the hypothesis that co-expressed genes may also be co-regulated, promoter motif analysis and modelling provide information on potential upstream regulators. The context-free regulation of groups of co-expressed genes, or potential regulons, was explored using models generated by modelling techniques, in order to generate a quantitative model of transcriptional regulation during the response to B. cinerea, P. syringae pv. tomato DC3000 and senescence. This model was subsequently validated and extended by experimental techniques, using Yeast 1-Hybrid to investigate the protein-DNA interactions, and also microarrays. Analysis of mutants and plants overexpressing a predicted regulator, Rap2.6L, by gene expression analysis identified a number of potential regulon members as downstream targets. Rap2.6L was identified as an indirect regulator of the transcription factor members of three potential regulons co-expressed in the stresses B. cinerea, P. syringae pv. tomato DC3000 and long day senescence, allowing the confirmation of a predicted gene regulatory network operating in multiple stress responses