EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences.
The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that, in a novel way, incorporates the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
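As a rough illustration of the kind of local alignment this abstract describes, the sketch below runs a plain Smith-Waterman recurrence over run-length-compressed chromatin state sequences. It is not the EpiAlign algorithm: the function names, the scoring values (match, mismatch, gap) and the compression step are assumptions made for this example, and EpiAlign's own treatment of state lengths and frequencies is not reproduced here.

```python
from itertools import groupby


def compress(states):
    """Collapse runs of identical states, e.g. 'AAABBA' -> ['A', 'B', 'A'].

    Hypothetical preprocessing step chosen for this sketch.
    """
    return [state for state, _ in groupby(states)]


def local_align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best Smith-Waterman local alignment score of a and b.

    The scoring parameters are illustrative defaults, not values
    taken from the EpiAlign paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for x in a:
        cur = [0]  # column 0 is always 0 in local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

Calling `local_align_score(compress(seq1), compress(seq2))` on two state strings scores their best shared local pattern while ignoring how long each state persists, which is exactly the information a length-aware method would retain.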
A survey of DNA motif finding algorithms
Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences, as well as an integrated approach in which promoter sequences of coregulated genes and phylogenetic footprinting are used together. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging.
Comparing the performance of motif finding tools and identifying the best ones has proven difficult, because the tools are built on diverse and complex algorithms and motif models, and because our incomplete understanding of the biology of regulatory mechanisms does not always allow adequate evaluation of the underlying algorithms over motif models.
Peer reviewed
NATSA: A Near-Data Processing Accelerator for Time Series Analysis
Time series analysis is a key technique for extracting and predicting events
in domains as diverse as epidemiology, genomics, neuroscience, environmental
sciences, economics, and more. Matrix profile, the state-of-the-art algorithm
to perform time series analysis, computes the most similar subsequence for a
given query subsequence within a sliced time series. Matrix profile has low
arithmetic intensity, but it typically operates on large amounts of time series
data. In current computing systems, this data needs to be moved between the
off-chip memory units and the on-chip computation units for performing matrix
profile. This causes a major performance bottleneck as data movement is
extremely costly in terms of both execution time and energy.
In this work, we present NATSA, the first Near-Data Processing accelerator
for time series analysis. The key idea is to exploit modern 3D-stacked High
Bandwidth Memory (HBM) to enable efficient and fast specialized matrix profile
computation near memory, where time series data resides. NATSA provides three
key benefits: 1) quickly computing the matrix profile for a wide range of
applications by building specialized energy-efficient floating-point arithmetic
processing units close to HBM, 2) improving the energy efficiency and execution
time by reducing the need for data movement over slow and energy-hungry buses
between the computation units and the memory units, and 3) analyzing time
series data at scale by exploiting low-latency, high-bandwidth, and
energy-efficient memory access provided by HBM. Our experimental evaluation
shows that NATSA improves performance by up to 14.2x (9.9x on average) and
reduces energy by up to 27.2x (19.4x on average), over the state-of-the-art
multi-core implementation. NATSA also improves performance by 6.3x and reduces
energy by 10.2x over a general-purpose NDP platform with 64 in-order cores.
Comment: To appear in the 38th IEEE International Conference on Computer
Design (ICCD 2020)
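To make the matrix profile computation in this abstract concrete, here is a minimal brute-force sketch: for every length-m subsequence it finds the nearest other subsequence under z-normalized Euclidean distance. It is an illustrative O(n^2 m) baseline, not the NATSA design or any optimized algorithm; the function names, the exclusion zone of m // 2 and the handling of constant subsequences are assumptions made for this example.

```python
import math


def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to zeros
    (a convention assumed for this sketch)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)
    return [(v - mean) / std for v in x]


def matrix_profile(ts, m):
    """Return (profile, index) for time series ts and window length m.

    profile[i] is the distance from subsequence i to its nearest
    non-trivial neighbour, and index[i] is that neighbour's position.
    """
    n = len(ts) - m + 1
    if n < 1:
        raise ValueError("window length m exceeds the series length")
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = m // 2  # assumed exclusion zone to skip trivial self-matches
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index
```

Every subsequence pair is touched while only a few arithmetic operations are spent on each value, which illustrates the low arithmetic intensity and heavy data movement that the abstract identifies as the bottleneck.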
Big networks: a survey
A network is a typical, expressive way of representing a complex system in terms of vertices and links, in which the pattern of interactions amongst the components is intricate. A network can be static, meaning that it does not change over time, or dynamic, meaning that it evolves through time. Network analysis becomes more complicated as network sizes grow explosively. In this paper, we introduce a new network science concept called a big network. A big network is generally large in scale, with a complicated, higher-order inner structure. This paper proposes a guideline framework that gives an insight into the major topics in the area of network science from the viewpoint of a big network. We first introduce the structural characteristics of big networks at three levels: micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics of big network analysis. Big network models and related approaches, including ranking methods, partition approaches, and network embedding algorithms, are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, and recommendation. Moreover, we also pinpoint some critical open issues that need to be investigated further. © 2020 Elsevier Inc.
Architecture and dynamics of the jasmonic acid gene regulatory network
Jasmonic acid (JA) is a critical hormonal regulator of plant growth and defense. To advance our understanding of the architecture and dynamic regulation of the JA gene regulatory network, we performed a high-resolution RNA-seq time series of methyl JA-treated Arabidopsis thaliana at 15 time points over a 16-h period. Computational analysis showed that methyl JA (MeJA) induces a burst of transcriptional activity, generating diverse expression patterns over time that partition into distinct sectors of the JA response targeting specific biological processes. The presence of transcription factor (TF) DNA binding motifs correlated with specific TF activity during temporal MeJA-induced transcriptional reprogramming. Insight into the underlying dynamic transcriptional regulation mechanisms was captured in a chronological model of the JA gene regulatory network. Several TFs, including MYB59 and bHLH27, were uncovered as early network components with a role in pathogen and insect resistance. Analysis of subnetworks surrounding the TFs ORA47, RAP2.6L, MYB59, and ANAC055, using transcriptome profiling of overexpressors and mutants, provided insights into their regulatory role in defined modules of the JA network. Collectively, our work illuminates the complexity of the JA gene regulatory network, pinpoints and validates previously unknown regulators, and provides a valuable resource for functional studies on JA signaling components in plant defense and development
Learning and mining from personal digital archives
Given the explosion of new sensing technologies, data storage has become significantly cheaper and consequently, people increasingly rely on wearable devices to create personal digital archives. Lifelogging is the act of recording aspects of life in digital format for a variety of purposes such as aiding human memory, analysing human lifestyle and diet monitoring. In this dissertation we are concerned with Visual Lifelogging, a form of lifelogging based on the passive capture of photographs by a wearable camera. Cameras such as Microsoft's SenseCam can record up to 4,000 images per day as well as logging data from several incorporated sensors. Considering the volume, complexity and heterogeneous nature of such data collections, it is a significant challenge to interpret and extract knowledge for the practical use of lifeloggers and others.
In this dissertation, time series analysis methods have been used to identify and extract useful information from temporal lifelogging image data, without the benefit of prior knowledge. We focus, in particular, on three fundamental topics: noise reduction, structure and characterization of the raw data; the detection of multi-scale patterns; and the mining of important, previously unknown repeated patterns in the time series of lifelog image data.
Firstly, we show that Detrended Fluctuation Analysis (DFA) highlights the very high correlation present in lifelogging image collections. Secondly, we show that study of the equal-time Cross-Correlation Matrix demonstrates atypical or non-stationary characteristics in these images. Next, noise reduction in the Cross-Correlation Matrix is addressed by Random Matrix Theory (RMT) before Wavelet multiscaling is used to characterize the 'most important' or 'unusual' events through analysis of the associated dynamics of the eigenspectrum. A motif discovery technique is explored for detection of recurring and recognizable episodes in an individual's image data. Finally, we apply these motif discovery techniques to two known lifelog data collections, All I Have Seen (AIHS) and NTCIR-12 Lifelog, in order to examine multivariate recurrent patterns of multiple lifelogging users.
Identifying gene regulatory networks common to multiple plant stress responses
Stress responses in plants can be defined as a change that affects the homeostasis of pathways,
resulting in a phenotype that may or may not be visible to the human eye, affecting the fitness
of the plant. Crosstalk is believed to be the shared components of pathways of networks, and
is widespread in plants, as shown by examples of crosstalk between transcriptional regulation
pathways, and hormone signalling.
Crosstalk between stress responses is believed to exist, particularly crosstalk within the responses
to biotic stress, and within the responses to abiotic stress. Certain hormone pathways are known
to be involved in the crosstalk between the responses to both biotic and abiotic stresses, and can
confer immunity or tolerance of Arabidopsis thaliana to these stresses. Transcriptional regulation
has also been identified as an important factor in controlling tolerance and resistance to stresses.
In this thesis, networks of regulation mediating the response to multiple stresses are studied. Firstly,
co-regulation was predicted for genes differentially expressed in two or more stresses by development
of a novel multi-clustering approach, Wigwams Identifies Genes Working Across Multiple
Stresses (Wigwams). This approach finds groups of genes whose expression is correlated within
stresses, but also identifies a strong statistical link between subsets of stresses. Wigwams identifies
the known co-expression of genes encoding enzymes of metabolic and flavonoid biosynthesis
pathways, and predicts novel clusters of co-expressed genes. Under the hypothesis that co-expressed
genes are also co-regulated, promoter motif analysis and modelling
provide information on potential upstream regulators.
The context-free regulation of groups of co-expressed genes, or potential regulons, was explored
using models generated by modelling techniques, in order to generate a quantitative model of
transcriptional regulation during the response to B. cinerea, P. syringae pv. tomato DC3000 and
senescence. This model was subsequently validated and extended by experimental techniques,
using Yeast 1-Hybrid to investigate the protein-DNA interactions, and also microarrays. Analysis
of mutants and plants overexpressing a predicted regulator, Rap2.6L, by gene expression analysis
identified a number of potential regulon members as downstream targets.
Rap2.6L was identified as an indirect regulator of the transcription factor members of three potential
regulons co-expressed in the stresses B. cinerea, P. syringae pv. tomato DC3000 and long
day senescence, allowing the confirmation of a predicted gene regulatory network operating in
multiple stress responses