2,305 research outputs found

EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences.

Author: Ge Xinzhou
Kwon Soo Bin
Li Jingyi Jessica
Li Wei Vivian
Xie Lingjue
Zhang Haowen
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that, as a novel feature, incorporates the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
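As a rough illustration of what local alignment over chromatin state sequences means, the sketch below runs a plain Smith-Waterman dynamic program on two lists of state labels. It is not the EpiAlign algorithm: it ignores the state lengths and frequencies that the abstract says EpiAlign incorporates, and the function name, scoring values and state labels are all assumptions chosen for the example.

```python
def local_align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between two state sequences.

    The scoring values are illustrative assumptions, not EpiAlign's weights.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score of a local alignment ending at
    # a[i-1] and b[j-1]; 0 means "start a fresh alignment here".
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


# Hypothetical chromatin state labels for two genomic regions.
region_a = ["TssA", "Tx", "Tx", "Enh", "Quies"]
region_b = ["Quies", "Tx", "Tx", "Enh", "Het"]
print(local_align_score(region_a, region_b))
```

The two toy regions share the run Tx, Tx, Enh, so the local alignment picks out that stretch and ignores the differing flanks, which is the behaviour a global alignment would not give.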

eScholarship - University of California

A survey of DNA motif finding algorithms

Author: Dai Ho-Kwok
Das Modan K
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences, as well as an integrated approach in which promoter sequences of coregulated genes and phylogenetic footprinting are both used. All the algorithms studied have been reported to correctly detect motifs that were previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools, and the progress made in this area of research is very encouraging. Comparing the performance of different motif finding tools and identifying the best tools has proven to be a difficult task, because the tools are based on algorithms and motif models that are diverse and complex, and because our incomplete understanding of the biology of regulatory mechanisms does not always allow adequate evaluation of the underlying algorithms over motif models. Peer reviewed. Computer Scienc
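To make the idea of searching promoter sequences of coregulated genes for shared motifs concrete, here is a minimal sketch that counts how many sequences contain each exact k-mer. This is a deliberate simplification: the algorithms surveyed test for statistical overrepresentation against a background model and allow degenerate positions, neither of which is done here. The function names and the toy sequences are made up for the example.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    support = Counter()
    for seq in sequences:
        # A set ensures a k-mer repeated within one sequence counts once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support


def most_shared_kmers(sequences, k, top=3):
    """Return the k-mers present in the most sequences, ties broken alphabetically."""
    support = kmer_support(sequences, k)
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Toy "promoter" sequences (invented), each containing the 5-mer TGACC.
promoters = ["ACGTGACCA", "TTTGACCAG", "GGTGACCTT"]
print(most_shared_kmers(promoters, 5, top=1))
```

A real tool would compare these counts with what is expected by chance given the background base composition before calling a k-mer a candidate motif.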

Springer - Publisher Connector

PubMed Central

The University of Arizona

SHAREOK repository

NATSA: A Near-Data Processing Accelerator for Time Series Analysis

Author: Alser Mohammed
Fernandez Ivan
Giannoula Christina
Gutiérrez Eladio
Gómez-Luna Juan
Mutlu Onur
Plata Oscar
Quislant Ricardo
Publication date: 01/01/2020

Time series analysis is a key technique for extracting and predicting events in domains as diverse as epidemiology, genomics, neuroscience, environmental sciences, economics, and more. Matrix profile, the state-of-the-art algorithm to perform time series analysis, computes the most similar subsequence for a given query subsequence within a sliced time series. Matrix profile has low arithmetic intensity, but it typically operates on large amounts of time series data. In current computing systems, this data needs to be moved between the off-chip memory units and the on-chip computation units for performing matrix profile. This causes a major performance bottleneck as data movement is extremely costly in terms of both execution time and energy. In this work, we present NATSA, the first Near-Data Processing accelerator for time series analysis. The key idea is to exploit modern 3D-stacked High Bandwidth Memory (HBM) to enable efficient and fast specialized matrix profile computation near memory, where time series data resides. NATSA provides three key benefits: 1) quickly computing the matrix profile for a wide range of applications by building specialized energy-efficient floating-point arithmetic processing units close to HBM, 2) improving the energy efficiency and execution time by reducing the need for data movement over slow and energy-hungry buses between the computation units and the memory units, and 3) analyzing time series data at scale by exploiting low-latency, high-bandwidth, and energy-efficient memory access provided by HBM. Our experimental evaluation shows that NATSA improves performance by up to 14.2x (9.9x on average) and reduces energy by up to 27.2x (19.4x on average), over the state-of-the-art multi-core implementation. NATSA also improves performance by 6.3x and reduces energy by 10.2x over a general-purpose NDP platform with 64 in-order cores. Comment: To appear in the 38th IEEE International Conference on Computer Design (ICCD 2020)
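For readers unfamiliar with the matrix profile, the sketch below computes it by brute force: for every length-m subsequence it records the distance to, and position of, its nearest non-overlapping neighbour. It assumes the commonly used z-normalized Euclidean distance and an exclusion zone of m // 2 around each position; both are assumptions of this example, and the code says nothing about how NATSA or the multi-core baselines organize the computation.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    sd = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if sd == 0:
        return [0.0] * len(window)
    return [(v - mean) / sd for v in window]


def naive_matrix_profile(series, m):
    """Brute-force matrix profile: O(n^2 * m) time, for illustration only."""
    n = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n)]
    excl = max(1, m // 2)          # skip trivial matches near position i
    profile = [math.inf] * n       # distance to the nearest neighbour
    index = [-1] * n               # position of that neighbour
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# Toy series with the shape [0, 1, 3, 1, 0] planted at positions 0 and 8.
series = [0, 1, 3, 1, 0, 5, 5, 4, 0, 1, 3, 1, 0, 2]
profile, index = naive_matrix_profile(series, 5)
print(index[0], index[8])
```

Each subsequence is compared against almost every other one while only a few arithmetic operations are done per value read, which is consistent with the low arithmetic intensity and heavy data movement the abstract describes.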

arXiv.org e-Print Archive

Crossref

Repositorio Institucional Universidad de Málaga

Big networks : a survey

Author: Bedru Hayat
Xia Feng
Xiao Xinru
Yu Shuo
Zhang Da
Publication venue: Elsevier BV
Publication date: 01/01/2020

A network is a typical expressive form of representing complex systems in terms of vertices and links, in which the pattern of interactions amongst components of the network is intricate. A network can be static, in that it does not change over time, or dynamic, in that it evolves through time. The complexity of network analysis changes as network sizes increase explosively. In this paper, we introduce a new network science concept called a big network. A big network is generally large-scale, with a complicated and higher-order inner structure. This paper proposes a guideline framework that gives an insight into the major topics in the area of network science from the viewpoint of a big network. We first introduce the structural characteristics of big networks at three levels: micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics of big network analysis. Big network models and related approaches, including ranking methods, partition approaches, and network embedding algorithms, are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, and recommendation. Moreover, we also pinpoint some critical open issues that need to be investigated further. © 2020 Elsevier Inc.

Federation ResearchOnline

Architecture and dynamics of the jasmonic acid gene regulatory network

Author: Caarls Lotte
De Vries Michel
Denby Katherine
Hickman Richard
Jironkin Aleksey
Pereira Mendes Marciel
Pieterse Corne M. J.
Rhodes Johanna
Schuurink Robert C
Steenbergen Merel
Talbot Adam
Van der Nagel Ivo
Van Dijken Anja J H
Van Verk Marcel C
Van Wees Saskia CM
Vroegop-Vos Irene A
Wesselink Gert Jan
Publication venue: American Society of Plant Biologists (ASPB)
Publication date: 21/08/2017

Jasmonic acid (JA) is a critical hormonal regulator of plant growth and defense. To advance our understanding of the architecture and dynamic regulation of the JA gene regulatory network, we performed a high-resolution RNA-seq time series of methyl JA-treated Arabidopsis thaliana at 15 time points over a 16-h period. Computational analysis showed that methyl JA (MeJA) induces a burst of transcriptional activity, generating diverse expression patterns over time that partition into distinct sectors of the JA response targeting specific biological processes. The presence of transcription factor (TF) DNA binding motifs correlated with specific TF activity during temporal MeJA-induced transcriptional reprogramming. Insight into the underlying dynamic transcriptional regulation mechanisms was captured in a chronological model of the JA gene regulatory network. Several TFs, including MYB59 and bHLH27, were uncovered as early network components with a role in pathogen and insect resistance. Analysis of subnetworks surrounding the TFs ORA47, RAP2.6L, MYB59, and ANAC055, using transcriptome profiling of overexpressors and mutants, provided insights into their regulatory role in defined modules of the JA network. Collectively, our work illuminates the complexity of the JA gene regulatory network, pinpoints and validates previously unknown regulators, and provides a valuable resource for functional studies on JA signaling components in plant defense and development.

Crossref

Warwick Research Archives Portal Repository

White Rose Research Online

Utrecht University Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Learning and mining from personal digital archives

Author: Li Na
Publication venue: Dublin City University. Scientific Computing and Complex Systems Modelling (Sci-Sym)
Publication date: 01/03/2020

Given the explosion of new sensing technologies, data storage has become significantly cheaper and, consequently, people increasingly rely on wearable devices to create personal digital archives. Lifelogging is the act of recording aspects of life in digital format for a variety of purposes, such as aiding human memory, analysing human lifestyle, and diet monitoring. In this dissertation we are concerned with Visual Lifelogging, a form of lifelogging based on the passive capture of photographs by a wearable camera. Cameras such as Microsoft's SenseCam can record up to 4,000 images per day, as well as logging data from several incorporated sensors. Considering the volume, complexity and heterogeneous nature of such data collections, it is a significant challenge to interpret and extract knowledge for the practical use of lifeloggers and others. In this dissertation, time series analysis methods have been used to identify and extract useful information from temporal lifelogging image data, without the benefit of prior knowledge. We focus, in particular, on three fundamental topics: noise reduction, structure and characterization of the raw data; the detection of multi-scale patterns; and the mining of important, previously unknown repeated patterns in the time series of lifelog image data. Firstly, we show that Detrended Fluctuation Analysis (DFA) highlights the feature of very high correlation in lifelogging image collections. Secondly, we show that study of the equal-time Cross-Correlation Matrix demonstrates atypical or non-stationary characteristics in these images. Next, noise reduction in the Cross-Correlation Matrix is addressed by Random Matrix Theory (RMT), before Wavelet multiscaling is used to characterize the 'most important' or 'unusual' events through analysis of the associated dynamics of the eigenspectrum. A motif discovery technique is explored for detection of recurring and recognizable episodes in an individual's image data. Finally, we apply these motif discovery techniques to two known lifelog data collections, All I Have Seen (AIHS) and NTCIR-12 Lifelog, in order to examine multivariate recurrent patterns across multiple lifelogging users.

Irish Universities

DCU Online Research Access Service

Identifying gene regulatory networks common to multiple plant stress responses

Author: Rhodes Johanna

Stress responses in plants can be defined as a change that affects the homeostasis of pathways, resulting in a phenotype that may or may not be visible to the human eye, affecting the fitness of the plant. Crosstalk is believed to be the shared components of pathways of networks, and is widespread in plants, as shown by examples of crosstalk between transcriptional regulation pathways, and hormone signalling. Crosstalk between stress responses is believed to exist, particularly crosstalk within the responses to biotic stress, and within the responses to abiotic stress. Certain hormone pathways are known to be involved in the crosstalk between the responses to both biotic and abiotic stresses, and can confer immunity or tolerance of Arabidopsis thaliana to these stresses. Transcriptional regulation has also been identified as an important factor in controlling tolerance and resistance to stresses. In this thesis, networks of regulation mediating the response to multiple stresses are studied. Firstly, co-regulation was predicted for genes differentially expressed in two or more stresses by development of a novel multi-clustering approach, Wigwams Identifies Genes Working Across Multiple Stresses (Wigwams). This approach finds groups of genes whose expression is correlated within stresses, but also identifies a strong statistical link between subsets of stresses. Wigwams identifies the known co-expression of genes encoding enzymes of metabolic and flavonoid biosynthesis pathways, and predicts novel clusters of co-expressed genes. Under the hypothesis that co-expressed genes are also co-regulated, promoter motif analysis and modelling provide information on potential upstream regulators. The context-free regulation of groups of co-expressed genes, or potential regulons, was explored using models generated by modelling techniques, in order to generate a quantitative model of transcriptional regulation during the response to B. cinerea, P. syringae pv. tomato DC3000 and senescence. This model was subsequently validated and extended by experimental techniques, using Yeast 1-Hybrid to investigate the protein-DNA interactions, and also microarrays. Analysis of mutants and plants overexpressing a predicted regulator, Rap2.6L, by gene expression analysis identified a number of potential regulon members as downstream targets. Rap2.6L was identified as an indirect regulator of the transcription factor members of three potential regulons co-expressed in the stresses B. cinerea, P. syringae pv. tomato DC3000 and long day senescence, allowing the confirmation of a predicted gene regulatory network operating in multiple stress responses.

Warwick Research Archives Portal Repository