Alignment-free Genomic Analysis via a Big Data Spark Platform
Motivation: Alignment-free distance and similarity functions (AF functions,
for short) are a well-established alternative to two- and multiple-sequence
alignments for many genomic, metagenomic and epigenomic tasks. Due to
data-intensive applications, the computation of AF functions is a Big Data
problem, and the recent literature indicates that the development of fast and
scalable algorithms computing AF functions is a high-priority task. Somewhat
surprisingly, despite the increasing popularity of Big Data technologies in
Computational Biology, the development of a Big Data platform for those tasks
has not been pursued, possibly due to its complexity. Results: We fill this
important gap by introducing FADE, the first extensible, efficient and scalable
Spark platform for Alignment-free genomic analysis. It natively supports
eighteen of the best-performing AF functions identified by a recent hallmark
benchmarking study. FADE's development and potential impact comprise several
novel aspects of interest, namely: (a) a considerable effort in distributed
algorithm design, the most tangible result being much faster execution of
reference methods such as MASH and FSWM; (b) a software design that makes FADE
user-friendly and easily extendable by Spark non-specialists; (c) its ability
to support data- and compute-intensive tasks. In this regard, we provide a novel
and much-needed analysis of how informative and robust AF functions are, in
terms of the statistical significance of their output. Our findings naturally
extend those of the highly regarded benchmarking study, since they reduce the
functions that can really be used to a handful of the eighteen included in
FADE.
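To make "AF function" concrete, here is a minimal, single-machine Python sketch of two common kinds of alignment-free comparison: a Euclidean distance between k-mer count vectors, and a Mash-style estimate of the Jaccard index from bottom-s MinHash sketches, converted to the Mash distance $D = -\frac{1}{k}\ln\frac{2J}{1+J}$. It is not FADE's code or API. It ignores reverse complements (no canonical k-mers), and the function names, the BLAKE2b hash and the "distance 1.0 when nothing is shared" convention are assumptions chosen for illustration.

```python
import hashlib
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq (no canonical reverse-complement handling)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def euclidean_af(a, b, k):
    """Euclidean distance between the k-mer count vectors of a and b."""
    ca, cb = Counter(kmers(a, k)), Counter(kmers(b, k))
    return math.sqrt(sum((ca[x] - cb[x]) ** 2 for x in ca.keys() | cb.keys()))


def jaccard(a, b, k):
    """Exact Jaccard index of the two k-mer sets."""
    sa, sb = set(kmers(a, k)), set(kmers(b, k))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def _hash64(kmer):
    # Assumed hash: 64-bit BLAKE2b digest read as an unsigned integer.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq, k, s):
    """The s smallest distinct k-mer hashes of seq (bottom-s MinHash sketch)."""
    return sorted({_hash64(x) for x in kmers(seq, k)})[:s]


def minhash_jaccard(sk_a, sk_b, s):
    """Estimate Jaccard from two bottom-s sketches, Mash style."""
    union_sketch = sorted(set(sk_a) | set(sk_b))[:s]
    if not union_sketch:
        return 0.0
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in union_sketch if h in shared) / len(union_sketch)


def mash_distance(j, k):
    """Mash distance from a Jaccard estimate; 1.0 when nothing is shared (assumed convention)."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


a, b = "ACGTACGTAC", "ACGTACGTTT"
print(euclidean_af(a, b, 3), jaccard(a, b, 3))
j = minhash_jaccard(bottom_sketch(a, 3, 100), bottom_sketch(b, 3, 100), 100)
print(j, mash_distance(j, 3))
```

When s is at least the number of distinct k-mers in both sequences, the sketch estimate coincides with the exact Jaccard index; smaller sketches trade accuracy for memory, which is what makes this family of functions attractive at Big Data scale.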
Verifying the magnitude dependence in earthquake occurrence
The existence of magnitude dependence in earthquake triggering has been
reported. Such a correlation is linked to the issue of seismic predictability,
and it remains intensely debated whether it is physical or is caused by
incomplete data due to missing short-term aftershocks. We worked first with a
synthetic catalogue generated by a numerical model that captures most
statistical features of earthquakes, and then with a high-resolution earthquake
catalogue for the Amatrice-Norcia (2016) sequence in Italy. For the latter, we
employed the stochastic declustering method to reconstruct the family tree among
seismic events and limited our analysis to events above the magnitude of
completeness. We found that the hypothesis of magnitude correlation can be
rejected.
Evaluating the incompleteness magnitude using an unbiased estimate of the $b$ value
The evaluation of the $b$ value of the Gutenberg-Richter (GR) law, for a
sample composed of $n$ earthquakes, presents a systematic positive bias
$\delta b$ which is proportional to $1/n$, as already observed by Ogata &
Yamashina (1986). In this study we show how to incorporate in $\delta b$ the
bias introduced by deviations from the GR law. More precisely, we show that
$\delta b$ is proportional to the square of the variability coefficient $C_V$,
defined as the ratio between the standard deviation of the magnitude
distribution and its mean value. When the magnitude distribution follows the GR
law, $C_V = 1$, and this allows us to introduce a new procedure, based on the
dependence of $b$ on $n$, which allows us to identify the incompleteness
magnitude $m_c$ as the threshold magnitude leading to $C_V = 1$. The method is
tested on synthetic catalogs and applied to estimate $m_c$ in Southern
California, Japan and New Zealand.
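The quantities in this abstract can be illustrated with a short Python sketch. It assumes continuous (unbinned) magnitudes, uses Aki's maximum-likelihood estimator $b = \log_{10}(e)/(\bar{m} - m_c)$, and multiplies it by $(n-1)/n$, the standard correction for the $1/n$ bias of the exponential-rate estimator. $C_V$ is computed on the excess magnitudes $m - m_c$, the reading under which an exponential (GR) sample gives $C_V = 1$. The threshold scan `estimate_mc` and all helper names are hypothetical: the scan simply picks the lowest threshold whose $C_V$ lies within a tolerance of 1, and does not reproduce the paper's procedure based on how $b$ depends on $n$.

```python
import math
import random
import statistics


def b_value(mags, m_c):
    """Aki-style ML b value for magnitudes >= m_c, times (n-1)/n to remove the 1/n bias."""
    sample = [m for m in mags if m >= m_c]
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two events above m_c")
    b_mle = 1.0 / (math.log(10) * (statistics.mean(sample) - m_c))
    return b_mle * (n - 1) / n


def variability_coefficient(mags, m_c):
    """C_V = std / mean of the excess magnitudes m - m_c (assumed definition)."""
    excess = [m - m_c for m in mags if m >= m_c]
    return statistics.stdev(excess) / statistics.mean(excess)


def estimate_mc(mags, start, stop, step, tol=0.1, min_events=50):
    """Hypothetical scan: lowest threshold in [start, stop] whose C_V is within tol of 1."""
    k = 0
    while True:
        m_c = round(start + k * step, 10)  # avoid accumulating float error
        if m_c > stop:
            return None
        if sum(1 for m in mags if m >= m_c) >= min_events:
            if abs(variability_coefficient(mags, m_c) - 1.0) < tol:
                return m_c
        k += 1


def synthetic_gr(n, b, m_min, seed=0):
    """Complete GR catalog: m - m_min is exponential with rate b * ln(10)."""
    rng = random.Random(seed)
    beta = b * math.log(10)
    return [m_min + rng.expovariate(beta) for _ in range(n)]


complete = synthetic_gr(50_000, b=1.0, m_min=0.0)
# Crude model of incompleteness: every event below magnitude 1.0 is missing.
incomplete = [m for m in complete if m >= 1.0]

print(b_value(complete, 0.0), variability_coefficient(complete, 0.0))
print(estimate_mc(complete, 0.0, 2.0, 0.1), estimate_mc(incomplete, 0.0, 2.0, 0.1))
```

On the complete synthetic catalog the scan should stop at the first threshold, whereas in the truncated catalog every threshold below 1.0 shifts the excess magnitudes away from zero, which raises their mean but not their spread and so pushes $C_V$ well below 1 until the threshold reaches 1.0.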
CD8+ T Cells: GITR Matters
Like many members of the tumor necrosis factor receptor superfamily, glucocorticoid-induced TNFR-related gene (GITR) plays multiple roles, mostly in cells of the immune system. CD8+ T cells are key players in the immunity against viruses and tumors, and GITR has been shown to be an essential molecule for these cells to mount an immune response. The aim of this paper is to focus on GITR function in CD8+ cells, paying particular attention to numerous recent studies that suggest its crucial role in mouse disease models.
FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
Background
Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reason, namely the abundance of data being produced, Big Data technologies are seen as the future of genomic data storage and processing, with MapReduce-Hadoop as the leading framework. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, deploying them there is not straightforward. This state of the art is problematic.
Results
We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy a specialized FASTA/Q compressor within MapReduce-Hadoop for processing files stored on the Hadoop Distributed File System, with very little knowledge of Hadoop. Practically, we provide evidence that deploying those specialized compressors within Hadoop, which was not possible so far, yields better space savings, and even better execution times over compressed data, compared with the generic compressors available in Hadoop, in particular for FASTQ files. Finally, we observe that these results also hold for the Apache Spark framework when it is used to process FASTA/Q files stored on the Hadoop Distributed File System.
Conclusions
Our methods and the corresponding software substantially contribute to achieving space and time savings for the storage and processing of FASTA/Q files in Hadoop and Spark. Since our approach is general, it is very likely that it can also be applied to FASTA/Q compression methods that appear in the future.
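One generic way to let a record-oriented compressor work on data that a framework such as Hadoop or Spark splits across workers is to compress the file in independent blocks aligned to record boundaries, so that each block can be decompressed on its own. The Python sketch below illustrates only that idea, under stated assumptions: `gzip` stands in for a specialized FASTA/Q compressor, FASTQ records are assumed to be unwrapped four-line records, and the helper names are hypothetical. It is not meant to reproduce either of the two methods proposed in the paper.

```python
import gzip


def fastq_records(text):
    """Split FASTQ text into records, assuming exactly four unwrapped lines each."""
    lines = text.rstrip("\n").split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    return ["\n".join(lines[i:i + 4]) + "\n" for i in range(0, len(lines), 4)]


def compress_in_blocks(text, records_per_block):
    """Compress record-aligned blocks independently (gzip is a stand-in codec)."""
    records = fastq_records(text)
    return [
        gzip.compress("".join(records[i:i + records_per_block]).encode("ascii"))
        for i in range(0, len(records), records_per_block)
    ]


def decompress_block(block):
    """Any single block can be decoded without reading its neighbors."""
    return gzip.decompress(block).decode("ascii")


sample = "".join(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(5))
blocks = compress_in_blocks(sample, records_per_block=2)
print(len(blocks), "".join(decompress_block(b) for b in blocks) == sample)
```

Because every block starts at a record header, a worker can process any block in isolation; the block size then becomes a trade-off between compression ratio, which favors large blocks, and parallelism, which favors many small ones.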
Metabolic syndrome-breast cancer link varies by intrinsic molecular subtype
Metabolic syndrome (MS) has been shown to increase the risk of breast cancer. Existing data suggest that the strength of the metabolic syndrome-breast cancer link varies by intrinsic molecular subtype, but results from the worldwide literature are controversial. The primary endpoint of the study was to assess whether MS is a predictor of a specific breast cancer (BC) subtype. The secondary endpoint was to determine whether individual components of MS can increase the risk of a specific breast cancer subtype.
Scaffolds in Tendon Tissue Engineering
Tissue engineering techniques using novel scaffold materials offer potential alternatives for managing tendon disorders. Tissue engineering strategies to improve tendon healing include the use of scaffolds, growth factors, cell seeding, or a combination of these approaches. Scaffolds have been the most commonly investigated strategy to date. Available scaffolds for tendon repair include both biological scaffolds, obtained from mammalian tissues, and synthetic scaffolds, manufactured from chemical compounds. Preliminary studies support the idea that scaffolds can provide an alternative for tendon augmentation with enormous therapeutic potential. However, the available data are insufficient to allow definitive conclusions on the use of scaffolds for tendon augmentation. We review the current basic science and clinical understanding in the field of scaffolds and tissue engineering for tendon repair.
Growth Factors and Anticatabolic Substances for Prevention and Management of Intervertebral Disc Degeneration
Intervertebral disc (IVD) degeneration is common, appearing from the second decade of life and progressing with age. Conservative management often fails, and patients with IVD degeneration may need surgical intervention. Several treatment strategies have been proposed, although only surgical discectomy and arthrodesis have been proven to be predictably effective. Biological strategies aim to prevent and manage IVD degeneration by improving the function and the anabolic and reparative capabilities of nucleus pulposus and annulus fibrosus cells, and by inhibiting matrix degradation. At present, clinical applications are still in their infancy. Further studies are required to clarify the role of growth factors and anticatabolic substances in the prevention and management of intervertebral disc degeneration.
- …