Simulating multiple faceted variability in single cell RNA sequencing.
The abundance of new computational methods for processing and interpreting transcriptomes at a single cell level raises the need for in silico platforms for evaluation and validation. Here, we present SymSim, a simulator that explicitly models the processes that give rise to data observed in single cell RNA-Seq experiments. The components of the SymSim pipeline pertain to the three primary sources of variation in single cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states (both discrete and continuous), and technical variation due to low sensitivity and measurement noise and bias. We demonstrate how SymSim can be used for benchmarking methods for clustering, differential expression and trajectory inference, and for examining the effects of various parameters on their performance. We also show how SymSim can be used to evaluate the number of cells required to detect a rare population under various scenarios
Genomic regions, cellular components and gene regulatory basis underlying pod length variations in cowpea (V. unguiculata L. Walp).
Cowpea (V. unguiculata L. Walp) is a climate resilient legume crop important for food security. Cultivated cowpea (V. unguiculata L) generally comprises the bushy, short-podded grain cowpea dominant in Africa and the climbing, long-podded vegetable cowpea popular in Asia. How selection has contributed to the diversification of the two types of cowpea remains largely unknown. In the current study, a novel genotyping assay for over 50 000 SNPs was employed to delineate genomic regions governing pod length. Major, minor and epistatic QTLs were identified through QTL mapping. Seventy-two SNPs associated with pod length were detected by genome-wide association studies (GWAS). Population stratification analysis revealed subdivision among a cowpea germplasm collection consisting of 299 accessions, which is consistent with pod length groups. Genomic scan for selective signals suggested that domestication of vegetable cowpea was accompanied by selection of multiple traits including pod length, while the further improvement process was featured by selection of pod length primarily. Pod growth kinetics assay demonstrated that more durable cell proliferation rather than cell elongation or enlargement was the main reason for longer pods. Transcriptomic analysis suggested the involvement of sugar, gibberellin and nutritional signalling in regulation of pod length. This study establishes the basis for map-based cloning of pod length genes in cowpea and for marker-assisted selection of this trait in breeding programmes
Improved linkage analysis of Quantitative Trait Loci using bulk segregants unveils a novel determinant of high ethanol tolerance in yeast
Background: Bulk segregant analysis (BSA) coupled to high throughput sequencing is a powerful method to map genomic regions related to phenotypes of interest. It relies on crossing two parents, one inferior and one superior for a trait of interest. Segregants displaying the trait of the superior parent are pooled, and their DNA is extracted and sequenced. Genomic regions linked to the trait of interest are identified by searching the pool for overrepresented alleles that normally originate from the superior parent. BSA data analysis is non-trivial due to sequencing, alignment and screening errors.
Results: To increase the power of the BSA technology and obtain a better distinction between spuriously and truly linked regions, we developed EXPLoRA (EXtraction of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that explicitly models the dependency between neighboring marker sites by exploiting the properties of linkage disequilibrium through a Hidden Markov Model (HMM). Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed reliably identifying QTLs linked to this phenotype that could not be identified with statistical significance in the original study. Experimental validation of one of the least pronounced linked regions, by identifying its causative gene VPS70, confirmed the potential of our method.
Conclusions: EXPLoRA performs at least as well as the state of the art and is robust even at low signal-to-noise ratios, i.e. when the true linkage signal is diluted by sampling or screening errors, or when few segregants are available.
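The idea of searching a pool for overrepresented superior-parent alleles, and of borrowing strength from neighboring markers, can be illustrated with a minimal sketch. This is not EXPLoRA: the abstract describes a Hidden Markov Model, whereas the sketch below swaps in a plain moving average purely for illustration. The function names, the read counts, the window size and the 0.75 threshold are all hypothetical.

```python
def allele_frequencies(superior_reads, total_reads):
    """Fraction of reads carrying the superior-parent allele at each marker.

    Markers with no coverage are set to 0.5 (assumed uninformative).
    """
    return [s / t if t > 0 else 0.5 for s, t in zip(superior_reads, total_reads)]


def moving_average(values, window=3):
    """Smooth values over neighboring markers (a stand-in for the HMM)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical read counts at seven consecutive marker sites.
superior = [10, 11, 18, 19, 20, 12, 9]
total = [20, 20, 20, 20, 20, 20, 20]

freqs = allele_frequencies(superior, total)
smoothed = moving_average(freqs, window=3)

# Markers whose smoothed frequency exceeds an illustrative threshold.
candidates = [i for i, f in enumerate(smoothed) if f > 0.75]
print(candidates)
```

In an unlinked region the superior-parent allele is expected at a frequency near 0.5 in the pool, so a sustained run of markers well above that value is what suggests linkage.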
Assessing the Impact of Game Day Schedule and Opponents on Travel Patterns and Route Choice using Big Data Analytics
The transportation system is crucial for transferring people and goods from point A to point B. However, its reliability can be decreased by unanticipated congestion resulting from planned special events. For example, sporting events collect large crowds of people at specific venues on game days and disrupt normal traffic patterns.
The goal of this study was to understand issues related to road traffic management during major sporting events by using widely available INRIX data to compare travel patterns and behaviors on game days against those on normal days. A comprehensive analysis was conducted on the impact of all Nebraska Cornhuskers football games over five years on traffic congestion on five major routes in Nebraska. We attempted to identify hotspots, the unusually high-risk zones in a spatiotemporal space containing traffic congestion that occur on almost all game days. For hotspot detection, we utilized a method called Multi-EigenSpot, which is able to detect multiple hotspots in a spatiotemporal space. With this algorithm, we were able to detect traffic hotspot clusters on the five chosen routes in Nebraska. After detecting the hotspots, we identified the factors affecting the sizes of hotspots and other parameters. The start time of the game and the Cornhuskers’ opponent for a given game are two important factors affecting the number of people coming to Lincoln, Nebraska, on game days. Finally, the Dynamic Bayesian Networks (DBN) approach was applied to forecast the start times and locations of hotspot clusters in 2018 with a weighted mean absolute percentage error (WMAPE) of 13.8%
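The abstract does not state how the weighted mean absolute percentage error was weighted. A minimal sketch, assuming the common form in which the summed absolute errors are divided by the summed absolute actual values, is shown below; the function name and the sample numbers are hypothetical.

```python
def wmape(actual, forecast):
    """Weighted mean absolute percentage error, as a fraction.

    Assumed definition: sum(|actual - forecast|) / sum(|actual|).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("sum of absolute actual values must be non-zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / total_actual


# Hypothetical observed and forecast values.
observed = [100, 200, 300]
predicted = [110, 190, 330]
print(f"WMAPE = {wmape(observed, predicted):.1%}")
```

Under this definition, errors on large observations count for more than errors on small ones, which avoids the instability of a plain MAPE when some actual values are near zero.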
The circadian clock rephases during lateral root organ initiation in Arabidopsis thaliana
The endogenous circadian clock enables organisms to adapt their growth and development to environmental changes. Here we describe how the circadian clock is employed to coordinate responses to the key signal auxin during lateral root (LR) emergence. In the model plant, Arabidopsis thaliana, LRs originate from a group of stem cells deep within the root, necessitating that new organs emerge through overlying root tissues. We report that the circadian clock is rephased during LR development. Metabolite and transcript profiling revealed that the circadian clock controls the levels of auxin and auxin-related genes including the auxin response repressor IAA14 and auxin oxidase AtDAO2. Plants lacking or overexpressing core clock components exhibit LR emergence defects. We conclude that the circadian clock acts to gate auxin signalling during LR development to facilitate organ emergence
Diversification of myco-heterotrophic angiosperms: evidence from Burmanniaceae.
Background - Myco-heterotrophy evolved independently several times during angiosperm evolution. Although many species of myco-heterotrophic plants are highly endemic and long-distance dispersal seems unlikely, some genera are widely dispersed and have pantropical distributions, often with large disjunctions. Traditionally this has been interpreted as evidence for an old age of these taxa. However, due to their scarcity and highly reduced plastid genomes, our understanding of the evolutionary histories of the angiosperm myco-heterotrophic groups is poor. Results - We provide a hypothesis for the diversification of the myco-heterotrophic family Burmanniaceae. Phylogenetic inference, combined with biogeographical analyses, molecular divergence time estimates, and diversification analyses, suggests that Burmanniaceae originated in West Gondwana and started to diversify during the Late Cretaceous. Diversification and migration of the species-rich pantropical genera Burmannia and Gymnosiphon display congruent patterns. Diversification began during the Eocene, when global temperatures peaked and tropical forests occurred at low latitudes. Simultaneous migration from the New to the Old World in Burmannia and Gymnosiphon occurred via boreotropical migration routes. Subsequent Oligocene cooling and breakup of boreotropical flora ended New-Old World migration and caused a gradual decrease in diversification rate in Burmanniaceae. Conclusion - Our results indicate that the extant diversity and pantropical distribution of myco-heterotrophic Burmanniaceae are the result of diversification and boreotropical migration during the Eocene, when tropical rain forest expanded dramatically.
Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data
Image data are increasingly encountered and are of growing importance in many
areas of science. Much of these data are quantitative image data, which are
characterized by intensities that represent some measurement of interest in the
scanned images. The data typically consist of multiple images on the same
domain and the goal of the research is to combine the quantitative information
across images to make inference about populations or interventions. In this
paper we present a unified analysis framework for the analysis of quantitative
image data using a Bayesian functional mixed model approach. This framework is
flexible enough to handle complex, irregular images with many local features,
and can model the simultaneous effects of multiple factors on the image
intensities and account for the correlation between images induced by the
design. We introduce a general isomorphic modeling approach to fitting the
functional mixed model, of which the wavelet-based functional mixed model is
one special case. With suitable modeling choices, this approach leads to
efficient calculations and can result in flexible modeling and adaptive
smoothing of the salient features in the data. The proposed method has the
following advantages: it can be run automatically, it produces inferential
plots indicating which regions of the image are associated with each factor, it
simultaneously considers the practical and statistical significance of
findings, and it controls the false discovery rate. Comment: Published at
http://dx.doi.org/10.1214/10-AOAS407 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
The Pawnee earthquake as a result of the interplay among injection, faults and foreshocks
The Pawnee M5.8 earthquake is the largest event in Oklahoma's instrumentally recorded history. It occurred near the edge of active seismic zones, similar to other M5+ earthquakes since 2011. It ruptured a previously unmapped fault and triggered aftershocks along a complex conjugate fault system. With a high-resolution earthquake catalog, we observe propagating foreshocks leading to the mainshock within 0.5 km distance, suggesting the existence of precursory aseismic slip. At approximately 100 days before the mainshock, two M ≥ 3.5 earthquakes occurred along a mapped fault that is conjugate to the mainshock fault. At about 40 days before, two earthquake clusters started, with one M3 earthquake occurring two days before the mainshock. The three M ≥ 3 foreshocks all produced positive Coulomb stress at the mainshock hypocenter. These foreshock activities within the conjugate fault system respond near-instantaneously to variations in injection rates at 95% confidence. The short time delay between injection and seismicity differs from both the hypothetical expected time scale of a diffusion process and the long time delay observed in this region prior to 2016, suggesting a possible role of elastic stress transfer and a critical stress state of the fault. Our results suggest that the Pawnee earthquake is a result of the interplay among injection, tectonic faults, and foreshocks.
Detecting microcalcification clusters in digital mammograms: Study for inclusion into computer aided diagnostic prompting system
Among the signs of breast cancer encountered in digital mammograms, radiologists point to microcalcification clusters (MCCs). Their detection is a challenging problem from both the medical and the image processing points of view. This work presents two concurrent methods for MCC detection and studies their possible inclusion in a computer aided diagnostic prompting system. The first uses a Wavelet Domain Hidden Markov Tree (WHMT) to model microcalcification edges. The model is used to differentiate between MC and non-MC edges based on weighted maximum likelihood (WML) values. The classification of objects is carried out using spatial filters. The second method employs the SUSAN edge detector in the spatial domain for mammogram segmentation. Classification of objects as calcifications is carried out using another set of spatial filters and a Feedforward Neural Network (NN). The same distance filter is employed in both methods to find true clusters. The analysis of the two methods is performed on 54 image regions from mammograms selected randomly from the DDSM database, including benign and cancerous cases as well as cases that can be classified as hard from both the radiologists' and the computer's perspectives. WHMT/WML is able to detect 98.15% of true positive (TP) MCCs with 1.85% false positives (FP), whereas the SUSAN/NN method achieves 94.44% TP at the cost of 1.85% FP. The comparison of these two methods suggests WHMT/WML for computer aided diagnostic prompting. It also confirms the low false positive rates of both methods, meaning fewer biopsy tests per patient.
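The reported percentages can be related to the 54 image regions with a short arithmetic check. The abstract does not give raw counts or say that the rates were computed over the 54 regions, so the counts below (53, 51 and 1) are inferred assumptions that merely happen to reproduce the stated figures; the function name is hypothetical.

```python
def rate_percent(count, total):
    """Return count / total as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


TOTAL_REGIONS = 54  # stated in the abstract

# Assumed counts, chosen because they match the reported percentages.
whmt_tp = rate_percent(53, TOTAL_REGIONS)   # compare with the reported 98.15%
susan_tp = rate_percent(51, TOTAL_REGIONS)  # compare with the reported 94.44%
fp = rate_percent(1, TOTAL_REGIONS)         # compare with the reported 1.85%

print(whmt_tp, susan_tp, fp)
```

If this reading is right, the two methods differ by two detected regions out of 54, which is worth keeping in mind when interpreting the gap between the TP rates.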