30 research outputs found
Learning curves for Gaussian process regression: Approximations and bounds
We consider the problem of calculating learning curves (i.e., average
generalization performance) of Gaussian processes used for regression. On the
basis of a simple expression for the generalization error, in terms of the
eigenvalue decomposition of the covariance function, we derive a number of
approximation schemes. We identify where these become exact, and compare with
existing bounds on learning curves; the new approximations, which can be used
for any input space dimension, generally get substantially closer to the truth.
We also study possible improvements to our approximations. Finally, we use a
simple exactly solvable learning scenario to show that there are limits of
principle on the quality of approximations and bounds expressible solely in
terms of the eigenvalue spectrum of the covariance function. Comment: 25 pages, 10 figures
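As a rough illustration of what a learning curve means here, the following Python sketch estimates one by Monte Carlo rather than by any of the approximation schemes in the abstract above. It relies on the standard fact that, when the prior matches the data-generating process, the Bayes generalization error at a test input equals the GP posterior variance there. The squared-exponential kernel, the uniform inputs on [0, 1], the noise level, and the names `rbf_kernel` and `learning_curve` are all assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential covariance with unit prior variance (assumed kernel).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def learning_curve(ns, noise_var=0.05, n_test=200, n_trials=50, seed=0):
    # Average posterior variance over random training sets and test inputs,
    # for each training set size n in ns.
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(0.0, 1.0, n_test)
    curve = []
    for n in ns:
        errs = []
        for _ in range(n_trials):
            x = rng.uniform(0.0, 1.0, n)
            K = rbf_kernel(x, x) + noise_var * np.eye(n)
            k_star = rbf_kernel(x, x_test)      # shape (n, n_test)
            sol = np.linalg.solve(K, k_star)    # (K + noise)^-1 k_star
            # Posterior variance: k(x*, x*) - k_star^T (K + noise)^-1 k_star
            post_var = 1.0 - np.sum(k_star * sol, axis=0)
            errs.append(post_var.mean())
        curve.append(float(np.mean(errs)))
    return curve

sizes = [1, 2, 4, 8, 16, 32]
curve = learning_curve(sizes)
for n, err in zip(sizes, curve):
    print(f"n = {n:2d}  estimated generalization error = {err:.4f}")
```

The printed values should fall from near the prior variance toward a small noise-limited level as n grows. A brute-force estimate like this is the kind of reference curve that approximations and bounds based on the eigenvalue spectrum are compared against.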
Characterizing the morbid genome of ciliopathies
Background: Ciliopathies are clinically diverse disorders of the primary cilium. Remarkable progress has been made in understanding the molecular basis of these genetically heterogeneous conditions; however, our knowledge of their morbid genome, pleiotropy, and variable expressivity remains incomplete. Results: We applied genomic approaches on a large patient cohort of 371 affected individuals from 265 families, with phenotypes that span the entire ciliopathy spectrum. Likely causal mutations in previously described ciliopathy genes were identified in 85% (225/265) of the families, adding 32 novel alleles. Consistent with a fully penetrant model for these genes, we found no significant difference in their "mutation load" beyond the causal variants between our ciliopathy cohort and a control non-ciliopathy cohort. Genomic analysis of our cohort further identified mutations in a novel morbid gene TXNDC15, encoding a thiol isomerase, based on independent loss of function mutations in individuals with a consistent ciliopathy phenotype (Meckel-Gruber syndrome) and a functional effect of its deficiency on ciliary signaling. Our study also highlighted seven novel candidate genes (TRAPPC3, EXOC3L2, FAM98C, C17orf61, LRRCC1, NEK4, and CELSR2), some of which have established links to ciliogenesis. Finally, we show that the morbid genome of ciliopathies encompasses many founder mutations, the combined carrier frequency of which accounts for a high disease burden in the study population. Conclusions: Our study increases our understanding of the morbid genome of ciliopathies. We also provide the strongest evidence, to date, in support of the classical Mendelian inheritance of Bardet-Biedl syndrome and other ciliopathies.
Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery
The Greater Middle East (GME) has been a central hub of human migration and population admixture. The tradition of consanguinity, variably practiced in the Persian Gulf region, North Africa, and Central Asia [1-3], has resulted in an elevated burden of recessive disease [4]. Here we generated a whole-exome GME variome from 1,111 unrelated subjects. We detected substantial diversity and admixture in continental and subregional populations, corresponding to several ancient founder populations with little evidence of bottlenecks. Measured consanguinity rates were an order of magnitude above those in other sampled populations, and the GME population exhibited an increased burden of runs of homozygosity (ROHs) but showed no evidence for reduced burden of deleterious variation due to classically theorized "genetic purging". Applying this database to unsolved recessive conditions in the GME population reduced the number of potential disease-causing variants by four- to sevenfold. These results show variegated genetic architecture in GME populations and support future human genetic discoveries in Mendelian and population genetics.
PromoSer: improvements to the algorithm, visualization and accessibility
PromoSer is a web service that provides an easy and efficient approach to the batch retrieval of a large number of proximal promoters. Since its introduction last year, it has undergone continued development and expansion. At the core, there have been improvements in the filtering of the raw mRNA/EST sequences upon which all predictions are built, improvements in the alignments clustering and transcription start site prediction algorithms, and improvements in the backing database for increased performance. At the user interface level, improvements include enhanced functionality and user options, better integration with other resources on the web and a new visualization tool. PromoSer now also supports queries using a SOAP-based interface and XML-based responses. The service is publicly available a
SeqVISTA: a new module of integrated computational tools for studying transcriptional regulation
Transcriptional regulation is one of the most basic regulatory mechanisms in the cell. The accumulation of multiple metazoan genome sequences and the advent of high-throughput experimental techniques have motivated the development of a large number of bioinformatics methods for the detection of regulatory motifs. The regulatory process is extremely complex and individual computational algorithms typically have very limited success in genome-scale studies. Here, we argue the importance of integrating multiple computational algorithms and present an infrastructure that integrates eight web services covering key areas of transcriptional regulation. We have adopted the client-side integration technology and built a consistent input and output environment with a versatile visualization tool named SeqVISTA. The infrastructure will allow for easy integration of gene regulation analysis software that is scattered over the Internet. It will also enable bench biologists to perform an arsenal of analyses using cutting-edge methods in a familiar environment and bioinformatics researchers to focus on developing new algorithms without the need to invest substantial effort on complex pre- or post-processors. SeqVISTA is freely available to academic users and can be launched online at http://zlab.bu.edu/SeqVISTA/web.jnlp, provided that Java Web Start has been installed. In addition, a stand-alone version of the program can be downloaded and run locally. It can be obtained at http://zlab.bu.edu/SeqVISTA.