
    Predicting multiplex subcellular localization of proteins using protein-protein interaction network: a comparative study

    Background: Proteins that interact in vivo tend to reside within the same or "adjacent" subcellular compartments. This observation provides opportunities to reveal protein subcellular localization in the context of the protein-protein interaction (PPI) network. However, so far, only a few efforts based on heuristic rules have been made in this regard.
    Results: We systematically and quantitatively validate the hypothesis that proteins physically interacting with each other probably share at least one common subcellular localization. Building on this result, we introduce, for the first time, four graph-based semi-supervised learning algorithms (Majority, χ²-score, GenMultiCut and FunFlow), originally proposed for protein function prediction, to assign "multiplex localization" to proteins. We analyze these approaches by performing a large-scale cross validation on a Saccharomyces cerevisiae proteome compiled from BioGRID and comparing their predictions for 22 protein subcellular localizations. Furthermore, we build an ensemble classifier to associate 529 unlabeled and 137 ambiguously-annotated proteins with subcellular localizations, most of which have been verified in previous experimental studies.
    Conclusions: Physical interaction of proteins provides an essential clue for their co-localization. Compared to the local approaches, the global algorithms consistently achieve superior performance.
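
The neighbor-voting idea behind a "Majority"-style local approach can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_localizations`, the `top_k` parameter, and the toy protein and compartment names are all hypothetical.

```python
from collections import Counter


def majority_localizations(protein, interactions, annotations, top_k=1):
    """Return up to top_k localizations most common among labeled neighbors.

    interactions: dict mapping a protein to a list of its interaction partners.
    annotations: dict mapping a protein to the set of its known localizations.
    Both structures are hypothetical stand-ins for a real PPI network.
    """
    votes = Counter()
    for neighbor in interactions.get(protein, ()):
        # Each labeled neighbor contributes one vote per localization it has;
        # unlabeled neighbors contribute nothing.
        votes.update(annotations.get(neighbor, ()))
    return [loc for loc, _ in votes.most_common(top_k)]


# Toy data (hypothetical): P1 is unlabeled and has three labeled partners.
toy_interactions = {"P1": ["A", "B", "C"]}
toy_annotations = {
    "A": {"nucleus"},
    "B": {"nucleus", "cytoplasm"},
    "C": {"nucleus", "cytoplasm"},
}
predicted = majority_localizations("P1", toy_interactions, toy_annotations, top_k=2)
```

Allowing `top_k` greater than one is one simple way to return several compartments per protein, in the spirit of "multiplex localization"; the global algorithms named in the abstract use information beyond direct neighbors and are not covered by this sketch.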

    Identification of novel DNA repair proteins via primary sequence, secondary structure, and homology

    Background: DNA repair is the general term for the collection of critical mechanisms which repair many forms of DNA damage such as methylation or ionizing radiation. DNA repair has mainly been studied in experimental and clinical situations, and relatively few information-based approaches to extracting new DNA repair knowledge exist. As a first step, automatic detection of DNA repair proteins in genomes via informatics techniques is desirable; however, there are many forms of DNA repair and it is not a straightforward process to identify and classify repair proteins with a single optimal method. We perform a study of the ability of homology and machine learning-based methods to identify and classify DNA repair proteins, as well as scan vertebrate genomes for the presence of novel repair proteins. Combinations of primary sequence polypeptide frequency, secondary structure, and homology information are used as feature information for input to a Support Vector Machine (SVM).
    Results: We find that SVM techniques are capable of identifying portions of DNA repair protein datasets without admitting false positives; at low levels of false positive tolerance, homology can also identify and classify proteins with good performance. Secondary structure information provides improved performance compared to using primary structure alone. Furthermore, we observe that machine learning methods incorporating homology information perform best when data is filtered by some clustering technique. Applying these methodologies to the scanning of multiple vertebrate genomes confirms a positive correlation between the size of a genome and the number of DNA repair protein transcripts it is likely to contain, and simultaneously suggests that all organisms have a non-zero minimum number of repair genes. In addition, the scan result clusters several organisms' repair abilities in an evolutionarily consistent fashion. Analysis also identifies several functionally unconfirmed proteins that are highly likely to be involved in the repair process. A new web service, INTREPED, has been made available for the immediate search and annotation of DNA repair proteins in newly sequenced genomes.
    Conclusion: Despite complexity due to a multitude of repair pathways, combinations of sequence, structure, and homology with Support Vector Machines offer good methods in addition to existing homology searches for DNA repair protein identification and functional annotation. Most importantly, this study has uncovered relationships between the size of a genome and a genome's available repair repertoire, and offers a number of new predictions as well as a prediction service, both of which reduce the search time and cost for novel repair genes and proteins.

    Underdiagnosis and referral bias of autism in ethnic minorities

    This study examined (1) the distribution of ethnic minorities among children referred to autism institutions and (2) referral bias in pediatric assessment of autism in ethnic minorities. It showed that compared to the known community prevalence, ethnic minorities were under-represented among 712 children referred to autism institutions. In addition, pediatricians (n = 81) more often referred to autism when judging clinical vignettes of European majority cases (Dutch) than vignettes including non-European minority cases (Moroccan or Turkish). However, when asked explicitly for ratings of the probability of autism, the effect of ethnic background on autism diagnosis disappeared. We conclude that the use of structured ratings may decrease the likelihood of ethnic bias in diagnostic decisions of autism.

    Azimuthal anisotropy and correlations at large transverse momenta in $p+p$ and Au+Au collisions at $\sqrt{s_{_{NN}}} = 200$ GeV

    Results on high transverse momentum charged particle emission with respect to the reaction plane are presented for Au+Au collisions at $\sqrt{s_{_{NN}}} = 200$ GeV. Two- and four-particle correlation results are presented as well as a comparison of azimuthal correlations in Au+Au collisions to those in $p+p$ at the same energy. Elliptic anisotropy, $v_2$, is found to reach its maximum at $p_t \sim 3$ GeV/c, then decrease slowly and remain significant up to $p_t \approx 7$-10 GeV/c. Stronger suppression is found in the back-to-back high-$p_t$ particle correlations for particles emitted out-of-plane compared to those emitted in-plane. The centrality dependence of $v_2$ at intermediate $p_t$ is compared to simple models based on jet quenching. Comment: 4 figures. Published version as PRL 93, 252301 (2004).

    Azimuthal anisotropy in Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV

    The results from the STAR Collaboration on directed flow ($v_1$), elliptic flow ($v_2$), and the fourth harmonic ($v_4$) in the anisotropic azimuthal distribution of particles from Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV are summarized and compared with results from other experiments and theoretical models. Results for identified particles are presented and fit with a Blast Wave model. Different anisotropic flow analysis methods are compared and nonflow effects are extracted from the data. For $v_2$, scaling with the number of constituent quarks and parton coalescence is discussed. For $v_4$, scaling with $v_2^2$ and quark coalescence is discussed. Comment: 26 pages. As accepted by Phys. Rev. C. Text rearranged, figures modified, but data the same. However, in Fig. 35 the hydro calculations are corrected in this version. The data tables are available at http://www.star.bnl.gov/central/publications/ by searching for "flow" and then this paper.

    Effective Rheology of Bubbles Moving in a Capillary Tube

    We calculate the average volumetric flux versus pressure drop of bubbles moving in a single capillary tube with varying diameter, finding a square-root relation by mapping the flow equations onto those of a driven overdamped pendulum. The calculation is based on a derivation of the equation of motion of a bubble train from the capillary forces and the entropy production associated with the viscous flow. We also calculate the configurational probability of the positions of the bubbles. Comment: 4 pages, 1 figure.
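
The origin of a square-root law in a driven overdamped pendulum can be illustrated with the textbook dimensionless form of that system. This is a sketch under an assumed normalization, not the paper's derivation: the symbols $\theta$ (phase), $F$ (drive, with threshold at $F = 1$) and $T$ (time to advance the phase by $2\pi$) are illustrative and not taken from the paper.

```latex
\begin{align}
  \dot{\theta} &= F - \sin\theta, \qquad F > 1, \\
  T &= \int_0^{2\pi} \frac{d\theta}{F - \sin\theta} = \frac{2\pi}{\sqrt{F^2 - 1}}, \\
  \langle \dot{\theta} \rangle &= \frac{2\pi}{T} = \sqrt{F^2 - 1}
    \approx \sqrt{2\,(F - 1)} \quad \text{as } F \to 1^{+}.
\end{align}
```

If the mean phase velocity is identified with the average flux and the drive with the pressure drop, one would expect the flux to grow as the square root of the excess pressure drop just above threshold.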

    Socioeconomic Inequality in the Prevalence of Autism Spectrum Disorder: Evidence from a U.S. Cross-Sectional Study

    This study was designed to evaluate the hypothesis that the prevalence of autism spectrum disorder (ASD) among children in the United States is positively associated with socioeconomic status (SES). A cross-sectional study was implemented with data from the Autism and Developmental Disabilities Monitoring Network, a multiple source surveillance system that incorporates data from educational and health care sources to determine the number of 8-year-old children with ASD among defined populations. For the years 2002 and 2004, there were 3,680 children with ASD among a population of 557,689 8-year-old children. Area-level census SES indicators were used to compute ASD prevalence by SES tertiles of the population. Prevalence increased with increasing SES in a dose-response manner, with prevalence ratios relative to medium SES of 0.70 (95% confidence interval [CI] 0.64, 0.76) for low SES and 1.25 (95% CI 1.16, 1.35) for high SES (P<0.001). Significant SES gradients were observed for children with and without a pre-existing ASD diagnosis, and in analyses stratified by gender, race/ethnicity, and surveillance data source. The SES gradient was significantly stronger in children with a pre-existing diagnosis than in those meeting criteria for ASD but with no previous record of an ASD diagnosis (p<0.001), and was not present in children with co-occurring ASD and intellectual disability. The stronger SES gradient in ASD prevalence in children with versus without a pre-existing ASD diagnosis points to potential ascertainment or diagnostic bias and to the possibility of SES disparity in access to services for children with autism. Further research is needed to confirm and understand the sources of this disparity so that policy implications can be drawn. Consideration should also be given to the possibility that there may be causal mechanisms or confounding factors associated with both high SES and vulnerability to ASD.

    Semi-supervised protein subcellular localization

    Background: Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Accurate predictions of the subcellular localizations of proteins can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data must be labeled for the prediction system to learn a classifier with good generalization ability. In real-world cases, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare instances of labeled data.
    Results: In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use far fewer labeled instances to construct a high quality prediction model. We first construct an initial classifier using a small set of labeled examples, and then use unlabeled instances to refine the classifier for future predictions.
    Conclusion: Experimental results show that our methods can effectively reduce the workload for labeling data by using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.
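
The two-step scheme described above (train on a small labeled set, then refine with unlabeled instances) resembles generic self-training, which can be sketched as follows. This is an illustration under loud assumptions, not the paper's method: a nearest-centroid classifier stands in for the SVM, and the function names, the `rounds` parameter, and the toy feature vectors are hypothetical.

```python
from math import dist


def centroids(points, labels):
    """Compute the mean feature vector of each class."""
    sums = {}
    for p, y in zip(points, labels):
        acc, n = sums.get(y, ([0.0] * len(p), 0))
        sums[y] = ([a + b for a, b in zip(acc, p)], n + 1)
    return {y: tuple(a / n for a in acc) for y, (acc, n) in sums.items()}


def predict(cents, p):
    """Assign p to the class with the nearest centroid."""
    return min(cents, key=lambda y: dist(cents[y], p))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=5):
    """Refine an initial classifier by pseudo-labeling unlabeled points."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(x, y)
        # Treat the pool point closest to any centroid as the most confident.
        best = min(pool, key=lambda p: min(dist(c, p) for c in cents.values()))
        x.append(best)
        y.append(predict(cents, best))
        pool.remove(best)
    return centroids(x, y)


# Toy data (hypothetical): one labeled example per class, three unlabeled.
model = self_train([(0, 0), (10, 10)], ["a", "b"], [(1, 1), (9, 9), (2, 2)])
```

Adding one pseudo-labeled point per round keeps the sketch easy to follow; practical self-training usually adds batches above a confidence threshold and retrains a stronger base classifier each round.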