45 research outputs found

    Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting

    Background: Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so they must be generated. For protein-protein interactions, as for other molecular interactions, treating all non-positive interactions as negative produces far more negative interactions than positive ones. Random selection from non-positive interactions is unsuitable, since the selected data may not reflect the original distribution of data. Results: We developed a bootstrapping algorithm for generating a negative data set of arbitrary size from protein-protein interaction data. We also developed an efficient boosting algorithm for finding interacting motif pairs in human and virus proteins. The boosting algorithm showed the best performance (84.4% sensitivity and 75.9% specificity) with balanced positive and negative data sets. The boosting algorithm was also used to find potential motif pairs in complexes of human and virus proteins, for which structural data was not used to train the algorithm. Interacting motif pairs common to multiple folds of structural data for the complexes were shown to be statistically significant. The data set for interactions between human and virus proteins was extracted from BOND and is available at http://virus.hpid.org/interactions.aspx. The complexes of human and virus proteins were extracted from PDB and their identifiers are available at http://virus.hpid.org/PDB_IDs.html. Conclusion: When the positive and negative training data sets are unbalanced, the predictions of the model tend to be biased. Bootstrapping is effective for generating a negative data set whose size and distribution are easily controlled. Our boosting algorithm, trained with the balanced data sets generated via the bootstrapping method, efficiently predicted interacting motif pairs from protein interaction and sequence data.
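
The sensitivity and specificity figures quoted in this abstract follow the standard confusion-matrix definitions. The sketch below is only an illustration of those definitions: the abstract does not give the underlying counts, so the counts used here are hypothetical values chosen to reproduce the reported rates on a balanced set of 1,000 positives and 1,000 negatives.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true interactions that are detected
    specificity = TN / (TN + FP): share of non-interactions correctly rejected
    """
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts (not from the paper): a balanced test set with
# 1,000 positive and 1,000 negative interactions.
sens, spec = sensitivity_specificity(tp=844, fn=156, tn=759, fp=241)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

With balanced classes, as in this toy case, neither rate is dominated by the larger class, which is the motivation the abstract gives for controlling the size of the negative set.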

    Efficient polygenic risk scores for biobank scale data by exploiting phenotypes from inferred relatives

    Polygenic risk scores are emerging as a potentially powerful tool to predict future phenotypes of target individuals, typically using unrelated individuals, thereby devaluing information from relatives. Here, for 50 traits from the UK Biobank data, we show that a design of 5,000 individuals with first-degree relatives of target individuals can achieve a prediction accuracy similar to that of around 220,000 unrelated individuals (mean prediction accuracy = 0.26 vs. 0.24, mean fold-change = 1.06 (95% CI: 0.99-1.13), P-value = 0.08), despite a 44-fold difference in sample size. For lifestyle traits, the prediction accuracy with 5,000 individuals including first-degree relatives of target individuals is significantly higher than that with 220,000 unrelated individuals (mean prediction accuracy = 0.22 vs. 0.16, mean fold-change = 1.40 (1.17-1.62), P-value = 0.025). Our findings suggest that polygenic prediction integrating family information may help to accelerate precision health and clinical intervention.

    Understanding Opinions towards Migrants in Transit: An Analysis of Tweets on Migrant Caravans in the US and Mexico

    The study of opinions towards migrants is profoundly important to understanding migration as well as politics. Previous research has contributed to understanding anti-immigrant attitudes using social media data. However, there is still a need for a better understanding of opinions towards migrants in transit. We study the case of Central American migrant caravans from 2018 to 2021 by looking at opinions in both the US, the destination country, and Mexico, the transit country. The media covered these events extensively, and an online debate about them started on social media. Our research aims to understand how migrant caravans are discussed online. We are particularly interested in how media salience and geographical variables are associated with the sentiment intensity of the opinions. We combine geolocated data from Twitter, GDELT (Global Database of Events, Language, and Tone), and survey and census data for the US and Mexico. We use topic modeling to find the latent topics within the online Twitter discussion, and VADER sentiment analysis to quantify the sentiment of tweets and calculate the sentiment intensity score that is used as the dependent variable of our OLS regression models. For both countries, we found that similar topics were discussed, with a more political discussion in the US. Our analysis of the sentiment score revealed that sentiment does not reflect stance adequately, which led us to analyze the sentiment intensity score (the absolute value of sentiment). We found that, for Mexico, when the media generated a higher number of news articles about migrant caravans, the sentiment intensity was higher. For the geographical variables, we found no significant association in the US; however, for Mexico, tweets in bordering states had a lower sentiment intensity. These results shed light on the differences in the determinants of sentiment intensity in opinions between the two countries.
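
The sentiment intensity score described in this abstract is the absolute value of a signed sentiment score, which is then used as the dependent variable of an OLS regression. The sketch below illustrates that idea under loud assumptions: the scores and news counts are invented, the function names are hypothetical, and the regression is reduced to a single predictor, whereas the study's models combine several variables.

```python
def sentiment_intensity(score):
    """Intensity of a signed sentiment score in [-1, 1]: its absolute value."""
    return abs(score)


def ols_fit(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data (not from the study): signed sentiment scores of four
# tweets and the number of news articles published on the same days.
scores = [-0.8, 0.6, -0.4, 0.2]
news_counts = [40, 30, 20, 10]

intensity = [sentiment_intensity(s) for s in scores]
slope, intercept = ols_fit(news_counts, intensity)
print(intensity, slope, intercept)
```

Taking the absolute value discards the sign, so a strongly negative and a strongly positive tweet count as equally intense; this matches the abstract's point that the signed score did not reflect stance adequately.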

    A Real-Time Database Testbed and Performance Evaluation

    Much real-time database (RTDB) research has addressed processing transactions in a timely fashion using fresh data that reflect the current real-world status. However, most existing RTDB work is based on simulations. Because no RTDB testbed is publicly available, it is very hard to evaluate real-time data management techniques in a realistic environment. To address the problem, we design and develop an initial version of an RTDB testbed, called RTDB2 (Real-Time Database Benchmark), atop an open source database [5]. We develop soft real-time database workloads that model online stock trades, providing several knobs to specify workloads for RTDB performance evaluation. In addition, we develop a QoS management scheme in RTDB2 that detects overload and, when it occurs, reduces the workload via admission control and adaptive temporal data updates. From extensive experiments using the stock trading workloads developed in RTDB2, we observe that adaptive updates can considerably improve transaction timeliness. We also observe that admission control enhances timeliness only under severe overload, possibly causing underutilization problems for moderate workloads.
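
To make the admission-control idea in this abstract concrete, here is a minimal sketch. It is not RTDB2's scheme, which the abstract does not describe in detail: the class name, the single capacity threshold, and the per-transaction cost estimate are all assumptions made for illustration.

```python
class AdmissionController:
    """Admit a transaction only if the estimated load stays within capacity.

    Hypothetical sketch: `capacity` and per-transaction `cost` are abstract
    utilization estimates, not parameters taken from RTDB2.
    """

    def __init__(self, capacity=1.0):
        self.capacity = capacity
        self.load = 0.0

    def try_admit(self, cost):
        """Return True and reserve `cost` if it fits; otherwise reject."""
        if self.load + cost > self.capacity:
            return False
        self.load += cost
        return True

    def finish(self, cost):
        """Release the load reserved by a completed transaction."""
        self.load = max(0.0, self.load - cost)


controller = AdmissionController(capacity=1.0)
first = controller.try_admit(0.6)    # fits within capacity
second = controller.try_admit(0.5)   # would exceed capacity, so rejected
controller.finish(0.6)               # first transaction completes
third = controller.try_admit(0.5)    # fits again
print(first, second, third)
```

A threshold set too conservatively rejects work the system could have completed on time, which is one way the underutilization noted in the abstract can arise.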

    A novel small molecule based on dithienophosphole oxide for bulk heterojunction solar cells without pre- or post-treatments

    A novel small molecule (PDTP-WR) with a dithieno[3,2-b:2′,3′-d]phosphole oxide (DTP) core unit was designed and synthesized for use in BHJ solar cells. This small molecule had an optical band gap of 1.65 eV, appropriate for a donor material, and HOMO and LUMO energy levels of −5.10 and −3.45 eV, respectively, which provided a broad absorption and superior charge transfer properties. The solar cell devices prepared using PDTP-WR and PC71BM gave a promising power conversion efficiency of 5.04%, a Jsc of 12.98 mA cm⁻², a Voc of 0.79 V, and a fill factor of 49%, without pre- or post-treatments. Morphological and structural studies revealed that PDTP-WR formed a favorable BHJ morphology. An appropriate domain size of less than 20 nm and bi-continuous interpenetrating paths with an ordered structure, obtained as a result of strong intermolecular interactions, support the device performance. This is the first reported use of DTP in a small-molecule donor, and the results of these studies indicate that DTP-based small molecules can be promising candidates for photovoltaic applications. © 2017 Elsevier Ltd
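
The device figures in this abstract can be cross-checked with the standard relation PCE = Jsc × Voc × FF / P_in. The sketch below uses the values quoted above (Jsc of 12.98 mA cm⁻², Voc of 0.79 V, fill factor of 49%); the incident power of 100 mW cm⁻² is an assumption (the usual AM 1.5G test condition), since the abstract does not state the illumination.

```python
def power_conversion_efficiency(jsc_ma_cm2, voc_v, fill_factor, p_in_mw_cm2=100.0):
    """Return PCE in percent: (Jsc * Voc * FF) / P_in * 100.

    Jsc in mA/cm^2 times Voc in V gives mW/cm^2, the same unit as P_in.
    The default P_in of 100 mW/cm^2 is an assumed test condition.
    """
    return jsc_ma_cm2 * voc_v * fill_factor / p_in_mw_cm2 * 100.0


# Values quoted in the abstract; the fill factor is given only to two digits.
pce = power_conversion_efficiency(jsc_ma_cm2=12.98, voc_v=0.79, fill_factor=0.49)

# Difference between the quoted LUMO and HOMO levels, in eV.
homo_ev, lumo_ev = -5.10, -3.45
gap_ev = lumo_ev - homo_ev

print(f"PCE = {pce:.2f}%, LUMO - HOMO = {gap_ev:.2f} eV")
```

Under that assumption the result is about 5.02%, close to the reported 5.04%; the small difference is consistent with the fill factor being rounded to 49%. The LUMO and HOMO levels differ by 1.65 eV, which matches the quoted optical band gap numerically.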

    Efficient polygenic risk scores for biobank scale data by exploiting phenotypes from inferred relatives

    Genetic data from large cohorts of unrelated individuals can be used to create polygenic risk scores, which could be used to predict individual risk of developing a specific disease. Here the authors show that smaller cohorts of related individuals can provide similarly powerful predictive ability.

    Enamel matrix derivative in dental implantation for enhanced healing

    Introduction: Enamel matrix derivative has been used to enhance periodontal regeneration, including that of alveolar bone, new cementum, and periodontal ligament. The aim of this report is to present cases showing the effect of enamel matrix derivative on bone regeneration and soft tissue healing in dental implantation. Case Description: Case 1: A fifty-four-year-old male patient visited the clinic with a missing maxillary right first molar. The patient rinsed the intraoral area with 0.12% chlorhexidine digluconate solution before the periodontal surgery. A full-thickness flap was elevated after injection of 2% lidocaine containing 1:100,000 epinephrine. The defect area was grafted with enamel matrix derivative, bone graft and membrane. Healing was uneventful, and a postoperative follow-up check was performed. A dental implant was installed afterwards. Case 2: A forty-eight-year-old male presented with a missing mandibular left first molar. The defect area was grafted with enamel matrix derivative, bone graft and membrane. A postoperative follow-up check was performed. Case 3: A seventy-seven-year-old male presented with a missing maxillary left second premolar, first molar and second molar. The defect area was grafted with enamel matrix derivative, bone graft and membrane. Discussion: This study suggests the use of enamel matrix derivative for wider applications, including soft tissue surgery and dental implants. Faster soft tissue healing with higher maturity can be achieved by applying enamel matrix derivative. Conclusion/Clinical Significance: In conclusion, individuals with bony defects can be treated with enamel matrix derivative, bone graft and membrane, given proper diagnosis, evaluation, and planning in dental implantation.