66 research outputs found

    Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation.

    MOTIVATION: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine-learning-based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. RESULTS: PPI annotations are built combinatorially using corresponding GO terms and InterPro annotation. We use an S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high performance area under the ROC curve of ≤0.97, outperforming go2ppi, a previously established prediction tool for PPI based on GO annotations. AVAILABILITY AND IMPLEMENTATION: https://github.com/ima23/maxent-ppi. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    Cosmological Consequences of Slow-Moving Bubbles in First-Order Phase Transitions

    In cosmological first-order phase transitions, the progress of true-vacuum bubbles is expected to be significantly retarded by the interaction between the bubble wall and the hot plasma. We examine the evolution and collision of slow-moving true-vacuum bubbles. Our lattice simulations indicate that phase oscillations, predicted and observed in systems with a local symmetry and with a global symmetry where the bubbles move at speeds less than the speed of light, do not occur inside collisions of slow-moving local-symmetry bubbles. We observe almost instantaneous phase equilibration which would lead to a decrease in the expected initial defect density, or possibly prevent defects from forming at all. We illustrate our findings with an example of defect formation suppressed in slow-moving bubbles. Slow-moving bubble walls also prevent the formation of 'extra defects', and in the presence of plasma conductivity may lead to an increase in the magnitude of any primordial magnetic field formed.
    Comment: 10 pages, 7 figures, replaced with typos corrected and reference added. To appear in Phys. Rev.

    Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics.

    Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system to integrate heterogeneous data sources and considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species, in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.
    LMB was supported by a BBSRC Tools and Resources Development Fund (Award BB/K00137X/1) and a Wellcome Trust Technology Development Grant (108441/Z/15/Z). LG was supported by the European Union 7th Framework Program (PRIME-XS project, grant agreement number 262067) and a BBSRC Strategic Longer and Larger Award (Award BB/L002817/1). DW and OK acknowledge funding from the European Union (PRIME-XS, GA 262067) and Deutsche Forschungsgemeinschaft (KO-2313/6-1).
    This is the final version of the article. It first appeared from PLOS via https://doi.org/10.1371/journal.pcbi.100492

    Quantifying humpback whale song sequences to understand the dynamics of song exchange at the ocean basin scale

    Humpback whales have a continually evolving vocal sexual display, or "song," that appears to undergo both evolutionary and "revolutionary" change. All males within a population adhere to the current content and arrangement of the song. Populations within an ocean basin share similarities in their songs; this sharing is complex as multiple variations of the song (song types) may be present within a region at any one time. To quantitatively investigate the similarity of song types, songs were compared at both the individual singer and population level using the Levenshtein distance technique and cluster analysis. The highly stereotyped sequences of themes from the songs of 211 individuals from populations within the western and central South Pacific region from 1998 through 2008 were grouped together based on the percentage of song similarity, and compared to qualitatively assigned song types. The analysis produced clusters of highly similar songs that agreed with previous qualitative assignments. Each cluster contained songs from multiple populations and years, confirming the eastward spread of song types and their progressive evolution through the study region. Quantifying song similarity and exchange will assist in understanding broader song dynamics and contribute to the use of vocal displays as population identifiers
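    The Levenshtein comparison of theme sequences mentioned in this abstract can be sketched roughly as follows. This is only an illustration: the theme labels are hypothetical, and normalising by the length of the longer sequence to obtain a percentage is an assumption, since the abstract does not state how the percentage of song similarity was computed.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_percent(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity as a percentage (assumed normalisation: longer sequence length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - levenshtein(a, b) / longest)


# Hypothetical theme sequences for two singers.
song_a = ["T1", "T2", "T3", "T4"]
song_b = ["T1", "T3", "T4"]
print(levenshtein(song_a, song_b))
print(similarity_percent(song_a, song_b))
```

    A matrix of such pairwise similarities is the kind of input a cluster analysis would then operate on; the clustering step itself is not shown here.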

    Robust smoothing of left-censored time series data with a dynamic linear model to infer SARS-CoV-2 RNA concentrations in wastewater

    Wastewater sampling for the detection and monitoring of SARS-CoV-2 has been developed and applied at an unprecedented pace; however, uncertainty remains when interpreting the measured viral RNA signals and their spatiotemporal variation. The proliferation of measurements that are below a quantifiable threshold, usually during non-endemic periods, poses a further challenge to interpretation and time-series analysis of the data. Inspired by research in the use of a custom Kalman smoother model to estimate the true level of SARS-CoV-2 RNA concentrations in wastewater, we propose an alternative left-censored dynamic linear model. Cross-validation of both models alongside a simple moving average, using data from 286 sewage treatment works across England, allows for a comprehensive validation of the proposed approach. The presented dynamic linear model is more parsimonious, has a faster computational time and is represented by a more flexible modelling framework than the equivalent Kalman smoother. Furthermore, we show how the use of wastewater data, transformed by such models, correlates more closely with regional case rate positivity as published by the Office for National Statistics (ONS) Coronavirus (COVID-19) Infection Survey. The modelled output is more robust and is therefore capable of better complementing traditional surveillance than untransformed data or a simple moving average, providing additional confidence and utility for public health decision making.

    Performance of a fully‐automated system on a WHO malaria microscopy evaluation slide set

    Background: Manual microscopy remains a widely-used tool for malaria diagnosis and clinical studies, but it has inconsistent quality in the field due to variability in training and field practices. Automated diagnostic systems based on machine learning hold promise to improve quality and reproducibility of field microscopy. The World Health Organization (WHO) has designed a 55-slide set (WHO 55) for their External Competence Assessment of Malaria Microscopists (ECAMM) programme, which can also serve as a valuable benchmark for automated systems. The performance of a fully-automated malaria diagnostic system, EasyScan GO, on a WHO 55 slide set was evaluated. Methods: The WHO 55 slide set is designed to evaluate microscopist competence in three areas of malaria diagnosis using Giemsa-stained blood films, focused on crucial field needs: malaria parasite detection, malaria parasite species identification (ID), and malaria parasite quantitation. The EasyScan GO is a fully-automated system that combines scanning of Giemsa-stained blood films with assessment algorithms to deliver malaria diagnoses. This system was tested on a WHO 55 slide set. Results: The EasyScan GO achieved 94.3 % detection accuracy, 82.9 % species ID accuracy, and 50 % quantitation accuracy, corresponding to WHO microscopy competence Levels 1, 2, and 1, respectively. This is, to our knowledge, the best performance of a fully-automated system on a WHO 55 set. Conclusions: EasyScan GO’s expert ratings in detection and quantitation on the WHO 55 slide set point towards its potential value in drug efficacy use-cases, as well as in some case management situations with less stringent species ID needs. Improved runtime may enable use in general case management settings

    The devil is in the detail : quantifying vocal variation in a complex, multi-levelled, and rapidly evolving display

    E.C.G. was funded by a Royal Society Newton International Fellowship. L.R. was supported by the MASTS pooling initiative (The Marine Alliance for Science and Technology for Scotland) and their support is gratefully acknowledged. MASTS is funded by the Scottish Funding Council (Grant Reference No. HR09011) and contributing institutions. Some funding and logistical support was provided to M.M.P. by the National Oceanic Society (USA), Dolphin & Whale Watching Expeditions (French Polynesia), Vista Press (USA), and the International Fund for Animal Welfare (via the South Pacific Whale Research Consortium).
    Identifying and quantifying variation in vocalizations is fundamental to advancing our understanding of processes such as speciation, sexual selection, and cultural evolution. The song of the humpback whale (Megaptera novaeangliae) presents an extreme example of complexity and cultural evolution. It is a long, hierarchically structured vocal display that undergoes constant evolutionary change. Obtaining robust metrics to quantify song variation at multiple scales (from a sound through to population variation across the seascape) is a substantial challenge. Here, we present a method to quantify song similarity at multiple levels within the hierarchy. To incorporate the complexity of these multiple levels, the calculation of similarity is weighted by measurements of sound units (lower levels within the display) to bridge the gap in information between upper and lower levels. Results demonstrate that the inclusion of weighting provides a more realistic and robust representation of song similarity at multiple levels within the display. Our method permits robust quantification of cultural patterns and processes that will also contribute to the conservation management of endangered humpback whale populations, and is applicable to any hierarchically structured signal sequence.

    Nodule inception recruits the lateral root developmental program for symbiotic nodule organogenesis in Medicago truncatula

    To overcome nitrogen deficiencies in the soil, legumes enter symbioses with rhizobial bacteria that convert atmospheric nitrogen into ammonium. Rhizobia are accommodated as endosymbionts within lateral root organs called nodules that initiate from the inner layers of Medicago truncatula roots in response to rhizobial perception. In contrast, lateral roots emerge from predefined founder cells as an adaptive response to environmental stimuli, including water and nutrient availability. CYTOKININ RESPONSE 1 (CRE1)-mediated signaling in the pericycle and in the cortex is necessary and sufficient for nodulation, whereas cytokinin is antagonistic to lateral root development, with cre1 showing increased lateral root emergence and decreased nodulation. To better understand the relatedness between nodule and lateral root development, we undertook a comparative analysis of these two root developmental programs. Here, we demonstrate that despite differential induction, lateral roots and nodules share overlapping developmental programs, with mutants in LOB-DOMAIN PROTEIN 16 (LBD16) showing equivalent defects in nodule and lateral root initiation. The cytokinin-inducible transcription factor NODULE INCEPTION (NIN) allows induction of this program during nodulation through activation of LBD16 that promotes auxin biosynthesis via transcriptional induction of STYLISH (STY) and YUCCAs (YUC). We conclude that cytokinin facilitates local auxin accumulation through NIN promotion of LBD16, which activates a nodule developmental program overlapping with that induced during lateral root initiation

    Parvovirus Minute Virus of Mice Induces a DNA Damage Response That Facilitates Viral Replication

    Infection by DNA viruses can elicit DNA damage responses (DDRs) in host cells. In some cases the DDR presents a block to viral replication that must be overcome, and in other cases the infecting agent exploits the DDR to facilitate replication. We find that low multiplicity infection with the autonomous parvovirus minute virus of mice (MVM) results in the activation of a DDR, characterized by the phosphorylation of H2AX, Nbs1, RPA32, Chk2 and p53. These proteins are recruited to MVM replication centers, where they co-localize with the main viral replication protein, NS1. The response is seen in both human and murine cell lines following infection with either the MVMp or MVMi strains. Replication of the virus is required for DNA damage signaling. Damage response proteins, including the ATM kinase, accumulate in viral-induced replication centers. Using mutant cell lines and specific kinase inhibitors, we show that ATM is the main transducer of the signaling events in the normal murine host. ATM inhibitors restrict MVM replication and ameliorate virus-induced cell cycle arrest, suggesting that DNA damage signaling facilitates virus replication, perhaps in part by promoting cell cycle arrest. Thus it appears that MVM exploits the cellular DNA damage response machinery early in infection to enhance its replication in host cells

    The Intrinsic Antiviral Defense to Incoming HSV-1 Genomes Includes Specific DNA Repair Proteins and Is Counteracted by the Viral Protein ICP0

    Cellular restriction factors responding to herpesvirus infection include the ND10 components PML, Sp100 and hDaxx. During the initial stages of HSV-1 infection, novel sub-nuclear structures containing these ND10 proteins form in association with incoming viral genomes. We report that several cellular DNA damage response proteins also relocate to sites associated with incoming viral genomes where they contribute to the cellular front line defense. We show that recruitment of DNA repair proteins to these sites is independent of ND10 components, and instead is coordinated by the cellular ubiquitin ligases RNF8 and RNF168. The viral protein ICP0 targets RNF8 and RNF168 for degradation, thereby preventing the deposition of repressive ubiquitin marks and counteracting this repair protein recruitment. This study highlights important parallels between recognition of cellular DNA damage and recognition of viral genomes, and adds RNF8 and RNF168 to the list of factors contributing to the intrinsic antiviral defense against herpesvirus infection