11 research outputs found

    Trading-off Data Fit and Complexity in Training Gaussian Processes with Multiple Kernels

    This is the author accepted manuscript. The final version is available from Springer Verlag via the DOI in this record.
    LOD 2019: Fifth International Conference on Machine Learning, Optimization, and Data Science, 10-13 September 2019, Siena, Italy.
    Gaussian processes (GPs) belong to a class of probabilistic techniques that have been successfully used in different domains of machine learning and optimization. They are popular because they provide uncertainties in predictions, which sets them apart from other modelling methods that provide only point predictions. The uncertainty is particularly useful for decision making because it lets us gauge how reliable a prediction is. One of the fundamental challenges in using GPs is that the efficacy of a model depends on selecting an appropriate kernel and the associated hyperparameter values for a given problem. Furthermore, the training of GPs, that is, optimizing the hyperparameters using a data set, is traditionally performed using a cost function that is a weighted sum of data fit and model complexity, and the underlying trade-off is completely ignored. Addressing these challenges and shortcomings, in this article we propose the following automated training scheme. Firstly, we use a weighted product of multiple kernels with a view to relieving users from choosing an appropriate kernel for the problem at hand without any domain-specific knowledge. Secondly, for the first time, we modify GP training by using a multi-objective optimizer to tune the hyperparameters and weights of multiple kernels and extract an approximation of the complete trade-off front between data fit and model complexity. We then propose a novel solution selection strategy based on mean standardized log loss (MSLL) to select a solution from the estimated trade-off front and finalise training of a GP model. The results on three data sets and comparison with the standard approach clearly show the potential benefit of the proposed approach of using multi-objective optimization with multiple kernels.
    Natural Environment Research Council (NERC)
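The two competing terms mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that the traditional cost function is the negative log marginal likelihood of a zero-mean GP, which splits into a data-fit term and a complexity term; the abstract does not name the objective, so this is an assumption. The sketch also uses a single squared-exponential kernel rather than the paper's weighted product of kernels, and the names `rbf_kernel` and `nlml_terms` and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale, variance):
    """Squared-exponential kernel on 1-D inputs (illustrative choice)."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def nlml_terms(x, y, lengthscale, variance=1.0, noise=0.1):
    """Return (data_fit, complexity, constant) of the negative log
    marginal likelihood for a zero-mean GP with Gaussian noise."""
    n = len(x)
    K = rbf_kernel(x, x, lengthscale, variance) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = 0.5 * y @ alpha                     # 0.5 * y^T K^{-1} y
    complexity = np.sum(np.log(np.diag(L)))        # 0.5 * log|K|
    constant = 0.5 * n * np.log(2.0 * np.pi)
    return data_fit, complexity, constant

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x) + 0.1 * rng.standard_normal(20)

for ls in (0.1, 1.0, 10.0):
    fit, comp, _ = nlml_terms(x, y, ls)
    print(f"lengthscale={ls:5.1f}  data_fit={fit:10.3f}  complexity={comp:10.3f}")
```

Summing the terms gives a single scalar, which hides how the two terms move against each other as the hyperparameters change; a multi-objective optimizer would instead treat `data_fit` and `complexity` as separate objectives.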

    Understanding the context of male and transgender sex work using peer ethnography.

    OBJECTIVES: To distinguish between three distinct groups of male and transgender sex workers in Pakistan and to demonstrate how members of these stigmatized groups need to be engaged in the research process to go beyond stated norms of behaviour. METHODS: A peer ethnography study was undertaken in a major city in Pakistan. 15 male and 15 transgender sex workers were trained as peer researchers to each interview three peers in their network. Analysis was based on interviews with peer researchers as well as observation of dynamics during training and analysis workshops. RESULTS: The research process revealed that, within the epidemiological category of biological males who sell sex, there are three sociologically different sexual identities: khusras (transgender), khotkis (feminized males) and banthas (mainstream male identity). Both khusras and khotkis are organised in strong social structures based on a shared identity. While these networks provide emotional and material support, they also come with rigid group norms based on expected "feminine" behaviours. In everyday reality, sex workers showed fluidity in both behaviour and identity according to the situational context, transgressing both wider societal and group norms. The informal observational component in peer ethnography was crucial for the accurate interpretation of interview data. Participant accounts of behaviour and relationships are shaped by the research contexts including who interviews them, at what stage of familiarity and who may overhear the conversation. CONCLUSIONS: To avoid imposing a "false clarity" on categorisation of identity and assumed behaviour, it is necessary to go beyond verbal accounts to document the fluidity of everyday reality

    Conserved residue clusters at protein-protein interfaces and their use in binding site identification

    Background: Biological evolution conserves protein residues that are important for structure and function. Both protein stability and function often require a certain degree of structural co-operativity between spatially neighboring residues, and it has previously been shown that conserved residues occur clustered together in protein tertiary structures, enzyme active sites and protein-DNA interfaces. Residues comprising protein interfaces are often more conserved than those occurring elsewhere on the protein surface. We investigate the extent to which conserved residues within protein-protein interfaces are clustered together in three dimensions.
    Results: Out of 121 and 392 interfaces in homodimers and heterocomplexes, 96.7 and 86.7%, respectively, have the conserved positions clustered within the overall interface region. The significance of this clustering was established by comparison with subsets of the same size of randomly selected residues from the interface. Conserved residues occurring in larger interfaces could often be sub-divided into two or more distinct sub-clusters. These structural clusters of conserved residues indicate functionally important regions within the protein-protein interface that can be targeted for further structural and energetic analysis by experimental scanning mutagenesis. Almost 60% of experimental hot spot residues (with ΔΔG > 2 kcal/mol) were localized to these conserved residue clusters. An analysis of the residue types that are enriched within these conserved subsets compared to the overall interface showed that hydrophobic and aromatic residues are favored, but charged residues (both positive and negative) are less common. The potential use of this method for discriminating binding sites (interfaces) from random surface patches was explored by comparing the clustering of conserved residues within each of these regions; in about 50% of cases the true interface is ranked among the top 10% of all surface patches.
    Conclusions: Protein-protein interaction sites are much larger than small-molecule binding sites, yet conserved residues are not randomly distributed over the whole interface and are distinctly clustered. The clustered nature of evolutionarily conserved residues within interfaces, compared with those within other surface patches not involved in binding, has important implications for the identification of protein-protein binding sites and would have applications in docking studies.
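The comparison against random subsets of the same size described in the abstract above can be sketched as an empirical significance test. This is a minimal illustration and not the authors' procedure: the clustering score (mean pairwise distance), the function names `mean_pairwise_distance` and `clustering_p_value`, and the toy coordinates are all assumptions.

```python
import numpy as np

def mean_pairwise_distance(coords):
    """Mean Euclidean distance over all residue pairs (assumed score)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    upper = np.triu_indices(len(coords), k=1)
    return dist[upper].mean()

def clustering_p_value(interface_coords, conserved_idx, n_random=1000, seed=0):
    """Empirical p-value: fraction of random same-size subsets of the
    interface that are at least as tightly packed as the conserved set."""
    rng = np.random.default_rng(seed)
    n, k = len(interface_coords), len(conserved_idx)
    observed = mean_pairwise_distance(interface_coords[conserved_idx])
    random_scores = np.array([
        mean_pairwise_distance(
            interface_coords[rng.choice(n, size=k, replace=False)]
        )
        for _ in range(n_random)
    ])
    p_value = (np.sum(random_scores <= observed) + 1) / (n_random + 1)
    return observed, p_value

# Toy interface: 40 residue positions in a 30 Å box (hypothetical data),
# with the "conserved" set chosen as the 8 residues nearest to residue 0.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 30.0, size=(40, 3))
nearest = np.argsort(np.linalg.norm(coords - coords[0], axis=1))[:8]

observed, p_value = clustering_p_value(coords, nearest)
print(f"observed mean distance = {observed:.2f}, empirical p = {p_value:.4f}")
```

A small p-value here means the conserved positions are packed more tightly than most random same-size subsets of the interface, which is the sense in which clustering would be called significant under these assumptions.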

    A mechanistic integrative computational model of macrophage polarization: Implications in human pathophysiology


    Genome-wide association study identifies five new susceptibility loci for primary angle closure glaucoma.

    Primary angle closure glaucoma (PACG) is a major cause of blindness worldwide. We conducted a genome-wide association study (GWAS) followed by replication in a combined total of 10,503 PACG cases and 29,567 controls drawn from 24 countries across Asia, Australia, Europe, North America, and South America. We observed significant evidence of disease association at five new genetic loci upon meta-analysis of all patient collections. These loci are at EPDR1 rs3816415 (odds ratio (OR) = 1.24, P = 5.94 × 10^-15), CHAT rs1258267 (OR = 1.22, P = 2.85 × 10^-16), GLIS3 rs736893 (OR = 1.18, P = 1.43 × 10^-14), FERMT2 rs7494379 (OR = 1.14, P = 3.43 × 10^-11), and DPM2-FAM102A rs3739821 (OR = 1.15, P = 8.32 × 10^-12). We also confirmed significant association at three previously described loci (P < 5 × 10^-8 for each sentinel SNP at PLEKHA7, COL11A1, and PCMTD1-ST18), providing new insights into the biology of PACG.