
    A two-step learning approach for solving full and almost full cold start problems in dyadic prediction

    Dyadic prediction methods operate on pairs of objects (dyads), aiming to infer labels for out-of-sample dyads. We consider the full and almost full cold start problems in dyadic prediction, a setting that occurs when both objects in an out-of-sample dyad have not been observed during training, or when one of them has been observed only very few times. A popular approach for addressing this problem is to train a model that makes predictions based on a pairwise feature representation of the dyads or, in the case of kernel methods, based on a tensor product pairwise kernel. As an alternative to such a kernel approach, we introduce a novel two-step learning algorithm that borrows ideas from the fields of pairwise learning and spectral filtering. We show theoretically that the two-step method is very closely related to the tensor product kernel approach, and experimentally that it yields slightly better predictive performance. Moreover, unlike existing tensor product kernel methods, the two-step method allows closed-form solutions for training and for parameter selection via cross-validation estimates in both the full and almost full cold start settings, making the approach much more efficient and straightforward to implement.
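The tensor product pairwise kernel mentioned above can be illustrated with a small sketch (not the authors' implementation; the linear object-level kernels and the dyad-as-pair encoding are assumptions made here): the kernel between two dyads is simply the product of two object-level kernel values.

```python
def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def tensor_product_kernel(dyad1, dyad2, k_row=linear_kernel, k_col=linear_kernel):
    """Tensor product pairwise kernel for dyadic prediction:
    K((u1, v1), (u2, v2)) = k_row(u1, u2) * k_col(v1, v2)
    """
    (u1, v1), (u2, v2) = dyad1, dyad2
    return k_row(u1, u2) * k_col(v1, v2)

# Two hypothetical dyads, each a (row-object, column-object) pair of feature vectors.
d1 = ([1, 2], [3])
d2 = ([0, 1], [2])
k = tensor_product_kernel(d1, d2)   # 2 * 6 = 12
```

Because the dyad kernel factorizes, the full kernel matrix over all dyads is the Kronecker product of the two object-level kernel matrices; this is the structure that closed-form training and cross-validation shortcuts exploit.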

    Supervised selective kernel fusion for membrane protein prediction

    Membrane protein prediction is a significant classification problem, requiring the integration of data derived from different sources such as protein sequences, gene expression and protein interactions. A generalized probabilistic approach for combining different data sources via supervised selective kernel fusion was proposed in our previous papers. It includes, as particular cases, SVM, Lasso SVM, Elastic Net SVM and others. In this paper we apply a further instantiation of this approach, the Supervised Selective Support Kernel SVM, and demonstrate that it achieves the top-rank position among the selective kernel fusion variants on benchmark data for membrane protein prediction. The method differs from the previous approaches in that it naturally derives a subset of “support kernels” (analogous to support objects within SVMs), thereby allowing the memory-efficient exclusion of significant numbers of irrelevant kernel matrices from the decision rule in a manner particularly suited to membrane protein prediction.

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse types. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties that distinguish such methods and determine their suitability for different types of problems. Finally, we discuss a few challenges for future research.

    State stigmatization in urban Turkey : Managing the 'insurgent' squatter dwellers in Dikmen Valley

    This paper contributes to accounts of territorial stigmatisation by examining the state's role in it in the case of Turkey, a country marked by growing state power. The existing debates are mainly restricted to its function as an economic strategy paving the way for capital accumulation through devaluing working-class people and places. Drawing on textual analysis of political speeches, local newsletters and mainstream national newspapers, as well as fieldwork material that includes interviews and observations in Dikmen Valley, where some squatter communities mobilised against the state-imposed urban transformation project, I demonstrate that the state's conceptualisation of “problem people” targets the “insurgent” rather than the “unprofitable” groups. Stigma in urban settings functions to incite the desire to meet the patterns deemed appropriate by the state, rather than the market. Moving from that, I argue that stigma is used as a state-led political strategy that is integral to the growing authoritarianism in Turkey.

    Kernelized Multiview Projection for Robust Action Recognition

    Conventional action recognition algorithms adopt a single type of feature or a simple concatenation of multiple features. In this paper, we propose to better fuse and embed different feature representations for action recognition using a novel spectral coding algorithm called Kernelized Multiview Projection (KMP). Computing the kernel matrices from different features/views via time-sequential distance learning, KMP can encode different features with different weights to achieve a low-dimensional and semantically meaningful subspace in which the distribution of each view is sufficiently smooth and discriminative. More crucially, KMP is linear in the reproducing kernel Hilbert space, which makes it suitable for various practical applications. We demonstrate KMP’s performance for action recognition on five popular action datasets, and the results are consistently superior to state-of-the-art techniques.

    Epigenetic-focused CRISPR/Cas9 screen identifies (absent, small, or homeotic)2-like protein (ASH2L) as a regulator of glioblastoma cell survival

    Background: Glioblastoma is the most common and aggressive primary brain tumor, with an extremely poor prognosis, highlighting an urgent need for developing novel treatment options. Identifying epigenetic vulnerabilities of cancer cells can provide excellent therapeutic intervention points for various types of cancer. Method: In this study, we investigated epigenetic regulators of glioblastoma cell survival through CRISPR/Cas9-based genetic ablation screens using a customized sgRNA library, EpiDoKOL, which targets critical functional domains of chromatin modifiers. Results: Screens conducted in multiple cell lines revealed ASH2L, a histone lysine methyltransferase complex subunit, as a major regulator of glioblastoma cell viability. ASH2L depletion led to cell cycle arrest and apoptosis. RNA sequencing and greenCUT&RUN together identified a set of cell cycle regulatory genes, such as TRA2B, BARD1, KIF20B, ARID4A and SMARCC1, that were downregulated upon ASH2L depletion. Mass spectrometry analysis revealed the interaction partners of ASH2L in glioblastoma cell lines to be SET1/MLL family members, including SETD1A, SETD1B, MLL1 and MLL2. We further showed that glioblastoma cells had a differential dependency on the expression of SET1/MLL family members for survival. The growth of ASH2L-depleted glioblastoma cells was markedly slower than that of controls in orthotopic in vivo models. TCGA analysis showed high ASH2L expression in glioblastoma compared to low-grade gliomas, and immunohistochemical analysis revealed significant ASH2L expression in glioblastoma tissues, attesting to its clinical relevance. Therefore, high-throughput, robust and affordable screens with focused libraries, such as EpiDoKOL, hold great promise to enable the rapid discovery of novel epigenetic regulators of cancer cell survival, such as ASH2L. Conclusion: Together, we suggest that targeting ASH2L could serve as a new therapeutic opportunity for glioblastoma.

    Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?

    Most vision papers have to include some evaluation work in order to demonstrate that the proposed algorithm is an improvement on existing ones. Generally, these evaluation results are presented in tabular or graphical form. Neither of these is ideal, because there is no indication as to whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation will obviously have a bearing on the results, and neither of these factors is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description for matching problems, and explores the use of statistical tests such as McNemar’s test and ANOVA as better alternatives.
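McNemar’s test, mentioned above, compares two algorithms evaluated on the same set of matching problems using only the discordant outcomes. A minimal sketch of the continuity-corrected statistic (the counts `b` and `c` below are illustrative, not from the paper):

```python
def mcnemar_statistic(b, c, corrected=True):
    """Chi-squared statistic for McNemar's test.

    b -- cases algorithm A got right and algorithm B got wrong
    c -- cases algorithm B got right and algorithm A got wrong
    Concordant cases (both right or both wrong) carry no information here.
    """
    if b + c == 0:
        return 0.0
    num = (abs(b - c) - 1) ** 2 if corrected else (b - c) ** 2
    return num / (b + c)

# Hypothetical counts: A wins on 15 images, B wins on 5.
stat = mcnemar_statistic(15, 5)   # (|15 - 5| - 1)^2 / 20 = 4.05
```

Values above 3.84, the 5% critical value of the chi-squared distribution with one degree of freedom, indicate a statistically significant difference between the two algorithms.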

    Time to Recurrence and Survival in Serous Ovarian Tumors Predicted from Integrated Genomic Profiles

    Serous ovarian cancer (SeOvCa) is an aggressive disease with differential and often inadequate therapeutic outcome after standard treatment. The Cancer Genome Atlas (TCGA) has provided rich molecular and genetic profiles from hundreds of primary surgical samples. These profiles confirm mutations of TP53 in ~100% of patients and an extraordinarily complex profile of DNA copy number changes with considerable patient-to-patient diversity. This raises the joint challenge of exploiting all newly available datasets while reducing their confounding complexity for the purpose of predicting clinical outcomes and identifying disease-relevant pathway alterations. We therefore set out to use multi-data-type genomic profiles (mRNA, DNA methylation, DNA copy-number alteration and microRNA) available from TCGA to identify prognostic signatures for the prediction of progression-free survival (PFS) and overall survival (OS). We developed a prediction algorithm and applied it to two datasets integrated from the four genomic data types. We (1) selected features through cross-validation; (2) generated a prognostic index for patient risk stratification; and (3) directly predicted continuous clinical outcome measures, that is, the time to recurrence and survival time. We used Kaplan-Meier p-values, hazard ratios (HR) and concordance probability estimates (CPE) to assess prediction performance, comparing separate and integrated datasets. Data integration resulted in the best PFS signature (withheld data: p-value = 0.008; HR = 2.83; CPE = 0.72). We provide a prediction tool that inputs genomic profiles of primary surgical samples and generates patient-specific predictions for the time to recurrence and survival, along with outcome risk predictions. Using integrated genomic profiles resulted in an information gain for the prediction of outcomes. Pathway analysis provided potential insights into functional changes affecting disease progression. The prognostic signatures, if prospectively validated, may be useful for interpreting therapeutic outcomes in clinical trials that aim to improve therapy for SeOvCa patients.
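The Kaplan-Meier estimates used above to assess the prognostic signatures can be illustrated with a generic product-limit estimator (a pure-Python sketch of the standard method, not the paper's pipeline; the sample cohort is invented):

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival curve.

    times  -- follow-up times, e.g. time to recurrence
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    removed = Counter(times)      # everyone leaves the risk set at their time
    n_at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(removed):
        d = deaths.get(t, 0)
        if d:                     # the curve only drops at observed events
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed[t]
    return curve

# Invented cohort: recurrence at months 1, 2 and 4; one patient censored at month 3.
curve = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
```

Comparing such curves between high- and low-risk groups, e.g. with a log-rank test, is the standard way to check that a prognostic index actually stratifies patients.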

    Scuba: scalable kernel-based gene prioritization

    Background: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. Results: We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it is able to deal efficiently both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of its scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Conclusions: Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba
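The kernel-integration step at the heart of multiple kernel learning methods such as Scuba can be sketched as a weighted sum of per-source kernel matrices (the weights below are placeholders; Scuba learns them by optimizing the margin distribution, which is not reproduced here):

```python
def combine_kernels(kernels, weights):
    """Weighted sum of square kernel matrices (lists of lists),
    one matrix per data source."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)]
            for i in range(n)]

# Two toy 2x2 kernels from hypothetical data sources, equally weighted.
K_seq = [[1.0, 0.0], [0.0, 1.0]]     # e.g. sequence similarity
K_exp = [[2.0, 2.0], [2.0, 2.0]]     # e.g. expression correlation
K = combine_kernels([K_seq, K_exp], [0.5, 0.5])
```

Any convex combination of valid kernels is itself a valid kernel, so the fused matrix can be fed directly to a standard kernel classifier.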
