
    Characteristics of predictor sets found using differential prioritization

    BACKGROUND: Feature selection plays an undeniably important role in classification problems involving high dimensional datasets such as microarray datasets. For filter-based feature selection, two well-known criteria used in forming predictor sets are relevance and redundancy. However, there is a third criterion which is at least as important as the other two in affecting the efficacy of the resulting predictor sets. This criterion is the degree of differential prioritization (DDP), which varies the emphases on relevance and redundancy depending on the value of the DDP. Previous empirical works on publicly available microarray datasets have confirmed the effectiveness of the DDP in molecular classification. We now propose to establish the fundamental strengths and merits of the DDP-based feature selection technique. This is to be done through a simulation study which involves vigorous analyses of the characteristics of predictor sets found using different values of the DDP from toy datasets designed to mimic real-life microarray datasets. RESULTS: A simulation study employing analytical measures such as the distance between classes before and after transformation using principal component analysis is implemented on toy datasets. From these analyses, the necessity of adjusting the differential prioritization based on the dataset of interest is established. This conclusion is supported by comparisons against both simplistic rank-based selection and state-of-the-art equal-priorities scoring methods, which demonstrate the superiority of the DDP-based feature selection technique. Reapplying similar analyses to real-life multiclass microarray datasets provides further confirmation of our findings and of the significance of the DDP for practical applications. CONCLUSION: The findings have been achieved based on analytical evaluations, not empirical evaluation involving classifiers, thus providing further basis for the usefulness of the DDP and validating the need for unequal priorities on relevance and redundancy during feature selection for microarray datasets, especially highly multiclass datasets.

    A fuzzy-QFD approach for the enhancement of work equipment safety: a case study in the agriculture sector

    The paper proposes a design-for-safety methodology based on the Quality Function Deployment (QFD) method, focusing on the need to identify and analyse risks related to a working task in an effective manner, i.e. considering the specific work activities related to such a task. To reduce the drawbacks of subjectivity while improving the consistency of judgements, the QFD was augmented by both the Delphi method and the fuzzy logic approach. To verify the approach, it was implemented through a case study in the agricultural sector. While the proposed approach needs to be validated through further studies in different contexts, its positive results in performing hazard analysis and risk assessment in a comprehensive and thorough manner can contribute practically to the scientific knowledge on the application of QFD in design-for-safety activities.

    Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data

    BACKGROUND: Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. RESULTS: We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. CONCLUSION: For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures
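
    The balance between relevance and redundancy described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the scoring form `relevance**alpha * antiredundancy**(1 - alpha)`, the use of absolute Pearson correlation for both measures, the binary labels, and the names `select_predictors` and `alpha` are all assumptions made here for illustration. `alpha` stands in for the DDP, with `alpha = 1` reducing to purely relevance-driven selection.

```python
import numpy as np


def select_predictors(X, y, k, alpha):
    """Greedily pick k feature indices; alpha in (0, 1] stands in for the DDP.

    Assumes binary labels in y, no constant columns in X, and k <= X.shape[1].
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Relevance (assumed measure): |Pearson correlation| of each feature with y.
    relevance = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Redundancy (assumed measure): |Pearson correlation| between feature pairs.
    pair_corr = np.abs(np.corrcoef(X, rowvar=False))

    selected = [int(np.argmax(relevance))]  # start from the most relevant feature
    while len(selected) < k:
        best_j, best_score = -1, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cand = selected + [j]
            # Mean relevance of the candidate predictor set.
            v = relevance[cand].mean()
            # Antiredundancy: mean of (1 - |r|) over distinct pairs in the set.
            rows, cols = np.triu_indices(len(cand), k=1)
            sub = pair_corr[np.ix_(cand, cand)]
            u = np.clip(1.0 - sub[rows, cols], 0.0, 1.0).mean()
            # alpha shifts the priority between relevance and antiredundancy.
            score = v ** alpha * u ** (1.0 - alpha)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

    With two nearly identical, highly relevant features and a third weaker but independent one, `alpha = 1.0` would be expected to keep both duplicates, while a lower `alpha` would tend to swap one duplicate for the independent feature.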

    Can’t Take a Joke? The Asymmetrical Nature of the Politicized Sense of Humor

    In an effort to tease out possible expressions of dispositional differences in people of different political ideologies, this study uses media preference and consumption data from the 2008 National Annenberg Election Survey (NAES08-Online) to examine characteristics of audiences for a range of television shows and genres. The individual shows include two political satires, The Daily Show with Jon Stewart, and The Colbert Report; a late-night comedy/variety show, The Tonight Show with Jay Leno; a hospital-based ensemble situation comedy, Scrubs; two animated comedies, The Simpsons, and The Family Guy; and two action-oriented dramas, 24, and CSI: Miami. The genres include comedies, dramas, sports and documentaries. The results of a series of one-way ANOVAs and regression analyses supported the hypotheses that conservatives do not enjoy humor as much as liberals, and that they enjoy political humor even less than non-political humor

    GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

    Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene set based analyses improve biologists' capability to discover the functional relevance of their experiment design. While gene sets are elucidated individually, associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate an unbiased combination of gene sets, and to determine the biological relevance and analysis consistency of these combined gene sets by leveraging large genomic data sets. In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model that incorporates a priori defined gene sets and retains the crucial biological features in the latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. Trained with genomic data from TCGA and evaluated with their accompanying clinical parameters, we showed gene supersets' ability to discriminate tumor subtypes and their prognostic capability. We further demonstrated the biological relevance of the top component gene sets in the significant supersets. Using an autoencoder model with gene supersets at its latent layer, we demonstrated that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility on survival analysis and accurate prediction for cancer subtypes. Comment: Presented in the International Conference on Intelligent Biology and Medicine (ICIBM 2018) at Los Angeles, CA, USA and published in BMC Systems Biology 2018, 12(Suppl 8):14
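
    The layered structure described above (genes feeding gene-set nodes, which feed latent superset nodes) can be sketched as an encoder forward pass. This is a hedged illustration, not the GSAE code: the binary `membership` mask, the `tanh` activation, and the names `encode`, `w_gene_set`, and `w_superset` are assumptions, and training and the decoder are omitted.

```python
import numpy as np


def encode(x, membership, w_gene_set, w_superset):
    """Map expression profiles to latent superset activations.

    x:          (n_samples, n_genes) expression matrix
    membership: (n_genes, n_sets) 0/1 mask, 1 where a gene belongs to a set
    w_gene_set: (n_genes, n_sets) trainable weights
    w_superset: (n_sets, n_supersets) trainable weights
    """
    # Masking keeps only gene -> gene-set connections backed by membership,
    # so each node in this layer corresponds to one predefined gene set.
    gene_set_nodes = np.tanh(x @ (w_gene_set * membership))
    # Each latent node (a "superset") is a weighted combination of gene sets.
    return np.tanh(gene_set_nodes @ w_superset)
```

    Under this assumed masking, a gene that belongs to no gene set has no path to the latent layer, which is one simple way to tie latent nodes to prior biological annotation.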

    Reactive stroma and trastuzumab resistance in HER2-positive early breast cancer

    We investigated the value of reactive stroma as a predictor for trastuzumab resistance in patients with early HER2-positive breast cancer receiving adjuvant therapy. The pathological reactive stroma and the mRNA gene signatures that reflect reactive stroma in 209 HER2-positive breast cancer samples from the FinHer adjuvant trial were evaluated. Levels of stromal gene signatures were determined as a continuous parameter, and pathological reactive stromal findings were defined as stromal predominant breast cancer (SPBC; >= 50% stromal) and correlated with distant disease-free survival. Gene signatures associated with reactive stroma in HER2-positive early breast cancer (N = 209) were significantly associated with trastuzumab resistance in estrogen receptor (ER)-negative tumors (hazard ratio [HR] = 1.27, p interaction = 0.014 [DCN]; HR = 1.58, p interaction = 0.027 [PLAU]; HR = 1.71, p interaction = 0.019 [HER2STROMA, novel HER2 stromal signature]), but not in ER-positive tumors (HR = 0.73, p interaction = 0.47 [DCN]; HR = 0.71, p interaction = 0.73 [PLAU]; HR = 0.84, p interaction = 0.36 [HER2STROMA]). Pathological evaluation of HER2-positive/ER-negative tumors suggested an association between SPBC and trastuzumab resistance. Reactive stroma did not correlate with tumor-infiltrating lymphocytes (TILs), and the expected benefit from trastuzumab in patients with high levels of TILs was pronounced only in tumors with low stromal reactivity (SPBC …). Peer reviewed

    Mapping evolutionary process: a multi-taxa approach to conservation prioritization

    Human-induced land use changes are causing extensive habitat fragmentation. As a result, many species are not able to shift their ranges in response to climate change and will likely need to adapt in situ to changing climate conditions. Consequently, a prudent strategy to maintain the ability of populations to adapt is to focus conservation efforts on areas where levels of intraspecific variation are high. By doing so, the potential for an evolutionary response to environmental change is maximized. Here, we use modeling approaches in conjunction with environmental variables to model species distributions and patterns of genetic and morphological variation in seven Ecuadorian amphibian, bird, and mammal species. We then used reserve selection software to prioritize areas for conservation based on intraspecific variation or species-level diversity. Reserves selected using species richness and complementarity showed little overlap with those based on genetic and morphological variation. Priority areas for intraspecific variation were mainly located along the slopes of the Andes and were largely concordant among species, but were not well represented in existing reserves. Our results imply that in order to maximize representation of intraspecific variation in reserves, genetic and morphological variation should be included in conservation prioritization