
    KACST Arabic Text Classification Project: Overview and Preliminary Results

    Electronically formatted Arabic free-texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has many useful applications including, but not restricted to, email spam detection, web page content filtering, and automatic message routing. This paper presents an overview of the King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project, along with some preliminary results. The project will contribute to a better understanding and elaboration of Arabic text classification techniques.

    Low-Rank Boolean Matrix Approximation by Integer Programming

    Low-rank approximations of data matrices are an important dimensionality reduction tool in machine learning and regression analysis. We consider the case of categorical variables, where it can be formulated as the problem of finding low-rank approximations to Boolean matrices. In this paper we give what is, to the best of our knowledge, the first integer programming formulation that relies on only polynomially many variables and constraints; we discuss how to solve it computationally and report numerical tests on synthetic and real-world data.
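
To make the objective in this abstract concrete, the sketch below computes a Boolean matrix product and counts the entries where a rank-k Boolean factorisation disagrees with the data matrix. It is only an illustration of the quantity being minimised, not the integer programming formulation the paper proposes; the function names `boolean_product` and `approximation_error` and the small example matrices are assumptions introduced here.

```python
def boolean_product(A, B):
    """Boolean product: entry (i, j) is OR over l of (A[i][l] AND B[l][j])."""
    inner = len(B)
    cols = len(B[0])
    return [
        [int(any(A[i][l] and B[l][j] for l in range(inner))) for j in range(cols)]
        for i in range(len(A))
    ]


def approximation_error(X, A, B):
    """Number of entries where the Boolean product of A and B differs from X."""
    Z = boolean_product(A, B)
    return sum(x != z for row_x, row_z in zip(X, Z) for x, z in zip(row_x, row_z))


# Hypothetical 3x3 data matrix and two candidate factorisations.
X = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

A2 = [[1, 0], [1, 1], [0, 1]]   # rank-2 left factor
B2 = [[1, 1, 0], [0, 1, 1]]     # rank-2 right factor

A1 = [[1], [1], [0]]            # rank-1 left factor
B1 = [[1, 1, 0]]                # rank-1 right factor

error_rank2 = approximation_error(X, A2, B2)
error_rank1 = approximation_error(X, A1, B1)
```

Under Boolean arithmetic 1 OR 1 = 1, so overlapping rank-1 blocks do not add up the way they would in an ordinary matrix product; that is what distinguishes this problem from a standard low-rank approximation.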

    Genet-CNV: Boolean Implication Networks for Modeling Genome-Wide Co-occurrence of DNA Copy Number Variations

    Lung cancer is the leading cause of cancer-related death in the world. Lung cancer can be categorized as non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC makes up about 80% to 85% of lung cancer cases diagnosed, whereas SCLC is responsible for 10% to 15% of the cases. It remains a challenge for physicians to identify patients who will benefit from chemotherapy. In such a scenario, identifying genes that can facilitate therapeutic target discoveries and a better understanding of disease mechanisms and their regulation in different stages of lung cancer remains an important topic of research. In this thesis, we develop a computational framework for modelling molecular gene interaction networks, called Genet-CNV, to analyse gene interactions based on DNA Copy Number Variations (CNV). DNA copy number variation is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals in the human population. These variations can be used to study the activity of genes in cancerous cells, compared with that of the normal population. Genet-CNV uses Boolean implication networks to investigate genome-wide DNA CNV to identify relationships, called rules, that could potentially lead to the identification of genes of significant biological interest. Boolean implication networks are probabilistic graphical models that express the relationship between two variables in terms of six implication rules that can describe whether the genes are co-amplified, co-deleted or differentially amplified and deleted. Genet-CNV is run on three publicly available NSCLC genomic datasets. We further evaluate the results obtained with Genet-CNV by comparing them with the benchmark dataset, The Molecular Signatures Database (MSigDB). We identified several genes of interest that are present in survival, apoptosis, proliferation and immunologic pathways. The relationships obtained from this analysis can be tested for biological validation, or used to confirm experimental results, thus facilitating the identification of genes playing a significant role in the causation and progress of NSCLC.

    What fosters or prevents interprofessional teamworking in primary and community care? A literature review

    Background: The increase in prevalence of long-term conditions in Western societies, with the subsequent need for non-acute quality patient healthcare, has brought the issue of collaboration between health professionals to the fore. Within primary care, it has been suggested that multidisciplinary teamworking is essential to develop an integrated approach to promoting and maintaining the health of the population whilst improving service effectiveness. Although it is becoming widely accepted that no single discipline can provide complete care for patients with a long-term condition, in practice, interprofessional working is not always achieved. Objectives: This review aimed to explore the factors that inhibit or facilitate interprofessional teamworking in primary and community care settings, in order to inform development of multidisciplinary working at the turn of the century. Design: A comprehensive search of the literature was undertaken using a variety of approaches to identify appropriate literature for inclusion in the study. The selected articles used both qualitative and quantitative research methods. Findings: Following a thematic analysis of the literature, two main themes emerged that had an impact on interprofessional teamworking: team structure and team processes. Within these two themes, six categories were identified: team premises; team size and composition; organisational support; team meetings; clear goals and objectives; and audit. The complex nature of interprofessional teamworking in primary care meant that despite teamwork being an efficient and productive way of achieving goals and results, several barriers exist that hinder its potential from becoming fully exploited; implications and recommendations for practice are discussed. 
    Conclusions: These findings can inform development of current best practice, although further research needs to be conducted into multidisciplinary teamworking at both the team and organisation level, to ensure that enhancement and maintenance of teamwork leads to an improved quality of healthcare provision. © 2007 Elsevier Ltd. All rights reserved.

    Feature subset selection: a correlation based filter approach

    Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed or their performance adversely affected by too much data, some of which may be irrelevant or redundant to the learning task. Feature subset selection, then, is a method of enhancing the performance of learning algorithms, reducing the hypothesis search space, and, in some cases, reducing the storage requirement. This paper describes a feature subset selector that uses a correlation based heuristic to determine the goodness of feature subsets, and evaluates its effectiveness with three common ML algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvement for all three algorithms; C4.5 generated smaller decision trees.
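
The abstract does not spell out the heuristic, so the sketch below uses the merit score usually associated with correlation-based feature selection, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a k-feature subset. Treat it as an assumption about the kind of heuristic meant, not as the paper's exact definition; the function name `cfs_merit` and the input values are hypothetical.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset from precomputed correlations.

    feature_class_corr: one correlation with the class per feature in the subset.
    feature_feature_corr: one correlation per unordered pair of features
    (empty when the subset has a single feature).
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    mean_cf = sum(feature_class_corr) / k
    mean_ff = (
        sum(feature_feature_corr) / len(feature_feature_corr)
        if feature_feature_corr
        else 0.0
    )
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


# Two equally predictive features: uncorrelated vs fully redundant.
merit_independent = cfs_merit([0.5, 0.5], [0.0])
merit_redundant = cfs_merit([0.5, 0.5], [1.0])
merit_single = cfs_merit([0.5], [])
```

In this toy setting a fully redundant second feature leaves the merit at the single-feature value, while an uncorrelated one raises it, which matches the abstract's point about irrelevant or redundant data.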

    Quantitative information flow under generic leakage functions and adaptive adversaries

    We put forward a model of action-based randomization mechanisms to analyse quantitative information flow (QIF) under generic leakage functions, and under possibly adaptive adversaries. This model subsumes many of the QIF models proposed so far. Our main contributions include the following: (1) we identify mild general conditions on the leakage function under which it is possible to derive general and significant results on adaptive QIF; (2) we contrast the efficiency of adaptive and non-adaptive strategies, showing that the latter are as efficient as the former in terms of length up to an expansion factor bounded by the number of available actions; (3) we show that the maximum information leakage over strategies, given a finite time horizon, can be expressed in terms of a Bellman equation. This can be used to compute an optimal finite strategy recursively, by resorting to standard methods like backward induction. Comment: Revised and extended version of the conference paper of the same title that appeared in Proc. of FORTE 2014, LNC
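
As a reminder of what backward induction on a finite-horizon Bellman equation looks like, here is a generic sketch on a toy deterministic problem. It does not model the paper's leakage functions or adversary strategies; the states, actions, `transition` and `reward` functions are hypothetical placeholders chosen only to make the recursion V_t(s) = max_a [r(s, a) + V_{t-1}(s')] runnable.

```python
def backward_induction(states, actions, transition, reward, horizon):
    """Solve a deterministic finite-horizon problem from the last step backwards.

    Returns the optimal value per state with `horizon` steps to go, and a list
    `policy` where policy[t][s] is the best action in state s at step t.
    """
    value = {s: 0.0 for s in states}  # with no steps left, nothing more is gained
    policy = []
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        for s in states:
            best_action, best_total = None, float("-inf")
            for a in actions:
                total = reward(s, a) + value[transition(s, a)]
                if total > best_total:
                    best_action, best_total = a, total
            new_value[s] = best_total
            step_policy[s] = best_action
        value = new_value
        policy.insert(0, step_policy)  # earlier steps go to the front
    return value, policy


# Toy problem (an assumption for illustration): walk right along 0..3,
# earning the index of the state reached at each step.
states = [0, 1, 2, 3]
actions = ["stay", "step"]


def transition(s, a):
    return min(s + 1, 3) if a == "step" else s


def reward(s, a):
    return float(transition(s, a))


value, policy = backward_induction(states, actions, transition, reward, horizon=2)
```

The recursion touches each state-action pair once per step, so its cost grows linearly with the horizon rather than with the number of possible action sequences.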