124 research outputs found

    Data abstractions for decision tree induction

    Abstract. When descriptions of data values in a database are too concrete or too detailed, the computational complexity of discovering useful knowledge from the database generally increases. Furthermore, the discovered knowledge tends to become complicated. Data abstraction seems useful for resolving this kind of problem: after abstraction we obtain a smaller and more general database, from which we can quickly extract more abstract knowledge that is expected to be easier to understand. In general, however, several abstractions are possible, so we have to carefully select the one according to which the original database is generalized. An inadequate selection would reduce the accuracy of the extracted knowledge. From this point of view, we propose in this paper a method of selecting an appropriate abstraction from the possible ones, assuming that our task is to construct a decision tree from a relational database. Suppose that, for each attribute in a relational database, we have a class of possible abstractions for the attribute values. As an appropriate abstraction for each attribute, we prefer one such that, even after the abstraction, the distribution of target classes necessary to perform our classification task is preserved within an acceptable error range given by the user. With the selected abstractions, the original database can be transformed into a small generalized database written in abstract values. Therefore, we can expect to construct from the generalized database a decision tree that is much smaller than one constructed from the original database. Furthermore, such a size reduction can be justified under some theoretical assumptions. The appropriateness of an abstraction is precisely defined in terms of standard information theory. Therefore, we call our abstraction framework Information Theoretical Abstraction. We show some experimental results obtained by a system ITA that is an implementation of our abstraction method. Those results verify that our method is very effective in reducing the size of the resulting decision tree without making classification errors much worse.
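
    The selection idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so the code below assumes one plausible reading: an abstraction is acceptable when the mutual information between the attribute and the target class drops by no more than a user-given threshold, and among acceptable abstractions the one with the fewest abstract values is preferred. The function names, the `max_loss` parameter, and the toy data are all hypothetical and are not taken from the ITA system.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(values, classes):
    """I(A; C) = H(C) - H(C | A) for one attribute column and the class column."""
    n = len(values)
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(v, []).append(c)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(classes) - conditional


def select_abstraction(values, classes, candidates, max_loss):
    """Pick the coarsest candidate whose information loss is within max_loss.

    ASSUMPTION: "loss" is the drop in mutual information with the class;
    the abstract only says the criterion is information-theoretic.
    Returns (name, number_of_abstract_values, loss) or None.
    """
    base = mutual_information(values, classes)
    best = None
    for name, mapping in candidates.items():
        abstracted = [mapping[v] for v in values]
        loss = base - mutual_information(abstracted, classes)
        size = len(set(abstracted))
        if loss <= max_loss and (best is None or size < best[1]):
            best = (name, size, loss)
    return best


# Hypothetical toy data: one attribute column and the target class column.
values = ["sparrow", "eagle", "salmon", "trout", "sparrow", "salmon"]
classes = ["fly", "fly", "swim", "swim", "fly", "swim"]
candidates = {
    "by_kind": {"sparrow": "bird", "eagle": "bird",
                "salmon": "fish", "trout": "fish"},
    "all_one": {v: "animal" for v in set(values)},
}

print(select_abstraction(values, classes, candidates, max_loss=0.05))
```

    In this toy example, grouping by kind keeps the class distribution intact, while collapsing everything into one value loses all class information, so only the first grouping passes a tight threshold.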

    Discovery of hidden correlations in a local transaction database based on differences of correlations

    Abstract. Given a transaction database as a global set of transactions and a sub-database of it regarded as a local one, we consider pairs of itemsets whose degrees of correlation are higher in the local database than in the global one. If a pair shows high correlation in the local database, it is detectable by search methods from previous studies. On the other hand, there exists another kind of paired itemsets that are not regarded as characteristic and cannot be found by the methods of previous studies, but whose degrees of correlation become drastically higher when conditioned on the local database. We pay particular attention to this latter kind, as such pairs can be implicit, hidden evidence that something particular to the local database is occurring, even though they are not yet recognized as characteristic. From this viewpoint, we measure paired itemsets by the difference between two correlations, before and after conditioning on the local database, and define the notion of DC pairs, whose differences of correlations are high. As the measure is non-monotonic, we present an algorithm that searches for DC pairs using some new pruning rules for cutting off hopeless itemsets. We show experimentally that potentially significant DC pairs can actually be found in a given database and that the algorithm successfully detects such DC pairs.
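
    The measure described in this abstract can be sketched as follows. The abstract does not name the correlation measure, so the code assumes lift, P(X and Y) / (P(X) P(Y)), purely for illustration. The function names and the toy transactions are hypothetical, and the pruning rules mentioned in the abstract are not reproduced here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, x, y):
    """Lift of itemsets x and y. ASSUMPTION: lift stands in for the paper's
    correlation measure, which the abstract does not specify."""
    sx, sy = support(transactions, x), support(transactions, y)
    if sx == 0 or sy == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / (sx * sy)


def correlation_difference(global_db, local_db, x, y):
    """Correlation after conditioning on the local database minus the
    correlation in the global database."""
    return lift(local_db, x, y) - lift(global_db, x, y)


# Hypothetical toy data: the local database is a subset of the global one.
local_db = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
global_db = local_db + [{"a"}, {"b"}, {"a", "c"}, {"b", "c"}]

print(correlation_difference(global_db, local_db, {"a"}, {"b"}))
```

    Here items a and b are uncorrelated in the global database under the assumed measure but co-occur in every local transaction that contains either of them, so the difference is positive.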

    Novel Automated Blood Separations Validate Whole Cell Biomarkers

    Progress in clinical trials in infectious disease, autoimmunity, and cancer is stymied by a dearth of successful whole cell biomarkers for peripheral blood lymphocytes (PBLs). Successful biomarkers could help to track drug effects at early time points in clinical trials to prevent costly trial failures late in development. One major obstacle is the inaccuracy of Ficoll density centrifugation, the decades-old method of separating PBLs from the abundant red blood cells (RBCs) of fresh blood samples. To replace the Ficoll method, we developed and studied a novel blood-based magnetic separation method. The magnetic method strikingly surpassed Ficoll in viability, purity and yield of PBLs. To reduce labor, we developed an automated platform and compared two magnet configurations for cell separations. These more accurate and labor-saving magnet configurations allowed the lymphocytes to be tested in bioassays for rare antigen-specific T cells. The automated method succeeded at identifying 79% of patients with the rare PBLs of interest as compared with Ficoll's uniform failure. We validated improved upfront blood processing and show accurate detection of rare antigen-specific lymphocytes. Improving, automating and standardizing lymphocyte detections from whole blood may facilitate development of new cell-based biomarkers for human diseases. Improved upfront blood processes may lead to broad improvements in monitoring early trial outcome measurements in human clinical trials.

    Multimodality imaging to identify lipid-rich coronary plaques and predict periprocedural myocardial injury: Association between near-infrared spectroscopy and coronary computed tomography angiography

    Background: This study compares the efficacy of coronary computed tomography angiography (CCTA) and near-infrared spectroscopy intravascular ultrasound (NIRS–IVUS) in patients with significant coronary stenosis for predicting periprocedural myocardial injury during percutaneous coronary intervention (PCI). Methods: We prospectively enrolled 107 patients who underwent CCTA before PCI and performed NIRS–IVUS during PCI. Based on the maximal lipid core burden index for any 4-mm longitudinal segments (maxLCBI4mm) in the culprit lesion, we divided the patients into two groups: lipid-rich plaque (LRP) group (maxLCBI4mm ≥ 400; n = 48) and no-LRP group (maxLCBI4mm < 400; n = 59). Periprocedural myocardial injury was a postprocedural cardiac troponin T (cTnT) elevation of ≥5 times the upper limit of normal. Results: The LRP group had a significantly higher cTnT (p = 0.026), lower CT density (p < 0.001), larger percentage atheroma volume (PAV) by NIRS–IVUS (p = 0.036), and larger remodeling index measured by both CCTA (p = 0.020) and NIRS–IVUS (p < 0.001). A significant negative linear correlation was found between maxLCBI4mm and CT density (rho = −0.552, p < 0.001). Multivariable logistic regression analysis identified maxLCBI4mm [odds ratio (OR): 1.006, p = 0.003] and PAV (OR: 1.125, p = 0.014) as independent predictors of periprocedural myocardial injury, while CT density was not an independent predictor (OR: 0.991, p = 0.22). Conclusion: CCTA and NIRS–IVUS correlated well to identify LRP in culprit lesions. However, NIRS–IVUS was more competent in predicting the risk of periprocedural myocardial injury.