82 research outputs found

    Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules

    Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as "support" or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form "any dataset in which this first rule holds must obey also that second rule, therefore the second is redundant". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss correspond actually to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. We explore finally an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted paper.
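    The two parameters named in the abstract above, support and confidence, can be illustrated with a minimal sketch. This is not code from the paper: the function names `support` and `confidence`, the toy transactions, and the convention of returning 0.0 when no transaction contains the antecedent are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Empirical conditional probability of the consequent given the antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    covered = sum(a <= t for t in transactions)
    if covered == 0:
        return 0.0  # assumed convention: the rule has no evidence at all
    return sum(both <= t for t in transactions) / covered


# Hypothetical toy dataset: each transaction is a set of items (binary variables set to true).
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
)]

print(support(["a", "b"], transactions))       # share of transactions holding both a and b
print(confidence(["a"], ["b"], transactions))  # share of a-transactions that also hold b
```

    A rule is then kept when its confidence (and, if used, its support) reaches the chosen lower bound; the redundancy notions studied in the paper concern which such rules are entailed by others.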

    Non-stationary covariance function modelling in 2D least-squares collocation

    Standard least-squares collocation (LSC) assumes 2D stationarity and 3D isotropy, and relies on a covariance function to account for spatial dependence in the observed data. However, the assumption that the spatial dependence is constant throughout the region of interest may sometimes be violated. Assuming a stationary covariance structure can result in over-smoothing of, e.g., the gravity field in mountains and under-smoothing in great plains. We introduce the kernel convolution method from spatial statistics for non-stationary covariance structures, and demonstrate its advantage for dealing with non-stationarity in geodetic data. We then compared stationary and non-stationary covariance functions in 2D LSC to the empirical example of gravity anomaly interpolation near the Darling Fault, Western Australia, where the field is anisotropic and non-stationary. The results with non-stationary covariance functions are better than standard LSC in terms of formal errors and cross-validation against data not used in the interpolation, demonstrating that the use of non-stationary covariance functions can improve upon standard (stationary) LSC.

    InterPro: the integrative protein signature database

    The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).

    InterPro in 2011: new developments in the family and domain prediction database

    InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interface

    Impact of renal impairment on atrial fibrillation: ESC-EHRA EORP-AF Long-Term General Registry

    Background: Atrial fibrillation (AF) and renal impairment share a bidirectional relationship with important pathophysiological interactions. We evaluated the impact of renal impairment in a contemporary cohort of patients with AF. Methods: We utilised the ESC-EHRA EORP-AF Long-Term General Registry. Outcomes were analysed according to renal function by CKD-EPI equation. The primary endpoint was a composite of thromboembolism, major bleeding, acute coronary syndrome and all-cause death. Secondary endpoints were each of these separately including ischaemic stroke, haemorrhagic event, intracranial haemorrhage, cardiovascular death and hospital admission. Results: A total of 9306 patients were included. The proportions of patients with no, mild, moderate and severe renal impairment at baseline were 16.9%, 49.3%, 30% and 3.8%, respectively. AF patients with impaired renal function were older, more likely to be female, had worse cardiac imaging parameters and multiple comorbidities. Among patients with an indication for anticoagulation, prescription of these agents was reduced in those with severe renal impairment, p < .001. Over 24 months, impaired renal function was associated with significantly greater incidence of the primary composite outcome and all secondary outcomes. Multivariable Cox regression analysis demonstrated an inverse relationship between eGFR and the primary outcome (HR 1.07 [95% CI, 1.01–1.14] per 10 ml/min/1.73 m² decrease), that was most notable in patients with eGFR <30 ml/min/1.73 m² (HR 2.21 [95% CI, 1.23–3.99] compared to eGFR ≥90 ml/min/1.73 m²). Conclusion: A significant proportion of patients with AF suffer from concomitant renal impairment which impacts their overall management. Furthermore, renal impairment is an independent predictor of major adverse events including thromboembolism, major bleeding, acute coronary syndrome and all-cause death in patients with AF.

    Clinical complexity and impact of the ABC (Atrial fibrillation Better Care) pathway in patients with atrial fibrillation: a report from the ESC-EHRA EURObservational Research Programme in AF General Long-Term Registry

    Background: Clinical complexity is increasingly prevalent among patients with atrial fibrillation (AF). The ‘Atrial fibrillation Better Care’ (ABC) pathway approach has been proposed to streamline a more holistic and integrated approach to AF care; however, there are limited data on its usefulness among clinically complex patients. We aim to determine the impact of ABC pathway in a contemporary cohort of clinically complex AF patients. Methods: From the ESC-EHRA EORP-AF General Long-Term Registry, we analysed clinically complex AF patients, defined as the presence of frailty, multimorbidity and/or polypharmacy. A K-medoids cluster analysis was performed to identify different groups of clinical complexity. The impact of an ABC-adherent approach on major outcomes was analysed through Cox-regression analyses and delay of event (DoE) analyses. Results: Among 9966 AF patients included, 8289 (83.1%) were clinically complex. Adherence to the ABC pathway in the clinically complex group reduced the risk of all-cause death (adjusted HR [aHR]: 0.72, 95%CI 0.58–0.91), major adverse cardiovascular events (MACEs; aHR: 0.68, 95%CI 0.52–0.87) and composite outcome (aHR: 0.70, 95%CI: 0.58–0.85). Adherence to the ABC pathway was associated with a significant reduction in the risk of death (aHR: 0.74, 95%CI 0.56–0.98) and composite outcome (aHR: 0.76, 95%CI 0.60–0.96) also in the high-complexity cluster; similar trends were observed for MACEs. In DoE analyses, an ABC-adherent approach resulted in significant gains in event-free survival for all the outcomes investigated in clinically complex patients. Based on absolute risk reduction at 1 year of follow-up, the number needed to treat for ABC pathway adherence was 24 for all-cause death, 31 for MACEs and 20 for the composite outcome. Conclusions: An ABC-adherent approach reduces the risk of major outcomes in clinically complex AF patients. Ensuring adherence to the ABC pathway is essential to improve clinical outcomes among clinically complex AF patients.

    Impact of clinical phenotypes on management and outcomes in European atrial fibrillation patients: a report from the ESC-EHRA EURObservational Research Programme in AF (EORP-AF) General Long-Term Registry

    Background: Epidemiological studies in atrial fibrillation (AF) illustrate that clinical complexity increases the risk of major adverse outcomes. We aimed to describe European AF patients’ clinical phenotypes and analyse the differential clinical course. Methods: We performed a hierarchical cluster analysis based on Ward’s Method and Squared Euclidean Distance using 22 clinical binary variables, identifying the optimal number of clusters. We investigated differences in clinical management, use of healthcare resources and outcomes in a cohort of European AF patients from a Europe-wide observational registry. Results: A total of 9363 patients were available for this analysis. We identified three clusters: Cluster 1 (n = 3634; 38.8%) characterized by older patients and prevalent non-cardiac comorbidities; Cluster 2 (n = 2774; 29.6%) characterized by younger patients with low prevalence of comorbidities; Cluster 3 (n = 2955; 31.6%) characterized by patients’ prevalent cardiovascular risk factors/comorbidities. Over a mean follow-up of 22.5 months, Cluster 3 had the highest rate of cardiovascular events, all-cause death, and the composite outcome (combining the previous two) compared to Cluster 1 and Cluster 2 (all P < .001). An adjusted Cox regression showed that compared to Cluster 2, Cluster 3 (hazard ratio (HR) 2.87, 95% confidence interval (CI) 2.27–3.62; HR 3.42, 95%CI 2.72–4.31; HR 2.79, 95%CI 2.32–3.35), and Cluster 1 (HR 1.88, 95%CI 1.48–2.38; HR 2.50, 95%CI 1.98–3.15; HR 2.09, 95%CI 1.74–2.51) reported a higher risk for the three outcomes respectively. Conclusions: In European AF patients, three main clusters were identified, differentiated by differential presence of comorbidities. Both non-cardiac and cardiac comorbidities clusters were found to be associated with an increased risk of major adverse outcomes.

    The Compact Linear Collider (CLIC) - 2018 Summary Report


    Elements About Exploratory, Knowledge-Based, Hybrid, and Explainable Knowledge Discovery

    Knowledge Discovery in Databases (KDD) and especially pattern mining can be interpreted along several dimensions, namely data, knowledge, problem-solving and interactivity. These dimensions are not disconnected and have a direct impact on the quality, applicability, and efficiency of KDD. Accordingly, we discuss some objectives of KDD based on these dimensions, namely exploration, knowledge orientation, hybridization, and explanation. The data space and the pattern space can be explored in several ways, depending on specific evaluation functions and heuristics, possibly related to domain knowledge. Furthermore, numerical data are complex and supervised numerical machine learning methods are usually the best candidates for efficiently mining such data. However, the work and output of numerical methods are most of the time hard to understand, while symbolic methods are usually more intelligible. This calls for hybridization, combining numerical and symbolic mining methods to improve the applicability and interpretability of KDD. Moreover, suitable explanations about the operating models and possible subsequent decisions should complete KDD, and this is far from being the case at the moment. For illustrating these dimensions and objectives, we analyze a concrete case about the mining of biological data, where we characterize these dimensions and their connections. We also discuss dimensions and objectives in the framework of Formal Concept Analysis and we draw some perspectives for future research.