Using Visualization to Support Data Mining of Large Existing Databases
In this paper, we present ideas on how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process, but also for evaluating query results and, thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to their relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback through the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may more easily understand the overall result. To support complex queries, we introduce the notion of approximate joins, which allow the user to find data items that only approximately fulfill join conditions. We also present ideas on how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database.
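The relevance-based coloring described in this abstract can be illustrated with a minimal sketch. It is not the VisDB implementation: the function names, the single-attribute range query, the distance measure, and the grayscale mapping are all assumptions made for illustration, and the abstract does not specify how pixels are actually arranged on screen.

```python
from typing import List, Tuple


def distance_to_range(value: float, low: float, high: float) -> float:
    """Return 0 if value lies in [low, high], else the gap to the nearest bound.

    Assumption: relevance is modeled as distance from a one-attribute range query.
    """
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def rank_by_relevance(
    values: List[float], low: float, high: float
) -> List[Tuple[float, float, int]]:
    """Return (value, distance, gray) tuples, most relevant first.

    gray is a hypothetical 0-255 brightness: 255 for items that fulfill the
    query exactly, 0 for the least relevant item in the data set.
    """
    distances = [distance_to_range(v, low, high) for v in values]
    max_d = max(distances) if distances else 0.0
    ranked = []
    for v, d in zip(values, distances):
        gray = 255 if max_d == 0 else round(255 * (1 - d / max_d))
        ranked.append((v, d, gray))
    # Stable sort: items with equal distance keep their input order.
    ranked.sort(key=lambda item: item[1])
    return ranked


if __name__ == "__main__":
    for value, dist, gray in rank_by_relevance([5, 12, 20, 8], low=6, high=10):
        print(value, dist, gray)
```

Items that only approximately fulfill the query still appear in the ranking, just further from the front and in a darker shade, which is the property the abstract relies on for approximate results.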
Machine Learning Framework to Identify Individuals at Risk of Rapid Progression of Coronary Atherosclerosis: From the PARADIGM Registry.
Background Rapid coronary plaque progression (RPP) is associated with incident cardiovascular events. To date, no method exists for the identification of individuals at risk of RPP at a single point in time. This study integrated coronary computed tomography angiography-determined qualitative and quantitative plaque features within a machine learning (ML) framework to determine its performance for predicting RPP. Methods and Results Qualitative and quantitative coronary computed tomography angiography plaque characterization was performed in 1083 patients who underwent serial coronary computed tomography angiography from the PARADIGM (Progression of Atherosclerotic Plaque Determined by Computed Tomographic Angiography Imaging) registry. RPP was defined as an annual progression of percentage atheroma volume ≥1.0%. We employed the following ML models: model 1, clinical variables; model 2, model 1 plus qualitative plaque features; model 3, model 2 plus quantitative plaque features. ML models were compared with the atherosclerotic cardiovascular disease risk score, Duke coronary artery disease score, and a logistic regression statistical model. In total, 224 patients (21%) were identified as having RPP. Feature selection in ML identified quantitative computed tomography variables as the higher-ranking features, followed by qualitative computed tomography variables and clinical/laboratory variables. ML model 3 exhibited the highest discriminatory performance to identify individuals who would experience RPP when compared with the atherosclerotic cardiovascular disease risk score, the other ML models, and the statistical model (area under the receiver operating characteristic curve in ML model 3, 0.83 [95% CI 0.78-0.89], versus atherosclerotic cardiovascular disease risk score, 0.60 [0.52-0.67]; Duke coronary artery disease score, 0.74 [0.68-0.79]; ML model 1, 0.62 [0.55-0.69]; ML model 2, 0.73 [0.67-0.80]; all P<0.001; statistical model, 0.81 [0.75-0.87], P=0.128).
Conclusions Based on an ML framework, quantitative atherosclerosis characterization has been shown to be the most important feature when compared with clinical, laboratory, and qualitative measures in identifying patients at risk of RPP.
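The RPP definition in this abstract (annual progression of percentage atheroma volume ≥1.0%) can be written as a small labeling rule. This is a minimal sketch, not the study's code: the function names are hypothetical, and dividing the change between two scans by the interval in years is an assumed way to annualize, since the abstract gives only the threshold.

```python
def annual_pav_change(
    pav_baseline: float, pav_followup: float, years_between: float
) -> float:
    """Annualized change in percentage atheroma volume (percentage points per year).

    Assumption: simple linear annualization between two scans.
    """
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return (pav_followup - pav_baseline) / years_between


def is_rapid_progression(
    pav_baseline: float,
    pav_followup: float,
    years_between: float,
    threshold: float = 1.0,
) -> bool:
    """Label a patient as RPP when annualized PAV change meets the threshold."""
    return annual_pav_change(pav_baseline, pav_followup, years_between) >= threshold


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the rule.
    print(is_rapid_progression(10.0, 13.0, 2.0))
    print(is_rapid_progression(10.0, 11.0, 2.0))
```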