An intelligent assistant for exploratory data analysis
In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships in a data set rather than simply develop predictive models for selected variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction, and model selection using a genetic algorithm.
Bibliometric Perspectives on Medical Innovation using the Medical Subject Headings (MeSH) of PubMed
Multiple perspectives on the nonlinear processes of medical innovations can
be distinguished and combined using the Medical Subject Headings (MeSH) of the
Medline database. Focusing on three main branches-"diseases," "drugs and
chemicals," and "techniques and equipment"-we use base maps and overlay
techniques to investigate the translations and interactions and thus to gain a
bibliometric perspective on the dynamics of medical innovations. To this end,
we first analyze the Medline database, the MeSH index tree, and the various
options for a static mapping from different perspectives and at different
levels of aggregation. Following a specific innovation (RNA interference) over
time, the notion of a trajectory which leaves a signature in the database is
elaborated. Can the detailed index terms describing the dynamics of research be
used to predict the diffusion dynamics of research results? Possibilities are
specified for further integration between the Medline database, on the one
hand, and the Science Citation Index and Scopus (containing citation
information), on the other. Comment: forthcoming in the Journal of the American Society for Information
Science and Technology
Visual and computational analysis of structure-activity relationships in high-throughput screening data
Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.
Using Visualization to Support Data Mining of Large Existing Databases
In this paper, we present ideas on how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process, but also for evaluating query results and, thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to their relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback through the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may more easily understand the overall result. To support complex queries, we introduce the notion of approximate joins, which allow the user to find data items that only approximately fulfill join conditions. We also present ideas on how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database.
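The idea of arranging and coloring pixels by query relevance can be illustrated with a small sketch. This is not the VisDB implementation: the distance measure, the linear normalization, the grayscale mapping, and the names `distance_to_range` and `rank_by_relevance` are all assumptions chosen for illustration. It scores each data item by how far it falls outside a one-dimensional range query, converts that distance into a relevance between 0 and 1, and orders the items so that the most relevant ones would be drawn first.

```python
from typing import List, Tuple


def distance_to_range(value: float, low: float, high: float) -> float:
    """Return 0 if low <= value <= high, else the gap to the nearest bound."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def rank_by_relevance(
    values: List[float], low: float, high: float
) -> List[Tuple[float, float, int]]:
    """Rank items for a range query and assign each a gray level.

    Returns (value, relevance, gray) tuples sorted from most to least
    relevant. Relevance is 1.0 for items that satisfy the query and
    falls linearly to 0.0 for the item farthest from the range
    (an assumed normalization, not taken from the paper).
    """
    distances = [distance_to_range(v, low, high) for v in values]
    max_d = max(distances, default=0.0)
    ranked = []
    for v, d in zip(values, distances):
        relevance = 1.0 if max_d == 0 else 1.0 - d / max_d
        gray = round(255 * relevance)  # brighter pixel = more relevant
        ranked.append((v, relevance, gray))
    # Most relevant items come first, as they would be placed first on screen.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Items that only approximately satisfy the condition still receive a nonzero relevance, which is the same intuition behind showing approximate answers rather than discarding them outright.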