62,121 research outputs found

    Lightweight XML-based query, integration and visualization of distributed, multimodality brain imaging data

    Many neuroimaging researchers need to integrate multimodality brain data that may be stored in separate databases. To address this need we have developed a framework that provides a uniform XML-based query interface across multiple online data sources. The development of this framework is driven by the need to integrate neurosurgical and neuroimaging data related to language. The data sources for the language studies are 1) a web-accessible relational database of neurosurgical cortical stimulation mapping (CSM) data that includes patient-specific 3-D coordinates of each stimulation site mapped to an MRI reconstruction of the patient brain surface; and 2) an XML database of fMRI and structural MRI data and analysis results, created automatically by a batch program we have embedded in SPM. To make these sources available for querying, each is wrapped as an XML view embedded in a web service. A top-level web application accepts distributed XQueries over the sources, which are dispatched to the underlying web services. Returned results can be displayed as XML, HTML, CSV (Excel format), a 2-D schematic of a parcellated brain, or a 3-D brain visualization. In the last case the CSM patient-specific coordinates returned by the query are sent to a transformation web service for conversion to normalized space, after which they are sent to our 3-D visualization program MindSeer, which is accessed via Java WebStart through a generated link. The anatomical distribution of pooled CSM sites can then be visualized using various surfaces derived from brain atlases. As this framework is further developed and generalized we believe it will have appeal for researchers who wish to query, integrate and visualize results across their own databases as well as those of collaborators.

    Characterizing Search Behavior in Productivity Software

    Complex software applications expose hundreds of commands to users through intricate menu hierarchies. One of the most popular productivity software suites, Microsoft Office, has recently developed functionality that allows users to issue free-form text queries to a search system to quickly find commands they want to execute, retrieve help documentation or access web results in a unified interface. In this paper, we analyze millions of search sessions originating from within Microsoft Office applications, collected over one month of activity, in an effort to characterize search behavior in productivity software. Our research brings together previous efforts in analyzing command usage in large-scale applications and efforts in understanding search behavior in environments other than the web. Our findings show that users engage primarily in command search, and that re-accessing commands through search is a frequent behavior. Our work represents the first large-scale analysis of search over command spaces and is an important first step in understanding how search systems integrated with productivity software can be successfully developed.

    Querying Streaming System Monitoring Data for Enterprise System Anomaly Detection

    The need for countering Advanced Persistent Threat (APT) attacks has led to solutions that ubiquitously monitor system activities in each enterprise host and perform timely abnormal system behavior detection over the stream of monitoring data. However, existing stream-based solutions lack explicit language constructs for expressing anomaly models that capture abnormal system behaviors, and thus face challenges in incorporating expert knowledge to perform timely anomaly detection over the large-scale monitoring data. To address these limitations, we build SAQL, a novel stream-based query system that takes as input a real-time event feed aggregated from multiple hosts in an enterprise, and provides an anomaly query engine that queries the event feed to identify abnormal behaviors based on the specified anomaly models. SAQL provides a domain-specific query language, Stream-based Anomaly Query Language (SAQL), that uniquely integrates critical primitives for expressing major types of anomaly models. In the demo, we aim to show the complete usage scenario of SAQL by (1) performing an APT attack in a controlled environment, and (2) using SAQL to detect the abnormal behaviors in real time by querying the collected stream of system monitoring data that contains the attack traces. The audience will have the option to interact with the system and detect the attack footprints in real time by issuing queries and checking the query results through a command-line UI. Comment: Accepted paper at ICDE 2020 demonstrations track. arXiv admin note: text overlap with arXiv:1806.0933

    DataSpread: Unifying Databases and Spreadsheets.

    Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current pane (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first and early prototype of DataSpread, and will give the attendees a sense of the enormous data exploration capabilities offered by unifying spreadsheets and databases.

    Web-based Tools - NED VO Services

    The NASA/IPAC Extragalactic Database (NED) is a thematic, web-based research facility in widespread use by scientists, educators, space missions, and observatory operations for observation planning, data analysis, discovery, and publication of research about objects beyond our Milky Way galaxy. NED is a portal into a systematic fusion of data from hundreds of sky surveys and tens of thousands of research publications. The contents and services span the entire electromagnetic spectrum from gamma rays through radio frequencies, and are continuously updated to reflect the current literature and releases of large-scale sky survey catalogs. NED has been on the Internet since 1990, growing in content, automation and services with the evolution of information technology. NED is the world's largest database of cross-identified extragalactic objects. As of December 2006, the system contains approximately 10 million objects and 15 million multi-wavelength cross-IDs. Over 4 thousand catalogs and published lists covering the entire electromagnetic spectrum have had their objects cross-identified or associated, with fundamental data parameters federated for convenient queries and retrieval. This chapter describes the interoperability of NED services with other components of the Virtual Observatory (VO). Section 1 is a brief overview of the primary NED web services. Section 2 provides a tutorial for using NED services currently available through the NVO Registry. The “name resolver” provides VO portals and related internet services with celestial coordinates for objects specified by catalog identifier (name); any alias can be queried because this service is based on the source cross-IDs established by NED. All major services have been updated to provide output in VOTable (XML) format that can be accessed directly from the NED web interface or using the NVO Registry. These include access to images via SIAP, Cone Search queries, and services providing fundamental, multi-wavelength extragalactic data such as positions, redshifts, photometry and spectral energy distributions (SEDs), and sizes (all with references and uncertainties when available). Section 3 summarizes the advantages of accessing the NED “name resolver” and other NED services via the web to replace the legacy “server mode” custom data structure previously available through a function library provided only in the C programming language. Section 4 illustrates visualization via VOPlot of an SED and the spatial distribution of sources from a NED All-Sky (By Parameters) query. Section 5 describes the new NED Spectral Archive, illustrating how VOTables are being used to standardize the data and metadata as well as the physical units of spectra made available by authors of journal articles and producers of major survey archives; quick-look spectral analysis through convenient interoperability with the SpecView (STScI) Java applet is also shown. Section 6 closes with a summary of the capabilities described herein, which greatly simplify interoperability of NED with other components of the VO, enabling new opportunities for discovery, visualization, and analysis of multi-wavelength data.

    Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

    Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)