
    Efficient Incremental View Maintenance for Data Warehousing

    Data warehousing and on-line analytical processing (OLAP) are essential elements for decision support applications. Since most OLAP queries are complex and are often executed over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue for utilizing materialized views is to maintain the view consistency upon source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation proposes to consider broader types of views than previous work. First, we study views with complex aggregate functions such as variance and regression. Such statistical functions are of great importance in practice. We propose a workarea function model and design a generic framework to tackle incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2. An extensive performance study shows significant performance gains by our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators. Such operators are widely used for OLAP applications and for querying sparse datasets. We demonstrate that the efficient maintenance of views with PIVOT and UNPIVOT operators requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting rules and propagation rules for such operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Extensive performance evaluations reveal the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomicity and dynamicity, concurrency may occur during view maintenance. We propose a generic concurrency control framework to solve such maintenance anomalies. 
This solution extends previous work in that it solves the anomalies under both source data and schema changes and thus achieves full source autonomicity. We have implemented this technique in a data warehouse prototype developed at WPI. The extensive performance study shows that our techniques add little extra overhead to existing concurrent data update processing techniques while allowing for this new functionality.
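
The abstract above does not spell out its workarea function model, so the following is only a minimal sketch of the general idea, not the dissertation's method. It assumes that a non-distributive aggregate such as variance can be maintained incrementally by storing a small work area of distributive components (count, sum, sum of squares) and updating those on each source change. The class name `VarianceWorkarea` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VarianceWorkarea:
    """Hypothetical work area: distributive components from which variance is derived."""
    n: int = 0        # COUNT of rows
    s: float = 0.0    # SUM of values
    ss: float = 0.0   # SUM of squared values

    def insert(self, x: float) -> None:
        # Apply an inserted source row to the stored components.
        self.n += 1
        self.s += x
        self.ss += x * x

    def delete(self, x: float) -> None:
        # Apply a deleted source row; all three components are self-maintainable.
        self.n -= 1
        self.s -= x
        self.ss -= x * x

    def merge(self, other: "VarianceWorkarea") -> "VarianceWorkarea":
        # Combine two partial work areas, e.g. when rolling up groups.
        return VarianceWorkarea(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def variance(self) -> float:
        # Population variance computed from the components: E[x^2] - (E[x])^2.
        if self.n == 0:
            raise ValueError("variance of an empty group is undefined")
        mean = self.s / self.n
        return self.ss / self.n - mean * mean


w = VarianceWorkarea()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    w.insert(value)
print(w.variance())
```

The sum-of-squares form is chosen here because it supports deletions as easily as insertions; it can lose precision on data with a large mean and small spread, so a production system would likely need a more careful formulation.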

    Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure

    As machine learning systems move from computer-science laboratories into the open world, their accountability becomes a high priority problem. Accountability requires deep understanding of system behavior and its failures. Current evaluation methods such as single-score error metrics and confusion matrices provide aggregate views of system performance that hide important shortcomings. Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and for specifying appropriate human oversight and engagement. Characterization of failures and shortcomings is particularly complex for systems composed of multiple machine learned components. For such systems, existing evaluation methods have limited expressiveness in describing and explaining the relationship among input content, the internal states of system components, and final output quality. We present Pandora, a set of hybrid human-machine methods and tools for describing and explaining system failures. Pandora leverages both human and system-generated observations to summarize conditions of system malfunction with respect to the input content and system architecture. We share results of a case study with a machine learning pipeline for image captioning that show how detailed performance views can be beneficial for analysis and debugging.

    Portolan: a Model-Driven Cartography Framework

    Processing large amounts of data to extract useful information is an essential task within companies. To help in this task, visualization techniques have been commonly used due to their capacity to present data in synthesized views that are easier to understand and manage. However, achieving the right visualization display for a data set is a complex cartography process that involves several transformation steps to adapt the (domain) data to the (visualization) data format expected by visualization tools. To maximize the benefits of visualization we propose Portolan, a generic model-driven cartography framework that facilitates the discovery of the data to visualize, the specification of view definitions for that data and the transformations to bridge the gap with the visualization tools. Our approach has been implemented on top of the Eclipse EMF modeling framework and validated on three different use cases.

    Assessing the value of forest landscapes: a choice experiment approach

    Landscape planning and design occupies a major role in forest policy in the UK. Since the 1980s, UK forests have been managed increasingly for multi-purpose objectives, a policy which has been underpinned by international agreements on sustainable forestry. Within this context, there is a need to understand public preferences for forest landscapes in designing policies that meet the needs of multi-purpose forestry. This paper is based on a study to investigate public willingness to pay (WTP) for regular visual and recreational access to a wide variety of generic forest landscapes. A total of thirty-three forest landscapes were investigated, each of which was defined as a combination of the configuration of the planting and the landscape factors. Computer-generated images of each of these landscapes were used to underpin a series of choice experiments conducted as part of a questionnaire survey of over 400 households across Great Britain. The results confirm the importance of landscape in contributing to the social and environmental benefits provided by forests, and suggest that current policies of woodland expansion may generate additional benefits, especially if more woodland is located close to urban populations. The paper concludes by discussing the implications of these results for forest policy across the UK. © AB Academic Publishers 2009

    Exploring user and system requirements of linked data visualization through a visual dashboard approach

    One of the open problems in Semantic Web research is which tools should be provided to users to explore linked data. This is even more urgent now that massive amounts of linked data are being released by governments worldwide. The development of single dedicated visualization applications is increasing, but the problem of exploring unknown linked data to gain a good understanding of what is contained is still open. An effective generic solution must take into account the user’s point of view, their tasks and interaction, as well as the system’s capabilities and the technical constraints the technology imposes. This paper is a first step in understanding the implications of both user and system by evaluating our dashboard-based approach. Though we observe a high user acceptance of the dashboard approach, our paper also highlights technical challenges arising out of complexities involving current infrastructure that need to be addressed while visualising linked data. In light of the findings, guidelines for the development of linked data visualization (and manipulation) are provided.

    An extensible web interface for databases and its application to storing biochemical data

    This paper presents a generic web-based database interface implemented in Prolog. We discuss the advantages of the implementation platform and demonstrate the system's applicability in providing access to integrated biochemical data. Our system exploits two libraries of SWI-Prolog to create a schema-transparent interface within a relational setting. As is expected in declarative programming, the interface was written with minimal programming effort due to the high level of the language and its suitability to the task. We highlight two of Prolog's features that are well suited to the task at hand: term representation of structured documents and the relational nature of Prolog, which facilitates transparent integration of relational databases. Although we developed the system for accessing in-house biochemical and genomic data, the interface is generic and provides a number of extensible features. We describe some of these features with references to our research databases. Finally, we outline an in-house library that facilitates interaction between Prolog and the R statistical package. We describe how it has been employed in the present context to store output from statistical analysis in the database. Comment: Online proceedings of the Joint Workshop on Implementation of Constraint Logic Programming Systems and Logic-based Methods in Programming Environments (CICLOPS-WLPE 2010), Edinburgh, Scotland, U.K., July 15, 201

    From SMART to agent systems development

    In order for agent-oriented software engineering to prove effective, it must use principled notions of agents and enable specification and reasoning, while still considering routes to practical implementation. This paper deals with the issue of individual agent specification and construction, departing from the conceptual basis provided by the SMART agent framework. SMART offers a descriptive specification of an agent architecture but omits consideration of issues relating to construction and control. In response, we introduce two new views to complement SMART: a behavioural specification and a structural specification which, together, determine the components that make up an agent, and how they operate. In this way, we move from abstract agent system specification to practical implementation. These three aspects are combined to create an agent construction model, actSMART, which is then used to define the AgentSpeak(L) architecture in order to illustrate the application of actSMART.

    Public views on the donation and use of human biological samples in biomedical research: a mixed methods study

    Objective: A mixed methods study exploring the UK general public's willingness to donate human biosamples (HBSs) for biomedical research. Setting: Cross-sectional focus groups followed by an online survey. Participants: Twelve focus groups (81 participants) selectively sampled to reflect a range of demographic groups; 1110 survey responders recruited through a stratified sampling method with quotas set on sex, age, geographical location, socioeconomic group and ethnicity. Main outcome measures: (1) Identify participants’ willingness to donate HBSs for biomedical research, (2) explore acceptability towards donating different types of HBSs in various settings and (3) explore preferences regarding use and access to HBSs. Results: 87% of survey participants thought donation of HBSs was important and 75% wanted to be asked to donate in general. Responders who self-reported having some or good knowledge of the medical research process were significantly more likely to want to donate (p<0.001). Reasons why focus group participants saw donation as important included: it was a good way of reciprocating for the medical treatment received; it was an important way of developing drugs and treatments; residual tissue would otherwise go to waste and they or their family members might benefit. The most controversial types of HBSs to donate included: brain post mortem (29% would donate), eyes post mortem (35%), embryos (44%), spare eggs (48%) and sperm (58%). Regarding the use of samples, there were concerns over animal research (34%), research conducted outside the UK (35%), and research conducted by pharmaceutical companies (56%), although education and discussion were found to alleviate such concerns. Conclusions: There is a high level of public support and willingness to donate HBSs for biomedical research. Underlying concerns exist regarding the use of certain types of HBSs and conditions under which they are used. Improved education and more controlled forms of consent for sensitive samples may mitigate such concerns.