
    The Piazza Peer Data Management System


    Structural Summaries as a Core Technology for Efficient XML Retrieval

    The Extensible Markup Language (XML) is extremely popular as a generic markup language for text documents with an explicit hierarchical structure. The different types of XML data found in today’s document repositories, digital libraries, intranets and on the web range from flat text with little meaningful structure to be queried, over truly semistructured data with a rich and often irregular structure, to rather rigidly structured documents with little text that would also fit a relational database system (RDBS). Not surprisingly, various ways of storing and retrieving XML data have been investigated, including native XML systems, relational engines based on RDBSs, and hybrid combinations thereof.

    Over the years a number of native XML indexing techniques have emerged, the most important ones being structure indices and labelling schemes. Structure indices represent the document schema (i.e., the hierarchy of nested tags that occur in the documents) in a compact central data structure so that structural query constraints (e.g., path or tree patterns) can be efficiently matched without accessing the documents. Labelling schemes specify ways to assign unique identifiers, or labels, to the document nodes so that specific relations (e.g., parent/child) between individual nodes can be inferred from their labels alone in a decentralized manner, again without accessing the documents themselves. Since both structure indices and labelling schemes provide compact approximate views on the document structure, we collectively refer to them as structural summaries.

    This work presents new structural summaries that enable highly efficient and scalable XML retrieval in native, relational and hybrid systems. The key contribution of our approach is threefold. (1) We introduce BIRD, a very efficient and expressive labelling scheme for XML, and the CADG, a combined text and structure index, and combine them as two complementary building blocks of the same XML retrieval system. (2) We propose a purely relational variant of BIRD and the CADG, called RCADG, that is extremely fast and scales up to large document collections. (3) We present the RCADG Cache, a hybrid system that enhances the RCADG with incremental query evaluation based on cached results of earlier queries. The RCADG Cache exploits schema information in the RCADG to detect cached query results that can supply some or all matches to a new query with little or no computational and I/O effort. A main-memory cache index ensures that reusable query results are quickly retrieved even in a huge cache.

    Our work shows that structural summaries significantly improve the efficiency and scalability of XML retrieval systems in several ways. Former relational approaches have largely ignored structural summaries; the RCADG shows that these native indexing techniques are equally effective for XML retrieval in RDBSs. BIRD, unlike some other labelling schemes, achieves high retrieval performance with a fairly modest storage overhead. To the best of our knowledge, the RCADG Cache is the only approach to take advantage of structural summaries for effectively detecting query containment or overlap. Moreover, no other XML cache we know of exploits intermediate results that are produced as a by-product during evaluation from scratch. These are valuable cache contents that increase the effectiveness of the cache at no extra computational cost. Extensive experiments quantify the practical benefit of all of the proposed techniques, which amounts to a performance gain of several orders of magnitude compared to various other approaches.
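
    To make the idea of a labelling scheme concrete, the sketch below uses a simple Dewey-style prefix labelling (a generic illustration; BIRD itself uses a different, arithmetic scheme). Each node’s label extends its parent’s label by the node’s sibling position, so parent/child and ancestor/descendant tests reduce to prefix comparisons on labels alone, without accessing the documents.

        # Minimal sketch of a Dewey-style XML labelling scheme (illustrative
        # only; BIRD uses a different, arithmetic labelling). A node's label
        # extends its parent's label by the node's 1-based sibling position,
        # so structural relations are decided from labels alone.

        def label_tree(node, label=(1,)):
            """Assign Dewey labels to a tree of {'tag': ..., 'children': [...]} dicts."""
            labels = {id(node): label}
            for i, child in enumerate(node.get("children", []), start=1):
                labels.update(label_tree(child, label + (i,)))
            return labels

        def is_ancestor(a, b):
            """a is an ancestor of b iff a's label is a proper prefix of b's."""
            return len(a) < len(b) and b[:len(a)] == a

        def is_parent(a, b):
            """a is the parent of b iff a's label is a prefix exactly one level up."""
            return len(b) == len(a) + 1 and b[:len(a)] == a

        # Example document: <book><title/><chapter><section/></chapter></book>
        book = {"tag": "book", "children": [
            {"tag": "title"},
            {"tag": "chapter", "children": [{"tag": "section"}]},
        ]}
        labels = label_tree(book)           # book=(1,), title=(1,1), chapter=(1,2), ...
        chapter = book["children"][1]
        section = chapter["children"][0]
        assert is_parent(labels[id(chapter)], labels[id(section)])
        assert is_ancestor(labels[id(book)], labels[id(section)])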

    Supporting software processes analysis and decision-making using provenance data

    Data provenance can be defined as the description of the origins of a piece of data and the process by which it arrived in a database. Provenance has been used successfully in the health sciences, chemical industries, and scientific computing, areas that require a comprehensive traceability mechanism. Moreover, companies have been increasing the amount of data they collect from their systems and processes, given the dropping cost of memory and storage technologies in recent years. This thesis therefore investigates whether provenance models and techniques can support the analysis of software process executions and data-driven decision-making, given the growing availability of process data provided by companies. A provenance model for software processes was developed and evaluated by experts in the process and provenance areas, together with an approach and supporting tooling for capturing, storing, inferring implicit information from, and visualizing software process provenance data. In addition, a case study using data from industrial processes was conducted to evaluate the approach, with a discussion of several specific analysis and data-driven decision-making possibilities.
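
    Provenance models in this space commonly build on the W3C PROV notions of entities, activities and agents. The sketch below is a hypothetical, minimal illustration (not the thesis’s actual model or tooling) of capturing software process provenance records and inferring implicit information, here the transitive derivation history of an artifact.

        # Hypothetical sketch of PROV-style provenance for a software process
        # (illustrative only; not the thesis's actual model): activities use
        # and generate entities, and implicit, transitive derivations are
        # inferred from the captured records.
        from collections import defaultdict

        used = defaultdict(set)   # activity -> entities it used
        generated = {}            # entity -> activity that generated it

        def record(activity, inputs, outputs):
            """Capture one process step: which artifacts it used and produced."""
            used[activity].update(inputs)
            for out in outputs:
                generated[out] = activity

        def derived_from(entity):
            """Infer all artifacts an entity transitively derives from."""
            ancestors, frontier = set(), {entity}
            while frontier:
                e = frontier.pop()
                # inputs of the step that produced e (empty if e is a source)
                for src in used.get(generated.get(e), ()):
                    if src not in ancestors:
                        ancestors.add(src)
                        frontier.add(src)
            return ancestors

        # Example: a requirements document feeds design, design feeds review.
        record("design", inputs={"requirements.doc"}, outputs={"design.doc"})
        record("review", inputs={"design.doc"}, outputs={"review-report.pdf"})
        print(derived_from("review-report.pdf"))
        # -> {'design.doc', 'requirements.doc'}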

    Situating Open Data

    Open data and its effects on society are always woven into infrastructural legacies, social relations, and the political economy. This raises questions about how our understanding of and engagement with open data shift when we focus on its situated use. To shed light on these questions, Situating Open Data provides several empirical accounts of open data practices, the local implementation of global initiatives, and the development of new open data ecosystems. Drawing on case studies in different countries and contexts, the chapters demonstrate the practices and actors involved in open government data initiatives unfolding within different socio-political settings. The book proposes three recommendations for researchers, policy-makers and practitioners. First, beyond upskilling through data literacy programmes, open data initiatives should be specified through the kinds of data practices and effects they generate. Second, global visions of open data implementation require more studies of the resonances and tensions created in localised initiatives. And third, research into open data ecosystems requires more attention to the histories and legacies of information infrastructures and how these shape who benefits from open data flows. As such, this volume departs from the framing of data as a resource to be deployed. Instead, it proposes a prism of different data practices in different contexts through which to study the social relations, capacities, infrastructural histories and power structures affecting open data initiatives. It is hoped that the contributions collected in Situating Open Data will spark critical reflection about the way open data is locally practiced and implemented. The contributions should be of interest to open data researchers, advocates, and those in or advising government administrations designing and rolling out effective open data initiatives.

    Investigating the factors that influence use of ICTs for citizen engagement in Malawi

    Literature suggests that Malawians are keen to participate: the country has recorded high voter turnouts in elections over recent decades. However, literature also suggests that there is minimal citizen engagement between elections. Elsewhere, Information and Communication Technologies (ICTs) have been used to enhance citizen engagement, but ICT-led citizen engagement is still an emerging field and yet to be explored as an area of research, particularly in Malawi. We thus sought to explore whether the use of ICTs could improve citizen engagement with councils, councillors, and the utility companies that provide water and electricity in Malawi. We developed and deployed an ICT platform called Mzinda, which means “my location” in Malawi’s widely spoken Chichewa language. The platform provided various channels for citizens and duty bearers to engage via SMS, USSD, the web and a mobile application. We sought to understand the factors that influence citizens’ behavioural intention to use an ICT platform to engage. We applied a modified UTAUT model, adding the Attitude and Self-Efficacy constructs, whose omission has, among others, been cited as a limitation of the original UTAUT model. We examined the factor loadings of the six constructs of the modified model to validate content and re-examine the model in the context of citizen engagement using ICTs in Malawi. We found that Attitude and Self-Efficacy were not significant determinants of citizens’ Behavioural Intention to use the ICT platform; however, 75% of Behavioural Intention was explained by Performance Expectancy and Effort Expectancy, as moderated by age and gender. Empirical evidence showed that the responsiveness and actionability of councils and councillors had improved. We also learned that citizens believed that service delivery had improved and that they had more influence over councils, councillors, and the utility companies because of using the ICT platform. We conclude by noting that the improvements in service delivery and the enhanced responsiveness and actionability of councils, councillors and the utility companies were not necessarily the result of the ICT platform alone, but of a combination of ICTs and non-technology mechanisms for engaging stakeholders, such as community campaigns, radio programmes, print media engagement, community meetings and debates. It is evident that ICTs are not a panacea for all citizen engagement problems. This research can be useful to researchers and practitioners in the technology and citizen engagement domains.
    Thesis (MSc) -- Faculty of Science, Department of Computer Science, 202
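
    As a purely hypothetical illustration of this kind of analysis (synthetic data, not the study’s survey or code), the sketch below regresses Behavioural Intention on the Performance Expectancy and Effort Expectancy construct scores and reports R², the share of variance explained that the 75% figure refers to; moderation by age and gender would enter as additional interaction terms.

        # Hypothetical UTAUT-style analysis on synthetic data (illustrative
        # only): regress Behavioural Intention (BI) on Performance Expectancy
        # (PE) and Effort Expectancy (EE) and report R^2, the share of
        # variance in BI explained by the two constructs.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 300                                    # synthetic respondents
        PE = rng.normal(size=n)                    # standardized construct scores
        EE = rng.normal(size=n)
        BI = 0.6 * PE + 0.4 * EE + rng.normal(scale=0.5, size=n)

        X = np.column_stack([np.ones(n), PE, EE])  # intercept + predictors
        beta, *_ = np.linalg.lstsq(X, BI, rcond=None)
        resid = BI - X @ beta
        r2 = 1 - resid.var() / BI.var()
        print(f"coefficients: {beta[1:].round(2)}, R^2 = {r2:.2f}")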

    Assessing Comment Quality in Object-Oriented Languages

    Previous studies have shown that high-quality code comments support developers in software maintenance and program comprehension tasks. However, the semi-structured nature of comments, the several conventions for writing comments, and the lack of quality assessment tools for all aspects of comments make comment evaluation and maintenance a non-trivial problem. To understand what specifies a high-quality comment and to build effective assessment tools, our thesis acquires a multi-perspective view of comments by analyzing (1) the academic support for comment quality assessment, (2) developer commenting practices across languages, and (3) developer concerns about comments.

    Our findings regarding the academic support for assessing comment quality showed that researchers have primarily focused on Java in the last decade, even though the trend of using polyglot environments in software projects is increasing. Similarly, the trend of analyzing specific types of code comments (method comments, or inline comments) is increasing, but studies rarely analyze class comments. We found 21 quality attributes that researchers consider when assessing comment quality, and manual assessment is still the most commonly used technique for assessing them.

    Our analysis of developer commenting practices showed that developers embed a mixed level of detail in class comments, ranging from high-level class overviews to low-level implementation details, across programming languages. They follow style guidelines regarding what information to write in class comments but violate the structure and syntax guidelines. They primarily face problems locating relevant guidelines for writing consistent and informative comments, verifying the adherence of their comments to the guidelines, and evaluating the overall state of comment quality.

    To help researchers and developers build comment quality assessment tools, we contribute: (i) a systematic literature review (SLR) of ten years (2010–2020) of research on assessing comment quality, (ii) a taxonomy of quality attributes used to assess comment quality, (iii) an empirically validated taxonomy of class comment information types from three programming languages, (iv) a multi-programming-language approach to automatically identify the comment information types, (v) an empirically validated taxonomy of comment convention-related questions and recommendations from various Q&A forums, and (vi) a tool to gather discussions from multiple developer sources, such as Stack Overflow and mailing lists.

    Our contributions provide various kinds of empirical evidence: of developers’ interest in reducing effort in the software documentation process, of the limited support developers get in automatically assessing comment quality, and of the challenges they face in writing high-quality comments. This work lays the foundation for future effective comment quality assessment tools and techniques.
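
    To make “automatically identifying comment information types” concrete, the sketch below is a minimal hypothetical baseline (the labels and training sentences are invented for illustration, not the thesis’s validated taxonomy or approach): TF-IDF features with a linear classifier over class comment sentences.

        # Hypothetical baseline for identifying comment information types
        # (illustrative only; the labels and training data are invented, not
        # the thesis's validated taxonomy): TF-IDF features plus a linear
        # classifier over class comment sentences.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Toy training sentences labeled with invented information types.
        sentences = [
            "Represents a bank account with a balance and an owner.",   # summary
            "Call close() before disposing of this object.",            # usage
            "Uses a copy-on-write list internally for thread safety.",  # implementation
            "This class models a customer order.",                      # summary
            "Instantiate via the factory class, never directly.",       # usage
            "Backed by a B-tree index for fast range scans.",           # implementation
        ]
        labels = ["summary", "usage", "implementation"] * 2

        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
        clf.fit(sentences, labels)

        print(clf.predict(["Describes a node in the parse tree.",
                           "Always call init() before use."]))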
