
    Network Analysis on Incomplete Structures

    Over the past decade, networks have become an increasingly popular abstraction for problems in the physical, life, social and information sciences. Network analysis can be used to extract insights into an underlying system from the structure of its network representation. One of the challenges of applying network analysis is the fact that networks do not always have an observed and complete structure. This dissertation focuses on the problem of imputation and/or inference in the presence of incomplete network structures. I propose four novel systems, each of which contains a module that involves the inference or imputation of an incomplete network that is necessary to complete the end task. I first propose EdgeBoost, a meta-algorithm and framework that repeatedly applies a non-deterministic link predictor to improve the efficacy of community detection algorithms on networks with missing edges. On average, EdgeBoost improves the performance of existing algorithms by 7% on artificial data and 17% on ego networks collected from Facebook. The second system, Butterworth, identifies a social network user's topics of interest and automatically generates a set of social feed "rankers" that enable the user to see topic-specific sub-feeds. Butterworth uses link prediction to infer the missing semantics between members of a user's social network in order to detect topical clusters embedded in the network structure. For automatically generated topic lists, Butterworth achieves an average top-10 precision of 78%, compared to a time-ordered baseline of 45%. Next, I propose Dobby, a system for constructing a knowledge graph of user-defined keyword tags. Leveraging a sparse set of labeled edges, Dobby trains a supervised learning algorithm to infer the hypernym relationships between keyword tags. Dobby was evaluated by constructing a knowledge graph of LinkedIn's skills dataset, achieving an average precision of 85% on a set of human-labeled hypernym edges between skills. Lastly, I propose Lobbyback, a system that automatically identifies clusters of documents that exhibit text reuse and generates "prototypes" that represent a canonical version of the text shared between the documents. Lobbyback infers a network structure in a corpus of documents and uses community detection to extract the document clusters.
    PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133443/1/mattburg_1.pd
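
    The abstract describes EdgeBoost as a meta-algorithm rather than a specific implementation. Below is a minimal sketch of the general idea, assuming a Jaccard-coefficient link predictor, a rule that keeps each predicted edge with probability equal to its score, and modularity on the original graph as the criterion for choosing among repeated runs; all three choices are illustrative assumptions, not the dissertation's actual design.

        import random
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities, modularity

        def edgeboost_sketch(G, rounds=10, seed=0):
            rng = random.Random(seed)
            # Score every non-edge with a simple link predictor (assumption: Jaccard).
            scored = [(u, v, p) for u, v, p in nx.jaccard_coefficient(G) if p > 0]
            best_parts, best_q = None, float("-inf")
            for _ in range(rounds):
                H = G.copy()
                # Non-deterministic step: keep each predicted edge with
                # probability equal to its predictor score.
                H.add_edges_from((u, v) for u, v, p in scored if rng.random() < p)
                parts = list(greedy_modularity_communities(H))
                # Evaluate the partition against the *original* graph.
                q = modularity(G, parts)
                if q > best_q:
                    best_parts, best_q = parts, q
            return best_parts

        communities = edgeboost_sketch(nx.karate_club_graph())

    Repeating the non-deterministic augmentation and keeping the best-scoring partition is one plausible way to make a community detector robust to missing edges; the dissertation's own aggregation strategy may differ.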

    Capturing the silences in digital archaeological knowledge

    The availability and accessibility of digital data are increasingly significant in the creation of archaeological knowledge with, for example, multiple datasets being brought together to perform extensive analyses that would not otherwise be possible. However, this makes capturing the silences in those data (what is absent as well as present, what is unknown as well as what is known) a critical challenge for archaeology in terms of the suitability and appropriateness of data for subsequent reuse. This paper reverses the usual focus on knowledge and considers the role of ignorance, the lack of knowledge or nonknowledge, in archaeological data and knowledge creation. Examining aspects of archaeological practice in the light of different dimensions of ignorance, it proposes ways in which the silences, the range of unknowns, can be addressed within a digital environment, and the benefits that may accrue.

    A Review and Analysis of Process at the Nexus of Instructional and Software Design

    This dissertation includes a literature review and a single case analysis at the nexus of instructional design and technology and software development. The purpose of this study is to explore the depth and breadth of educational software design and development processes, and educational software reuse, with the intent of uncovering barriers to software development, software reuse and software replication in educational contexts. First, a thorough review of the academic literature was conducted on a representative sampling of educational technology studies. An examination of a 15-year time period within four representative journals identified 72 studies that addressed educational software to some extent. An additional sampling of the initial results identified 50 of those studies that discussed the software development process. These were further analyzed for evidence of software reuse and replication. The review found a lack of reusable and/or replication-focused reports of instructional software development in educational technology journals, but found some reporting of educational technology reuse and replication in articles outside of educational technology. Based on the analysis, possible reasons for this occurrence are discussed. The author then proposes how a model for conducting and presenting instructional software design and development research based on the constructs of design-based research and cultural-historical activity theory might help mitigate this gap. Finally, the author presents a qualitative analysis of the software development process within a large, design-based educational technology project, using cultural-historical activity theory (CHAT) as a lens. Using CHAT, the author seeks to uncover contradictions between the working worlds of instructional design and technology and software development, with the intent of demonstrating how to mitigate tensions between these systems and ultimately to increase the likelihood of reusable/replicable educational technologies. Findings reveal myriad tensions and social contradictions centered around the translation of instructional goals and requirements into software design and development tasks. Based on these results, the researcher proposes an educational software development framework, called the iterative and integrative instructional software design framework, that may help alleviate these tensions and thus make educational software design and development more productive, transparent, and replicable.

    Retrieval Enhancements for Task-Based Web Search

    The task-based view of web search implies that retrieval should take the user perspective into account. Going beyond merely retrieving the most relevant result set for the current query, the retrieval system should aim to surface results that are actually useful to the task that motivated the query. This dissertation explores how retrieval systems can better understand and support their users' tasks from three main angles. First, we study and quantify search engine user behavior during complex writing tasks, and how task success and behavior are associated in such settings. Second, we investigate search engine queries formulated as questions, and explore patterns in a query log of nearly a billion such queries that may help search engines better support this increasingly prevalent interaction pattern. Third, we propose a novel approach to reranking the search result lists produced by web search engines, taking into account retrieval axioms that formally specify properties of a good ranking.
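
    Axiomatic reranking can be pictured as a set of axioms casting pairwise preference votes over a result list. The sketch below is a minimal illustration under that assumption; the two constraints are simplified stand-ins in the spirit of the TFC1 (term frequency) and LNC1 (length normalization) axioms from the axiomatic IR literature, and neither the axiom set nor the vote-summing aggregation is the dissertation's actual method.

        from collections import Counter

        def tf(term, doc):
            return doc.lower().split().count(term.lower())

        def axiom_more_query_terms(query, a, b):
            # TFC1-style constraint: prefer the document with more query-term hits.
            ta = sum(tf(t, a) for t in query.split())
            tb = sum(tf(t, b) for t in query.split())
            return (ta > tb) - (ta < tb)   # +1, 0, or -1

        def axiom_shorter_doc(query, a, b):
            # LNC1-style constraint: all else equal, prefer the shorter document.
            la, lb = len(a.split()), len(b.split())
            return (la < lb) - (la > lb)

        def axiomatic_rerank(query, docs, axioms):
            wins = Counter()
            for i, a in enumerate(docs):
                for j, b in enumerate(docs):
                    if i != j:
                        # Each axiom casts a pairwise preference vote for a over b.
                        wins[i] += sum(ax(query, a, b) for ax in axioms)
            return [docs[i] for i in sorted(range(len(docs)), key=lambda i: -wins[i])]

        docs = ["web search engines rank results",
                "a long document about search engines and web search and ranking and more",
                "unrelated text"]
        print(axiomatic_rerank("web search", docs,
                               [axiom_more_query_terms, axiom_shorter_doc]))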

    The Essence of Software Engineering

    Software Engineering; Software Development; Software Processes; Software Architectures; Software Management

    Toward an Effective Automated Tracing Process

    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both the forwards and backwards directions, throughout the multiple phases of the project's life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). Research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhausting, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
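
    IR-based trace link retrieval is commonly instantiated as vector-space similarity between artifacts. The sketch below assumes the standard TF-IDF/cosine-similarity setup with a cut-off that produces candidate links for an engineer to verify, mirroring the human-in-the-loop step the abstract describes; the function and variable names are illustrative, and the dissertation's actual retrieval models and enhancements are not reproduced here.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def candidate_trace_links(requirements, code_artifacts, threshold=0.1):
            # Fit one vocabulary over both artifact sets so the vectors live
            # in the same space and cosine similarity is meaningful.
            vec = TfidfVectorizer(stop_words="english")
            matrix = vec.fit_transform(requirements + code_artifacts)
            sims = cosine_similarity(matrix[:len(requirements)],
                                     matrix[len(requirements):])
            # Emit candidate links above the cut-off for human verification.
            return [(i, j, float(sims[i, j]))
                    for i in range(sims.shape[0])
                    for j in range(sims.shape[1])
                    if sims[i, j] >= threshold]

        reqs = ["The system shall encrypt user passwords before storage."]
        code = ["class PasswordEncryptor: hashes and salts user passwords",
                "class ReportPrinter: renders monthly reports"]
        print(candidate_trace_links(reqs, code))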

    Web Search Engines and the Need for Complex Information

    The electronic version of this dissertation does not include the publications. Web search engines have become the primary means of finding information on the Internet. Along with the increasing popularity of these search tools, the areas of their application have grown from simple look-ups to rather complex information needs. Academic interest in search has likewise started to shift from analyzing simple query-and-response patterns to examining more sophisticated activities covering longer time spans. Current search tools do not support these activities as well as they support simple look-up tasks. In particular, support for aggregating results from multiple search queries, taking into account discoveries made along the way and synthesizing them into a newly compiled document, is only in its beginnings, which motivates researchers to develop new tools for supporting such information-seeking tasks. In this dissertation I present the results of empirical research focused on evaluating search engines and developing a theoretical model of the complex search process that can be used to better support this special kind of search with existing search tools. The sub-goals were to: (a) develop a model of complex search; (b) create metrics for the complex search model; (c) distinguish complex search tasks from simple ones and determine whether they can be measured, finding simple metrics to describe their complexity; (d) analyze how differently users behave when carrying out complex search tasks with web search engines; (e) examine the correlation between people's everyday web-usage habits and their search performance; (f) assess how well people estimate, in advance, the difficulty of and effort required by a search task; and (g) determine the influence of gender and age on search performance. It is not the goal of the thesis to implement a new search technology; therefore, performance benchmarks against established systems such as question answering systems are not part of this work. I present a model that decomposes complex web search tasks into a measurable, three-step process. I show the innate characteristics of complex search tasks that make them distinguishable from their less complex counterparts, and showcase an experimentation method for carrying out user studies of complex search. I demonstrate the main steps taken during the development and implementation of the Search-Logger study framework (the technical manifestation of the aforementioned method) to carry out search user studies, and I present the results of user studies carried out with this approach. Finally, I present the development and application of the ATMS (awareness-task-monitor-share) model to improve the support for complex search needs in current web search engines.

    Assessing Comment Quality in Object-Oriented Languages

    Previous studies have shown that high-quality code comments support developers in software maintenance and program comprehension tasks. However, the semi-structured nature of comments, the several conventions for writing them, and the lack of quality assessment tools covering all aspects of comments make comment evaluation and maintenance a non-trivial problem. To understand what specifies a high-quality comment and to build effective assessment tools, this thesis takes a multi-perspective view of comments, approached by analyzing (1) the academic support for comment quality assessment, (2) developer commenting practices across languages, and (3) developer concerns about comments. Our findings regarding the academic support for assessing comment quality show that over the last decade researchers have primarily focused on Java, even though the trend of using polyglot environments in software projects is increasing. Similarly, the trend of analyzing specific types of code comments (such as method comments or inline comments) is increasing, but studies rarely analyze class comments. We found 21 quality attributes that researchers consider when assessing comment quality, and manual assessment is still the most commonly used technique for assessing the various quality attributes. Our analysis of developer commenting practices shows that developers embed a mixed level of detail in class comments, ranging from high-level class overviews to low-level implementation details, across programming languages. They follow style guidelines regarding what information to write in class comments but violate the structure and syntax guidelines. They primarily face problems locating relevant guidelines for writing consistent and informative comments, verifying the adherence of their comments to the guidelines, and evaluating the overall state of comment quality. To help researchers and developers build comment quality assessment tools, we contribute: (i) a systematic literature review (SLR) of ten years (2010-2020) of research on assessing comment quality, (ii) a taxonomy of quality attributes used to assess comment quality, (iii) an empirically validated taxonomy of class comment information types from three programming languages, (iv) a multi-programming-language approach to automatically identify the comment information types, (v) an empirically validated taxonomy of comment convention-related questions and recommendations from various Q&A forums, and (vi) a tool to gather discussions from multiple developer sources, such as Stack Overflow and mailing lists. Our contributions provide various kinds of empirical evidence of developers' interest in reducing effort in the software documentation process, of the limited support developers get in automatically assessing comment quality, and of the challenges they face in writing high-quality comments. This work lays the foundation for future effective comment quality assessment tools and techniques.
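
    One of the listed contributions, automatically identifying comment information types, can be framed as sentence-level text classification. The sketch below assumes a TF-IDF plus logistic-regression pipeline; the labels and training sentences are hypothetical placeholders, and the thesis's actual multi-language approach and features are not reproduced here.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Hypothetical labeled comment sentences (one information type each).
        train_sentences = [
            "Represents a user session and its expiry policy.",   # summary
            "Call close() before discarding the instance.",       # usage
            "TODO: cache the lookup once the API stabilizes.",    # todo
            "Returns None when the key is missing.",              # usage
            "Implements the observer pattern for UI updates.",    # summary
            "FIXME: this breaks on empty input.",                 # todo
        ]
        train_labels = ["summary", "usage", "todo", "usage", "summary", "todo"]

        # Train a simple sentence classifier and label an unseen comment.
        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
        clf.fit(train_sentences, train_labels)
        print(clf.predict(["Invoke start() after configuration is loaded."]))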