Network Analysis on Incomplete Structures
Over the past decade, networks have become an increasingly popular abstraction for problems in the physical, life, social, and information sciences. Network analysis can be used to extract insights into an underlying system from the structure of its network representation. One of the challenges of applying network analysis is that networks do not always have an observed and complete structure. This dissertation focuses on the problem of imputation or inference in the presence of incomplete network structures. I propose four novel systems, each of which contains a module that involves the inference or imputation of an incomplete network, as required to complete the end task.
I first propose EdgeBoost, a meta-algorithm and framework that repeatedly applies a non-deterministic link predictor to improve the efficacy of community detection algorithms on networks with missing edges. On average, EdgeBoost improves the performance of existing algorithms by 7% on artificial data and by 17% on ego networks collected from Facebook. The second system, Butterworth, identifies a social network user's topics of interest and automatically generates a set of social feed "rankers" that let the user see topic-specific sub-feeds. Butterworth uses link prediction to infer the missing semantics between members of a user's social network in order to detect topical clusters embedded in the network structure. For automatically generated topic lists, Butterworth achieves an average top-10 precision of 78%, compared to a time-ordered baseline of 45%. Next, I propose Dobby, a system for constructing a knowledge graph of user-defined keyword tags. Leveraging a sparse set of labeled edges, Dobby trains a supervised learning algorithm to infer hypernym relationships between keyword tags. Dobby was evaluated by constructing a knowledge graph of LinkedIn's skills dataset, achieving an average precision of 85% on a set of human-labeled hypernym edges between skills. Lastly, I propose Lobbyback, a system that automatically identifies clusters of documents that exhibit text reuse and generates "prototypes" that represent a canonical version of the text shared between the documents. Lobbyback infers a network structure over a corpus of documents and uses community detection to extract the document clusters.
PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133443/1/mattburg_1.pd
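The EdgeBoost loop described above lends itself to a compact sketch. The following is a hedged illustration, not the dissertation's implementation: it assumes a Jaccard-coefficient link predictor, uses connected components as a stand-in for a real community detector, and aggregates rounds by majority vote; all three are illustrative choices.

```python
import random
from collections import Counter

def jaccard_scores(adj):
    """Score each non-edge by the Jaccard similarity of its endpoints'
    neighborhoods -- a simple link predictor to sample edges from."""
    nodes = list(adj)
    scores = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v in adj[u]:
                continue  # edge already observed
            union = adj[u] | adj[v]
            if union:
                scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores

def detect_communities(adj):
    """Stand-in community detector: connected components via DFS."""
    seen, comms = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comms.append(comp)
    return comms

def edgeboost(adj, rounds=10, seed=0):
    """Repeatedly add predicted edges with probability equal to their score,
    detect communities on each augmented graph, and keep node pairs grouped
    together in a majority of rounds."""
    rng = random.Random(seed)
    scores = jaccard_scores(adj)
    together = Counter()
    for _ in range(rounds):
        aug = {u: set(nbrs) for u, nbrs in adj.items()}
        for (u, v), s in scores.items():
            if rng.random() < s:
                aug[u].add(v)
                aug[v].add(u)
        for comp in detect_communities(aug):
            for a in comp:
                for b in comp:
                    if a < b:
                        together[(a, b)] += 1
    consensus = {u: {u} for u in adj}
    for (a, b), count in together.items():
        if count > rounds // 2:
            consensus[a].add(b)
            consensus[b].add(a)
    return detect_communities(consensus)
```

On a toy graph with one missing edge (x and z share the neighbor y), the sampled edge reunites the community across rounds, so the consensus step recovers the two intended groups.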
Capturing the silences in digital archaeological knowledge
The availability and accessibility of digital data are increasingly significant in the creation of archaeological knowledge, with, for example, multiple datasets being brought together to perform extensive analyses that would not otherwise be possible. However, this makes capturing the silences in those data (what is absent as well as present, what is unknown as well as what is known) a critical challenge for archaeology in terms of the suitability and appropriateness of data for subsequent reuse. This paper reverses the usual focus on knowledge and considers the role of ignorance (the lack of knowledge, or nonknowledge) in archaeological data and knowledge creation. Examining aspects of archaeological practice in the light of different dimensions of ignorance, it proposes ways in which the silences, the range of unknowns, can be addressed within a digital environment, and the benefits which may accrue.
A Review and Analysis of Process at the Nexus of Instructional and Software Design
This dissertation includes a literature review and a single-case analysis at the nexus of instructional design and technology and software development. The purpose of this study is to explore the depth and breadth of educational software design and development processes, and of educational software reuse, with the intent of uncovering barriers to software development, reuse, and replication in educational contexts. First, a thorough review of the academic literature was conducted on a representative sampling of educational technology studies. An examination of a 15-year time period within four representative journals identified 72 studies that addressed educational software to some extent. An additional analysis of the initial results identified 50 of those studies that discussed the software development process; these were further analyzed for evidence of software reuse and replication. The review found a lack of reusable and/or replication-focused reports of instructional software development in educational technology journals, but found some reporting of educational technology reuse and replication in articles outside of educational technology. Based on the analysis, possible reasons for this occurrence are discussed. The author then proposes how a model for conducting and presenting instructional software design and development research, based on the constructs of design-based research and cultural-historical activity theory, might help mitigate this gap. Finally, the author presents a qualitative analysis of the software development process within a large, design-based educational technology project, using cultural-historical activity theory (CHAT) as a lens.
Using CHAT, the author seeks to uncover contradictions between the working worlds of instructional design and technology and software development, with the intent of demonstrating how to mitigate tensions between these systems and ultimately to increase the likelihood of reusable and replicable educational technologies. Findings reveal myriad tensions and social contradictions centered on the translation of instructional goals and requirements into software design and development tasks. Based on these results, the researcher proposes an educational software development framework, called the iterative and integrative instructional software design framework, that may help alleviate these tensions and thus make educational software design and development more productive, transparent, and replicable.
Retrieval Enhancements for Task-Based Web Search
The task-based view of web search implies that retrieval should take the user perspective into account. Going beyond merely retrieving the most relevant result set for the current query, the retrieval system should aim to surface results that are actually useful to the task that motivated the query.
This dissertation explores how retrieval systems can better understand and support their users' tasks from three main angles: First, we study and quantify search engine user behavior during complex writing tasks, and how task success and behavior are associated in such settings. Second, we investigate search engine queries formulated as questions, and explore patterns in a query log of nearly one billion such queries that may help search engines better support this increasingly prevalent interaction pattern. Third, we propose a novel approach to reranking the search result lists produced by web search engines, taking into account retrieval axioms that formally specify properties of a good ranking.
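The axiomatic reranking idea can be sketched compactly. The two axioms below (query-term coverage and a brevity preference) and the weighted-majority aggregation are illustrative assumptions of mine, not the specific axioms or aggregation scheme used in the dissertation:

```python
from functools import cmp_to_key

# Each axiom compares two documents for a query and returns
# -1 (prefer d1), 1 (prefer d2), or 0 (no preference).

def axiom_term_coverage(query, d1, d2):
    """Prefer the document covering more distinct query terms."""
    q = set(query.split())
    c1, c2 = len(q & set(d1.split())), len(q & set(d2.split()))
    return -1 if c1 > c2 else (1 if c2 > c1 else 0)

def axiom_brevity(query, d1, d2):
    """All else being equal, prefer the shorter document."""
    l1, l2 = len(d1.split()), len(d2.split())
    return -1 if l1 < l2 else (1 if l2 < l1 else 0)

def rerank(query, docs, axioms, weights):
    """Rerank by the weighted majority of axiom preferences per pair."""
    def prefer(d1, d2):
        vote = sum(w * ax(query, d1, d2) for ax, w in zip(axioms, weights))
        return -1 if vote < 0 else (1 if vote > 0 else 0)
    return sorted(docs, key=cmp_to_key(prefer))
```

A document matching all query terms is promoted past a longer partial match, while an entirely off-topic document sinks, regardless of its original rank.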
Towards helping end-user programmers' information foraging by manipulating information features in a patch
Software maintenance tasks often require finding information within existing code, which is time-consuming and difficult even for professional programmers. For example, programmers may need to know what code implements certain functionality or what is the purpose of certain code. In response, researchers have developed tools to help programmers find information during programming tasks. The empirical success of these tools can be explained by Information Foraging Theory (IFT), which predicts how people will seek information by navigating through virtual patches in an information system. In the case of programming, these patches are often chunks of code (e.g., functions), with navigable links for moving among methods. IFT predicts people will perceive cues (such as words or symbols) associated with navigable links, select links that seem relevant to their information needs, and attempt to obtain the needed information by maximizing the rate of information gained relative to the cost of navigating and understanding patches. Many existing tools accelerate foraging by decreasing the cost associated with navigating from one patch to another.
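IFT's rate-of-gain heuristic can be made concrete with a tiny sketch. Everything here (the scent estimate, the gain and cost numbers, the function names) is an illustrative assumption, not a construct from this research:

```python
def information_scent(cues, goal_terms):
    """Crude scent estimate: fraction of goal terms present in a link's cues."""
    goal = set(goal_terms)
    return len(set(cues) & goal) / max(len(goal), 1)

def choose_link(links, goal_terms):
    """Pick the link maximizing expected information gain over cost, the
    rate that IFT predicts a forager tries to maximize.
    Each link is a (cues, expected_gain, cost) triple with cost > 0."""
    def rate(link):
        cues, gain, cost = link
        return information_scent(cues, goal_terms) * gain / cost
    return max(links, key=rate)
```

A link whose cues match the information need wins even at a higher navigation cost, which is exactly the trade-off the prototypes below try to influence by manipulating visual weight.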
IFT suggests that the visual weight of the information features in a patch can have a strong effect on a predator's foraging choices and, consequently, on how well the predator succeeds in maximizing the rate of information gain. In an ideal situation, visual weight efficiently leads the predator to the needed information; if, on the other hand, visual weight leads the predator astray, the predator may process more patches than necessary, increasing cost and reducing the rate of information gain. It is therefore anticipated that increasing the relative weight of important information features with respect to unimportant ones will aid an end-user programmer's foraging effort. Toward this end, two prototypes were implemented, each using an existing algorithm to identify the most important lines of code in a function. One prototype increases the relative weight of important information features by highlighting important lines of code; the other decreases the relative weight of unimportant information features by hiding unimportant lines of code. This research focuses on end-user programmers, who have received minimal attention in prior work.
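The two prototypes' manipulations can be sketched as follows. The keyword-counting importance score is a deliberately trivial stand-in for the existing line-importance algorithm the prototypes actually use, and the text markers are hypothetical renderings of "visual weight":

```python
def score_lines(lines, keywords):
    """Trivial stand-in importance score: how many keywords a line mentions."""
    return [sum(k in line for k in keywords) for line in lines]

def highlight_important(lines, keywords, top=2):
    """Prototype 1: raise the visual weight of the most important lines."""
    scores = score_lines(lines, keywords)
    best = sorted(scores, reverse=True)[:top]
    threshold = min(best) if best else 0
    return [(">> " + line) if s >= threshold and s > 0 else ("   " + line)
            for line, s in zip(lines, scores)]

def hide_unimportant(lines, keywords):
    """Prototype 2: lower the visual weight by folding zero-score lines."""
    scores = score_lines(lines, keywords)
    return [line if s > 0 else "..." for line, s in zip(lines, scores)]
```

The same score drives both manipulations; only the rendering differs, which mirrors the study design of comparing the two against an unmodified baseline.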
An empirical study evaluated the effectiveness of the prototypes relative to a baseline with no information feature modification. The results indicate that increasing the relative weight of important information features by highlighting important statements had a significant effect on both the amount of information foraged and the rate of information gained; decreasing the relative weight of unimportant information features by hiding unimportant statements had a significant effect on the rate of information gained, but not on the amount of information foraged. Neither approach appeared to affect the amount of time spent on information foraging or on patch-to-patch navigation.
The Essence of Software Engineering
Software Engineering; Software Development; Software Processes; Software Architectures; Software Management
Toward an Effective Automated Tracing Process
Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project's life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process.
Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
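The trace link retrieval step of an IR-based tracing process is commonly implemented with TF-IDF vectors and cosine similarity; the following is a minimal sketch under that common assumption (the tokenized corpus, the threshold, and the absence of preprocessing are all illustrative simplifications):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a corpus of tokenized artifacts."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency per term
    n = len(docs)
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def trace_links(requirements, artifacts, threshold=0.1):
    """Rank candidate (requirement, artifact) links by textual similarity;
    the analyst then verifies the candidates above the threshold."""
    vecs = tfidf_vectors(requirements + artifacts)
    rv, av = vecs[:len(requirements)], vecs[len(requirements):]
    links = [(i, j, cosine(r, a))
             for i, r in enumerate(rv)
             for j, a in enumerate(av)
             if cosine(r, a) >= threshold]
    return sorted(links, key=lambda x: -x[2])
```

The threshold illustrates the accuracy problem the dissertation targets: set it low and the analyst wades through spurious candidates; set it high and true links are missed.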
Web Search Engines and the Need for Complex Information
The electronic version of this dissertation does not include the publications.
KÀesolevas dissertatsioonis esitatakse rida uurimistulemusi eesmÀrgiga muuta keeruliste otsingute tuge paremaks kasutades tÀnapÀevaseid otsingumootoreid. AlameesmÀrkideks olid:
(a) arendada vÀlja keeruliste otsingute mudel,
(b) mÔÔdikute loomine kompleksotsingute mudelile,
(c) eristada kompleksotsingu ĂŒlesandeid lihtotsingutest ning teha kindlaks, kas neid on vĂ”imalik mÔÔta leides ĂŒhtlasi lihtsaid mÔÔdikuid kirjeldamaks nende keerukust,
(d) analĂŒĂŒsida, kui erinevalt kasutajad kĂ€ituvad sooritades keerukaid otsinguĂŒlesandeid kasutades veebi otsingumootoreid,
(e) uurida korrelatsiooni inimeste tava-veebikasutustavade ja nende otsingutulemuslikkuse vahel,
(f) kuidas inimestel lĂ€heb eelhinnates otsinguĂŒlesande raskusastet ja vajaminevat jĂ”upingutust ning
(g) milline on soo ja vanuse mÔju otsingu tulemuslikkusele.
Search engines have become the primary means of finding information on the Internet. Along with the increasing popularity of these search tools, the areas of their application have grown from simple look-up to rather complex information needs. Academic interest in search has likewise started to shift from analyzing simple query and response patterns to examining more sophisticated activities covering longer time spans. Current search tools do not support these activities as well as they support simple look-up tasks. In particular, support for aggregating search results from multiple queries, taking into account the discoveries made along the way and synthesizing them into a newly compiled document, is only in its beginnings, and motivates researchers to develop new tools for supporting such information seeking tasks.
In this dissertation I present the results of empirical research with the focus on evaluating search engines and developing a theoretical model of the complex search process that can be used to better support this special kind of search with existing search tools. It is not the goal of the thesis to implement a new search technology. Therefore performance benchmarks against established systems such as question answering systems are not part of this thesis.
I present a model that decomposes complex Web search tasks into a measurable, three-step process. I show the innate characteristics of complex search tasks that make them distinguishable from their less complex counterparts, and showcase an experimentation method for carrying out user studies of complex search. I demonstrate the main steps taken during the development and implementation of the Search-Logger study framework (the technical manifestation of the aforementioned method) to carry out such user studies, and present the results of studies conducted with this approach. Finally, I present the development and application of the ATMS (awareness-task-monitor-share) model to improve the support for complex search needs in current Web search engines.
Assessing Comment Quality in Object-Oriented Languages
Previous studies have shown that high-quality code comments support developers in software maintenance and program comprehension tasks. However, the semi-structured nature of comments, the several conventions for writing them, and the lack of tools for assessing all aspects of comment quality make comment evaluation and maintenance a non-trivial problem. To understand what constitutes a high-quality comment and to build effective assessment tools, this thesis acquires a multi-perspective view of comments by analyzing (1) the academic support for comment quality assessment, (2) developer commenting practices across languages, and (3) developer concerns about comments.
Our findings regarding academic support for assessing comment quality showed that researchers have primarily focused on Java over the last decade, even though the use of polyglot environments in software projects is increasing. Similarly, the trend of analyzing specific types of code comments (method comments or inline comments) is increasing, but studies rarely analyze class comments. We found 21 quality attributes that researchers consider when assessing comment quality, and manual assessment is still the most commonly used technique for assessing the various attributes. Our analysis of developer commenting practices showed that developers embed a mixed level of detail in class comments, ranging from high-level class overviews to low-level implementation details, across programming languages. They follow style guidelines regarding what information to write in class comments but violate structure and syntax guidelines. They primarily face problems locating relevant guidelines for writing consistent and informative comments, verifying the adherence of their comments to the guidelines, and evaluating the overall state of comment quality.
To help researchers and developers build comment quality assessment tools, we contribute: (i) a systematic literature review (SLR) covering ten years (2010-2020) of research on assessing comment quality, (ii) a taxonomy of quality attributes used to assess comment quality, (iii) an empirically validated taxonomy of class comment information types from three programming languages, (iv) a multi-programming-language approach to automatically identify these comment information types, (v) an empirically validated taxonomy of comment convention-related questions and recommendations from various Q&A forums, and (vi) a tool to gather discussions from multiple developer sources, such as Stack Overflow and mailing lists.
Our contributions provide empirical evidence of developers' interest in reducing the effort spent on software documentation, of the limited support developers have for automatically assessing comment quality, and of the challenges they face in writing high-quality comments. This work lays the foundation for future effective comment quality assessment tools and techniques.
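As a toy illustration of automatically identifying comment information types (contribution iv), here is a rule-based sketch; the type names and cue phrases are my illustrative assumptions, and the thesis's empirically derived taxonomy and identification approach are substantially richer:

```python
# Illustrative information types with cue phrases that suggest each type.
CUES = {
    "summary":     ("represents", "implements", "provides"),
    "usage":       ("use ", "call ", "example"),
    "deprecation": ("deprecated", "use instead"),
    "warning":     ("note that", "must not", "careful"),
}

def classify_comment(sentence):
    """Assign every matching information type to a comment sentence,
    falling back to 'other' when no cue phrase matches."""
    s = sentence.lower()
    types = [t for t, cues in CUES.items() if any(c in s for c in cues)]
    return types or ["other"]
```

Because a sentence can match several cues, the classifier is multi-label, which reflects the observation above that class comments mix levels of detail.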