
    Collective intelligence within web video


    Recommending Recommendations to Support the Defense Acquisition Workforce

    Excerpt from the Proceedings of the Nineteenth Annual Acquisition Research Symposium. This paper presents the preliminary results of a research study supporting the Defense Acquisition Workforce with a Natural Language Processing (NLP)/Machine Learning (ML) prototype that determines which of the recommendations stakeholders provide to the Defense Acquisition community are most relevant. The problem addressed by the study lies in the realm of NLP and ML and belongs to the popular category of “recommendation systems.” Unlike most work in this category, though, the task does not focus on numerical data representing behaviors (as in shopping recommendations) but on extracting user-specific relevance from text and “recommending” a document or part of one. Identifying the important pieces of these texts requires subjective text analysis. The method used for the analysis is the “room theory” framework by Lipizzi et al. (2021), which applies the framework theory of Marvin Minsky (1974) through the use of text vectorization. The framework has three main components: a vectorized corpus representing the knowledge base of the specific domain (the “room”), a set of keywords or phrases defining the specific points of interest for the recommendation (the “benchmarks”), and the documents to be analyzed. The documents are vectorized using the “room” and compared to the “benchmarks.” The sentences or paragraphs within a document that are most similar to the benchmarks, and thus presumably its most important parts, are highlighted. This enables DAU reviewers to submit a document, run the program, and see clearly which recommendations will be most useful. Approved for public release; distribution is unlimited.
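
    The abstract describes the pipeline clearly enough to sketch. Below is a minimal illustration of the benchmark-based ranking idea: TF-IDF vectors fitted on a domain corpus stand in for the vectorized “room,” and cosine similarity against the benchmarks scores each sentence. The corpus, benchmarks, and sentences are hypothetical, and the actual framework may use trained domain embeddings rather than TF-IDF.

```python
# A minimal sketch of benchmark-based passage ranking in the spirit of the
# room theory framework; all data below is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Domain corpus (stands in for the "room") and points of interest ("benchmarks").
domain_corpus = [
    "Acquisition programs should streamline contract oversight and reporting.",
    "Stakeholders recommend earlier testing in the acquisition lifecycle.",
    "Workforce training improves acquisition outcomes across programs.",
]
benchmarks = ["recommend contract oversight changes",
              "recommend workforce training investments"]

# Sentences from the document under review.
sentences = [
    "The committee met twice during the fiscal year.",
    "We recommend consolidating contract oversight into a single office.",
    "Expanded workforce training should be funded in the next cycle.",
]

room = TfidfVectorizer().fit(domain_corpus)  # domain-fitted vector space
scores = cosine_similarity(room.transform(sentences),
                           room.transform(benchmarks)).max(axis=1)

# Highlight the sentences most similar to any benchmark.
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {sentences[idx]}")
```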

    Evaluating Human-Language Model Interaction

    Many real-world applications of language models (LMs), such as writing assistance and code autocomplete, involve human-LM interaction. However, most benchmarks are non-interactive: a model produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems and the dimensions to consider when designing evaluation metrics. Compared to standard, non-interactive evaluation, HALIE captures (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality (e.g., enjoyment and ownership). We then design five tasks to cover different forms of interaction: social dialogue, question answering, crossword puzzles, summarization, and metaphor generation. With four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21 Labs' Jurassic-1), we find that better non-interactive performance does not always translate to better human-LM interaction. In particular, we highlight three cases where the results from non-interactive and interactive metrics diverge, underscoring the importance of human-LM interaction for LM evaluation. Comment: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
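
    To make the contrast with non-interactive evaluation concrete, here is a minimal sketch of the kind of interaction trace such an evaluation needs: a session log that preserves the process and first-person ratings alongside the final output. The event schema and process metric are illustrative assumptions, not HALIE's actual data model.

```python
# A sketch of interaction-trace logging in the spirit of HALIE;
# field names and the example metric are illustrative, not the framework's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class InteractionEvent:
    t: float     # seconds since task start
    actor: str   # "user" or "lm"
    action: str  # e.g. "query", "completion", "edit", "accept"
    text: str

@dataclass
class Session:
    events: List[InteractionEvent] = field(default_factory=list)
    enjoyment: int = 0   # first-person survey rating, 1-5
    ownership: int = 0   # perceived ownership of the result, 1-5

    def final_output(self) -> str:
        # Non-interactive evaluation would look only at this.
        return next(e.text for e in reversed(self.events) if e.action == "accept")

    def edits_per_completion(self) -> float:
        # One example of a process metric the final output alone cannot reveal.
        completions = sum(e.action == "completion" for e in self.events)
        edits = sum(e.action == "edit" for e in self.events)
        return edits / completions if completions else 0.0

s = Session(events=[
    InteractionEvent(0.0, "user", "query", "metaphor for persistence"),
    InteractionEvent(1.2, "lm", "completion", "a river carving stone"),
    InteractionEvent(9.8, "user", "edit", "a quiet river carving stone"),
    InteractionEvent(10.4, "user", "accept", "a quiet river carving stone"),
], enjoyment=4, ownership=3)
print(s.final_output(), s.edits_per_completion())
```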

    Understanding in-video dropouts and interaction peaks in online lecture videos

    With thousands of learners watching the same online lecture videos, analyzing video watching patterns provides a unique opportunity to understand how students learn with videos. This paper reports a large-scale analysis of in-video dropout and of peaks in viewership and student activity, using second-by-second user interaction data from 862 videos in four Massive Open Online Courses (MOOCs) on edX. We find higher dropout rates in longer videos, in re-watching sessions (vs. first-time sessions), and in tutorials (vs. lectures). Peaks in re-watching sessions and play events indicate points of interest and confusion. Results show that tutorials (vs. lectures) and re-watching sessions (vs. first-time sessions) lead to more frequent and sharper peaks. In attempting to explain why peaks occur, we sampled 80 videos and observe that 61% of the peaks accompany visual transitions in the video, e.g., from a slide view to a classroom view. Based on this observation, we identify five student activity patterns that can explain peaks: starting from the beginning of new material, returning to missed content, following a tutorial step, replaying a brief segment, and repeating a non-visual explanation. Our analysis has design implications for video authoring, editing, and interface design, providing a richer understanding of video learning on MOOCs.
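
    The two core measurements are straightforward to sketch from per-second viewership counts; the jump threshold below is an illustrative assumption, not a value taken from the paper.

```python
# A minimal sketch of in-video dropout and interaction-peak detection from
# second-by-second viewership counts; the 15% jump threshold is illustrative.
import numpy as np

# watchers[t] = number of sessions playing second t of the video.
watchers = np.array([120, 118, 115, 110, 109, 140, 138, 90, 60, 55, 52, 50])

# Dropout: fraction of the initial audience gone by the final second.
dropout = 1 - watchers[-1] / watchers[0]

# Peaks: seconds where viewership jumps sharply, which the paper
# associates with points of interest or confusion (e.g., re-watching).
rise = np.diff(watchers)
peaks = np.where(rise > 0.15 * watchers[:-1])[0] + 1

print(f"dropout rate: {dropout:.0%}, peaks at seconds: {peaks.tolist()}")
```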

    Exploiting cloud utility models for profit and ruin

    A key characteristic that has led to the early adoption of public cloud computing is the utility pricing model that governs the cost of compute resources consumed. As with public utilities like gas and electricity, cloud consumers pay only for the resources they consume and only for the time those resources are utilized. As a result, and pursuant to a Cloud Service Provider's (CSP) Terms of Agreement, cloud consumers are responsible for all computational costs incurred within and in support of their rented computing environments, whether those resources were consumed in good faith or not. While initial threat modeling and security research on the public cloud model has primarily focused on the confidentiality and integrity of data transferred, processed, and stored in the cloud, little attention has been paid to external threat sources capable of affecting the financial viability of cloud-hosted services. Bounded by a utility pricing model, Internet-facing web resources hosted in the cloud are vulnerable to Fraudulent Resource Consumption (FRC) attacks. Unlike an application-layer DDoS attack, which consumes resources with the goal of disrupting short-term availability, an FRC attack is considerably more subtle and instead targets the utility model over an extended time period. By fraudulently consuming web resources in sufficient volume (i.e., data transferred out of the cloud), an attacker is able to inflict significant fraudulent charges on the victim. This work introduces and thoroughly describes the FRC attack and discusses why current application-layer DDoS mitigation schemes are not applicable to this more subtle attack. It goes on to propose three detection metrics that together form the criteria for distinguishing an FRC attack from normal web activity, and an attribution methodology capable of accurately identifying FRC attack clients. Experimental results based on plausible and challenging attack scenarios show that an attacker, without knowledge of the training web log, has a difficult time mimicking the self-similar and consistent request semantics of normal web activity necessary to carry out a successful FRC attack.
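
    The abstract does not name the three detection metrics, but the underlying idea of checking a client's request semantics against a baseline learned from a training web log can be sketched with a single illustrative signal. The divergence measure, threshold, and data below are assumptions for illustration, not the dissertation's actual metrics.

```python
# One plausible FRC-style detection signal: compare a client's request-frequency
# distribution against a baseline learned from a training web log.
# Illustrative only; not the paper's three metrics.
from collections import Counter
import math

def to_dist(requests):
    counts = Counter(requests)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    # D(p || q): how far the observed distribution drifts from the baseline.
    keys = set(p) | set(q)
    return sum(p.get(k, eps) * math.log(p.get(k, eps) / q.get(k, eps)) for k in keys)

# Baseline page popularity from the training log (normal, self-similar traffic).
baseline = to_dist(["/", "/", "/", "/blog", "/blog", "/about", "/pricing"])

# A normal client roughly mirrors the baseline; an FRC client inflates
# data transfer by hammering large, rarely requested resources.
normal_client = to_dist(["/", "/blog", "/", "/about"])
frc_client = to_dist(["/big.iso"] * 6 + ["/"])

for name, client in [("normal", normal_client), ("frc", frc_client)]:
    score = kl_divergence(client, baseline)
    print(f"{name}: divergence={score:.2f}", "FLAG" if score > 1.0 else "ok")
```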

    Data Mining Techniques to Understand Textual Data

    More than ever, online information delivery and storage rely heavily on text. Billions of texts are produced every day in the form of documents, news, logs, search queries, ad keywords, tags, tweets, messenger conversations, social network posts, and more. Text understanding is a fundamental and essential task spanning broad research topics, and it contributes to many applications in areas such as text summarization, search engines, recommendation systems, online advertising, and conversational bots. However, understanding text is never a trivial task for computers, especially for noisy and ambiguous text such as logs and search queries. This dissertation focuses on textual understanding tasks drawn from two domains, disaster management and IT service management, both of which rely on textual data as their main information carrier. Improving situation awareness in disaster management and reducing the human effort involved in IT service management demand more intelligent and efficient solutions for understanding that textual data. From the perspective of data mining, four directions are identified: (1) intelligently generating a storyline that summarizes the evolution of a hurricane from a relevant online corpus; (2) automatically recommending resolutions based on the textual symptom description in a ticket (sketched below); (3) gradually adapting the resolution recommendation system to time-correlated features derived from text; and (4) efficiently learning distributed representations for short and noisy ticket symptom descriptions and resolutions. The data mining techniques proposed along these four directions successfully address our tasks of understanding and extracting valuable knowledge from different types of textual data. Concretely, this dissertation designs and develops: (1) a storyline generation method for efficient summarization of natural hurricanes based on a crawled online corpus; (2) a recommendation framework for automated ticket resolution in IT service management; (3) an adaptive recommendation system for time-varying, temporally correlated features derived from text; and (4) a deep neural ranking model that not only recommends resolutions but also efficiently outputs distributed representations for ticket descriptions and resolutions.
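
    As a sketch of direction (2), a nearest-neighbor baseline over historical tickets: symptoms are vectorized and the resolution attached to the most similar past ticket is suggested. The data, TF-IDF representation, and similarity choice are assumptions; the dissertation's framework and its deep neural ranking model are more elaborate.

```python
# A minimal nearest-neighbor sketch of ticket resolution recommendation;
# history entries and the TF-IDF/cosine setup are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Historical (symptom, resolution) pairs from resolved tickets.
history = [
    ("database connection pool exhausted under load",
     "increase pool size and recycle idle connections"),
    ("disk usage above 95 percent on app server",
     "rotate logs and expand the volume"),
    ("login page returns 502 after deploy",
     "roll back release and restart the gateway"),
]
symptoms = [s for s, _ in history]

# Vectorize past symptoms and score a new ticket against them.
vec = TfidfVectorizer().fit(symptoms)
new_ticket = "app server disk almost full, alerts firing"
sims = cosine_similarity(vec.transform([new_ticket]),
                         vec.transform(symptoms))[0]

# Recommend the resolution of the most similar historical ticket.
best = sims.argmax()
print(f"recommended resolution ({sims[best]:.2f}): {history[best][1]}")
```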