1,746 research outputs found

    Query Understanding in the Age of Large Language Models

    Querying, conversing, and controlling search and information-seeking interfaces using natural language are fast becoming ubiquitous with the rise and adoption of large language models (LLMs). In this position paper, we describe a generic framework for interactive query-rewriting using LLMs. Our proposal aims to unfold new opportunities for improved and transparent intent understanding while building high-performance retrieval systems using LLMs. A key aspect of our framework is the ability of the rewriter to fully specify the machine intent by the search engine in natural language that can be further refined, controlled, and edited before the final retrieval phase. The ability to present, interact, and reason over the underlying machine intent in natural language has profound implications on transparency, ranking performance, and a departure from the traditional way in which supervised signals were collected for understanding intents. We detail the concept, backed by initial experiments, along with open questions for this interactive query understanding framework. Comment: Accepted to GENIR (SIGIR'23)

    Rethinking Similarity Search: Embracing Smarter Mechanisms over Smarter Data

    In this vision paper, we propose a shift in perspective for improving the effectiveness of similarity search. Rather than focusing solely on enhancing the data quality, particularly machine learning-generated embeddings, we advocate for a more comprehensive approach that also enhances the underpinning search mechanisms. We highlight three novel avenues that call for a redefinition of the similarity search problem: exploiting implicit data structures and distributions, engaging users in an iterative feedback loop, and moving beyond a single query vector. These novel pathways have gained relevance in emerging applications such as large-scale language models, video clip retrieval, and data labeling. We discuss the corresponding research challenges posed by these new problem areas and share insights from our preliminary discoveries.

    Ransomware Simulator for In-Depth Analysis and Detection: Leveraging Centralized Logging and Sysmon for Improved Cybersecurity

    Ransomware attacks have become increasingly prevalent and sophisticated, posing significant threats to organizations and individuals worldwide. To effectively combat these threats, security professionals must continuously develop and adapt their detection and mitigation strategies. This master's thesis presents the design and implementation of a ransomware simulator to facilitate an in-depth analysis of ransomware Tactics, Techniques, and Procedures (TTPs) and to evaluate the effectiveness of centralized logging and Sysmon, including the latest event types, in detecting and responding to such attacks. The study explores the advanced capabilities of Sysmon as a logging tool and data source, focusing on its ability to capture multiple event types, such as file creation, process execution, and network traffic, as well as the newly added event types. The aim is to demonstrate the effectiveness of Sysmon in detecting and analyzing malicious activities, with an emphasis on the latest features. By focusing on the comprehensive aspects of a cyber-attack, the study showcases the versatility and utility of Sysmon in detecting and addressing various attack vectors. The ransomware simulator is developed using a PowerShell script that emulates various ransomware TTPs and attack scenarios, providing a comprehensive and realistic simulation of a ransomware attack. Sysmon, a powerful system monitoring tool, is utilized to monitor and log the activities associated with the simulated attack, including the events generated by the new Sysmon features. Centralized logging is achieved through the integration of Splunk Enterprise, a widely used platform for log analysis and management. The collected logs are then analyzed to identify patterns, indicators of compromise (IoCs), and potential detection and mitigation strategies.
Through the development of the ransomware simulator and the subsequent analysis of Sysmon logs, this research contributes to strengthening the security posture of organizations and improving cybersecurity measures against ransomware threats, with a focus on the latest Sysmon capabilities. The results demonstrate the importance of monitoring and analyzing system events to effectively detect and respond to ransomware attacks. This research can serve as a basis for further exploration of ransomware detection and response strategies, contributing to the advancement of cybersecurity practices and the development of more robust security measures against ransomware threats.

    Holistic recommender systems for software engineering

    The knowledge possessed by developers is often not sufficient to overcome a programming problem. Short of talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation), as well as online resources. The web has become an essential component in the modern developer’s daily life, providing a plethora of information from sources like forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) provide developers with assistance to navigate the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs consider development artifacts as containers of homogeneous information in the form of pure text. However, text is a means to represent heterogeneous information provided by, for example, natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting the information from a pure textual point of view misses the intrinsic heterogeneity of the artifacts, thus leading to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis we developed a framework to extract, model and analyze information contained in development artifacts in a reusable meta-information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework.
The information can thus be reinterpreted from a holistic point of view, preserving its multi-dimensionality, and opening the path towards the concept of holistic recommender systems for software engineering.