    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

    Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods

    Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be categorized both by their intended application and proposed solution. The goal is to show that various problems and methodologies that appear quite different on the surface are in fact very closely related. The axes by which these categorizations are made include the format of the contexts (headed versus headless), the way in which the contexts are to be measured (first-order versus second-order similarity), and the information used to represent the features in the contexts (micro versus macro views). The unifying thread that binds together many short context applications and methods is the fact that similarity decisions must be made between contexts that share few (if any) words in common. Comment: 23 pages
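
The distinction between first-order and second-order similarity can be illustrated with a minimal sketch. It is not the article's implementation: the toy corpus, the function names, and the choice of sentence-level co-occurrence counts with cosine similarity are all assumptions made for illustration. First-order similarity compares the words the two contexts contain directly; second-order similarity compares the words that those words co-occur with elsewhere, so two contexts can be similar even when they share no words.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_order(ctx_a, ctx_b):
    """Compare two contexts directly by the words they contain."""
    return cosine(Counter(ctx_a), Counter(ctx_b))


def cooccurrence_vectors(corpus):
    """Map each word to counts of the other words seen in the same sentence."""
    vectors = {}
    for sentence in corpus:
        for word in sentence:
            row = vectors.setdefault(word, Counter())
            row.update(w for w in sentence if w != word)
    return vectors


def second_order(ctx_a, ctx_b, vectors):
    """Compare two contexts by the summed co-occurrence vectors of their words."""
    def represent(ctx):
        total = Counter()
        for word in ctx:
            total.update(vectors.get(word, {}))
        return total
    return cosine(represent(ctx_a), represent(ctx_b))


# Hypothetical toy corpus, used only to collect co-occurrence statistics.
corpus = [
    ["doctor", "treats", "patient"],
    ["physician", "treats", "patient"],
    ["doctor", "hospital"],
    ["physician", "hospital"],
]
vectors = cooccurrence_vectors(corpus)

a, b = ["doctor"], ["physician"]
print(first_order(a, b))            # the contexts share no words
print(second_order(a, b, vectors))  # their words share co-occurring neighbours
```

In this toy setting the two one-word contexts have no overlap, so the first-order score is zero, while the second-order score is high because "doctor" and "physician" appear alongside the same words in the corpus.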

    Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007

    This is the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007, on May 24 2007 in Tartu, Estonia.

    A Semantic Unsupervised Learning Approach to Word Sense Disambiguation

    Word Sense Disambiguation (WSD) is the identification of the particular meaning for a word based on the context of its usage. WSD is a complex task that is an important component of language processing and information analysis systems in several fields. The best current methods for WSD rely on human input and are limited to a finite set of words. Complicating matters further, language is dynamic: over time, usage changes and new words are introduced. Static definitions created by previously defined analyses become outdated or are inadequate to deal with current usage. Fully automated methods are needed both for sense discovery and for distinguishing the sense being used for a word in context to efficiently realize the benefits of WSD across a broader spectrum of language. Latent Semantic Analysis (LSA) is a powerful automated unsupervised learning system that has not been widely applied in this area. The research described in this proposal will apply advanced LSA techniques in a novel way to the WSD tasks of sense discovery and distinguishing senses in use.

    Detection of semantic errors in Arabic texts

    Detecting semantic errors in a text is still a challenging area of investigation. A lot of research has been done on lexical and syntactic errors while fewer studies have tackled semantic errors, as they are more difficult to treat. Compared to other languages, Arabic appears to be a special challenge for this problem. Because words are graphically very similar to each other, the risk of getting semantic errors in Arabic texts is higher. Moreover, there are special cases and unique complexities for this language. This paper deals with the detection of semantic errors in Arabic texts, but the approach we have adopted can also be applied to texts in other languages. It combines four contextual methods (using statistics and linguistic information) in order to decide on the semantic validity of a word in a sentence. We chose to implement our approach on a distributed architecture, namely, a Multi Agent System (MAS). The implemented system achieved a precision rate of about 90% and a recall rate of about 83%.

    Neurocognitive Informatics Manifesto.

    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.

    Matching Possible Mitigations to Cyber Threats: A Document-Driven Decision Support Systems Approach

    Cyber systems are ubiquitous in all aspects of society. At the same time, breaches to cyber systems continue to be front-page news (Calfas, 2018; Equifax, 2017) and, despite more than a decade of heightened focus on cybersecurity, the threat continues to evolve and grow, costing globally up to $575 billion annually (Center for Strategic and International Studies, 2014; Gosler & Von Thaer, 2013; Microsoft, 2016; Verizon, 2017). To address possible impacts due to cyber threats, information system (IS) stakeholders must assess the risks they face. Following a risk assessment, the next step is to determine mitigations to counter the threats that pose unacceptably high risks. The literature contains a robust collection of studies on optimizing mitigation selections, but they universally assume that the starting list of appropriate mitigations for specific threats exists from which to down-select. In current practice, producing this starting list is largely a manual process and it is challenging because it requires detailed cybersecurity knowledge from highly decentralized sources, is often deeply technical in nature, and is primarily described in textual form, leading to dependence on human experts to interpret the knowledge for each specific context. At the same time cybersecurity experts remain in short supply relative to the demand, while the delta between supply and demand continues to grow (Center for Cyber Safety and Education, 2017; Kauflin, 2017; Libicki, Senty, & Pollak, 2014). Thus, an approach is needed to help cybersecurity experts (CSE) cut through the volume of available mitigations to select those which are potentially viable to offset specific threats. This dissertation explores the application of machine learning and text retrieval techniques to automate matching of relevant mitigations to cyber threats, where both are expressed as unstructured or semi-structured English language text. 
Using the Design Science Research Methodology (Hevner & March, 2004; Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007), we consider a number of possible designs for the matcher, ultimately selecting a supervised machine learning approach that combines two techniques: support vector machine classification and latent semantic analysis. The selected approach demonstrates high recall for mitigation documents in the relevant class, bolstering confidence that potentially viable mitigations will not be overlooked. It also has a strong ability to discern documents in the non-relevant class, allowing approximately 97% of non-relevant mitigations to be excluded automatically, greatly reducing the CSE’s workload over purely manual matching. A false positive rate of up to 3% prevents totally automated mitigation selection and requires the CSE to reject a few false positives. This research contributes to theory a method for automatically mapping mitigations to threats when both are expressed as English language text documents. This artifact represents a novel machine learning approach to threat-mitigation mapping. The research also contributes an instantiation of the artifact for demonstration and evaluation. From a practical perspective the artifact benefits all threat-informed cyber risk assessment approaches, whether formal or ad hoc, by aiding decision-making for cybersecurity experts whose job it is to mitigate the identified cyber threats. In addition, an automated approach makes mitigation selection more repeatable, facilitates knowledge reuse, extends the reach of cybersecurity experts, and is extensible to accommodate the continued evolution of both cyber threats and mitigations. Moreover, the selection of mitigations applicable to each threat can serve as inputs into multifactor analyses of alternatives, both automated and manual, thereby bridging the gap between cyber risk assessment and final mitigation selection.
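
The two figures quoted in this abstract are complementary: excluding approximately 97% of non-relevant mitigations corresponds to a false positive rate of about 3%. The short sketch below shows how recall and false positive rate would be computed from a confusion matrix. The counts are hypothetical, chosen only to reproduce those percentages, and the function names are not taken from the dissertation.

```python
def recall(true_pos, false_neg):
    """Share of relevant mitigations that the matcher retrieves."""
    return true_pos / (true_pos + false_neg)


def false_positive_rate(false_pos, true_neg):
    """Share of non-relevant mitigations wrongly flagged as relevant."""
    return false_pos / (false_pos + true_neg)


# Hypothetical counts for one threat (not data from the dissertation):
# 20 relevant mitigations, 1000 non-relevant ones.
true_pos, false_neg = 19, 1
false_pos, true_neg = 30, 970

fpr = false_positive_rate(false_pos, true_neg)
excluded = 1 - fpr                # non-relevant mitigations excluded automatically
to_review = true_pos + false_pos  # documents the expert still has to read

print(f"recall = {recall(true_pos, false_neg):.2f}")
print(f"false positive rate = {fpr:.2f}, excluded = {excluded:.2f}")
print(f"documents left for expert review: {to_review} of 1020")
```

Under these assumed counts the expert reviews 49 documents instead of 1,020, which is the kind of workload reduction the abstract describes, while the 30 false positives still need to be rejected by hand.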

    Matching Possible Mitigations to Cyber Threats: A Document-Driven Decision Support Systems Approach

    Despite more than a decade of heightened focus on cybersecurity, the threat continues. To address possible impacts, cyber threats must be addressed. Mitigation catalogs exist in practice today, but these do not map mitigations to the specific threats they counter. Currently, mitigations are manually selected by cybersecurity experts (CSE) who are in short supply. To reduce labor and improve repeatability, an automated approach is needed for matching mitigations to cyber threats. This research explores the application of supervised machine learning and text retrieval techniques to automate matching of relevant mitigations to cyber threats where both are expressed as text, resulting in a novel method that combines two techniques: support vector machine classification and latent semantic analysis. In five test cases, the approach demonstrates high recall for known relevant mitigation documents, bolstering confidence that potentially relevant mitigations will not be overlooked. It automatically excludes 97% of non-relevant mitigations, greatly reducing the CSE’s workload over purely manual matching.