
    Matching Possible Mitigations to Cyber Threats: A Document-Driven Decision Support Systems Approach

    Cyber systems are ubiquitous in all aspects of society. At the same time, breaches of cyber systems continue to be front-page news (Calfas, 2018; Equifax, 2017) and, despite more than a decade of heightened focus on cybersecurity, the threat continues to evolve and grow, costing up to $575 billion globally each year (Center for Strategic and International Studies, 2014; Gosler & Von Thaer, 2013; Microsoft, 2016; Verizon, 2017). To address possible impacts of cyber threats, information system (IS) stakeholders must assess the risks they face. Following a risk assessment, the next step is to determine mitigations to counter the threats that pose unacceptably high risks. The literature contains a robust collection of studies on optimizing mitigation selections, but these studies universally assume that a starting list of appropriate mitigations for specific threats already exists from which to down-select. In current practice, producing this starting list is largely a manual process, and it is challenging because it requires detailed cybersecurity knowledge drawn from highly decentralized sources; that knowledge is often deeply technical in nature and is primarily described in textual form, leading to dependence on human experts to interpret it for each specific context. At the same time, cybersecurity experts remain in short supply relative to demand, and the gap between supply and demand continues to grow (Center for Cyber Safety and Education, 2017; Kauflin, 2017; Libicki, Senty, & Pollak, 2014). Thus, an approach is needed to help cybersecurity experts (CSE) cut through the volume of available mitigations and select those that are potentially viable to offset specific threats. This dissertation explores the application of machine learning and text retrieval techniques to automate the matching of relevant mitigations to cyber threats, where both are expressed as unstructured or semi-structured English language text. Using the Design Science Research Methodology (Hevner & March, 2004; Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007), we consider a number of possible designs for the matcher, ultimately selecting a supervised machine learning approach that combines two techniques: support vector machine classification and latent semantic analysis. The selected approach demonstrates high recall for mitigation documents in the relevant class, bolstering confidence that potentially viable mitigations will not be overlooked. It also has a strong ability to discern documents in the non-relevant class, allowing approximately 97% of non-relevant mitigations to be excluded automatically, greatly reducing the CSE's workload compared with purely manual matching. A false positive rate of up to 3% prevents fully automated mitigation selection and requires the CSE to reject a few false positives. To theory, this research contributes a method for automatically mapping mitigations to threats when both are expressed as English language text documents. This artifact represents a novel machine learning approach to threat-mitigation mapping. The research also contributes an instantiation of the artifact for demonstration and evaluation. From a practical perspective, the artifact benefits all threat-informed cyber risk assessment approaches, whether formal or ad hoc, by aiding decision-making for the cybersecurity experts whose job it is to mitigate the identified cyber threats.
In addition, an automated approach makes mitigation selection more repeatable, facilitates knowledge reuse, extends the reach of cybersecurity experts, and is extensible enough to accommodate the continued evolution of both cyber threats and mitigations. Moreover, the mitigations selected as applicable to each threat can serve as inputs into multifactor analyses of alternatives, both automated and manual, thereby bridging the gap between cyber risk assessment and final mitigation selection.
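The combination described above, latent semantic analysis feeding a support vector machine classifier over threat and mitigation text, can be illustrated with standard text-classification tooling. The following is a minimal sketch, assuming scikit-learn and entirely hypothetical threat/mitigation pairs; it outlines the LSA-plus-SVM pattern the abstract names and is not the dissertation's actual artifact.

```python
# Minimal sketch of an LSA + SVM relevance classifier for threat/mitigation text.
# Assumes scikit-learn; the threat/mitigation pairs and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD      # latent semantic analysis
from sklearn.svm import LinearSVC                   # support vector machine
from sklearn.pipeline import make_pipeline

# Each example pairs a threat description with a candidate mitigation;
# the label marks whether the mitigation is relevant (1) or not (0).
pairs = [
    "SQL injection against public web form || input validation and parameterized queries",
    "SQL injection against public web form || upgrade facility HVAC filters",
    "phishing email targeting finance staff || security awareness training and email filtering",
    "phishing email targeting finance staff || replace data center UPS batteries",
]
labels = [1, 0, 1, 0]

matcher = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),        # low-rank LSA projection (toy size)
    LinearSVC(class_weight="balanced"),  # favor recall on the relevant class
)
matcher.fit(pairs, labels)

candidate = "cross-site scripting on login page || output encoding and content security policy"
print("relevant" if matcher.predict([candidate])[0] == 1 else "not relevant")
```

In practice the CSE would review only the pairs predicted relevant, which is where the reported ~97% automatic exclusion of non-relevant mitigations reduces the manual workload.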

    A Practical Framework for Storing and Searching Encrypted Data on Cloud Storage

    Security has become a significant concern with the increased popularity of cloud storage services, since outsourced data is exposed to the risk of access by third parties. Security is one of the major hurdles for users when data that previously resided in local storage is outsourced to the cloud, and it raises concerns about data confidentiality even after data has been deleted from cloud storage. A further problem arises when encrypted data needs to be shared with more people than the data owner initially designated. Searching over encrypted data is itself a fundamental challenge in cloud storage. Searchable encryption (SE) allows a cloud server to conduct a search over encrypted data on behalf of the data users without learning the underlying plaintexts. While many academic SE schemes offer provable security, they usually expose some query information, making them less practical, weak in usability, and challenging to deploy. In addition, sharing encrypted data with other authorized users requires providing each document's secret key, an approach with many limitations due to the difficulty of key management and distribution. We have designed a system using existing cryptographic approaches that enables search over encrypted data in the cloud. The primary focus of our proposed model is to ensure user privacy and security through a less computationally intensive, user-friendly system with a trusted third-party entity. To demonstrate the proposed model, we have implemented a web application called CryptoSearch as an overlay system on top of a well-known cloud storage domain. It provides secure search over encrypted data with no compromise to user-friendliness or to the scheme's functional performance in real-world applications. Comment: 146 pages, master's thesis, 6 chapters, 96 figures, 11 tables.
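To make the searchable-encryption idea concrete (the server matches queries without learning plaintexts), here is a minimal sketch of a toy scheme: keywords are replaced by keyed HMAC trapdoors and documents are encrypted symmetrically. It assumes Python's standard hmac/hashlib modules and the third-party cryptography package; it is not the CryptoSearch design and ignores the key-distribution, access-control, and leakage issues the thesis addresses.

```python
# Toy searchable-encryption index: keywords are hidden behind HMAC "trapdoors",
# documents are encrypted with Fernet, and the server matches trapdoors only.
# Illustrative sketch only; not the CryptoSearch scheme from the thesis.
import hmac, hashlib
from cryptography.fernet import Fernet

index_key = b"secret-index-key"        # held by the data owner / authorized users
fernet = Fernet(Fernet.generate_key())

def trapdoor(keyword: str) -> str:
    """Deterministic keyed token for a keyword; the server never sees the keyword."""
    return hmac.new(index_key, keyword.lower().encode(), hashlib.sha256).hexdigest()

# Client side: encrypt documents and build a trapdoor -> document-id index.
documents = {"doc1": "rotate encryption keys quarterly", "doc2": "enable mfa for admins"}
ciphertexts = {doc_id: fernet.encrypt(text.encode()) for doc_id, text in documents.items()}
encrypted_index = {}
for doc_id, text in documents.items():
    for word in text.split():
        encrypted_index.setdefault(trapdoor(word), set()).add(doc_id)

# Server side: given only a trapdoor, return matching (still encrypted) documents.
matches = encrypted_index.get(trapdoor("mfa"), set())

# Client side: decrypt the returned ciphertexts locally.
for doc_id in matches:
    print(doc_id, fernet.decrypt(ciphertexts[doc_id]).decode())
```

Even this toy version shows the usability trade-off the abstract mentions: the index reveals which encrypted documents match a repeated query, which is exactly the kind of query-information leakage practical schemes must manage.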

    Approximate string matching methods for duplicate detection and clustering tasks

    Approximate string matching methods are utilized by a vast number of duplicate detection and clustering applications in various knowledge domains. The application area is expected to grow due to the recent significant increase in the amount of digital data and knowledge sources. Despite the large number of existing string similarity metrics, there is a need for more precise approximate string matching methods to improve the efficiency of computer-driven data processing and thus decrease labor-intensive human involvement. This work introduces a family of novel string similarity methods that outperform a number of effective, well-known, and widely used string similarity functions. The new algorithms are designed to overcome the most common shortcoming of existing methods: the lack of context sensitivity. In this evaluation, the Longest Approximately Common Prefix (LACP) method achieved the highest values of average precision and maximum F1 on three out of the four medical informatics datasets used. The LACP also demonstrated the lowest execution time among the evaluated algorithms, ensured by its linear computational complexity. An online interactive spell checker of biomedical terms was developed based on the LACP method; its main goal was to make it possible to evaluate the LACP method by estimating the similarity of the resulting sets at a glance. The Shortest Path Edit Distance (SPED) outperformed all evaluated similarity functions and gained the highest possible values of the average precision and maximum F1 measures on the bioinformatics datasets. The SPED design was inspired by the preceding work on the Markov Random Field Edit Distance (MRFED); the SPED eradicates two shortcomings of the MRFED, namely prolonged execution time and moderate performance. Four modifications of the Histogram Difference (HD) method demonstrated the best performance on the majority of the life and social sciences data sources used in the experiments. The modifications of the HD algorithm were achieved using several re-scorers: HD with Normalized Smith-Waterman Re-scorer, HD with TFIDF and Jaccard re-scorers, HD with the Longest Common Prefix and TFIDF re-scorers, and HD with the Unweighted Longest Common Prefix Re-scorer. Another contribution of this dissertation is an extensive analysis of string similarity methods for duplicate detection and clustering tasks in the life and social sciences, bioinformatics, and medical informatics domains. The experimental results are illustrated with precision-recall charts and a number of tables presenting the average precision, maximum F1, and execution time.
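The LACP algorithm itself is not specified in this abstract. As a rough illustration of the kind of prefix-oriented, mismatch-tolerant comparison it alludes to, the sketch below defines a hypothetical similarity that measures how far two strings agree from the start while allowing a bounded number of mismatches, and applies it to toy duplicate detection. The function names, mismatch bound, and threshold are illustrative assumptions, not the dissertation's LACP definition.

```python
# Generic duplicate-detection sketch using a prefix-oriented string similarity.
# This is NOT the dissertation's LACP algorithm, only an illustration of the
# kind of approximate prefix comparison the abstract alludes to.
def approx_common_prefix_len(a: str, b: str, max_mismatches: int = 1) -> int:
    """Length of the leading region of a and b that agrees except for a small
    number of character mismatches (a hypothetical, simplified notion)."""
    mismatches = 0
    length = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                break
        length += 1
    return length

def prefix_similarity(a: str, b: str) -> float:
    """Normalize the approximate common prefix by the longer string's length."""
    if not a or not b:
        return 0.0
    return approx_common_prefix_len(a, b) / max(len(a), len(b))

# Toy duplicate detection over biomedical-style terms (hypothetical threshold).
terms = ["acetaminophen", "acetaminofen", "ibuprofen"]
pairs = [(s, t) for i, s in enumerate(terms) for t in terms[i + 1:]]
duplicates = [(s, t) for s, t in pairs if prefix_similarity(s, t) >= 0.75]
print(duplicates)   # [('acetaminophen', 'acetaminofen')]
```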

    Machine Learning

    Machine learning can be defined in various ways; broadly, it refers to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability of such systems to improve automatically through experience.
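As a minimal sketch of "improving automatically through experience", the example below trains a simple online perceptron on a hypothetical toy rule and shows its test accuracy rising as it processes more examples. This is a generic illustration in plain Python, not tied to any system described in the abstract.

```python
# A toy learner whose performance improves with experience: an online perceptron
# learning the hypothetical rule "label is 1 when x0 + x1 > 1".
import random

random.seed(0)
w = [0.0, 0.0]
b = 0.0

def sample():
    x = (random.random(), random.random())
    return x, 1 if x[0] + x[1] > 1.0 else 0

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def accuracy(data):
    return sum(predict(x) == y for x, y in data) / len(data)

test = [sample() for _ in range(200)]           # held-out evaluation data

for step in range(1, 2001):
    x, y = sample()                             # one new "experience"
    error = y - predict(x)
    if error != 0:                              # perceptron rule: learn from mistakes
        w[0] += 0.1 * error * x[0]
        w[1] += 0.1 * error * x[1]
        b += 0.1 * error
    if step in (10, 100, 2000):
        print(f"after {step:4d} examples: test accuracy = {accuracy(test):.2f}")
```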

    FACETS: A cognitive business intelligence system

    A cognitive decision support system called FACETS was developed and evaluated based on the situation retrieval (SR) model. The aim of FACETS is to provide decision makers with cognitive decision support in ill-structured decision situations. The design and development of FACETS includes novel concepts, models, algorithms, and system architecture, such as ontology and experience representation, situation awareness parsing, data warehouse query construction, and guided situation presentation. The experiments showed that FACETS is able to play a significant role in supporting ill-structured decision making by developing and enriching situation awareness. © 2013 Elsevier Ltd.

    Computational innovation studies: understanding innovation studies through novel scientometric approaches

    Scientometrics is an important research field dedicated to the quantitative study of science, and it is expanding at an unprecedented rate. It emerged as an evaluation paradigm and is expected to assist in the resolution of complex societal problems. For years, the impact of research has been at the top of the agenda for policymakers, yet little is known about the gatekeepers of science and the broader editorial governance mechanisms that help steer scientific efforts. In this project, we pursue an under-explored perspective (we take editorial boards and journals as an institutional vehicle) and apply it to a specific field of academic research (Innovation Studies). We address different aspects in three steps: first, we provide a comprehensive portrait of the editorship phenomenon by probing the heterogeneous structural features of editorial boards, which are dominated by male, Anglo-American editors and display a concentration of 85% of editorial positions in 20% of the countries; second, we compare journals' advertising materials (blurbs) using a cosine similarity measure, identifying six journals with more than 80% semantic similarity to Research Policy (the leading journal), and find that the journals can be classified into four groups; and third, we match the abstracts of papers actually published against the journal blurbs, revealing that the content selected by five journals would have been of greater interest to other outlets. Finally, we developed an interactive tool that allows researchers to compare the similarity of journals' published contents. The research strategies presented here add to the portfolio of methodologies that science policy analysts can use to systematically understand journal agendas, in order to reflect on what has been accomplished and what remains to be done.
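The blurb-comparison step described above (cosine similarity over journal blurbs, with Research Policy as the reference journal) can be sketched with common text-processing tools. The example below assumes scikit-learn and uses invented placeholder blurbs; it only illustrates the TF-IDF-plus-cosine pattern, not the study's actual data or preprocessing.

```python
# Sketch of the blurb-comparison step: TF-IDF vectors + cosine similarity.
# Assumes scikit-learn; the blurb texts here are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

blurbs = {
    "Research Policy": "studies of innovation, R&D, science policy and technological change",
    "Journal A": "research on innovation management, R&D strategy and technology policy",
    "Journal B": "clinical trials and biomedical laboratory techniques",
}

names = list(blurbs)
vectors = TfidfVectorizer(stop_words="english").fit_transform(blurbs.values())
sims = cosine_similarity(vectors[0:1], vectors).ravel()   # compare against Research Policy

for name, score in zip(names, sims):
    print(f"{name:15s} similarity to Research Policy: {score:.2f}")
```

The same pairwise-similarity matrix, computed over all blurbs rather than against a single reference, is the kind of input a clustering step could use to sort the journals into groups.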

    The detection of fraudulent financial statements using textual and financial data

    Fraudulent financial statements inhibit markets from allocating resources efficiently and induce considerable economic cost, so market participants strive to identify them. Reliable automated fraud detection systems based on publicly available data may help to allocate audit resources more effectively. This study examines how quantitative data (financials) and corporate narratives can both be used to identify accounting fraud (proxied by the SEC's AAERs). The detection models are built on a sound foundation from fraud theory, highlighting how accounting fraud is carried out and discussing the causes that lead companies to engage in fraudulent alteration of financial records. The study relies on a comprehensive methodological approach to create the detection model: the design process is divided into eight design questions and three enhancing questions, shedding light on important issues during model creation, improvement, and testing. The corporate narratives are analysed using multi-word phrases, including an extensive language standardisation that captures narrative peculiarities more precisely and partly addresses context. The narrative clues are enriched with successful predictors from company financials identified in previous studies. The results indicate reliable and robust detection performance over a timeframe of 15 years. Furthermore, they suggest that text-based predictors are superior to financial ratios and that a combination of both is required to achieve the best possible results. Moreover, text-based predictors are found to vary considerably over time, which underlines the importance of updating fraud detection systems frequently. The detection performance achieved was, on average, slightly higher than that of comparable approaches.
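The core modelling idea described above, multi-word textual phrases combined with financial ratios in a single detection model, can be sketched as follows. The example assumes scikit-learn and SciPy, uses invented toy narratives, ratios, and labels, and stands in for the study's much richer feature set and AAER-based labels.

```python
# Sketch of combining multi-word-phrase (n-gram) text features with financial
# ratios in one detection model. Hypothetical toy data; not the study's features.
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

narratives = [
    "revenue recognition accelerated ahead of shipment",
    "conservative revenue recognition consistent with prior periods",
]
ratios = [[0.45, 1.8], [0.12, 0.9]]   # e.g. accruals ratio, receivables growth (illustrative)
labels = [1, 0]                       # 1 = fraudulent period, 0 = non-fraudulent

# Multi-word phrases as bigrams/trigrams, standardised to lower case.
text_features = CountVectorizer(ngram_range=(2, 3), lowercase=True)
X_text = text_features.fit_transform(narratives)
X = hstack([X_text, csr_matrix(ratios)])     # fuse textual and numeric predictors

model = LogisticRegression(max_iter=1000).fit(X, labels)
print(model.predict(X))   # in-sample sanity check on the toy data
```

Keeping the text and ratio features in one matrix makes it straightforward to retrain the model periodically, which matters given the reported drift in text-based predictors over time.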