
    Automated Big Text Security Classification

    In recent years, traditional cybersecurity safeguards have proven ineffective against insider threats. Famous cases of sensitive information leaks caused by insiders, including the WikiLeaks release of diplomatic cables and the Edward Snowden incident, have greatly harmed the U.S. government's relationship with other governments and with its own citizens. Data Leak Prevention (DLP) is a solution for detecting and preventing information leaks from within an organization's network. However, state-of-the-art DLP detection models can detect only very limited types of sensitive information, and research in the field has been hindered by the lack of available sensitive texts. Many researchers have focused on document-based detection with artificially labeled "confidential documents," for which security labels are assigned to the entire document, when in reality only a portion of the document is sensitive. This type of whole-document security labeling increases the chances of preventing authorized users from accessing non-sensitive information within sensitive documents. In this paper, we introduce Automated Classification Enabled by Security Similarity (ACESS), a new detection model that penetrates the complexity of big text security classification/detection. To analyze the ACESS system, we constructed a novel dataset containing formerly classified paragraphs from diplomatic cables made public by the WikiLeaks organization. To our knowledge, this paper is the first to analyze a dataset that contains actual formerly sensitive information annotated at paragraph granularity. Comment: Pre-print of the IEEE Intelligence and Security Informatics (ISI) 2016 Best Paper Award manuscript.
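
    For orientation only, the paragraph-granularity labeling described above can be illustrated with a generic text-classification baseline. The sketch below uses a TF-IDF plus logistic regression pipeline from scikit-learn with toy data; it is not the ACESS model, and all paragraph text and labels are purely illustrative.

    # Generic paragraph-level sensitivity classification baseline (illustrative only;
    # not the ACESS model). Each paragraph receives its own label, so non-sensitive
    # passages inside an otherwise sensitive document remain accessible.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_paragraphs = [
        "Routine administrative summary of embassy staffing levels.",
        "Source reports troop movements near the border; protect identity.",
    ]
    train_labels = ["non-sensitive", "sensitive"]  # labels per paragraph, not per document

    classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
    classifier.fit(train_paragraphs, train_labels)

    new_document = [
        "The ambassador attended a public trade conference.",
        "The cable relays a confidential informant's assessment of the negotiations.",
    ]
    print(classifier.predict(new_document))  # one prediction per paragraph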

    Optical tomography: Image improvement using mixed projection of parallel and fan beam modes

    Mixed parallel and fan beam projection is a technique used to increase image quality. This research focuses on enhancing image quality in optical tomography. Image quality is assessed by measuring the Peak Signal to Noise Ratio (PSNR) and Normalized Mean Square Error (NMSE). The findings of this research show that by combining parallel and fan beam projection, image quality can be improved by more than 10% in terms of PSNR and by more than 100% in terms of NMSE compared to a single parallel beam projection.
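
    As a point of reference only (not code from the paper), the two image-quality metrics named above have standard definitions that can be computed as in this minimal Python sketch; the placeholder images and function names are assumptions for illustration, and NMSE in particular has several common variants.

    import numpy as np

    def psnr(reference, reconstruction, max_value=255.0):
        # Peak Signal-to-Noise Ratio in decibels; higher values indicate better quality.
        mse = np.mean((reference.astype(float) - reconstruction.astype(float)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(max_value ** 2 / mse)

    def nmse(reference, reconstruction):
        # Normalized Mean Square Error (one common definition: squared error normalized
        # by the energy of the reference); lower values indicate better quality.
        ref = reference.astype(float)
        rec = reconstruction.astype(float)
        return np.sum((ref - rec) ** 2) / np.sum(ref ** 2)

    # Illustrative usage with placeholder arrays standing in for a tomography phantom
    # and its reconstruction; a real evaluation would use actual reconstructed images.
    phantom = np.random.rand(64, 64) * 255.0
    reconstruction = phantom + np.random.randn(64, 64)
    print(f"PSNR: {psnr(phantom, reconstruction):.2f} dB, NMSE: {nmse(phantom, reconstruction):.4f}")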

    Enhancing Undergraduate AI Courses through Machine Learning Projects

    It is generally recognized that an undergraduate introductory Artificial Intelligence course is challenging to teach. This is, in part, due to the diverse and seemingly disconnected core topics that are typically covered. This paper presents work funded by the National Science Foundation to address this problem and to enhance the student learning experience in the course. Our work involves the development of an adaptable framework for presenting core AI topics through a unifying theme of machine learning. A suite of hands-on, semester-long projects is developed, each involving the design and implementation of a learning system that enhances a commonly deployed application. The projects use machine learning as a unifying theme to tie together the core AI topics. In this paper, we first provide an overview of our model and the projects being developed, and then present in some detail our experiences with one of the projects – Web User Profiling – which we have used in our AI class.

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003


    From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web

    A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search, in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on future opportunities and new paradigms for exploring and interacting with Web search results.

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation, and an application layer. It provides a representational framework for describing the mining of structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms, and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets, and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following established practices in ontology engineering, is fully interoperable with many domain resources, and is easy to extend.

    Deep Multi-object Spectroscopy to Enhance Dark Energy Science from LSST

    Community access to deep (i ~ 25), highly-multiplexed optical and near-infrared multi-object spectroscopy (MOS) on 8-40m telescopes would greatly improve measurements of cosmological parameters from LSST. The largest gain would come from improvements to LSST photometric redshifts, which are employed directly or indirectly for every major LSST cosmological probe; deep spectroscopic datasets will enable reduced uncertainties in the redshifts of individual objects via optimized training. Such spectroscopy will also determine the relationship of galaxy SEDs to their environments, key observables for studies of galaxy evolution. The resulting data will also constrain the impact of blending on photo-z's. Focused spectroscopic campaigns can also improve weak lensing cosmology by constraining the intrinsic alignments between the orientations of galaxies. Galaxy cluster studies can be enhanced by measuring motions of galaxies in and around clusters and by testing photo-z performance in regions of high density. Photometric redshift and intrinsic alignment studies are best suited to instruments on large-aperture telescopes with wider fields of view (e.g., Subaru/PFS, MSE, or GMT/MANIFEST), but cluster investigations can be pursued with smaller-field instruments (e.g., Gemini/GMOS, Keck/DEIMOS, or TMT/WFOS), so deep MOS work can be distributed amongst a variety of telescopes. However, community access to large numbers of nights for surveys will still be needed to accomplish this work. In two companion white papers we present gains from shallower, wide-area MOS and from single-target imaging and spectroscopy. Comment: Science white paper submitted to the Astro2020 decadal survey. A table of time requirements is available at http://d-scholarship.pitt.edu/36036

    Benefits of Computer Based Content Analysis to Foresight

    Purpose of the article: This manuscript summarizes the benefits of using computer-based content analysis in the generation phase of foresight initiatives. Possible advantages, disadvantages, and limitations of content analysis for foresight projects are discussed as well. Methodology/methods: In order to specify the benefits and identify the limitations of content analysis within foresight, results of the generation phase of a particular foresight project performed without, and subsequently with, a computer-based content analysis tool were compared using two proposed measurements. Scientific aim: The generation phase of foresight is the most demanding part in terms of analysis duration, costs, and resources, due to the significant amount of text to be reviewed. In addition, the conclusions of the foresight evaluation depend on the personal views and perceptions of the foresight analysts, as the evaluation is based merely on reading. Content analysis may partially or even fully replace the reading and provide an important benchmark. Findings: The use of a computer-based content analysis tool significantly reduced the time needed to conduct the foresight generation phase. The content analysis tool produced very similar results to the evaluation performed by standard reading; only ten percent of the results were not revealed by the content analysis tool. On the other hand, several new topics were identified by means of the content analysis tool that had been missed by the reading. Conclusions: The results of the two measurements should be subjected to further testing within more foresight projects to validate them. A computer-based content analysis tool provides a valuable benchmark for foresight analysts and can partially substitute for the reading. However, a complete replacement of the reading is not recommended, as a deep understanding of weak-signal interpretation is essential for foresight.