Automated Big Text Security Classification
In recent years, traditional cybersecurity safeguards have proven ineffective
against insider threats. Famous cases of sensitive information leaks caused by
insiders, including the WikiLeaks release of diplomatic cables and the Edward
Snowden incident, have greatly harmed the U.S. government's relationship with
other governments and with its own citizens. Data Leak Prevention (DLP) is a
solution for detecting and preventing information leaks from within an
organization's network. However, state-of-the-art DLP detection models can
detect only a limited range of sensitive information types, and research in the
field has been hindered due to the lack of available sensitive texts. Many
researchers have focused on document-based detection with artificially labeled
"confidential documents" for which security labels are assigned to the entire
document, when in reality only a portion of the document is sensitive. Such
whole-document security labeling increases the risk of preventing authorized
users from accessing non-sensitive information within sensitive documents. In
this paper, we introduce Automated Classification Enabled by Security
Similarity (ACESS), a new detection model that tackles the complexity of
big-text security classification and detection.
To analyze the ACESS system, we constructed a novel dataset, containing
formerly classified paragraphs from diplomatic cables made public by the
WikiLeaks organization. To our knowledge, this paper is the first to analyze a
dataset that contains actual formerly sensitive information annotated at
paragraph granularity.

Comment: Pre-print of the Best Paper Award manuscript at IEEE Intelligence and
Security Informatics (ISI) 2016.
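The abstract does not spell out the ACESS mechanics; as a minimal sketch of the general idea of paragraph-level classification by security similarity (a TF-IDF nearest-neighbor formulation chosen here for illustration, not the authors' actual model):

```python
# Minimal sketch of paragraph-level "security similarity" classification,
# NOT the authors' ACESS implementation: flag a paragraph as sensitive when
# its TF-IDF vector is close to any paragraph already labeled sensitive.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy placeholder paragraphs; a real system would train on labeled data
# such as the WikiLeaks-derived corpus described above.
sensitive = [
    "the attached annex summarizes troop movements near the border",
    "source identities must not be shared outside this channel",
]
candidates = [
    "the embassy cafeteria menu was updated on monday",
    "border troop movements were summarized for the ambassador",
]

vectorizer = TfidfVectorizer(stop_words="english")
sensitive_vecs = vectorizer.fit_transform(sensitive)
candidate_vecs = vectorizer.transform(candidates)

# Score each candidate by its closest known-sensitive paragraph;
# the 0.3 threshold is an arbitrary illustrative choice.
scores = cosine_similarity(candidate_vecs, sensitive_vecs).max(axis=1)
for text, score in zip(candidates, scores):
    print(f"{score:.2f}  sensitive={score >= 0.3}  {text[:50]}")
```

A similarity-based scheme like this flags only the matching paragraphs, which is the point of paragraph-granularity labeling: non-sensitive portions of a document remain accessible.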
Optical tomography: Image improvement using mixed projection of parallel and fan beam modes
Mixing parallel and fan beam projections is a technique used to increase image quality. This research focuses on enhancing image quality in optical tomography. Image quality is quantified by measuring the Peak Signal to Noise Ratio (PSNR) and Normalized Mean Square Error (NMSE). The findings of this research show that by combining parallel and fan beam projections, image quality can be increased by more than 10% in terms of PSNR and by more than 100% in terms of NMSE, compared to a single parallel beam projection.
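PSNR and NMSE are standard reference-based metrics. As a brief illustration (conventions vary slightly between papers, so the exact normalization below is an assumption, not the authors' formula):

```python
# Illustrative PSNR and NMSE computation for a reconstructed image against
# a reference image of the same shape; data here are synthetic.
import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, peak: float = 1.0) -> float:
    """Peak Signal to Noise Ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference - reconstructed) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

def nmse(reference: np.ndarray, reconstructed: np.ndarray) -> float:
    """Normalized Mean Square Error: squared error over reference energy."""
    return float(np.sum((reference - reconstructed) ** 2) / np.sum(reference ** 2))

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
reconstructed = reference + 0.01 * rng.standard_normal((64, 64))
print(f"PSNR = {psnr(reference, reconstructed):.2f} dB, "
      f"NMSE = {nmse(reference, reconstructed):.4f}")
```

Higher PSNR and lower NMSE both indicate a reconstruction closer to the reference, which is why the paper reports improvements in both.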
Enhancing Undergraduate AI Courses through Machine Learning Projects
It is generally recognized that an undergraduate introductory Artificial Intelligence course is challenging to teach. This is, in part, due to the diverse and seemingly disconnected core topics that are typically covered. This paper presents work funded by the National Science Foundation to address this problem and to enhance the student learning experience in the course. Our work involves the development of an adaptable framework for presenting core AI topics through the unifying theme of machine learning. A suite of hands-on semester-long projects is being developed, each involving the design and implementation of a learning system that enhances a commonly-deployed application. The projects use machine learning as a unifying theme to tie together the core AI topics. In this paper, we first provide an overview of our model and the projects being developed, and then present in some detail our experiences with one of the projects, Web User Profiling, which we have used in our AI class.
From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web
A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search, in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on future opportunities and new paradigms for exploring and interacting with Web search results.
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines the most essential data mining entities in a
three-layered ontological structure comprising a specification, an
implementation, and an application layer. It provides a representational
framework for the description of mining structured data, and in addition
provides taxonomies of datasets, data mining tasks, generalizations, data
mining algorithms, and constraints, based on the type of data. OntoDM-core is
designed to support a wide range of applications and use cases, such as
semantic annotation of data mining algorithms, datasets, and results;
annotation of QSAR studies in the context of drug discovery investigations;
and disambiguation of terms in text mining. The ontology has been thoroughly
assessed following established practices in ontology engineering, is fully
interoperable with many domain resources, and is easy to extend.
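As a rough illustration of what three-layer annotation might look like in practice (the namespace IRI, class names, and properties below are hypothetical placeholders, not OntoDM-core's published identifiers):

```python
# Hypothetical sketch of annotating a data mining algorithm across the three
# ontological layers described above. The namespace and class names are
# placeholders, not OntoDM-core's actual published IRIs.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

ODM = Namespace("http://example.org/ontodm-core#")  # placeholder namespace
g = Graph()
g.bind("odm", ODM)

# Specification layer: the abstract algorithm definition.
g.add((ODM.C45Spec, RDF.type, ODM.DataMiningAlgorithm))
g.add((ODM.C45Spec, RDFS.label, Literal("C4.5 decision tree induction")))

# Implementation layer: a concrete executable realizing the specification.
g.add((ODM.WekaJ48, RDF.type, ODM.DataMiningAlgorithmImplementation))
g.add((ODM.WekaJ48, ODM.implements, ODM.C45Spec))

# Application layer: one execution of the implementation on a dataset.
g.add((ODM.Run42, RDF.type, ODM.DataMiningAlgorithmExecution))
g.add((ODM.Run42, ODM.executes, ODM.WekaJ48))
g.add((ODM.Run42, ODM.hasInputDataset, ODM.IrisDataset))

print(g.serialize(format="turtle"))
```

Separating specification, implementation, and application in this way is what enables the use cases listed above, such as attaching a semantic annotation to a particular algorithm run rather than to the algorithm in general.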
Deep Multi-object Spectroscopy to Enhance Dark Energy Science from LSST
Community access to deep (i ~ 25), highly-multiplexed optical and
near-infrared multi-object spectroscopy (MOS) on 8-40m telescopes would greatly
improve measurements of cosmological parameters from LSST. The largest gain
would come from improvements to LSST photometric redshifts, which are employed
directly or indirectly for every major LSST cosmological probe; deep
spectroscopic datasets will enable reduced uncertainties in the redshifts of
individual objects via optimized training. Such spectroscopy will also
determine the relationship of galaxy SEDs to their environments, key
observables for studies of galaxy evolution. The resulting data will also
constrain the impact of blending on photo-z's. Focused spectroscopic campaigns
can also improve weak lensing cosmology by constraining the intrinsic
alignments between the orientations of galaxies. Galaxy cluster studies can be
enhanced by measuring motions of galaxies in and around clusters and by testing
photo-z performance in regions of high density. Photometric redshift and
intrinsic alignment studies are best-suited to instruments on large-aperture
telescopes with wider fields of view (e.g., Subaru/PFS, MSE, or GMT/MANIFEST)
but cluster investigations can be pursued with smaller-field instruments (e.g.,
Gemini/GMOS, Keck/DEIMOS, or TMT/WFOS), so deep MOS work can be distributed
amongst a variety of telescopes. However, community access to large numbers of
nights for surveys will still be needed to accomplish this work. In two
companion white papers we present gains from shallower, wide-area MOS and from
single-target imaging and spectroscopy.

Comment: Science white paper submitted to the Astro2020 decadal survey. A
table of time requirements is available at
http://d-scholarship.pitt.edu/36036
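As a schematic illustration of the kind of training referred to above (not the white paper's methodology), photometric redshifts are commonly estimated by regressing spectroscopic redshifts on broadband photometry; deeper spectroscopic training sets then directly shrink the scatter of the estimates:

```python
# Schematic photo-z training sketch with synthetic data: fit a regressor
# mapping broadband magnitudes to spectroscopic redshift, then apply it to
# objects with photometry only. Illustrative, not the white paper's method.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000
z_true = rng.uniform(0.0, 3.0, n)                          # "spectroscopic" redshifts
mags = 22.0 + np.outer(z_true, np.linspace(0.2, 1.0, 6))   # toy ugrizy magnitudes
mags += 0.05 * rng.standard_normal(mags.shape)             # photometric noise

X_train, X_test, z_train, z_test = train_test_split(mags, z_true, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, z_train)

z_phot = model.predict(X_test)
sigma = np.std((z_phot - z_test) / (1 + z_test))  # conventional photo-z scatter metric
print(f"scatter sigma_z / (1 + z) = {sigma:.3f}")
```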
Benefits of Computer Based Content Analysis to Foresight
Purpose of the article: The present manuscript summarizes the benefits of
using computer-based content analysis in the generation phase of foresight
initiatives. Possible advantages, disadvantages, and limitations of content
analysis for foresight projects are discussed as well.
Methodology/methods: In order to specify the benefits and identify the
limitations of content analysis within foresight, results of the generation
phase of a particular foresight project, performed first without and
subsequently with a computer-based content analysis tool, were compared using
two proposed measurements.
Scientific aim: The generation phase of foresight is the most demanding part
in terms of analysis duration, costs, and resources, due to the significant
amount of text to review. In addition, the conclusions of the foresight
evaluation depend on the personal views and perceptions of the foresight
analysts, since the evaluation is based solely on reading. Content analysis
may partially or even fully replace the reading and provide an important
benchmark.
Findings: The use of a computer-based content analysis tool significantly
reduced the time needed to conduct the foresight generation phase. The tool
produced results very similar to those of the evaluation performed by
standard reading; only 10% of the results were not revealed by the content
analysis tool. On the other hand, several new topics were identified by the
content analysis tool that had been missed by reading.
Conclusions: The results of the two measurements should be subjected to
further testing in more foresight projects to validate them. The
computer-based content analysis tool provides a valuable benchmark to
foresight analysts and can partially substitute for reading. However, a
complete replacement of reading is not recommended, as a deep understanding
of weak-signal interpretation is essential for foresight.
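The abstract does not define the paper's two measurements; as a hypothetical sketch of one plausible comparison, the overlap between topics found by reading and by the tool can be quantified as simple coverage percentages:

```python
# Hypothetical sketch of comparing topic sets from manual reading vs. a
# content analysis tool; the abstract does not define the paper's actual
# two measurements, so these topics and coverage figures are illustrative.
def coverage(found_by_a: set[str], found_by_b: set[str]) -> float:
    """Fraction of A's topics that B also found."""
    return len(found_by_a & found_by_b) / len(found_by_a)

reading_topics = {"ageing workforce", "3d printing", "gig economy", "biofuels"}
tool_topics = {"ageing workforce", "3d printing", "gig economy", "drone logistics"}

missed_by_tool = 1.0 - coverage(reading_topics, tool_topics)  # topics the tool missed
new_from_tool = tool_topics - reading_topics                  # topics reading missed
print(f"share of reading results missed by tool: {missed_by_tool:.0%}")
print(f"new topics from tool: {sorted(new_from_tool)}")
```

A measure of this form would capture both findings reported above: the small share of reading-derived results the tool missed, and the new topics the tool surfaced that reading did not.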