Meaning-focused and Quantum-inspired Information Retrieval
In recent years, quantum-based methods have been promisingly integrated with
traditional procedures in information retrieval (IR) and natural language
processing (NLP). Inspired by our research on the identification and
application of quantum structures in cognition, more specifically our work on
the representation of concepts and their combinations, we put forward a
'quantum meaning based' framework for structured query retrieval in text
corpora and standardized testing corpora. This scheme for IR rests on two
basic notions: (i) 'entities of meaning', e.g., concepts and their
combinations, and (ii) traces of such entities of meaning, which is how
documents are considered in this approach. The meaning content of these
'entities of meaning' is reconstructed by solving an 'inverse problem' in the
quantum formalism, consisting of reconstructing the full states of the entities
of meaning from their collapsed states identified as traces in relevant
documents. The advantages with respect to traditional approaches, such as
Latent Semantic Analysis (LSA), are discussed by means of concrete examples.
Comment: 11 pages
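The abstract contrasts the quantum-inspired framework with LSA. As background, here is a minimal LSA sketch using truncated SVD; the toy term-document counts are invented for illustration and are not from the paper:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# The counts are illustrative, not taken from any corpus in the paper.
A = np.array([
    [2, 0, 1, 0],   # "quantum"
    [1, 1, 0, 0],   # "meaning"
    [0, 2, 1, 1],   # "retrieval"
    [0, 0, 2, 1],   # "document"
], dtype=float)

# LSA: a truncated SVD keeps the k largest singular values,
# projecting terms and documents into a low-rank "semantic" space.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T      # one k-dim vector per document

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Documents are compared by cosine similarity in the latent space,
# so documents sharing latent topics can score high even without
# sharing literal terms.
sim = cosine(doc_vecs[2], doc_vecs[3])
```

The truncation level k is the "dimension of the semantic space" that such methods tune; here k=2 is arbitrary.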
Effect of Tuned Parameters on a LSA MCQ Answering Model
This paper presents the current state of a work in progress, whose objective
is to better understand the effects of factors that significantly influence the
performance of Latent Semantic Analysis (LSA). A difficult task, which consists
in answering (French) biology Multiple Choice Questions, is used to test the
semantic properties of the truncated singular space and to study the relative
influence of the main parameters. Dedicated software has been designed to
fine-tune the LSA semantic space for the Multiple Choice Questions task. With
optimal parameters, the performance of our simple model is, quite
surprisingly, equal or superior to that of 7th- and 8th-grade students. This
indicates that
semantic spaces were quite good despite their low dimensions and the small
sizes of the training data sets. In addition, we present an original global
entropy weighting of the terms of each question's answers, which was necessary
to achieve the model's success.
Comment: 9 pages
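The paper's entropy weighting of answer terms is described only at a high level. As an illustration, here is the standard log-entropy weighting commonly used when building LSA spaces; this is an assumption about the family of scheme meant, not necessarily the paper's exact variant:

```python
import numpy as np

def log_entropy_weight(counts):
    """Log-entropy weighting often used to build LSA spaces.

    counts: term-document count matrix (terms x docs).
    Returns the weighted matrix: a local log(1 + tf) factor times a
    global entropy weight in [0, 1] that down-weights terms spread
    evenly over all documents.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    g = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * g[:, None]

W = log_entropy_weight([[3, 0, 0, 0],    # concentrated term -> weight near 1
                        [1, 1, 1, 1]])   # evenly spread term -> weight 0
```

A term occurring in a single document keeps full weight, while a term spread uniformly over all documents is zeroed out, which matches the intuition of weighting discriminative answer terms more heavily.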
Multi-Prover Commitments Against Non-Signaling Attacks
We reconsider the concept of multi-prover commitments, as introduced in the
late eighties in the seminal work by Ben-Or et al. As was recently shown by
Cr\'{e}peau et al., the security of known two-prover commitment schemes not
only relies on the explicit assumption that the provers cannot communicate, but
also depends on their information processing capabilities. For instance, there
exist schemes that are secure against classical provers but insecure if the
provers have quantum information processing capabilities, and there are schemes
that resist such quantum attacks but become insecure when considering general
so-called non-signaling provers, which are restricted solely by the requirement
that no communication takes place.
This poses the natural question of whether there exists a two-prover commitment
scheme that is secure under the sole assumption that no communication takes
place; no such scheme is known.
In this work, we give strong evidence for a negative answer: we show that any
single-round two-prover commitment scheme can be broken by a non-signaling
attack. Our negative result is as bad as it can get: for any candidate scheme
that is (almost) perfectly hiding, there exists a strategy that allows the
dishonest provers to open a commitment to an arbitrary bit (almost) as
successfully as the honest provers can open an honestly prepared commitment,
i.e., with probability (almost) 1 in case of a perfectly sound scheme. In the
case of multi-round schemes, our impossibility result is restricted to
perfectly hiding schemes.
On the positive side, we show that the impossibility result can be
circumvented by considering three provers instead: there exists a three-prover
commitment scheme that is secure against arbitrary non-signaling attacks.
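The non-signaling constraint the abstract refers to can be stated concretely: a bipartite strategy P(a,b|x,y) is non-signaling when neither prover's output distribution depends on the other prover's input. A toy sketch (not from the paper), using the standard Popescu-Rohrlich box as an example of a non-signaling strategy stronger than any quantum one:

```python
def pr_box(a, b, x, y):
    """The Popescu-Rohrlich box: a standard example of a non-signaling
    distribution, satisfying a XOR b = x AND y with probability 1."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

def is_non_signaling(p, tol=1e-12):
    """Check that Alice's marginal P(a|x) does not depend on Bob's
    input y, and vice versa -- the sole constraint on such provers."""
    for x in (0, 1):
        for a in (0, 1):
            m = {y: sum(p(a, b, x, y) for b in (0, 1)) for y in (0, 1)}
            if abs(m[0] - m[1]) > tol:
                return False
    for y in (0, 1):
        for b in (0, 1):
            m = {x: sum(p(a, b, x, y) for a in (0, 1)) for x in (0, 1)}
            if abs(m[0] - m[1]) > tol:
                return False
    return True
```

The impossibility result says that dishonest provers may coordinate using any strategy passing this check, even ones no physical theory realizes.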
A Method to Improve the Early Stages of the Robotic Process Automation Lifecycle
The robotic automation of processes is of much interest to
organizations. A common use case is to automate the repetitive manual
tasks (or processes) that are currently done by back-office staff
through some information system (IS). The lifecycle of any Robotic Process
Automation (RPA) project starts with the analysis of the process
to automate. This is a very time-consuming phase, which in practical
settings often relies on the study of process documentation. Such documentation
is typically incomplete or inaccurate, e.g., some documented
cases never occur, occurring cases are not documented, or documented
cases differ from reality. Deploying robots in a production environment when
they have been designed on such a shaky basis entails a high risk. This paper
describes and evaluates a new proposal for the early stages of an RPA
project: the analysis of a process and its subsequent design. The idea is to
leverage the knowledge of back-office staff, which starts by monitoring
them in a non-invasive manner. This is done through a screen-mouse-key
logger, i.e., a tool that stores a sequence of images, mouse actions, and key
actions along with their timestamps. The log obtained in
this way is transformed into a UI log through image-analysis techniques
(e.g., fingerprinting or OCR) and then transformed into a process model
by the use of process discovery algorithms. We evaluated this method for
two real-life, industrial cases. The evaluation shows clear and substantial
benefits in terms of accuracy and speed. This paper presents the method,
along with a number of limitations that need to be addressed such that
it can be applied in wider contexts.
Ministerio de Economía y Competitividad TIN2016-76956-C3-2-
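The pipeline described above — raw logger events, a UI log, then process discovery — can be sketched at a toy scale. The event labels below are hypothetical, and the image-analysis step (fingerprinting/OCR) is assumed to have already resolved screenshots into named UI actions:

```python
from collections import Counter

# Hypothetical output of a screen-mouse-key logger: (timestamp, action)
# pairs. In the real method, screenshots would still need image analysis
# to become the UI labels shown here.
raw_log = [
    (0.0, "click:open_case"), (1.2, "type:case_id"),
    (2.5, "click:search"),    (4.0, "click:open_case"),
    (5.1, "type:case_id"),    (6.3, "click:search"),
]

def directly_follows(log):
    """Minimal process-discovery step: count which UI action directly
    follows which. This directly-follows relation is the typical input
    to process discovery algorithms."""
    actions = [a for _, a in sorted(log)]
    return Counter(zip(actions, actions[1:]))

df = directly_follows(raw_log)
```

A discovery algorithm would then turn frequent directly-follows pairs into a process model that the RPA robot design can be validated against.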
Alien Registration- Dumais, Adrienne D. (Auburn, Androscoggin County)
https://digitalmaine.com/alien_docs/22693/thumbnail.jp
Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors
We investigate the application of hierarchical classification schemes to the
annotation of gene function based on several characteristics of protein
sequences including phylogenic descriptors, sequence based attributes, and
predicted secondary structure. We discuss three Bayesian models and compare
their performance in terms of predictive accuracy. These models are the
ordinary multinomial logit (MNL) model, a hierarchical model based on a set of
nested MNL models, and a MNL model with a prior that introduces correlations
between the parameters for classes that are nearby in the hierarchy. We also
provide a new scheme for combining different sources of information. We use
these models to predict the functional class of Open Reading Frames (ORFs) from
the E. coli genome. The results from all three models show substantial
improvement over previous methods, which were based on the C5 algorithm. The
MNL model using a prior based on the hierarchy outperforms both the
non-hierarchical MNL model and the nested MNL model. In contrast to previous
attempts at combining these sources of information, our approach results in a
higher accuracy rate when compared to models that use each data source alone.
Together, these results show that gene function can be predicted with higher
accuracy than previously achieved, using Bayesian models that incorporate
suitable prior information.
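The hierarchy-based prior can be illustrated with a toy construction: each leaf class's parameter vector is the sum of effects drawn at the nodes along its path in the hierarchy, so sibling classes receive correlated parameters. The hierarchy, feature count, and random draws below are invented; this is a sketch of the idea, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level hierarchy: four leaf classes under two parents.
hierarchy = {"c1": ["root", "p1"], "c2": ["root", "p1"],
             "c3": ["root", "p2"], "c4": ["root", "p2"]}
n_features = 3

# One effect vector per node; a leaf's parameters sum the effects on its
# path, so leaves sharing a parent share that parent's contribution and
# are therefore correlated a priori.
node_effect = {n: rng.normal(size=n_features)
               for n in ["root", "p1", "p2", "c1", "c2", "c3", "c4"]}

def leaf_params(leaf):
    return sum(node_effect[n] for n in hierarchy[leaf] + [leaf])

def mnl_probs(x):
    """Ordinary multinomial logit (softmax) over the leaf classes."""
    scores = np.array([x @ leaf_params(c) for c in hierarchy])
    e = np.exp(scores - scores.max())
    return e / e.sum()

p = mnl_probs(np.array([1.0, -0.5, 2.0]))
```

In a full Bayesian treatment, the node effects would get their own priors and be inferred jointly; this sketch only shows how the hierarchy induces the correlations the abstract mentions.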
Transcriptional landscape of the human and fly genomes: Nonlinear and multifunctional modular model of transcriptomes
Regions of the genome not coding for proteins or not involved in cis-acting regulatory activities are frequently viewed as lacking in functional value. However, a number of recent large-scale studies have revealed significant regulated transcription of unannotated portions of a variety of plant and animal genomes, allowing a new appreciation of the widespread transcription of large portions of the genome. High-resolution mapping of the sites of transcription of the human and fly genomes has provided an alternative picture of the extent and organization of transcription and has offered insights for biological functions of some of the newly identified unannotated transcripts. Considerable portions of the unannotated transcription observed are developmental or cell-type-specific parts of protein-coding transcripts, often serving as novel, alternative 5′ transcriptional start sites. These distal 5′ portions are often situated at significant distances from the annotated gene and alternatively join with or ignore portions of other intervening genes to comprise novel unannotated protein-coding transcripts. These data support an interlaced model of the genome in which many regions serve multifunctional purposes and are highly modular in their utilization. This model illustrates the underappreciated organizational complexity of the genome and one of the functional roles of transcription from unannotated portions of the genome. © 2006 Cold Spring Harbor Laboratory Press
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
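The inductive process the survey describes — automatically learning the characteristics of categories from preclassified documents — can be sketched with a minimal multinomial naive Bayes learner. The four training documents are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Tiny preclassified corpus (invented): the learner induces category
# characteristics from it, with no hand-written rules from experts.
train = [
    ("wheat corn harvest yield", "grain"),
    ("corn export wheat price", "grain"),
    ("crude oil barrel price", "oil"),
    ("oil drilling barrel rig", "oil"),
]

class_counts = Counter(c for _, c in train)
word_counts = defaultdict(Counter)
for doc, c in train:
    word_counts[c].update(doc.split())
vocab = {w for doc, _ in train for w in doc.split()}

def classify(doc):
    """Pick the category maximizing the (log) posterior under a
    multinomial naive Bayes model with Laplace smoothing."""
    best, best_lp = None, -math.inf
    for c in class_counts:
        lp = math.log(class_counts[c] / len(train))
        total = sum(word_counts[c].values())
        for w in doc.split():
            # Laplace smoothing keeps unseen words from zeroing a class.
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Naive Bayes is only one of the learners the survey covers, but it shows the paradigm: swapping the training set retargets the classifier to a new domain with no expert rule-writing.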
Computational Indistinguishability between Quantum States and Its Cryptographic Application
We introduce a computational problem of distinguishing between two specific
quantum states as a new cryptographic problem to design a quantum cryptographic
scheme that is "secure" against any polynomial-time quantum adversary. Our
problem, QSCDff, is to distinguish between two types of random coset states
with a hidden permutation over the symmetric group of finite degree. This
naturally generalizes the commonly-used distinction problem between two
probability distributions in computational cryptography. As our major
contribution, we show that QSCDff has three properties of cryptographic
interest: (i) QSCDff has a trapdoor; (ii) the average-case hardness of QSCDff
coincides with its worst-case hardness; and (iii) QSCDff is computationally at
least as hard as the graph automorphism problem in the worst case. These
cryptographic properties enable us to construct a quantum public-key
cryptosystem, which is likely to withstand any chosen plaintext attack of a
polynomial-time quantum adversary. We further discuss a generalization of
QSCDff, called QSCDcyc, and introduce a multi-bit encryption scheme that relies
on similar cryptographic properties of QSCDcyc.
Comment: 24 pages, 2 figures. We improved the presentation and added more
detailed proofs and follow-up of recent work
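The abstract presents QSCDff as a natural generalization of the classical problem of distinguishing two probability distributions. A toy classical instance of that underlying distinction problem (illustration only, not the quantum coset-state construction):

```python
import random

def distinguish(samples, p0=0.5, p1=0.75):
    """Guess which of two coins (heads-probability p0 or p1) produced
    the samples: pick the bias closer to the observed mean. This is a
    simple likelihood-ratio-style test whose advantage grows with the
    number of samples."""
    mean = sum(samples) / len(samples)
    return 1 if abs(mean - p1) < abs(mean - p0) else 0

random.seed(0)  # fixed seed so the illustration is reproducible
samples = [1 if random.random() < 0.75 else 0 for _ in range(1000)]
guess = distinguish(samples)
```

In the cryptographic setting the two objects to distinguish are quantum coset states rather than coins, and hardness for polynomial-time quantum distinguishers is what the scheme's security rests on.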
How Question Answering Technology Helps to Locate Malevolent Online Content
The inherent lack of control over Internet content has resulted in the proliferation of online material that can be potentially detrimental. For example, the infamous “Anarchist Cookbook,” which teaches how to make weapons, homemade bombs, and poisons, keeps re-appearing in various places. Some websites teach how to break into computer networks to steal passwords and credit card information. Law enforcement, security experts, and public watchdogs have started to locate, monitor, and act when such malevolent content surfaces on the Internet. Since the resources of law enforcement are limited, it may take some time before potentially malevolent content is located, enough for it to disseminate and cause harm. The approach currently applied to searching the content of the Internet, available to law enforcement and public watchdogs, is to use a search engine, such as Google, AOL, or MSN. We have suggested and empirically evaluated an alternative technology (called automated question answering, or QA) capable of locating potentially malevolent online content. We have implemented a proof-of-concept prototype that is capable of finding web pages that may potentially contain the answers to specified questions (e.g., “How to steal a password?”). Using students as subjects in a controlled experiment, we have empirically established that our QA prototype finds web pages that are more likely to provide answers to given questions than simple keyword search using Google. This suggests that QA technology can be a good replacement for, or addition to, traditional keyword searching for the task of locating malevolent online content and, possibly, for the more general task of interactive online information exploration.
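A minimal sketch of the idea behind such a QA-style search: rank pages by answer-bearing cues rather than bare keyword overlap. The cue list and weights below are illustrative assumptions, not the paper's implementation:

```python
import re

# Hypothetical procedural "answer cues": phrases that suggest a passage
# actually explains how to do something, not merely mentions the topic.
ANSWER_CUES = ("first", "then", "step", "you need", "use the")

def qa_score(question, passage):
    """Score a passage for answer-bearing potential: question-term
    overlap plus a higher-weighted bonus for procedural cues."""
    stop = {"how", "to", "a", "the"}
    q_terms = set(re.findall(r"\w+", question.lower())) - stop
    p = passage.lower()
    term_hits = sum(t in p for t in q_terms)
    cue_hits = sum(c in p for c in ANSWER_CUES)
    return term_hits + 2 * cue_hits  # cue weight is an arbitrary choice
```

A keyword engine would rank the passage repeating the query terms highest; the cue-based score instead prefers the passage that reads like an actual answer, which is the behavioral difference the experiment measured.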
