3,023 research outputs found
Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties.
The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings.
Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared to both unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide number of target languages, in the setting where no annotated training data is available in the target language
Econometrics meets sentiment : an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software
NLP-Based Techniques for Cyber Threat Intelligence
In the digital era, threat actors employ sophisticated techniques for which,
often, digital traces in the form of textual data are available. Cyber Threat
Intelligence~(CTI) is related to all the solutions inherent to data collection,
processing, and analysis useful to understand a threat actor's targets and
attack behavior. Currently, CTI is assuming an always more crucial role in
identifying and mitigating threats and enabling proactive defense strategies.
In this context, NLP, an artificial intelligence branch, has emerged as a
powerful tool for enhancing threat intelligence capabilities. This survey paper
provides a comprehensive overview of NLP-based techniques applied in the
context of threat intelligence. It begins by describing the foundational
definitions and principles of CTI as a major tool for safeguarding digital
assets. It then undertakes a thorough examination of NLP-based techniques for
CTI data crawling from Web sources, CTI data analysis, Relation Extraction from
cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
Finally, the challenges and limitations of NLP in threat intelligence are
exhaustively examined, including data quality issues and ethical
considerations. This survey draws a complete framework and serves as a valuable
resource for security professionals and researchers seeking to understand the
state-of-the-art NLP-based threat intelligence techniques and their potential
impact on cybersecurity
Cognition-Based Networks: A New Perspective on Network Optimization Using Learning and Distributed Intelligence
IEEE Access
Volume 3, 2015, Article number 7217798, Pages 1512-1530
Open Access
Cognition-based networks: A new perspective on network optimization using learning and distributed intelligence (Article)
Zorzi, M.a , Zanella, A.a, Testolin, A.b, De Filippo De Grazia, M.b, Zorzi, M.bc
a Department of Information Engineering, University of Padua, Padua, Italy
b Department of General Psychology, University of Padua, Padua, Italy
c IRCCS San Camillo Foundation, Venice-Lido, Italy
View additional affiliations
View references (107)
Abstract
In response to the new challenges in the design and operation of communication networks, and taking inspiration from how living beings deal with complexity and scalability, in this paper we introduce an innovative system concept called COgnition-BAsed NETworkS (COBANETS). The proposed approach develops around the systematic application of advanced machine learning techniques and, in particular, unsupervised deep learning and probabilistic generative models for system-wide learning, modeling, optimization, and data representation. Moreover, in COBANETS, we propose to combine this learning architecture with the emerging network virtualization paradigms, which make it possible to actuate automatic optimization and reconfiguration strategies at the system level, thus fully unleashing the potential of the learning approach. Compared with the past and current research efforts in this area, the technical approach outlined in this paper is deeply interdisciplinary and more comprehensive, calling for the synergic combination of expertise of computer scientists, communications and networking engineers, and cognitive scientists, with the ultimate aim of breaking new ground through a profound rethinking of how the modern understanding of cognition can be used in the management and optimization of telecommunication network
A Theme-Rewriting Approach for Generating Algebra Word Problems
Texts present coherent stories that have a particular theme or overall
setting, for example science fiction or western. In this paper, we present a
text generation method called {\it rewriting} that edits existing
human-authored narratives to change their theme without changing the underlying
story. We apply the approach to math word problems, where it might help
students stay more engaged by quickly transforming all of their homework
assignments to the theme of their favorite movie without changing the math
concepts that are being taught. Our rewriting method uses a two-stage decoding
process, which proposes new words from the target theme and scores the
resulting stories according to a number of factors defining aspects of
syntactic, semantic, and thematic coherence. Experiments demonstrate that the
final stories typically represent the new theme well while still testing the
original math concepts, outperforming a number of baselines. We also release a
new dataset of human-authored rewrites of math word problems in several themes.Comment: To appear EMNLP 201
- …