
    How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis

    This version of the article has been accepted for publication after peer review, but it is not the Version of Record. The Version of Record is available online at: https://doi.org/10.1007/s10462-017-9584-0
    Abstract: Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models that require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, with their accuracy having no relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser, and that parsing researchers should investigate models that improve speed further, even at some cost to accuracy.
    Carlos Gómez-Rodríguez has received funding from the European Research Council (ERC), under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, Grant Agreement No 714150), Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R), and the Oportunius Program (Xunta de Galicia). Iago Alonso-Alonso was funded by an Oportunius Program Grant (Xunta de Galicia). David Vilares has received funding from the Ministerio de Educación, Cultura y Deporte (FPU13/01180) and Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R).

    Analyzing Domestic Abuse using Natural Language Processing on Social Media Data

    Social media and social networking play a major role in billions of lives. Publicly available posts on websites such as Twitter, Reddit, Tumblr, and Facebook can contain deeply personal accounts of the lives of users and the crises they face. Health woes, family concerns, accounts of bullying, and any number of other issues that people face every day are detailed on a massive scale online. Using natural language processing and machine learning techniques, these data can be analyzed to understand societal and public health issues. With automatic understanding of social media data, expensive surveys need not be conducted, allowing faster, cost-effective data collection and analysis that can shed light on sociologically important problems. In this thesis, discussions of domestic abuse in social media are analyzed. The efficacy of classifiers that detect text discussing abuse is examined, and computationally extracted characteristics of these texts are analyzed for a comprehensive view into the dynamics of abusive relationships. Analysis reveals micro-narratives in reasons for staying in versus leaving abusive relationships, as well as the stakeholders and actions in these relationships. Findings are consistent across various methods, correspond to observations in clinical literature, and affirm the relevance of natural language processing techniques for exploring issues of social importance in social media.

    Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction

    Multimodal fusion and multitask learning are two vital topics in machine learning. Despite fruitful progress, existing methods for both problems remain brittle to the same challenge: it is difficult to integrate the common information across modalities (resp. tasks) while preserving the specific patterns of each modality (resp. task). Moreover, although they are closely related to each other, multimodal fusion and multitask learning have rarely been explored within the same methodological framework. In this paper, we propose the Channel-Exchanging-Network (CEN), which is self-adaptive, parameter-free, and, more importantly, applicable to both multimodal fusion and multitask learning. At its core, CEN dynamically exchanges channels between subnetworks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance, measured by the magnitude of the Batch-Normalization (BN) scaling factor during training. For the application of dense image prediction, the validity of CEN is tested in four different scenarios: multimodal fusion, cycle multimodal fusion, multitask learning, and multimodal multitask learning. Extensive experiments on semantic segmentation via RGB-D data and image translation through multi-domain input verify the effectiveness of our CEN compared to current state-of-the-art methods. Detailed ablation studies have also been carried out, which affirm the advantage of each component we propose.
    Comment: 18 pages. arXiv admin note: substantial text overlap with arXiv:2011.0500
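    The exchanging rule the abstract describes (a channel whose BN scaling factor has near-zero magnitude is deemed unimportant and is replaced by the corresponding channel from the other modality) can be sketched as follows. This is a minimal illustrative sketch: the threshold value, array shapes, and function name are assumptions for the example, not the paper's actual hyperparameters or API.

    ```python
    import numpy as np

    def channel_exchange(feat_a, feat_b, gamma_a, gamma_b, threshold=0.02):
        """Exchange channels between two modality feature maps [C, H, W].

        A channel whose BN scaling factor |gamma| falls below `threshold`
        is judged uninformative and is replaced by the corresponding
        channel of the other modality's feature map.
        """
        out_a, out_b = feat_a.copy(), feat_b.copy()
        swap_a = np.abs(gamma_a) < threshold  # low-importance channels in A
        swap_b = np.abs(gamma_b) < threshold  # low-importance channels in B
        out_a[swap_a] = feat_b[swap_a]        # A receives B's channels there
        out_b[swap_b] = feat_a[swap_b]        # B receives A's channels there
        return out_a, out_b
    ```

    In the paper's training setup the decision is made per channel and per layer; here a single fixed threshold stands in for that mechanism.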

    The Pragmatics of Virtual Environments. Compliment responses in Second Life

    The advent of the Internet has dramatically changed, among other things, the way people learn a language. With the new ICT tools, more and more users can interact with native speakers in their target language, potentially without ever leaving home. In this regard, virtual worlds appear to be a resourceful place where language learners can meet and practice their L2. However, while these virtual worlds are increasingly employed for language learning purposes, they still remain a linguistically unexplored ground. If the Internet has changed the way people interact, it is also plausible that the pragmatic norms underlying this new form of communication have changed; understanding how this communication works would therefore also help in understanding its efficacy. This study sets out to fill a gap in the literature by looking at how compliments are responded to in Second Life. The results are then compared with compliment responses in real life to find out whether the language used in virtual environments faithfully reproduces the language used in face-to-face conversation, and to what extent it is convenient and meaningful for instructors to integrate such environments into their teaching practices. The results show a greater tendency to accept compliments than in real-life conversation. Possible pedagogical implications and directions for further research are discussed.
    Keywords: pragmatics; compliment responses; virtual environments; language learning

    Efficient Security Protocols for Constrained Devices

    During the last decades, more and more devices have been connected to the Internet. Today, there are more devices connected to the Internet than humans. An increasingly common type of device is the cyber-physical device, i.e. a device that interacts with its environment. Sensors that measure their environment and actuators that alter the physical environment are both cyber-physical devices. Devices connected to the Internet risk being compromised by threat actors such as hackers. Cyber-physical devices have become a preferred target for threat actors, since the consequence of an intrusion disrupting or destroying a cyber-physical system can be severe. Cyber attacks against power and energy infrastructure have caused significant disruptions in recent years. Many cyber-physical devices are categorized as constrained devices. A constrained device is characterized by one or more of the following limitations: limited memory, a less powerful CPU, or a limited communication interface. Many constrained devices are also powered by a battery or energy harvesting, which limits the available energy budget. Devices must be efficient to make the most of the limited resources. Mitigating cyber attacks is a complex task, requiring technical and organizational measures. Constrained cyber-physical devices require efficient security mechanisms to avoid overloading the system's limited resources. In this thesis, we present research on efficient security protocols for constrained cyber-physical devices. We have implemented and evaluated two state-of-the-art protocols, OSCORE and Group OSCORE. These protocols allow end-to-end protection of CoAP messages in the presence of untrusted proxies. Next, we have performed a formal protocol verification of WirelessHART, a protocol for communications in an industrial control systems setting. In our work, we present a novel attack against the protocol. We have developed a novel architecture for industrial control systems utilizing the Digital Twin concept. Using a state synchronization protocol, we propagate state changes between the digital and physical twins. The Digital Twin can then monitor and manage devices. We have also designed a protocol for secure ownership transfer of constrained wireless devices. Our protocol allows the owner of a wireless sensor network to transfer control of the devices to a new owner. With a formal protocol verification, we can guarantee the security of both the old and new owners. Lastly, we have developed an efficient Private Stream Aggregation (PSA) protocol. PSA allows devices to send encrypted measurements to an aggregator. The aggregator can combine the encrypted measurements and calculate the decrypted sum of the measurements. No party learns an individual measurement except the device that generated it.
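    The PSA property described in the last sentences (the aggregator recovers only the sum, never an individual measurement) can be illustrated with a minimal additive-masking scheme. This trusted-dealer construction is a simplified textbook sketch for illustration only, not necessarily the protocol developed in the thesis; the modulus and function names are assumptions.

    ```python
    import random

    M = 2**31  # public modulus; the true sum of measurements must stay below M

    def setup(n, seed=0):
        """Trusted dealer: hand each of n devices a key; keys sum to 0 mod M."""
        rng = random.Random(seed)
        keys = [rng.randrange(M) for _ in range(n - 1)]
        keys.append((-sum(keys)) % M)  # last key makes the total vanish mod M
        return keys

    def encrypt(x, key):
        """A single masked value is uniformly distributed, revealing nothing."""
        return (x + key) % M

    def aggregate(ciphertexts):
        """Summing all ciphertexts cancels the masks, leaving the plain sum."""
        return sum(ciphertexts) % M
    ```

    The key point is that only the sum of all ciphertexts is meaningful: any proper subset still carries unknown residual masks.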

    Statistical Deep Parsing for Spanish

    This document presents the development of a statistical HPSG parser for Spanish. HPSG is a deep linguistic formalism that combines syntactic and semantic information in the same representation, and is capable of elegantly modeling many linguistic phenomena. Our research consists of the following steps: design of the HPSG grammar, construction of the corpus, implementation of the parsing algorithms, and evaluation of the parsers' performance. We created a simple yet powerful HPSG grammar for Spanish that models morphosyntactic information of words, syntactic combinatorial valence, and semantic argument structures in its lexical entries. The grammar uses thirteen very broad rules for attaching specifiers, complements, modifiers, clitics, relative clauses and punctuation symbols, and for modeling coordinations. In a simplification from standard HPSG, the only type of long-range dependency we model is the relative clause that modifies a noun phrase, and we use semantic role labeling as our semantic representation. We transformed the Spanish AnCora corpus using a semi-automatic process and analyzed it using our grammar implementation, creating a Spanish HPSG corpus of 517,237 words in 17,328 sentences (all of AnCora). We implemented several statistical parsing algorithms and trained them over this corpus; in particular, we aimed to test approaches based on neural networks. The implemented strategies are: a bottom-up baseline using bi-lexical comparisons or a multilayer perceptron; a CKY approach that uses the results of a supertagger; and a top-down approach that encodes word sequences using an LSTM network. We evaluated the performance of the implemented parsers and compared them with each other and against other existing Spanish parsers. Our LSTM top-down approach seems to be the best-performing parser over our test data, obtaining the highest scores (compared to our strategies and also to external parsers) according to constituency metrics (87.57 unlabeled F1, 82.06 labeled F1), dependency metrics (91.32 UAS, 88.96 LAS), and SRL (87.68 unlabeled, 80.66 labeled), but we must take into consideration that the comparison against the external parsers might be noisy due to the post-processing we needed to do in order to adapt them to our format. We also defined a set of metrics to evaluate the identification of some particular language phenomena, and the LSTM top-down parser outperformed the baselines in almost all of these metrics as well.
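    As an illustration of the chart-based CKY strategy listed among the implemented approaches, here is a minimal CKY recognizer for a toy grammar in Chomsky normal form. The grammar, lexicon, and function name are invented for the example and are unrelated to the thesis's actual HPSG rules or supertagger.

    ```python
    from itertools import product

    def cky_parse(words, lexicon, rules):
        """Minimal CKY recognizer for a binary (CNF-style) grammar.

        lexicon: word -> set of nonterminals (the lexical categories);
        rules: (B, C) -> set of A, for each binary rule A -> B C.
        Returns a chart mapping each span (start, end) to the set of
        nonterminals that can derive that substring.
        """
        n = len(words)
        chart = {}
        # Base case: fill length-1 spans from the lexicon.
        for i, w in enumerate(words):
            chart[(i, i + 1)] = set(lexicon.get(w, ()))
        # Combine adjacent spans, shortest first.
        for length in range(2, n + 1):
            for start in range(n - length + 1):
                end = start + length
                cell = set()
                for split in range(start + 1, end):
                    for b, c in product(chart[(start, split)], chart[(split, end)]):
                        cell |= rules.get((b, c), set())
                chart[(start, end)] = cell
        return chart
    ```

    In the thesis's setting, the supertagger would supply the lexical categories that the toy lexicon hard-codes here, which is what makes the CKY search tractable for a rich formalism like HPSG.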

    Navigation, findability and the usage of cultural heritage on the web: an exploratory study

    The present thesis investigates the usage of cultural heritage resources on the web. In recent years, cultural heritage objects have been digitized and made available on the web for the general public to use. The thesis addresses to what extent the digitized material is used, and how findable it is on the web. On the web, resources need to be findable in order to be visited and used. The study is done at the intersection of several research areas in Library and Information Science: Information Seeking/Human Information Behaviour, Interactive Information Retrieval, and Webometrics. The two thesis research questions focus on different aspects of the study: (1) findability on the web; and (2) the usage and the users. The usage of the cultural heritage is analysed with Savolainen's Everyday Life Information Seeking (ELIS) framework. The IS&R framework by Ingwersen and Järvelin is the main theoretical foundation, and a conceptual framework is developed so the examined aspects can be related to each other more clearly. An important distinction in the framework is between object and resource. An object is a single document, file or HTML page, whereas a resource is a collection of objects, e.g. a cultural heritage web site. Three webometric levels are used to both combine and distinguish the data types: usage, content, and structure. The interaction between the system and its users' information search process was divided into query-dependent and query-independent aspects. The query-dependent aspects contain the information need on the user side and the topic of the content on the system side. The query-independent aspects are the structural findability on the system side and the users' search skills on the user side. The conceptual framework is summarised in the User-Resource Interaction (URI) model. 
    The research design is a methodological triangulation, in the form of a mixed-methods approach, in order to obtain measures and indicators of the resources and the usage from different angles. Four methods are used: site structure analysis; log analysis; web survey; and findability analysis. The research design is both sequential and parallel: the site structure analysis preceded the log analysis and the findability analysis, while the web survey was employed independently of the other methods. Three Danish resources are studied: Arkiv for Dansk Litteratur (ADL), a collection of literary texts written by authors; Kunst Index Danmark (KID), an index of the holdings in the Danish art museums; and the Guaman Poma Inca Chronicle (Poma), a digitized manuscript on the UNESCO list of world cultural heritage. The studied log covers all usage during the period October to December 2010. The site structure is analysed so the resources can be described as different levels, based on function and content. The results from the site structure analysis are used both in the log analysis and the findability analysis, as well as a way to describe the resources. In the log analysis, navigation strategies and navigation patterns are studied. Navigation through a web search engine is the most common way to reach the resources, but both direct navigation and link navigation are also used in all three resources. Most users arrive at the middle level in ADL and KID, at information on authors and artists. On average, cultural heritage objects are viewed in half of the sessions. In the analysis of the web survey answers, two groups of users are distinguished: the professional user in a work context and users in a hobby or leisure context. School or study as a context is prominent in Guaman Poma, the Inca Chronicle. Generally, pages about the cultural heritage are more frequently visited than the digitized cultural heritage objects. 
    In the findability framework, six aspects are identified as central for the findability of an object on the web: attributes of the object, accessibility, internal navigation, internal search, reachability and web prestige. The six aspects are evaluated through seven indicators. All studied objects are findable in the analysis using the findability framework. A findability issue in KID is the use of the secure HTTPS protocol instead of HTTP, which leads to the objects in KID having no PageRank value in Google and thereby a lower ranking in comparison to similar objects with a PageRank value. The internal findability is reduced for the objects at the top of all three resources, e.g. the first page, due to the focus of the internal search engine on the cultural heritage objects. Several possible adjustments or developments of the findability framework are discussed, such as changing the weighting between the measured aspects, alternative scores and automated measuring. In conclusion, the investigation adds to our knowledge about how resources with digitized cultural heritage are accessed and used, as well as how findable they are. The thesis provides both theoretical and conceptual contributions to research. The IS&R framework has been adapted to the web, the information search process was split into query-dependent and query-independent aspects, and a whole findability framework has been developed. Both the empirical findings and the theoretical advancements support the development of better access to web resources.


    The Effects of IT, Task, Workgroup, and Knowledge Factors on Workgroup Outcomes: A Longitudinal Investigation

    In order to successfully manage the knowledge-related processes occurring in their workgroups, organizations need to understand how different contingency factors affect the knowledge-related processes of a workgroup, ultimately affecting the workgroup's knowledge outcomes and performance. To obtain a deeper understanding of the longitudinal effects of different contingency factors on the knowledge outcomes and performance of workgroups, this dissertation was guided by the research question: Which factors, from the five categories of factors (a) characteristics of the workgroup; (b) characteristics of the tasks assigned to the workgroup; (c) the interface between the workgroup and the tasks; (d) characteristics of the knowledge required to complete the tasks; and (e) characteristics of the information technologies, affect workgroup outcomes, including (i) average consensus among a workgroup's members about each other's areas of knowledge; (ii) average accuracy of knowledge; and (iii) performance of the workgroup, over time, and in what way? The workgroup processes considered were categorized into three groups: processes related to scheduling of tasks, processes related to completion of tasks, and processes accompanying those related to completion of tasks. Results indicate that only a subset of contingency factors from each category affects each of the workgroup outcomes. Specifically, average task priority, average knowledge intensity of subtasks, average propensity to share, time in the training phase, probability of non-specific exchange, number of agents, number of locations and average project intensity were found to have a positive effect on average consensus, while average task intensity, average self-knowledge and average number of tasks per agent had a negative effect on average consensus. In the case of average accuracy of knowledge, average knowledge level and number of agents were found to have a significant positive effect. 
    Finally, in the case of percentage of project completed, average propensity to share, average knowledge level, average self-knowledge, and time in the training phase were found to have a significant positive effect, while average knowledge intensity of subtasks, richness of email, and average direction time were found to have a significant negative effect. Average number of tasks per agent was found to have a significant negative effect between workgroups and a significant positive effect within workgroups.