
    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures within which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; and (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in the Journal of Artificial Intelligence Research (JAIR), volume 61, pp. 75-170. 118 pages, 8 figures, 1 table.
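
    The core tasks discussed in such surveys are classically organised as a staged pipeline running from content selection to surface realisation. Below is a minimal, hypothetical Python sketch of that organisation; the stage names follow common NLG terminology, but the input record and templates are invented for illustration and are not taken from the survey.

        def content_determination(record):
            # Decide which facts from the non-linguistic input are worth reporting.
            return [(key, value) for key, value in record.items() if value is not None]

        def lexicalise(fact):
            # Map a selected fact onto a phrase via a simple template (illustrative only).
            templates = {
                "temperature": "the temperature is {} degrees",
                "wind": "winds of {} km/h are expected",
            }
            key, value = fact
            return templates.get(key, key + " is {}").format(value)

        def realise(phrases):
            # Surface realisation: combine the phrases into one sentence.
            return " and ".join(phrases).capitalize() + "."

        weather = {"temperature": 21, "wind": 15, "humidity": None}
        print(realise([lexicalise(f) for f in content_determination(weather)]))
        # -> The temperature is 21 degrees and winds of 15 km/h are expected.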

    Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis

    Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. Opinion analysis has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload has emerged with the advancements in communication technologies which gave rise to an exponential growth in user-generated subjective data available online. Opinion analysis has a rich set of applications which enable opportunities for organisations, such as tracking user opinions about products and social issues in communities, through to engagement in political participation. The opinion analysis area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, there are limitations in the state of the art, especially as dealing with each level of granularity on its own does not solve current research issues. Therefore, a novel sentence-level opinion analysis approach utilising clause- and phrase-level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule-based analysis at the phrase level to calculate the opinion at each hierarchical structure of a sentence. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this Thesis, the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the Thesis and are found to address existing limitations in the field, particularly with regard to opinion analysis automation. Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.
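
    As an illustration of phrase-level, rule-based opinion calculation feeding a sentence-level score, the following Python sketch uses an invented polarity lexicon, negation set and intensifier weights, and naive comma-based phrase splitting; it is a simplified stand-in, not the thesis's actual rules or resources.

        LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
        NEGATORS = {"not", "never", "no"}
        INTENSIFIERS = {"very": 1.5, "extremely": 2.0}

        def score_phrase(tokens):
            # Simple rules: intensifiers scale, negators flip, lexicon words contribute.
            score, weight, negated = 0.0, 1.0, False
            for tok in tokens:
                if tok in NEGATORS:
                    negated = True
                elif tok in INTENSIFIERS:
                    weight *= INTENSIFIERS[tok]
                elif tok in LEXICON:
                    polarity = LEXICON[tok] * weight
                    score += -polarity if negated else polarity
                    weight, negated = 1.0, False
            return score

        def score_sentence(sentence):
            # Crude comma-based phrase segmentation, then sum the phrase-level scores.
            phrases = [p.split() for p in sentence.lower().split(",")]
            return sum(score_phrase(tokens) for tokens in phrases)

        print(score_sentence("The screen is very good, but the battery is not great"))  # -0.5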

    Computational acquisition of knowledge in small-data environments: a case study in the field of energetics

    The UK’s defence industry is accelerating its implementation of artificial intelligence, including expert systems and natural language processing (NLP) tools designed to supplement human analysis. This thesis examines the limitations of NLP tools in small-data environments (common in defence) in the defence-related energetic-materials domain. A literature review identifies the domain-specific challenges of developing an expert system (specifically an ontology). The absence of domain resources such as labelled datasets and, most significantly, the preprocessing of text resources are identified as challenges. To address the latter, a novel general-purpose preprocessing pipeline specifically tailored for the energetic-materials domain is developed, and its effectiveness is evaluated. The interface between using NLP tools in data-limited environments to supplement human analysis and using them to replace it completely is examined in a study of the subjective concept of importance. A methodology for directly comparing the ability of NLP tools and experts to identify important points in the text is presented. Results show that the participants of the study exhibit little agreement, even on which points in the text are important. The NLP tools, the expert (the author of the text being examined) and the participants agree only on general statements; however, as a group, the participants agreed with the expert. In data-limited environments, the extractive-summarisation tools examined cannot effectively identify the important points in a technical document in the way an expert can. A methodology for the classification of journal articles by the technology readiness level (TRL) of the described technologies in a data-limited environment is proposed. Techniques to overcome challenges with using real-world data, such as class imbalances, are investigated. A methodology to evaluate the reliability of human annotations is presented. Analysis identifies a lack of agreement and consistency in the expert evaluation of document TRL. Open Access
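
    A minimal sketch of the kind of TRL classification pipeline described, assuming scikit-learn and a class-weight correction for imbalanced labels; the example documents, the coarse TRL bands, and the model choice are invented for illustration and are not the thesis's actual data or configuration.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        docs = [
            "basic principles of the formulation were observed in the laboratory",
            "initial laboratory characterisation of the novel oxidiser",
            "a prototype charge was demonstrated in a relevant environment",
            "the system has been qualified through successful mission operations",
        ]
        labels = ["low", "low", "mid", "high"]  # coarse TRL bands, deliberately imbalanced

        # class_weight="balanced" re-weights training so rare TRL bands are not ignored.
        model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            LogisticRegression(class_weight="balanced", max_iter=1000),
        )
        model.fit(docs, labels)
        print(model.predict(["a prototype was tested in a relevant environment"]))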

    Multi-Document Summarisation from Heterogeneous Software Development Artefacts

    Software engineers create a vast number of artefacts during project development activities, consisting of related information exchanged between developers. Sifting through the large amount of information available within a project repository can be time-consuming. In this dissertation, we proposed a method for multi-document summarisation from heterogeneous software development artefacts that helps software developers target their information needs by automatically generating summaries. To achieve this aim, we first had our gold-standard summaries created; we then characterised them and used them to identify the main types of software artefacts that describe developers’ activities in GitHub project repositories. This initial step was important for the present study, as we had no prior knowledge about the types of artefacts linked to developers’ activities that could be used as sources of input for our proposed multi-document summarisation techniques. In addition, we used the gold-standard summaries later to evaluate the quality of our summarisation techniques. We then developed extractive multi-document summarisation approaches to automatically summarise software development artefacts within a given time frame by integrating techniques from natural language processing, software repository mining, and data-driven search-based software engineering. The generated summaries were then evaluated in a user study to investigate whether experts considered that the generated summaries mentioned every important project activity that appeared in the gold-standard summaries. The results of the user study showed that generating summaries from different kinds of software artefacts is possible, and that the generated summaries are useful in describing a project’s development activities over a given time frame. Finally, we investigated the potential of using source code comments for summarisation by assessing the documented information of Java primitive variables in comments against three types of knowledge. Results showed that the source code comments did contain additional information and could be useful for summarising developers’ development activities. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
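
    As a simplified stand-in for the extractive approach described (not the dissertation's actual technique, which also draws on repository mining and search-based software engineering), the Python sketch below scores invented artefact texts against a TF-IDF centroid and keeps the highest-scoring ones as a summary.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer

        artefacts = [  # invented commit, issue and pull-request texts from one time frame
            "commit: fix null pointer exception in the session cache",
            "issue: login fails intermittently when the session cache is cold",
            "pull request: add retry logic around session cache initialisation",
            "commit: update README badges",
        ]

        vectoriser = TfidfVectorizer(stop_words="english")
        tfidf = vectoriser.fit_transform(artefacts)
        centroid = np.asarray(tfidf.mean(axis=0)).ravel()

        # Rank artefacts by similarity to the centroid and keep the top two as the summary.
        scores = tfidf @ centroid
        top = sorted(np.argsort(scores)[::-1][:2])
        print("\n".join(artefacts[i] for i in top))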

    A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception

    In recent years there has been substantial growth in the capabilities of systems designed to generate text that mimics the fluency and coherence of human language. From this, there has been considerable research aimed at examining the potential uses of these natural language generators (NLG) for a wide range of tasks. The increasing capability of powerful text generators to mimic human writing convincingly raises the potential for deception and other forms of dangerous misuse. As these systems improve, and it becomes ever harder to distinguish between human-written and machine-generated text, malicious actors could leverage these powerful NLG systems for a wide variety of ends, including the creation of fake news and misinformation, the generation of fake online product reviews, or the use of chatbots as a means of convincing users to divulge private information. In this paper, we provide an overview of the NLG field via the identification and examination of 119 survey-like papers focused on NLG research. From these identified papers, we outline a proposed high-level taxonomy of the central concepts that constitute NLG, including the methods used to develop generalised NLG systems, the means by which these systems are evaluated, and the popular NLG tasks and subtasks that exist. In turn, we provide an overview and discussion of each of these items with respect to current research and offer an examination of the potential roles of NLG in deception and of detection systems to counteract these threats. Moreover, we discuss the broader challenges of NLG, including the risks of bias that are often exhibited by existing text generation systems. This work offers a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research.

    Multi Criteria Decision Making Approach For Product Aspect Extraction And Ranking In Aspect-Based Sentiment Analysis

    Identifying product aspects in customer reviews can have a great influence on both business strategies and customers’ decisions. Presently, most research focuses on machine learning, statistical, and Natural Language Processing (NLP) techniques to identify the product aspects in customer reviews. The challenge of this research is to formulate aspect identification as a decision-making problem. To this end, we propose a product aspect identification approach that combines multi-criteria decision-making (MCDM) with sentiment analysis. The suggested approach consists of two stages, namely product aspect extraction and product aspect ranking.
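
    A minimal weighted-sum sketch of treating aspect ranking as a decision-making problem; the criteria (mention frequency, mean sentiment strength, reviewer diversity), their weights, and the scores are invented, and the paper's actual MCDM method and criteria may differ.

        # Each candidate aspect is scored on three normalised criteria in [0, 1]:
        # (mention frequency, mean sentiment strength, reviewer diversity).
        aspects = {
            "battery": (0.90, 0.80, 0.70),
            "screen":  (0.60, 0.50, 0.65),
            "price":   (0.75, 0.95, 0.40),
        }
        weights = (0.5, 0.3, 0.2)  # assumed relative importance of the criteria

        def weighted_score(criteria):
            # Simple additive weighting: one common MCDM aggregation scheme.
            return sum(w * c for w, c in zip(weights, criteria))

        for aspect in sorted(aspects, key=lambda a: weighted_score(aspects[a]), reverse=True):
            print(f"{aspect}: {weighted_score(aspects[aspect]):.2f}")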

    Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

    In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is well suited to extracting opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate the impact of this step on task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.
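
    The sketch below illustrates the normalisation-as-preprocessing idea with a simple lookup-table normaliser applied before feature extraction; the paper's actual system uses SMT-based normalisation and an optimised classifier, neither of which is reproduced here, and the mappings are invented examples.

        import re

        # Invented noisy-token -> standard-form mappings (a stand-in for SMT output).
        NORMALISATION_TABLE = {"u": "you", "gr8": "great", "luv": "love", "2moro": "tomorrow"}

        def normalise(text):
            # Lower-case, tokenise, and replace known noisy tokens with standard forms.
            tokens = re.findall(r"\w+|\S", text.lower())
            return " ".join(NORMALISATION_TABLE.get(tok, tok) for tok in tokens)

        def bag_of_words(text):
            # Features for a downstream sentiment classifier, built on normalised text.
            counts = {}
            for tok in normalise(text).split():
                counts[tok] = counts.get(tok, 0) + 1
            return counts

        print(normalise("u will luv this phone, gr8 battery"))
        # -> you will love this phone , great battery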

    Dynamics of Networking, Knowledge and Performance of Small and Medium-Sized Tourism Enterprises (SMTEs) in Terengganu, Malaysia

    Small and medium-sized tourism enterprises (SMTEs) are a fundamental component in a tourism destination’s development as they are the economic engines of the tourism sector. However, the entrepreneurs running these businesses face the challenge of scale, which leads to a lack of sufficient resources and knowledge in dealing with the competitive and complex tourism business environment. Networks may be crucial for SMTEs as there is empirical evidence from other industries that enterprises have benefited from their use. Therefore, it is the aim of this thesis to advance knowledge on the characteristics of the entrepreneurs and their SMTEs, as well as to explore the underlying benefits of absorptive capacity derived from the tourism networks for business performance in the cultural context of Terengganu, Malaysia. A sequential mixed-methods approach was adopted, employing firstly face-to-face questionnaires and then semi-structured interviews. This thesis is considered innovative as it involved four different tourism sub-sectors (hotels and resorts, travel agencies, restaurants and handicrafts), enabling a detailed inter-sectoral comparison in researching the variables of interest. Few if any studies have taken such a detailed comparative approach within tourism studies. A descriptive analysis of the survey data suggests that the distribution of gender is fairly balanced and that distinctiveness of the characteristics of entrepreneurs and SMTEs from the inter-sectoral perspective can be observed in 8 dimensions, namely: (1) level of education, (2) years of working experience inside the tourism sector, (3) years of previous working experience, (4) business training and (5) specific training in tourism, (6) years of business operations, (7) existence of a business plan and (8) annual sales turnover. A qualitative analysis shows that the motivations for the majority of entrepreneurs are based on financial rewards. Employment practices, in terms of whether or not family members are hired, differ somewhat between enterprises and sub-sectors. With these characteristics, the descriptive analysis indicates that entrepreneurs are more inclined towards accessing informal networks – in terms of the scale of the contacts, strength of the relationship and trust, and frequency of communication – as compared to the formal networks. Inferential analysis revealed that 7 dimensions of characteristics of entrepreneurs have statistically significant relationships with the importance of formal networks in SMTEs’ learning, notably: (1) formal tourism training, (2) the business plan, (3) years of working in the tourism sector, (4) years living in Terengganu, (5) age, (6) capital sources from personal savings and (7) capital from families. This contrasts with the determinants of the importance of informal networks in SMTEs’ learning, where only years of previous working experience was found to be statistically significant. Building on this, hierarchical regression analyses show that absorptive capacity strongly mediates the relationship between the use of formal networks and one of the dimensions of business performance, which is business management. Within the relationship between the use of formal networks, absorptive capacity and business management, trust and transformation are the two crucial dynamics that contributed to the significant result. Use of formal networks also appears to have a statistically significant relationship with annual sales turnover.
In this relationship, trust also plays an important role, followed by the size of the contacts and the frequency of communication. For the use of informal networks, no significant relationships are found. These differences can also be seen in the qualitative interviews. The differences between the two networks lie in their breadth, significance and consistency, as the offer of formal networks is considered advantageous compared to that of informal networks. There are also qualitative differences across the four different sub-sectors in their appreciation of the importance and offer of formal and informal networks. Entrepreneurs are found to be informal when they communicate with others at the individual level, while at the organisational level the nature of the communications can be both formal and informal within the settings of business and community events. Overall, this research makes important theoretical contributions to the tourism body of knowledge with the development of a conceptual framework, based on a systematic review of the literature, that primarily highlights the interlinkages between the characteristics of entrepreneurs and SMTEs, the use of formal and informal networks, absorptive capacity, business management and annual sales turnover. Drawing on statistical modelling and qualitative analysis conducted on the four different tourism sub-sectors, a final integrated model is produced. The model is the first of its kind to be based on empirical evidence on the interlinkages between the different variables investigated. It provides a strong platform for further work on networks in Malaysia and beyond.

    Powering the Academic Web

    Context: Locating resources on the Web has become increasingly difficult for users and poses a number of issues. The sheer size of the Web means that, despite what appears to be an increase in the amount of quality material available, the effort involved in locating that material is also increasing; in effect, the higher-quality material is being diluted by the lesser quality. One group affected by this problem is post-graduate students: with only a finite amount of time to devote to research, the effort spent locating material reduces their overall quality study time. Aim: This research investigates how post-graduate students use the Web as a learning resource and identifies a number of areas of concern with its use. It considers the potential for improvement by using a number of concepts, such as collaboration, peer reviewing, and document classification and comparison techniques. This research also investigates whether, by combining several of the identified technologies and concepts, student research on the Web can be improved. Method: Using some of the identified concepts as components, this research proposes a model to address the highlighted areas of concern. The proposed model, named the Durham Browsing Assistant (DurBA), is defined, and a number of key concepts which show potential within it are uncovered. One of the key concepts is chosen, that of document comparison: given a source document, can a computer system reliably identify the documents on the Web which most closely match it? A software tool, the Durham Textual Comparison system (DurTeC), was created to allow the testing of document comparison techniques, and it had two key concepts. The first was that it would allow various algorithms to be applied to the comparison process. The second was that it could simulate collaboration by allowing data to be altered, added and removed as if by multiple users. A set of experiments was created to test these algorithms and identify those which gave the best results. Results: The experiments identified a number of the most promising relationships between comparison and collaboration processes. They also highlighted those which had a negative effect on the process, and those which produced variable results. Amongst the results, it was found that: 1. By providing DurTeC with source documents additional to the original, as if through a recommendation process, it was able to increase its accuracy substantially. 2. By allowing DurTeC to use synonym lists to expand its vocabulary, its accuracy was, in many cases, reduced. 3. By restricting the words which DurTeC considered in its comparison process, based upon their value in the source document, accuracy could be increased; this could be considered a form of collaborative keyword selection. Conclusion: This research shows that improvements can be made in the accuracy of identifying similar resources by using a combination of comparison and collaboration processes. The proposed model, DurBA, would be an ideal host for such a system.
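
    A minimal sketch of the document-comparison idea at the heart of DurTeC (not the system itself, and with invented documents): rank candidate documents by TF-IDF cosine similarity to a source document. Restricting the vocabulary to the source's highest-weighted terms, or adding recommended documents to the source side, would correspond to the collaboration variants described above.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        source = "collaborative tools to help post-graduate students locate quality web resources"
        candidates = [
            "peer review and recommendation for academic web browsing",
            "document comparison techniques for finding similar web resources",
            "a study of soil erosion in coastal regions",
        ]

        vectoriser = TfidfVectorizer(stop_words="english")
        matrix = vectoriser.fit_transform([source] + candidates)

        # Similarity of each candidate (rows 1..n) to the source document (row 0).
        similarities = cosine_similarity(matrix[0], matrix[1:]).ravel()
        for doc, sim in sorted(zip(candidates, similarities), key=lambda pair: -pair[1]):
            print(f"{sim:.2f}  {doc}")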