2,812 research outputs found

OBOME - Ontology based opinion mining in UBIPOL

Author: Husani M
Ko A
Kocyigit A
Lee H
Tapucu D
Publication venue: Brunel University
Publication date: 01/01/2012

Ontologies have a special role in the UBIPOL system: they help to structure the policy-related context, provide a conceptualization of the policy domain, and are used in the opinion mining process. In this work we present a system called Ontology Based Opinion Mining Engine (OBOME) that analyzes a domain-specific opinion corpus in two steps: it first assists the user with the creation of a domain ontology from the corpus, and then determines the polarity of opinion on the various domain aspects. In the former step, the policy domain aspects are identified (namely, which policy category is represented by each concept). This identification is supported by the policy modelling ontology, which describes the most important policy-related classes and structure. Then the most informative documents are extracted from the corpus, and the user is asked to create a set of aspects and related keywords using these documents. In the latter step, we use the corpus-specific ontology to model the domain and extract aspect-polarity associations using grammatical dependencies between words. Summarized results are then shown to the user to analyze and store. Finally, the policy modelling ontology is updated in an offline process.

Brunel University Research Archive

Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data

Author: A Januliene
A Nenkova
AH Morris
CD Manning
HP Edmundson
J. Richard Landis
Joel Larocca Neto
PB Baxendale
T Dunning
Publication date: 15/08/2017

Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval and information extraction. Meanwhile, online debate forums have recently become popular but have remained largely unexplored. For this reason, there are insufficient resources of annotated debate data available for conducting research in this genre. In this paper, we collected and annotated debate data for an automatic summarization task. Similar to extractive gold standard summary generation, our data contains sentences worth including in a summary. Five human annotators performed this task. Inter-annotator agreement, based on semantic similarity, is 36% for Cohen's kappa and 48% for Krippendorff's alpha. Moreover, we also implement an extractive summarization system for online debates and discuss prominent features for the task of summarizing online debate data automatically.
Comment: accepted and presented at CICLING 2017, the 18th International Conference on Intelligent Text Processing and Computational Linguistics
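For readers unfamiliar with the agreement figure quoted in this abstract, the following is a minimal sketch of plain pairwise Cohen's kappa. It assumes two annotators who each assign one binary label per sentence (1 = include in the summary); the function name `cohen_kappa` and the sample labels are illustrative only. It does not reproduce the paper's semantic-similarity-based agreement or its five-annotator setup.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))

    if p_e == 1.0:
        # Degenerate case (assumption): both annotators used one identical label.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels: 1 = sentence selected for the summary, 0 = not selected.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohen_kappa(annotator_1, annotator_2))
```

With more than two annotators, kappa is usually reported as an average over annotator pairs, or replaced by a multi-rater statistic such as Krippendorff's alpha.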

arXiv.org e-Print Archive

Crossref

Streamlining Literature Reviews Using an Automatic and Flexible Data Gathering and Classification Platform

Author: Estima Jacinto
Martins António Miguel
Rodrigues da Silva Alberto
Publication venue: AIS Electronic Library (AISeL)
Publication date: 05/10/2023

Literature reviews are a crucial but time-consuming and complex task in scientific research. As such, interest in automating this process using machine learning techniques has increased over the last few years. In this paper, we present a method of streamlining the writing of literature reviews by automating several aspects of the process using Maestro v2023, an automatic and flexible data gathering and classification platform. Maestro v2023 is a revamped version of the original Maestro platform, designed to be modular and configurable, allowing users in an organization to create search contexts that automatically gather and classify data for them. We analyze the work related to literature review automation and suggest how Maestro can contribute to this field, demonstrating how the system was used to streamline our own literature review process, as well as to aid us in formulating the abstract and extracting keywords relevant to this paper.

AIS Electronic Library (AISeL)

Data-Driven Decisions and Actions in Today’s Software Development

Author: Alexandru Carol V
Ciurumelea Adelina
Gall Harald
Grano Giovanni
Laaber Christoph
Panichella Sebastiano
Proksch Sebastian
Schermann Gerald
Vassallo Carmine
Zhao Jitong
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data and stakeholders, for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues

Crossref

ZORA

Research and Development Workstation Environment: the new class of Current Research Information Systems

Author: Malakhov Kyrylo
Palagin Oleksandr
Shchurov Oleksandr
Velychko Vitalii
Publication date: 01/01/2018

Against the backdrop of modern technological developments in scientific research, a new class of Current Research Information Systems (CRIS) and related intelligent information technologies has arisen. This class is called the Research and Development Workstation Environment (RDWE): comprehensive problem-oriented information systems that support the scientific research and development lifecycle. This paper describes the design and development fundamentals of RDWE class systems. The generalized information model of an RDWE class system is represented as a three-tuple composite web service that includes: a set of atomic web services, each of which can be designed and developed as a microservice or a desktop application, so that it can also be used separately as independent software; a set of functions, which form the functional content of the Research and Development Workstation Environment; and a subset of atomic web services that are required to implement a function of the composite web service. In accordance with the fundamental information model of the RDWE class, a system was developed to support research in two fields: ontology engineering (the automated building of applied ontologies in an arbitrary domain area) and scientific and technical creativity (the automated preparation of application documents for patenting inventions in Ukraine). It is called the Personal Research Information System. A distinctive feature of such systems is that they can be oriented to various types of scientific activity by combining a variety of functional services and adding new ones within the cloud integrated environment. The main results of our work are focused on enhancing the effectiveness of the scientist's research and development lifecycle in an arbitrary domain area.
Comment: In English, 13 pages, 1 figure, 1 table, added references in Russian. Published. Prepared for the special issue (UkrPROG 2018 conference) of the scientific journal "Problems of programming" (Founder: National Academy of Sciences of Ukraine, Institute of Software Systems of NAS Ukraine)

arXiv.org e-Print Archive

Наукова електронна бібліотека періодичних видань НАН України (Vernadsky National Library of Ukraine)

Text summarization of online hotel reviews with sentiment analysis

Author: Ballester Basols Guifré
Publication venue: Universitat Politècnica de Catalunya
Publication date: 16/10/2021

The aim of this thesis is the creation of a system that summarizes positive and negative property reviews. To achieve this, an extractive summarization system that produces two summaries is proposed: one for the positive reviews and another for the negative ones. This is achieved with a classification system that feeds positive and negative reviews to the summarization system. To pursue our objective, a study of the different NLP methods, along with their pros and cons, was performed, leading to the conclusion that the use of transformers, and more specifically the combination of BERT and GPT-2 architectures, would be the best approach. To obtain the TripAdvisor data that appears on the StayForLong website, a crawling process was performed on StayForLong and TripAdvisor. The data consisted of over 80,000 reviews and over 175 properties, which we pre-processed, cleaned and tokenized in order to work with BERT for the sentiment analysis and GPT-2 for the summarization. We then proceeded with an extensive analysis of the impact of the variables. Finally, we fine-tuned each of the models so that it performed at its best. To evaluate our two systems, we evaluated the binary sentiment classification system, built with multi-modal BERT, which reached a precision of 96%, and for the GPT-2 summarization system we applied the ROUGE-F1 metric, where we obtained an average of 57.5%.
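As an illustration of the ROUGE-F1 metric mentioned in this abstract, here is a minimal sketch of a unigram (ROUGE-1) F1 score. The abstract does not say which ROUGE variant or tokenizer the thesis used, so the choice of ROUGE-1, the whitespace tokenization, the function name `rouge1_f1` and the sample sentences are all assumptions made for this example.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference.

    Tokenization is a lowercase whitespace split (an assumption; real
    evaluations usually apply a proper tokenizer and optional stemming).
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it appears
    # in both texts (multiset intersection).
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical review summaries, not taken from the thesis data.
generated = "the hotel was clean"
gold = "the hotel was very clean"
print(round(rouge1_f1(generated, gold), 4))
```

An average score over a test set, like the 57.5% reported above, would be the mean of such per-summary values.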

UPCommons. Portal del coneixement obert de la UPC