OBOME - Ontology based opinion mining in UBIPOL
Ontologies have a special role in the UBIPOL system: they help structure the policy-related context, provide a conceptualization of the policy domain, and are used in the opinion mining process. In this work, we present a system called the Ontology Based Opinion Mining Engine (OBOME), which analyzes a domain-specific opinion corpus in two steps. It first assists the user in creating a domain ontology from the corpus. It then determines the polarity of opinions on the various domain aspects. In the former step, the policy domain aspects are identified (namely, which policy category each concept represents). This identification is supported by the policy modelling ontology, which describes the most important policy-related classes and their structure. The most informative documents are then extracted from the corpus, and the user is asked to create a set of aspects and related keywords using these documents. In the latter step, we use the corpus-specific ontology to model the domain and extract aspect-polarity associations using grammatical dependencies between words. The summarized results are then shown to the user for analysis and storage. Finally, the policy modelling ontology is updated in an offline process.
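The aspect-polarity step can be illustrated with a minimal Python sketch. This is not OBOME's actual implementation, and it rests on several assumptions:

- The dependency parse is already available as (head, relation, dependent) triples using Universal Dependencies labels such as `amod` and `nsubj`.
- A hypothetical keyword table (`ASPECT_KEYWORDS`) stands in for the user-built domain ontology.
- A toy lexicon (`POLARITY`) stands in for a real sentiment resource.

```python
from collections import defaultdict

# Hypothetical aspect -> keyword table standing in for the user-built domain ontology.
ASPECT_KEYWORDS = {
    "transport": {"bus", "tram", "traffic"},
    "housing": {"rent", "apartment"},
}

# Toy polarity lexicon (an assumption, not OBOME's resource): +1 positive, -1 negative.
POLARITY = {"cheap": 1, "reliable": 1, "crowded": -1, "expensive": -1}

# Dependency relations through which an opinion word attaches to an aspect word.
OPINION_RELATIONS = {"amod", "nsubj"}


def find_aspect(word):
    """Return the aspect whose keyword set contains `word`, or None."""
    for aspect, keywords in ASPECT_KEYWORDS.items():
        if word in keywords:
            return aspect
    return None


def aspect_polarities(parses):
    """Sum polarity scores per aspect over dependency-parsed sentences.

    `parses` is a list of sentences; each sentence is a list of
    (head, relation, dependent) triples with lowercased tokens.
    """
    scores = defaultdict(int)
    for triples in parses:
        for head, rel, dep in triples:
            if rel not in OPINION_RELATIONS:
                continue
            # The opinion word can sit on either end of the arc: it is the
            # dependent for amod ("reliable bus") and the head for a copular
            # nsubj ("the rent is expensive").
            for target, opinion in ((head, dep), (dep, head)):
                aspect = find_aspect(target)
                if aspect is not None and opinion in POLARITY:
                    scores[aspect] += POLARITY[opinion]
    return dict(scores)


# Hand-written parses for "the reliable bus", "the rent is expensive", "a cheap tram".
parses = [
    [("bus", "amod", "reliable"), ("bus", "det", "the")],
    [("expensive", "nsubj", "rent"), ("expensive", "cop", "is")],
    [("tram", "amod", "cheap"), ("tram", "det", "a")],
]
result = aspect_polarities(parses)
print(result)  # → {'transport': 2, 'housing': -1}
```

Checking both directions of each arc lets one rule cover attributive adjectives ("reliable bus") and predicative ones ("the rent is expensive"). A real system would need more relations, plus handling for negation.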
Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data
Usage of online textual media is steadily increasing. Every day, more news
stories, blog posts, and scientific articles are published online. These are
all freely accessible and have been used extensively in multiple research
areas, e.g., automatic text summarization, information retrieval, and
information extraction. Meanwhile, online debate forums have recently become
popular but have remained largely unexplored. As a result, there are
insufficient annotated debate resources available for conducting research in
this genre. In this paper, we collect and annotate debate data for an
automatic summarization task. As in extractive gold-standard summary
generation, our data marks the sentences worth including in a summary. Five
human annotators performed this task. Inter-annotator agreement, based on
semantic similarity, is 36% for Cohen's kappa and 48% for Krippendorff's alpha.
We also implement an extractive summarization system for online debates and
discuss prominent features for automatically summarizing online debate data.
Comment: accepted and presented at CICLing 2017, the 18th International
Conference on Intelligent Text Processing and Computational Linguistics
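For reference, the standard pairwise Cohen's kappa can be sketched in Python as below. This is a simplification of the paper's setup. The paper reports agreement *based on semantic similarity* across five annotators. This sketch only compares exact binary include/exclude labels from two annotators, and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected (chance) agreement from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[lab] * count_b[lab] for lab in set(a) | set(b)) / (n * n)
    if expected == 1:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so treat it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical include (1) / exclude (0) decisions for eight debate sentences.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # → 0.75
```

Here the observed agreement is 7/8 and the chance agreement is (5·4 + 3·4)/64 = 0.5, giving κ = (0.875 − 0.5)/0.5 = 0.75. With more than two annotators, the pairwise values are typically averaged, or a multi-rater measure such as Krippendorff's alpha is used instead.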
Streamlining Literature Reviews Using an Automatic and Flexible Data Gathering and Classification Platform
Literature reviews are a crucial but time-consuming and complex task in scientific research. As such, interest in automating this process using machine learning techniques has increased over the last few years. In this paper, we present a method of streamlining the writing of literature reviews by automating several aspects of the process using Maestro v2023, an automatic and flexible data gathering and classification platform. Maestro v2023 is a revamped version of the original Maestro platform, designed to be modular and configurable, allowing users in an organization to create search contexts that automatically gather and classify data for them. We analyze related work on literature review automation and suggest how Maestro can contribute to this field. We also demonstrate how we used the system to streamline our own literature review process, to help formulate the abstract of this paper, and to extract keywords relevant to it.
Data-Driven Decisions and Actions in Today’s Software Development
Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, and about development, testing, integration, deployment, and runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or the effects of performance optimizations. Development environments are no longer limited to desktop IDEs and similar applications. They span the Internet through live programming environments such as Cloud9 and large-volume repositories such as Bitbucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, whether for coding, testing, or experimenting with different product options on the Internet. This inherent complexity places a further burden on developers, who need to stay alert while constantly switching between tasks in different phases. Researchers have analyzed the development process, its data, and its stakeholders for decades. They are also working on various tools that help developers improve the quality of their work and their productivity in their daily tasks. In this chapter, we critically reflect on the challenges developers face in a typical release cycle, identify inherent problems of the individual phases, and present the current state of research that can help overcome these issues.
Research and Development Workstation Environment: the new class of Current Research Information Systems
Against the backdrop of modern technological developments in scientific
research, a new class of Current Research Information Systems (CRIS) and
related intelligent information technologies has arisen. This class, called the
Research and Development Workstation Environment (RDWE), comprises
comprehensive problem-oriented information systems that support the scientific
research and development lifecycle. This paper describes the design and
development fundamentals of RDWE class systems. The generalized information
model of an RDWE class system is represented as a three-tuple composite web
service consisting of the following:
- a set of atomic web services, each of which can be designed and developed as
  a microservice or a desktop application, so that it can also be used
  separately as independent software;
- a set of functions, which make up the functional content of the Research and
  Development Workstation Environment;
- a subset of atomic web services required to implement the functions of the
  composite web service.
Following this fundamental information model of the RDWE class, a system called
the Personal Research Information System was developed. It supports research in
two areas: ontology engineering (the automated building of an applied ontology
in an arbitrary domain) and scientific and technical creativity (the automated
preparation of patent application documents for inventions in Ukraine). A
distinctive feature of such systems is that they can be oriented toward various
types of scientific activities by combining a variety of functional services
and adding new ones within the cloud-integrated environment. The main results
of our work focus on enhancing the effectiveness of the scientist's research
and development lifecycle in an arbitrary domain.
Comment: In English, 13 pages, 1 figure, 1 table, added references in Russian.
Published. Prepared for a special issue (UkrPROG 2018 conference) of the
scientific journal "Problems of programming" (Founder: National Academy of
Sciences of Ukraine, Institute of Software Systems of NAS Ukraine)
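One possible reading of the three-tuple information model is sketched in Python below. The composite service holds three parts:

- S, a set of atomic services;
- F, a set of functions;
- R, a mapping from each function to the subset of S that implements it.

Treating the third element as a per-function mapping is our interpretation of the abstract. All class names, method names, and service names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AtomicService:
    """An atomic web service; per the model it may also run standalone."""
    name: str
    kind: str  # e.g. "microservice" or "desktop"


@dataclass
class CompositeService:
    """Three-tuple (S, F, R): services, functions, and required service subsets."""
    services: set = field(default_factory=set)    # S: atomic web services
    functions: set = field(default_factory=set)   # F: functions of the environment
    requires: dict = field(default_factory=dict)  # R: function -> subset of S

    def add_service(self, service):
        """Extend the environment with a new atomic service."""
        self.services.add(service)

    def add_function(self, name, needed):
        """Register a function and the atomic services that implement it."""
        missing = set(needed) - self.services
        if missing:
            raise ValueError(f"unknown services for {name!r}: {sorted(s.name for s in missing)}")
        self.functions.add(name)
        self.requires[name] = frozenset(needed)

    def services_for(self, name):
        """Return the subset of atomic services that implements a function."""
        return self.requires[name]


# Hypothetical services loosely modelled on the Personal Research Information System.
ontology_builder = AtomicService("ontology-builder", "microservice")
patent_drafter = AtomicService("patent-drafter", "desktop")
doc_store = AtomicService("document-store", "microservice")

env = CompositeService()
for s in (ontology_builder, patent_drafter, doc_store):
    env.add_service(s)
env.add_function("build applied ontology", {ontology_builder, doc_store})
env.add_function("prepare patent application", {patent_drafter, doc_store})
print(sorted(s.name for s in env.services_for("prepare patent application")))
# → ['document-store', 'patent-drafter']
```

Rejecting functions that reference unknown services keeps R a subset of S. Adding a new function or service at runtime mirrors the abstract's point that such systems can be reoriented by combining and adding services.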
Text summarization of online hotel reviews with sentiment analysis
The aim of this thesis is to create a system that summarizes positive and negative property reviews. To achieve this, we propose an extractive summarization system that produces two summaries: one for the positive reviews and another for the negative ones. A classification system feeds the positive and negative reviews to the summarization system. To pursue our objective, we studied different NLP methods along with their pros and cons. We concluded that transformers, and more specifically the combination of the BERT and GPT-2 architectures, would be the best approach. To obtain the TripAdvisor reviews shown on the StayForLong website, we crawled both StayForLong and TripAdvisor. The resulting data consisted of over 80,000 reviews covering over 175 properties, which we preprocessed, cleaned, and tokenized in order to use BERT for sentiment analysis and GPT-2 for summarization. We then performed an extensive analysis of the impact of the variables. Finally, we fine-tuned each of the models to perform as well as possible. To evaluate our two systems, we measured the binary sentiment classification system, based on multi-modal BERT, at 96% precision. For the GPT-2 summarization system, we applied the ROUGE-F1 metric, obtaining an average of 57.5%.
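The thesis does not state which ROUGE variant underlies its F1 score. The Python sketch below assumes ROUGE-1 (unigram overlap) with lowercase whitespace tokenization and no stemming. Established ROUGE packages add stemming and tokenization options, so their scores can differ from this sketch.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 between a candidate summary and one reference summary."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Clipped unigram overlap: a word counts at most min(count in cand, count in ref) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical generated summary vs. reference summary.
score = rouge1_f1("the room was clean and quiet", "the room was very clean")
print(round(score, 4))  # → 0.7273
```

The two strings share four unigrams ("the", "room", "was", "clean"). That gives precision 4/6, recall 4/5, and F1 = 8/11.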