User Story Extraction from Online News with Feature-Based and Maximum Entropy Method for Software Requirements Elicitation
Software requirements elicitation is the first stage in software requirements engineering. Elicitation is the process of identifying software requirements from various sources such as interviews with resource persons, questionnaires, and document analysis. User stories are easy to adapt as system requirements change. A user story is written in a semi-structured language: it must follow a fixed syntax that serves as the standard for writing features in agile software development methods. In addition, user stories are easily understood by end users without an information technology background because they describe system requirements in natural language. A user story has three aspects: the who aspect (actor), the what aspect (activity), and the why aspect (reason). This study proposes extracting user stories consisting of the who and what aspects from online news sites, using feature extraction and maximum entropy as the classification method. A systems analyst can use factual information reported in online news to derive the required software requirements. The extraction method in this research is expected to produce user stories relevant to the software requirements and so assist systems analysts in generating requirements. The proposed method achieves average precision and recall of 98.21% and 95.16% for the who aspect, 87.14% and 87.50% for the what aspect, and 81.21% and 78.60% for user stories. These results suggest that the proposed method generates user stories relevant to functional software requirements.
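A maximum entropy classifier of the kind the abstract describes is equivalent to multinomial logistic regression over hand-crafted token features. The sketch below is purely illustrative: the feature set, label scheme (WHO/WHAT/O), and training examples are invented for demonstration and are not the paper's actual features or data.

```python
# Hypothetical sketch: maximum-entropy (multinomial logistic regression)
# token classification into who/what aspects over simple features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(sentence, i):
    """Illustrative feature template for the token at position i."""
    word = sentence[i]
    return {
        "word": word.lower(),
        "is_capitalized": word[0].isupper(),
        "prev": sentence[i - 1].lower() if i > 0 else "<s>",
        "suffix3": word[-3:].lower(),
    }

# Toy training data: tokens labeled WHO (actor), WHAT (activity), or O.
train_sents = [
    (["The", "minister", "announced", "a", "new", "policy"],
     ["O", "WHO", "WHAT", "O", "O", "O"]),
    (["A", "teacher", "reported", "the", "incident"],
     ["O", "WHO", "WHAT", "O", "O"]),
]

X, y = [], []
for sent, labels in train_sents:
    for i, label in enumerate(labels):
        X.append(token_features(sent, i))
        y.append(label)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000)  # maxent = multinomial logistic model
clf.fit(vec.fit_transform(X), y)

test_sent = ["The", "mayor", "announced", "a", "plan"]
pred = clf.predict(vec.transform(
    [token_features(test_sent, i) for i in range(len(test_sent))]))
print(list(pred))
```

In practice, classified who/what spans would then be slotted into the user-story template ("As a <who>, I want to <what>").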
Machine Understandable Contracts with Deep Learning
This research investigates the automatic translation of contracts into computer-understandable rules through Natural Language Processing. The most challenging aspect, studied throughout this paper, is understanding the meaning of a contract and expressing it in a structured format. The problem can be reduced to the Named Entity Recognition and Rule Extraction tasks, the latter handling the extraction of terms and conditions. Both problems are difficult, but deep learning models can tackle them. We believe this paper is the first work to approach Rule Extraction with deep learning. Such methods are data-hungry, so the research also introduces data sets for the two tasks. Additionally, it contributes to the literature by introducing Law-Bert, a BERT-based model pre-trained on unlabelled contracts. The results obtained on Named Entity Recognition and Rule Extraction show that pre-training on contracts has a positive effect on performance for the downstream tasks.
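To make "expressing a contract in a structured format" concrete, the sketch below shows one possible target representation for the NER and Rule Extraction outputs. The schema, field names, and example clause are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical target schema for contract-to-rule translation:
# named entities plus a machine-readable rule for one clause.
from dataclasses import dataclass, field

@dataclass
class Entity:
    text: str
    label: str          # e.g. PARTY, DATE, AMOUNT (illustrative tag set)

@dataclass
class Rule:
    condition: str      # triggering condition, in normalized form
    obligation: str     # who must do what when the condition holds
    entities: list = field(default_factory=list)

clause = ("If the Tenant fails to pay rent by the 5th day of the month, "
          "the Tenant shall pay a late fee of $50.")

# The kind of structured output an NER + Rule Extraction pipeline
# might produce for the clause above:
rule = Rule(
    condition="rent not paid by day 5 of month",
    obligation="Tenant pays late fee",
    entities=[Entity("Tenant", "PARTY"), Entity("$50", "AMOUNT")],
)
print(rule.obligation, [e.label for e in rule.entities])
```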
Comprehensive Review of Opinion Summarization
The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms to solve this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, which will help set the trend for future research in this area.
Unsupervised Extraction of Representative Concepts from Scientific Literature
This paper studies the automated categorization and extraction of scientific
concepts from titles of scientific articles, in order to gain a deeper
understanding of their key contributions and facilitate the construction of a
generic academic knowledgebase. Towards this goal, we propose an unsupervised,
domain-independent, and scalable two-phase algorithm to type and extract key
concept mentions into aspects of interest (e.g., Techniques, Applications,
etc.). In the first phase of our algorithm we propose PhraseType, a
probabilistic generative model which exploits textual features and limited POS
tags to broadly segment text snippets into aspect-typed phrases. We extend this
model to simultaneously learn aspect-specific features and identify academic
domains in multi-domain corpora, since the two tasks mutually enhance each
other. In the second phase, we propose an approach based on adaptor grammars to
extract fine-grained concept mentions from the aspect-typed phrases without the
need for any external resources or human effort, in a purely data-driven
manner. We apply our technique to study literature from diverse scientific
domains and show significant gains over state-of-the-art concept extraction
techniques. We also present a qualitative analysis of the results obtained.
Comment: Published as a conference paper at CIKM 201
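The first phase described above segments text into aspect-typed phrases using limited POS tags. The toy function below illustrates the general idea of POS-based phrase chunking on a pre-tagged title; the simplified tag set and the greedy ADJ/NOUN grouping rule are assumptions for demonstration, not the PhraseType model itself.

```python
# Toy stand-in for POS-guided phrase segmentation: group maximal runs of
# adjectives and nouns in a pre-tagged title into candidate concept mentions.

# Pre-tagged title tokens as (word, coarse POS tag) pairs.
title = [("Unsupervised", "ADJ"), ("Extraction", "NOUN"), ("of", "ADP"),
         ("Representative", "ADJ"), ("Concepts", "NOUN"), ("from", "ADP"),
         ("Scientific", "ADJ"), ("Literature", "NOUN")]

def chunk_phrases(tagged):
    """Greedily collect maximal ADJ/NOUN runs as candidate phrases."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("ADJ", "NOUN"):
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

print(chunk_phrases(title))
# → ['Unsupervised Extraction', 'Representative Concepts', 'Scientific Literature']
```

The paper's second phase would then mine finer-grained concept mentions from such phrases with adaptor grammars, which this sketch does not attempt.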
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain in other domains.
Comment: Knowledge-Based Systems
OBOME - Ontology based opinion mining in UBIPOL
Ontologies play a special role in the UBIPOL system: they help structure the policy-related context, provide a conceptualization of the policy domain, and are used in the opinion mining process. In this work we present a system called Ontology Based Opinion Mining Engine (OBOME) for analyzing a domain-specific opinion corpus, first assisting the user with the creation of a domain ontology from the corpus and then determining the polarity of opinion on the various domain aspects. In the former step, the policy domain aspects are identified (namely, which policy category is represented by each concept). This identification is supported by the policy modelling ontology, which describes the most important policy-related classes and their structure. The most informative documents are then extracted from the corpus, and the user is asked to create a set of aspects and related keywords from these documents. In the latter step, we use the corpus-specific ontology to model the domain and extract aspect-polarity associations using grammatical dependencies between words. The summarized results are then shown to the user to analyze and store. Finally, in an offline process, the policy modelling ontology is updated.
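The aspect-polarity association step can be illustrated with a much simpler heuristic: attach each opinion word to the nearest known aspect term by token distance. This is a deliberate simplification of OBOME's use of grammatical dependencies, and the aspect ontology and polarity lexicon below are made up for the example.

```python
# Illustrative aspect-polarity association via a small lexicon and a
# token-distance heuristic (simplified stand-in for dependency parsing).
ASPECTS = {"transport": {"bus", "train", "traffic"},
           "healthcare": {"hospital", "clinic"}}
POLARITY = {"good": 1, "excellent": 1, "slow": -1, "crowded": -1}

def aspect_polarities(text):
    tokens = text.lower().replace(".", "").split()
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in POLARITY:
            # Attach the opinion word to the closest aspect term.
            best = None
            for j, other in enumerate(tokens):
                for aspect, words in ASPECTS.items():
                    if other in words and (best is None or abs(j - i) < best[1]):
                        best = ((aspect, POLARITY[tok]), abs(j - i))
            if best:
                pairs.append(best[0])
    return pairs

print(aspect_polarities(
    "The train was crowded but the hospital staff were excellent."))
# → [('transport', -1), ('healthcare', 1)]
```

Dependency-based association, as in OBOME, replaces the distance heuristic with actual grammatical links between the opinion word and the aspect term, which is more robust to intervening clauses.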