ELICA: An Automated Tool for Dynamic Extraction of Requirements Relevant Information
Requirements elicitation requires extensive knowledge and deep understanding
of the problem domain where the final system will be situated. However, in many
software development projects, analysts are required to elicit the requirements
from an unfamiliar domain, which often causes communication barriers between
analysts and stakeholders. In this paper, we propose a requirements ELICitation
Aid tool (ELICA) to help analysts better understand the target application
domain by dynamic extraction and labeling of requirements-relevant knowledge.
To extract the relevant terms, we leverage the flexibility and power of
Weighted Finite State Transducers (WFSTs) in dynamic modeling of natural
language processing tasks. In addition to the information conveyed through
text, ELICA captures and processes non-linguistic information about the
intention of speakers such as their confidence level, analytical tone, and
emotions. The extracted information is made available to the analysts as a set
of labeled snippets with highlighted relevant terms which can also be exported
as an artifact of the Requirements Engineering (RE) process. The application
and usefulness of ELICA are demonstrated through a case study. This study shows
how pre-existing relevant information about the application domain and the
information captured during an elicitation meeting, such as the conversation
and stakeholders' intentions, can be used to support analysts in achieving their tasks.
Comment: 2018 IEEE 26th International Requirements Engineering Conference Workshop
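The WFST-based term extraction described above can be illustrated, very loosely, with a toy Viterbi decode over a two-state weighted lattice in the tropical semiring. The labels, lexicon, and weights below are invented for this sketch and are not ELICA's actual models.

```python
# Toy weighted finite-state tagger: states are the labels "REL"
# (requirements-relevant) and "OTH" (other); arcs carry negative-log
# weights. Lexicon and weights are illustrative assumptions only.
TRANS = {  # -log transition weights between consecutive labels
    ("REL", "REL"): 0.4, ("REL", "OTH"): 1.1,
    ("OTH", "REL"): 1.1, ("OTH", "OTH"): 0.4,
}
LEXICON = {"requirement", "stakeholder", "elicitation", "domain"}

def emit(label, token):
    """-log emission weight: in-lexicon terms are cheap under REL."""
    in_lex = token.lower() in LEXICON
    if label == "REL":
        return 0.2 if in_lex else 2.0
    return 2.0 if in_lex else 0.2

def viterbi(tokens, labels=("REL", "OTH")):
    """Cheapest label path through the weighted lattice."""
    best = {lab: emit(lab, tokens[0]) for lab in labels}
    back = []
    for tok in tokens[1:]:
        nxt, ptr = {}, {}
        for lab in labels:
            prev = min(labels, key=lambda p: best[p] + TRANS[(p, lab)])
            ptr[lab] = prev
            nxt[lab] = best[prev] + TRANS[(prev, lab)] + emit(lab, tok)
        back.append(ptr)
        best = nxt
    path = [min(labels, key=best.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Tokens from the lexicon are then labeled relevant, and contiguous runs of "REL" labels correspond to the highlighted terms in a snippet.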
Data Mining Techniques to Understand Textual Data
More than ever, online information delivery and storage rely heavily on text. Billions of texts are produced every day in the form of documents, news, logs, search queries, ad keywords, tags, tweets, messenger conversations, social network posts, etc. Text understanding is a fundamental and essential task spanning broad research topics, and it contributes to many applications in areas such as text summarization, search engines, recommendation systems, online advertising, and conversational bots. However, understanding text is never a trivial task for computers, especially for noisy and ambiguous text such as logs and search queries. This dissertation focuses on textual understanding tasks drawn from two domains, disaster management and IT service management, both of which rely on textual data as their main information carrier.
Improving situation awareness in disaster management and reducing the human effort involved in IT service management both demand more intelligent and efficient solutions for understanding the textual data that acts as the main information carrier in these two domains. From a data mining perspective, four research directions are identified: (1) intelligently generating a storyline that summarizes the evolution of a hurricane from a relevant online corpus; (2) automatically recommending resolutions based on the textual symptom description in a ticket; (3) gradually adapting the resolution recommendation system to time-correlated features derived from text; (4) efficiently learning distributed representations for short and noisy ticket symptom descriptions and resolutions. The data mining techniques proposed in these four directions successfully address the task of understanding and extracting valuable knowledge from such textual data.
My dissertation addresses the research topics outlined above. Concretely, I focus on designing and developing data mining methodologies to better understand textual information, including (1) a storyline generation method for efficient summarization of natural hurricanes based on a crawled online corpus; (2) a recommendation framework for automated ticket resolution in IT service management; (3) an adaptive recommendation system for time-varying, temporally correlated features derived from text; and (4) a deep neural ranking model that not only recommends resolutions but also efficiently outputs distributed representations for ticket descriptions and resolutions.
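The resolution recommendation of direction (2) can be sketched, in its simplest retrieval-based form, as returning the resolution of the most similar historical ticket under bag-of-words cosine similarity. The sample tickets and tokenizer below are assumptions for illustration, not the dissertation's data or model.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts (naive whitespace tokenizer)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(symptom, history):
    """Return the resolution of the most similar historical ticket."""
    vec = vectorize(symptom)
    best = max(history, key=lambda t: cosine(vec, vectorize(t["symptom"])))
    return best["resolution"]

history = [  # invented example tickets
    {"symptom": "database connection timeout on login",
     "resolution": "restart db pool"},
    {"symptom": "printer out of toner",
     "resolution": "replace toner cartridge"},
]
print(recommend("login fails with connection timeout", history))
```

A deep neural ranking model, as in direction (4), replaces the count vectors with learned distributed representations but keeps the same retrieve-by-similarity shape.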
Handling default data under a case-based reasoning approach
The knowledge acquired through past experiences is of the utmost importance when humans or machines try to solve new problems based on past ones; this forms the core of any Case-based Reasoning (CBR) approach to problem solving. On the other hand, existing CBR systems are neither complete nor adaptable to specific domains. Indeed, the effort needed to adapt either the reasoning process or the knowledge representation mechanism to a new problem is too high, i.e., it is extremely difficult to adapt the input to the computational framework in order to get a solution to a particular problem. This is the drawback addressed in this work.
This work is funded by National Funds through the
FCT – Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within projects PEst-OE/EEI/UI0752/2014 and
PEst-OE/QUI/UI0619/2012
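The "retrieve" step of a CBR cycle that tolerates default (unknown) attribute values can be sketched as nearest-case lookup where a missing value contributes a fixed penalty rather than a hard mismatch. The case base, attributes, and penalty below are illustrative assumptions, not the paper's actual framework.

```python
# Toy CBR retrieval with default data: None marks an unknown attribute,
# which costs a soft penalty instead of a full mismatch.
def distance(query, case, penalty=0.5):
    d = 0.0
    for attr, qv in query.items():
        cv = case["features"].get(attr)
        if qv is None or cv is None:   # default/unknown data
            d += penalty
        elif qv != cv:                 # hard mismatch
            d += 1.0
    return d

def retrieve(query, case_base):
    """Return the solution of the nearest stored case."""
    return min(case_base, key=lambda c: distance(query, c))["solution"]

cases = [  # invented case base
    {"features": {"os": "linux", "service": "web"},
     "solution": "restart nginx"},
    {"features": {"os": "windows", "service": "mail"},
     "solution": "rebuild index"},
]
print(retrieve({"os": "linux", "service": None}, cases))
```

The soft penalty lets a case with incomplete features still compete for retrieval, which is one simple way to keep the reasoning process usable on imperfect input.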
Supporting Analysts by Dynamic Extraction and Classification of Requirements-Related Knowledge
© 2019 IEEE. In many software development projects, analysts are required to deal with systems' requirements from unfamiliar domains. Familiarity with the domain is necessary in order to get full leverage from interaction with stakeholders and to extract relevant information from existing project documents. Accurate and timely extraction and classification of requirements knowledge support analysts in this challenging scenario. Our approach is to mine real-time interaction records and project documents for phrasal units relevant to the requirements-related topics being discussed during elicitation. We propose to use both generative and discriminative methods. To extract the relevant terms, we leverage the flexibility and power of Weighted Finite State Transducers (WFSTs) in dynamic modelling of natural language processing tasks. We used an extended version of Support Vector Machines (SVMs) with variable-sized feature vectors to efficiently and dynamically extract and classify requirements-related knowledge from the existing documents. To evaluate the performance of our approach intuitively and quantitatively, we used edit distance and precision/recall metrics. We show in three case studies that the snippets extracted by our method are intuitively relevant and reasonably accurate. Furthermore, we found that statistical and linguistic parameters, such as smoothing methods and word contiguity and order features, can impact the performance of both the extraction and classification tasks.
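The edit-distance metric used here for evaluation is the standard Levenshtein distance, computable by dynamic programming. This is a minimal stdlib sketch of the metric itself, not of the paper's evaluation pipeline.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))
```

Applied to extracted versus gold snippets (as token sequences rather than characters), a lower distance indicates a more accurate extraction.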
Adapting Sequence to Sequence models for Text Normalization in Social Media
Social media offer an abundant source of valuable raw data, however informal
writing can quickly become a bottleneck for many natural language processing
(NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot
explicitly handle noise found in short online posts. Moreover, the variety of
frequently occurring linguistic variations presents several challenges, even
for humans who might not be able to comprehend the meaning of such posts,
especially when they contain slang and abbreviations. Text Normalization aims
to transform online user-generated text to a canonical form. Current text
normalization systems rely on string or phonetic similarity and classification
models that work in a local fashion. We argue that processing contextual
information is crucial for this task and introduce a social media text
normalization hybrid word-character attention-based encoder-decoder model that
can serve as a pre-processing step for NLP applications to adapt to noisy text
in social media. Our character-based component is trained on synthetic
adversarial examples that are designed to capture errors commonly found in
online user-generated text. Experiments show that our model surpasses neural
architectures designed for text normalization and achieves comparable
performance with state-of-the-art related work.
Comment: Accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019)
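The kind of local, context-free system the abstract argues against can be sketched as lexicon lookup plus an edit-distance fallback. This is shown only as a contrast to the paper's attention-based encoder-decoder; the slang map and vocabulary are invented for the example.

```python
# Naive text normalization baseline: slang dictionary lookup, with a
# closest-in-vocabulary fallback by Levenshtein distance. Purely local:
# each token is normalized without any surrounding context.
SLANG = {"u": "you", "gr8": "great", "2moro": "tomorrow"}
VOCAB = {"you", "great", "tomorrow", "see", "there"}

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def normalize(token):
    if token in VOCAB:        # already canonical
        return token
    if token in SLANG:        # known slang/abbreviation
        return SLANG[token]
    # fall back to the closest in-vocabulary word
    return min(VOCAB, key=lambda w: edit_distance(token, w))

print(" ".join(normalize(t) for t in "see u 2moro".split()))
```

Because each token is handled in isolation, such a baseline cannot disambiguate variants whose canonical form depends on context, which is exactly the gap the hybrid word-character encoder-decoder is meant to close.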
Comparing tagging suggestion models on discrete corpora
This paper investigates methods for predicting tags on a textual corpus of short messages describing diverse data sets; as examples, the authors demonstrate these methods on hotel staff inputs in a ticketing system as well as on the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.
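One simple family of tag suggestion models can be sketched as neighbor-based voting: recommend the most frequent tags among the k past entries most similar to the new text. The corpus, similarity measure, and parameters below are assumptions for illustration, not the paper's evaluated models.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest_tags(text, corpus, k=2, n_tags=2):
    """Vote tags from the k most similar past entries."""
    toks = set(text.lower().split())
    ranked = sorted(
        corpus,
        key=lambda e: jaccard(toks, set(e["text"].lower().split())),
        reverse=True,
    )[:k]
    counts = Counter(tag for e in ranked for tag in e["tags"])
    return [t for t, _ in counts.most_common(n_tags)]

corpus = [  # invented hotel-ticket style entries
    {"text": "wifi not working in room", "tags": ["network", "room"]},
    {"text": "wifi password reset request", "tags": ["network", "account"]},
    {"text": "broken shower in room 12", "tags": ["maintenance", "room"]},
]
print(suggest_tags("guest reports wifi not working", corpus))
```

Comparing such a neighbor-based model against probabilistic or learned alternatives on held-out entries is one way to decide which suggestion method suits a given corpus.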
Performance comparison of Machine Learning algorithms in classifying information technologies incident tickets
Technological problems related to everyday work are real, and IT professionals can solve them. When employees encounter a problem, they must go to a platform where they detail the category and a textual description of the incident so that the support agent can understand it. However, not all employees are rigorous and accurate in describing an incident, and the chosen category is often entirely out of line with the textual description of the ticket, making it more time-consuming for the professional to deduce a solution. In this project, a solution is proposed that assigns a category to new incident tickets through classification, using Text Mining, NLP, and ML techniques, in order to reduce human intervention in ticket classification as much as possible and thereby reduce the time spent understanding and resolving tickets. The results were entirely satisfactory and allowed us to determine the best textual processing procedures, subsequently achieving, in most of the classification models, an accuracy higher than 90%, which legitimates its implementation.
This work has been supported by FCT—Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020
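One classical classifier often included in such comparisons is multinomial naive Bayes over bag-of-words features. The sketch below, with invented training tickets and add-one (Laplace) smoothing, illustrates the shape of a ticket-category classifier; it is not the paper's specific model or data.

```python
from collections import Counter, defaultdict
from math import log

def train(tickets):
    """Count classes and per-class word frequencies."""
    class_counts = Counter(c for _, c in tickets)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, c in tickets:
        toks = text.lower().split()
        word_counts[c].update(toks)
        vocab.update(toks)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Pick the class maximizing the smoothed log-posterior."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, n in class_counts.items():
        lp = log(n / total)  # log prior
        denom = sum(word_counts[c].values()) + len(vocab)
        for tok in text.lower().split():
            lp += log((word_counts[c][tok] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = c, lp
    return best

tickets = [  # invented training tickets
    ("cannot connect to vpn from home", "network"),
    ("vpn connection drops every hour", "network"),
    ("laptop screen is cracked", "hardware"),
    ("keyboard keys not responding", "hardware"),
]
model = train(tickets)
print(classify("vpn will not connect", model))
```

In practice the preprocessing steps the paper tunes (tokenization, stop-word removal, stemming) would be applied before counting, and accuracy would be measured on a held-out set of labeled tickets.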