
    Clustering and its Application in Requirements Engineering

    Large scale software systems challenge almost every activity in the software development life-cycle, including tasks related to eliciting, analyzing, and specifying requirements. Fortunately, many of these complexities can be addressed by clustering the requirements to create abstractions that are meaningful to human stakeholders. For example, the requirements elicitation process can be supported by dynamically clustering incoming stakeholders’ requests into themes. Cross-cutting concerns, which have a significant impact on the architectural design, can be identified through fuzzy clustering techniques and metrics designed to detect when a theme cross-cuts the dominant decomposition of the system. Finally, traceability techniques, required in critical software projects by many regulatory bodies, can be automated and enhanced by cluster-based information retrieval methods. Unfortunately, despite a significant body of work describing document clustering techniques, there is almost no prior work that directly addresses the challenges, constraints, and nuances of requirements clustering. As a result, the effectiveness of software engineering tools and processes that depend on requirements clustering is severely limited. This report directly addresses the problem of clustering requirements by surveying standard clustering techniques and discussing their application to the requirements clustering process.
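    As a purely illustrative sketch of the basic idea (not the report's own method), requirement statements can be grouped into themes by vectorizing their text and applying an off-the-shelf clustering algorithm. The sample requirements and the number of clusters below are hypothetical, and scikit-learn is assumed to be available.

```python
# Illustrative sketch: cluster requirement statements into themes with
# TF-IDF vectors and k-means. The requirement texts and the number of
# themes (n_clusters) are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

requirements = [
    "The system shall encrypt all stored user credentials.",
    "Users shall be able to reset their password via email.",
    "The system shall log every failed login attempt.",
    "Reports shall be exportable to PDF and CSV formats.",
    "The administrator shall be able to schedule report generation.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(requirements)  # sparse term-document matrix

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for label, text in zip(kmeans.labels_, requirements):
    print(label, text)  # requirements grouped by theme label
```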

    A data mining approach to ontology learning for automatic content-related question-answering in MOOCs.

    The advent of Massive Open Online Courses (MOOCs) allows massive volumes of registrants to enrol in these courses. This research aims to offer MOOC registrants automatic content-related feedback to fulfil their cognitive needs. A framework is proposed that consists of three modules: the subject ontology learning module, the short text classification module, and the question answering module. Unlike previous research, a regular expression parser approach is used to identify relevant concepts for ontology learning, and the relevant concepts are extracted from unstructured documents. To build the concept hierarchy, a frequent pattern mining approach is used, guided by a heuristic function to ensure that sibling concepts are at the same level in the hierarchy. As this process does not require specific lexical or syntactic information, it can be applied to any subject. To validate the approach, the resulting ontology is used in a question-answering system which analyses students' content-related questions and generates answers for them. Textbook end-of-chapter questions and answers are used to validate the question-answering system. The resulting ontology is compared against Text2Onto within the question-answering system, and it achieves favourable results. Finally, different indexing approaches based on a subject's ontology are investigated for classifying short text in MOOC forum discussion data; the investigated indexing approaches are unigram-based, concept-based, and hierarchical concept indexing. The experimental results show that the ontology-based feature indexing approaches outperform the unigram-based indexing approach. Experiments are conducted in both binary and multi-label classification settings. The results are consistent and show that hierarchical concept indexing outperforms both concept-based and unigram-based indexing. The bagging and random forests classifiers achieved the best results among the tested classifiers.
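    The indexing variants compared in the abstract can be pictured with a toy example. The sketch below is an assumption-laden illustration rather than the thesis's implementation: it uses an invented three-concept ontology and shows how hierarchical concept indexing also activates parent concepts that plain concept indexing does not.

```python
# Minimal sketch of the indexing idea: concept features for a forum post,
# where hierarchical concept indexing also activates ancestor concepts.
# The toy ontology and the example post are hypothetical.
TOY_ONTOLOGY = {
    # concept: (surface terms, parent concept or None)
    "binary search tree": (["binary search tree", "bst"], "tree"),
    "tree":               (["tree"], "data structure"),
    "data structure":     (["data structure"], None),
}

def concept_features(text, hierarchical=False):
    text = text.lower()
    features = set()
    for concept, (terms, parent) in TOY_ONTOLOGY.items():
        if any(term in text for term in terms):
            features.add(concept)
            # hierarchical indexing: walk up to ancestor concepts as well
            while hierarchical and parent is not None:
                features.add(parent)
                parent = TOY_ONTOLOGY[parent][1]
    return sorted(features)

post = "How do I delete a node from a BST?"
print(concept_features(post))                     # ['binary search tree']
print(concept_features(post, hierarchical=True))  # adds 'tree', 'data structure'
```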

    Messaging Forensic Framework for Cybercrime Investigation

    Online predators, botmasters, and terrorists abuse the Internet and associated web technologies by conducting illegitimate activities such as bullying, phishing, and threatening. These activities often involve online messages between a criminal and a victim, or between criminals themselves. The forensic analysis of online messages to collect empirical evidence that can be used to prosecute cybercriminals in a court of law is one way to minimize such cybercrimes. The challenge is to develop innovative tools and techniques to precisely analyze large volumes of suspicious online messages. We develop a forensic analysis framework to help an investigator analyze the textual content of online messages, with two main objectives. First, we apply our novel authorship analysis techniques for collecting patterns of authorial attributes to address the problem of anonymity in online communication. Second, we apply the proposed knowledge discovery and semantic analysis techniques to identify criminal networks and their illegal activities. The focus of the framework is to collect credible, intuitive, and interpretable evidence for both technical and non-technical professionals, including law enforcement personnel and jury members. To evaluate the proposed methods, we present our collaborative work with a local law enforcement agency. The experimental results on real-life data suggest that the presented forensic analysis framework is effective for cybercrime investigation.
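    To give a flavour of the authorship-analysis component only, the hedged sketch below trains a linear classifier on character n-gram style features with scikit-learn; the toy messages, author labels, and feature choices are invented for illustration and are not the framework's actual techniques.

```python
# Hedged sketch of authorship attribution: character n-gram style
# features plus a linear classifier. All data here is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "u free 2nite? meet me @ the usual spot",
    "im telling u, dont b late again!!",
    "Please confirm the transfer before noon tomorrow.",
    "Kindly confirm receipt of the attached documents.",
]
authors = ["A", "A", "B", "B"]

# Character n-grams capture writing habits (abbreviations, punctuation)
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(messages, authors)
print(model.predict(["dont forget, meet @ 9"]))  # likely author 'A'
```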

    Clustering cliques for graph-based summarization of the biomedical research literature

    BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques are identified from frequently occurring predications with highly connected arguments, filtered by degree centrality. Themes contained in the summary are identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the produced summaries is compared to a Silhouette-generated baseline for cohesion, separation, and overall validity. The theme labels are also compared to a reference standard produced with major MeSH headings. CONCLUSIONS: For the 11 topics in the test data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). Compared to the reference standard from MeSH headings, recall, precision, and F-score were 0.64, 0.65, and 0.65, respectively.
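    A minimal sketch of the clique-clustering idea, under the assumption that a predication-argument graph is available as a networkx graph: maximal cliques are enumerated and cliques that share arguments are grouped into themes. The toy biomedical concepts and the single-link grouping rule are illustrative, not the paper's exact algorithm.

```python
# Illustrative sketch, not the paper's pipeline: find maximal cliques in a
# small predication-argument graph, then group cliques that share arguments.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("aspirin", "inflammation"), ("aspirin", "pain"), ("inflammation", "pain"),
    ("statin", "cholesterol"), ("statin", "heart disease"),
    ("cholesterol", "heart disease"), ("aspirin", "heart disease"),
])

cliques = [frozenset(c) for c in nx.find_cliques(G) if len(c) >= 3]

# Single-link grouping: cliques sharing at least one argument form one theme
themes = []
for clique in cliques:
    for theme in themes:
        if any(clique & other for other in theme):
            theme.append(clique)
            break
    else:
        themes.append([clique])

for i, theme in enumerate(themes):
    print(f"theme {i}:", [sorted(c) for c in theme])
```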

    Incremental kernel learning algorithms and applications.

    Since Support Vector Machines (SVMs) were introduced in 1995, they have been recognized as essential tools for pattern classification and function approximation. Numerous publications show that SVMs outperform other learning methods in various areas. However, SVMs perform poorly on large-scale data sets because of their high computational complexity. One approach to overcoming this limitation is incremental learning, in which a large-scale data set is divided into several subsets and the model is trained on those subsets, updating the core information extracted from the previous subset. This approach has the drawback that the core information accumulates during the incremental procedure. Moreover, when the large-scale data set has a special structure (e.g., an unbalanced data set), the standard SVM might not perform properly. In this study, a novel approach based on the reduced convex hull concept is developed and applied in various applications. In addition, the developed concept is applied to Support Vector Regression (SVR) to produce better performance. The performed experiments show that the incremental revised SVM significantly reduces the number of support vectors and requires less computing time. The incremental revised SVR produces results similar to those of the standard SVR while reducing computing time significantly. Furthermore, the filter concept developed in this study may be utilized to reduce the computing time in other learning approaches.
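    The generic incremental-learning loop described above (train on a subset, carry the extracted core information forward) can be sketched as follows, assuming scikit-learn and using the support vectors of a standard SVM as the carried core; this is not the dissertation's reduced-convex-hull method.

```python
# Minimal sketch of the generic incremental-SVM idea: train on one chunk,
# carry only its support vectors into the next chunk. Data is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
chunks = np.array_split(np.arange(len(X)), 5)  # five incremental subsets

carry_X = np.empty((0, X.shape[1]))
carry_y = np.empty(0, dtype=int)
for idx in chunks:
    # Current subset plus the condensed "core" (support vectors) so far
    train_X = np.vstack([carry_X, X[idx]])
    train_y = np.concatenate([carry_y, y[idx]])
    svm = SVC(kernel="rbf", C=1.0).fit(train_X, train_y)
    carry_X, carry_y = svm.support_vectors_, train_y[svm.support_]

print("final number of support vectors:", len(carry_X))
```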

    Toward an Effective Automated Tracing Process

    Traceability is defined as the ability to establish, record, and maintain dependency relations among the various software artifacts in a software system, in both forward and backward directions, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). Research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use are still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhausting, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
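    For context, a hedged sketch of the simplest IR-based trace link retrieval step: requirements and code artifacts are vectorized with TF-IDF and candidate links are ranked by cosine similarity. The artifact texts, identifiers, and the similarity cut-off are illustrative assumptions rather than the dissertation's tooling.

```python
# Hedged sketch of basic IR-based trace-link retrieval: rank candidate
# links between requirements and code artifacts by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = {
    "REQ-1": "The system shall authenticate users with a password",
    "REQ-2": "The system shall export monthly usage reports",
}
artifacts = {
    "LoginController.java": "validate password authenticate user session",
    "ReportExporter.java": "generate export monthly usage report pdf",
}

vectorizer = TfidfVectorizer()
corpus = list(requirements.values()) + list(artifacts.values())
tfidf = vectorizer.fit_transform(corpus)

req_vecs, art_vecs = tfidf[: len(requirements)], tfidf[len(requirements):]
scores = cosine_similarity(req_vecs, art_vecs)

for i, req_id in enumerate(requirements):
    for j, art_id in enumerate(artifacts):
        if scores[i, j] > 0.1:  # keep candidate links above a cut-off
            print(f"{req_id} -> {art_id}  (score {scores[i, j]:.2f})")
```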