9 research outputs found

    Automated pairwise testing approach based on classification tree modeling and negative selection algorithm

    Generating test cases is an important activity in software testing to increase users' trust in the software. The traditional way to generate test cases, exhaustive testing, is infeasible and time consuming because it produces too many test cases. Combinatorial testing was introduced to address this problem; its most popular technique, pairwise testing, covers the interactions of every two parameters. Although pairwise testing addresses the shortcomings of exhaustive testing, several issues remain. The first issue relates to modeling the system under test (SUT) as a preprocessing step for test case generation, which has yet to be implemented in existing automated approaches. The second issue is that different approaches generate different numbers of test cases for different covering arrays. These issues show that there is no single efficient way to find an optimal solution in pairwise testing that also considers invalid combinations or constraints. Therefore, a combination of the Classification Tree Method and the Negative Selection Algorithm (CTM-NSA) was developed in this research. The CTM approach was revised and enhanced to serve as the automated modeling step, and the NSA approach was developed to optimize pairwise testing by generating a low number of test cases. The findings showed that CTM-NSA outperformed the other modeling methods in terms of ease of use for the tester and generated a low number of test cases for small SUT sizes. Furthermore, it is comparable to efficient test case generation approaches for large SUT sizes because of its good ability to detect self and non-self samples. During the detection stage of the NSA, this ability covers the best combination of values for all parameters and considers invalid combinations or constraints in order to achieve one hundred percent pairwise coverage. In addition, the approach was validated using the Wilcoxon signed-rank test. Based on these findings, CTM-NSA has been shown to perform modeling in an automated way and to achieve a minimal or low number of test cases for small SUT sizes.
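
    As an illustration of the pairwise idea described above, the following minimal Python sketch (not the authors' CTM-NSA implementation) greedily builds a pairwise covering array for a hypothetical three-parameter SUT while skipping a made-up invalid combination; the parameter names and the constraint are invented for illustration.

```python
from itertools import combinations, product

# Hypothetical SUT model: parameters and their values (illustrative only).
parameters = {
    "browser": ["Chrome", "Firefox"],
    "os": ["Windows", "Linux", "macOS"],
    "protocol": ["HTTP", "HTTPS"],
}

def is_valid(assignment):
    # Invented constraint: Firefox is never tested on macOS.
    return not (assignment.get("browser") == "Firefox"
                and assignment.get("os") == "macOS")

names = list(parameters)

# Every feasible parameter-value pair that a pairwise covering array must cover.
uncovered = {
    frozenset({(p1, v1), (p2, v2)})
    for p1, p2 in combinations(names, 2)
    for v1 in parameters[p1]
    for v2 in parameters[p2]
    if is_valid({p1: v1, p2: v2})
}

def pairs_of(test):
    # All parameter-value pairs exercised by one complete test case.
    return {frozenset({(p1, test[p1]), (p2, test[p2])})
            for p1, p2 in combinations(names, 2)}

# Candidate test cases are all valid full assignments (the exhaustive set).
candidates = [dict(zip(names, values)) for values in product(*parameters.values())]
candidates = [t for t in candidates if is_valid(t)]

tests = []
# Greedy loop: repeatedly pick the candidate covering the most uncovered pairs.
while uncovered:
    best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
    gained = pairs_of(best) & uncovered
    if not gained:
        break
    tests.append(best)
    uncovered -= gained

for t in tests:
    print(t)
print(f"{len(tests)} pairwise tests vs. {len(candidates)} exhaustive tests")
```

    Even on this toy model the greedy covering array needs roughly half as many test cases as exhaustive testing, which is the effect the abstract describes at SUT scale.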

    Reuse Alternatives based on the Sources of Software Assets

    Since the idea of software reuse appeared in 1968, software reuse has become a software engineering discipline. Software reuse is one of the main techniques used to enhance the productivity of software development: it helps reduce the time, effort, and cost of developing software systems and enhances the quality of software products. However, software reuse requires understanding, modifying, adapting, and testing processes in order to be performed correctly and efficiently. This study aims to analyze and discuss the process of software reuse and to identify its elements, sources, and usages. The alternatives for acquiring and using software assets, either normal or reusable assets, are discussed. As a result of this study, four main methods are proposed for applying the concept of reuse in the software development process. These methods are proposed based on the source of the software assets, regardless of the types of software assets and their usages.

    Sentence boundary extraction from scientific literature of electric double layer capacitor domain: Tools and techniques

    Given the growth of scientific literature on the web, particularly in materials science, acquiring data precisely from the literature has become more significant. Material information systems, or chemical information systems, play an essential role in discovering data, materials, or synthesis processes using the existing scientific literature. Processing and understanding the natural language of scientific literature is the backbone of these systems, which depend heavily on appropriate textual content. Appropriate textual content means a complete, meaningful sentence drawn from a large chunk of textual content. The process of detecting the beginning and end of a sentence and extracting it as a correct sentence is called sentence boundary extraction. The accurate extraction of sentence boundaries from PDF documents is essential for readability and natural language processing. Therefore, this study provides a comparative analysis of different tools for extracting PDF documents into text, which are available as Python libraries or packages and are widely used by the research community. The main objective is to find the most suitable technique among those available that can correctly extract sentences from PDF files as text. The performance of the evaluated techniques (PyPDF2, Pdfminer.six, PyMuPDF, pdftotext, Tika, and Grobid) is presented in terms of precision, recall, F1 score, run time, and memory consumption. The NLTK, spaCy, and Gensim natural language processing (NLP) tools are used to identify sentence boundaries. Of all the techniques studied, the Grobid PDF extraction package combined with the NLP tool spaCy achieved the highest F1 score of 93% and consumed the least memory at 46.13 megabytes.
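
    As a hedged illustration of the kind of pipeline the study compares, the sketch below extracts raw text from a PDF with Pdfminer.six and segments it into sentences with spaCy; the file name "paper.pdf" and the model "en_core_web_sm" are placeholders, and this is not the study's evaluation code.

```python
# pip install pdfminer.six spacy && python -m spacy download en_core_web_sm
from pdfminer.high_level import extract_text
import spacy

def pdf_to_sentences(pdf_path, model="en_core_web_sm"):
    """Extract raw text from a PDF and split it into sentence strings."""
    text = extract_text(pdf_path)    # pdfminer.six: PDF -> plain text
    nlp = spacy.load(model)          # spaCy pipeline whose parser sets sentence boundaries
    doc = nlp(text)
    # Keep only non-empty sentences; whitespace and hyphenation noise from the
    # PDF layer is a common source of false boundaries.
    return [sent.text.strip() for sent in doc.sents if sent.text.strip()]

if __name__ == "__main__":
    for sentence in pdf_to_sentences("paper.pdf"):   # "paper.pdf" is a placeholder path
        print(sentence)
```

    Swapping `extract_text` for another extraction package (or `spacy` for NLTK's `sent_tokenize`) is essentially what the reported precision, recall, F1, run-time, and memory comparison varies.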

    An Exploratory Study to Investigate the Use of Augmented Reality in Supporting Collaborative Learning Processes

    Collaborative learning mediated by technology has proven to be an effective learning approach that benefits both academic and soft skills. Despite the advantages offered by these technologies, some obstacles have also emerged, e.g. problems in supporting interaction and communication, and disregard of the roles of physical and shared objects. Augmented Reality (AR) offers a unique learning experience by combining physical and virtual objects. The benefits of rich media are retained, and the role of physical material is also considered. However, the literature reveals that there is no design guideline specifically intended for AR-based collaborative learning. Hence, this research aims to study and propose a conceptual framework to guide the development of collaborative AR in learning.

    Ausgewählte Chancen und Herausforderungen der digitalen Transformation für die Produktentwicklung und Unternehmensorganisation im Finanzdienstleistungssektor

    Against the backdrop of digital transformation, financial services companies face numerous opportunities and challenges at different levels. While the use of new technologies enables the optimization of existing business processes and the offering of digitalized financial services, it is also accompanied by changed working conditions within the organization. In addition, financial service providers are required to take changing customer expectations into account in their existing business activities as well as in product development. The aim of this cumulative dissertation is to analyze in depth existing research gaps regarding the effects of digital transformation on the financial services sector, differentiated by the customer and product perspective as well as the internal organizational perspective. The Technology-Organization-Environment (TOE) framework of DePietro et al. (1990) is used as the theoretical frame for classifying and structuring the research modules. The results of the eight modules show that customer needs and expectations in the financial services sector are increasingly influenced by digital transformation. This is reflected in advisory activities, for example, through the offering of new customer channels and the increased price transparency resulting from growing competitive pressure. In product development, ESG risks and silent cyber risks, among others, must also be taken into account. The analysis of the effects of digital transformation on the organization shows that the use of digital innovations within the back office makes it possible to realize efficiency gains and to counteract staff shortages. Furthermore, the modules highlight the influence of the human factor on cyber security. While employees are portrayed on the one hand as the "weakest link" and a potential target within companies' security architecture, their potential for early warning must on the other hand also be taken into account.

    Data-efficient methods for information extraction

    Structured knowledge representation systems such as knowledge bases or knowledge graphs provide insights into real-world entities and the relationships among them. Such systems can be employed in various natural language processing applications such as semantic search, question answering, and text summarization. It is infeasible and inefficient to populate these knowledge representation systems manually. In this work, we develop methods to automatically extract named entities and relationships among the entities from plain text; our methods can therefore be used either to complete existing incomplete knowledge representation systems or to create a new structured knowledge representation system from scratch. Unlike mainstream supervised methods for information extraction, our methods focus on the low-data scenario and do not require a large amount of annotated data. In the first part of the thesis, we focus on the problem of named entity recognition. We participated in the Bacteria Biotope 2019 shared task, which consists of recognizing and normalizing biomedical entity mentions. Our linguistically informed named entity recognition system consists of a deep learning based model which can extract both nested and flat entities; the model employs several linguistic features and auxiliary training objectives to enable efficient learning in data-scarce scenarios. Our entity normalization system employs string matching, fuzzy search, and semantic search to link the extracted named entities to biomedical databases. Our named entity recognition and entity normalization system achieved the lowest slot error rate of 0.715 and ranked first in the shared task. We also participated in two further shared tasks, Adverse Drug Effect Span Detection (English) and Profession Span Detection (Spanish), both of which collect data from the social media platform Twitter. We developed a named entity recognition model which improves the input representation by stacking heterogeneous embeddings from diverse domains; our empirical results demonstrate complementary learning from these heterogeneous embeddings. Our submission ranked third in both shared tasks.
    In the second part of the thesis, we explored synthetic data augmentation strategies to address low-resource information extraction in specialized domains. Specifically, we adapted backtranslation to the token-level task of named entity recognition and the sentence-level task of relation extraction. We demonstrate that backtranslation can generate linguistically diverse and grammatically coherent synthetic sentences and serves as a competitive augmentation strategy for both tasks. In most real-world relation extraction tasks no annotated data is available, but a large unannotated text corpus often is. Bootstrapping methods for relation extraction can operate on such a corpus as they require only a handful of seed instances. However, bootstrapping methods tend to accumulate noise over time (known as semantic drift), and this phenomenon has a drastic negative impact on the final precision of the extractions. We develop two methods to constrain the bootstrapping process and minimise semantic drift for relation extraction; our methods leverage graph theory and pre-trained language models to explicitly identify and remove noisy extraction patterns. We report experimental results on the TACRED dataset for four relations. In the last part of the thesis, we demonstrate the application of domain adaptation to the challenging task of multilingual acronym extraction. Our experiments show that domain adaptation can improve acronym extraction in the scientific and legal domains in six languages, including low-resource languages such as Persian and Vietnamese.
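
    To make the semantic-drift problem concrete, the self-contained toy sketch below runs a few bootstrapping iterations for a single relation over an invented micro-corpus and filters the induced textual patterns with a crude length heuristic; this heuristic is only a stand-in for the graph- and language-model-based filters described in the thesis, and all sentences, entities, and thresholds are hypothetical.

```python
import re

# Invented micro-corpus: most sentences express a (company, headquarters) relation.
corpus = [
    "Acme Corp is headquartered in Springfield .",
    "Globex is headquartered in Shelbyville .",
    "Initech , based in Austin , makes software .",
    "Hooli is headquartered in Palo Alto .",
    "Umbrella was founded by Oswell Spencer .",   # distractor: a different relation
]

seeds = {("Acme Corp", "Springfield")}            # seed instances for headquarters-of

def induce_pattern(sentence, e1, e2):
    """Return the text between the two entity mentions, or None if absent."""
    m = re.search(re.escape(e1) + r"\s+(.*?)\s+" + re.escape(e2), sentence)
    return m.group(1) if m else None

instances, patterns = set(seeds), set()
for _ in range(3):                                 # a few bootstrapping iterations
    # 1) Induce patterns from sentences containing a known instance.
    for sent in corpus:
        for e1, e2 in list(instances):
            p = induce_pattern(sent, e1, e2)
            if p:
                patterns.add(p)
    # 2) Filter: keep only short patterns (crude proxy for a confidence score);
    #    without this step, long noisy patterns would drag in drifted instances.
    trusted = {p for p in patterns if len(p.split()) <= 3}
    # 3) Apply trusted patterns to extract new instances from the corpus.
    for sent in corpus:
        for p in trusted:
            m = re.search(r"(.+?)\s+" + re.escape(p) + r"\s+(.+?)\s*\.", sent)
            if m:
                instances.add((m.group(1).strip(), m.group(2).strip()))

print(instances)   # the distractor sentence contributes nothing once filtering is applied
```

    The thesis replaces step 2 with principled pattern filters; the sketch only shows where such a filter sits inside the bootstrapping loop.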

    Modelling cost and time performance of public building projects in a terror impacted area of Nigeria

    This study examines the impact of construction cost- and time-influencing factors on the production performance of public building projects in north-eastern Nigeria and whether a predictive model can be devised to assess this impact. The globally poor performance of construction projects in terms of cost and time, coupled with a dearth of studies on using machine learning systems such as artificial neural networks for advanced cost and time impact prediction, has made research into construction project performance imperative. Moreover, the Boko Haram insurgency in Nigeria's North East geopolitical zone intensified the disruption of construction site programmes, with consequent cost increases. Therefore, prediction and performance measurement tools for construction project cost and time can possibly be developed from the examination of data on initial contract sums, estimated construction durations, final costs, actual construction durations, and the influence of cost- and time-driving factors on public building projects in north-eastern Nigeria. The research objectives include an assessment of the factors influencing the cost and time performance of public building projects in north-eastern Nigeria and a determination of the cost and time performance of selected public building projects in the study area. Further objectives are the development of models for assessing the impact of cost- and time-influencing factors on the performance of cost and time in public building projects in the study area, and finally the validation of the developed cost and time performance impact assessment models. A quantitative research approach employing a questionnaire survey was adopted to source primary and secondary data from purposively sampled construction industry professionals. The study used one-way between-groups analysis of variance with a post-hoc test, one-way repeated-measures analysis of variance with Wilks' lambda tests, multiple linear regression (MLR), factor analysis (FA), and an artificial neural network (ANN) to analyse the data collected. The data covered initial contract sums, estimated construction durations, final costs, actual construction durations, and the influence of the identified driving factors on public building projects completed between 2012 and 2017. The study found that the mean percentage cost overrun of the projects studied decreases from uncomplicated projects to moderately complex projects and increases from moderately complex to largely complex projects. Also, the mean percentage time overrun decreases as project complexity increases. The significant cost-influencing factors are the inexperience of the contract manager, payment delays to main contractors, unstable foreign exchange, variations to works, and corrupt practices. The time-influencing factors include design errors, cash flow problems, payment delays to main contractors, contractors' inadequate knowledge of the contract, and delays in building plans and approval. The factors identified in the study area were used to develop MLR and ANN cost and time impact prediction models. The developed ANN impact prediction models were validated and compared with previous similar studies in terms of relative absolute deviations and mean absolute percentage errors.
    The MLR models, although better than the ANNs in terms of mean absolute percentage errors and relative mean absolute deviations, yielded poor explanations of the variance in the dependent variables (impacts or overruns) by the independent variables (multiple influence factors). The statistics of the alternative ANN impact prediction models are: (i) the prediction efficiency of the developed cost impact model, based on its mean absolute percentage error (MAPE), was found to be 93.54%, and its relative mean absolute deviation (Rel. MAD) was computed to be 1.46, in other words, plus or minus 1.46; (ii) the prediction efficiency of the developed duration impact model is 92.94%, and its Rel. MAD is 0.85, in other words, plus or minus 0.85. The ANN models compared favourably with previous similar studies in terms of relative absolute deviations and mean absolute percentage errors. The ANN's capability of learning from examples represents an innovative approach to modelling. The study concludes that the developed ANN cost and time impact prediction models have the potential to aid construction contractors in predicting the cost outcome of a project during the construction stage, using the significant cost- and time-influencing factors. The study recommends that project managers, contractors, quantity surveyors, architects, builders and engineers place priority on the significant factors identified in this study in their project planning, monitoring and control activities. Also recommended is the conversion of the developed models into dashboards that construction professionals could use to promptly identify the factors influencing cost and time on construction projects and to monitor performance.
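
    For clarity on the two validation metrics reported above, the short sketch below computes the mean absolute percentage error (MAPE, with prediction efficiency taken as 100% minus MAPE) and a relative mean absolute deviation for hypothetical overrun figures; the numbers are invented, and the Rel. MAD formula shown (MAD normalised by the mean actual value) is one common definition that may differ from the study's exact computation.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def rel_mad(actual, predicted):
    """Mean absolute deviation of predictions, relative to the mean actual value."""
    mad = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mad / (sum(actual) / len(actual))

# Hypothetical cost-overrun percentages for five projects (illustrative only).
actual    = [12.0, 18.5, 25.0, 9.0, 30.0]
predicted = [11.2, 19.4, 23.8, 9.6, 28.9]

print(f"MAPE: {mape(actual, predicted):.2f}%  "
      f"(prediction efficiency: {100 - mape(actual, predicted):.2f}%)")
print(f"Relative MAD: {rel_mad(actual, predicted):.2f}")
```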