
    Review of Current Student-Monitoring Techniques Used in eLearning-Focused Recommender Systems and Learning Analytics: The Experience API & LIME Model Case Study

    Recommender systems require input information in order to operate properly and deliver content or behaviour suggestions to end users. eLearning scenarios are no exception. Users are current students, and recommendations can be built upon paths (both formal and informal), relationships, behaviours, friends, followers, actions, grades, tutor interaction, etc. A recommender system must somehow retrieve, categorize and work with all these details. There are several ways to do so: from raw and inelegant database access to more curated web APIs or even HTML scraping. New server-centric user-action logging and monitoring standard technologies have been presented in recent years by several groups, organizations and standards bodies. The Experience API (xAPI), detailed in this article, is one of these. In the first part of this paper we analyse current learner-monitoring techniques as an initialization phase for eLearning recommender systems. We then review standardization efforts in this area; finally, we focus on xAPI and its potential interaction with the LIME model, which will also be summarized below.
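
    In practice, xAPI encodes each monitored learner action as an actor-verb-object statement sent to a Learning Record Store (LRS). The Python sketch below shows roughly what such a statement and its submission might look like; the LRS URL, credentials, and course identifiers are placeholder assumptions, while the statement layout and version header follow the public xAPI specification.

        # Minimal xAPI statement sketch: actor-verb-object, per the xAPI spec.
        # The LRS endpoint, credentials, and activity IDs below are placeholders.
        import json
        import requests

        statement = {
            "actor": {
                "objectType": "Agent",
                "name": "Example Student",
                "mbox": "mailto:student@example.org",  # learner being monitored
            },
            "verb": {
                "id": "http://adlnet.gov/expapi/verbs/completed",
                "display": {"en-US": "completed"},
            },
            "object": {
                "objectType": "Activity",
                "id": "http://example.org/courses/recsys-101/unit-3",
                "definition": {"name": {"en-US": "Unit 3: Collaborative Filtering"}},
            },
        }

        # POST the statement to the (placeholder) Learning Record Store.
        resp = requests.post(
            "https://lrs.example.org/xapi/statements",
            data=json.dumps(statement),
            headers={
                "Content-Type": "application/json",
                "X-Experience-API-Version": "1.0.3",
            },
            auth=("lrs_user", "lrs_password"),
        )
        resp.raise_for_status()

    A recommender system can later query the LRS for such statements and aggregate them per student, which is exactly the kind of initialization input discussed above.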

    Negative Statements Considered Useful

    Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inference, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches have promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.
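
    As a toy illustration of approach (i), the Python sketch below compares an entity against highly related peers and ranks statements the peers share but the entity lacks. The miniature KB is invented, and peer-frequency ranking is only one of the supervised and unsupervised features the abstract alludes to.

        # Peer-based inference sketch: statements common among an entity's peers
        # but absent for the entity become ranked candidate negative statements.
        from collections import Counter

        # Toy KB mapping entities to (predicate, object) statements (illustrative only).
        kb = {
            "Stephen_Hawking": {("occupation", "physicist"), ("award", "Copley Medal")},
            "Albert_Einstein": {("occupation", "physicist"), ("award", "Nobel Prize in Physics")},
            "Richard_Feynman": {("occupation", "physicist"), ("award", "Nobel Prize in Physics")},
        }

        def candidate_negatives(entity, peers):
            """Rank statements that peers hold but `entity` lacks, by peer support."""
            support = Counter(stmt for p in peers for stmt in kb[p])
            return [(stmt, n) for stmt, n in support.most_common()
                    if stmt not in kb[entity]]

        # Derives a classic interesting negative: Hawking never won the Nobel Prize.
        for stmt, n in candidate_negatives("Stephen_Hawking",
                                           ["Albert_Einstein", "Richard_Feynman"]):
            print(f"NOT {stmt}  (peer support: {n})")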

    Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Quality Automated Identifier Naming

    A considerable part of source code consists of identifier names: unique lexical tokens that provide information about entities, and entity interactions, within the code. Identifier names provide human-readable descriptions of classes, functions, variables, etc. Poor or ambiguous identifier names (i.e., names that do not correctly describe the code behavior they are associated with) lead developers to spend more time working towards understanding the code's behavior. Bad naming can also have detrimental effects on tools that rely on natural language clues, degrading the quality of their output and making them unreliable. Additionally, misinterpretations of the code, caused by poor names, can result in the injection of quality issues into the system under maintenance. Thus, improved identifier naming leads to more effective developers, higher-quality software, and higher-quality software analysis tools. In this dissertation, I establish several novel concepts that help measure and improve the quality of identifiers. The output of this dissertation work is a set of identifier name appraisal and quality tools that integrate into the developer workflow. Through a sequence of empirical studies, I have formulated a series of heuristics and linguistic patterns to evaluate the quality of identifier names in the code and provide naming structure recommendations. I envision, and am working towards, supporting developers in integrating my contributions, discussed in this dissertation, into their development workflow to significantly improve the process of crafting and maintaining high-quality identifier names in source code.
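
    To make the flavour of such heuristics concrete, the Python sketch below flags two common naming problems: overly short names and non-dictionary abbreviations. The specific rules and word list are illustrative assumptions, not the dissertation's actual linguistic patterns.

        # Toy identifier appraisal: split a name into words, then apply simple
        # quality heuristics. Rules and dictionary are illustrative only.
        import re

        DICTIONARY = {"user", "count", "is", "valid", "total", "get", "items"}

        def split_identifier(name):
            """Split camelCase and snake_case identifiers into lowercase words."""
            spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
            return [w.lower() for w in spaced.split()]

        def appraise(name):
            """Return naming-quality warnings for a single identifier."""
            warnings = []
            if len(name) <= 2:
                warnings.append("too short to be descriptive")
            if any(w not in DICTIONARY for w in split_identifier(name)):
                warnings.append("contains abbreviations or non-dictionary words")
            return warnings

        print(appraise("usrCnt"))     # ['contains abbreviations or non-dictionary words']
        print(appraise("userCount"))  # []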

    Prioritization of Software and System Requirements through Natural Language Processing for Testing Software

    Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2021. Safety-critical systems have been a constant and growing presence in industrial production, such as railways and vehicles. These systems are highly configurable and must be intensively tested by system engineers before being delivered to customers. This process is highly time-consuming and may require associations between the product features and the requirements demanded by customers. Requirement prioritization seeks to recognize the most relevant requirements of a system, aiming to reduce the costs and time of the testing process. Machine learning has proven useful in helping engineers with this task by automating associations between features and requirements. However, its application can be more difficult when requirements are written in natural language and no ground-truth dataset exists for them. In our work, we present ARRINA, a Natural Language Processing-based recommendation system able to extract and associate components from safety-critical systems with their specifications written in natural language, and to process customer requirements and map them to components. The system integrates a Weight Association Rule Mining framework to extract the components and their associations, and generates visualizations that can help engineers understand which components are generally introduced in project requirements. The system also includes a recommendation framework that can associate input requirements with existing subsystems, reducing engineers' effort in terms of requirement analysis and prioritization. We performed several experiments to evaluate the different components of ARRINA over four railway subsystems and input requirements. The system achieved 90% accuracy, which demonstrates its value in reducing the time engineers spend discovering the correct subsystem links and prioritizing requirements for the testing process.
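
    ARRINA's association mining is not reproduced here, so as a rough stand-in the Python sketch below maps an input requirement to the most textually similar subsystem with TF-IDF and cosine similarity. The subsystem descriptions and requirement are invented, and this baseline replaces, rather than reproduces, the Weight Association Rule Mining framework described above.

        # Hypothetical baseline: recommend subsystems for a requirement by
        # TF-IDF cosine similarity (a stand-in for ARRINA's WARM framework).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        subsystems = {  # invented railway-style subsystem descriptions
            "braking": "brake pressure sensor emergency stop actuator",
            "signalling": "track signal interlocking communication light",
            "doors": "door open close obstacle detection passenger",
        }
        requirement = "The train shall stop automatically when brake pressure drops."

        vectorizer = TfidfVectorizer()
        matrix = vectorizer.fit_transform(list(subsystems.values()) + [requirement])
        scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

        # Rank subsystems by similarity to the input requirement.
        for name, score in sorted(zip(subsystems, scores), key=lambda p: -p[1]):
            print(f"{name}: {score:.2f}")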

    A Comparative Study on the Effectiveness of Part-of-speech Tagging Techniques on Bug Reports


    Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation

    Background: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing works have employed machine learning techniques to construct a knowledge base. However, they often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. We combined natural language processing (NLP) and semantic network analysis in our model generation pipeline. Methods: As a use case of our pipeline, we utilized data from an open source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the “Subject:Relationship:Object” syntactic data schema. The identified noun phrases were tagged with the Unified Medical Language System (UMLS) semantic types. An evaluation was done on a dataset of 83 image notes from four data sources. Results: A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model was able to cover 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%. Conclusion: The results indicated that our pipeline was able to produce a comprehensive content-based knowledge model that could represent context from various sources in the same domain.
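
    The “Subject:Relationship:Object” extraction step can be approximated with any dependency parser. The Python sketch below uses spaCy rather than the Stanford parser the authors used, and an invented sentence, so it should be read as an assumption-laden stand-in for the paper's pipeline (it also omits the UMLS semantic tagging entirely).

        # SVO triple extraction sketch via dependency parsing (spaCy stand-in
        # for the Stanford parser used in the paper; no UMLS tagging here).
        import spacy

        nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

        def extract_triples(text):
            """Yield (subject, relation, object) triples from simple clauses."""
            for token in nlp(text):
                if token.pos_ == "VERB":
                    subjects = [c for c in token.lefts if c.dep_ in ("nsubj", "nsubjpass")]
                    objects = [c for c in token.rights if c.dep_ in ("dobj", "attr")]
                    for s in subjects:
                        for o in objects:
                            yield (s.text, token.lemma_, o.text)

        for triple in extract_triples("The lesion shows diffuse calcification."):
            print(triple)  # ('lesion', 'show', 'calcification')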

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and we can expect that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract this knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to the processing of under-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research directions.