
    Semantic similarity between words and sentences using lexical database and word embeddings

    Calculating the semantic similarity between sentences is a long-standing problem in natural language processing, and semantic analysis plays a crucial role in text analytics research. The meaning of a word in general English differs as the context changes; hence, semantic similarity varies significantly across domains of operation. For this reason, it is crucial to consider the appropriate sense of the words being compared semantically. We present an unsupervised method that can be applied across multiple domains by incorporating corpus-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and mean human similarity datasets, the methodology achieves a high correlation for both word similarity (Pearson's correlation coefficient = 0.8753) and sentence similarity (PCC = 0.8793) on the Rubenstein and Goodenough standard, and on the SICK dataset (PCC = 0.8324), outperforming other unsupervised models. We use the semantic similarity algorithm and extend it to compare the Learning Objectives from course outlines. The course description provided by instructors is an essential piece of information, as it defines what is expected from the instructor and what he or she is going to deliver during a particular course. One of the key components of a course description is the Learning Objectives section. The contents of this section are used by program managers who are tasked with comparing and matching two different courses during the development of Transfer Agreements between institutions. This research introduces the development of semantic similarity algorithms to calculate the similarity between two learning objectives of the same domain. We present a methodology that deals with semantic similarity by using a previously established algorithm and integrating it with a domain corpus to utilize domain statistics. The disambiguated domain serves as supervised learning data for the algorithm. We also introduce the Bloom Index to calculate the similarity between action verbs in the Learning Objectives with reference to Bloom's taxonomy. We also study and present an approach to calculate the semantic similarity between words under the word2vec model for a specific domain. We present a methodology to compile a corpus for a specific domain using Wikipedia, and then present a case showing the variance in semantic similarity between words using different corpora. The core contributions of this thesis are a semantic similarity algorithm for words and sentences, and the compilation of a domain-specific corpus to train the word2vec model. We also discuss practical uses of the algorithms and their implementation.
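
    The abstract above combines an edge-based measure over a lexical database with a word2vec model trained on a domain corpus. The following is a minimal sketch of those two similarity signals, assuming WordNet (via NLTK) as the lexical database and gensim for word2vec; the toy corpus, parameters, and word pairs are placeholders rather than the thesis's actual setup.

```python
# Two word-similarity signals: an edge-based (path-length) score from WordNet
# and a cosine score from a word2vec model trained on a small toy domain corpus.
# Illustrative only; not the thesis's exact algorithm or data.
from nltk.corpus import wordnet as wn      # requires nltk.download('wordnet')
from gensim.models import Word2Vec

def edge_based_similarity(w1, w2):
    """Best path-based similarity over all sense pairs in WordNet."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

# Toy domain corpus; in practice this would be compiled, e.g., from Wikipedia.
domain_corpus = [
    ["gem", "jewel", "diamond", "stone"],
    ["car", "automobile", "engine", "road"],
]
w2v = Word2Vec(domain_corpus, vector_size=50, window=3, min_count=1, epochs=50)

print(edge_based_similarity("gem", "jewel"))   # lexical-database signal
print(w2v.wv.similarity("gem", "jewel"))       # domain-corpus/word2vec signal
```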

    A Novel Process Model-driven Approach to Comparing Educational Courses using Ontology Alignment

    Nowadays, the comparison of educational courses and modules is performed manually by experts in the field of education. The main objective of this research is to create an approach for automating this process. The main contribution of this work is a novel, ontology alignment-based methodology for the automated comparison of academic courses and modules belonging to the cognitive learning domain. The results are appropriate for tasks such as prior learning and degree recognition, the introduction of joint educational programmes, and quality assurance in higher education institutions. Set-theoretical models of an educational course, its modules, learning outcomes and keywords were created and converted into an ontology. The choice of the information to be represented in the ontology was based on a careful analysis of programme specifications, module templates and Bologna recommendations for the comparison of educational courses. An ontology was chosen as the data model due to its ability to formally specify semantics, represent taxonomies and support inference over the data. Formal grammars of a keyword and of a learning outcome were created to enable the semi-automated population of the ontology from the module templates, and the corresponding annotators were designed in the General Architecture for Text Engineering (GATE) 6.1. The algorithm for the comparison of educational courses and modules is based on the alignment of ontologies of their keywords and learning outcomes. A novel measure for calculating the similarity between the action verbs in the learning outcomes was introduced and utilised. Both the measure and the algorithm were implemented in Java. For evaluation purposes, we utilised module templates from De Montfort University and Bauman Moscow State Technical University. The automatically produced annotations of the keywords and the learning outcomes were evaluated against a manually created gold standard; the high values of precision, recall and F-measure demonstrated their quality and suitability for the task. The results produced by the alignment algorithm were compared with those produced by human judgement. The results returned by the experts and the algorithm were comparable, showing that the proposed approach is applicable for the partial automation of the comparison of educational modules and courses.
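
    As a rough illustration of the kind of module comparison described above, the sketch below scores two hypothetical modules by keyword overlap and by a toy action-verb measure based on Bloom-level distance. It is not the paper's actual measure or its Java/GATE implementation; the Bloom level table, the decay formula and the example modules are assumptions for illustration.

```python
# Toy module comparison: Jaccard overlap of keyword sets plus a simple
# action-verb similarity that decays with distance between Bloom levels.
# The level table and weighting are assumptions, not the paper's measure.
BLOOM_LEVEL = {"remember": 1, "understand": 2, "apply": 3,
               "analyze": 4, "evaluate": 5, "create": 6,
               "describe": 2, "implement": 3, "design": 6}

def verb_similarity(v1, v2):
    """1.0 for identical Bloom levels, decreasing with level distance."""
    l1, l2 = BLOOM_LEVEL.get(v1.lower()), BLOOM_LEVEL.get(v2.lower())
    if l1 is None or l2 is None:
        return 0.0
    return 1.0 - abs(l1 - l2) / 5.0

def keyword_similarity(kw1, kw2):
    """Jaccard overlap of two keyword sets."""
    a, b = set(kw1), set(kw2)
    return len(a & b) / len(a | b) if a | b else 0.0

module_a = {"keywords": {"sorting", "recursion", "complexity"},
            "verbs": ["implement", "analyze"]}
module_b = {"keywords": {"recursion", "graphs", "complexity"},
            "verbs": ["design", "evaluate"]}

kw = keyword_similarity(module_a["keywords"], module_b["keywords"])
vb = sum(verb_similarity(x, y) for x in module_a["verbs"]
         for y in module_b["verbs"]) / (len(module_a["verbs"]) * len(module_b["verbs"]))
print(f"keyword overlap: {kw:.2f}, mean verb similarity: {vb:.2f}")
```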

    Extracting specific text from documents using machine learning algorithms

    The increasing use of Portable Document Format (PDF) files has promoted research into analyzing the files' layout for text extraction purposes. For this reason, it is important to have a system in place to analyze these documents and extract the required text. This research fulfils that need by extracting specific text from PDF documents while considering the document layout, and the approach is used to extract learning outcomes from academic course outlines. Our algorithm consists of a supervised learning algorithm and white-space analysis: the supervised algorithm locates the relevant text, followed by white-space analysis to understand the document layout before extraction. The supervised learning approach detects relevant text by looking for relevant headings, which mimics the approach used by humans when going through a document. The data set used for this research consists of 500 course outlines randomly sampled from the internet. To show that our text detection algorithm can work with documents other than course outlines, it was also tested on 25 reports and articles sampled from the internet. The implemented system has shown promising results, with an accuracy of 81.8%, and addresses a limitation of the current literature by supporting documents with unknown formats. The algorithm has a wide scope of applications and takes a step towards automating the task of text extraction from PDF documents.
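
    To make the heading-detection idea concrete, here is a minimal sketch in which a supervised classifier flags lines that look like a learning-outcomes heading and the lines beneath the flagged heading are captured. The toy training data, features and the crude stopping rule stand in for the paper's white-space analysis and are assumptions, not the system described above.

```python
# Flag lines that resemble a "learning outcomes" heading with a small
# supervised classifier, then capture the numbered lines that follow.
# Toy data and a crude stopping rule; illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled lines: 1 = relevant heading, 0 = other text.
train_lines = ["Learning Outcomes", "Course Learning Objectives", "Objectives",
               "Grading Policy", "Required Textbooks", "Office Hours"]
labels = [1, 1, 1, 0, 0, 0]

clf = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                    LogisticRegression())
clf.fit(train_lines, labels)

document = ["CS 101 Course Outline", "Learning Outcomes",
            "1. Explain basic data structures", "2. Write simple programs",
            "Grading Policy", "Midterm 40%, Final 60%"]

extracted, capture = [], False
for line, is_relevant_heading in zip(document, clf.predict(document)):
    if is_relevant_heading:
        capture = True            # start capturing after the relevant heading
        continue
    if capture and line[:1].isdigit():
        extracted.append(line)    # crude stand-in for the white-space/layout analysis
    elif capture:
        break                     # stop once lines no longer look like outcomes
print(extracted)                  # expected: the two numbered outcome lines
```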

    Feedback 2.0: An Investigation into Using Sharable Feedback Tags as Programming Feedback

    Objectives: Learning and teaching computer programming is a recognised challenge in Higher Education. Since feedback is regarded as the most important part of the learning process, it is expected that improving it could support students' learning. This thesis aims to investigate how new forms of feedback can improve student learning of programming and how feedback sharing can further enhance the students' learning experience. Methods: This thesis investigates the use of new forms of feedback for programming courses. The work explores the use of collaborative tagging, often found in Web 2.0 software systems, and a feedback approach that requires examiners to annotate students' source code with short, potentially reusable feedback. The thesis utilises a variety of research methods, including questionnaires, focus groups and the collection of system usage data recorded from student interactions with their feedback. Sentiment and thematic analysis are used to investigate how well feedback tags communicate the intended message from examiners to students. The approaches used were tested and refined over two preliminary investigations before use in the final investigation. Results: The work identified that a majority of students responded positively to the new feedback approach described. Student engagement was high, with up to 100% of students viewing their feedback and at least 42% opting to share their feedback. Students in the cohort who achieved either the lower or higher marks for the assignment appeared more likely to share their feedback. Conclusions: This thesis has demonstrated that sharing of feedback can be useful for disseminating good practice and common pitfalls. Provision of feedback that is contextually rich and textually concise resulted in higher engagement from students. However, the outcomes of this research were shown to be influenced by the assessment process adopted by the University; for example, students were more likely to engage with their feedback if marks were unavailable at the time of feedback release. This issue and many others are proposed as further work.
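
    The abstract mentions sentiment analysis of the feedback tags. The snippet below is a small illustrative sketch of that kind of step using NLTK's VADER analyser on a few invented tags; it is not the thesis's actual pipeline or data.

```python
# Score a few invented feedback tags with VADER sentiment analysis.
# Requires: nltk.download('vader_lexicon')
from nltk.sentiment import SentimentIntensityAnalyzer

tags = ["Good use of descriptive variable names",
        "Missing error handling here",
        "Nice, concise solution",
        "This loop never terminates"]

sia = SentimentIntensityAnalyzer()
for tag in tags:
    score = sia.polarity_scores(tag)["compound"]   # -1 (negative) .. +1 (positive)
    print(f"{score:+.2f}  {tag}")
```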

    The Future of Information Sciences : INFuture2015 : e-Institutions – Openness, Accessibility, and Preservation


    Clustering student interaction data using Bloom's Taxonomy to find predictive reading patterns

    In modern educational technology, we can capture click-stream interaction data from students as they work on educational problems within an online environment. This provides an opportunity to identify student behaviours within the data captured by the online environment that are predictive of student success or failure. The constraints that exist within an educational setting make it possible to associate these behaviours with specific educational outcomes. This information could then be used to inform environments that support student learning while improving a student's metacognitive skills. In this dissertation, we describe how reading behaviour clusters were extracted in an experiment in which students were embedded in a learning environment where they read documents and answered questions. We tracked their keystroke-level behaviour and then applied clustering techniques to find pedagogically meaningful clusters. The key to finding these clusters was categorizing the questions by their level in Bloom's educational taxonomy: different behaviour patterns predicted success and failure in answering questions at various levels of Bloom's taxonomy. The clusters found in the first experiment were confirmed through two further experiments that explored variations in the number, type, and length of documents and the kinds of questions asked. In the final experiment, we also went beyond the actual keystrokes and explored how the pauses between keystrokes as a student answers a question can be utilized in determining student success. This research suggests that it should be possible to diagnose learner behaviour even in “ill-defined” domains like reading. It also suggests that Bloom's taxonomy can be an important (even necessary) input to such diagnosis.
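
    As a rough illustration of the clustering step, the sketch below runs k-means over a few per-student interaction features (reading time, keystroke count, mean pause between keystrokes). The feature set, the toy data and the number of clusters are assumptions for illustration, not the dissertation's actual features or parameters.

```python
# Cluster toy per-student interaction features with k-means.
# Columns: [seconds spent reading, keystroke count, mean pause between keystrokes (s)]
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = np.array([
    [620, 180, 1.2],
    [610, 175, 1.1],
    [150,  60, 0.4],
    [140,  65, 0.5],
    [900, 300, 2.5],
])

X = StandardScaler().fit_transform(features)        # put features on a common scale
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)   # cluster labels, to be interpreted against Bloom-level outcomes
```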

    31st International Conference on Information Modelling and Knowledge Bases

    Information modelling is becoming an increasingly important topic for researchers, designers, and users of information systems. The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing. Conceptual modelling is one of the sub-areas of information modelling. The aim of this conference is to bring together experts from different areas of computer science and other disciplines who have a common interest in understanding and solving problems in information modelling and knowledge bases, as well as in applying the results of research to practice. We also aim to recognize and study new areas of modelling and knowledge bases to which more attention should be paid; therefore philosophy and logic, cognitive science, knowledge management, linguistics and management science are relevant areas too. The conference features three categories of presentations: full papers, short papers and position papers.

    Proceedings of the 10th International Conference on Ecological Informatics: translating ecological data into knowledge and decisions in a rapidly changing world: ICEI 2018

    The Conference Proceedings are an impressive display of the current scope of Ecological Informatics. Whilst Data Management, Analysis, Synthesis and Forecasting have been lasting popular themes over the past nine biannual ICEI conferences, ICEI 2018 addresses distinctively novel developments in Data Acquisition enabled by cutting-edge in situ and remote sensing technology. The ICEI 2018 abstracts presented here capture well the current trends and challenges of Ecological Informatics towards:
    • regional, continental and global sharing of ecological data,
    • thorough integration of complementary monitoring technologies including DNA barcoding,
    • sophisticated pattern recognition by deep learning,
    • advanced exploration of valuable information in ‘big data’ by means of machine learning and process modelling,
    • decision-informing solutions for biodiversity conservation and sustainable ecosystem management in light of global changes.
