323 research outputs found

    Assisted query formulation using normalised word vector and dynamic ontological filtering

    Get PDF
    Information seekers using the usual search techniques and engines are delighted by the sheer power of the technology at their command: its speed and the quantity of results. Upon closer inspection of those results, and on reflection about the next stages of their information-seeking knowledge work, users are typically overwhelmed and frustrated. We propose a partial solution that focuses on the query formulation aspect of the information-seeking problem. First, we introduce our version of a semantic analysis algorithm, named Normalised Word Vector, and explain its application in assisted query formulation. Second, we introduce our ideas for supporting query refinement via Dynamic Ontological Filtering.
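
    A minimal sketch of the general idea behind word-vector-based query assistance, under the assumption that both the query and candidate refinement phrases are reduced to normalised word-count vectors and ranked by cosine similarity; the published Normalised Word Vector algorithm is more elaborate, and the function names below are illustrative only.

```python
# Hypothetical sketch: rank candidate refinement phrases against a query by
# cosine similarity of normalised word-count vectors. This illustrates the
# flavour of vector-based query assistance, not the published algorithm.
from collections import Counter
import math

def normalised_vector(text):
    """Map text to a unit-length bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}

def cosine(v1, v2):
    return sum(weight * v2.get(word, 0.0) for word, weight in v1.items())

def suggest_refinements(query, candidate_phrases, top_n=3):
    """Return the candidate phrases most similar to the query."""
    qv = normalised_vector(query)
    return sorted(candidate_phrases,
                  key=lambda p: cosine(qv, normalised_vector(p)),
                  reverse=True)[:top_n]

print(suggest_refinements("query formulation for search",
                          ["assisted query formulation",
                           "dynamic ontological filtering",
                           "search engine ranking"]))
```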

    Determining and satisfying search users real needs via socially constructed search concept classification

    Get PDF
    The focus of the research is to disambiguate search queries by categorizing the results returned by search engines and interacting with the user to refine both the query and the results. A novel special-purpose search-browser has been developed which combines search engine results, a lightweight ontology based on the Open Directory Project (ODP) serving as navigator and classifier, and categorization of search results. Categories are formed from the ODP as a predefined ontology, and Lucene is employed to calculate the similarity between items retrieved by the search engine and concepts in the ODP. Through interaction with users, the search-browser improves the quality of search results by excluding irrelevant documents and ontologically categorizing results for user inspection.
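
    An illustrative sketch of the categorisation step, with scikit-learn's TF-IDF vectoriser standing in for the Lucene similarity scoring used in the work; the ODP concept labels and result snippets below are invented for the example.

```python
# Assign each search result to the most similar concept using TF-IDF cosine
# similarity. A stand-in for the Lucene-based scoring described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

odp_concepts = {
    "Computers/Software": "software programming development code applications",
    "Science/Biology": "biology organisms cells genetics species",
}
results = ["new programming language for web applications",
           "gene expression in marine organisms"]

vec = TfidfVectorizer()
matrix = vec.fit_transform(list(odp_concepts.values()) + results)
concept_vecs = matrix[:len(odp_concepts)]
result_vecs = matrix[len(odp_concepts):]

labels = list(odp_concepts)
for text, sims in zip(results, cosine_similarity(result_vecs, concept_vecs)):
    print(text, "->", labels[sims.argmax()])
```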

    An integrating text retrieval framework for Digital Ecosystems Paradigm

    Get PDF
    The purpose of the research is to provide effective information retrieval services for digital 'organisms' in a digital ecosystem by leveraging the power of Web searching technology. A novel integrating digital ecosystem search framework (itself a new digital organism) is proposed which employs Web search technology together with traditional database searching techniques to provide economic organisms with comprehensive, dynamic, and organization-oriented information retrieval ranging from the Internet to the personal (semantic) desktop.

    Use of normalized word vector approach in document classification for an LKMC

    Get PDF
    In order to realize the objective of expanding library services to provide knowledge management support for small businesses, a series of requirements must be met. This particular phase of a larger research project focuses on one of the requirements: the need for a document classification system to rapidly determine the content of digital documents. Document classification techniques are examined to assess the available alternatives for realization of Library Knowledge Management Centers (LKMCs). After evaluating prominent techniques the authors opted to investigate a less well-known method, the Normalized Word Vector (NWV) approach, which has been used successfully in classifying highly unstructured documents, i.e., student essays. The authors propose utilizing the NWV approach for LKMC automatic document classification with the goal of developing a system whereby unfamiliar documents can be quickly classified into existing topic categories. This conceptual paper will outline an approach to test NWV's suitability in this area.

    Culturally-based adaptive learning and concept analytics to guide educational website content integration

    Get PDF
    In modern learning environments, the lecturer or educational designer is often confronted with multi-national student cohorts, requiring special consideration regarding language, cultural norms and taboos, religion, and ethics. Through a somewhat provocative example we demonstrate that taking such factors into account can be essential to avoid embarrassment and harm to individual learners' cultural sensibilities, and thus provide the motivation for a solution using a specially designed feature, known as adaptive learning paths, for implementation in Learning Management Systems (LMS). Managing cultural conflicts is achievable by a twofold process. First, a learner profile must be created in which the specific cultural parameters can be recorded. According to the learner profile, a set of content filter tags can be assigned to the learning path for the relevant students; example content filter tags may be "no sex" or "nudity ok, but not combined with religion". Second, the LMS must have the functionality to select and present content based on the content filter tags. The design of learning material is presented via a metadata-based repository of learning objects that permits the adaptation of learning paths according to learner profiles, which include cultural sensibilities in addition to prior knowledge, and according to categorized learning content; a detailed example is given. The drawback of using static or predefined metadata elements is discussed, and a further refinement is suggested via the introduction of dynamic concept analysis applied to both learner profiles and learning objects (restricted to text at this stage). An automated method of generating the content filter tags is achieved through the use of the Normalised Word Vector algorithm first developed for the Automated Essay Grading system known as MarkIT (R. Williams, 2006); an automated method reduces human effort and ensures consistency. Sophisticated fine-grained dynamic learning path adaptivity is achieved through a detailed design given in the article, helping ensure that learners from a variety of cultural backgrounds are treated appropriately and fairly and are not disadvantaged or offended by inappropriate learning content and examples.
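
    A hypothetical sketch of the tag-based filtering step described above, assuming each learning object carries content tags and each learner profile carries exclusion tags derived from its cultural parameters; the field and function names are illustrative and not drawn from any particular LMS.

```python
# Keep only the learning objects whose content tags do not clash with the
# learner profile's exclusion tags. Names and data are illustrative.
from dataclasses import dataclass, field

@dataclass
class LearningObject:
    title: str
    content_tags: set = field(default_factory=set)

@dataclass
class LearnerProfile:
    name: str
    excluded_tags: set = field(default_factory=set)

def adapt_learning_path(path, profile):
    """Return the learning path with culturally inappropriate objects removed."""
    return [obj for obj in path if not (obj.content_tags & profile.excluded_tags)]

path = [
    LearningObject("Intro case study", {"neutral"}),
    LearningObject("Marketing example", {"nudity"}),
    LearningObject("Ethics module", {"religion"}),
]
learner = LearnerProfile("student_a", excluded_tags={"nudity"})
print([obj.title for obj in adapt_learning_path(path, learner)])
```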

    Geospatial crowdsourced data fitness analysis for spatial data infrastructure based disaster management actions

    Get PDF
    The reporting of disasters has changed from official media reports to citizen reporters who are at the disaster scene. This kind of crowd-based reporting, related to disasters or any other events, is often identified as 'Crowdsourced Data' (CSD). CSD are freely and widely available thanks to current technological advancements. The quality of CSD is often problematic as it is typically created by citizens with varying skills and backgrounds. CSD is considered unstructured in general, and its quality remains poorly defined. Moreover, the availability of locations in CSD and the quality of any available locations may be incomplete. Traditional data quality assessment methods and parameters are also often incompatible with the unstructured nature of CSD due to its undocumented nature and missing metadata. Although other research has identified credibility and relevance as possible CSD quality assessment indicators, the available assessment methods for these indicators are still immature. In the 2011 Australian floods, citizens and disaster management administrators used the Ushahidi Crowdmap platform and the Twitter social media platform to extensively communicate flood-related information including hazards, evacuations, help services, road closures and property damage. This research designed a CSD quality assessment framework and tested the quality of the 2011 Australian floods' Ushahidi Crowdmap and Twitter data. In particular, it explored a number of aspects, namely location availability and location quality assessment, semantic extraction of hidden location toponyms, and the analysis of the credibility and relevance of reports. The research was conducted using a Design Science (DS) research method, which is often utilised in Information Science (IS) based research. Location availability in the Ushahidi Crowdmap and Twitter data was assessed, and the quality of the available locations was evaluated by comparing them against three different datasets, i.e. Google Maps, OpenStreetMap (OSM) and the Queensland Department of Natural Resources and Mines' (QDNRM) road data. Missing locations were semantically extracted using Natural Language Processing (NLP) and gazetteer lookup techniques. The credibility of the Ushahidi Crowdmap dataset was assessed using a naive Bayesian Network (BN) model commonly utilised in spam email detection. CSD relevance was assessed by adapting Geographic Information Retrieval (GIR) relevance assessment techniques, which are also utilised in the IT sector. Thematic and geographic relevance were assessed using the Term Frequency-Inverse Document Frequency Vector Space Model (TF-IDF VSM) and NLP based on semantic gazetteers. Results of the CSD location comparison showed that the combined use of non-authoritative and authoritative data improved location determination. The semantic location analysis results indicated some improvement in the location availability of the tweets and Crowdmap data; however, the quality of the new locations was still uncertain. The results of the credibility analysis revealed that spam email detection approaches are feasible for CSD credibility detection; however, it was critical to train the model in a controlled environment using structured training, including modified training samples. The use of GIR techniques for CSD relevance analysis provided promising results. A separate relevance-ranked list of the same CSD data was prepared through manual analysis, and the two lists generally agreed, which indicated the system's potential to analyse relevance in a similar way to humans. This research showed that CSD fitness analysis can potentially improve the accuracy, reliability and currency of CSD and may be utilised to fill information gaps in authoritative sources. The integrated and autonomous CSD qualification framework presented provides a guide for flood disaster first responders and could be adapted to support other forms of emergencies.
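
    A small sketch of the spam-detection-style credibility idea using a multinomial naive Bayes classifier from scikit-learn; the toy reports and labels are invented for illustration, whereas the thesis trained its model on a structured, controlled set drawn from the 2011 flood data.

```python
# Train a naive Bayes classifier on labelled crowdsourced reports and use it
# to score the credibility of a new report, in the style of spam filtering.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reports = [
    "road flooded near bridge evacuation in progress",
    "water rising at main street please send help",
    "click here to win a free phone",
    "amazing deal limited offer visit this link",
]
labels = ["credible", "credible", "not_credible", "not_credible"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(reports), labels)

new_report = ["bridge closed due to flood water"]
print(clf.predict(vec.transform(new_report))[0])
```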

    Exploiting the conceptual space in hybrid recommender systems: a semantic-based approach

    Full text link
    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 200

    Generating semantically enriched diagnostics for radiological images using machine learning

    Get PDF
    Development of Computer Aided Diagnostic (CAD) tools to aid radiologists in pathology detection and decision making relies considerably on manually annotated images. With the advancement of deep learning techniques for CAD development, these expert annotations no longer need to be hand-crafted; however, deep learning algorithms require large amounts of data in order to generalise well. One way to access large volumes of expert-annotated data is through radiological exams consisting of images and reports. Using past radiological exams obtained from hospital archiving systems has many advantages: they are expert annotations available in large quantities, covering a population-representative variety of pathologies, and they provide additional context to pathology diagnoses, such as anatomical location and severity. Learning to auto-generate such reports from images presents many challenges, such as the difficulty of representing and generating long, unstructured textual information, accounting for spelling errors and repetition or redundancy, and the inconsistency across different annotators. In this thesis, the problem of learning to automate disease detection from radiological exams is approached from three directions. Firstly, a report generation model is developed such that it is conditioned on radiological image features. Secondly, a number of approaches are explored aimed at extracting diagnostic information from free-text reports. Finally, an alternative approach to image latent space learning is developed that differs from the current state of the art and can be applied to accelerated image acquisition. Open Access
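
    A minimal PyTorch sketch of what conditioning a report generation model on radiological image features can look like: the image feature vector initialises the hidden state of an LSTM decoder over report tokens. The dimensions, names, and toy usage are assumptions for illustration, not the architecture developed in the thesis.

```python
# Toy conditional text decoder: image features -> initial LSTM state,
# report tokens -> per-token vocabulary logits.
import torch
import torch.nn as nn

class ReportDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, emb_dim=256, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.init_h = nn.Linear(feat_dim, hid_dim)  # image features -> h0
        self.init_c = nn.Linear(feat_dim, hid_dim)  # image features -> c0
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, image_feats, report_tokens):
        h0 = torch.tanh(self.init_h(image_feats)).unsqueeze(0)
        c0 = torch.tanh(self.init_c(image_feats)).unsqueeze(0)
        emb = self.embed(report_tokens)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)  # shape: (batch, seq_len, vocab_size)

# Usage: a batch of 2 image feature vectors and 2 partial reports of length 7.
model = ReportDecoder(vocab_size=1000)
feats = torch.randn(2, 512)
tokens = torch.randint(0, 1000, (2, 7))
print(model(feats, tokens).shape)  # torch.Size([2, 7, 1000])
```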

    Knowledge driven approaches to e-learning recommendation.

    Get PDF
    Learners often have difficulty finding and retrieving relevant learning materials to support their learning goals because of two main challenges. First, the vocabulary learners use to describe their goals differs from that used by domain experts in teaching materials; this causes a semantic gap. Second, learners lack sufficient knowledge about the domain they are trying to learn about, so they are unable to assemble effective keywords that identify what they wish to learn; this presents an intent gap. The work presented in this thesis focuses on addressing the semantic and intent gaps that learners face during an e-Learning recommendation task. The semantic gap is addressed by introducing a method that automatically creates background knowledge in the form of a set of rich learning-focused concepts related to the selected learning domain. The knowledge of teaching experts contained in e-Books is used as a guide to identify important domain concepts; the concepts represent important topics that learners should be interested in. An approach is developed which leverages the concept vocabulary for representing learning materials, and this influences retrieval during the recommendation of new learning materials. The effectiveness of our approach is evaluated on a dataset of Machine Learning and Data Mining papers, and our approach outperforms benchmark methods. The results confirm that incorporating background knowledge into the representation of learning materials provides a shared vocabulary for experts and learners, and this enables the recommendation of relevant materials. We address the intent gap by developing an approach which leverages the background knowledge to identify important learning concepts that are employed for refining learners' queries. This approach enables us to automatically identify concepts that are similar to queries and take advantage of distinctive concept terms for refining learners' queries. Using the refined query allows the search to focus on documents that contain topics relevant to the learner. An e-Learning recommender system is developed to evaluate the success of our approach using a collection of learner queries and a dataset of Machine Learning and Data Mining learning materials. Users with different levels of expertise are employed for the evaluation. Results from experts, competent users and beginners all showed that using our method produced documents that were consistently more relevant to learners than those produced by the standard method. The results show the benefits of using our knowledge-driven approaches to help learners find relevant learning materials.
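
    A simplified, hypothetical illustration of the query refinement idea: pick the background concept whose vocabulary overlaps most with the learner's query and add its distinctive terms before retrieval. The concept vocabularies here are invented; the thesis derives its concepts automatically from e-Books written by teaching experts.

```python
# Expand a learner query with terms from the most similar background concept.
# Concept names, vocabularies, and the overlap heuristic are illustrative.
concepts = {
    "supervised learning": {"classification", "labels", "training", "model"},
    "clustering": {"unsupervised", "kmeans", "groups", "distance"},
}

def refine_query(query):
    query_terms = set(query.lower().split())
    # Pick the concept sharing the most terms with the query.
    best = max(concepts, key=lambda c: len(query_terms & concepts[c]))
    if not query_terms & concepts[best]:
        return query  # no related concept found; leave the query unchanged
    return " ".join(sorted(query_terms | concepts[best]))

print(refine_query("train a classification model"))
```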

    Towards Personalized and Human-in-the-Loop Document Summarization

    Full text link
    The ubiquitous availability of computing devices and the widespread use of the internet continuously generate large amounts of data. Therefore, the amount of available information on any given topic is far beyond humans' capacity to process properly, causing what is known as information overload. To cope efficiently with large amounts of information and generate content of significant value to users, we need to identify, merge and summarise information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. The thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by: i) enabling automatic intelligent feature engineering, ii) enabling flexible and interactive summarisation, and iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data. Comment: PhD thesis
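
    A minimal sketch of one classical building block behind extractive summarisation: score sentences by the frequency of their words and keep the highest-scoring ones in original order. The thesis goes well beyond this (automatic feature engineering, interactive and personalised summarisation); this only illustrates the baseline idea of selecting salient sentences.

```python
# Frequency-based extractive summariser: pick the sentences whose words are
# most frequent in the document, preserving the original sentence order.
import re
from collections import Counter

def summarise(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
                    reverse=True)
    keep = set(scored[:n_sentences])
    return " ".join(s for s in sentences if s in keep)

doc = ("Information overload makes long documents hard to read. "
       "Summaries gather the most relevant information into a shorter form. "
       "A short summary helps users answer questions quickly.")
print(summarise(doc))
```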