2,720 research outputs found

    Dataflow Programming and Acceleration of Computationally-Intensive Algorithms

    The volume of unstructured textual information continues to grow due to recent technological advancements. This has resulted in an exponential growth of information generated in various formats, including blogs, posts, social networking content, and enterprise documents. Numerous Enterprise Architecture (EA) documents are also created daily, such as reports, contracts, agreements, frameworks, architecture requirements, designs, and operational guides. Processing and computing over this massive amount of unstructured information require substantial computing capability and new techniques. It is critical to manage this unstructured information through a centralized knowledge management platform. Knowledge management is the process of managing information within an organization; it involves creating, collecting, organizing, and storing information so that it is easily accessible and usable. The research involved the development of a textual knowledge management system, and two use cases were considered for extracting textual knowledge from documents. The first case study focused on the safety-critical documents of a railway enterprise. Safety is of paramount importance in the railway industry, and several EA documents, including manuals, operational procedures, and technical guidelines, contain critical information. Digitalization of these documents is essential for analysing the vast amount of textual knowledge they contain in order to improve the safety and security of railway operations. A case study was conducted between the University of Huddersfield and the Railway Safety Standard Board (RSSB) to analyse EA safety documents using Natural Language Processing (NLP). A graphical user interface was developed that includes various document-processing features such as semantic search, document mapping, text summarization, and visualization of key trends. For the second case study, open-source data was utilized and textual knowledge was extracted. Several features were also developed, including kernel distribution, analysis of key trends, and sentiment analysis of words (such as unique, positive, and negative words) within the documents. Additionally, a heterogeneous framework was designed using CPUs/GPUs and FPGAs to analyse the computational performance of document mapping.
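    As a rough illustration of one of the document-processing features mentioned above (semantic/keyword search over a document collection), the following sketch ranks documents against a query using TF-IDF vectors and cosine similarity. It assumes scikit-learn and uses invented document snippets; it is not the RSSB/Huddersfield system described in the abstract.

    ```python
    # Minimal sketch of a search feature over a small document collection.
    # Hypothetical data; assumes scikit-learn is installed.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "Operational procedure for track-side maintenance and worker safety.",
        "Technical guideline covering signal failure reporting and escalation.",
        "Annual report on enterprise architecture and document governance.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(documents)            # (n_docs, n_terms)

    query_vec = vectorizer.transform(["railway safety procedure"])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()   # one score per document

    # Rank documents by relevance to the query.
    for idx in scores.argsort()[::-1]:
        print(f"{scores[idx]:.3f}  {documents[idx]}")
    ```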

    Computing Network of Diseases and Pharmacological Entities through the Integration of Distributed Literature Mining and Ontology Mapping

    The proliferation of -omics (such as genomics and proteomics) and -ology (such as systems biology, cell biology, and pharmacology) fields has spawned new frontiers of research in drug discovery and personalized medicine. A vast number (21 million) of published research results are archived in PubMed, and the archive continues to grow. To improve the accessibility and utility of such a large body of literature, it is critical to develop a suite of semantics-sensitive technologies that can discover knowledge and infer possible new relationships based on statistical co-occurrences of meaningful terms or concepts. In this context, this thesis presents a unified framework for mining a large body of literature through the integration of latent semantic analysis (LSA) and ontology mapping. In particular, a parameter-optimized, robust, scalable, and distributed LSA (DiLSA) technique was designed and implemented on a carefully selected set of 7.4 million PubMed records related to pharmacology. The DiLSA model was integrated with MeSH to make the model effective and efficient for a specific domain. An optimized multi-gram dictionary was customized by mapping MeSH terms and used to build the DiLSA model. A fully integrated web-based application, called PharmNet, was developed to bridge the gap between biological knowledge and clinical practice. Preliminary analysis using PharmNet shows improved performance over a global LSA model. A limited expert evaluation was performed to validate the retrieved results and networks against the biological literature. A thorough performance evaluation and validation of results is in progress.
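    For orientation, the core of any LSA pipeline is a truncated SVD of a weighted term-document matrix, with queries folded into the resulting latent space. The sketch below shows only that core step on a toy corpus using scikit-learn; the distributed, MeSH-optimized DiLSA model and the PharmNet application described above are far more involved and are not reproduced here.

    ```python
    # Minimal, single-machine LSA sketch on invented abstracts (not PubMed data).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    abstracts = [
        "aspirin inhibits platelet aggregation and reduces cardiovascular risk",
        "statins lower cholesterol and reduce risk of myocardial infarction",
        "warfarin is an anticoagulant used to prevent thrombosis",
        "ibuprofen is a nonsteroidal anti-inflammatory drug",
    ]

    tfidf = TfidfVectorizer()
    X = tfidf.fit_transform(abstracts)                 # term-document weights

    lsa = TruncatedSVD(n_components=2, random_state=0)
    doc_topics = lsa.fit_transform(X)                  # documents in the latent space

    # Fold a query into the same latent space and rank documents there.
    query = lsa.transform(tfidf.transform(["drugs that reduce cardiovascular risk"]))
    sims = cosine_similarity(query, doc_topics).ravel()
    print(sims.argsort()[::-1])                        # document indices by latent similarity
    ```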

    Introspective knowledge acquisition for case retrieval networks in textual case base reasoning.

    Textual Case Based Reasoning (TCBR) aims at effective reuse of the information contained in unstructured documents. The key advantage of TCBR over traditional Information Retrieval systems is its ability to incorporate domain-specific knowledge to facilitate case comparison beyond simple keyword matching. However, substantial human intervention is needed to acquire and transform this knowledge into a form suitable for a TCBR system. In this research, we present automated approaches that exploit statistical properties of document collections to alleviate this knowledge acquisition bottleneck. We focus on two important knowledge containers: relevance knowledge, which shows the relatedness of features to cases, and similarity knowledge, which captures the relatedness of features to each other. The terminology is derived from the Case Retrieval Network (CRN) retrieval architecture in TCBR, which is used as the underlying formalism in this thesis, applied to text classification. Concepts generated by Latent Semantic Indexing (LSI) are a useful resource for relevance knowledge acquisition for CRNs. This thesis introduces a supervised LSI technique called sprinkling that exploits class knowledge to bias LSI's concept generation. An extension of this idea, called Adaptive Sprinkling (AS), has been proposed to handle inter-class relationships in complex domains like hierarchical (e.g. Yahoo directory) and ordinal (e.g. product ranking) classification tasks. Experimental evaluation results show the superiority of CRNs created with sprinkling and AS, not only over LSI on its own, but also over state-of-the-art classifiers like Support Vector Machines (SVM). Current statistical approaches based on feature co-occurrences can be utilized to mine similarity knowledge for CRNs. However, related words often do not co-occur in the same document, though they do co-occur with similar words. We introduce an algorithm to efficiently mine such indirect associations, called higher-order associations. Empirical results show that CRNs created with the acquired similarity knowledge outperform both LSI and SVM. Incorporating the acquired knowledge into the CRN transforms it into a densely connected network. While improving retrieval effectiveness, this has the unintended effect of slowing down retrieval. We propose a novel retrieval formalism called the Fast Case Retrieval Network (FCRN) which eliminates redundant run-time computations to improve retrieval speed. Experimental results show the FCRN's ability to scale up over high-dimensional textual casebases. Finally, we investigate novel ways of visualizing and estimating the complexity of textual casebases that can help explain performance differences across casebases. Visualization provides a qualitative insight into the casebase, while complexity is a quantitative measure that characterizes the classification or retrieval hardness intrinsic to a dataset. We study correlations of experimental results from the proposed approaches against complexity measures over diverse casebases.
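    The basic sprinkling idea described above, appending class-specific artificial terms to training documents so that the LSI decomposition is biased toward class structure, can be sketched in a few lines. The data, the pseudo-term naming, and the nearest-neighbour classification step below are assumptions for illustration; Adaptive Sprinkling, the CRN/FCRN machinery, and the thesis's evaluation are not reproduced.

    ```python
    # Minimal sketch of sprinkling before LSI, assuming scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    train_docs = ["cheap pills buy now", "meeting agenda attached",
                  "win money fast", "project status report"]
    train_labels = ["spam", "ham", "spam", "ham"]

    # Sprinkle: append a label-specific pseudo-term to each training document.
    sprinkled = [f"{doc} CLASS_{label}" for doc, label in zip(train_docs, train_labels)]

    vec = CountVectorizer()
    X_train = vec.fit_transform(sprinkled)
    lsi = TruncatedSVD(n_components=2, random_state=0).fit(X_train)
    train_latent = lsi.transform(X_train)

    # Test documents carry no class terms; they are folded into the biased
    # latent space and labelled by their most similar training document.
    test_latent = lsi.transform(vec.transform(["buy cheap pills"]))
    nearest = cosine_similarity(test_latent, train_latent).argmax()
    print(train_labels[nearest])
    ```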

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with the related discussion of the requirements they pose as technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges.

    Deep Neural Attention for Misinformation and Deception Detection

    PhD thesis in Information Technology
    At present, the influence of social media on society is so great that, for many, life seems to have no meaning without it. This over-reliance on social media gives anarchic elements an opportunity to take undue advantage. Online misinformation and deception are vivid examples of this phenomenon. Misinformation, or fake news, spreads faster and wider than true news [32]. The need of the hour is to identify and curb the spread of misinformation and misleading content automatically and at the earliest possible stage. Several machine learning models have been proposed by researchers to detect and prevent misinformation and deceptive content. However, these prior works suffer from some limitations. First, they either use feature-engineering-heavy methods or intricate deep neural architectures, which are not transparent in terms of their internal workings and decision making. Second, they do not incorporate and learn the available auxiliary and latent cues and patterns, which can be very useful in forming an adequate context for the misinformation. Third, most of the former methods perform poorly on early-detection accuracy measures because of their reliance on features that are usually absent at the initial stage of news or social media posts on social networks. In this dissertation, we propose suitable deep neural attention-based solutions to overcome these limitations. For instance, we propose a claim verification model that learns embeddings for latent aspects such as the author and subject of the claim and the domain of the external evidence document. This enables the model to learn important additional context beyond the textual content. In addition, we propose an algorithm to extract evidential snippets from external evidence documents, which serve as explanations of the model’s decisions. Next, we improve this model by using an improved claim-driven attention mechanism and also generate a topically diverse and non-redundant multi-document fact-checking summary for the claims, which helps to further interpret the model’s decision making. Subsequently, we introduce a novel method to learn influence and affinity relationships among the social media users present on the propagation paths of news items. By modeling the complex influence relationships among users, in addition to textual content, we learn significant patterns pertaining to the diffusion of a news item on a social network. The evaluation shows that the proposed model outperforms other related methods in early-detection performance with significant gains. Next, we propose a synthetic-headline-generation-based headline incongruence detection model, which uses word-to-word mutual attention-based deep semantic matching between the original and a synthetic news headline to detect incongruence. Further, we investigate and define a new task of incongruence detection in the presence of important cardinal values in the headline. For this new task, we propose a part-of-speech pattern-driven attention-based method, which learns the requisite context for cardinal values.
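    The attention idea underlying such models can be illustrated in isolation: claim (or headline) token vectors attend over evidence token vectors, producing a claim-conditioned summary of the evidence. The sketch below uses random embeddings and plain NumPy purely for illustration; it is a generic scaled dot-product attention step, not the dissertation's actual architectures.

    ```python
    # Minimal scaled dot-product attention sketch with invented embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 8
    claim = rng.normal(size=(5, d))      # 5 claim tokens, d-dim embeddings
    evidence = rng.normal(size=(12, d))  # 12 evidence tokens

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # Each claim token weighs every evidence token.
    scores = claim @ evidence.T / np.sqrt(d)   # (5, 12) attention logits
    weights = softmax(scores, axis=-1)         # rows sum to 1
    context = weights @ evidence               # (5, d) claim-driven evidence summary

    print(weights.shape, context.shape)
    ```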

    Syntactic and Semantic Analysis and Visualization of Unstructured English Texts

    People have complex thoughts, and they often express those thoughts with complex sentences in natural language. This complexity may facilitate efficient communication among an audience with the same knowledge base, but for a different or new audience such a composition becomes cumbersome to understand and analyze. Analyzing such compositions using syntactic or semantic measures is a challenging job and forms the base step for natural language processing. In this dissertation I explore and propose a number of new techniques to analyze and visualize the syntactic and semantic patterns of unstructured English texts. The syntactic analysis is done through a proposed visualization technique that categorizes and compares different English compositions based on their reading-complexity metrics. For the semantic analysis I use Latent Semantic Analysis (LSA) to analyze the hidden patterns in complex compositions. I have used this technique to analyze comments from a social visualization web site in order to detect irrelevant ones (e.g., spam). The patterns of collaboration are also studied through statistical analysis. Word sense disambiguation is used to determine the correct sense of a word in a sentence or composition. A textual similarity measure, based on different word-similarity measures and word sense disambiguation, applied to collaborative text snippets from a social collaborative environment, reveals a direction for untangling the complex hidden patterns of collaboration.
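    One common way to combine word sense disambiguation with a word-similarity measure, roughly in the spirit of the textual similarity step described above, is Lesk disambiguation followed by a WordNet-based similarity score. The sketch below uses NLTK and assumes the WordNet corpus has been downloaded (nltk.download('wordnet')); the example sentences are invented and the dissertation's full similarity measure is not reproduced.

    ```python
    # Minimal sketch: disambiguate "bank" in two contexts, then compare senses.
    from nltk.wsd import lesk
    from nltk.corpus import wordnet as wn  # noqa: F401  (WordNet data must be installed)

    sentence1 = "I deposited cash at the bank".split()
    sentence2 = "She sat on the bank of the river".split()

    # Lesk picks the WordNet sense whose gloss best overlaps each context.
    sense1 = lesk(sentence1, "bank")
    sense2 = lesk(sentence2, "bank")
    print(sense1, sense2)

    # Path similarity between the chosen senses (0..1, higher = more related).
    if sense1 and sense2:
        print(sense1.path_similarity(sense2))
    ```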

    EPOS : evolving personal to organizational knowledge spaces

    EPOS will leverage the user's personal workspace, with its manifold native information structures, into a personal knowledge space and, in cooperation with other personal workspaces, contribute to the organizational knowledge space represented in the organizational memory. This first milestone presents results from the project's first year in the areas of the personal information model, user observation for context elicitation, collaborative information retrieval, and information visualization.