
    Review of Semantic Importance and Role of using Ontologies in Web Information Retrieval Techniques

    The Web contains an enormous amount of information that is continually accumulated, studied, and used by many users. It is multilingual and growing very fast, and its data are diverse, including unstructured and semi-structured content such as websites, texts, journals, and files. Obtaining relevant data from such a vast and heterogeneous collection is a tedious and challenging task. Simple keyword-based retrieval systems rely heavily on statistics, which leads to a word-mismatch problem caused by the unavoidable semantic and contextual variants of a given word. There is therefore an urgent need to organize such colossal data systematically so that relevant information can be quickly identified and matched to users' needs in the appropriate context. Over the years, ontologies have been widely used in the Semantic Web to organize unstructured information in a systematic and structured manner, and they have also significantly improved the efficiency of various information retrieval approaches. Ontology-based retrieval systems select documents based on the semantic relation between the search request and the searchable information. This paper examines contemporary ontology-based information retrieval techniques for textual, multimedia, and multilingual data types. Moreover, the study compares and classifies the most significant developments in search and retrieval techniques, along with their major advantages and disadvantages.
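    To make the keyword-versus-semantics contrast above concrete, here is a minimal Python sketch of ontology-based query expansion. The tiny ONTOLOGY mapping, documents, and overlap scoring are illustrative assumptions only; the systems surveyed in the paper work with full OWL/RDF ontologies and proper search indexes.

        # Minimal sketch of ontology-based query expansion over toy data.
        # Hypothetical mini-ontology: each concept maps to related terms
        # (synonyms and narrower concepts).
        ONTOLOGY = {
            "car": {"automobile", "vehicle", "sedan"},
            "doctor": {"physician", "clinician"},
        }

        def expand_query(terms):
            """Add ontology-related terms so matching is semantic, not purely lexical."""
            expanded = set(terms)
            for term in terms:
                expanded |= ONTOLOGY.get(term, set())
            return expanded

        def score(document, query_terms):
            """Naive relevance: number of expanded query terms present in the document."""
            words = set(document.lower().split())
            return len(expand_query(query_terms) & words)

        docs = ["the physician examined the patient",
                "a used sedan for sale, one owner"]
        for doc in docs:
            print(score(doc, {"doctor", "car"}), "-", doc)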

    Intelligent Customer Support Case Routing

    Juniper Networks is an American company selling networking hardware and software. When a customer needs assistance with a product, a customer support case is opened. The route a case takes to reach its resolver is currently human-controlled. This project aimed to assist case routing through the use of machine learning. We developed a system that uses a pipeline of data, including previously resolved cases, to model the relationship between a case and the engineer responsible for solving the problem. We used a neural network to create a system that can predict both the resolving group and the engineer, achieving up to 96% accuracy when predicting the 10 most likely groups and 42% when predicting the 10 most likely engineers. We also built a web application for future demonstration of this system.
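    The "10 most likely" figures above correspond to a top-k accuracy evaluation. The NumPy sketch below is a guess at the shape of such a computation, not the project's actual code; it assumes the model outputs one score per candidate group or engineer.

        import numpy as np

        def top_k_accuracy(scores, labels, k=10):
            """Fraction of cases whose true class is among the k highest-scoring predictions."""
            top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores per row
            hits = (top_k == labels[:, None]).any(axis=1)     # does the true label appear among them?
            return float(hits.mean())

        # Toy check: 3 cases, 5 candidate classes, k=2.
        scores = np.array([[0.1, 0.5, 0.2, 0.1, 0.1],
                           [0.3, 0.1, 0.4, 0.1, 0.1],
                           [0.2, 0.2, 0.2, 0.3, 0.1]])
        labels = np.array([1, 0, 4])
        print(top_k_accuracy(scores, labels, k=2))            # 0.666...: the first two hit, the third misses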

    Encoding, Storing and Searching of Analytical Properties and Assigned Metabolite Structures

    Information about metabolites and other small organic molecules is of crucial importance in many different areas of the natural sciences. They play, for example, a decisive role in metabolic networks, and knowledge of their properties helps in understanding complex biological processes and complete biological systems. Since biological and chemical laboratories produce data describing these molecules every day, a comprehensive and continuously growing body of data exists. To enable scientists to process, exchange, archive, and search this information while preserving its semantic relationships, complex software systems and data formats are needed. The goal of this project was to develop applications and algorithms that can be used for the efficient encoding, collection, normalization, and analysis of molecular data. These are intended to support scientists in structure elucidation, dereplication, the analysis of molecular interactions, and the publication of the knowledge gained in this way. Since directly describing the structure and function of an unknown compound is very difficult and laborious, this is mostly achieved indirectly, with the help of descriptive properties, which are then used to predict structural and functional characteristics. In this context, program modules were developed that allow the visualization of structural and spectroscopic data, the structured display and editing of metadata and properties, and the import and export of various data formats. These were extended by methods that make it possible to further analyze the obtained information and to assign structural and spectroscopic data to one another. In addition, a system was developed for the structured archiving and management of large amounts of molecular data and spectroscopic information, preserving their semantic relationships, both in the file system and in databases. To guarantee lossless storage, an open and standardized data format was defined (CMLSpect). It extends the existing CML (Chemical Markup Language) vocabulary and thus allows straightforward handling of linked structural and spectroscopic data. The developed applications were integrated into the Bioclipse system for bio- and cheminformatics, providing the user with a high-quality interface and the developer with an easily extensible, modular program architecture.
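    As a rough illustration of what preserving the semantic link between structure and spectrum can look like in an XML vocabulary, the Python sketch below emits a CML-style document in which a peak references the atom it is assigned to. The element and attribute names are simplified placeholders and do not reproduce the actual CMLSpect schema.

        import xml.etree.ElementTree as ET

        # Build a CML-style document with a molecule and an assigned spectrum.
        root = ET.Element("cml")
        mol = ET.SubElement(root, "molecule", id="m1")
        atoms = ET.SubElement(mol, "atomArray")
        ET.SubElement(atoms, "atom", id="a1", elementType="C")
        ET.SubElement(atoms, "atom", id="a2", elementType="O")

        spec = ET.SubElement(root, "spectrum", id="s1", type="NMR", moleculeRef="m1")
        peaks = ET.SubElement(spec, "peakList")
        # The atomRefs attribute records the assignment, i.e. the semantic link
        # between the spectroscopic peak and the structural atom.
        ET.SubElement(peaks, "peak", xValue="170.5", atomRefs="a1")

        print(ET.tostring(root, encoding="unicode"))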

    State-of-the-Art Analysis; Working Packages in Project Phase II

    In this report, we introduce our goals and present our requirements analysis for the second phase of the Corporate Semantic Web project. Corporate ontology engineering will facilitate agile ontology engineering in order to lessen the costs of ontology development and, especially, maintenance. Corporate semantic collaboration focuses on the human-centered aspects of knowledge management in corporate contexts. Corporate semantic search sits at the highest application level of the three research areas, representing applications that work on and with appropriately represented and delivered background knowledge.

    Deep Architectures for Visual Recognition and Description

    In recent times, digital media content is inherently multimedia, comprising text, audio, images, and video. Several outstanding Computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research has already been carried out in the fields of Automatic Image Annotation (AIA), image captioning, and video tagging. Video captioning, i.e., automatic description generation from digital video, is however a different and more complex problem altogether. This study compares various existing video captioning approaches and attempts to classify and analyze them based on different parameters, viz., the type of captioning method (generation/retrieval), the type of learning model employed, the desired length of the generated description, etc. This dissertation also critically analyzes the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the quality of the resulting video descriptions. A detailed study of important existing models, highlighting their comparative advantages and disadvantages, is also included. In this study a novel approach for video captioning on the Microsoft Video Description (MSVD) and Microsoft Video-to-Text (MSR-VTT) datasets is proposed, using supervised learning techniques to train a deep combinational framework that achieves better-quality video captioning by predicting semantic tags. We develop simple shallow CNNs (2D and 3D) as feature extractors, Deep Neural Networks (DNNs) and Bidirectional LSTMs (BiLSTMs) as tag prediction models, and a Recurrent Neural Network (RNN) (LSTM) as the language model. The aim of the work was to provide an alternative route to generating captions from videos via semantic tag prediction, and to deploy simpler, shallower deep architectures with lower memory requirements, so that the solution is not memory intensive and the developed models remain stable and viable as the scale of the data increases. This study also successfully employed deep architectures such as the Convolutional Neural Network (CNN) to speed up the automation of hand gesture recognition and classification for the sign language of the Indian classical dance form ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single-hand gestures belonging to 27 classes, collected from (i) the Google search engine (Google Images), (ii) YouTube videos (dynamic, with background considered), and (iii) professional artists under staged environment constraints (plain backgrounds); 2) exploring the effectiveness of CNNs for identifying and classifying the single-hand gestures by optimizing the hyperparameters; and 3) evaluating the impact of transfer learning and double transfer learning, a novel concept explored for achieving higher classification accuracy.
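    A heavily condensed PyTorch sketch of the tag-assisted captioning idea follows: pre-extracted frame features feed a BiLSTM tag predictor, and the predicted tag vector initializes an LSTM language model. All dimensions, layer choices, and names are illustrative assumptions rather than the dissertation's actual architecture.

        import torch
        import torch.nn as nn

        class TagPredictor(nn.Module):
            """BiLSTM over frame features -> multi-label semantic tag scores."""
            def __init__(self, feat_dim=512, hidden=256, num_tags=300):
                super().__init__()
                self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
                self.head = nn.Linear(2 * hidden, num_tags)

            def forward(self, frame_feats):                   # (batch, frames, feat_dim)
                out, _ = self.bilstm(frame_feats)
                return torch.sigmoid(self.head(out.mean(dim=1)))

        class CaptionDecoder(nn.Module):
            """LSTM language model whose initial state is conditioned on the tags."""
            def __init__(self, num_tags=300, vocab=10000, emb=256, hidden=512):
                super().__init__()
                self.init_h = nn.Linear(num_tags, hidden)
                self.embed = nn.Embedding(vocab, emb)
                self.lstm = nn.LSTM(emb, hidden, batch_first=True)
                self.out = nn.Linear(hidden, vocab)

            def forward(self, tags, tokens):                  # tags: (batch, num_tags)
                h0 = self.init_h(tags).unsqueeze(0)
                out, _ = self.lstm(self.embed(tokens), (h0, torch.zeros_like(h0)))
                return self.out(out)                          # next-word logits

        feats = torch.randn(2, 20, 512)                       # 2 videos, 20 frames each
        tags = TagPredictor()(feats)
        logits = CaptionDecoder()(tags, torch.randint(0, 10000, (2, 12)))
        print(logits.shape)                                   # torch.Size([2, 12, 10000])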

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. They fall into three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Text Embedding-based Event Detection for Social and News Media

    Today, social and news media are the leading platforms that distribute newsworthy content, and most internet users access them regularly to get information. However, due to the data's unstructured nature and vast volume, manual analysis to extract information requires enormous effort. Thus, automated intelligent mechanisms have become crucial. The literature presents several emerging approaches for social and news media event detection, with distinct evolutions mainly due to the variations in the media. However, most available social media event detection approaches rely primarily on data statistics, ignoring linguistics, which makes them vulnerable to information loss. Also, the available news media event detection approaches mostly fail to capture long-range text dependencies and to support predictions for low-resource languages (i.e., languages with relatively little data). The possibility of utilising interconnections between different data levels to improve final predictions has also not been adequately explored. This research investigates how the characteristics of text embeddings built using prediction-based models, which have proven capable of capturing linguistics, can be used in event detection while overcoming these limitations. Initially, it redefines the problem of event detection at two data granularities, the coarse- and fine-grained levels, to allow systems to tackle different information requirements: the coarse-grained level targets the notification of event occurrences, and the fine-grained level targets the provision of event details. Following this new definition, the research proposes two novel approaches for coarse- and fine-grained event detection on social media, Embed2Detect and WhatsUp, which mainly utilise the linguistics captured by self-learned word embeddings and their hierarchical relationships in dendrograms. For news media event detection, it proposes a TRansformer-based Event Document classification architecture (TRED) involving long-sequence and cross-lingual transformer encoders, together with a novel learning strategy, Two-phase Transfer Learning (TTL), which support capturing long-range dependencies and data-level interconnections. All the proposed approaches have been evaluated on recent real datasets covering four aspects crucial for event detection: accuracy, efficiency, expandability and scalability. Social media data from two diverse domains and news media data from four high- and low-resource languages are involved. The results reveal that the proposed approaches outperform state-of-the-art methods despite the diversity of the data, proving their accuracy and expandability. Additionally, the evaluations of efficiency and scalability confirm the methods' suitability for (near) real-time processing and their ability to handle large data volumes. In summary, meeting all these crucial requirements evidences the potential and utility of the proposed approaches for event detection in social and news media.
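    The following simplified sketch shows the general mechanism behind clustering self-learned word embeddings into a dendrogram, as the coarse-grained approaches above do: train window-specific embeddings with gensim, then group the vocabulary hierarchically with SciPy. It illustrates the principle only, with arbitrary parameters, and omits the comparison between consecutive time windows that actually signals an event.

        from gensim.models import Word2Vec
        from scipy.cluster.hierarchy import linkage, fcluster

        def window_clusters(posts, n_clusters=4):
            """Train embeddings on one time window of posts and cluster the vocabulary."""
            sentences = [p.lower().split() for p in posts]
            model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=50)
            words = model.wv.index_to_key
            Z = linkage(model.wv[words], method="average", metric="cosine")
            labels = fcluster(Z, t=n_clusters, criterion="maxclust")
            clusters = {}
            for word, c in zip(words, labels):
                clusters.setdefault(c, set()).add(word)
            return list(clusters.values())

        posts = ["fire breaks out downtown", "huge fire downtown tonight",
                 "match kicks off at eight", "great match tonight"]
        print(window_clusters(posts))                         # word groups for this window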

    Exploiting Latent Features of Text and Graphs

    As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings --- semantically rich vectors of latent features --- to convert human constructs into a form machines can work with. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristic ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state of the art in hypergraph partitioning quality. Building on this intuition about graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criterion derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.
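    For intuition about bipartite graph embeddings, one classic baseline (far simpler than the technique proposed in this work) is a low-rank factorization of the biadjacency matrix: the SVD factors give latent vectors for both node sets whose dot products approximate the edges.

        import numpy as np

        # Toy biadjacency matrix: rows are left nodes (e.g. terms),
        # columns are right nodes (e.g. documents).
        A = np.array([[1, 1, 0],
                      [1, 0, 1],
                      [0, 1, 1]], dtype=float)

        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        k = 2                                    # embedding dimension
        left_emb = U[:, :k] * np.sqrt(s[:k])     # embeddings of the left node set
        right_emb = Vt[:k].T * np.sqrt(s[:k])    # embeddings of the right node set

        # Nodes that co-occur land close together; dot products approximate A.
        print(left_emb @ right_emb.T)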

    Artificial intelligence for understanding the Hadith

    My research aims to utilize Artificial Intelligence to model the meanings of Classical Arabic Hadith, the reports of the life and teachings of the Prophet Muhammad. The goal is to find similarities and relatedness between the Hadith and other religious texts, specifically the Quran. These findings can facilitate downstream tasks, such as Islamic question-answering systems, and enhance understanding of these texts to shed light on new interpretations. To achieve this goal, a well-structured Hadith corpus should be created, with the Matn (Hadith teaching) and Isnad (chain of narrators) segmented. Hence, a preliminary task is conducted to build a segmentation tool using machine learning models that automatically deconstructs the Hadith into Isnad and Matn with 92.5% accuracy. This tool is then used to create a well-structured corpus of the canonical Hadith books. After building the Hadith corpus, Matns are extracted to investigate different methods of representing their meanings. Two main methods are tested: a knowledge-based approach and a deep-learning-based approach. To apply the former, existing Islamic ontologies are enumerated, most of which are intended for the Quran. Since the Quran and the Hadith are in the same domain, the extent to which these ontologies cover the Hadith is examined using a corpus-based evaluation. Results show that the most comprehensive Quran ontology covers only 26.8% of Hadith concepts, and extending it is expensive. Therefore, the second approach is investigated by building and evaluating various deep-learning models for the binary classification task of detecting relatedness between the Hadith and the Quran. Results show that a human-level understanding of such texts remains somewhat elusive for current models.
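    As a toy illustration of segmentation framed as supervised text classification, the scikit-learn sketch below labels fragments as Isnad or Matn from n-gram features. The English stand-in fragments and the linear model are placeholder assumptions, not the thesis's corpus, features, or classifier.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Hypothetical training fragments labelled by segment type.
        fragments = ["narrated to us X from Y",        # chain of narrators
                     "on the authority of Z",
                     "the Prophet said do good",       # teaching
                     "whoever believes should act"]
        labels = ["ISNAD", "ISNAD", "MATN", "MATN"]

        clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                            LogisticRegression(max_iter=1000))
        clf.fit(fragments, labels)
        print(clf.predict(["X narrated from Y", "the Prophet said be kind"]))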