
    Survey of E-mail Classification: Review and Open Issues

    Email is an economical means of communication whose importance continues to grow despite the availability of other approaches such as electronic messaging, social networks, and phone applications. The business arena depends largely on email, which necessitates proper email management in the face of disruptive factors such as spam, phishing emails, and multi-folder categorization. The present study reviewed email studies published during 2016-2020, analyzing each problem description in terms of datasets, application areas, classification techniques, and feature sets. In addition, other areas involving email classification were identified and comprehensively reviewed. The results indicated four email application areas, and the open issues and research directions of email classification were outlined for further investigation.
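    As an illustration of the kind of pipeline such classification studies build, the following is a minimal, hypothetical Python sketch (scikit-learn) of TF-IDF features feeding a Naive Bayes classifier for the spam-filtering case; the emails, labels, and predicted query are invented for demonstration and are not drawn from the reviewed datasets.

        # Minimal, hypothetical spam/ham classifier: TF-IDF features + Naive Bayes.
        # The tiny in-line dataset is purely illustrative.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        emails = [
            "Win a free prize, click this link now",       # spam
            "Meeting moved to 3pm, agenda attached",       # ham
            "Limited offer: cheap meds, no prescription",  # spam
            "Please review the quarterly report draft",    # ham
        ]
        labels = ["spam", "ham", "spam", "ham"]

        # Vectorize the email bodies and fit the classifier in one pipeline.
        clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
        clf.fit(emails, labels)

        print(clf.predict(["Claim your free prize today"]))  # likely ['spam']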

    Emotion AI-Driven Sentiment Analysis: A Survey, Future Research Directions, and Open Issues

    A core use of natural language processing is to analyze an author's sentiment from context. Sentiment analysis (SA) aims to determine the underlying emotion expressed in a given text. It has been applied in several subject areas, such as stock market prediction, social media data on product reviews, psychology, the judiciary, forecasting, disease prediction, and agriculture. Many researchers have worked in these areas and produced significant results, which are beneficial in their respective fields because they summarize large volumes of opinion in a short time. Furthermore, SA helps in understanding the actual feedback shared across different platforms such as Amazon and TripAdvisor. The main objective of this thorough survey was to analyze some of the essential studies done so far and to provide an overview of SA models in the area of emotion AI-driven SA. In addition, this paper offers a review of ontology-based SA and lexicon-based SA, along with machine learning models used to analyze the sentiment of a given context. This work also discusses different neural network-based approaches for analyzing sentiment. Finally, these different approaches were analyzed with sample data collected from Twitter. Among the four approaches considered in each domain, the aspect-based ontology method produced 83% accuracy among the ontology-based SAs, the term frequency approach produced 85% accuracy in the lexicon-based analysis, and the support vector machine-based approach achieved 90% accuracy among the machine learning-based approaches. Funding: Ministry of Education (MOE) in Taiwan.
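    To make the machine-learning branch of that comparison concrete, here is a hedged Python sketch (scikit-learn) of a support-vector-machine sentiment classifier over TF-IDF term features; the tweets and labels below are invented placeholders, not the survey's Twitter sample, and this toy example does not reproduce the reported 90% figure.

        # Illustrative SVM sentiment baseline: TF-IDF unigrams/bigrams + linear SVM.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline

        tweets = [
            "I love this phone, the camera is amazing",
            "Worst customer service I have ever experienced",
            "The hotel was clean and the staff were friendly",
            "The flight was delayed again, totally frustrating",
        ]
        labels = ["positive", "negative", "positive", "negative"]

        # Fit the vectorizer and SVM together, then classify an unseen tweet.
        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
        model.fit(tweets, labels)

        print(model.predict(["The battery life is great"]))  # expected: ['positive']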

    Fully-unsupervised embeddings-based hypernym discovery

    Funding: Supported in part by the Sardegna Ricerche project OKgraph (CRP 120) and the MIUR PRIN 2017 (2019-2022) project HOPE (High quality Open data Publishing and Enrichment).

    Exploring the State of the Art in Legal QA Systems

    Answering questions related to the legal domain is a complex task, primarily due to the intricate nature and diverse range of legal document systems. Providing an accurate answer to a legal query typically requires specialized knowledge of the relevant domain, which makes the task challenging even for human experts. Question answering (QA) systems are designed to generate answers to questions asked in human languages. They use natural language processing to understand questions and to search through information for relevant answers. QA has various practical applications, including customer service, education, research, and cross-lingual communication, but such systems still face challenges such as improving natural language understanding and handling complex or ambiguous questions. At this time, there is a lack of surveys that discuss legal question answering. To address this gap, we provide a comprehensive survey that reviews 14 benchmark datasets for question answering in the legal field and presents a review of state-of-the-art legal question answering deep learning models. We cover the different architectures and techniques used in these studies as well as the performance and limitations of these models. Moreover, we have established a public GitHub repository where we regularly upload the most recent articles, open data, and source code. The repository is available at: https://github.com/abdoelsayed2016/Legal-Question-Answering-Review.
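    As a concrete illustration of the retrieval step shared by many such QA pipelines, the Python sketch below ranks candidate legal passages by TF-IDF cosine similarity to a question; the passages and question are invented examples, not taken from the surveyed benchmarks.

        # Toy passage-retrieval step: rank candidate passages by similarity to the question.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        passages = [
            "A tenant must receive written notice at least 30 days before eviction.",
            "Copyright protection lasts for the life of the author plus 70 years.",
            "A contract requires offer, acceptance, and consideration to be valid.",
        ]
        question = "How much notice must a landlord give before eviction?"

        # Fit one vocabulary over passages plus question, then score the question row
        # against every passage row.
        matrix = TfidfVectorizer().fit_transform(passages + [question])
        scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

        print(passages[scores.argmax()])  # retrieves the eviction-notice passage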

    Multi-dimensional mining of unstructured data with limited supervision

    As one of the most important data forms, unstructured text data plays a crucial role in data-driven decision making in domains ranging from social networking and information retrieval to healthcare and scientific research. In many emerging applications, people's information needs from text data are becoming multi-dimensional: they demand useful insights for multiple aspects from the given text corpus. However, turning massive text data into multi-dimensional knowledge remains a challenge that cannot be readily addressed by existing data mining techniques. In this thesis, we propose algorithms that turn unstructured text data into multi-dimensional knowledge with limited supervision. We investigate two core questions: (1) how to identify task-relevant data with declarative queries in multiple dimensions, and (2) how to distill knowledge from data in a multi-dimensional space. To address these questions, we propose an integrated cube construction and exploitation framework. First, we develop a cube construction module that organizes unstructured data into a cube structure by discovering latent multi-dimensional and multi-granular structure from the unstructured text corpus and allocating documents into that structure. Second, we develop a cube exploitation module that models multiple dimensions in the cube space, thereby distilling multi-dimensional knowledge from data to provide insights along multiple dimensions. Together, these two modules constitute an integrated pipeline: leveraging the cube structure, users can perform multi-dimensional, multi-granular data selection with declarative queries; and with cube exploitation algorithms, users can make accurate cross-dimension predictions or extract multi-dimensional patterns for decision making. The proposed framework has two distinctive advantages when turning text data into multi-dimensional knowledge: flexibility and label efficiency. First, it enables acquiring multi-dimensional knowledge flexibly, as the cube structure allows users to easily identify task-relevant data along multiple dimensions at varied granularities and further distill multi-dimensional knowledge. Second, the algorithms for cube construction and exploitation require little supervision, which makes the framework appealing for many applications where labeled data are expensive to obtain.
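    The cube idea can be sketched with a toy Python example: each document carries labels along several dimensions, and a declarative query selects the slice of the corpus matching the requested cells. The dimension names and documents below are hypothetical, and the thesis's structure-discovery and exploitation algorithms go well beyond this selection step.

        # Toy "text cube": documents labeled along topic/location/year dimensions.
        documents = [
            {"text": "New flu vaccine trial shows promise",  "topic": "health",  "location": "USA",   "year": 2019},
            {"text": "Stock markets rally after rate cut",   "topic": "economy", "location": "USA",   "year": 2020},
            {"text": "Hospital capacity strained in winter", "topic": "health",  "location": "China", "year": 2020},
        ]

        def select(docs, **dims):
            """Return documents whose labels match every requested dimension value."""
            return [d for d in docs if all(d.get(k) == v for k, v in dims.items())]

        # Declarative slice: health documents from 2020, any location.
        for doc in select(documents, topic="health", year=2020):
            print(doc["text"])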

    Constructive Ontology Engineering

    The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in the literature have been used to create ontologies from various data sources, such as structured data in databases or unstructured text found in text or HTML documents. Various data mining techniques, natural language processing methods, syntactic analysis, machine learning methods, and other techniques have been used to build ontologies with automated and semi-automated processes. Due to the vast amount of unstructured text and its continued proliferation, the problem of constructing ontologies from text has attracted considerable research attention. However, the constructed ontologies may be noisy, with missing and incorrect knowledge. Thus, ontology construction continues to be a challenging research problem. The goal of this research is to investigate a new method for guiding a process of extracting and assembling candidate terms into domain-specific concepts and relationships. The process is part of an overall semi-automated system for creating ontologies from unstructured text sources and is driven by the user's goals in an incremental process. The system applies natural language processing techniques and uses a series of syntactic analysis tools to extract grammatical relations from a list of text terms representing the parts of speech of a sentence. The extraction process focuses on evaluating the subject-predicate-object sequences of the text for potential concept-relation-concept triples to be built into an ontology. Users can guide the system by selecting seed concept-relation-concept triples to assist in building concepts from the extracted domain-specific terms. As a result, the ontology building process becomes an incremental one that allows the user to interact with the system, to guide the development of the ontology, and to tailor it to the user's application needs. The main contribution of this work is the implementation and evaluation of a new semi-automated methodology for constructing domain-specific ontologies from an unstructured text corpus.
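    A rough Python sketch of that subject-predicate-object extraction step is given below, using spaCy's dependency parse (it assumes the en_core_web_sm model is installed); the example sentence is invented, and the actual system applies a broader battery of syntactic tools together with user-selected seed triples.

        # Extract (subject, predicate, object) candidates from one parsed sentence.
        import spacy

        nlp = spacy.load("en_core_web_sm")
        doc = nlp("The enzyme catalyzes the reaction in the cell membrane.")

        triples = []
        for token in doc:
            # A nominal subject attached to a verb anchors a candidate triple.
            if token.dep_ == "nsubj" and token.head.pos_ == "VERB":
                verb = token.head
                for obj in (c for c in verb.children if c.dep_ == "dobj"):
                    triples.append((token.text, verb.lemma_, obj.text))

        print(triples)  # e.g. [('enzyme', 'catalyze', 'reaction')]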