475 research outputs found

    Comparative study on Judgment Text Classification for Transformer Based Models

    Full text link
    This work uses several NLP models to predict the winner of a judgment by extracting and summarizing text from judgment documents. Such documents are valuable in legal proceedings: they can be used for citations and precedent references in lawsuits and cases, strengthening the argument of the party citing them. Establishing precedent requires referring to an ample number of documents in order to collect the legal points relevant to a case, yet reviewing these documents takes a long time because of their complex wording and length. This work presents a comparative study of 6 different self-attention-based transformer models and of how they perform when tweaked with 4 different activation functions. The models are trained on 200 judgment contexts, and their results are judged against several benchmark parameters. The models ultimately reach a confidence level of up to 99% when predicting the judgment, making it possible to find a relevant judgment document without spending too much time searching related cases and reading them completely.
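    As a rough illustration of the study's setup, the sketch below fine-tunes one self-attention transformer for judgment-outcome classification while making the feed-forward activation function configurable. The model name, label count, and example text are assumptions for illustration, not the paper's actual configuration.

```python
# A minimal sketch (not the paper's code): build a transformer classifier
# whose feed-forward activation can be swapped, as in the comparative study.
from transformers import (AutoConfig, AutoTokenizer,
                          AutoModelForSequenceClassification)
import torch

def build_classifier(model_name="bert-base-uncased", activation="gelu"):
    # hidden_act switches the activation used in the transformer's
    # feed-forward sublayers ("gelu", "relu", "silu", ...).
    config = AutoConfig.from_pretrained(model_name,
                                        num_labels=2,        # assumed: winner A / winner B
                                        hidden_act=activation)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, config=config)
    return tokenizer, model

tokenizer, model = build_classifier(activation="relu")
batch = tokenizer(["The appellant's claim was dismissed with costs."],
                  truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**batch).logits, dim=-1)
print(probs)  # per-class confidence for this judgment text
```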

    Theory and Applications for Advanced Text Mining

    Get PDF
    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and we can reasonably expect that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge from the data. Even though many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to the processing of under-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research fields.

    Dataflow Programming and Acceleration of Computationally-Intensive Algorithms

    Get PDF
    The volume of unstructured textual information continues to grow due to recent technological advancements. This has resulted in an exponential growth of information generated in various formats, including blogs, posts, social networks, and enterprise documents. Numerous Enterprise Architecture (EA) documents are also created daily, such as reports, contracts, agreements, frameworks, architecture requirements, designs, and operational guides. Processing and computing over this massive amount of unstructured information requires substantial computing capability and the implementation of new techniques, and it is critical to manage this unstructured information through a centralized knowledge management platform. Knowledge management is the process of managing information within an organization: creating, collecting, organizing, and storing information in a way that makes it easily accessible and usable. The research involved the development of a textual knowledge management system, and two use cases were considered for extracting textual knowledge from documents. The first case study focused on the safety-critical documents of a railway enterprise. Safety is of paramount importance in the railway industry, and several EA documents, including manuals, operational procedures, and technical guidelines, contain critical information. Digitalization of these documents is essential for analysing the vast amount of textual knowledge they contain in order to improve the safety and security of railway operations. A case study was conducted between the University of Huddersfield and the Railway Safety Standard Board (RSSB) to analyse EA safety documents using natural language processing (NLP). A graphical user interface was developed that includes various document processing features such as semantic search, document mapping, text summarization, and visualization of key trends. For the second case study, open-source data was utilized and textual knowledge was extracted. Several features were also developed, including kernel distribution, analysis of key trends, and sentiment analysis of words (such as unique, positive, and negative words) within the documents. Additionally, a heterogeneous framework was designed using CPUs/GPUs and FPGAs to analyse the computational performance of document mapping.
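    As an illustration of the semantic search feature such a platform exposes, the sketch below embeds document passages once and ranks them against a free-text query. The model name and the example passages are invented for illustration; they are not the thesis's data or code.

```python
# A minimal semantic-search sketch: embed passages, rank them by cosine
# similarity to a query. Passages here are invented railway-style examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

passages = [
    "Track workers must obtain a possession before entering the line.",
    "Signalling equipment is inspected on a quarterly maintenance cycle.",
    "Passenger doors are interlocked with the traction system.",
]
passage_emb = model.encode(passages, convert_to_tensor=True)

query = "safety rules for staff working on the track"
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity gives one relevance score per passage.
scores = util.cos_sim(query_emb, passage_emb)[0]
for score, text in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:.3f}  {text}")
```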

    Large-scale image collection cleansing, summarization and exploration

    Get PDF
    A perennially interesting topic in the research field of large-scale image collection organization is how to conduct image cleansing, summarization and exploration effectively and efficiently. The primary objective of such an image organization system is to enhance the user's exploration experience through redundancy removal and summarization operations on a large-scale image collection. An ideal system discovers and utilizes the visual correlation among the images, reduces the redundancy in the collection, organizes and visualizes the structure of the collection, and facilitates exploration and knowledge discovery. In this dissertation, a novel system is developed for exploring and navigating large-scale image collections. Our system consists of the following key components: (a) junk image filtering by incorporating bilingual search results; (b) near-duplicate image detection using a coarse-to-fine framework; (c) concept network generation and visualization; (d) image collection summarization via dictionary learning for sparse representation; and (e) a multimedia practice of graffiti image retrieval and exploration. For junk image filtering, bilingual image search results, obtained for the same keyword-based query, are integrated to automatically identify the clusters of junk images and the clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplications under a coarse-to-fine structure. Duplicate pairs are detected with both a global feature (partition-based color histogram) and local features (CPAM and SIFT bag-of-words models). The duplications are detected and removed from the data collection to facilitate further exploration and visual correlation analysis. After junk image filtering and duplication removal, the visual concepts are further organized and visualized by the proposed concept network. An automatic algorithm is developed to generate this visual concept network, which characterizes the visual correlation between image concept pairs. Multiple kernels are combined, and a kernel canonical correlation analysis algorithm is used to characterize the diverse visual similarity contexts between the image concepts. The FishEye visualization technique is implemented to facilitate navigation of image concepts through our image concept network. To better assist exploration of large-scale data collections, we design an efficient summarization algorithm to extract representative exemplars. For this collection summarization task, a sparse dictionary (a small set of the most representative images) is learned to represent all the images in the given set; this sparse dictionary is treated as the summary of the given image set. A simulated annealing algorithm is adopted to learn the sparse dictionary (image summary) by minimizing an explicit optimization function. To handle large-scale image collections, we have evaluated both the accuracy of the proposed algorithms and their computational efficiency. For each of the above tasks, we have conducted experiments on multiple publicly available image collections, such as ImageNet, NUS-WIDE, and LabelMe, and have observed very promising results compared to existing frameworks. The computational performance is also satisfactory for large-scale image collection applications.
    The original intention in designing such a large-scale image collection exploration and organization system was to better serve the tasks of information retrieval and knowledge discovery. For this purpose, we applied the proposed system to a graffiti retrieval and exploration application and received positive feedback.
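    The summarization step can be sketched as follows: choose a small set of exemplar images by simulated annealing. The real system minimizes an explicit sparse-representation objective; in the sketch a simpler k-medoids-style cost (total distance of every image to its nearest exemplar) and synthetic features stand in for it.

```python
# Simplified sketch of annealing-based summary selection over image features.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))   # 500 images, 64-d features (synthetic)

def cost(exemplars):
    # Total distance of every image to its nearest chosen exemplar.
    d = np.linalg.norm(X[:, None, :] - X[exemplars][None, :, :], axis=-1)
    return d.min(axis=1).sum()

def anneal(k=10, steps=2000, t0=1.0):
    current = list(rng.choice(len(X), size=k, replace=False))
    c = cost(current)
    best, best_c = current[:], c
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9        # linear cooling schedule
        cand = current[:]
        replacement = int(rng.integers(len(X)))   # propose swapping one exemplar
        if replacement in cand:
            continue
        cand[int(rng.integers(k))] = replacement
        cc = cost(cand)
        # Accept improvements always; accept worse moves with decaying probability.
        if cc < c or rng.random() < np.exp((c - cc) / t):
            current, c = cand, cc
            if c < best_c:
                best, best_c = current[:], c
    return best, best_c

summary, final_cost = anneal()
print(sorted(int(i) for i in summary), round(final_cost, 2))
```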

    JURI SAYS: An Automatic Judgement Prediction System for the European Court of Human Rights

    Get PDF
    In this paper we present the web platform JURI SAYS, which automatically predicts decisions of the European Court of Human Rights based on communicated cases; these are published by the court early in the proceedings and are often available many years before the final decision is made. Our system therefore predicts future judgements of the court. The platform is available at jurisays.com and shows the predictions compared to the actual decisions of the court. It is automatically updated every month with predictions for new cases. Additionally, the system highlights the sentences and paragraphs that are most important for the prediction (i.e. violation vs. no violation of human rights).
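    The highlighting idea can be sketched with a simple linear stand-in: train a classifier on case texts, then score each sentence of a new case by the learned term weights it contains, so the sentences pushing the prediction toward "violation" surface first. The training examples below are invented; JURI SAYS uses its own models and real communicated cases.

```python
# Toy sketch: linear classifier + per-sentence scores for highlighting.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "applicant detained without judicial review for months",
    "prompt hearing held and lawful detention confirmed",
    "correspondence with counsel intercepted by prison staff",
    "complaint examined and remedy provided domestically",
]
labels = [1, 0, 1, 0]   # 1 = violation, 0 = no violation (invented)

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_texts), labels)

case_sentences = [
    "The applicant was held for nine months without judicial review.",
    "A remedy was later provided by the domestic courts.",
]
# Higher score = sentence pushes the prediction further toward "violation".
scores = clf.decision_function(vec.transform(case_sentences))
for s, sent in sorted(zip(scores, case_sentences), reverse=True):
    print(f"{s:+.2f}  {sent}")
```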

    Natural Language Processing: Emerging Neural Approaches and Applications

    Get PDF
    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, and on their potential or real applications in different domains.

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    Get PDF
    Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system, not by adding more parallel data but by using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and results showed performance improvements in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.
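    Schematically, the expansion works as sketched below: take an existing source/target phrase association, substitute morphological variants from an external lexicon, and keep candidate pairs whose sides remain string-similar to the original. The tiny lexicon, the similarity measure, and the threshold are invented for illustration, not the paper's actual resources.

```python
# Schematic phrase-table expansion via morphological variants.
from difflib import SequenceMatcher

# Existing (source, target) association from the phrase table.
phrase_pair = ("red car", "voiture rouge")

# Stand-in morphological resource: surface form -> inflected variants.
lexicon = {
    "car": ["cars"],
    "voiture": ["voitures"],
    "rouge": ["rouges"],
}

def variants(phrase):
    # Yield phrases with one word replaced by a morphological variant.
    words = phrase.split()
    for i, w in enumerate(words):
        for form in lexicon.get(w, []):
            yield " ".join(words[:i] + [form] + words[i + 1:])

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

new_pairs = []
for src in variants(phrase_pair[0]):
    for tgt in variants(phrase_pair[1]):
        score = min(similarity(src, phrase_pair[0]),
                    similarity(tgt, phrase_pair[1]))
        if score > 0.8:   # keep only close morphological variants
            new_pairs.append((src, tgt, round(score, 3)))

print(new_pairs)   # e.g. ("red cars", "voiture rouges", ...)
```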

    Information fusion for automated question answering

    Get PDF
    Until recently, research efforts in automated Question Answering (QA) have mainly focused on getting a good understanding of questions in order to retrieve correct answers. This includes deep parsing, lookups in ontologies, question typing and machine learning of answer patterns appropriate to question forms. In contrast, I have focused on the analysis of the relationships between answer candidates as provided in open-domain QA on multiple documents. I argue that such candidates have intrinsic properties, partly regardless of the question, and that those properties can be exploited to provide better quality and more user-oriented answers in QA.
    Information fusion refers to the technique of merging pieces of information from different sources. In QA over free text, it is motivated by the frequency with which different answer candidates are found in different locations, leading to a multiplicity of answers. The reason for such multiplicity is, in part, the massive amount of data used for answering, and also its unstructured and heterogeneous content: besides ambiguities in user questions leading to heterogeneity in extractions, systems have to deal with redundancy, granularity and possibly contradictory information. Hence the need for answer candidate comparison. While frequency has proved to be a significant characteristic of a correct answer, I evaluate the value of other relationships characterizing answer variability and redundancy.
    Partially inspired by recent developments in multi-document summarization, I redefine the concept of "answer" within an engineering approach to QA based on the Model-View-Controller (MVC) pattern of user interface design. An "answer model" is a directed graph in which nodes correspond to entities projected from extractions and edges convey relationships between such nodes. The graph represents the fusion of information contained in the set of extractions. Different views of the answer model can be produced, capturing the fact that the same answer can be expressed and presented in various ways: picture, video, sound, written or spoken language, or a formal data structure. Within this framework, an answer is a structured object contained in the model and retrieved by a strategy to build a particular view depending on the end user's (or task's) requirements.
    I describe shallow techniques to compare entities and enrich the model by discovering four broad categories of relationships between entities in the model: equivalence, inclusion, aggregation and alternative. Quantitatively, answer candidate modeling improves answer extraction accuracy. It also proves to be more robust to incorrect answer candidates than traditional techniques. Qualitatively, models provide meta-information encoded by relationships that allow shallow reasoning to help organize and generate the final output.
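    The answer model lends itself to a compact sketch as a data structure: a directed graph whose nodes are entities projected from extractions and whose edges carry one of the four relationship categories. The example entities below are invented.

```python
# Minimal sketch of an "answer model": a typed directed graph over entities.
from dataclasses import dataclass, field

RELATIONS = {"equivalence", "inclusion", "aggregation", "alternative"}

@dataclass
class AnswerModel:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)   # (head, relation, tail)

    def add_relation(self, head, relation, tail):
        assert relation in RELATIONS
        self.nodes.update({head, tail})
        self.edges.append((head, relation, tail))

    def view(self, relation):
        # One "view" of the model: only the edges of a given relationship type.
        return [e for e in self.edges if e[1] == relation]

model = AnswerModel()
# Candidates extracted from different documents, fused into one graph:
model.add_relation("Edinburgh", "inclusion", "Scotland")
model.add_relation("Edinburgh", "equivalence", "the Scottish capital")
model.add_relation("1995", "alternative", "1996")   # contradictory candidates
print(model.view("equivalence"))
```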

    Computing point-of-view : modeling and simulating judgments of taste

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2006. Includes bibliographical references (p. 153-163).
    People have rich points-of-view that afford them the ability to judge the aesthetics of people, things, and everyday happenstance; yet viewpoint has an ineffable quality that is hard to articulate in words, let alone capture in computer models. Inspired by cultural theories of taste and identity, this thesis explores end-to-end computational modeling of people's tastes, from model acquisition, to generalization, to application, in various realms. Five aesthetical realms are considered: cultural taste, attitudes, ways of perceiving, taste for food, and sense of humor. A person's model is acquired by reading her personal texts, such as a weblog diary, a social network profile, or emails. To generalize a person model, methods such as spreading activation, analogy, and imprimer supplementation are applied to semantic resources and search spaces mined from cultural corpora. Once a generalized model is achieved, a person's tastes are brought to life through perspective-based applications, which afford the exploration of someone else's perspective through interactivity and play. The thesis describes the model acquisition systems implemented for each of the five aesthetical realms. The techniques of 'reading for affective themes' (RATE) and 'culture mining' are described, along with their enabling technologies: commonsense reasoning and textual affect analysis. Finally, six perspective-based applications were implemented to illuminate a range of real-world beneficiaries of person modeling: virtual mentoring, self-reflection, and deep customization.
    by Xinyu Hugo Liu, Ph.D.
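    Of the generalization methods named above, spreading activation is the most self-contained to sketch: activation starts at seed concepts found in a person's texts and decays as it propagates over a weighted semantic network, ranking related concepts as candidate tastes. The toy network and decay factor below are assumptions, not the thesis's resources.

```python
# Compact spreading-activation sketch over a weighted semantic network.
from collections import defaultdict

# concept -> [(neighbor, association weight)]
network = {
    "espresso": [("coffee", 0.9), ("italy", 0.5)],
    "coffee":   [("cafe", 0.7), ("morning", 0.4)],
    "italy":    [("pasta", 0.6)],
    "cafe":     [("jazz", 0.3)],
}

def spread(seeds, decay=0.5, hops=2):
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        nxt = defaultdict(float)
        for node, act in frontier.items():
            for neighbor, w in network.get(node, []):
                nxt[neighbor] += act * w * decay   # decayed propagation
        for node, act in nxt.items():
            activation[node] += act
        frontier = nxt
    return dict(activation)

# Seed with a concept mentioned in the person's weblog, rank inferred tastes.
print(sorted(spread({"espresso": 1.0}).items(), key=lambda kv: -kv[1]))
```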