3,447 research outputs found

    Alʔilbīrī’s Book of the rational conclusions. Introduction, Critical Edition of the Arabic Text and Materials for the History of the Ḫawāṣṣic Genre in Early Andalus

    [eng] The Book of the rational conclusions, written perhaps sometime in the 10th c. by a physician from Ilbīrah (Andalus), is a multi-section medical pandect. The author brings together, from a diversity of sources, materials dealing with matters related to drug-handling, natural philosophy, therapeutics, medical applications of the specific properties of things, a regimen, and a dispensatory. This dissertation comprises three parts. First, the transmission of the text, its contents, and its possible context are discussed. Then a critical edition of the Arabic text is offered. Last, but certainly not least, the subject of the specific properties is approached from several points of view. The analysis of Section III of the original book leads to an exploration of the early Andalusī assimilation of this epistemic tradition and to the establishment of a well-defined textual family in which our text must be inscribed. On the other hand, the very concept of ‘specific property’ is often misconstrued and is usually treated as synonymous with magic and superstition. Upon closer inspection, however, the alleged irrationality of the knowledge of these properties appears to be largely the result of anachronistic interpretation. As a complement to this particular research and as an illustration of the genre, a sample from an ongoing integral commentary on this section of the book is presented.
    [cat] The Book of the rational conclusions, by an unknown physician from Ilbīrah (Andalus), was probably compiled during the second half of the 10th c. It is a rudimentary but remarkably complete kunnāš (an epistemic genre that is often defined as a ‘medical encyclopaedia’) in which the author gathers materials borrowed (often literally and without acknowledgement) from several genres. The book opens with a section on apotheconomy (a kind of apothecaries’ handbook) but then focuses on the different branches of medicine. Following some philosophical prolegomena, the author copies, with minimal linguistic adaptation, an entire treatise on therapeutics, then another on the medical applications of the specific properties of things, a series of fragments related to dietetics (a regimen in traditional terms) and, finally, a collection of medical recipes. Each of these sections shows evident intertextual links that point to an intense activity of synthesis of several traditions allied to medicine in caliphal Andalus. The text is, in fact, a magnificent object on which to apply the methodology of textual and source criticism. The critical edition of the text incorporates the chronological dimension into the apparatus, which thus becomes a contextualising element. As for the study of sources, while it is only secondary throughout the first part of this thesis, this discipline takes on an almost absolute prominence in the third part, especially in the chapter devoted to the individual analysis of each passage collected in the section on the specific properties of things

    Improving Cross-Lingual Transfer Learning for Event Detection

    The widespread adoption of applications powered by Artificial Intelligence (AI) backbones has unquestionably changed the way we interact with the world around us. Applications such as automated personal assistants, automatic question answering, and machine-based translation systems have become mainstays of modern culture thanks to the recent considerable advances in Natural Language Processing (NLP) research. Nonetheless, with over 7000 spoken languages in the world, there still remain a considerable number of marginalized communities that are unable to benefit from these technological advancements largely due to the language they speak. Cross-Lingual Learning (CLL) looks to address this issue by transferring the knowledge acquired from a popular, high-resource source language (e.g., English, Chinese, or Spanish) to a less favored, lower-resourced target language (e.g., Urdu or Swahili). This dissertation leverages the Event Detection (ED) sub-task of Information Extraction (IE) as a testbed and presents three novel approaches that improve cross-lingual transfer learning from distinct perspectives: (1) direct knowledge transfer, (2) hybrid knowledge transfer, and (3) few-shot learning

    Dataflow Programming and Acceleration of Computationally-Intensive Algorithms

    The volume of unstructured textual information continues to grow due to recent technological advancements. This has resulted in an exponential growth of information generated in various formats, including blogs, posts, social networking, and enterprise documents. Numerous Enterprise Architecture (EA) documents are also created daily, such as reports, contracts, agreements, frameworks, architecture requirements, designs, and operational guides. The processing and computation of this massive amount of unstructured information necessitate substantial computing capabilities and the implementation of new techniques. It is critical to manage this unstructured information through a centralized knowledge management platform. Knowledge management is the process of managing information within an organization. This involves creating, collecting, organizing, and storing information in a way that makes it easily accessible and usable. The research involved the development of a textual knowledge management system, and two use cases were considered for extracting textual knowledge from documents. The first case study focused on the safety-critical documents of a railway enterprise. Safety is of paramount importance in the railway industry. There are several EA documents, including manuals, operational procedures, and technical guidelines, that contain critical information. Digitalization of these documents is essential for analysing the vast amounts of textual knowledge that exist in these documents to improve the safety and security of railway operations. A case study was conducted between the University of Huddersfield and the Railway Safety Standard Board (RSSB) to analyse EA safety documents using Natural Language Processing (NLP). A graphical user interface was developed that includes various document processing features such as semantic search, document mapping, text summarization, and visualization of key trends.
For the second case study, open-source data was utilized, and textual knowledge was extracted. Several features were also developed, including kernel distribution, analysis of key trends, and sentiment analysis of words (such as unique, positive, and negative) within the documents. Additionally, a heterogeneous framework was designed using CPU/GPU and FPGAs to analyse the computational performance of document mapping

    Unifying context with labeled property graph: A pipeline-based system for comprehensive text representation in NLP

    Extracting valuable insights from vast amounts of unstructured digital text presents significant challenges across diverse domains. This research addresses this challenge by proposing a novel pipeline-based system that generates domain-agnostic and task-agnostic text representations. The proposed approach leverages labeled property graphs (LPG) to encode contextual information, facilitating the integration of diverse linguistic elements into a unified representation. The proposed system enables efficient graph-based querying and manipulation by addressing the crucial aspect of comprehensive context modeling and fine-grained semantics. The effectiveness of the proposed system is demonstrated through the implementation of NLP components that operate on LPG-based representations. Additionally, the proposed approach introduces specialized patterns and algorithms to enhance specific NLP tasks, including nominal mention detection, named entity disambiguation, event enrichments, event participant detection, and temporal link detection. The evaluation of the proposed approach, using the MEANTIME corpus comprising manually annotated documents, provides encouraging results and valuable insights into the system's strengths. The proposed pipeline-based framework serves as a solid foundation for future research, aiming to refine and optimize LPG-based graph structures to generate comprehensive and semantically rich text representations, addressing the challenges associated with efficient information extraction and analysis in NLP

    Rules, frequency, and predictability in morphological generalization: behavioral and computational evidence from the German plural system

    Morphological generalization, or the task of mapping an unknown word (such as a novel noun Raun) to an inflected form (such as the plural Rauns), has historically proven a contested topic within computational linguistics and cognitive science, e.g. within the past tense debate (Rumelhart and McClelland, 1986; Pinker and Prince, 1988; Seidenberg and Plaut, 2014). Marcus et al. (1995) identified German plural inflection as a key challenge domain to evaluate two competing accounts of morphological generalization: a rule generation view focused on linguistic features of input words, and a type frequency view focused on the distribution of output inflected forms, thought to reflect more domain-general cognitive processes. More recent behavioral and computational research developments support a new view based on predictability, which integrates both input and output distributions. My research uses these methodological innovations to revisit a core dispute of the past tense debate: how do German speakers generalize plural inflection, and can computational learners generalize similarly? This dissertation evaluates the rule generation, type frequency, and predictability accounts of morphological generalization in a series of behavioral and computational experiments with the stimuli developed by Marcus et al. I assess predictions for three aspects of German plural generalization: distribution of infrequent plural classes, influence of grammatical gender, and within-item variability. Overall, I find that speaker behavior is best characterized as frequency-matching to a phonologically-conditioned lexical distribution. This result does not support the rule generation view, and qualifies the predictability view: speakers use some, but not all, available information to reduce uncertainty in morphological generalization.
Neural and symbolic model predictions are typically overconfident relative to speakers; simple Bayesian models show somewhat higher speaker-like variability and accuracy. All computational models are outperformed by a static phonologically-conditioned lexical baseline, suggesting these models have not learned the selective feature preferences that inform speaker generalization

    Dataset And Deep Neural Network Based Approach To Audio Question Answering

    Audio question answering (AQA) is a multimodal task in which a system analyzes an audio signal and a question in natural language to produce a desirable answer in natural language. In this thesis, a new dataset for audio question answering, Clotho-AQA, consisting of 1991 audio files, each between 15 and 30 seconds in duration, is presented. For each audio file in the dataset, six different questions and their corresponding answers were crowdsourced using Amazon Mechanical Turk (AMT). The questions and their corresponding answers were created by different annotators. Out of the six questions for each audio file, two questions each were designed to have ‘yes’ and ‘no’ as answers respectively, while the remaining two questions have other single-word answers. For every question, answers from three independent annotators were collected. In this thesis, two baseline experiments are presented to illustrate the usage of the Clotho-AQA dataset: a multimodal binary classifier for ‘yes’ or ‘no’ answers and a multimodal multi-class classifier for single-word answers, both based on long short-term memory (LSTM) layers. The binary classifier achieved an accuracy of 62.7% and the multi-class classifier achieved a top-1 accuracy of 54.2% and a top-5 accuracy of 93.7%. Further, an attention-based model was proposed, which increased the binary classifier accuracy to 66.2% and the top-1 and top-5 multi-class classifier accuracy to 57.5% and 99.8% respectively. Some drawbacks of the Clotho-AQA dataset, such as the presence of the same answer words in different tenses, singular-plural forms, etc., which are treated as different classes in the classification problem, were addressed, and a refined version called Clotho-AQA_v2 is also presented. The multimodal baseline model achieved a top-1 and top-5 accuracy of 59.8% and 96.6% respectively, while the attention-based model achieved a top-1 and top-5 accuracy of 61.3% and 99.6% respectively on this refined dataset
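The top-1 and top-5 figures quoted in this abstract are instances of top-k accuracy. As a minimal sketch of how that metric is usually computed (the function name `top_k_accuracy`, the toy scores, and the four answer classes are illustrative assumptions, not taken from the thesis or the Clotho-AQA code):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier scores for three examples over four answer classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.3, 0.1, 0.1, 0.5],
]
labels = [1, 2, 0]  # assumed ground-truth class index for each example

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With k = 1 the metric reduces to ordinary accuracy, and it can only stay the same or rise as k grows, which is why a top-5 figure is always at least as high as the corresponding top-1 figure.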

    Mapping Brains with Language Models: A Survey

    Over the years, many researchers have seemingly made the same observation: Brain and language model activations exhibit some structural similarities, enabling linear partial mappings between features extracted from neural recordings and computational language models. In an attempt to evaluate how much evidence has been accumulated for this observation, we survey over 30 studies spanning 10 datasets and 8 metrics. How much evidence has been accumulated, and what, if anything, is missing before we can draw conclusions? Our analysis of the evaluation methods used in the literature reveals that some of the metrics are less conservative. We also find that the accumulated evidence, for now, remains ambiguous, but correlations with model size and quality provide grounds for cautious optimism

    An Overview of Context Capturing Techniques in NLP

    In NLP, context identification has become a prominent way to overcome syntactic and semantic ambiguities. Ambiguities are unsolved problems but can be reduced to a certain level. This ambiguity reduction helps to improve the quality of several NLP processes, such as text translation, text simplification, text retrieval, and word sense disambiguation. Context identification, also known as contextualization, takes place in the preprocessing phase of NLP processes. The essence of this identification is to uniquely represent a word or a phrase to improve the decision-making during the transfer phase of the NLP processes. The improved decision-making helps to improve the quality of the output. This paper provides an overview of different context-capturing mechanisms used in NLP

    La traduzione specializzata all’opera per una piccola impresa in espansione: la mia esperienza di internazionalizzazione in cinese di Bioretics© S.r.l.

    Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters the standardization in all countries, by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has appeared to propel these unifying drives: Artificial Intelligence, together with its high technologies aiming to implement human cognitive abilities in machinery. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation has been conceived. Indeed, its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices

    PEJL: A path-enhanced joint learning approach for knowledge graph completion

    Knowledge graphs (KGs) often suffer from incompleteness. Knowledge graph completion (KGC) is proposed to complete missing components in a KG. Most KGC methods focus on direct relations and fail to leverage rich semantic information in multi-hop paths. In contrast, path-based embedding methods can capture path information and utilize extra semantics to improve KGC. However, most path-based methods cannot take advantage of full multi-hop information and neglect to capture multiple semantic associations between single and multi-hop triples. To bridge the gap, we propose a novel path-enhanced joint learning approach called PEJL for KGC. Rather than learning multi-hop representations, PEJL can recover multi-hop embeddings by encoding full multi-hop components. Meanwhile, PEJL extends the definition of translation energy functions and generates new semantic representations for each multi-hop component, which is rarely considered in path-based methods. Specifically, we first use the path constraint resource allocation (PCRA) algorithm to extract multi-hop triples. Then we use an embedding recovering module consisting of a bidirectional gated recurrent unit (GRU) layer and a fully connected layer to obtain multi-hop embeddings. Next, we employ a KG modeling module to leverage various semantic information and model the whole knowledge graph based on translation methods. Finally, we define a joint learning approach to train our proposed PEJL. We evaluate our model on two KGC datasets: FB15K-237 and NELL-995. Experiments show the effectiveness and superiority of PEJL
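The abstract says PEJL builds on translation methods and extends their energy functions, but it does not give the formulas. As a rough illustration of the general idea only, the sketch below uses the standard TransE-style energy (the norm of head + relation - tail) and, as a simplifying assumption, composes a multi-hop path by summing its relation vectors. This is not PEJL's embedding recovering module, and all names and vectors here are hypothetical.

```python
import math


def translation_energy(head, relation, tail):
    """TransE-style energy: L2 norm of head + relation - tail (lower = more plausible)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def compose_path(relations):
    """Assumed composition: represent a multi-hop path as the sum of its relation vectors."""
    return [sum(components) for components in zip(*relations)]


# Hypothetical 2-dimensional embeddings for a two-hop path head -> r1 -> r2 -> tail.
head = [0.0, 1.0]
r1 = [1.0, 0.0]
r2 = [0.0, 1.0]
tail = [1.0, 2.0]

path = compose_path([r1, r2])
energy = translation_energy(head, path, tail)
```

A path whose composed vector translates the head exactly onto the tail has zero energy; larger values indicate a less plausible triple under this simplified model.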