356 research outputs found

    Improving Search via Named Entity Recognition in Morphologically Rich Languages – A Case Study in Urdu

    University of Minnesota Ph.D. dissertation. February 2018. Major: Computer Science. Advisors: Vipin Kumar, Blake Howald. 1 computer file (PDF); xi, 236 pages.
    Search is not a solved problem, even with state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword-based, and keyword-based search suffers from the vocabulary mismatch problem: the terms in a document and in the user's information request do not overlap. For example, a user may search for "cars" while the relevant documents say "automobiles"; this phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user is inquiring about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL), and concept search techniques such as dimensionality reduction do not improve search in MRLs. Names frequently occur in news text and determine the "what," "where," "when," and "who" of a news story. Named Entity Recognition (NER) attempts to recognize names in text automatically, but these techniques are far from mature in MRLs, especially in Arabic-script languages. Urdu is one of the MRLs this dissertation focuses on, along with Arabic, Farsi, Hindi, and Russian, but it lacks the enabling technologies for NER and search. A corpus, a stop-word generation algorithm, a light stemmer, a baseline, and an NER algorithm are therefore created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, it highlights the challenges of research in low-resource MRLs.
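The two failure modes of keyword search named above (synonymy and polysemy) can be illustrated with a toy keyword matcher. This is a minimal sketch, not the dissertation's search system: the three documents and both queries are invented for illustration, and matching is plain term overlap with no stemming or NER.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_hits(query: str, docs: dict[str, str]) -> list[str]:
    """Return IDs of documents sharing at least one term with the query."""
    q = tokenize(query)
    return [doc_id for doc_id, text in docs.items() if q & tokenize(text)]


# Hypothetical mini-collection used only for this illustration.
DOCS = {
    "d1": "Used automobiles for sale in Lahore",
    "d2": "The river bank flooded after the storm",
    "d3": "The bank raised its interest rates",
}

# Synonymy: "cars" never overlaps "automobiles", so the relevant d1 is missed.
synonymy_hits = keyword_hits("cheap cars", DOCS)

# Polysemy: "bank" also matches the financial document d3.
polysemy_hits = keyword_hits("river bank erosion", DOCS)

print(synonymy_hits)  # []
print(polysemy_hits)  # ['d2', 'd3']
```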

    Brain structural predispositions for music and language processing

    It has been shown that music and language training can elicit plastic changes in brain structure and function, accompanied by behavioural benefits. For instance, musicians have been reported to have better auditory discrimination (including pitch and speech-in-noise perception), motor synchronization, verbal memory and general IQ than individuals without formal musical training. Likewise, bilinguals have shown better executive function and attention-related abilities than monolinguals. Furthermore, altered functional and structural connectivity can be traced to brain areas related to the activities most frequently performed by both musicians (instrumentalists and singers) and linguistic experts (such as bilinguals or professional phoneticians). While research in the last decade has devoted considerable effort to the study of brain plasticity, only a few investigations have addressed the connection between the initial functional or structural properties of brain networks related to auditory-motor function and subsequent language or musical training. Nevertheless, structural brain markers such as grey matter volume/density or white-matter diffusivity measurements from diffusion tensor imaging (DTI) data, as well as functional measurements from task-related activity or resting-state data obtained with magnetic resonance imaging (MRI) or electroencephalography (EEG), have been shown to correlate with subsequent performance and learning in the auditory-motor domain. The main goal of the present dissertation was twofold: to further the existing knowledge regarding brain plasticity elicited during putative sensitive periods and after long-term music practice, and to explore the white-matter pathways that predict linguistic or musical skills at baseline.
Our secondary goals were to confirm previous findings regarding the brain structures involved in music and language processing, and to provide evidence of the benefits of using structural measurements and correlational analyses between imaging and behavioural data to study inter-individual differences. Study I compared professional pianists with non-musicians and observed a complex pattern of increases and decreases in grey matter volume. Compared with non-musicians, pianists showed greater grey matter volume in areas related to motor skill and the automatization of learned movements, as well as reinforcement learning and emotional processing. Conversely, regions associated with sensorimotor control, score reading, and auditory and musical perception showed reduced grey matter volume. Study II explored the relationship between the white-matter structural properties of the arcuate fasciculus (AF) and the performance of native German speakers in a foreign-language (Hindi) sentence and word imitation task. We found that greater left lateralization of AF volume predicted performance on the imitation task. This result was confirmed using not only a manual deterministic approach but also an automatic atlas-based fibre-reconstruction method, which in addition pointed to a specific region in the anterior half of the left AF as the one most related to imitation ability. Study III investigated whether the white-matter structural connectivity of pathways previously described as targets for plasticity mechanisms in professional musicians predicted musical abilities in non-musicians. We observed that the white-matter microstructural organization of right-hemisphere pathways involved in motor control (corticospinal tract) and auditory-motor transformations (AF) correlated with the performance of non-musicians during the initial stages of rhythmic and melodic learning.
The present work confirmed that several brain structures previously shown to display plastic effects associated with music and language training are involved in the first stages of audio-motor learning. Furthermore, the results challenge previous views of music-induced plasticity by showing that expertise is not always, or not only, correlated with increases in brain tissue. This raises questions about the role of efficiency mechanisms derived from professional-level practice. Most importantly, the results of the three studies converge in showing that a prediction-feedback-feedforward loop for auditory-motor processing may be crucially involved in both musical and language learning and skills. We thus suggest that the brain's auditory-motor systems previously described as participating in native language processing (the cortical areas of the dorsal route for language processing and the AF that connects them) may also be recruited during exposure to new linguistic or musical material, and may be refined after sustained music practice.

    Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

    Automatic speech recognition (ASR) has recently become an important challenge when using deep learning (DL). It requires large-scale training datasets and substantial computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. There are also situations where real data are challenging or expensive to gather, or rarely occur, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models from real datasets that are small, or slightly different from but related to the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and to help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to organize the state of the art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Finally, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research.

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis, and converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time and cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding object-detection results, its performance is poor when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation: its generator network is implemented with the help of the Faster R-CNN model, and its discriminator is based on PatchGAN. The output of the GAN model is text data with bounding boxes.
For text extraction from the bounding boxes, a novel data extraction framework was designed. It consists of several processes: XML processing when an existing OCR engine is used, bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally a learning mechanism for automating future data extraction. Whichever fields the system extracts successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. It was also validated on real data: invoices collected by two companies that provide invoice automation services in various countries. Currently, these companies send scanned invoices to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine then extracts the relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and sought new, optimized solutions. To confirm the results, the OCR engines were used to return XML files containing the identified text and metadata, and this XML output was then fed into the new system for information extraction. The new system uses both the existing OCR engine and a novel, self-adaptive, learning-based OCR engine, which is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and spend-classification analysis, additional data were provided by another company in London that specializes in reducing its clients' procurement costs. These data were fed into the system to obtain a deeper level of spend classification and categorisation.
This helped the company reduce its reliance on human effort and allowed greater efficiency than performing similar tasks manually with Excel spreadsheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold: first, to test and develop a novel solution that does not depend on any specific OCR technology; second, to improve information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
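The pattern-matching, type-check and key-value output steps listed in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's framework: the field names, regular expressions and day/month/year date format are hypothetical, and the OCR error correction, spell check, template matching and learning mechanism are not modelled.

```python
import re
from datetime import datetime

# Hypothetical field patterns for one invoice layout (not the thesis's rules).
PATTERNS = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
    "invoice_date": r"Date\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})",
    "total": r"Total\s*(?:Due)?\s*[:\-]?\s*[£$€]?\s*([\d,]+\.\d{2})",
}


def type_check(field: str, raw: str):
    """Convert a raw match to its expected type, or return None if it fails."""
    try:
        if field == "invoice_date":
            # Assumption: dates are written day/month/year.
            return datetime.strptime(raw, "%d/%m/%Y").date().isoformat()
        if field == "total":
            return float(raw.replace(",", ""))
    except ValueError:
        return None
    return raw


def extract_fields(ocr_text: str) -> dict:
    """Return only the fields that match a pattern and pass the type check."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, ocr_text)
        if match is None:
            continue  # field not found: omit it rather than guess
        value = type_check(name, match.group(1))
        if value is not None:
            fields[name] = value
    return fields


sample = "ACME Ltd\nInvoice No: INV-2041\nDate: 03/11/2020\nTotal Due: £1,250.00"
print(extract_fields(sample))
# {'invoice_number': 'INV-2041', 'invoice_date': '2020-11-03', 'total': 1250.0}
```

Dropping fields that fail the type check (for example an impossible date such as 31/02/2020) mirrors the abstract's point that only successfully extracted fields are returned.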

    Character-level and syntax-level models for low-resource and multilingual natural language processing

    There are more than 7,000 languages in the world, but only a small portion of them benefit from Natural Language Processing resources and models. Although languages generally present different characteristics, “cross-lingual bridges” such as transliteration signals and word alignment links can be exploited. Such information, together with the availability of multiparallel corpora and the need to overcome language barriers, motivates us to build models that represent more of the world’s languages. This thesis investigates cross-lingual links for improving the processing of low-resource languages with language-agnostic models at the character and syntax level. Specifically, we propose to (i) use orthographic similarities and transliteration between named entities and rare words in different languages to improve the construction of Bilingual Word Embeddings (BWEs) and named entity resources, and (ii) exploit multiparallel corpora to project labels from high- to low-resource languages, thereby gaining access to weakly supervised processing methods for the latter. In the first publication, we describe our approach for improving the translation of rare words and named entities in the Bilingual Dictionary Induction (BDI) task using orthography and transliteration information. In the second, we tackle BDI by enriching BWEs with orthography embeddings and a number of other features, using our classification-based system to overcome script differences among languages. The third publication describes cheap cross-lingual signals that should be considered when building mapping approaches for BWEs, since they are simple to extract, effective for bootstrapping the mapping of BWEs, and able to overcome the failure of unsupervised methods. The fourth paper presents our approach for extracting a named entity resource for 1340 languages, including very low-resource languages from all major areas of linguistic diversity.
We exploit parallel corpus statistics and transliteration models and obtain improved performance over prior work. Lastly, the fifth publication models annotation projection as a graph-based label propagation problem for the part-of-speech tagging task. Part-of-speech models trained on our labeled sets outperform prior work for low-resource languages like Bambara (an African language spoken in Mali), Erzya (a Uralic language spoken in Russia’s Republic of Mordovia), Manx (the Celtic language of the Isle of Man), and Yoruba (a Niger-Congo language spoken in Nigeria and surrounding countries).
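Graph-based label propagation, the general technique the fifth publication builds on, can be sketched as below. This is a generic iterative variant with clamped seed nodes, not the thesis's model: the word graph, its edge weights, and the two-tag set are hypothetical, and the actual graph construction from multiparallel corpora is not shown.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, labels, iterations=10):
    """Iterative label propagation over an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) with weight > 0
    seeds: {node: label} for nodes whose label is known (kept fixed)
    labels: list of possible labels
    Returns {node: most probable label} for every node in the graph.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Seeds start with a one-hot distribution; unlabeled nodes start uniform.
    dist = {
        node: ({lab: float(lab == seeds[node]) for lab in labels} if node in seeds
               else {lab: 1.0 / len(labels) for lab in labels})
        for node in neighbors
    }

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = dist[node]  # clamp known labels
                continue
            total = sum(w for _, w in nbrs)
            # Weighted average of the neighbors' current label distributions.
            updated[node] = {
                lab: sum(w * dist[nbr][lab] for nbr, w in nbrs) / total
                for lab in labels
            }
        dist = updated

    return {node: max(labels, key=lambda lab: dist[node][lab]) for node in dist}


# Hypothetical word graph; seeds stand in for tags projected from another language.
edges = [("cat", "dog", 2.0), ("cat", "run", 0.5),
         ("walk", "run", 2.0), ("walk", "dog", 0.5)]
seeds = {"dog": "NOUN", "run": "VERB"}
tags = propagate_labels(edges, seeds, ["NOUN", "VERB"])
print(tags)  # {'cat': 'NOUN', 'dog': 'NOUN', 'run': 'VERB', 'walk': 'VERB'}
```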

    Projecting named entity tags from a resource rich language to a resource poor language

    Named Entities (NE) are the prominent entities appearing in textual documents. Automatic classification of NEs in a textual corpus is a vital process in Information Extraction and Information Retrieval research. Named Entity Recognition (NER) is the identification of words in text that correspond to a predefined taxonomy such as person, organization, location, date, time, etc. This article focuses on the person (PER), organization (ORG) and location (LOC) entities in a Malay journalistic corpus on terrorism. A projection algorithm, using the Dice coefficient function and a bigram scoring method with domain-specific rules, is proposed to map NE information from the English corpus to the Malay corpus. The English corpus is the translated version of the Malay corpus, so the two corpora are treated as parallel corpora. The method computes the string similarity between the English words and the lexemes in a pre-built lexicon to approximate the best NE mapping. The algorithm was evaluated on our own tagged terrorism corpus and achieved satisfactory results in terms of precision, recall, and F-measure. An evaluation of the selected open-source NER tool for English is also presented.
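The abstract does not spell out its bigram scoring or domain-specific rules, so the following is only a minimal sketch of the general idea: a character-bigram Dice coefficient, Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|), used to pick the closest entry in a lexicon. The lexicon entries and the 0.5 acceptance threshold are hypothetical.

```python
from collections import Counter


def char_bigrams(word: str) -> Counter:
    """Multiset of overlapping lowercase character bigrams."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def dice(a: str, b: str) -> float:
    """Dice coefficient on character bigrams: 2*|X & Y| / (|X| + |Y|)."""
    x, y = char_bigrams(a), char_bigrams(b)
    size = sum(x.values()) + sum(y.values())
    if size == 0:
        return 0.0
    return 2 * sum((x & y).values()) / size


def best_lexeme(english_word: str, lexicon: list[str], threshold: float = 0.5):
    """Return the most similar lexicon entry, or None if below the threshold."""
    score, lexeme = max((dice(english_word, lex), lex) for lex in lexicon)
    return lexeme if score >= threshold else None


# Hypothetical Malay lexicon entries.
lexicon = ["Singapura", "Indonesia", "Filipina"]
print(dice("Singapore", "Singapura"))          # 0.625
print(best_lexeme("Singapore", lexicon))       # Singapura
print(best_lexeme("Thailand", ["Singapura"]))  # None
```

Using a bigram multiset rather than whole-word equality lets spelling variants across the two languages (here "Singapore" and "Singapura") still score highly.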

    Bilingual sentence production and code-switching: Neural network simulations

    Sequential grouping constraints on across-channel auditory processing