Constructing syntax-based distributional semantic models for novel languages
Computational models of word meaning typically require large amounts of text data in the desired target language.
Today, the steadily growing number of freely available web pages makes it possible to build such distributional semantic models (DSMs), robust and with high lexical coverage, for more and more languages.
Among the most versatile DSMs are the structured DSMs (SDSMs), which extend the notion of context beyond simple neighboring words to syntactic and other relations.
They thereby support similarity predictions that cover not only the topical aspects of the meaning of a word, or even of a syntactic combination of words, but also aspects of a relational nature.
Text data alone, however, is not enough to construct SDSMs. Reliable and efficient parsers for the target language are needed to obtain the syntactic analyses, with the result that only a few languages can currently benefit from such models.
This dissertation investigates methods for creating structured distributional semantic models for new languages and tests them on a range of semantic tasks.
First, a monolingual SDSM is built from a medium-sized target-language text corpus; then methods are identified by which a cross-lingual SDSM can be constructed using only a simple bilingual lexicon.
Finally, it is shown how these two types of SDSM can be combined into a multilingual model that retains the advantages of both input models and thus combines high coverage with accurate predictions.
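As a rough illustration of the structured-context idea in this abstract, the following minimal sketch builds a toy SDSM from hand-written dependency triples and compares words by cosine similarity. The triples, the relation labels, and the helper names (`build_sdsm`, `cosine`) are illustrative assumptions, not the dissertation's data or implementation.

```python
import math
from collections import Counter, defaultdict

# Toy (head, relation, dependent) triples. A real SDSM would take these
# from a parsed corpus; all data here is made up for illustration.
TRIPLES = [
    ("eat", "obj", "apple"),
    ("eat", "obj", "orange"),
    ("eat", "obj", "bread"),
    ("eat", "subj", "child"),
    ("peel", "obj", "apple"),
    ("peel", "obj", "orange"),
    ("bake", "obj", "bread"),
    ("read", "obj", "book"),
    ("read", "subj", "child"),
]


def build_sdsm(triples):
    """Map each word to counts of its (relation, other word) contexts."""
    vectors = defaultdict(Counter)
    for head, rel, dep in triples:
        vectors[dep][(rel, head)] += 1           # dependent sees its head
        vectors[head][(rel + "-inv", dep)] += 1  # head sees its dependent
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


model = build_sdsm(TRIPLES)
print(cosine(model["apple"], model["orange"]))  # share all contexts
print(cosine(model["apple"], model["bread"]))   # share one of two contexts
print(cosine(model["apple"], model["book"]))    # share no context
```

Because the contexts are (relation, word) pairs rather than bare neighboring words, "apple" and "book" stay dissimilar even though both occur as objects; a plain window-based DSM would not draw on the relation label at all.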
A Graph-Theoretic Algorithm for Automatic Extension of Translation Lexicons
This paper presents a graph-theoretic approach to the identification of yet-unknown word translations. The proposed algorithm is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words. We also present a formulation of SimRank in matrix form and extensions for edge weights, edge labels and multiple graphs.
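The recursion behind SimRank can be sketched as follows. This is a minimal version of the basic, unweighted SimRank iteration on a single graph, not the paper's extended algorithm: edge weights, edge labels, and multiple graphs are left out. The toy graph, the function name `simrank`, and the decay factor of 0.8 are assumptions made for illustration.

```python
def simrank(in_neighbors, nodes, decay=0.8, iterations=10):
    """Basic iterative SimRank.

    s(a, a) is fixed at 1. For a != b, s(a, b) is the decayed average
    similarity over all pairs of in-neighbors of a and b, and 0 if
    either node has no in-neighbors.
    """
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia = in_neighbors.get(a, [])
                ib = in_neighbors.get(b, [])
                if not ia or not ib:
                    new[a][b] = 0.0
                    continue
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: an edge verb -> noun means the noun occurs
# as an object of that verb.
NODES = ["eat", "peel", "read", "apple", "orange", "book"]
IN_NEIGHBORS = {
    "apple": ["eat", "peel"],
    "orange": ["eat", "peel"],
    "book": ["read"],
}

scores = simrank(IN_NEIGHBORS, NODES)
print(scores["apple"]["orange"])
print(scores["apple"]["book"])
```

In this toy graph "apple" and "orange" are linked to the same verbs and therefore receive a positive score, while "apple" and "book" share no similar neighbors and score zero.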
Plasma metabolomics profiles in Black and White participants of the Adventist Health Study-2 cohort
Background: Black Americans suffer disparities in risk for cardiometabolic and other chronic diseases. Findings from the Adventist Health Study-2 (AHS-2) cohort have shown associations of plant-based dietary patterns and healthy lifestyle factors with prevention of such diseases. Hence, it is likely that racial differences in metabolic profiles correlating with disparities in chronic diseases are explained largely by diet and lifestyle, besides social determinants of health.
Methods: Untargeted plasma metabolomics screening was performed on plasma samples from 350 participants of the AHS-2, including 171 Black and 179 White participants, using ultrahigh-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) and a global platform of 892 metabolites. Differences in metabolites or biochemical subclasses by race were analyzed using linear regression, considering various models adjusted for known confounders, dietary and/or other lifestyle behaviors, social vulnerability, and psychosocial stress. The Storey permutation approach was used to adjust for false discovery at FDR < 0.05.
Results: Linear regression revealed differential abundance of over 40% of individual metabolites or biochemical subclasses when comparing Black with White participants after adjustment for false discovery (FDR < 0.05), with the vast majority showing lower abundance in Black participants. Associations were not appreciably altered with adjustment for dietary patterns and socioeconomic or psychosocial stress. Metabolite subclasses showing consistently lower abundance in Black participants included various lipids, such as lysophospholipids, phosphatidylethanolamines, monoacylglycerols, diacylglycerols, and long-chain monounsaturated fatty acids, among other subclasses or lipid categories. Among all biochemical subclasses, only creatine metabolism showed higher abundance in Black participants, although among metabolites within this subclass, only creatine showed differential abundance after adjustment for glomerular filtration rate. Notable metabolites in higher abundance in Black participants included methyl and propyl paraben sulfates, piperine metabolites, and a considerable proportion of acetylated amino acids, including many previously found to be associated with glomerular filtration rate.
Conclusions: Differences in metabolic profiles were evident when comparing Black and White participants of the AHS-2 cohort. These differences are likely attributable in part to dietary behaviors not adequately explained by dietary pattern covariates, besides other environmental or genetic factors. Alterations in these metabolites and associated subclasses may have implications for the prevention of chronic diseases in Black Americans.
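To make the false-discovery step in the Methods more concrete, here is a minimal sketch of FDR control over a list of per-metabolite p-values. It uses the Benjamini-Hochberg step-up procedure as a simpler stand-in; it is not the Storey permutation approach the study reports, and the p-values and the function name `benjamini_hochberg` are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test is declared
    significant while controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per metabolite (not data from the study).
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

significant = benjamini_hochberg(P_VALUES)
print(sum(significant), "of", len(P_VALUES), "tests pass at FDR < 0.05")
```

Note that several raw p-values below 0.05 do not survive the correction here, which is the point of adjusting for multiple comparisons across hundreds of metabolites.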