166 research outputs found

    Neural Combinatory Constituency Parsing

    Tokyo Metropolitan University (東京都立大学), doctoral thesis, Doctor of Information Science

    The Artisan Teacher: A Field Guide to Skillful Teaching

    A capstone submitted in partial fulfillment of the requirements for the degree of Doctor of Education in the College of Education at Morehead State University by Michael A. Rutherford on March 26, 2013.

    USTAŠKA PROMIDŽBA O KONFERENCIJI „VELIKE TROJICE“ U TEHERANU 1943. I ZNAČAJ SAVEZNIČKIH ODLUKA ZA ISHOD DRUGOG SVJETSKOG RATA, PORAZ TREĆEG REICHA I SLOM NEZAVISNE DRŽAVE HRVATSKE

    Drawing on primary documents from the Croatian State Archive (the fonds of the Government Presidency of the Independent State of Croatia, the NDH), the records of the Great Alliance of 1942-1943 (the Tehran Conference), and the daily and periodical press, the author explains how the public was informed in the NDH and how the media were governed in a totalitarian state such as the NDH. Devoted to this crucial period of the Second World War, the paper also analyzes the Allied decisions taken in Tehran, the reaction of the Ustasha government to the Allies' political and military plans, and the consequences those decisions had for the NDH authorities. The Ustasha authorities regarded the first meeting of the Big Three as proof of Soviet victory and dominance in south-eastern Europe and as a step towards the restoration of Yugoslavia under the leadership of the Bolsheviks, that is, Tito's Partisans, should the Third Reich be defeated. Using the influence of the contemporary press, the Ustasha regime tried to shape public opinion to believe that there was no alternative to the alliance between the NDH and the Third Reich and that any outcome other than the victory of the Axis powers would mean the loss of the state.

    Constructing and modeling text-rich information networks: a phrase mining-based approach

    A great deal of digital ink has been spilled on "big data" over the past few years, a phenomenon often characterized as an explosion of information. Most of this surge owes its origin to unstructured data in the wild, such as text, images and video, as compared to the structured information stored in fielded form in databases. The proliferation of text-heavy data is particularly overwhelming, reflected in everyone's daily life in the form of web documents, business reviews, news, social posts, etc. At the same time, textual data and structured entities often come intertwined, for example authors and posters, document categories and tags, and document-associated geo-locations. Against this background, a core research challenge presents itself: how to turn massive, (semi-)unstructured data into structured knowledge. One promising paradigm studied in this dissertation is to integrate structured and unstructured data, construct an organized heterogeneous information network, and develop powerful modeling mechanisms on such an organized network. We call it a text-rich information network, since it is an integrated representation of both structured and unstructured textual data.

    To develop this construction and modeling paradigm thoroughly, the dissertation focuses on forming a scalable, data-driven framework and proposes a new line of techniques relying on phrase mining to bridge textual documents and structured entities. We first introduce the phrase mining method SegPhrase+, which globally discovers semantically meaningful phrases from massive textual data, providing a high-quality dictionary for text structuralization. Clearly distinct from previous work, which mostly focused on raw string-matching statistics, SegPhrase+ looks into the phrase context and effectively rectifies the raw statistics to significantly boost performance. Next, a novel algorithm based on latent keyphrases is developed and adopted to largely eliminate irregularities in massive text by providing a consistent and interpretable document representation. As a critical step in constructing the network, it uses the quality phrases generated in the previous step as candidates; from these candidates, a set of keyphrases is extracted to represent a particular document, with strengths inferred through a statistical model. After this step, documents become more structured and are consistently represented in the form of a bipartite network connecting documents with quality keyphrases. A more heterogeneous text-rich information network can be constructed by incorporating different types of document-associated entities as additional nodes. Lastly, a general and scalable framework, Tensor2vec, is added on top of traditional data mining machinery, since the latter cannot readily solve the problem when the organized heterogeneous network has nodes of different types. Tensor2vec is expected to elegantly handle relevance search, entity classification, summarization and recommendation problems by making use of higher-order link information and by projecting multi-typed nodes into a shared low-dimensional vector space in which node proximity can be easily computed and accurately predicted.
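
    The construction steps described above (quality phrases as candidates, keyphrases with inferred strengths per document, and a bipartite document-keyphrase network) can be made concrete with a small sketch. The Python snippet below is a minimal illustration only: the documents, phrases and strengths are invented, and it computes a plain cosine-style proximity over the bipartite edges rather than reproducing the dissertation's SegPhrase+ or Tensor2vec algorithms.

        from collections import defaultdict
        from math import sqrt

        # doc_id -> {keyphrase: inferred strength}; hypothetical output of a
        # SegPhrase+-style quality-phrase mining and keyphrase inference step.
        doc_keyphrases = {
            "d1": {"information network": 0.9, "phrase mining": 0.8},
            "d2": {"phrase mining": 0.7, "topic model": 0.6},
            "d3": {"relevance search": 0.8, "information network": 0.5},
        }

        # Bipartite network: keyphrase nodes on one side, document nodes on the
        # other, with edge weights given by the inferred strengths.
        edges = defaultdict(dict)            # keyphrase -> {doc_id: weight}
        for doc, phrases in doc_keyphrases.items():
            for phrase, weight in phrases.items():
                edges[phrase][doc] = weight

        def proximity(a, b):
            """Cosine-style proximity of two documents over shared keyphrase nodes."""
            dot = sum(docs[a] * docs[b] for docs in edges.values()
                      if a in docs and b in docs)
            norm_a = sqrt(sum(w * w for w in doc_keyphrases[a].values()))
            norm_b = sqrt(sum(w * w for w in doc_keyphrases[b].values()))
            return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

        print(proximity("d1", "d2"))   # documents sharing "phrase mining"
        print(proximity("d1", "d3"))   # documents sharing "information network"

    Adding further node types (authors, tags, locations) alongside the keyphrase nodes would turn this bipartite structure into the kind of heterogeneous text-rich network the abstract describes.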

    Integrating deep and shallow natural language processing components : representations and hybrid architectures

    We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is to improve the robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases the integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that build on these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely a type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging but also named entity recognition and topological parsing with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD along dimensions such as configurability, multilinguality and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks, such as structured named entity recognition, information extraction, creative document authoring support and deep question analysis, as well as evaluations. In WHITEBOARD, for example, it could be shown that shallow pre-processing increases both the coverage and the efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that use semantics-oriented natural language analysis, but also constitutes a research instrument for experimenting with novel processing strategies that combine deep and shallow methods, and it eases the replication and comparability of results.
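
    As a rough illustration of the two integration fundamentals mentioned above, the sketch below (Python with lxml, which is not claimed to be the thesis software; the element names, attributes and offsets are invented) shows standoff annotations that point into the raw text by character offsets, plus a small XSLT 1.0 stylesheet that reshapes that layer for a downstream component.

        from lxml import etree

        text = "Angela Merkel visited Paris."

        # Standoff layer: annotations refer to character spans of the raw text.
        standoff = etree.XML(b"""
        <annotations>
          <ne type="person" start="0" end="13"/>
          <ne type="location" start="22" end="27"/>
        </annotations>
        """)

        # Hypothetical XSLT that turns the standoff layer into a simple entity
        # list as it might be consumed by a deep parser's preprocessing step.
        stylesheet = etree.XML(b"""
        <xsl:stylesheet version="1.0"
                        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" indent="yes"/>
          <xsl:template match="/annotations">
            <entities>
              <xsl:for-each select="ne">
                <entity type="{@type}" span="{@start}-{@end}"/>
              </xsl:for-each>
            </entities>
          </xsl:template>
        </xsl:stylesheet>
        """)

        transform = etree.XSLT(stylesheet)
        print(str(transform(standoff)))

        # Offsets resolve against the raw text independently of any markup.
        for ne in standoff.iter("ne"):
            print(ne.get("type"), text[int(ne.get("start")):int(ne.get("end"))])

    Because the annotations live outside the text itself, several components can add their own layers over the same raw string without interfering with one another, which is the practical appeal of standoff markup for component integration.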

    Empirical machine translation and its evaluation

    In this thesis we exploit current Natural Language Processing technology for Empirical Machine Translation and its Evaluation. On the one hand, we study the problem of automatic MT evaluation. We analyze the main deficiencies of current evaluation methods, which arise, in our opinion, from the shallow quality principles upon which they are based. Instead of relying on the lexical dimension alone, we suggest a novel path towards heterogeneous evaluations. Our approach is based on the design of a rich set of automatic metrics devoted to capturing a wide variety of translation quality aspects at different linguistic levels (lexical, syntactic and semantic). These linguistic metrics have been evaluated over different scenarios. The most notable finding is that metrics based on deeper linguistic information (syntactic and semantic) produce more reliable system rankings than metrics that limit their scope to the lexical dimension, especially when the systems under evaluation are different in nature. At the sentence level, however, some of these metrics suffer a significant decrease in performance, mainly attributable to parsing errors. In order to improve sentence-level evaluation, apart from backing off to lexical similarity in the absence of parsing, we also study the possibility of combining the scores conferred by metrics at different linguistic levels into a single measure of quality. Two non-parametric strategies for metric combination are presented; their main advantage is that the relative contribution of each metric to the overall score does not have to be adjusted. As a complementary issue, we show how to use the heterogeneous set of metrics to obtain automatic and detailed linguistic error analysis reports.

    On the other hand, we study the problem of lexical selection in Statistical Machine Translation. For that purpose, we construct a Spanish-to-English baseline phrase-based Statistical Machine Translation system and iterate over its development cycle, analyzing how to improve its performance through the incorporation of linguistic knowledge. First, we extend the system by combining shallow-syntactic translation models based on linguistic data views, obtaining a significant improvement. The system is further enhanced with dedicated discriminative phrase translation models, which allow for a better representation of the translation context in which phrases occur, effectively yielding improved lexical choice. However, based on the proposed heterogeneous evaluation methods and on the manual evaluations conducted, we find that improvements in lexical selection do not necessarily imply an improved overall syntactic or semantic structure; the incorporation of dedicated predictions into the statistical framework therefore requires further study. As a side question, we study one of the main criticisms against empirical MT systems, namely their strong domain dependence, and how its negative effects may be mitigated by properly combining external knowledge sources when porting a system to a new domain. We successfully port an English-to-Spanish phrase-based Statistical Machine Translation system trained on the political domain to the domain of dictionary definitions.

    The two parts of this thesis are tightly connected, since the hands-on development of an actual MT system allowed us to experience at first hand the role of the evaluation methodology in the development cycle of MT systems.
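
    As a rough sketch of what a parameter-free metric combination can look like, the snippet below normalizes each metric over the candidate set and averages the normalized scores, so no per-metric weights need to be tuned. It is an illustration under assumed inputs, not the thesis's exact combination strategies; the metric names and scores are invented.

        def normalize(scores):
            """Min-max normalize one metric's scores over all candidates."""
            lo, hi = min(scores.values()), max(scores.values())
            span = hi - lo
            return {c: (s - lo) / span if span else 0.5 for c, s in scores.items()}

        def combine(metric_scores):
            """Average the normalized scores of several metrics per candidate."""
            normalized = [normalize(scores) for scores in metric_scores.values()]
            candidates = normalized[0].keys()   # assumes all metrics score the same set
            return {c: sum(n[c] for n in normalized) / len(normalized)
                    for c in candidates}

        # Hypothetical scores from a lexical, a syntactic and a semantic metric
        # for three candidate systems.
        metric_scores = {
            "lexical":   {"sys1": 0.41, "sys2": 0.38, "sys3": 0.45},
            "syntactic": {"sys1": 0.62, "sys2": 0.55, "sys3": 0.60},
            "semantic":  {"sys1": 0.30, "sys2": 0.34, "sys3": 0.28},
        }

        for system, score in sorted(combine(metric_scores).items(),
                                    key=lambda kv: kv[1], reverse=True):
            print(system, round(score, 3))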