26 research outputs found

    WikiParq: A Tabulated Wikipedia Resource Using the Parquet Format

    Get PDF
    Wikipedia has become one of the most popular resources in natural language processing and is used in a multitude of applications. However, Wikipedia requires a substantial pre-processing step before it can be used. For instance, its set of nonstandardized annotations, referred to as the wiki markup, is language-dependent and requires specific parsers for each language: English, French, Italian, and so on. In addition, the intricacies of the different Wikipedia resources (main article text, categories, Wikidata, infoboxes), scattered across the article document or in separate files, make it difficult to get a global view of this outstanding resource. In this paper, we describe WikiParq, a unified format based on the Parquet standard to tabulate and package the Wikipedia corpora. In combination with Spark, a map-reduce computing framework, and the SQL query language, WikiParq makes it much easier to write database queries to extract specific information or subcorpora from Wikipedia, such as all the first paragraphs of the articles in French, all the articles on persons in Spanish, or all the articles on persons that have versions in French, English, and Spanish. WikiParq is available in six language versions and is potentially extendible to all the languages of Wikipedia. The WikiParq files are downloadable as tarball archives from this location: http://semantica.cs.lth.se/wikiparq/
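    The tabulate-then-query idea behind WikiParq can be illustrated with a minimal sketch. WikiParq itself stores Parquet files queried through Spark SQL; here stdlib sqlite3 plays the same role so the example runs anywhere, and the (doc, language, predicate, value) schema and the sample rows are assumptions for illustration, not WikiParq's actual schema or data.

```python
import sqlite3

# Stand-in for WikiParq's tabulated layout: the real resource is Parquet
# queried via Spark SQL; sqlite3 is used here only to make the SQL idea
# runnable. Schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (doc TEXT, language TEXT, predicate TEXT, value TEXT)")
rows = [
    ("Q90",  "fr", "first_paragraph", "Paris est la capitale de la France."),
    ("Q90",  "en", "first_paragraph", "Paris is the capital of France."),
    ("Q303", "es", "instance_of",     "person"),
    ("Q303", "es", "first_paragraph", "Elvis Presley fue un cantante."),
]
conn.executemany("INSERT INTO wiki VALUES (?, ?, ?, ?)", rows)

# "All the first paragraphs of the articles in French" as a plain SQL query.
french_intros = conn.execute(
    "SELECT value FROM wiki WHERE language = 'fr' AND predicate = 'first_paragraph'"
).fetchall()
print(french_intros)
```

    In the real setup, the same SELECT would be issued against the Parquet files through Spark SQL, which distributes the scan over a cluster.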

    Intellectual Capital's Importance for Corporate Performance

    Get PDF
    The purpose of the thesis is to empirically investigate the relationship between intellectual capital and corporate performance. The estimated models are then used to predict future corporate performance. Definitions of intellectual capital are presented, as well as measurements of the concept, with a focus on VAIC. Prior research investigating the relationship between intellectual capital and corporate performance is reviewed. A quantitative approach is used to investigate the relationship between intellectual capital and corporate performance. Panel data regressions are used to analyze the relationship and estimate prediction models. 823 observations during the period 1998-2007 are collected. The analysis shows a positive relationship between intellectual capital and profitability. For accurate predictions of corporate performance, more factors than intellectual capital, firm size, and leverage need to be included in the model.
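    The VAIC measure the abstract focuses on can be sketched in a few lines. The abstract does not spell out which VAIC variant is used, so the standard three-component definition (Pulic) is assumed here, and the input figures are hypothetical.

```python
def vaic(value_added: float, human_capital: float, capital_employed: float) -> float:
    """Value Added Intellectual Coefficient, assuming the standard
    three-component definition: HCE + SCE + CEE."""
    hce = value_added / human_capital                   # human capital efficiency
    sce = (value_added - human_capital) / value_added   # structural capital efficiency
    cee = value_added / capital_employed                # capital employed efficiency
    return hce + sce + cee

# Hypothetical figures: value added 200, total salaries 100, capital employed 400.
print(round(vaic(200.0, 100.0, 400.0), 2))
```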

    Börsmisslyckande - En studie av misslyckade börsintroduktioner pÄ Stockholmsbörsen

    Get PDF
    With this thesis we try to find explanations for why initial public offerings fail. Using these explanatory variables, we aim to build a model that can predict whether an introduction will succeed or not. The theory is based on current and previous research on failed IPOs; recurring concepts and variables are also described. We use a quantitative approach to answer the thesis's research question and fulfil its purpose. To find the variables that explain a failed IPO, we use logistic regression. The variables we found that can explain a failed IPO are the firm's age, its leverage, whether the firm is backed by venture capital or not, and the share of retained earnings relative to total assets. The model we build predicts a successful or failed introduction in 69 percent of cases.
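    The logistic model the thesis estimates can be sketched as follows. The coefficients below are hypothetical placeholders: the abstract reports only the predictor set (firm age, leverage, VC backing, retained earnings to total assets) and a 69 percent classification rate, not the fitted weights.

```python
import math

# Hypothetical coefficients for illustration only; the thesis's fitted
# weights are not given in the abstract.
COEFS = {"intercept": -0.5, "age": -0.04, "leverage": 1.2,
         "vc_backed": -0.8, "retained_to_assets": -1.5}

def failure_probability(age, leverage, vc_backed, retained_to_assets):
    """Probability of a failed IPO under a logistic model."""
    z = (COEFS["intercept"]
         + COEFS["age"] * age
         + COEFS["leverage"] * leverage
         + COEFS["vc_backed"] * vc_backed
         + COEFS["retained_to_assets"] * retained_to_assets)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

# A young, highly leveraged firm without VC backing.
p = failure_probability(age=3, leverage=0.9, vc_backed=0, retained_to_assets=0.05)
print(round(p, 3))
```

    Classifying a firm as "failed" when p exceeds 0.5 and counting the hits over a sample is what yields a classification rate like the thesis's 69 percent.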

    EasyNER: A Customizable Easy-to-Use Pipeline for Deep Learning- and Dictionary-based Named Entity Recognition from Medical Text

    Full text link
    Background: Medical research generates millions of publications, and it is a great challenge for researchers to utilize this information in full since its scale and complexity greatly surpass human reading capabilities. Automated text mining can help extract and connect information spread across this large body of literature, but this technology is not easily accessible to life scientists. Results: Here, we developed an easy-to-use end-to-end pipeline for deep learning- and dictionary-based named entity recognition (NER) of typical entities found in medical research articles, including diseases, cells, chemicals, genes/proteins, and species. The pipeline can access and process large medical research article collections (PubMed, CORD-19) or raw text, and incorporates a series of deep learning models fine-tuned on the HUNER corpora collection. In addition, the pipeline can perform dictionary-based NER related to COVID-19 and other medical topics. Users can also load their own NER models and dictionaries to include additional entities. The output consists of publication-ready ranked lists and graphs of detected entities and files containing the annotated texts. An associated script allows rapid inspection of the results for specific entities of interest. As model use cases, the pipeline was deployed on two collections of autophagy-related abstracts from PubMed and on the CORD-19 dataset, a collection of 764,398 research article abstracts related to COVID-19. Conclusions: The NER pipeline we present is applicable in a variety of medical research settings and makes customizable text mining accessible to life scientists.
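    The dictionary-based mode of such a pipeline can be illustrated with a minimal sketch. This is not EasyNER's actual code or API: the tiny entity dictionary, the tag names, and the output format below are invented for illustration.

```python
import re

# Toy dictionary mapping surface terms to entity types, in the spirit of
# dictionary-based NER. Terms and tags are invented for this example.
DICTIONARY = {
    "tp53": "GENE",
    "covid-19": "DISEASE",
    "sars-cov-2": "SPECIES",
}

def annotate(text):
    """Return (mention, tag, start, end) tuples for case-insensitive hits."""
    hits = []
    for term, tag in DICTIONARY.items():
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            hits.append((m.group(0), tag, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])

print(annotate("TP53 expression changes in COVID-19 caused by SARS-CoV-2."))
```

    A real pipeline would add tokenization-aware boundary checks and merge these dictionary hits with the output of the fine-tuned deep learning models.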

    Is there a connection between experienced realism, rate of fire and loop length on fully automatic rifles in a first-person shooter game in first-person?

    No full text
    This thesis aims to test whether there is a connection between modern-day fully automatic rifles' rate of fire, the loop length used in the implementation, and experienced realism in a first-person shooter game, with the weapon fired by the player's own character. Against a background of papers, books, and lectures/conferences given by experienced practitioners and other experts in the game industry on first-person shooter games, a listening test was conducted on a computer using headphones with both trained and untrained subjects, since players can be both. A simple firing range was constructed in Unreal Engine 4 (Epic Games, 2017) where the subjects could switch between two weapons with different rates of fire and three versions of each with different loop lengths: 4, 8, and 16. The sounds were divided into layers, e.g. body, mechanical, and bottom, and played back using looping as the implementation. The subjects were also asked to rate the sounds regarding gameplay and preference to see if the results would differ between the three categories. The results showed a tendency to choose the longer loop for all categories, but only four comparisons gave a significant result when running t-tests.
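    The t-tests behind the loop-length comparisons boil down to a paired t statistic, which can be sketched with the stdlib. The ratings below are invented example data, not the study's measurements.

```python
import math
from statistics import mean, stdev

def paired_t(sample_a, sample_b):
    """Paired-samples t statistic: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical realism ratings from eight subjects for two loop lengths.
realism_loop16 = [7, 6, 8, 7, 9, 6, 8, 7]
realism_loop4  = [5, 6, 6, 5, 7, 5, 6, 6]
print(round(paired_t(realism_loop16, realism_loop4), 2))
```

    The resulting statistic is then compared against the t distribution with n - 1 degrees of freedom to decide significance, which is the step the thesis reports for its four significant comparisons.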

    Building Knowledge Graphs : Processing Infrastructure and Named Entity Linking

    No full text
    Things such as organizations, persons, or locations are ubiquitous in all texts circulating on the internet, particularly in the news, forum posts, and social media. Today, there is more written material than any single person can read through during a typical lifespan. Automatic systems can help us amplify our abilities to find relevant information, where, ideally, a system would learn knowledge from our combined written legacy. Ultimately, this would enable us, one day, to build automatic systems that have reasoning capabilities and can answer any question in any human language. In this work, I explore methods to represent linguistic structures in text, build processing infrastructures, and show how they can be combined to process a comprehensive collection of documents. The goal is to extract knowledge from text via things, i.e., entities. As text, I focused on encyclopedic resources such as Wikipedia. As knowledge representation, I chose to use graphs, where the entities correspond to graph nodes. To populate such graphs, I created a named entity linker that can find entities in multiple languages such as English, Spanish, and Chinese, and associate them with unique identifiers. In addition, I describe a published state-of-the-art Swedish named entity recognizer that finds mentions of entities in text, which I evaluated on the four majority classes in the Stockholm-UmeÄ Corpus (SUC) 3.0. To collect the text resources needed for the implementation of the algorithms and the training of the machine-learning models, I also describe a document representation, Docria, that consists of multiple layers of annotations: a model capable of representing structures found in Wikipedia and beyond. Finally, I describe how to construct processing pipelines for large-scale processing of Wikipedia using Docria.
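    A multi-layer document model of this kind can be sketched as ranges over a shared text, each range carrying properties. This toy sketch only illustrates the layered-annotation idea and does not follow Docria's real API; the Wikidata ids in the example are used for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One annotation range over the document text, with free-form properties."""
    start: int
    end: int
    props: dict = field(default_factory=dict)

@dataclass
class Document:
    """Text plus named layers, each a list of Nodes over the same offsets."""
    text: str
    layers: dict = field(default_factory=dict)

    def add(self, layer, start, end, **props):
        self.layers.setdefault(layer, []).append(Node(start, end, props))

    def spans(self, layer):
        return [self.text[n.start:n.end] for n in self.layers.get(layer, [])]

doc = Document("Barack Obama visited Lund University.")
doc.add("entity", 0, 12, kind="PERSON", wikidata="Q76")
doc.add("entity", 21, 36, kind="ORG")
print(doc.spans("entity"))
```

    Because layers share offsets into one text, a tokenization layer, a named-entity layer, and an entity-linking layer can coexist without duplicating the document, which is what makes such a model convenient for large-scale pipelines.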

    SĂ€kerstĂ€llning av intern kontroll och finansiell rapportering enligt The Sarbanes- Oxley Act, sektion 404 – En kvalitativ undersökning pĂ„ Volvo Cars Uddevalla

    Get PDF
    Background and problem: After several large corporate scandals in the United States, in which companies manipulated their accounting to present better results, the Sarbanes-Oxley Act was enacted. The law aims to prevent similar future scandals and, among other things, requires companies to demonstrate that they have adequate internal controls over financial reporting. Purpose: The purpose of this thesis is partly to help Volvo Cars Uddevalla secure some of its internal processes against the requirements of the Sarbanes-Oxley Act, and partly to study the law from a process perspective and an accounting perspective. Delimitations: In this thesis we focus only on the part of the Sarbanes-Oxley Act concerned with internal control, primarily Section 404. The thesis is also based on a single case company, Volvo Cars Uddevalla. Method: This thesis has taken two different angles. The practical work we carried out for VCU followed an exploratory approach, and we also present a research approach in which we connect existing theory to our study. The data we used consist partly of primary data collected through informal qualitative interviews and partly of secondary data collected through, among other things, literature studies. Results and conclusions: In our work at Volvo Cars Uddevalla we identified certain areas that needed improvement in order to meet the requirements of the Sarbanes-Oxley Act. From a process perspective, we argue that efforts to make operations compliant with the Sarbanes-Oxley Act should also include process-oriented elements in order to benefit from the organizational-improvement philosophies advocated there. From an accounting perspective, we see that one consequence of the Sarbanes-Oxley Act is that internal governance is increasingly tied to the norms of external financial reporting.
Suggestions for further research: As the Sarbanes-Oxley Act and its implications are a highly topical subject, we advocate further studies in the field. Aspects that we believe should be studied include whether the Sarbanes-Oxley Act will lead to increased or decreased efficiency in companies, how the law will be perceived by employees, and whether the law will in fact mean that internal governance becomes increasingly governed by and tied to the norms of external financial reporting.

    Linking, Searching, and Visualizing Entities in Wikipedia

    No full text
    In this paper, we describe a new system to extract, index, search, and visualize entities in Wikipedia. To carry out the entity extraction, we designed a high-performance, multilingual entity linker and we used a document model to store the resulting linguistic annotations. The entity linker, HEDWIG, extracts the mentions from text using a string matching engine and links them to entities with a combination of statistical rules and PageRank. The document model, Docforia (Klang and Nugues, 2017), consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. We evaluated HEDWIG with the TAC 2016 data and protocol (Ji and Nothman, 2016) and we reached CEAFm scores of 70.0 on English, 64.4 on Chinese, and 66.5 on Spanish. We applied the entity linker to the whole collection of English and Swedish articles of Wikipedia and we used Lucene to index the layers and a search module to interactively retrieve all the concordances of an entity in Wikipedia. The user can select and visualize the concordances in the articles or paragraphs. Contrary to classic text indexing, this system does not use strings to identify the entities but unique identifiers from Wikidata.
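    The PageRank component that such a linker combines with statistical rules can be sketched generically. This is a textbook power-iteration PageRank over a tiny invented candidate graph, not HEDWIG's implementation; the node names are hypothetical.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict of node -> list of outgoing links."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Each node keeps a teleport share and receives mass from its in-links.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            for m in outs:
                new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

# Hypothetical candidate pages for the mention "Paris": the well-connected
# candidate should come out with the higher rank.
graph = {"Paris_France": ["France"], "Paris_Texas": ["Texas"],
         "France": ["Paris_France"], "Texas": []}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))
```

    In a linker, such graph scores are one signal among several: the string matcher proposes candidates, and statistical rules plus the graph score pick the entity identifier.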

    ”Det handlar om att hitta mervĂ€rdet.” : LĂ€rares erfarenheter av analogt och digitalt skrivande i Ă€mnet svenska.

    No full text
    The aim of the study is to increase knowledge about teachers' didactic work in the school subject Swedish, with a focus on analogue and digital writing. The study, which is qualitative, is based on ten semi-structured interviews with middle-school Swedish teachers working at four different schools in the same municipality, where two of the schools work according to the STL model (writing to learn) and two have no explicit model linked to their teaching, in order to examine whether there were any differences between these teachers. The TPACK framework is used as the theoretical approach to describe how the teachers use digital tools in their writing instruction, and theory on cognitive writing processes is used to illuminate how pupils' writing changes depending on which writing tool is used. The results of the study show, among other things, that the teachers working according to STL have received more professional development related to digital tools and digital writing, and that they have a more consistent view of how digital tools should be used in teaching to benefit the pupils the most, compared with the teachers who do not follow any explicit model.