    Non-Standard Words as Features for Text Categorization

    This paper presents the categorization of Croatian texts using Non-Standard Words (NSWs) as features. Non-Standard Words include numbers, dates, acronyms, abbreviations, currency, etc. NSWs in the Croatian language are determined according to the Croatian NSW taxonomy. For this research, 390 text documents were collected to form the SKIPEZ collection, which has 6 classes: official, literary, informative, popular, educational and scientific. The text categorization experiment was conducted on three different representations of the SKIPEZ collection: in the first, the frequencies of NSWs are used as features; in the second, statistical measures of NSWs (variance, coefficient of variation, standard deviation, etc.) are used as features; the third combines the first two feature sets. The Naive Bayes, CN2, C4.5, kNN, Classification Trees and Random Forest algorithms were used in the experiments. The best results are achieved using the first feature set (NSW frequencies), with a categorization accuracy of 87%. This suggests that NSWs should be considered as features in highly inflectional languages such as Croatian. NSW-based features reduce the dimensionality of the feature space without standard lemmatization procedures, and therefore the bag-of-NSWs should be considered for further Croatian text categorization experiments. Comment: IEEE 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2014), pp. 1415-1419, 201
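
    The first representation (NSW frequencies as features) can be illustrated with a minimal sketch. The regular expressions, class names and function names below are hypothetical simplifications chosen for this example; they are not the Croatian NSW taxonomy or the feature extractor used in the paper.

```python
import re

# Hypothetical, simplified NSW patterns (an assumption for illustration);
# the Croatian NSW taxonomy used in the paper is not reproduced here.
# Order matters: specific classes are matched before plain numbers.
NSW_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\."),
    "currency": re.compile(r"\b\d+(?:,\d+)?\s?(?:kn|EUR|USD)\b"),
    "acronym": re.compile(r"\b[A-Z]{2,}\b"),
    "number": re.compile(r"\b\d+(?:,\d+)?\b"),
}


def nsw_counts(text):
    """Count NSW occurrences per class, consuming each match only once."""
    counts = {}
    remaining = text
    for name, pattern in NSW_PATTERNS.items():
        counts[name] = len(pattern.findall(remaining))
        # Blank out matched spans so later, more general patterns skip them.
        remaining = pattern.sub(" ", remaining)
    return counts


def feature_vector(text):
    """Return the counts in a fixed class order, ready for a classifier."""
    counts = nsw_counts(text)
    return [counts[name] for name in NSW_PATTERNS]
```

    Each document then becomes a short vector with one entry per NSW class, which is far smaller than a bag-of-words vocabulary and needs no lemmatization.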

    IIMI style guide

    Documentation / Style manuals / Communication

    Product specification documentation standard and Data Item Descriptions (DID). Volume of the information system life-cycle and documentation standards, volume 3

    This is the third of five volumes on Information System Life-Cycle and Documentation Standards, which present a well-organized, easily used standard for providing the technical information needed to develop information systems, components, and related processes. This volume states the Software Management and Assurance Program documentation standard for a product specification document and for data item descriptions. The framework can be applied to any NASA information system, software, hardware, operational procedures components, and related processes.

    Spanish named entity recognition in the biomedical domain

    Named Entity Recognition in the clinical domain and in languages other than English is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of agreement on entity boundaries, and the scarcity of corpora and other resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested with Spanish radiology reports and compared with a conditional random fields system. Peer Reviewed. Postprint (author's final draft)

    Influence of a knee brace intervention on perceived pain and patellofemoral loading in recreational athletes

    Background: This study investigated the effects of a knee bracing intervention on pain symptoms and patellofemoral loading in male and female recreational athletes. Methods: Twenty participants (11 males & 9 females) with patellofemoral pain were provided with a knee brace, which they wore for a period of 2 weeks. Lower extremity kinematics and patellofemoral loading were obtained during three sport-specific tasks: jog, cut and single leg hop. In addition, their self-reported knee pain scores were examined using the Knee injury and Osteoarthritis Outcome Score. Data were collected before and after wearing the knee brace for 2 weeks. Findings: Significant reductions were found in the run and cut movements for peak patellofemoral force/pressure and in all movements for the peak knee abduction moment when wearing the brace. Significant improvements were also shown for the Knee injury and Osteoarthritis Outcome Score subscales symptoms (pre: male = 70.27, female = 73.22 & post: male = 85.64, female = 82.44), pain (pre: male = 72.36, female = 78.89 & post: male = 85.73, female = 84.20), sport (pre: male = 60.18, female = 59.33 & post: male = 80.91, female = 79.11), function and daily living (pre: male = 82.18, female = 86.00 & post: male = 88.91, female = 90.00) and quality of life (pre: male = 51.27, female = 54.89 & post: male = 69.36, female = 66.89). Interpretation: Male and female recreational athletes who suffer from patellofemoral pain can be advised to utilise knee bracing as a conservative method to reduce pain symptoms.

    NASA Publications Guide

    The publication programs and management policies of NASA are described and the details that authors and publication specialists need to know to carry out the agency's mission of disseminating the scientific and technical information derived from its activities are highlighted. Topics covered include the various kinds of NASA formal publications; selection of publication medium; printing and distribution; and requirements concerning style and format standards, copyright transfers, the cover, color, and foldouts. The sections of a report are delineated and editorial and page make-up responsibilities are also discussed.

    Automatic Matching and Expansion of Abbreviated Phrases without Context

    In many documents, like receipts or invoices, textual information is constrained by the space and organization of the document. The document information has no natural language context, and expressions are often abbreviated to respect the graphical layout, both at word level and phrase level. In order to analyze the semantic content of these types of document, we need to understand each phrase, and particularly the name of each product sold. In this paper, we propose an approach to find the right expansion of abbreviations and acronyms, without context. First, we extract information about sold products from our receipts corpus and we analyze the different linguistic processes of abbreviation. Then, we retrieve a list of expanded names of products sold by the company that issued the receipts, and we propose an algorithm to pair extracted product names with the corresponding expansions. We provide the research community with a unique document collection for abbreviation expansion.
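
    The pairing step can be sketched with a simple context-free heuristic. This is not the algorithm proposed in the paper, which the abstract does not detail; it is an assumed baseline in which an abbreviated word must keep the first letter of the full word and its remaining letters in order. The function names and the sample catalogue are hypothetical.

```python
def is_abbreviation(short, full):
    """True if `short` starts like `full` and its letters occur in order in `full`."""
    short = short.lower().rstrip(".")
    full = full.lower()
    if not short or not full or short[0] != full[0]:
        return False
    remaining = iter(full)
    # `ch in remaining` advances the iterator, so letters must appear in order.
    return all(ch in remaining for ch in short)


def matches_phrase(abbrev_phrase, full_phrase):
    """Compare two phrases word by word; both must have the same word count."""
    shorts = abbrev_phrase.split()
    fulls = full_phrase.split()
    return len(shorts) == len(fulls) and all(
        is_abbreviation(s, f) for s, f in zip(shorts, fulls)
    )


def expand(abbrev_phrase, catalogue):
    """Return every catalogue name the abbreviated phrase could stand for."""
    return [name for name in catalogue if matches_phrase(abbrev_phrase, name)]
```

    With a catalogue such as `["chocolate milk", "chicken meal", "orange juice"]`, the call `expand("choc. mlk", catalogue)` keeps only `"chocolate milk"`, while a shorter form like `"ch ml"` stays ambiguous and returns two candidates. A real system would need a ranking step for such cases.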

    ILRI style guide for editors and writers

    OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents

    Motivation: Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. Results: We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. Availability: The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger.
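
    The normalization and grounding steps can be illustrated with a toy sketch. This is not the OrganismTagger implementation: the two-entry lexicon, the regular expression and the `ground` function are assumptions made for this example, standing in for resources that the real system generates from the NCBI Taxonomy database.

```python
import re

# Tiny hand-made lexicon for illustration only (an assumption); the real
# system derives its resources from a copy of the NCBI Taxonomy database.
LEXICON = {
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
}

# Matches an abbreviated binomial such as "E. coli" or "H.sapiens".
ABBREVIATED = re.compile(r"([A-Z])\.\s?([a-z]+)")


def ground(mention):
    """Return (canonical name, taxonomy ID), or None if unknown or ambiguous."""
    mention = mention.strip()
    if mention in LEXICON:
        return mention, LEXICON[mention]
    match = ABBREVIATED.fullmatch(mention)
    if match is None:
        return None
    initial, species = match.groups()
    # Keep lexicon entries whose genus starts with the initial and whose
    # species epithet is identical.
    candidates = [
        name for name in LEXICON
        if name.startswith(initial) and name.split()[1] == species
    ]
    if len(candidates) != 1:
        return None
    return candidates[0], LEXICON[candidates[0]]
```

    Returning `None` for ambiguous abbreviations is a deliberate choice in this sketch: it favours precision over recall, in the spirit of a high-precision tagger.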