
    Public Opinion on National Exam Policies in Indonesia

    Every new policy by the Indonesian government on the implementation of the National Examination (NE) draws a different response from the public. Since its introduction, the NE system has undergone many changes, but in recent years it has received serious criticism; as a result, the government abolished it as a graduation determinant in 2014. This research analyzes public opinion, in the form of positive and negative sentiment toward NE policy, and the factors that drive those opinions. The data were obtained from online news media published from 2012 to 2015. The results show that public sentiment fluctuates from year to year and depends on three important factors: political pressure, extreme events, and media coverage.
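    The paper does not publish its classifier, but the core idea of labeling news text as positive or negative sentiment can be sketched with a minimal lexicon-based scorer. The word lists below are illustrative assumptions, not the paper's actual lexicon or method.

```python
# Minimal lexicon-based sentiment scorer for news text.
# POSITIVE/NEGATIVE word lists are illustrative placeholders only.

POSITIVE = {"support", "improve", "success", "fair", "praise"}
NEGATIVE = {"critique", "protest", "abolish", "unfair", "pressure"}

def sentiment(text: str) -> str:
    """Label text 'positive', 'negative', or 'neutral' by cue-word counts."""
    words = [w.strip(".,") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

    A real system would use a trained classifier and a curated Indonesian lexicon; this sketch only shows the shape of the task.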

    Journalistic transparency using CRFs to identify the reporter of newspaper articles in Spanish

    Journalistic transparency arises as a key issue in the face of the lack of credibility to which journalists are exposed, as well as media manipulators and fake-news providers. With the use of Natural Language Processing (NLP) and Machine Learning (ML), it is possible to automate the extraction of information from newspaper articles to identify their sources of information and verify their veracity. In this article, we present the application of Conditional Random Fields (CRFs) to a specific type of Entity Recognition (ER) task, namely, identifying what we have called the “reporter” in newspaper articles, i.e., who or what is the provider of the information. To this end, we have created a labelled corpus for the Spanish language and trained and analysed several CRF models with a set of specific features. The obtained results constitute a solid baseline for our goal. This research work has been co-funded by Display Connectors S.L. through the project entitled “Identifying relevant entities in newspaper articles” (in Spanish, “Identificación de entidades relevantes en noticias periodísticas”), and by the Madrid Regional Government through the project e-Madrid-CM (P2018/TCS-4307). The e-Madrid-CM project is also co-financed by the Structural Funds (FSE and FEDER). We also give special thanks to the people of the Público online newspaper for their work and support.
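    The abstract mentions training CRF models "with a set of specific features" but does not list them. The sketch below shows the kind of per-token feature dictionaries typically fed to a sequence labeller for this sort of entity recognition task; the exact features used in the paper are assumptions here.

```python
# Per-token features of the kind commonly used for CRF-based entity
# recognition; the actual feature set of the paper is not reproduced here.

def token_features(tokens, i):
    """Build a feature dict for tokens[i] with a one-token context window."""
    w = tokens[i]
    feats = {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isupper": w.isupper(),
        "suffix3": w[-3:],
        "BOS": i == 0,                    # beginning of sentence
        "EOS": i == len(tokens) - 1,      # end of sentence
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    return feats

def sentence_features(tokens):
    """One feature dict per token, the input format CRF toolkits expect."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

    Dicts of this shape are what libraries such as sklearn-crfsuite consume as training input, paired with per-token labels.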

    Information extraction framework for disability determination using a mental functioning use-case

    Natural language processing (NLP) in health care enables the transformation of complex narrative information into high-value products such as clinical decision support and real-time adverse event monitoring via the electronic health record (EHR). However, information technologies for mental health have consistently lagged because of the complexity of measuring and modeling mental health and illness. The use of NLP to support the management of mental health conditions is a viable topic that has not been explored in depth. This paper provides a framework for the advanced application of NLP methods to identify, extract, and organize information on mental health and functioning to inform the decision-making process applied to assessing mental health. We present a use case related to work disability, guided by the disability determination process of the US Social Security Administration (SSA). From this perspective, the following questions must be addressed about each problem that leads to a disability benefits claim: When did the problem occur, and how long has it existed? How severe is it? Does it affect the person’s ability to work? And what is the source of the evidence about the problem? Our framework includes 4 dimensions of medical information that are central to assessing disability: temporal sequence and duration, severity, context, and information source. We describe key aspects of each dimension and promising approaches for application to mental functioning. For example, to address temporality, a complete functional timeline must be created covering all relevant aspects of functioning, such as intermittence, persistence, and recurrence. Severity of mental health symptoms can be successfully identified and extracted on a 4-level ordinal scale from absent to severe. Some NLP work has been reported on the extraction of context for specific cases of wheelchair use in clinical settings. We discuss the links between the task of information source assessment and work on source attribution, coreference resolution, event extraction, and rule-based methods. Gaps were identified in NLP applications that apply directly to the framework and in existing relevant annotated data sets. We highlight NLP methods with the potential for advanced application in the field of mental functioning. Findings of this work will inform the development of instruments for supporting SSA adjudicators in their disability determination process. The 4 dimensions of medical information may have relevance for a broad array of individuals and organizations responsible for assessing mental health function and ability. Further, our framework presents a significant opportunity for the application of NLP in the realm of mental health and functioning beyond the SSA setting, and it may support the development of robust tools and methods for decision-making related to clinical care, program implementation, and other outcomes.
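    The 4-level ordinal severity scale (absent, mild, moderate, severe) the framework describes can be approximated with a rule-based cue mapper. The cue words and the fallback default below are hypothetical illustrations, not taken from the paper or from any validated clinical instrument.

```python
# Hypothetical rule-based mapper from a symptom mention to the 4-level
# ordinal severity scale described in the framework. Cue words are
# illustrative only, not clinically validated.

SCALE = ["absent", "mild", "moderate", "severe"]  # ordered low to high
CUES = {
    "absent": {"no", "denies", "without"},
    "mild": {"mild", "slight", "occasional"},
    "moderate": {"moderate", "frequent"},
    "severe": {"severe", "extreme", "constant"},
}

def severity(mention: str) -> str:
    """Return the highest severity level whose cue appears in the mention."""
    words = set(mention.lower().split())
    found = [lvl for lvl in SCALE if CUES[lvl] & words]
    return found[-1] if found else "moderate"  # arbitrary default when no cue fires
```

    A production extractor would handle negation scope and context (e.g. "no longer severe") rather than bag-of-words cue matching.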

    A Dutch coreference resolution system with an evaluation on literary fiction

    Coreference resolution is the task of identifying descriptions that refer to the same entity. In this paper we consider the task of entity coreference resolution for Dutch with a particular focus on literary texts. We make three main contributions. First, we propose a simplified annotation scheme to reduce annotation effort. This scheme is used for the annotation of a corpus of 107k tokens from 21 contemporary works of literature. Second, we present a rule-based coreference resolution system for Dutch based on the Stanford deterministic multi-sieve coreference architecture and heuristic rules for quote attribution. Our system (dutchcoref) forms a simple but strong baseline and improves on previous systems in shared task evaluations. Finally, we perform an evaluation and error analysis on literary texts which highlights difficult cases of coreference in general, and the literary domain in particular. The code of our system is made available at https://github.com/andreasvc/dutchcoref
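    The Stanford-style multi-sieve architecture the paper builds on applies high-precision matching rules first, each pass merging mention clusters before lower-precision sieves run. The sketch below shows that control flow with two toy sieves; it is a simplification, not the dutchcoref implementation.

```python
# Minimal sketch of the deterministic multi-sieve idea: sieves run in
# order of precision, each merging clusters of mentions. The two sieves
# here are toy simplifications of the real system's rules.

def exact_match(a, b):
    return a.lower() == b.lower()

def head_match(a, b):
    # crude "head" approximation: the last token of the mention
    return a.lower().split()[-1] == b.lower().split()[-1]

def resolve(mentions, sieves=(exact_match, head_match)):
    """Cluster mentions by applying each sieve in order."""
    clusters = [[m] for m in mentions]
    for sieve in sieves:
        merged = []
        for cluster in clusters:
            for target in merged:
                # merge if any mention pair across the clusters matches
                if any(sieve(a, b) for a in cluster for b in target):
                    target.extend(cluster)
                    break
            else:
                merged.append(cluster)
        clusters = merged
    return clusters
```

    The real architecture adds many more sieves (appositives, pronouns with agreement features, quote attribution for speaker resolution) but keeps this same precision-ordered cascade.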

    Extracting and Attributing Quotes in Text and Assessing them as Opinions

    News articles often report on the opinions that salient people have about important issues. While it is possible to infer an opinion from a person's actions, it is much more common to demonstrate that a person holds an opinion by reporting on what they have said. These instances of speech are called reported speech, and in this thesis we set out to detect instances of reported speech, attribute them to their speaker, and identify which instances provide evidence of an opinion. We first focus on extracting reported speech, which involves finding all acts of communication that are reported in an article. Previous work has approached this task with rule-based methods; however, there are several factors that confound these approaches. To demonstrate this, we build a corpus of 965 news articles in which we mark all instances of speech. We then show that a supervised token-based approach outperforms all of our rule-based alternatives, even in extracting direct quotes. Next, we examine the problem of finding the speaker of each quote. For this task we annotate the same 965 news articles with links from each quote to its speaker. Using this corpus, and three others, we develop new methods and features for quote attribution, which achieve state-of-the-art accuracy on our corpus and strong results on the others. Having extracted quotes and determined who spoke them, we move on to the opinion mining part of our work. Most task definitions in opinion mining do not transfer easily to opinions in news, so we define a new task in which the aim is to classify whether quotes demonstrate support, neutrality, or opposition to a given position statement. This formulation improved annotator agreement compared to our earlier annotation schemes. Using it, we build an opinion corpus of 700 news documents covering 7 topics. In this thesis we do not attempt this full task, but we do present preliminary results.
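    The rule-based baselines the thesis compares against typically pair a quotation pattern with a nearby speech verb. The sketch below shows one such rule for the simplest case, a direct quote followed by a capitalized speaker name and a speech verb; the verb list and pattern are illustrative simplifications, not the thesis's actual baseline.

```python
import re

# Illustrative rule-based quote extraction and attribution of the kind
# supervised token-based approaches are shown to outperform. Handles only
# the '"...," Speaker Name said' pattern; verbs and regex are assumptions.

SPEECH_VERBS = r"(?:said|says|told|stated|claimed)"

def extract_quotes(text):
    """Return (speaker, quote) pairs for direct quotes with a trailing
    capitalized speaker name and speech verb."""
    pattern = re.compile(
        r'"([^"]+?),?"\s+'                     # quoted span, optional comma
        r"([A-Z][\w.]*(?:\s[A-Z][\w.]*)*)\s+"  # capitalized name tokens
        + SPEECH_VERBS
    )
    return [(m.group(2), m.group(1)) for m in pattern.finditer(text)]
```

    Rules like this miss indirect speech, mixed quotes, and pronoun speakers ("she said"), which is exactly the kind of confound that motivates the supervised approach.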

    Grounding event references in news

    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference, so computational processing proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that the topic is better considered through explicit references to background events. In this context, we propose the event linking task, which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking thus aims to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term-overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.
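    The retrieval system learns weights over temporal and term-overlap features; the scorer below sketches that combination. The Jaccard overlap, the decay form, and the weight values are illustrative assumptions, not the learned model from the thesis.

```python
# Sketch of a retrieval scorer combining term overlap between an event
# reference's context and a candidate archive article with temporal
# proximity. Feature forms and weights are illustrative assumptions.

def term_overlap(reference: str, article: str) -> float:
    """Jaccard overlap between the two bags of lowercased tokens."""
    ref, art = set(reference.lower().split()), set(article.lower().split())
    return len(ref & art) / len(ref | art) if ref | art else 0.0

def temporal_score(ref_day: int, art_day: int, scale: float = 30.0) -> float:
    """Decay with the gap in days between reference and article."""
    return 1.0 / (1.0 + abs(ref_day - art_day) / scale)

def score(reference, ref_day, article, art_day, w_term=0.7, w_time=0.3):
    """Weighted mix; in the thesis such weights are learned from hyperlink data."""
    return (w_term * term_overlap(reference, article)
            + w_time * temporal_score(ref_day, art_day))
```

    Candidates from the archive would be ranked by this score, with the top-ranked article taken as the grounding link for the event reference.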
