Coreference resolution in clinical discharge summaries, progress notes, surgical and pathology reports: a unified lexical approach
We developed a lexical rule-based system that uses a unified approach to resolving coreference across a wide variety of clinical records, comprising discharge summaries, progress notes, and pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA) provided for the fifth i2b2/VA shared task. Taking the unweighted mean of four coreference metrics, validation of the system against the i2b2/VA corpus attained an overall F-score of 87.7% across all mention classes, with a maximum of 93.1% for coreference of persons and a minimum of 77.2% for coreference of tests. For the ODIE corpus the overall F-score across all mention classes was 79.4%, with a maximum of 82.0% for coreference of persons and a minimum of 13.1% for coreference of diagnostic reagents. For the ODIE corpus our results are comparable to the mean reported inter-annotator agreement with the gold standard. We discuss the four categories of errors we identified and how they might be addressed. The system uses a number of reusable modules and techniques that may be of benefit to the research community.
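The headline scores above are unweighted (macro) means over four coreference metrics. As a minimal sketch, the averaging step looks like this; the metric names and per-metric values below are hypothetical, since the abstract reports only the resulting mean:

```python
def unweighted_mean_f1(per_metric_f1):
    """Macro-average: each metric contributes equally to the overall score."""
    return sum(per_metric_f1.values()) / len(per_metric_f1)

# Hypothetical per-metric F1 values for one mention class; the abstract
# does not publish the individual metric scores, only their mean.
scores = {"MUC": 0.90, "B3": 0.88, "CEAF": 0.85, "BLANC": 0.87}
print(round(unweighted_mean_f1(scores), 3))  # 0.875
```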
Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance
Recent advancements in healthcare practices and the increasing use of information technology in the medical domain have led to the rapid generation of free-text data in the form of scientific articles, e-health records, patents, and document inventories. This has urged the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information-carrying units such as concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects), which denote an important category of concepts in medical text. Two methodologies were investigated in this regard: dictionary-based and machine-learning-based approaches. Furthermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system facilitates conventional text search as well as semantic and ontological searches. Performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (TRECMED and TRECCHEM, respectively), wherein the system was rated best in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in support of healthcare and pharmacovigilance. The applied strategies have the potential to enhance the literature searches performed by biomedical, healthcare, and patent professionals.
This can promote literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive research and development in the medical and healthcare arena.
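The dictionary-based approach mentioned above can be sketched as a greedy longest-match lookup of disorder terms. The tiny dictionary and the matching rule below are illustrative assumptions, not the thesis's actual terminology resources or algorithm:

```python
# Toy dictionary of disorder terms; real systems use large curated
# terminologies (e.g. disease vocabularies), not a hand-written set.
DISORDERS = {"heart failure", "congestive heart failure", "headache", "nausea"}
MAX_LEN = max(len(term.split()) for term in DISORDERS)

def find_disorders(text):
    """Greedy longest-match scan: prefer 'congestive heart failure'
    over the shorter 'heart failure' at the same position."""
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in DISORDERS:
                found.append(span)
                i += n
                break
        else:
            i += 1
    return found

print(find_disorders("Patient reports nausea and congestive heart failure"))
# ['nausea', 'congestive heart failure']
```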
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user-interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-Nearest Neighbor
The 2013 curriculum is a new curriculum in the Indonesian education system, enacted by the government to replace the KTSP curriculum. Its implementation over the last few years has sparked various opinions among students, teachers, and the public in general, especially on the social media platform Twitter. In this study, a sentiment analysis on the 2013 curriculum is conducted. An ensemble of several feature sets was used: Twitter-specific features, textual features, part-of-speech (POS) features, lexicon-based features, and bag-of-words (BOW) features, with sentiment classification performed using the K-Nearest Neighbor method. The experiment results showed that the ensemble features yield the best sentiment classification performance compared to using individual feature sets alone. The best accuracy using ensemble features is 96%, obtained with k=5.
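The ensemble-features idea can be sketched in miniature: concatenate a bag-of-words feature set with Twitter-specific features, then classify with K-Nearest Neighbors (k=5, matching the paper's best setting). The labeled tweets below are invented English stand-ins, not the study's Indonesian data, and the feature sets are reduced to two of the five for brevity:

```python
import math
from collections import Counter

# Invented toy training tweets with sentiment labels.
tweets = [
    ("the 2013 curriculum is great", "positive"),
    ("great curriculum #education", "positive"),
    ("the curriculum is great for education", "positive"),
    ("awful traffic ruined my morning commute today completely", "negative"),
    ("this rainy weather makes everything gloomy and sad", "negative"),
    ("terrible service at the restaurant never going back", "negative"),
]

def features(text):
    # Ensemble in miniature: bag-of-words counts combined with
    # Twitter-specific features (hashtag/mention counts, tweet length).
    vec = Counter(text.lower().split())
    vec["__hashtags__"] = text.count("#")
    vec["__mentions__"] = text.count("@")
    vec["__length__"] = len(text.split())
    return vec

def distance(a, b):
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2
                         for k in set(a) | set(b)))

def knn_predict(data, text, k=5):
    x = features(text)
    nearest = sorted(data, key=lambda item: distance(features(item[0]), x))[:k]
    # Majority vote among the k nearest neighbors.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict(tweets, "the 2013 curriculum is great #education"))
# positive
```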
Europe in the shadow of financial crisis: Policy Making via Stance Classification
Since 2009, the European Union (EU) has been facing a multi-year financial crisis affecting the stability of the countries involved. Our goal is to gain useful insights into the societal impact of such a strong political issue through the exploitation of topic modeling and stance classification techniques. To do this, we unravel the public's stance towards this event and empower citizens' participation in the decision-making process, taking the policy life cycle as a baseline. The paper introduces and evaluates a bilingual stance classification architecture, enabling a deeper understanding of how citizens' sentiment polarity changes based on the critical political decisions taken among European countries. Through three novel empirical studies, we aim to explore whether stance classification can be used to: i) determine citizens' sentiment polarity for a series of political events by observing the diversity of opinion among European citizens, ii) predict the outcome of political decisions made by citizens, such as a referendum call, and iii) examine whether citizens' sentiments agree with governmental decisions during each stage of a policy life cycle.
Context Aware Textual Entailment
In conversations, stories, news reporting, and other forms of natural language, understanding requires participants to make assumptions (hypotheses) based on background knowledge, a process called entailment. These assumptions may then be supported, contradicted, or refined as a conversation or story progresses and additional facts become known and context changes. It is often the case that we do not know an aspect of the story with certainty but rather believe it to be the case; i.e., what we know is associated with uncertainty or ambiguity. In this research, a method has been developed to identify different contexts of the input raw text along with specific features of the contexts, such as time, location, and objects. The method includes a two-phase SVM classifier with a voting mechanism in the second phase to identify the contexts. Rule-based algorithms were utilized to extract the context elements. This research also develops a new context-aware text representation. This representation maintains semantic aspects of sentences, as well as textual contexts and context elements. The method can offer both a graph representation and a First-Order Logic representation of the text. This research also extracts First-Order Logic (FOL) and XML representations of a text or series of texts. The method includes entailment using background knowledge from external sources (VerbOcean and WordNet), with resolution of conflicts between extracted clauses, and handles the role of context in resolving uncertain truth.
Decolonizing the Brazilian EFL Classroom: Creating Space for Afro-Brazilian Students of English
ABSTRACT
Afro-Brazilians constitute the majority of Brazil's total population. When compared to White Brazilians, Afro-Brazilians are more than twice as likely to live in abject poverty. These striking disparities have significant implications for this community and the socioeconomic well-being of the entire country. Securing access to quality secondary education is imperative for the Black communities of Brazil to ascend out of poverty and hardship.
Completing a foreign language program, typically English, followed by successfully passing a rigorous competency exam, is a prerequisite to obtaining a postsecondary degree in Brazil's university system. This assessment can present a dilemma for Black Brazilians who lack the benefits of private education and tutoring enjoyed by many of their White counterparts. The glaring absence of English language pedagogy that reflects the lives of the Afro-Brazilian community further complicates this predicament. By adopting a Content-Based Instruction framework, this project seeks to deliver a culturally sustaining pedagogy that centers the African descendants of Brazil and the United States. Further, the project aims to promote and accelerate English language acquisition by lowering the affective filter among Afro-Brazilian students.
This four-unit English language curriculum traverses the historical and cultural roots of the two largest African Diaspora populations by providing instruction in vocabulary, grammar, pronunciation, reading, speaking, and writing. The project offers the Afro-Brazilian student an immersive and communicative learning experience that utilizes a multimedia approach: print, video, music, and poetry. By mirroring the lived realities of these learners in the TESOL curriculum, this project seeks to bring more Afro-Brazilian students, educators, and researchers into the study of language and linguistics.
Predicting the Vote Using Legislative Speech
As most dedicated observers of voting bodies like the U.S. Supreme Court can attest, it is possible to guess vote outcomes based on statements made during deliberations or questioning by the voting members. In most forms of representative democracy, citizens can actively petition or lobby their representatives, and that often means understanding their intentions to vote for or against an issue of interest. In some U.S. state legislatures, professional lobbying groups and dedicated press members are highly informed and engaged, but the process is largely closed to ordinary citizens because they do not have enough background and familiarity with the issue, the legislator, or the entire process. Our working hypothesis is that verbal utterances made during the legislative process by elected representatives can indicate their intent on a future vote, and therefore can be used to automatically predict said vote to a significant degree. In this research, we examine thousands of hours of legislative deliberations from the California state legislature's 2015-2016 session to form models of voting behavior for each legislator, and use them to train classifiers and predict the votes that occur subsequently. We achieve legislator vote prediction accuracies as high as 83%. For bill vote prediction, our model achieves 76% accuracy with an F1 score of 0.83 on balanced bill training data.
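The working hypothesis above can be illustrated with a minimal sketch: learn, for one legislator, which words in their remarks co-occur with aye versus nay votes, then score a new utterance. This uses a simple Naive Bayes classifier as a stand-in, not the paper's actual model, and the remarks and votes below are invented:

```python
import math
from collections import Counter, defaultdict

# Invented floor remarks paired with the legislator's eventual vote;
# the paper's real corpus is deliberation speech from the California
# legislature's 2015-2016 session.
history = [
    ("this bill protects working families and i support it", "aye"),
    ("a sensible measure that deserves our support", "aye"),
    ("i cannot support this flawed and costly proposal", "nay"),
    ("this measure is reckless and i oppose it", "nay"),
]

def train(history):
    word_counts = defaultdict(Counter)   # vote -> word frequencies
    vote_counts = Counter()              # vote -> number of utterances
    for text, vote in history:
        vote_counts[vote] += 1
        word_counts[vote].update(text.split())
    return word_counts, vote_counts

def predict_vote(model, utterance):
    word_counts, vote_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for vote, n in vote_counts.items():
        total = sum(word_counts[vote].values())
        score = math.log(n / sum(vote_counts.values()))  # log prior
        for w in utterance.split():
            # Laplace-smoothed log likelihood of each word given the vote.
            score += math.log((word_counts[vote][w] + 1) / (total + len(vocab)))
        scores[vote] = score
    return max(scores, key=scores.get)

model = train(history)
print(predict_vote(model, "i support this sensible bill"))  # aye
```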
- …