
    Classification of Occluded Objects using Fast Recurrent Processing

    Full text link
    Recurrent neural networks are powerful tools for handling incomplete-data problems in computer vision, thanks to their significant generative capabilities. However, the computational demand of these algorithms is too high for real-time operation without specialized hardware or software solutions. In this paper, we propose a framework for adding recurrent processing capabilities to a feedforward network without sacrificing much computational efficiency. We assume a mixture model and generate samples of the last hidden layer according to the class decisions of the output layer, modify the hidden layer activity using the samples, and propagate the result to lower layers. For the visual occlusion problem, this iterative procedure emulates a feedforward-feedback loop, filling in the missing hidden-layer activity with meaningful representations. The proposed algorithm is tested on a widely used dataset and shown to achieve a 2× improvement in classification accuracy for occluded objects. Compared to Restricted Boltzmann Machines, our algorithm shows superior performance for occluded object classification. Comment: arXiv admin note: text overlap with arXiv:1409.8576 by another author.
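    The fill-in loop the abstract describes (sample the last hidden layer given the output layer's class decision, blend the samples into the hidden activity, re-read the output) can be sketched roughly as below. This is a minimal illustration that assumes per-class Gaussian statistics over the hidden layer and a simple blending weight; the paper's actual mixture model, and the propagation of the filled-in activity back to lower layers, are not reproduced here.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    class OcclusionFillingClassifier:
        """Feedforward classifier whose last hidden layer is iteratively
        re-sampled from class-conditional statistics (a rough sketch only)."""

        def __init__(self, W1, b1, W2, b2, class_means, class_stds, blend=0.5):
            self.W1, self.b1 = W1, b1        # input -> hidden weights
            self.W2, self.b2 = W2, b2        # hidden -> output weights
            self.class_means = class_means   # per-class mean hidden activity (assumed)
            self.class_stds = class_stds     # per-class std of hidden activity (assumed)
            self.blend = blend               # how strongly samples fill in the activity

        def forward(self, x):
            h = np.maximum(0.0, self.W1 @ x + self.b1)    # ReLU hidden layer
            return h, softmax(self.W2 @ h + self.b2)

        def classify_occluded(self, x, n_iter=5):
            h, p = self.forward(x)
            for _ in range(n_iter):
                c = int(p.argmax())                        # current class decision
                # sample hidden activity for the winning class
                sample = rng.normal(self.class_means[c], self.class_stds[c])
                # fill in (blend) the hidden activity with the sampled representation
                h = (1.0 - self.blend) * h + self.blend * np.maximum(0.0, sample)
                p = softmax(self.W2 @ h + self.b2)         # re-read the output layer
            return p

    # Toy dimensions: 8 inputs, 6 hidden units, 3 classes.
    W1, b1 = rng.normal(size=(6, 8)), np.zeros(6)
    W2, b2 = rng.normal(size=(3, 6)), np.zeros(3)
    means, stds = rng.random((3, 6)), np.full((3, 6), 0.1)
    net = OcclusionFillingClassifier(W1, b1, W2, b2, means, stds)
    print(net.classify_occluded(rng.normal(size=8)))
    ```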

    Problems and Analysis of Cheating to Reduce Similarity Undetected by Similarity Tools

    Get PDF
    This research aims to describe the problem of cheating to reduce similarity scores from the Turnitin software when writing scientific articles, as well as the factors behind the cheating carried out by students. The research method is qualitative, with a case-study approach. Data collection techniques include observation, documentation, and interviews with students taking the scientific paper course. The sample, selected using purposive sampling, consists of students taking the scientific paper course with a focus on mathematics research at the Open University, Study Program X. The findings indicate six cheating techniques used to reduce similarity that are undetectable by Turnitin: (1) making the spaces in a text abnormally large or small, (2) converting text files into images, (3) adding specific letters to the manuscript, (4) inserting specific small-sized numbers that are almost invisible, (5) intentionally making typing errors, and (6) adding specific symbols to the scientific article manuscript.
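    Most of these tricks leave traces in the character stream rather than in the visible wording, which is why text-matching tools miss them. As an illustration only, the sketch below flags some of the listed artifacts in a plain-text manuscript (runs of unusual spacing, characters that typically render invisibly, and symbols wedged between letters); detecting text converted to images or tiny font sizes would require inspecting the document's formatting, which this sketch does not do. The character sets and thresholds are illustrative assumptions, not part of the study.

    ```python
    import re

    # Characters that usually render as (nearly) invisible and can be used to
    # break up words without changing their appearance.
    INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\ufeff", "\u00ad"}

    def flag_manipulation_artifacts(text: str) -> dict:
        """Return counts of suspicious patterns in a plain-text manuscript."""
        return {
            # (1) abnormal spacing: runs of three or more consecutive spaces
            "long_space_runs": len(re.findall(r" {3,}", text)),
            # zero-width / invisible characters inserted into words
            "invisible_chars": sum(text.count(c) for c in INVISIBLE_CHARS),
            # (6) symbols placed directly between two letters (apostrophes and
            # hyphens are excluded, since they occur in normal prose)
            "symbols_inside_words": len(re.findall(r"[A-Za-z][^\w\s'’-][A-Za-z]", text)),
        }

    if __name__ == "__main__":
        sample = "This  is a manu\u200bscript with odd   spacing and wo|rd breaks."
        print(flag_manipulation_artifacts(sample))
    ```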

    Matching of Descriptive Labels to Glossary Descriptions

    Full text link
    Semantic text similarity plays an important role in software engineering tasks in which engineers are asked to clarify the semantics of descriptive labels (e.g., business terms, table column names) that often consist of too short or too generic words and appear throughout their IT systems. We formulate this type of problem as a task of matching descriptive labels to glossary descriptions. We then propose a framework that leverages an existing semantic text similarity measurement (STS) and augments it using semantic label enrichment and set-based collective contextualization, where the former is a method for retrieving sentences relevant to a given label and the latter is a method for computing similarity between two contexts, each of which is derived from a set of texts (e.g., column names in the same table). We performed an experiment on two datasets derived from publicly available data sources. The results indicated that the proposed methods helped the underlying STS correctly match more descriptive labels with their descriptions.
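    The abstract does not spell out the enrichment or contextualization methods, so the sketch below only illustrates the general shape of such a pipeline: an off-the-shelf sentence encoder serves as the STS, and the other labels from the same table are concatenated into the query as a crude stand-in for contextualization. The model name, the string concatenation, and the toy glossary are all assumptions, not the paper's implementation.

    ```python
    from sentence_transformers import SentenceTransformer, util

    # Off-the-shelf STS backbone (an assumption; any sentence encoder would do).
    model = SentenceTransformer("all-MiniLM-L6-v2")

    def match_label(label: str, sibling_labels: list[str], glossary: dict[str, str]) -> str:
        """Match a short descriptive label (e.g. a column name) to the most
        similar glossary description, using the other labels in the same
        table as additional context."""
        # Label enrichment: append the surrounding labels so the encoder sees
        # more than one or two generic words.
        enriched = f"{label} (appears alongside: {', '.join(sibling_labels)})"

        terms = list(glossary.keys())
        descriptions = [f"{t}: {glossary[t]}" for t in terms]

        label_emb = model.encode(enriched, convert_to_tensor=True)
        desc_embs = model.encode(descriptions, convert_to_tensor=True)

        scores = util.cos_sim(label_emb, desc_embs)[0]   # similarity to each description
        return terms[int(scores.argmax())]

    # Example: match a cryptic column name against a small glossary.
    glossary = {
        "customer identifier": "A unique key assigned to each customer record.",
        "order date": "The calendar date on which an order was placed.",
    }
    print(match_label("cust_id", ["order_dt", "amount"], glossary))
    ```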

    A Study of the Text Information Contained in Web Images

    Get PDF
    Indexing and searching of web pages relies on analyzing text. Current technology cannot process the text embedded in the images of web pages efficiently or quickly enough. This poses a significant indexing problem, and an accessibility problem as well. To quantify the problem, we developed a software application that allows us to study the situation. We used this software to analyze a set of web pages representative of the current state of the Internet, and the results were analyzed and compared with previous studies. Note: this document originally contained additional material and/or software that can only be consulted at the Biblioteca de Ciència i Tecnologia.

    Analysis of the Correlation Between the Lexical Profile and Coh-Metrix 3.0 Text Easability and Readability Indices of the Korean CSAT From 1994–2022

    Get PDF
    The Korean College Scholastic Ability Test (CSAT) is a highly competitive standardized assessment that graduating high-school seniors complete in the hope of a good score that will improve their chances of admission to a university of their choice. The CSAT contains an English Section that scholars and educators alike have described as far too difficult for the official English language curriculum to serve as sufficient preparation. The test's lack of construct validity has prompted calls to revise it to better reflect the school curriculum so that it can serve the evaluative purpose for which it is intended. In recent years, automated text evaluation with the software Coh-Metrix 3.0 has allowed scholars to quantify dimensions of the CSAT English Section's text, such as cohesion and syntactic complexity, that contribute to its reading difficulty. Older research, conducted before this software entered the field, used word frequency counts in large corpora such as the British National Corpus (BNC) as a measure of word familiarity, which was thought to contribute directly to difficulty: as the proportion of low-frequency words in a text rises relative to high-frequency words, the word-knowledge burden of the text rises with it. Since the introduction of automated software-based tools like Coh-Metrix 3.0 and the Lexical Complexity Analyzer (LCA), these corpus-based methods have largely fallen by the wayside. In this paper, I maintain that, despite its lower sophistication, corpus-based lexical analysis can still produce uniquely meaningful findings because of the manual control it affords the researcher in calibrating the parameters of the text base and, most importantly, in selecting the ranges of word-family frequency best tailored to a text, rather than having those ranges or functions of frequency assigned automatically by software. This study reports correlations between the outputs of these two methodologies that both inform us about the validity of using Coh-Metrix 3.0 in CSAT studies and quantify the strength of the role of word frequency in causing the excessive difficulty of the CSAT English Section.
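    The corpus-based lexical profile defended here reduces, at its core, to measuring what share of a text's running words falls within the most frequent word families of a reference corpus: the lower that coverage, the heavier the word-knowledge burden. A minimal sketch of that computation is below; the tokenization, the use of single word forms rather than word families, and the toy frequency list are simplifying assumptions, not the study's procedure.

    ```python
    import re

    def band_coverage(text: str, frequency_ranked_words: list[str], band_size: int = 2000) -> float:
        """Share of a text's tokens covered by the `band_size` most frequent
        words of a reference corpus (a rough proxy for a BNC-style lexical profile)."""
        tokens = re.findall(r"[a-z]+", text.lower())
        band = set(frequency_ranked_words[:band_size])
        covered = sum(1 for t in tokens if t in band)
        return covered / len(tokens) if tokens else 0.0

    # Illustrative use: lower coverage means a higher proportion of
    # low-frequency words, i.e. a heavier word-knowledge burden.
    ranked = ["the", "of", "and", "to", "a", "in", "is", "that", "it", "was"]  # toy list
    passage = "The ontological ramifications of the text remained opaque."
    print(f"coverage: {band_coverage(passage, ranked, band_size=10):.2f}")
    ```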

    Japanese-English Parallel Corpora in the Classroom: Applications and Challenges

    Get PDF
    Computerized corpora have given linguists crucial new insights into language usage. With the help of software, it is possible to index the words that appear in a large collection of text and to analyze word usage and frequency. Data-Driven Learning looks at how students can benefit from their own direct use of corpora. While monolingual corpora have a steep learning curve and are often too difficult for language learners, a solution to this problem may be found in bilingual parallel corpora, which are built from authentically translated text. This article looks at Eijiro on the WEB and Weblio, two online Japanese-English parallel-corpus websites. Some guided practice exercises developed by the author for use in university-level English language writing classes in Japan are discussed, and some of the challenges in training students to use these resources to improve their English writing are presented.

    Mobile touch interfaces for the elderly

    Get PDF
    Elderly people are not averse to buying and using electronic gadgets. However, with certain devices there is a persistent complaint about the "buttons being too small". The arrival of mobile touch devices such as the iPhone and iPod Touch should therefore be able to circumvent that problem, because button size and arrangement are under software control. However, these devices have some accessibility issues, which are identified here. The accessibility issues stem from the one-size-fits-all concept. A solution is proposed that involves offering a range of interface styles. A new user gesture, called the shake, is proposed for switching between interface styles. A separate investigation is made into the different possibilities for free-text entry.
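    At its core, the proposal is a small piece of interaction logic: the device offers several interface styles differing mainly in control size, and a shake gesture cycles through them. A platform-neutral sketch of that state machine follows; the style names, button sizes, and the shake hook are illustrative assumptions, not the authors' implementation.

    ```python
    from dataclasses import dataclass

    @dataclass
    class InterfaceStyle:
        name: str
        button_points: int    # button size in typographic points (assumed values)
        buttons_per_row: int

    # A range of styles, from a dense default to a large-control layout
    # suited to users who find standard buttons too small.
    STYLES = [
        InterfaceStyle("standard", 44, 4),
        InterfaceStyle("large", 66, 3),
        InterfaceStyle("extra-large", 88, 2),
    ]

    class InterfaceStyleSwitcher:
        """Cycles through interface styles; meant to be driven by the
        device's shake-gesture callback."""

        def __init__(self, styles=STYLES):
            self.styles = styles
            self.index = 0

        @property
        def current(self) -> InterfaceStyle:
            return self.styles[self.index]

        def on_shake(self) -> InterfaceStyle:
            # The shake gesture advances to the next style, wrapping around.
            self.index = (self.index + 1) % len(self.styles)
            return self.current

    switcher = InterfaceStyleSwitcher()
    print(switcher.current.name)       # standard
    print(switcher.on_shake().name)    # large
    ```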