Getting Started in Text Mining

Author: Ananiadou
Blaschke
Cohen
Cohen
Craven
Hersh
Hu
Hunter
Jackson
Jenssen
Jurafsky
K. Bretonnel Cohen
Lawrence Hunter
See-Kiong
Shatkay
Publication venue: Public Library of Science
Publication date: 01/01/2008

Crossref

Directory of Open Access Journals

PubMed Central

Recommended from our members

Getting Started in Text Mining: Part Two

Author: Gerstein Mark B.
Rzhetsky Andrey
Seringhaus Michael
Publication date: 26/09/2023

No abstract

Knowledge UChicago

Getting Started in Text Mining: Part Two

Author: Andrey Rzhetsky
CE Crangle
DR Swanson
I Spasic
JD Kim
JW Huss III
KB Cohen
L Hirschman
M Fleischman
Mark B. Gerstein
Michael Seringhaus
MV Blagosklonny
NH Shah
Olga G. Troyanskaya
R Kanagasabai
R Mitkov
S Aerts
SM Douglas
W Hersh
Y Sasaki
Publication venue: Public Library of Science
Publication date: 01/07/2009

Crossref

Directory of Open Access Journals

PubMed Central

Clustering of twitter technology tweets and the impact of stopwords on clusters

Author: Bhagvat Surya
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2011

The year 2010 could be termed the year in which Twitter became completely mainstream. Twitter, which started as a means of communicating with friends, has grown well beyond its beginnings. Companies now use it to promote new products, and the movie industry uses it to promote films. A great deal of advertising and branding is now tied to Twitter and, most importantly, Twitter is the first place people search when breaking news happens. Whether it was the Mumbai attacks of 2008, the minor earthquakes in the Bay Area in 2010, or the "Twitter revolution" surrounding the Iran elections, most viewers, tech-savvy or not, followed Twitter rather than mainstream news channels. In fact, most breaking news now arrives via Twitter rather than traditional mainstream media, because of its huge user base. The focus of this paper is to cluster daily technology news tweets from prominent bloggers and news sites, using TF-IDF weighting and Apache Mahout, and to evaluate how introducing and removing stop words affects clustering quality. The project is restricted to tweets in the English language.
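The TF-IDF weighting and stop-word comparison mentioned in this abstract can be illustrated with a minimal sketch. It is plain Python, not Apache Mahout, and it is not the author's pipeline: the stop-word list, the sample tweets, and the function names (`tokenize`, `tf_idf`, `cosine`) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "on", "of", "and", "to", "in"}


def tokenize(text, remove_stop_words):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def tf_idf(docs, remove_stop_words=True):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(d, remove_stop_words) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty docs
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Made-up sample tweets.
    tweets = [
        "the new phone is on sale",
        "the new tablet is on sale",
        "the election is in the news",
    ]
    for flag in (True, False):
        vecs = tf_idf(tweets, remove_stop_words=flag)
        print(flag, round(cosine(vecs[0], vecs[1]), 3))
```

A clustering algorithm such as k-means would then group tweets by the similarity of these vectors. Running the sketch with and without stop-word removal shows how the vocabulary, and therefore the pairwise similarities, change.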

SJSU ScholarWorks

Mining Images in Biomedical Publications: Detection and Analysis of Gel Diagrams

Author: Krauthammer Michael
Kuhn Tobias
Luong ThaiBinh
Nagy Mate Levente
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Authors of biomedical publications use gel images to report experimental results such as protein-protein interactions or protein expressions under different conditions. Gel images offer a concise way to communicate such findings, not all of which need to be explicitly discussed in the article text. This fact together with the abundance of gel images and their shared common patterns makes them prime candidates for automated image mining and parsing. We introduce an approach for the detection of gel images, and present a workflow to analyze them. We are able to detect gel segments and panels with high accuracy, and present preliminary results for the identification of gene names in these images. While we cannot provide a complete solution at this point, we present evidence that this kind of image mining is feasible. Comment: arXiv admin note: substantial text overlap with arXiv:1209.148

arXiv.org e-Print Archive

Repository for Publications and Research Data

Springer - Publisher Connector

PubMed Central