Do Open Access Articles Have a Greater Research Impact?
While many authors believe that their work has a greater research impact if it is freely available, studies to demonstrate that impact are few. This study looks at articles in four disciplines at varying stages of adoption of open access (philosophy, political science, electrical and electronic engineering, and mathematics) to see if they have a greater impact, as measured by citations in the ISI Web of Science database, if their authors make them freely available on the Internet. The finding is that, across all four disciplines, freely available articles do have a greater research impact. Shedding light on this category of open access reveals that scholars in diverse disciplines are both adopting open access practices and being rewarded for it.
The Most Influential Paper Gerard Salton Never Wrote
Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled "A Vector Space Model for Information Retrieval" (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which was an overview of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.
Social impact retrieval: measuring author influence on information retrieval
The increased presence of technologies collectively referred to as Web 2.0 means the entire process of new media production and dissemination has moved away from an author-centric approach. Casual web users and browsers are increasingly able to play a more active role in the information creation process. This means that the traditional ways in which information sources may be validated and scored must adapt accordingly.
In this thesis we propose a new way in which to look at a user's contributions to the network in which they are present, using these interactions to provide a measure of authority and centrality for the user. This measure is then used to attribute a query-independent interest score to each of the contributions the author makes, enabling us to provide other users with relevant information which has been of greatest interest to a community of like-minded users. This is done through the development of two algorithms: AuthorRank and MessageRank.
We present two real-world user experiments focused on multimedia annotation and browsing systems that we built; these systems were novel in themselves, bringing together video and text browsing as well as free-text annotation. Using these systems as examples of real-world applications for our approaches, we then look at a larger-scale experiment based on the author and citation networks of a ten-year period (1997-2007) of the ACM SIGIR conference on information retrieval. We use the citation context of SIGIR publications as a proxy for annotations, constructing large social networks between authors. Against these networks we show the effectiveness of incorporating user-generated content, or annotations, to improve information retrieval.
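The abstract names AuthorRank and MessageRank but does not spell out their mechanics. A minimal sketch of the general idea, assuming a PageRank-style iteration over an author interaction graph (this shape is an assumption for illustration, not the thesis's actual algorithm):

```python
def author_rank(links, damping=0.85, iterations=50):
    # Hypothetical sketch: PageRank-style authority scores over an
    # author graph. "links" maps each author to the authors whose work
    # they cite or annotate. This illustrates the general family of
    # algorithms, not AuthorRank's published definition.
    authors = set(links) | {a for tgts in links.values() for a in tgts}
    n = len(authors)
    rank = {a: 1.0 / n for a in authors}
    for _ in range(iterations):
        new = {a: (1 - damping) / n for a in authors}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Author with no outgoing links: redistribute evenly.
                for a in authors:
                    new[a] += damping * rank[src] / n
        rank = new
    return rank

scores = author_rank({"alice": ["bob", "carol"], "bob": ["carol"], "carol": []})
```

Authors who are cited or annotated by many well-connected peers accumulate higher scores, which could then seed a query-independent interest score for their contributions.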
Argumentative zoning: information extraction from scientific text
Let me tell you, writing a thesis is not always a barrel of laughs, and strange things can happen, too. For example, at the height of my thesis paranoia, I had a recurrent dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or berserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgeable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He both had the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profited from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications developed
The Role of Document Structure and Citation Analysis in Literature Information Retrieval
Literature Information Retrieval (IR) is the task of searching relevant publications given a particular information need expressed as a set of queries. With the staggering growth of scientific literature, it is critical to design effective retrieval solutions to facilitate efficient access to it. We hypothesize that genre-specific characteristics of scientific literature, such as metadata and citations, are potentially helpful for enhancing scientific literature search. We conducted systematic and extensive IR experiments on open information retrieval test collections to investigate their roles in enhancing literature retrieval effectiveness. This thesis consists of three major parts. First, we examined the role of document structure in literature search through comprehensive studies of the retrieval effectiveness of a set of structure-aware retrieval models on ad hoc scientific literature search tasks. Second, under the language-modeling retrieval framework, we studied exploiting citation and co-citation analysis results as sources of evidence for enhancing literature search. Specifically, we examined relevant-document distribution patterns over partitioned clusters of document citation and co-citation graphs; we examined seven ways of modeling document prior probabilities of being relevant based on document citation and co-citation analysis; and we studied the effectiveness of boosting retrieved documents with scores of their neighborhood documents in terms of co-citation counts, co-citation similarities, and Howard White's pennant scores. Third, we combined both structured retrieval features and citation-related features in developing machine-learned retrieval models for literature search and assessed the effectiveness of learning-to-rank algorithms and various literature-specific features. Our major findings are as follows.
State-of-the-art structure-aware retrieval models, though they reportedly perform well in known-item finding tasks, do not significantly outperform non-fielded baseline retrieval models in ad hoc literature information retrieval. Though relevant-document distributions over citation and co-citation network graph partitions reveal favorable patterns, citation and co-citation analysis results on the current iSearch test collection only modestly improve retrieval effectiveness. However, priors derived from co-citation analysis outperform those derived from citation analysis, and pennant scores for document expansion outperform raw co-citation counts or cosine similarity of co-citation counts. Our learning-to-rank experiments show that in a heterogeneous collection setting, citation-related features can significantly outperform baselines.
Ph.D., Information Studies, Drexel University, 201
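The abstract mentions modeling document prior probabilities from citation analysis within a language-modeling framework. A minimal sketch of one such prior, assuming a log-smoothed normalization of raw citation counts (this particular form is an assumption for illustration; the thesis evaluates seven variants whose exact definitions are not given here):

```python
import math

def citation_priors(citation_counts):
    # Hypothetical illustration: turn raw citation counts into
    # document prior probabilities P(d) proportional to log(1 + c_d).
    # In a language-modeling framework this prior would multiply the
    # query likelihood P(q|d) when ranking documents.
    weights = {d: math.log(1 + c) for d, c in citation_counts.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()} if total else weights

priors = citation_priors({"doc1": 120, "doc2": 5, "doc3": 0})
```

The log smoothing keeps a handful of very highly cited documents from dominating the prior; an uncited document receives a prior of zero under this particular choice.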
NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.
This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic, as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting the future impact of scientific publications using NLP driven features.
PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd
Evaluation of the Effectiveness of Cosine Similarity in Predicting Relevance between Paired Citing and Cited Sentences.
Citation analysis has a long history in Information Science. We examined the potential of cosine similarity to predict relevance between citing sentences and the articles they cite. An expert evaluated 22,697 pairs of cited and citing sentences and marked 544 as relevant to one another. Cosine similarity gave 8,386 of these pairs a similarity score over zero, which included 339 relevant pairs (4% precision, 65% recall). Under 0.01% of each cited article was relevant to the citing sentence, making precise retrieval challenging. We performed a detailed error analysis. Cosine similarity performance was reduced by insufficient window size, affixes, hyphenation, acronyms, and abbreviations. The following preprocessing steps would improve retrieval performance: using a stemming algorithm that accounts for prefixes, expanding the window of comparison from sentences to paragraphs, identifying synonyms, and expanding abbreviations. Further investigation of the possibilities of cosine similarity is necessary, but such investigation is worth pursuing.
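The cosine measure the study evaluates can be sketched in a few lines, assuming simple bag-of-words count vectors (the study's actual pipeline is not specified here; the error analysis suggests it would additionally stem, expand abbreviations, and widen the comparison window):

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters. A fuller
    # pipeline would also stem (handling prefixes) and expand
    # abbreviations, as the study's error analysis recommends.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    # Cosine of the angle between the two bag-of-words count vectors:
    # dot product of shared term counts over the product of norms.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

citing = "Salton proposed the vector space model for retrieval."
cited = "We present a vector space model of information retrieval."
score = cosine_similarity(citing, cited)
```

Any pair sharing at least one token scores above zero, which is why thresholding at zero yields high recall (65%) but very low precision (4%) in the study's setting.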