46,398 research outputs found
Using IR techniques to improve Automated Text Classification
This paper performs a study on the pre-processing phase of the automated text classification problem. We use the linear Support Vector Machine paradigm applied to datasets written in the English and the European Portuguese languages – the Reuters and the Portuguese Attorney General’s Office datasets, respectively.
The study can be seen as a search, for the best document representa- tion, in three different axes: the feature reduction (using linguistic in- formation), the feature selection (using word frequencies) and the term weighting (using information retrieval measures)
Report on the Information Retrieval Festival (IRFest2017)
The Information Retrieval Festival took place in April 2017 in Glasgow. The focus of the workshop was to bring together IR researchers from the various Scottish universities and beyond in order to facilitate more awareness, increased interaction and reflection on the status of the field and its future. The program included an industry session, research talks, demos and posters as well as two keynotes. The first keynote was delivered by Prof. Jaana Kekalenien, who provided a historical, critical reflection of realism in Interactive Information Retrieval Experimentation, while the second keynote was delivered by Prof. Maarten de Rijke, who argued for more Artificial Intelligence usage in IR solutions and deployments. The workshop was followed by a "Tour de Scotland" where delegates were taken from Glasgow to Aberdeen for the European Conference in Information Retrieval (ECIR 2017
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Grand Challenges of Traceability: The Next Ten Years
In 2007, the software and systems traceability community met at the first
Natural Bridge symposium on the Grand Challenges of Traceability to establish
and address research goals for achieving effective, trustworthy, and ubiquitous
traceability. Ten years later, in 2017, the community came together to evaluate
a decade of progress towards achieving these goals. These proceedings document
some of that progress. They include a series of short position papers,
representing current work in the community organized across four process axes
of traceability practice. The sessions covered topics from Trace Strategizing,
Trace Link Creation and Evolution, Trace Link Usage, real-world applications of
Traceability, and Traceability Datasets and benchmarks. Two breakout groups
focused on the importance of creating and sharing traceability datasets within
the research community, and discussed challenges related to the adoption of
tracing techniques in industrial practice. Members of the research community
are engaged in many active, ongoing, and impactful research projects. Our hope
is that ten years from now we will be able to look back at a productive decade
of research and claim that we have achieved the overarching Grand Challenge of
Traceability, which seeks for traceability to be always present, built into the
engineering process, and for it to have "effectively disappeared without a
trace". We hope that others will see the potential that traceability has for
empowering software and systems engineers to develop higher-quality products at
increasing levels of complexity and scale, and that they will join the active
community of Software and Systems traceability researchers as we move forward
into the next decade of research
Grand Challenges of Traceability: The Next Ten Years
In 2007, the software and systems traceability community met at the first
Natural Bridge symposium on the Grand Challenges of Traceability to establish
and address research goals for achieving effective, trustworthy, and ubiquitous
traceability. Ten years later, in 2017, the community came together to evaluate
a decade of progress towards achieving these goals. These proceedings document
some of that progress. They include a series of short position papers,
representing current work in the community organized across four process axes
of traceability practice. The sessions covered topics from Trace Strategizing,
Trace Link Creation and Evolution, Trace Link Usage, real-world applications of
Traceability, and Traceability Datasets and benchmarks. Two breakout groups
focused on the importance of creating and sharing traceability datasets within
the research community, and discussed challenges related to the adoption of
tracing techniques in industrial practice. Members of the research community
are engaged in many active, ongoing, and impactful research projects. Our hope
is that ten years from now we will be able to look back at a productive decade
of research and claim that we have achieved the overarching Grand Challenge of
Traceability, which seeks for traceability to be always present, built into the
engineering process, and for it to have "effectively disappeared without a
trace". We hope that others will see the potential that traceability has for
empowering software and systems engineers to develop higher-quality products at
increasing levels of complexity and scale, and that they will join the active
community of Software and Systems traceability researchers as we move forward
into the next decade of research
- …