10,140 research outputs found
Legal Information and the Development of American Law: Writings on the Form and Structure of the Published Law
Robert C. Berring\u27s writings about the impacts of electronic databases, the Internet, and other communications technologies on legal research and practice are an essential part of a larger literature that explores the ways in which the forms and structures of published legal information have influenced how American lawyers think about the law. This paper reviews Berring\u27s writings, along with those of other writers concerned with these questions, focusing on the implications of Berring\u27s idea that in the late nineteenth century American legal publishers created a conceptual universe of thinkable thoughts through which U.S. lawyers came to view the law. It concludes that, spurred by Berring and others, the literature of legal information has become far reaching in scope and interdisciplinary in approach, while the themes struck in Berring\u27s work continue to inform the scholarship of newer writers
Tagging, Folksonomy & Co - Renaissance of Manual Indexing?
This paper gives an overview of current trends in manual indexing on the Web.
Along with a general rise of user generated content there are more and more
tagging systems that allow users to annotate digital resources with tags
(keywords) and share their annotations with other users. Tagging is frequently
seen in contrast to traditional knowledge organization systems or as something
completely new. This paper shows that tagging should better be seen as a
popular form of manual indexing on the Web. Difference between controlled and
free indexing blurs with sufficient feedback mechanisms. A revised typology of
tagging systems is presented that includes different user roles and knowledge
organization systems with hierarchical relationships and vocabulary control. A
detailed bibliography of current research in collaborative tagging is included.Comment: Preprint. 12 pages, 1 figure, 54 reference
Challenging Ubiquitous Inverted Files
Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue however that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach
Universal Indexes for Highly Repetitive Document Collections
Indexing highly repetitive collections has become a relevant problem with the
emergence of large repositories of versioned documents, among other
applications. These collections may reach huge sizes, but are formed mostly of
documents that are near-copies of others. Traditional techniques for indexing
these collections fail to properly exploit their regularities in order to
reduce space.
We introduce new techniques for compressing inverted indexes that exploit
this near-copy regularity. They are based on run-length, Lempel-Ziv, or grammar
compression of the differential inverted lists, instead of the usual practice
of gap-encoding them. We show that, in this highly repetitive setting, our
compression methods significantly reduce the space obtained with classical
techniques, at the price of moderate slowdowns. Moreover, our best methods are
universal, that is, they do not need to know the versioning structure of the
collection, nor that a clear versioning structure even exists.
We also introduce compressed self-indexes in the comparison. These are
designed for general strings (not only natural language texts) and represent
the text collection plus the index structure (not an inverted index) in
integrated form. We show that these techniques can compress much further, using
a small fraction of the space required by our new inverted indexes. Yet, they
are orders of magnitude slower.Comment: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Sk{\l}odowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
Special Libraries, December 1977
Volume 68, Issue 12https://scholarworks.sjsu.edu/sla_sl_1977/1008/thumbnail.jp
- …