Automatic building up documents taxonomy through metadata analysis
In cooperation with CNIPA (the Italian Authority for the use
of ICT in the Public Administration), we studied and developed
a new solution for the effective access to legal data, especially
law texts, norms and rules. Such information, represented in XML-based
structured documents, is also available at the section
or paragraph level. We are experimenting with this kind of system
on civil status data, because a vertical research project,
structured at the semantic level, allows information to be collected
and a body of uniform rules to be built. The system is
based on statistical similarity relationships, and it lets the user
also consider information which, even if not immediately
returned as a result of the query, could nevertheless
be relevant to the user's information needs, because
it discovers new information and relationships within the set of
documents. The system provides the usual functionality of ad
hoc retrieval of laws, sections and paragraphs of interest, implemented
by means of XML-retrieval techniques. In addition, given
the text of a certain law, section or paragraph, it applies document
similarity algorithms to derive the set of laws, sections and
paragraphs which probably treat the same
subject. Furthermore, by suitably parsing the text, the
system extracts from each document all explicit references to other
laws (including references to individual sections and paragraphs).
In this way the system is able, in response to a given query, to return
not only all laws (and the corresponding sections and paragraphs)
which may be relevant to the specified subject, but also,
for each returned law, a set of laws (sections, paragraphs) which
are either explicitly (by means of explicit reference in the text) or
implicitly (by statistical similarity) related to it. Then these items
are ranked by applying a suitable, user-tunable function of both
explicit (in a link-analysis style) and implicit references. By applying
the same approach iteratively to each law, section or
paragraph considered, the user is able to browse the document
corpus, moving according to the presence of significant (explicit
or implicit) relationships among text items. This search technology
employs a new class of database designed for exploring information,
not just managing transactions, and it lets users prioritize
and personalize their choices, rather than directing them down a
classification path. Users can thus find what they are looking for
and discover new information and relationships.
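
The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the system's actual implementation. It assumes a toy corpus, an invented "Law N" citation format, TF-IDF cosine similarity as a stand-in for the unspecified statistical similarity, and a single weight `alpha` as a stand-in for the user-tunable ranking function. All identifiers (`CORPUS`, `explicit_links`, `related`, `alpha`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: the identifiers, texts and the "Law N" citation
# format are invented for illustration and do not come from the system.
CORPUS = {
    "law-1": "Registration of births and marriages in the civil registry. See Law 2.",
    "law-2": "Civil registry offices keep records of births, marriages and deaths.",
    "law-3": "Rules for the taxation of imported goods.",
}

CITATION = re.compile(r"\bLaw (\d+)\b")


def explicit_links(corpus):
    """Map each document to the set of documents it cites explicitly."""
    links = {}
    for doc_id, text in corpus.items():
        cited = {f"law-{num}" for num in CITATION.findall(text)}
        links[doc_id] = {c for c in cited if c in corpus and c != doc_id}
    return links


def tfidf_vectors(corpus):
    """Build sparse TF-IDF vectors (term -> weight) with smoothed IDF."""
    tokenized = {d: re.findall(r"[a-z]+", t.lower()) for d, t in corpus.items()}
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))
    n_docs = len(corpus)
    vectors = {}
    for doc_id, toks in tokenized.items():
        counts = Counter(toks)
        vectors[doc_id] = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(doc_id, corpus, alpha=0.5):
    """Rank the other documents by a weighted mix of explicit and implicit links.

    alpha = 1.0 uses only explicit citations; alpha = 0.0 uses only similarity.
    """
    links = explicit_links(corpus)
    vectors = tfidf_vectors(corpus)
    scored = []
    for other in corpus:
        if other == doc_id:
            continue
        explicit = 1.0 if other in links[doc_id] else 0.0
        implicit = cosine(vectors[doc_id], vectors[other])
        scored.append((other, alpha * explicit + (1.0 - alpha) * implicit))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `related("law-1", CORPUS)` returns the other documents ordered by combined score. Calling `related` again on any returned item mimics the iterative browsing described above, and changing `alpha` shifts the ranking between citation-driven and similarity-driven.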