Retrievability in an Integrated Retrieval System: An Extended Study
Retrievability measures the influence a retrieval system has on the access to
information in a given collection of items. This measure can help in making an
evaluation of the search system based on which insights can be drawn. In this
paper, we investigate the retrievability in an integrated search system
consisting of items from various categories, particularly focussing on
datasets, publications, and variables in a real-life Digital Library
(DL). The traditional metrics, that is, the Lorenz curve and Gini coefficient,
are employed to visualize the diversity in retrievability scores of the
three retrievable document types (specifically datasets, publications,
and variables). Our results show a significant popularity bias with certain
items being retrieved more often than others. Particularly, it has been shown
that certain datasets are more likely to be retrieved than other datasets in
the same category. In contrast, the retrievability scores of items from the
variable or publication category are more evenly distributed. We have observed
that the distribution of document retrievability is more diverse for datasets
as compared to publications and variables.
Comment: To appear in International Journal on Digital Libraries (IJDL). arXiv
admin note: substantial text overlap with arXiv:2205.0093
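As an illustration of the two metrics named in this abstract, the following is a minimal sketch of how a Lorenz curve and a Gini coefficient could be computed from a list of retrievability scores. The function names and the example scores are hypothetical and are not taken from the paper; the sketch assumes non-negative scores.

```python
def gini(scores):
    """Gini coefficient of non-negative scores (0 = perfectly even)."""
    xs = sorted(scores)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n over ascending x
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def lorenz_points(scores):
    """(fraction of items, cumulative fraction of total score), ascending."""
    xs = sorted(scores)
    total = sum(xs)
    points = [(0.0, 0.0)]
    if not xs or total == 0:
        return points
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


if __name__ == "__main__":
    # Hypothetical retrievability scores for five items (illustration only).
    example_scores = [0, 1, 1, 3, 15]
    print(gini(example_scores))
    print(lorenz_points(example_scores))
```

A Gini coefficient near 0 indicates evenly distributed retrievability, while values closer to 1 indicate that a few items account for most retrievals.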
A Comparative Analysis of Retrievability and PageRank Measures
The accessibility of documents within a collection holds a pivotal role in
Information Retrieval, signifying the ease of locating specific content in a
collection of documents. This accessibility can be achieved via two distinct
avenues. The first is through some retrieval model using a keyword or other
feature-based search, and the second is by navigating to a document through
the links associated with it, if available. Metrics such as PageRank, Hub, and
Authority illuminate the pathways through which documents can be discovered
within the network of content while the concept of Retrievability is used to
quantify the ease with which a document can be found by a retrieval model. In
this paper, we compare these two perspectives, PageRank and retrievability, as
they quantify the importance and discoverability of content in a corpus.
Through empirical experimentation on benchmark datasets, we demonstrate a
subtle similarity between retrievability and PageRank, particularly
distinguishable for larger datasets.
Comment: Accepted at FIRE 202
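The link-based side of this comparison can be illustrated with a minimal power-iteration sketch of PageRank. The graph, the function name, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration; the abstract does not state which implementation or settings the authors used, nor which statistic they used to compare the two measures.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping node -> list of targets."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-document link graph (illustration only).
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for doc, score in sorted(pagerank(graph).items()):
        print(doc, round(score, 4))
```

The resulting scores could then be set against per-document retrievability scores, for example through a rank correlation, to examine how closely the two orderings agree.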
Automated Attribute Extraction from Legal Proceedings
The escalating number of pending cases is a growing concern world-wide.
Recent advancements in digitization have opened up possibilities for leveraging
artificial intelligence (AI) tools in the processing of legal documents.
Adopting a structured representation for legal documents, as opposed to a mere
bag-of-words flat text representation, can significantly enhance processing
capabilities. With the aim of achieving this objective, we put forward a set of
diverse attributes for criminal case proceedings. We use a state-of-the-art
sequence labeling framework to automatically extract attributes from the legal
documents. Moreover, we demonstrate the efficacy of the extracted attributes in
a downstream task, namely legal judgment prediction.
Comment: Presented in Mining and Learning in the Legal Domain (MLLD) workshop
202
Leveraging hierarchical self-assembly pathways for realizing colloidal photonic crystals
Colloidal open crystals are attractive materials,
especially for their photonic applications. Self-assembly appeals
as a bottom-up route for structure fabrication, but self-assembly
of colloidal open crystals has proven elusive because their low
coordination makes them mechanically unstable. For such a
bottom-up route to yield a desired colloidal open crystal, the
target structure is required to be thermodynamically favored for
designer building blocks and also kinetically accessible via self-assembly
pathways in preference to metastable structures. Additionally, the selection of a particular polymorph poses a challenge for certain much sought-after colloidal open crystals for their applications as photonic crystals. Here, we devise hierarchical self-assembly pathways, which, starting from designer triblock patchy particles, yield in a cascade of well-separated associations first tetrahedral clusters and then tetrastack crystals. The designed pathways avoid trapping into an amorphous phase. Our analysis reveals how such a two-stage self-assembly pathway via tetrahedral clusters promotes crystallization by suppressing five- and seven-membered rings that hinder the emergence of the ordered structure. We also find that slow annealing promotes a bias toward the cubic polymorph relative to the hexagonal counterpart. Finally, we calculate the photonic band structures, showing that the cubic polymorph exhibits a complete photonic band gap for the dielectric filling fraction directly realizable from the designer triblock patchy particles. Unexpectedly, we find that the hexagonal polymorph also supports a complete photonic band gap, albeit only for an increased filling fraction, which can be realized via postassembly processing.
LeDA: a system for legal data annotation
This paper presents LeDA, a system for Legal Data Annotation. The system offers the functionality of annotating and categorising text spans representing legal concepts that capture the topic of a document, and also supports a meta-annotator to adjudicate the ground truth created by different annotators. Notably, our system supports a dynamic update of the ontology by enabling the creation of new legal concepts. Currently employed to annotate key legal concepts, LeDA aims to construct concept-based semantic representations for tasks such as similar case retrieval and judgment prediction.
Genome wide association study of uric acid in Indian population and interaction of identified variants with type 2 diabetes
An abnormal level of serum uric acid (SUA) is an important marker and risk factor for complex diseases including type 2 diabetes (T2DM). Since the genetic determinants of uric acid in Indians are totally unexplored, we tried to identify common variants associated with SUA in Indians using a genome-wide association study (GWAS). Association of five known variants in the SLC2A9 and SLC22A11 genes with SUA level in 4,834 normoglycemics (1,109 in the discovery and 3,725 in the validation phase) was revealed, with different effect sizes in Indians compared to other major ethnic populations of the world. Combined analysis of 1,077 T2DM subjects (772 in the discovery and 305 in the validation phase) and normoglycemics revealed an additional GWAS signal in the ABCG2 gene. Differences in effect sizes of ABCG2 and SLC2A9 gene variants were observed between normoglycemics and T2DM patients. We identified two novel variants near the long non-coding RNA genes AL356739.1 and AC064865.1 with nearly genome-wide significance level. Meta-analysis and in silico replication in 11,745 individuals from the AUSTWIN consortium improved the association for rs12206002 in the AL356739.1 gene to sub-genome-wide association level. Our results extend the association of the SLC2A9, SLC22A11 and ABCG2 genes with SUA level in Indians and enrich the body of evidence for the interrelationship between SUA level and T2DM.