
    Retrievability in an Integrated Retrieval System: An Extended Study

    Retrievability measures the influence a retrieval system has on access to information in a given collection of items. This measure supports an evaluation of the search system from which insights can be drawn. In this paper, we investigate retrievability in an integrated search system consisting of items from various categories, focusing in particular on datasets, publications, and variables in a real-life Digital Library (DL). The traditional metrics, namely the Lorenz curve and the Gini coefficient, are employed to visualize and quantify the inequality in retrievability scores across the three retrievable document types (datasets, publications, and variables). Our results show a significant popularity bias, with certain items being retrieved more often than others. In particular, certain datasets are more likely to be retrieved than other datasets in the same category. In contrast, the retrievability scores of items in the variable and publication categories are more evenly distributed. Overall, the distribution of document retrievability is more unequal for datasets than for publications and variables.
    Comment: To appear in International Journal on Digital Libraries (IJDL). arXiv admin note: substantial text overlap with arXiv:2205.0093
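
As an illustration of the measures named above, the following sketch computes cumulative retrievability scores and then their Gini coefficient and Lorenz curve. It is a minimal sketch under stated assumptions, not the paper's implementation: every query is weighted equally, a document counts as retrieved for a query when it appears within a rank cutoff `c` (the default of 10 is a hypothetical choice), and the Gini coefficient uses the common formula G = Σᵢ (2i − n − 1)·rᵢ / (n·Σⱼ rⱼ) over scores sorted in ascending order. The function names `retrievability`, `gini`, and `lorenz_curve` are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def retrievability(
    rankings: Iterable[Sequence[str]],
    docs: Iterable[str],
    cutoff: int = 10,
) -> Dict[str, int]:
    """Cumulative retrievability: r(d) = number of queries ranking d within the top `cutoff`.

    Assumes equal query weights; documents never retrieved keep a score of 0,
    and document ids not listed in `docs` are ignored.
    """
    scores: Dict[str, int] = {d: 0 for d in docs}
    for ranking in rankings:
        for doc in ranking[:cutoff]:
            if doc in scores:
                scores[doc] += 1
    return scores


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: 0 means perfectly equal scores; larger values mean more concentration."""
    xs = sorted(values)  # ascending order, as the formula requires
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def lorenz_curve(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (share of documents, share of total retrievability), least-retrieved first."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        # With all-zero scores, fall back to the line of perfect equality.
        share = running / total if total else i / n
        points.append((i / n, share))
    return points


if __name__ == "__main__":
    toy_rankings = [["d1", "d2"], ["d1", "d3"], ["d1", "d2"]]
    r = retrievability(toy_rankings, ["d1", "d2", "d3", "d4"], cutoff=2)
    print(r)
    print(gini(list(r.values())))
    print(lorenz_curve(list(r.values())))
```

With these three toy rankings over four documents and c = 2, the scores are 3, 2, 1, and 0, giving G = 10/24 ≈ 0.417, while identical scores give G = 0. A variant of the formula divides by (n − 1) instead of n, so that G reaches exactly 1 when a single document receives all retrievals.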

    A Comparative Analysis of Retrievability and PageRank Measures

    The accessibility of documents within a collection plays a pivotal role in Information Retrieval, signifying the ease of locating specific content in a collection of documents. This accessibility can be achieved via two distinct avenues. The first is through a retrieval model using keyword or other feature-based search; the second is by navigating the links associated with a document, where such links are available. Metrics such as PageRank, Hub, and Authority illuminate the pathways through which documents can be discovered within the network of content, while the concept of retrievability quantifies the ease with which a document can be found by a retrieval model. In this paper, we compare these two perspectives, PageRank and retrievability, as they quantify the importance and discoverability of content in a corpus. Through empirical experimentation on benchmark datasets, we demonstrate a subtle similarity between retrievability and PageRank that is most evident for larger datasets.
    Comment: Accepted at FIRE 202
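
For the link-based side of this comparison, PageRank can be computed by power iteration. The sketch below is a generic textbook formulation, not the authors' experimental setup: the damping factor of 0.85, the convergence tolerance, and the uniform redistribution of rank held by dangling documents (those with no out-links) are assumptions, and `pagerank` is a hypothetical helper name.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph given as node -> out-links."""
    # Collect every node, including those that only appear as link targets.
    node_set = set(links)
    for targets in links.values():
        node_set.update(targets)
    nodes = sorted(node_set)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Rank held by dangling nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes an equal share of its rank along each out-link.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    graph = {"b": ["a"], "c": ["a"], "a": []}
    print(pagerank(graph))
```

One way to relate the two views is to correlate these scores with per-document retrievability scores using a rank correlation; this pairing is illustrative only.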

    Automated Attribute Extraction from Legal Proceedings

    The escalating number of pending cases is a growing concern worldwide. Recent advances in digitization have opened up possibilities for leveraging artificial intelligence (AI) tools in the processing of legal documents. Adopting a structured representation for legal documents, as opposed to a flat bag-of-words text representation, can significantly enhance processing capabilities. To this end, we put forward a set of diverse attributes for criminal case proceedings. We use a state-of-the-art sequence labeling framework to automatically extract these attributes from legal documents. Moreover, we demonstrate the efficacy of the extracted attributes in a downstream task, namely legal judgment prediction.
    Comment: Presented at the Mining and Learning in the Legal Domain (MLLD) workshop 202
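
Sequence labeling frameworks commonly emit one tag per token in a BIO scheme (B- begins an attribute span, I- continues it, O marks tokens outside any span). Assuming such output, the sketch below shows how tagged tokens could be turned into a structured attribute record. The tag scheme, the attribute labels `SECTION` and `PLACE`, the example sentence, and the helpers `bio_to_spans` and `to_record` are hypothetical illustrations, not the attribute set proposed in the paper.

```python
from typing import Dict, List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags into (label, text) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    buf: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts: flush any open span first.
            if label is not None:
                spans.append((label, " ".join(buf)))
            label, buf = tag[2:], [tok]
        elif tag.startswith("I-"):
            buf.append(tok)  # continue the open span with the same label
        else:  # "O": close any open span
            if label is not None:
                spans.append((label, " ".join(buf)))
            label, buf = None, []
    if label is not None:
        spans.append((label, " ".join(buf)))
    return spans


def to_record(spans: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group extracted spans by attribute label into a structured record."""
    record: Dict[str, List[str]] = {}
    for label, text in spans:
        record.setdefault(label, []).append(text)
    return record


if __name__ == "__main__":
    toks = "Accused was charged under Section 302 IPC in Delhi".split()
    tags = ["O", "O", "O", "O", "B-SECTION", "I-SECTION", "I-SECTION", "O", "B-PLACE"]
    print(to_record(bio_to_spans(toks, tags)))
```

An I- tag that does not continue a span of the same label is treated as the start of a new span, a common lenient decoding choice.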

    Leveraging hierarchical self-assembly pathways for realizing colloidal photonic crystals

    Colloidal open crystals are attractive materials, especially for photonic applications. Self-assembly is an appealing bottom-up route to structure fabrication, but the self-assembly of colloidal open crystals has proven elusive because their low coordination makes them mechanically unstable. For such a bottom-up route to yield a desired colloidal open crystal, the target structure must be thermodynamically favored for the designer building blocks and also kinetically accessible via self-assembly pathways in preference to metastable structures. Additionally, selecting a particular polymorph poses a challenge for certain colloidal open crystals that are much sought after as photonic crystals. Here, we devise hierarchical self-assembly pathways that, starting from designer triblock patchy particles, yield, in a cascade of well-separated association steps, first tetrahedral clusters and then tetrastack crystals. The designed pathways avoid trapping in an amorphous phase. Our analysis reveals how such a two-stage self-assembly pathway via tetrahedral clusters promotes crystallization by suppressing the five- and seven-membered rings that hinder the emergence of the ordered structure. We also find that slow annealing biases assembly toward the cubic polymorph relative to its hexagonal counterpart. Finally, we calculate the photonic band structures, showing that the cubic polymorph exhibits a complete photonic band gap at the dielectric filling fraction directly realizable from the designer triblock patchy particles. Unexpectedly, we find that the hexagonal polymorph also supports a complete photonic band gap, albeit only at an increased filling fraction, which can be realized via post-assembly processing.

    LeDA: a system for legal data annotation

    This paper presents LeDA, a system for Legal Data Annotation. The system supports annotating and categorising text spans that represent legal concepts capturing the topic of a document, and also allows a meta-annotator to adjudicate the ground truth created by different annotators. Notably, the system supports dynamic updates to the ontology by enabling the creation of new legal concepts. Currently employed to annotate key legal concepts, LeDA aims to construct concept-based semantic representations for tasks such as similar case retrieval and judgment prediction.

    Genome wide association study of uric acid in Indian population and interaction of identified variants with type 2 diabetes

    An abnormal Serum Uric Acid (SUA) level is an important marker and risk factor for complex diseases, including Type 2 diabetes (T2DM). Since the genetic determinants of uric acid in Indians are entirely unexplored, we sought to identify common variants associated with SUA in Indians using a Genome Wide Association Study (GWAS). Association of five known variants in the SLC2A9 and SLC22A11 genes with SUA level was revealed in 4,834 normoglycemics (1,109 in the discovery and 3,725 in the validation phase), with effect sizes in Indians that differ from those in other major ethnic populations of the world. Combined analysis of 1,077 T2DM subjects (772 in the discovery and 305 in the validation phase) and normoglycemics revealed an additional GWAS signal in the ABCG2 gene. Differences in the effect sizes of ABCG2 and SLC2A9 gene variants were observed between normoglycemics and T2DM patients. We identified two novel variants near the long non-coding RNA genes AL356739.1 and AC064865.1 at nearly genome-wide significance. Meta-analysis and in silico replication in 11,745 individuals from the AUSTWIN consortium improved the association for rs12206002 in the AL356739.1 gene to a sub-genome-wide association level. Our results extend the association of the SLC2A9, SLC22A11, and ABCG2 genes with SUA level to Indians and add to the body of evidence for the interrelationship between SUA level and T2DM.