
    A Framework for Comparing Groups of Documents

    We present a general framework for comparing multiple groups of documents. A bipartite graph model is proposed where document groups are represented as one node set and the comparison criteria are represented as the other node set. Using this model, we present basic algorithms to extract insights into similarities and differences among the document groups. Finally, we demonstrate the versatility of our framework through an analysis of NSF funding programs for basic research. Comment: 6 pages; 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15)
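The bipartite model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name the comparison criteria or the algorithms, so the sketch assumes criteria are lowercase word tokens weighted by frequency and that groups are compared by cosine similarity over shared criteria. The function names `build_bipartite`, `group_similarity`, and `shared_criteria` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_bipartite(groups):
    """Return {group: {criterion: weight}}, i.e. weighted edges of a bipartite graph.

    Assumption: a criterion is a lowercase word token and the edge weight
    is how often that token occurs across the group's documents.
    """
    graph = {}
    for name, docs in groups.items():
        weights = defaultdict(int)
        for doc in docs:
            for token in doc.lower().split():
                weights[token] += 1
        graph[name] = dict(weights)
    return graph


def group_similarity(graph, g1, g2):
    """Cosine similarity between two groups over their criterion weights."""
    a, b = graph[g1], graph[g2]
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shared_criteria(graph, g1, g2):
    """Criteria nodes adjacent to both groups (what the groups have in common)."""
    return sorted(set(graph[g1]) & set(graph[g2]))


groups = {
    "A": ["graph mining", "text mining"],
    "B": ["text retrieval"],
    "C": ["protein folding"],
}
graph = build_bipartite(groups)
```

Differences between two groups can be read off the same structure by taking the criteria adjacent to one group but not the other.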

    Topic Similarity Networks: Visual Analytics for Large Document Sets

    We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia. Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
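As a rough illustration of a topic similarity network, the sketch below links topics whose word distributions are close. The abstract does not say which similarity measure or threshold the authors use, so this assumes similarity is one minus the Hellinger distance between topic-word distributions and that an edge is kept when it reaches a hypothetical `threshold`. The toy distributions stand in for the output of a fitted LDA model.

```python
from itertools import combinations
from math import sqrt


def hellinger_similarity(p, q):
    """1 - Hellinger distance between two discrete distributions of equal length."""
    distance = sqrt(0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)))
    return 1.0 - distance


def topic_network(topics, threshold=0.5):
    """Return edges (topic_a, topic_b, similarity) for sufficiently similar topics.

    `topics` maps a topic id to its word-probability vector; every vector
    is assumed to be indexed over the same vocabulary.
    """
    edges = []
    for a, b in combinations(sorted(topics), 2):
        sim = hellinger_similarity(topics[a], topics[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Toy topic-word distributions over a four-word vocabulary (illustrative only).
topics = {
    "t0": [0.5, 0.5, 0.0, 0.0],
    "t1": [0.5, 0.5, 0.0, 0.0],
    "t2": [0.0, 0.0, 0.5, 0.5],
}
edges = topic_network(topics)
```

The resulting edge list can be handed to any graph layout tool; connected clusters of topics then correspond to the larger themes the abstract mentions.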

    Mining Measured Information from Text

    We present an approach to extract measured information from text (e.g., a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such extractions are critically important across a wide range of domains, especially those involving search and exploration of scientific and technical documents. We first propose a rule-based entity extractor to mine measured quantities (i.e., a numeric value paired with a measurement unit), which supports a vast and comprehensive set of both common and obscure measurement units. Our method is highly robust and can correctly recover valid measured quantities even when significant errors are introduced through the process of converting document formats like PDF to plain text. Next, we describe an approach to extracting the properties being measured (e.g., the property "pixel pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we present MQSearch: the realization of a search engine with full support for measured information. Comment: 4 pages; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15)
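A measured quantity, as defined above, is a numeric value paired with a unit. The following is a minimal rule-based sketch of that idea, not the authors' extractor: it assumes a tiny hand-written unit list (`UNITS`, hypothetical) and a single regular expression, whereas the paper describes a far larger unit inventory and robustness to PDF conversion errors.

```python
import re

# Hypothetical, deliberately tiny unit inventory for illustration.
UNITS = ["degrees C", "kg/m^2", "kg", "mm", "m"]

# Longest units first so that "kg/m^2" wins over "kg" and "mm" over "m".
_unit_alternation = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)
QUANTITY = re.compile(
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>" + _unit_alternation + r")(?![A-Za-z])"
)


def extract_quantities(text):
    """Return a list of (value, unit) pairs found in `text`."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in QUANTITY.finditer(text)
    ]


found = extract_quantities(
    "a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2"
)
```

The negative lookahead keeps a short unit such as "m" from matching the start of an ordinary word like "meters".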

    CausalNLP: A Practical Toolkit for Causal Inference with Text

    The vast majority of existing methods and systems for causal inference assume that all variables under consideration are categorical or numerical (e.g., gender, price, blood pressure, enrollment). In this paper, we present CausalNLP, a toolkit for inferring causality from observational data that includes text in addition to traditional numerical and categorical variables. CausalNLP uses meta-learners for treatment effect estimation and supports using raw text and its linguistic properties as both a treatment and a "controlled-for" variable (e.g., confounder). The library is open-source and available at: https://github.com/amaiya/causalnlp. Comment: 7 pages

    Exploratory Analysis of Highly Heterogeneous Document Collections

    We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is shown to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

    New mixed ligand complexes of ruthenium(II) that incorporate a modified phenanthroline ligand: synthesis, spectral characterization and DNA binding

    The hexafluorophosphate and chloride salts of two ruthenium(II) complexes, viz. [Ru(phen)(ptzo)2]2+ and [Ru(ptzo)3]2+, where ptzo = 1,10-phenanthrolino[5,6-e]1,2,4-triazine-3-one, a new modified phenanthroline (phen) ligand, have been synthesised. These complexes have been characterised by infrared, UV-Vis, steady-state emission and 1H NMR spectroscopic methods. Results of absorption and fluorescence titration as well as thermal denaturation studies reveal that both the bis- and tris-complexes of ptzo show moderately strong affinity for binding with calf thymus (CT) DNA, with the binding constants being close to 10^5 M^-1 in each case. An intercalative mode of DNA binding has been suggested for both the complexes. Emission studies carried out in non-aqueous solvents and in aqueous media without DNA reveal that both [Ru(phen)(ptzo)2]2+ and [Ru(ptzo)3]2+ are weakly luminescent under these solution conditions. Successive addition of CT DNA to buffered aqueous solutions containing [Ru(phen)(ptzo)2]2+ results in an enhancement of the emission. These results have been discussed in the light of the dependence of the structure-specific deactivation processes of the MLCT state of the metallo-intercalator on the characteristic features of its DNA interaction. In doing so, attempts have been made to compare and contrast its properties with those of the analogous phenanthroline-based complexes, including the ones reported by us previously.

    Correlation of the Shift of the Axis of the Tibia with the Knee Instability

    Mechanical Engineering

    Cholate-interspersed porphyrin-anthraquinone conjugates: photonuclease activity of large sized, 'tweezer-like' molecules

    In a new approach towards the development of a 'dual-wavelength dual-mechanism' type of photosensitizer for use in photodynamic therapy (PDT), covalently linked bichromophoric systems comprising porphyrin (P) and anthraquinone (AnQ) subunits have been synthesized and fully characterized by FAB-MS, IR, UV-Visible and 1H NMR methods. The porphyrin donor and the anthraquinone acceptor subunits of these mono- or bis-intercalating hybrid molecules are interspersed with either cholate or polymethylene spacers. There exists minimal ground- and singlet-state interaction between the porphyrin and anthraquinone subunits in the giant-sized, cholate-interspersed P-AnQ systems, as revealed by a comparison of their spectroscopic and electrochemical properties with those of the corresponding individual reference compounds. On the other hand, quenching of fluorescence observed for the P-AnQ systems endowed with polymethylene spacers has been interpreted in terms of a possible intramolecular electron transfer between the singlet porphyrin and the anthraquinone acceptor. When excited into their porphyrin absorption band maxima, each new P-AnQ system could generate singlet molecular oxygen in good-to-moderate yield. Wavelength-dependent photonuclease activity of these new bis-intercalating species has been examined.

    A Hybrid Computational Intelligence based Technique for Automatic Cryptanalysis of Playfair Ciphers

    The Playfair cipher is a symmetric key cryptosystem based on encryption of digrams of letters. The cipher shows higher cryptanalytic complexity than a mono-alphabetic cipher because encryption uses 625 different letter digrams instead of the 26 letters of the Roman alphabet. Population-based techniques like the genetic algorithm (GA) and swarm intelligence (SI) are more suitable than the brute-force approach for cryptanalysis of the cipher because of the specific and unique structure of its Key Table. This work is an attempt to automate the process of cryptanalysis using hybrid computational intelligence. A hybrid technique (MPSO-GA) based on multiple particle swarm optimization (MPSO) and GA has been proposed and applied in solving Playfair ciphers. The authors have attempted to find the solution key applied in generating Playfair crypts by using the proposed hybrid technique to reduce the exhaustive search space. As per the computed results of the MPSO-GA technique, the correct solution was obtained for Playfair ciphers of 100 to 200 letters in length. The proposed technique provided better results than either the GA-based or the PSO-based technique. Furthermore, the technique was also able to recover a partial English text message for short Playfair ciphers of 80 to 120 characters in length.
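For readers unfamiliar with the cipher being attacked, the sketch below shows textbook Playfair encryption on digrams. It illustrates the cipher only, not the MPSO-GA cryptanalysis from the abstract. It assumes the common 5x5 Key Table variant in which J is merged into I and X is used as the filler letter; the helper names are hypothetical.

```python
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25 letters; J is merged into I


def build_key_table(key):
    """Return the 5x5 Key Table as a 25-character string, read row by row."""
    seen = []
    for ch in key.upper().replace("J", "I") + ALPHABET:
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def to_digrams(plaintext):
    """Split text into letter pairs, inserting X between doubled letters."""
    letters = [c for c in plaintext.upper().replace("J", "I") if c in ALPHABET]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else None
        if b is None or a == b:
            pairs.append((a, "X"))  # pad a trailing letter or split a double
            i += 1
        else:
            pairs.append((a, b))
            i += 2
    return pairs


def encrypt(plaintext, key):
    """Encrypt with the three Playfair rules: same row, same column, rectangle."""
    table = build_key_table(key)
    out = []
    for a, b in to_digrams(plaintext):
        ra, ca = divmod(table.index(a), 5)
        rb, cb = divmod(table.index(b), 5)
        if ra == rb:  # same row: take the letter to the right, wrapping
            out.append(table[ra * 5 + (ca + 1) % 5])
            out.append(table[rb * 5 + (cb + 1) % 5])
        elif ca == cb:  # same column: take the letter below, wrapping
            out.append(table[((ra + 1) % 5) * 5 + ca])
            out.append(table[((rb + 1) % 5) * 5 + cb])
        else:  # rectangle: keep the row, swap the columns
            out.append(table[ra * 5 + cb])
            out.append(table[rb * 5 + ca])
    return "".join(out)


ciphertext = encrypt("hide the gold in the tree stump", "playfair example")
```

A cryptanalysis method such as the one described in the abstract searches over candidate Key Tables, which is why the table's structure matters more than the raw size of the key space.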