A Survey on Index Support for Item Set Mining

Author: Dr.P K Singhal
Senthil Prakash.T
Publication venue: Global Journals Inc. (US)
Publication date: 30/06/2011

Handling the huge amount of information stored in modern databases is very difficult. Association rule mining is currently used to manage these databases, but it is a costly process that requires a significant amount of time and memory. It is therefore necessary to develop an approach that overcomes these difficulties, and suitable data structures and algorithms must be developed to perform item set mining effectively. An index includes all the characteristics potentially needed during the mining task, so the extraction can be executed with the help of the index, without accessing the database. A database index is a data structure that speeds up information retrieval operations on a database table at very low cost, in exchange for increased storage space. Using an index permits user interaction, in which the user can specify different attributes for item set extraction. The extraction can therefore be completed using the index alone, without accessing the original database. The index also supports reuse: item sets can be mined with any support threshold. This paper surveys the index support for item set mining proposed by various authors.
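The idea of mining from an index instead of the database can be illustrated with a minimal sketch. It assumes a simple inverted index (item to transaction ids) and brute-force candidate enumeration; it is not the structure proposed by any of the surveyed authors, and the names `build_index` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def build_index(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def frequent_itemsets(index, min_support, max_size=2):
    """Return {itemset: support} using only the index, never the raw data."""
    result = {}
    items = sorted(index)
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            # Support of an item set = size of the intersection of its id sets.
            tids = set.intersection(*(index[i] for i in combo))
            if len(tids) >= min_support:
                result[combo] = len(tids)
    return result


# Toy data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
index = build_index(transactions)

# The same index is reused for two different support thresholds.
loose = frequent_itemsets(index, min_support=2)
strict = frequent_itemsets(index, min_support=3)
```

The index is built once in a single pass; after that, each query with a new threshold only intersects id sets, which is the reuse property the abstract describes.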

Retirement Wealth Across Cohorts: The Role of Earnings Inequality and Pension Changes

Author: Ann Huff Stevens

Changes in labor markets over the past 30 years suggest upcoming changes in the distribution of wealth at retirement. Baby boom cohorts have spent the majority of their prime earnings years in a labor market with increased earnings inequality. This paper investigates how changes in lifetime earnings distributions affect the distribution of retirement wealth among cohorts retiring over the next decade. I use data from the Health and Retirement Study from 1992 to 2004 to estimate the relationship between lifetime earnings, pre-retirement private wealth and Social Security wealth. I show that changes in the lower half of the male earnings distribution explain a substantial portion of changes in the distribution of pre-retirement wealth. Growth in women’s earnings across the cohorts does not offset these declines in wealth associated with male earnings. When pensions are added to the measure of wealth, the role of earnings is even larger, reflecting a strong correlation between changes in earnings across these cohorts and changes in the values of their employer-provided pensions. These pension changes do not appear to operate via changes in pension structures (defined benefit versus defined contribution). The present value of wealth from future Social Security benefits, in contrast, grows in real terms throughout most of the distribution. At the bottom of the male distribution of Social Security wealth, reductions in lifetime earnings limit this growth in real benefits, while at the top of the distribution earnings growth amplifies expected growth in Social Security wealth.

A framework for clustering and adaptive topic tracking on evolving text and social media data streams.

Author: Nutakki Gopi Chand
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/12/2017

Recent advances and widespread usage of online web services and social media platforms, coupled with ubiquitous low cost devices, mobile technologies, and increasing capacity of lower cost storage, have led to a proliferation of Big Data, ranging from news, e-commerce clickstreams, and online business transactions to continuous event logs and social media expressions. These large amounts of online data, often referred to as data streams because they are generated at extremely high throughput or velocity, can make conventional and classical data analytics methodologies obsolete. For these reasons, the issues of management and analysis of data streams have been researched extensively in recent years. The special case of social media Big Data brings additional challenges, particularly because of the unstructured nature of the data, specifically free text. One classical approach to mining text data has been Topic Modeling. Topic Models are statistical models that can be used for discovering the abstract "topics" that may occur in a corpus of documents. Topic models have emerged as a powerful technique in machine learning and data science, providing a great balance between simplicity and complexity. They also provide sophisticated insight without the need for real natural language understanding. However, they have not been designed to cope with the type of text data that is abundant on social media platforms, but rather for traditional medium-size corpora consisting of longer documents, adhering to a specific language and typically spanning a stable set of topics. Unlike traditional document corpora, social media messages tend to be very short, sparse, and noisy, and do not adhere to a standard vocabulary, linguistic patterns, or stable topic distributions.
They are also generated at a high velocity that imposes high demands on topic modeling, and their evolving or dynamic nature makes any set of results from topic modeling quickly become stale in the face of changes in the textual content and topics discussed within social media streams. In this dissertation, we propose an integrated topic modeling framework built on top of an existing stream-clustering framework called Stream-Dashboard, which can extract, isolate, and track topics over any given time period. In this new framework, Stream-Dashboard first clusters the data stream points into homogeneous groups. Then data from each group is passed to the topic modeling framework, which extracts finer topics from the group. The proposed framework tracks the evolution of the clusters over time to detect milestones corresponding to changes in topic evolution, and to trigger an adaptation of the learned groups and topics at each milestone. The proposed approach to topic modeling differs from a generic Topic Modeling approach because it works in a compartmentalized fashion, where the input document stream is split into distinct compartments, and Topic Modeling is applied on each compartment separately. Furthermore, we propose extensions to existing topic modeling and stream clustering methods, including: an adaptive query reformulation approach to help focus the topic discovery over time; a topic modeling extension with adaptive hyper-parameter and with infinite vocabulary; and an adaptive stream clustering algorithm incorporating the automated estimation of dynamic, cluster-specific temporal scales for adaptive forgetting to help facilitate clustering in a fast-evolving data stream.
Our experimental results show that the proposed adaptive forgetting clustering algorithm can mine better quality clusters; that our proposed compartmentalized framework is able to mine topics of better quality compared to competitive baselines; and that the proposed framework can automatically adapt to focus on changing topics using the proposed query reformulation strategy.

University of Louisville

The LOFAR Transients Pipeline

Current and future astronomical survey facilities provide a remarkably rich opportunity for transient astronomy, combining unprecedented fields of view with high sensitivity and the ability to access previously unexplored wavelength regimes. This is particularly true of LOFAR, a recently-commissioned, low-frequency radio interferometer, based in the Netherlands and with stations across Europe. The identification of and response to transients is one of LOFAR's key science goals. However, the large data volumes which LOFAR produces, combined with the scientific requirement for rapid response, make automation essential. To support this, we have developed the LOFAR Transients Pipeline, or TraP. The TraP ingests multi-frequency image data from LOFAR or other instruments and searches it for transients and variables, providing automatic alerts of significant detections and populating a lightcurve database for further analysis by astronomers. Here, we discuss the scientific goals of the TraP and how it has been designed to meet them. We describe its implementation, including both the algorithms adopted to maximize performance as well as the development methodology used to ensure it is robust and reliable, particularly in the presence of artefacts typical of radio astronomy imaging. Finally, we report on a series of tests of the pipeline carried out using simulated LOFAR observations with a known population of transients.

Comment: 30 pages, 11 figures; Accepted for publication in Astronomy & Computing; Code at https://github.com/transientskp/tk

arXiv.org e-Print Archive

HAL-INSU

Hal-Diderot

RustHorn: CHC-based Verification for Rust Programs (full version)

Author: Kobayashi Naoki
Matsushita Yusuke
Tsukada Takeshi
Publication venue: Springer Science and Business Media LLC
Publication date: 11/06/2020

Reduction to the satisfiability problem for constrained Horn clauses (CHCs) is a widely studied approach to automated program verification. The current CHC-based methods for pointer-manipulating programs, however, are not very scalable. This paper proposes a novel translation of pointer-manipulating Rust programs into CHCs, which clears away pointers and memories by leveraging ownership. We formalize the translation for a simplified core of Rust and prove its correctness. We have implemented a prototype verifier for a subset of Rust and confirmed the effectiveness of our method.

Comment: Full version of the same-titled paper in ESOP202

arXiv.org e-Print Archive