27,268 research outputs found

    Data Management and Mining in Astrophysical Databases

    We analyse the issues involved in the management and mining of astrophysical data. The traditional approach to data management in astrophysics cannot keep up with the increasing size of the data gathered by modern detectors. Automatic tools for information extraction from large datasets, i.e. data mining techniques such as clustering and classification algorithms, will assume an essential role in astrophysical research. This calls for an approach to data management based on data warehousing that emphasizes the efficiency and simplicity of data access: efficiency is obtained using multidimensional access methods, and simplicity is achieved by properly handling metadata. Clustering and classification techniques applied to large datasets pose additional requirements: computational and memory scalability with respect to the data size, and interpretability and objectivity of the clustering or classification results. In this study we address some possible solutions. Comment: 10 pages, Late

    A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters

    Keyword-based web queries with local intent retrieve web content that is relevant to supplied keywords and that represents points of interest near the query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, namely the top-k spatial textual clusters (k-STC) query, which returns the top-k clusters that (i) are located the closest to a given query location, (ii) contain the most relevant objects with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a basic algorithm that relies on on-line density-based clustering and exploits an early stop condition. To improve the response time, we design an advanced approach that includes three techniques: (i) an object skipping rule, (ii) spatially gridded posting lists, and (iii) a fast range query algorithm. An empirical study on real data demonstrates that the paper's proposals offer scalability and are capable of excellent performance.
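
The query described in this abstract can be illustrated with a small sketch. This is not the paper's algorithm: it uses plain DBSCAN with linear-scan range queries and omits the early stop condition, the object skipping rule, and the gridded posting lists. The object layout (`name`, `xy`, `terms`), the function names, and the ranking score (distance divided by mean keyword overlap, lower is better) are all assumptions made for illustration.

```python
from math import hypot

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns clusters as lists of point indices."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    clusters = []
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        cid = len(clusters)
        clusters.append([i])
        labels[i] = cid
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:  # border point: join, do not expand
                labels[j] = cid
                clusters[cid].append(j)
                continue
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cid
            clusters[cid].append(j)
            nbrs = region_query(points, j, eps)
            if len(nbrs) >= min_pts:  # core point: expand the cluster
                queue.extend(nbrs)
    return clusters

def top_k_clusters(objects, query_xy, query_terms, k, eps, min_pts):
    """Rank dense clusters of keyword-matching objects (hypothetical score)."""
    terms = set(query_terms)
    qx, qy = query_xy
    # Keep only objects that match at least one query keyword.
    relevant = [o for o in objects if terms & o["terms"]]
    points = [o["xy"] for o in relevant]
    scored = []
    for cluster in dbscan(points, eps, min_pts):
        members = [relevant[i] for i in cluster]
        dist = min(hypot(m["xy"][0] - qx, m["xy"][1] - qy) for m in members)
        rel = sum(len(terms & m["terms"]) / len(terms) for m in members) / len(members)
        # Assumed score: closer and more relevant clusters rank first.
        scored.append((dist / rel, sorted(m["name"] for m in members)))
    scored.sort()
    return [names for _, names in scored[:k]]
```

With `eps` and `min_pts` acting as the density threshold, isolated objects are treated as noise and never appear in a result, however close they are to the query location.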

    RECIPE SUGGESTION TOOL

    There is currently a great need for a tool to search cooking recipes based on ingredients. Current search engines do not provide this feature. Most recipe search results on current websites are not efficiently clustered by relevance or category, so users get lost in the huge result lists presented. Clustering in information retrieval is used for higher efficiency and better presentation of information to the user. Clustering puts similar documents in the same cluster. If a document is relevant to a query, then the documents in the same cluster are also relevant. The goal of this project is to implement clustering on recipes. The user can search for recipes based on ingredient
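
As a rough illustration of the idea in this abstract, the sketch below filters recipes by ingredient and then groups the hits so that similar recipes are presented together. The abstract does not say which clustering method the project uses; the greedy seed-based grouping, the Jaccard similarity on ingredient sets, the `threshold` value, and all function names here are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two ingredient sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_recipes(recipes, threshold):
    """Greedy grouping: a recipe joins the first cluster whose seed
    (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for name, ingredients in recipes.items():
        for cluster in clusters:
            if jaccard(ingredients, recipes[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def search(recipes, wanted, threshold=0.4):
    """Return clusters of recipes that contain every wanted ingredient."""
    wanted = set(wanted)
    hits = {name: ing for name, ing in recipes.items() if wanted <= ing}
    return cluster_recipes(hits, threshold)

recipes = {
    "tomato soup": {"tomato", "onion", "garlic"},
    "tomato sauce": {"tomato", "onion", "garlic", "basil"},
    "caprese": {"tomato", "mozzarella", "basil"},
    "pancakes": {"flour", "egg", "milk"},
}
print(search(recipes, ["tomato"]))
```

The result depends on insertion order because the first recipe in each cluster acts as its seed; a production tool would more likely use a standard method such as k-means or hierarchical clustering over weighted ingredient vectors.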

    Does land use and landscape contribute to self-harm? A sustainability cities framework

    Self-harm has become one of the leading causes of mortality in developed countries. The overall rate for suicide in Canada is 11.3 per 100,000 according to Statistics Canada in 2015. Between 2000 and 2007 the lowest rates of suicide in Canada were in Ontario, one of the most urbanized regions in Canada. However, the interaction between land use, landscape and self-harm has not been significantly studied for urban cores. It is thus of relevance to understand the impacts of land use and landscape on suicidal behavior. This paper takes a spatial analytical approach to assess the occurrence of self-harm in one of the densest urban cores in the country: Toronto. Individual self-harm data was gathered by the National Ambulatory Care Reporting System (NACRS) and geocoded into census tract divisions. Toronto’s urban landscape is quantified at the spatial level through the calculation of its land use at different levels: (i) land use type, (ii) sprawl metrics relating to (a) dispersion and (b) sprawl/mix incidence; (iii) fragmentation metrics of (a) urban fragmentation and (b) density and (iv) demographics of (a) income and (b) age. A stepwise regression is built on this selection to identify the most influential factors leading to self-harm, generating an explanatory model. This research was supported by the Canadian Institutes of Health Research Strategic Team Grant in Applied Injury Research # TIR-103946 and the Ontario Neurotrauma Foundation grant

    A cluster driven log-volatility factor model: a deepening on the source of the volatility clustering

    We introduce a new factor model for log volatilities that performs dimensionality reduction and considers contributions globally through the market, and locally through cluster structure and their interactions. We do not assume a priori the number of clusters in the data, instead using the Directed Bubble Hierarchical Tree (DBHT) algorithm to fix the number of factors. We use the factor model and a new integrated nonparametric proxy to study how volatilities contribute to volatility clustering. Globally, only the market contributes to the volatility clustering. Locally, for some clusters, the cluster itself contributes statistically to volatility clustering. This is significantly advantageous over other factor models, since the factors can be chosen statistically whilst also keeping economically relevant factors. Finally, we show that the log volatility factor model explains a similar amount of memory to a Principal Components Analysis (PCA) factor model and an exploratory factor model.