994 research outputs found

    Mining Knowledge in Astrophysical Massive Data Sets

    Modern scientific data mainly consist of huge datasets gathered by a very large number of techniques and stored in highly diversified and often incompatible data repositories. More generally, in the e-science environment, it is considered a critical and urgent requirement to integrate services across distributed, heterogeneous, dynamic "virtual organizations" formed by different resources within a single enterprise. In the last decade, Astronomy has become an immensely data-rich field due to the evolution of detectors (plates to digital to mosaics), telescopes and space instruments. The Virtual Observatory approach consists of the federation, under common standards, of all astronomical archives available worldwide, as well as data analysis, data mining and data exploration applications. The main drive behind this effort is that, once the infrastructure is completed, it will allow a new type of multi-wavelength, multi-epoch science which can only barely be imagined. Data Mining, or Knowledge Discovery in Databases, while being the main methodology to extract the scientific information contained in such MDS (Massive Data Sets), poses crucial problems, since it has to orchestrate complex issues posed by transparent access to different computing environments, scalability of algorithms, reusability of resources, etc. In the present paper we summarize the present status of the MDS in the Virtual Observatory and what is currently being done and planned to bring advanced Data Mining methodologies to bear in the case of the DAME (DAta Mining & Exploration) project. Comment: Pages 845-849, 1st International Conference on Frontiers in Diagnostics Technologie

    Alexandria: Extensible Framework for Rapid Exploration of Social Media

    The Alexandria system under development at IBM Research provides an extensible framework and platform for supporting a variety of big-data analytics and visualizations. The system is currently focused on enabling rapid exploration of text-based social media data. The system provides tools to help with constructing "domain models" (i.e., families of keywords and extractors to enable focus on tweets and other social media documents relevant to a project), to rapidly extract and segment the relevant social media and its authors, to apply further analytics (such as finding trends and anomalous terms), and to visualize the results. The system architecture is centered around a variety of REST-based service APIs to enable flexible orchestration of the system capabilities; these are especially useful to support knowledge-worker-driven iterative exploration of social phenomena. The architecture also enables rapid integration of Alexandria capabilities with other social media analytics systems, as has been demonstrated through an integration with IBM Research's SystemG. This paper describes a prototypical usage scenario for Alexandria, along with the architecture and key underlying analytics. Comment: 8 pages

    The DAME/VO-Neural Infrastructure: an Integrated Data Mining System Support for the Science Community

    Astronomical data are gathered through a very large number of heterogeneous techniques and stored in highly diversified and often incompatible data repositories. Moreover, in the e-science environment, services need to be integrated across distributed, heterogeneous, dynamic "virtual organizations" formed by different resources within a single enterprise and/or external resource sharing and service provider relationships. The DAME/VO-Neural project, run jointly by the University Federico II, INAF (National Institute of Astrophysics) Astronomical Observatories of Napoli and the California Institute of Technology, aims at creating a single, sustainable, distributed e-infrastructure for data mining and exploration in massive data sets, to be offered to the astronomical community, and beyond, as a web application. The framework makes use of distributed computing environments (e.g. S.Co.P.E.) and matches the international IVOA standards and requirements. The integration process is technically challenging due to the need to achieve a specific quality of service when running on top of different native platforms. In these terms, the result of the DAME/VO-Neural project effort will be a service-oriented architecture, obtained by using appropriate standards and incorporating Grid paradigms and RESTful Web services frameworks where needed, whose main target will be the integration of interdisciplinary distributed systems within and across organizational domains. Comment: 10 pages, Proceedings of the Final Workshop of the Grid Projects of the Italian National Operational Programme 2000-2006 Call 1575; Edited by Cometa Consortium, 2009, ISBN: 978-88-95892-02-

    Internet of things

    Manual of Digital Earth / Editors: Huadong Guo, Michael F. Goodchild, Alessandro Annoni. Springer, 2020. ISBN: 978-981-32-9915-3. Digital Earth was born with the aim of replicating the real world within the digital world. Many efforts have been made to observe and sense the Earth, both from space (remote sensing) and by using in situ sensors. Focusing on the latter, advances in Digital Earth have established vital bridges to exploit these sensors and their networks by taking location as a key element. The current era of connectivity envisions that everything is connected to everything. The concept of the Internet of Things (IoT) emerged as a holistic proposal to enable an ecosystem of varied, heterogeneous networked objects and devices to speak to and interact with each other. To make the IoT ecosystem a reality, it is necessary to understand the electronic components, communication protocols, real-time analysis techniques, and the location of the objects and devices. The IoT ecosystem and the Digital Earth (DE) jointly form interrelated infrastructures for addressing today’s pressing issues and complex challenges. In this chapter, we explore the synergies and frictions in establishing an efficient and permanent collaboration between the two infrastructures, in order to adequately address multidisciplinary and increasingly complex real-world problems. Although there are still some pending issues, the identified synergies generate optimism for a true collaboration between the Internet of Things and the Digital Earth.

    Praxis Market Drift

    Over the last decade, digital data has been growing exponentially. Unstructured data is rapidly outgrowing structured data, so managing unstructured data is an increasing challenge for many organisations. Consequently, text mining has been gaining traction as a way to deal with unstructured data. Text mining is a form of data mining that deals with text and is the process of transforming unstructured text into meaningful and actionable information. Praxis is a platform that implements a virtual market for project/internship offers. Organisations submit their project/internship offers, which become available for search, and students search the platform using keywords that express their interests. Praxis holds a large amount of unexplored data from which useful information can be extracted to gain more insight into its internship market. This dissertation proposes a solution based on text mining techniques that displays the information necessary to analyse the evolution, over time, of users’ interests and of the internship offers submitted in Praxis.

    GENE EXPRESSION PROSPECTIVE SIMULATION AND ANALYSIS USING DATA MINING AND IMMERSIVE VIRTUAL REALITY VISUALIZATION

    Biological exploration of genetic expression and protein synthesis in living organisms is used to discover causal and interactive relationships in biological processes. Current GeneChip microarray technology provides a platform to analyze up to 500,000 molecular reactions on a single chip, providing thousands of genetic and protein expression results per test. Using visualization tools and a priori knowledge of genetic and protein interactions, visual networks are used to model and analyze the results. The virtual reality environment designed and implemented for this project provides visualization and data modeling tools commonly used in genetic expression data analysis. The software processes normalized genetic profile data from microarray testing results and association information from protein-to-protein databases. The data is modeled using a network of nodes to represent data points and edges to show relationships. This information is visualized in virtual reality and modeled using force-directed networking algorithms in a fully explorable environment.

    A survey of exploratory search systems based on LOD resources

    The fact that the existing Web allows people to effortlessly share data over the Internet has resulted in the accumulation of vast amounts of information available on the Web. Therefore, a powerful search technology that allows retrieval of relevant information is one of the main requirements for the success of the Web, a requirement complicated further by the use of many different formats for storing information. Semantic Web technology plays a major role in resolving this problem by permitting search engines to retrieve meaningful information. An exploratory search system, a special information seeking and exploration approach, supports users who are unfamiliar with a topic, or whose search goals are vague and unfocused, in learning about and investigating a topic through a set of activities. To achieve exploratory search goals, Linked Open Data (LOD) can be used to help search systems retrieve related data, so that the investigation task runs smoothly. This paper provides an overview of Semantic Web technology, Linked Data and search strategies, followed by a survey of state-of-the-art Exploratory Search Systems based on LOD. Finally, the systems are compared in various aspects such as algorithms, result rankings and explanations.

    Self-adaptive mobile web service discovery framework for dynamic mobile environment

    The advancement in mobile technologies has undoubtedly turned mobile web service (MWS) into a significant computing resource in a dynamic mobile environment (DME). Discovery is one of the critical stages in the MWS life cycle: it identifies the most relevant MWS for a particular task according to the context needs of the request. While traditional service discovery frameworks, which assume the world is static with a predetermined context, are constrained in DME, adaptive solutions show potential. Unfortunately, the effectiveness of these frameworks is plagued by three problems. Firstly, the coarse-grained MWS categorization approach fails to deal with the proliferation of functionally similar MWS. Secondly, context models constricted by insufficient expressiveness and inadequate extensibility compound the difficulty of describing the DME, MWS, and the user’s MWS needs. Thirdly, matchmaking requires manual adjustment and disregards context information that triggers self-adaptation, leading to ineffective and inaccurate discovery of relevant MWS. Therefore, to address these challenges, a self-adaptive MWS discovery framework for DME is proposed, comprising an enhanced MWS categorization approach, an extensible meta-context ontology model, and a self-adaptive MWS matchmaker. In this research, MWS categorization is achieved by extracting the goals and tags from the functional description of MWS and then subsuming k-means in the modified negative selection algorithm (M-NSA) to create categories that contain similar MWS. The meta-context ontology is designed using the lightweight unified process for ontology building (UPON-Lite) in collaboration with feature-oriented domain analysis (FODA). Self-adaptive MWS matchmaking is achieved by enabling the matchmaker to learn MWS relevance using M-NSA and to retrieve the most relevant MWS based on the current context of the discovery.
The MWS categorization approach was evaluated, and its impact on the effectiveness of the framework was assessed. The meta-context ontology was evaluated using case studies, and its impact on service relevance learning was assessed. The proposed framework was evaluated using a case study and the ProgrammableWeb dataset. It exhibits significant improvements in terms of binary relevance, graded relevance, and statistical significance, with the highest average precision value of 0.9167. This study demonstrates that the proposed framework is accurate and effective for service-based application designers and other MWS clients.

    TextRWeb: Large-Scale Text Analytics with R on the Web

    As digital data sources grow in number and size, they pose an opportunity for computational investigation by means of text mining, NLP, and other text analysis techniques. R is a popular and powerful text analytics tool; however, it needs to run in parallel and requires special handling to protect copyrighted content against full access (consumption). The HathiTrust Research Center (HTRC) currently has 11 million volumes (books), of which 7 million are copyrighted. In this paper we propose HTRC TextRWeb, an interactive R software environment which employs complexity-hiding interfaces and automatic code generation to allow large-scale text analytics in a non-consumptive manner. For our principal test case of copyrighted data in the HathiTrust Digital Library, TextRWeb permits us to code, edit, and submit text analytics methods through a family of interactive web user interfaces. All these methods combine to reveal a new interactive paradigm for large-scale text analytics on the web.

    Cyber–Physical–Social Frameworks for Urban Big Data Systems: A Survey

    The integration of things’ data on the Web and Web linking for things’ description and discovery is leading the way towards smart Cyber–Physical Systems (CPS). The data generated in CPS represents observations gathered by sensor devices about the ambient environment that can be manipulated by computational processes of the cyber world. Alongside this, the growing use of social networks offers near real-time citizen sensing capabilities as a complementary information source. The resulting Cyber–Physical–Social System (CPSS) can help to understand the real world and provide proactive services to users. The nature of CPSS data brings new requirements and challenges to different stages of data manipulation, including identification of data sources, processing and fusion of different types and scales of data. To gain an understanding of the existing methods and techniques which can be useful for a data-oriented CPSS implementation, this paper presents a survey of the existing research and commercial solutions. We define a conceptual framework for a data-oriented CPSS and detail the various solutions for building human–machine intelligence