5 research outputs found

    How you move reveals who you are: understanding human behavior by analyzing trajectory data

    The widespread use of mobile devices is producing huge amounts of trajectory data, making it possible to discover movement patterns that are crucial for understanding human behavior. Significant advances have been made in knowledge discovery, but the process now needs to be extended in light of the emerging field of behavior informatics. This paper formalizes a semantically enriched KDD process for supporting meaningful interpretations of patterns of human behavior. Our approach is based on the integration of inductive reasoning (movement pattern discovery) and deductive reasoning (human behavior inference). We describe the implemented Athena system, which supports such a process, along with experimental results in two application domains related to traffic and recreation management.
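
    To make the inductive/deductive split concrete, the sketch below pairs a simple stop-detection step (pattern discovery) with rule-based behavior inference. The thresholds, rules, and labels are illustrative assumptions, not the Athena system's actual logic.

```python
"""Minimal sketch of the inductive/deductive split described above.
The stop-detection thresholds, the rule set, and the behavior labels
are illustrative assumptions, not the Athena system's actual logic."""
from dataclasses import dataclass
from math import hypot

@dataclass
class Point:
    x: float   # metres (projected coordinates)
    y: float
    t: float   # seconds since start of day

def detect_stops(track, radius=50.0, min_duration=300.0):
    """Inductive step: find segments where the object stays within
    `radius` metres of a location for at least `min_duration` seconds."""
    stops, i = [], 0
    while i < len(track):
        j = i
        while j + 1 < len(track) and hypot(track[j + 1].x - track[i].x,
                                           track[j + 1].y - track[i].y) <= radius:
            j += 1
        if track[j].t - track[i].t >= min_duration:
            stops.append((track[i], track[j]))
        i = j + 1
    return stops

def infer_behavior(stops):
    """Deductive step: map the discovered pattern to a behavior label
    with simple domain rules (purely illustrative)."""
    if len(stops) >= 2 and stops[0][0].t < 9 * 3600 and stops[-1][1].t > 17 * 3600:
        return "commuting"
    if any(end.t - start.t > 2 * 3600 for start, end in stops):
        return "long leisure stay"
    return "unclassified"

# A morning stop and an evening stop at two distinct places.
track = [Point(0, 0, 7 * 3600 + 60 * k) for k in range(6)] + \
        [Point(5000, 0, 18 * 3600 + 60 * k) for k in range(6)]
print(infer_behavior(detect_stops(track)))   # -> "commuting"
```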

    Spatial Big Data Analytics: Classification Techniques for Earth Observation Imagery

    University of Minnesota Ph.D. dissertation, August 2016. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); xi, 120 pages.

    Spatial Big Data (SBD), e.g., earth observation imagery, GPS trajectories, and temporally detailed road networks, refers to geo-referenced data whose volume, velocity, and variety exceed the capability of current spatial computing platforms. SBD has the potential to transform our society. Vehicle GPS trajectories together with engine measurement data provide a new way to recommend environmentally friendly routes. Satellite and airborne earth observation imagery plays a crucial role in hurricane tracking, crop yield prediction, and global water management. The potential value of earth observation data is so significant that the White House recently declared that full utilization of this data is one of the nation's highest priorities.

    However, SBD poses significant challenges to current big data analytics. In addition to its huge dataset size (NASA collects petabytes of earth images every year), SBD exhibits four unique properties related to the nature of spatial data that must be accounted for in any analysis. First, SBD exhibits spatial autocorrelation effects: nearby samples cannot be assumed to be statistically independent, and analytics techniques that ignore spatial autocorrelation often perform poorly, with low prediction accuracy and salt-and-pepper noise (pixels mistakenly predicted as different from their neighbors). Second, spatial interactions are not isotropic and vary across directions. Third, spatial dependency exists at multiple spatial scales. Finally, spatial big data exhibits heterogeneity, i.e., identical feature values may correspond to distinct class labels in different regions, so learned predictive models may perform poorly in many local regions.

    My thesis investigates novel SBD analytics techniques to address some of these challenges. To date, I have mostly focused on the challenges of spatial autocorrelation and anisotropy by developing novel spatial classification models, such as spatial decision trees for raster SBD (e.g., earth observation imagery). To scale up the proposed models, I developed efficient learning algorithms based on computational pruning. The proposed techniques have been applied to real-world remote sensing imagery for wetland mapping. I also developed a spatial ensemble learning framework to address the challenge of spatial heterogeneity, particularly class ambiguity in geographical classification, i.e., samples with the same feature values belonging to different classes in different spatial zones. Evaluations on three real-world remote sensing datasets confirm that the proposed spatial ensemble learning approach outperforms current approaches such as bagging, boosting, and mixture of experts when class ambiguity exists.
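
    The class-ambiguity problem and the per-zone remedy can be illustrated with a toy example: the sketch below trains one global decision tree (which cannot separate the classes) and one tree per spatial zone as a simplified stand-in for the spatial ensemble idea. The zoning, features, and classifier choice are assumptions for illustration only, not the dissertation's algorithms.

```python
"""Toy illustration of class ambiguity and a per-zone ("spatial ensemble")
remedy. The zoning, feature, and classifier choice are illustrative
assumptions, not the dissertation's actual algorithms."""
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# One spectral feature; the same value means class 0 in the west zone
# but class 1 in the east zone -> class ambiguity across space.
n = 400
x_coord = rng.uniform(0, 10, n)              # longitude-like coordinate
feature = rng.normal(5.0, 1.0, (n, 1))       # identical distribution everywhere
label = np.where(x_coord < 5, 0, 1)          # class depends on the zone, not the feature

global_model = DecisionTreeClassifier(max_depth=3).fit(feature, label)
print("global accuracy:", global_model.score(feature, label))  # low: feature alone cannot separate the classes

# Spatial ensemble: one model per zone, dispatched by location at prediction time.
zones = {"west": x_coord < 5, "east": x_coord >= 5}
ensemble = {name: DecisionTreeClassifier(max_depth=3).fit(feature[mask], label[mask])
            for name, mask in zones.items()}

def predict(xc, f):
    zone = "west" if xc < 5 else "east"
    return ensemble[zone].predict(f.reshape(1, -1))[0]

preds = np.array([predict(xc, f) for xc, f in zip(x_coord, feature)])
print("per-zone accuracy:", (preds == label).mean())            # 1.0 on this toy data
```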

    Flexibility in Data Management

    With the ongoing expansion of information technology, new fields of application requiring data management emerge virtually every day. In our knowledge culture, increasing amounts of data and a workforce organized in more creativity-oriented ways also radically change traditional fields of application and call established assumptions about data management into question. For instance, investigative analytics and agile software development move towards a very agile and flexible handling of data. As the primary facilitators of data management, database systems have to reflect and support these developments. However, traditional database management technology, in particular relational database systems, is built on assumptions of relatively stable application domains. The need to model all data up front in a prescriptive database schema earned relational database management systems the reputation among developers of being inflexible, dated, and cumbersome to work with. Nevertheless, relational systems still dominate the database market: they are a proven, standardized, and interoperable technology, well known in IT departments with a workforce of experienced and trained developers and administrators.

    This thesis aims at resolving the growing contradiction between the popularity and omnipresence of relational systems in companies and their increasingly bad reputation among developers. It adapts relational database technology towards more agility and flexibility. We envision a descriptive, schema-comes-second relational database system, which is entity-oriented instead of schema-oriented; descriptive rather than prescriptive. The thesis provides four main contributions: (1) a flexible relational data model, which frees relational data management from having a prescriptive schema; (2) autonomous physical entity domains, which partition self-descriptive data according to their schema properties for better query performance; (3) a freely adjustable storage engine, which allows adapting the physical data layout to the properties of the data and of the workload; and (4) a self-managed indexing infrastructure, which autonomously collects and adapts index information in the presence of dynamic workloads and evolving schemas. The flexible relational data model is the thesis' central contribution: it describes the functional appearance of the descriptive schema-comes-second relational database system. The other three contributions improve components in the architecture of database management systems to increase the query performance and manageability of such systems. We are confident that these four contributions can help pave the way to a more flexible future for relational database management technology.
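
    A toy sketch of the "schema-comes-second" idea follows: entities are inserted with whatever attributes they carry, and a descriptive schema is derived from the stored data afterwards. The class and method names are hypothetical and are not taken from the thesis.

```python
"""Minimal sketch of a "schema-comes-second" store: no prescriptive schema
on insert, and a descriptive schema derived after the fact. Names and
structure are illustrative assumptions, not the thesis's actual system."""
from collections import defaultdict

class EntityStore:
    def __init__(self):
        self.entities = []

    def insert(self, **attributes):
        """No prescriptive schema: any attribute set is accepted."""
        self.entities.append(dict(attributes))

    def describe_schema(self):
        """Descriptive schema, derived from the data: which attributes
        occur, with what types, and how often."""
        stats = defaultdict(lambda: {"count": 0, "types": set()})
        for entity in self.entities:
            for name, value in entity.items():
                stats[name]["count"] += 1
                stats[name]["types"].add(type(value).__name__)
        return dict(stats)

store = EntityStore()
store.insert(name="Alice", age=30)
store.insert(name="Bob", email="bob@example.org")   # different attributes, still accepted
store.insert(name="Carol", age="unknown")           # even conflicting types

for attribute, info in store.describe_schema().items():
    print(attribute, info)
```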

    Continuum : un modèle spatio-temporel et sémantique pour la découverte de phénomènes dynamiques au sein d’environnements géospatiaux (Continuum: a spatio-temporal and semantic model for the discovery of dynamic phenomena in geospatial environments)

    There is a need for decision-makers to be provided with both an overview of existing knowledge and information that is as complete and up to date as possible on changes in certain features of the biosphere. Another objective is to bring together the many attempts made over the years at various levels (international, Community, national, and regional) to obtain more information on the environment and the way it is changing. Remote sensing tools monitor large amounts of land-cover information, enabling the study of dynamic processes, but the size of the resulting datasets requires new tools to identify patterns and extract knowledge. We propose a model for discovering knowledge from parcel data, allowing the analysis of dynamic geospatial phenomena using temporal, spatial, and thematic data. The model, called Continuum, is able to track the evolution of spatial entities over time. Based on Semantic Web technologies, it allows users to specify and query spatio-temporal information based on semantic definitions. The semantics of spatial relationships are used to qualify filiation relationships. The result of this process is the identification of evolution patterns, as a basis for studying the dynamics of the geospatial environment. To this end, we use CORINE datasets to study changes in a specific part of France. In our approach, we consider entities as having several representations during their lifecycle; each representation includes identity, spatial, and descriptive properties that evolve over time.

    Land managers wish to have an overview of current knowledge and of the evolution of certain characteristics of their territory. To this end, remote sensing tools record a large quantity of land-cover information, enabling the study of dynamic processes. The literature on spatio-temporal modelling is vast and has given rise to numerous models, each showing a certain aptitude for capturing the evolution of different characteristics of a geospatial environment. However, the data require new tools to identify patterns and extract knowledge. We propose a model capable of discovering knowledge from parcel data and enabling the analysis of dynamic phenomena using temporal, spatial, and thematic data. The model is called Continuum and relies on Semantic Web technologies to provide a richer representation of the context of the geospatial environment and to deliver analysis results close to those of domain experts through automatic and valid reasoning operations. Ultimately, this model improves our understanding of the dynamics of territories.
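
    The filiation idea can be sketched with plain Python instead of RDF: parcel representations at two timestamps are linked through their spatial overlap, and each link is qualified as continuation or derivation. The rectangle geometries, the 0.5 threshold, and the relation names are illustrative assumptions; Continuum itself is expressed with Semantic Web technologies.

```python
"""Toy sketch of filiation between parcel representations at two timestamps,
qualified by their spatial relationship. Geometries, the threshold, and the
relation names are illustrative assumptions, not the Continuum model itself."""
from dataclasses import dataclass

@dataclass
class Parcel:
    id: str
    land_cover: str
    bbox: tuple   # axis-aligned bounding box: (xmin, ymin, xmax, ymax)

def overlap_area(a, b):
    ax1, ay1, ax2, ay2 = a.bbox
    bx1, by1, bx2, by2 = b.bbox
    w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    h = max(0.0, min(ay2, by2) - max(ay1, by1))
    return w * h

def area(p):
    x1, y1, x2, y2 = p.bbox
    return (x2 - x1) * (y2 - y1)

def filiation(before, after, threshold=0.5):
    """Qualify links between two snapshots: 'continuation' if most of the
    earlier parcel survives in the later one, 'derivation' for a partial
    overlap, no link otherwise."""
    links = []
    for old in before:
        for new in after:
            ratio = overlap_area(old, new) / area(old)
            if ratio >= threshold:
                links.append((old.id, "continuation", new.id))
            elif ratio > 0:
                links.append((old.id, "derivation", new.id))
    return links

t1 = [Parcel("p1", "forest", (0, 0, 10, 10))]
t2 = [Parcel("p2", "forest", (0, 0, 6, 10)),      # most of p1 persists
      Parcel("p3", "urban",  (6, 0, 10, 10))]     # the rest was converted
print(filiation(t1, t2))
# [('p1', 'continuation', 'p2'), ('p1', 'derivation', 'p3')]
```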

    Security for Mobile Grid Systems

    Grid computing technology is used as an inexpensive way to gather and utilize computational capability. It enhances application services by arranging machines and distributed resources into a single huge computational entity. A Grid is a system that can organize resources that are not subject to centralized control, utilize protocols and interfaces, and supply a high quality of service. The Grid should be able to enhance not only the system performance and job throughput of the participating applications but also the utilization of resources, by applying effective resource management methods to its huge pool of resources. Grid mobility has emerged as a technology that facilitates meeting the requirements of Grid jobs as well as Grid users; the idea relies on migrating or relocating jobs, data, and application software among Grid nodes.

    However, making use of mobility technology leads to data confidentiality problems within the Grid. Data confidentiality is the protection of data from intruders' attacks. It can be addressed by limiting mobility to trusted parts of the Grid, but this solution leads to the notion of Virtual Organizations (VOs). Mobility also increases the need for a tool that organizes and enforces policies when mobility is applied. To date, not enough attention has been paid to policies that deal with data movements within the Grid: most existing Grid systems support only limited types of policies (e.g., on CPU resources), and only a few designs consider enforcing data policies in their architecture. We therefore propose a policy-managed Grid environment that addresses these issues (user-submitted policies, data policies, and multiple VOs).

    In this research, a new policy management tool is introduced to address the mobility limitations and data confidentiality, especially in the case of mobile sharing and data movements within the Grid. We present a dynamic and heterogeneous policy management framework that gives a clear policy definition of the ability to move jobs, data, and application software from node to node during job execution in the Grid environment. The framework supports a multi-organization environment with different domains, supports external Grid users' preferences, and enforces policies for data movements and the mobility feature across different domains. The results of our research have been evaluated using the JADE simulator, a software framework fully implemented in Java that allows agents to execute tasks defined according to the agent policy. The simulation results verify that the research aims of enhanced security and performance in Grid environments are met. They also show enhanced control over the distribution and usage of data and services, and provide practical evidence, in the form of scenario test-bed data, of the effectiveness of our architecture.
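
    A minimal sketch of the kind of check such a policy-managed mobility layer could perform before relocating data between nodes in different domains is shown below. The policy fields, domain names, and decision rule are illustrative assumptions, not the framework evaluated with JADE in the thesis.

```python
"""Minimal sketch of a policy check before moving data between Grid nodes.
Policy fields, domain names, and the decision rule are illustrative
assumptions, not the thesis's framework."""
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    domain: str          # administrative domain / VO the node belongs to

@dataclass
class DataItem:
    name: str
    owner: str
    allowed_domains: set # data policy: domains this item may move into

@dataclass
class UserPolicy:
    owner: str
    allow_mobility: bool                       # user-submitted policy
    trusted_domains: set = field(default_factory=set)

def may_move(data: DataItem, user: UserPolicy, src: Node, dst: Node) -> bool:
    """Both the data policy and the user policy must permit the movement."""
    if not user.allow_mobility and src.domain != dst.domain:
        return False
    if dst.domain not in data.allowed_domains:
        return False
    if user.trusted_domains and dst.domain not in user.trusted_domains:
        return False
    return True

src = Node("n1", domain="vo-physics")
dst = Node("n7", domain="vo-bio")
data = DataItem("results.dat", owner="alice", allowed_domains={"vo-physics", "vo-bio"})
policy = UserPolicy(owner="alice", allow_mobility=True, trusted_domains={"vo-physics"})

print(may_move(data, policy, src, dst))   # False: destination domain is not trusted by the user
```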