    Coping with new Challenges in Clustering and Biomedical Imaging

    Recent years have seen a tremendous increase in data acquisition in scientific fields such as molecular biology, bioinformatics, and biomedicine. Novel methods are therefore needed for the automatic processing and analysis of this large amount of data. Data mining is the process of applying methods like clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups so as to maximize the intra-cluster similarity and minimize the inter-cluster similarity. In contrast to unsupervised learning such as clustering, classification is a supervised learning problem that aims to predict the group membership of data objects on the basis of rules learned from a training set in which the group membership is known. Specialized methods have been proposed for hierarchical and partitioning clustering. However, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that cope with problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle; it finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory. In this way the search space is explored more effectively. Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information.
A competitive evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding database objects which minimize two or more attributes with an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing skylines of different data sets; it can be integrated directly into data mining tasks such as clustering or classification. The experiments show that SkyDist in combination with different clustering algorithms can give useful insights into many applications. In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI) that are clinically relevant and may allow for an early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images that combines the data mining steps of feature selection, clustering and classification. As a result, a set of highly selective features discriminating patients with Alzheimer's disease from healthy people has been identified. However, the analysis of the high-dimensional MR images is extremely time-consuming. Therefore, we developed JGrid, a scalable distributed computing solution designed to allow for a large-scale analysis of MRI and thus an optimized prediction of diagnosis. In another study we apply efficient algorithms for motif discovery to task-fMRI scans in order to identify patterns in the brain that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
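
The skyline operator described in this abstract can be illustrated with a minimal sketch. It assumes that smaller values are better in every attribute and uses a simple quadratic scan; the function names and the sample data are hypothetical and are not taken from the thesis, and SkyDist itself is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every attribute and strictly
    better in at least one (smaller values are assumed to be better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point.

    Naive O(n^2) scan, kept simple for illustration only."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to beach) for five hotels.
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (90.0, 1.0), (85.0, 3.0)]
print(skyline(hotels))
```

Whatever weighting a user places on the two attributes, the best object under that weighting lies on the skyline, which is why the result can stand in for the whole data set.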

    Providing Diversity in K-Nearest Neighbor Query Results

    Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K answers in the database that are closest to Q according to a given distance metric. In this scenario, it is possible that a majority of the answers are very similar to one another, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, to produce the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance. Comment: 20 pages, 11 figures
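
The idea of trading closeness for diversity can be sketched with a simple greedy filter. This is not MOTLEY, and the abstract does not state the paper's diversity definition; the sketch assumes a hypothetical `min_separation` parameter, meaning that every returned answer must lie at least that far from every other returned answer.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def diverse_knn(query: Sequence[float], points: List[Point],
                k: int, min_separation: float) -> List[Point]:
    """Greedy sketch: scan candidates in order of distance to the query
    and keep one only if it is at least min_separation away from all
    answers kept so far. May return fewer than k points."""
    ranked = sorted(points, key=lambda p: math.dist(query, p))
    chosen: List[Point] = []
    for p in ranked:
        if all(math.dist(p, c) >= min_separation for c in chosen):
            chosen.append(p)
            if len(chosen) == k:
                break
    return chosen


# Hypothetical data: three near-duplicates close to the query, two outliers.
data = [(1.0, 0.0), (1.1, 0.0), (1.2, 0.1), (0.0, 3.0), (4.0, 0.0)]
print(diverse_knn((0.0, 0.0), data, k=3, min_separation=1.0))
```

With `min_separation=0.0` the function degenerates to an ordinary KNN query, which mirrors the user-tunable character of the definition mentioned in the abstract.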

    Conservation of the critically endangered frog Telmatobufo bullocki in fragmented temperate forests of Chile : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Conservation Biology at Massey University, Albany, New Zealand

    Amphibians are currently facing several threats and are suffering severe population declines and extinction worldwide. Telmatobufo bullocki (Anura: Calyptocephalellidae) is one of the rarest and most endangered amphibian species in Chile's temperate forests. It is the fifth most evolutionarily distinct and globally endangered (EDGE) amphibian in the world, and one of the world's top 100 priority species for conservation (Zoological Society of London, 2011). This stream-breeding frog is micro-endemic to the coastal Nahuelbuta mountain range in central-south Chile (37° - 38°50' S), a hot-spot for conservation. This area has suffered severe loss and fragmentation of native forest, which has been replaced by extensive commercial plantations of exotic pines and eucalyptus. Despite its potential detrimental effects, the impact of native forest loss on this species has not been studied before. Furthermore, few historical observations exist, and the ecology and behaviour of the species are poorly known. In addition, the current status and location of extant populations are uncertain, which makes conservation and targeted habitat protection difficult. Through the use of different approaches and modern conservation tools, this thesis aims to make a significant contribution to the conservation of T. bullocki and its habitat. Historical and new locations were surveyed to identify extant populations. A distribution modeling approach (i.e. Maxent) was used to infer the species' distribution within Nahuelbuta, generate a predictive habitat suitability map, identify important environmental associations, and assess the impact of the main environmental threats (i.e. native forest loss, climate change). Field-based research (e.g. surveys, radio-tracking) was done to extend the ecological and behavioural knowledge of the species (e.g. movement patterns and habitat use), and identify critical aquatic and terrestrial habitat for protection (i.e. core habitat).
Mitochondrial and specifically developed microsatellite genetic markers were used to measure levels of intra-specific genetic variability, define genetic population structure and connectivity, infer evolutionary history (phylogeography), estimate effective population size and detect demographic changes (e.g. bottlenecks). Finally, a landscape genetics approach was used to relate landscape characteristics to contemporary patterns of gene flow, and identify important landscape features facilitating (i.e. corridors) or hindering (i.e. barriers) genetic connectivity between populations. Telmatobufo bullocki was found in nine basins within Nahuelbuta, including historic and new locations. Presence of T. bullocki was positively related to the amount of native forest in the landscape. However, some populations persist in areas dominated by exotic plantations. Some frogs were found living under mature pine plantation adjacent to native forest, but no frogs were found in core plantation areas. T. bullocki makes extensive use of terrestrial habitat adjacent to breeding streams during the post-breeding season, moving up to 500 m away from streams. A core terrestrial habitat of at least 220 m from streams is proposed for the protection of populations. Population genetics and phylogeography revealed significant population structure. The northernmost and disjunct population of Chivilingo is geographically and genetically isolated from all other sampled populations and was identified as a separate evolutionarily significant unit (ESU). The population of Los Lleulles was also identified as a separate management unit, while the remaining populations were grouped into two clusters forming a larger and more connected meta-population. Connectivity within groups was high, suggesting individuals are able to disperse between neighbouring basins. Levels of genetic diversity were not homogeneous, and were lowest at Los Lleulles and highest at Caramávida.
Results suggest disjunct populations are at highest risk and should be prioritised for restoration and habitat protection, while management of meta-populations should aim at maintaining and improving connectivity among basins. Landscape genetic results identified streams and riparian habitat as dispersal pathways, and least-cost-path analysis was used to identify a potential connectivity network.

    A SURVEY ON CHURN ANALYSIS AND PREDICTION IN VIDEO ON DEMAND

    Customer loyalty is a key measure of success. Dissatisfied customers do not stay with a service, and unhappy customers seldom voice their disappointment before leaving. With a streaming service, movies and television shows are streamed over the internet rather than downloaded, so the viewer must stay connected to the internet throughout the viewing experience. It is therefore helpful for streaming services to recognize dissatisfied customers early in the relationship. Doing so would allow a streaming service to take steps to improve customer satisfaction before it is too late. This survey aims to identify the reasons why customers churn from a streaming service and to predict how many customers will stay with it.

    Threshold interval indexing techniques for complicated uncertain data

    Uncertain data is an increasingly prevalent topic in database research, given the advance of instruments which inherently generate uncertainty in their data. In particular, the problem of indexing uncertain data for range queries has received considerable attention. To efficiently process range queries, existing approaches mainly focus on reducing the number of disk I/Os. However, due to the inherent complexity of uncertain data, processing a range query may incur high computational cost in addition to the I/O cost. In this paper, I present a novel indexing strategy focusing on one-dimensional uncertain continuous data, called threshold interval indexing. Threshold interval indexing is able to balance I/O cost and computational cost to achieve an optimal overall query performance. A key ingredient of the proposed indexing structure is a dynamic interval tree. The dynamic interval tree is much more resistant to skew than R-trees, which are widely used in other indexing structures. This interval tree optimizes pruning by storing x-bounds, or pre-calculated probability boundaries, at each node. In addition to the basic threshold interval index, I present two variants, called the strong threshold interval index and the hyper threshold interval index, which leverage x-bounds not only for pruning but also for accepting results. Furthermore, I present more efficient memory-loaded versions of these indexes, which reduce the storage size so the primary interval tree can be loaded into memory. Each index description includes methods for querying, parallelizing, updating, bulk loading, and externalizing. I perform an extensive set of experiments to demonstrate the effectiveness and efficiency of the proposed indexing strategies.
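
How pre-calculated probability boundaries can prune a threshold range query is sketched below for a single object. The abstract does not define x-bounds precisely, so the sketch makes assumptions that should be read as such: the object's value is uniformly distributed on an interval, the left x-bound is the point with probability mass x to its left, and the right x-bound is the point with probability mass x to its right. All names are hypothetical, and the interval tree itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class UniformObject:
    """One-dimensional uncertain value, assumed uniform on [lo, hi], lo < hi."""
    lo: float
    hi: float

    def left_bound(self, x: float) -> float:
        # Point with probability mass x to its left (assumed x-bound definition).
        return self.lo + x * (self.hi - self.lo)

    def right_bound(self, x: float) -> float:
        # Point with probability mass x to its right.
        return self.hi - x * (self.hi - self.lo)

    def prob_in(self, a: float, b: float) -> float:
        # Exact probability that the value falls inside the query range [a, b].
        overlap = max(0.0, min(b, self.hi) - max(a, self.lo))
        return overlap / (self.hi - self.lo)


def can_prune(obj: UniformObject, a: float, b: float,
              threshold: float, x: float) -> bool:
    """Cheap test using stored x-bounds only. If the query lies entirely
    left of the left x-bound (or right of the right x-bound), the
    probability of a hit is below x; with x <= threshold the object
    cannot qualify, so the exact probability need not be computed."""
    if x > threshold:
        return False  # this bound is too weak to decide anything
    return b < obj.left_bound(x) or a > obj.right_bound(x)


obj = UniformObject(0.0, 10.0)
print(can_prune(obj, -5.0, 1.5, threshold=0.3, x=0.2))  # query left of bound
print(can_prune(obj, 1.0, 6.0, threshold=0.3, x=0.2))   # query overlaps core
```

The comparison against stored bounds replaces an integration over the probability density, which is the kind of computational saving the abstract contrasts with pure I/O reduction.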

    A novel cloud services recommendation system based on automatic learning techniques

    Cloud computing technology is evolving constantly, but its essence remains the same: to offer distinct cost-saving opportunities by consolidating and restructuring information technology as a service. With the continuously increasing number of cloud offerings, cloud consumers have difficulty finding the most relevant services for their requirements. Selecting the best services is therefore becoming a greater challenge for cloud users. In this paper, we present a framework for a service recommendation system in a cloud environment, using automatic learning techniques. The system aims at finding the services that suit the interests and preferences of cloud consumers by combining content-based and behaviour-based recommendations. We also present USTHBCLOUD, a cloud services recommendation prototype, evaluated with an experimental study. © 2017 IEEE

    Planning tiger recovery: Understanding intraspecific variation for effective conservation

    Although significantly more money is spent on the conservation of tigers than on any other threatened species, today only 3200 to 3600 tigers roam the forests of Asia, occupying only 7% of their historical range. Despite the global significance of and interest in tiger conservation, global approaches to plan tiger recovery are partly impeded by the lack of a consensus on the number of tiger subspecies or management units, because a comprehensive analysis of tiger variation is lacking. We analyzed variation among all nine putative tiger subspecies, using extensive data sets of several traits [morphological (craniodental and pelage), ecological, molecular]. Our analyses revealed little variation and large overlaps in each trait among putative subspecies, and molecular data showed extremely low diversity because of a severe Late Pleistocene population decline. Our results support recognition of only two subspecies: the Sunda tiger, Panthera tigris sondaica, and the continental tiger, Panthera tigris tigris, which consists of two (northern and southern) management units. Conservation management programs, such as captive breeding, reintroduction initiatives, or trans-boundary projects, rely on a durable, consistent characterization of subspecies as taxonomic units, defined by robust multiple lines of scientific evidence rather than single traits or ad hoc descriptions of one or few specimens. Our multiple-trait data set supports a fundamental rethinking of the conventional tiger taxonomy paradigm, which will have profound implications for the management of in situ and ex situ tiger populations and boost conservation efforts by facilitating a pragmatic approach to tiger conservation management worldwide.

    The visual preferences for forest regeneration and field afforestation : four case studies in Finland

    The overall aim of this dissertation was to study the public's preferences for forest regeneration fellings and field afforestations, as well as to find out how these preferences relate to landscape management instructions, to ecological healthiness, and to contemporary theories for predicting landscape preferences. This dissertation includes four case studies in Finland, each based on the visualization of management options and on surveys. Guidelines for improving the visual quality of forest regeneration and field afforestation are given based on the case studies. The results show that forest regeneration can be connected to positive images and memories when the regeneration area is small and some time has passed since the felling. Preferences may depend not only on the management alternative itself but also on the viewing distance, viewing point, and the scene in which the management options are implemented. The current Finnish forest landscape management guidelines as well as the ecological healthiness of the studied options are to a large extent compatible with the public's preferences. However, there are some discrepancies. For example, the landscape management instructions as well as ecological hypotheses suggest that retention trees need to be left in groups, whereas people usually prefer individually located retention trees to trees in groups. Information and psycho-evolutionary theories provide some possible explanations for people's preferences for forest regeneration and field afforestation, but the results cannot be consistently explained by these theories. The preferences of the different stakeholder groups were very similar. However, the preference ratings of the groups that make their living from the forest (forest owners and forest professionals) differed slightly from those of the others.
These results provide support for the assumptions that preferences are largely consistent at least within one nation, but that knowledge and a reference group may also influence preferences.
This dissertation examined people's landscape preferences (landscape valuations) regarding forest regeneration fellings and field afforestation, and analysed the relations of these preferences to landscape management guidelines, to the ecological health of the options, and to theories that predict preferences. The dissertation comprises four case studies based on the visualization of management options and on surveys. Based on the case studies, guidance is given on how the visual quality of regeneration fellings and field afforestation can be improved. The results show that regeneration fellings can also evoke positive images and memories if the regeneration area is small and the immediate traces of the felling have already been covered. In addition to the management option itself, preferences are affected by, among other things, the viewing distance, the viewing point, and the setting in which the option is implemented. The landscape preferences of the different reference groups (forest professionals, residents of the capital region, environmentalists, tourists in the study areas, local residents, and forest owners) were very similar. However, the groups that earn at least part of their living from the forest, namely forest owners and forest professionals, liked images depicting fellings slightly more than the other groups did. These results support the assumption that landscape preferences are largely consistent at least within one nation or culture, although the reference group may also influence preferences to some extent. The current forest landscape management guidelines are largely in line with the landscape preferences observed in this dissertation. Nor was there any major conflict between the ecological merit of the studied management alternatives and the landscape valuations attached to them.
However, there were some discrepancies; for example, according to both the landscape management guidelines and ecological hypotheses, retention trees should be left in groups, whereas people liked individually retained trees the most. The information model and psycho-evolutionary theory offer possible explanations for preferences regarding regeneration fellings and field afforestation, although the results of the study cannot be fully explained by these theories.