102 research outputs found

    A PAC-Theory of Clustering with Advice

    In the absence of domain knowledge, clustering is usually an under-specified task. For any clustering application, one can choose among a variety of different clustering algorithms, along with different preprocessing techniques, that are likely to result in dramatically different answers. Any of these solutions, however, can be acceptable depending on the application, and therefore, it is critical to incorporate prior knowledge about the data and the intended semantics of clustering into the process of clustering model selection. One scenario that we study is when the user (i.e., the domain expert) provides a clustering of a (relatively small) random subset of the data set. The clustering algorithm then uses this kind of "advice" to come up with a data representation under which an application of a fixed clustering algorithm (e.g., k-means) results in a partition of the full data set that is aligned with the user's knowledge. We analyze the "advice complexity" of learning a representation in this paradigm. Another form of "advice" can be obtained by allowing the clustering algorithm to interact with a domain expert by asking same-cluster queries: "Do these two instances belong to the same cluster?". The goal of the clustering algorithm will then be finding a partition of the data set that is consistent with the domain expert's knowledge (yet using only a small number of queries). Aside from studying the "advice complexity" (i.e., query complexity) of learning in this model, we investigate the trade-offs between computational and advice complexities of learning, showing that using a little bit of advice can turn an otherwise computationally hard clustering problem into a tractable one. In the second part of this dissertation we study the problem of learning mixture models, where we are given an i.i.d. sample generated from an unknown target from a family of mixture distributions, and want to output a distribution that is close to the target in total variation distance. In particular, given a sample-efficient learner for a base class of distributions (e.g., Gaussians), we show how one can come up with a sample-efficient method for learning mixtures of the base class (e.g., mixtures of k Gaussians). As a byproduct of this analysis, we are able to prove tighter sample complexity bounds for learning various mixture models. We also investigate how having access to same-cluster queries (i.e., whether two instances were generated from the same mixture component) can help reduce the computational burden of learning within this model. Finally, we take a further step and introduce a novel method for distribution learning via a form of compression. In particular, we ask whether one can compress a large-enough sample set generated from a target distribution (by picking only a few instances from it) in a way that allows recovery of (an approximation to) the target distribution. We prove that if this is the case for all members of a class of distributions, then there is a sample-efficient way of distribution learning with respect to this class. As an application of this novel notion, we settle the sample complexity of learning mixtures of k axis-aligned Gaussian distributions (within logarithmic factors).
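
    The same-cluster query model described in this abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it is a naive baseline that compares each point against one representative per discovered cluster, so it may use up to n·k queries rather than the small number the abstract aims for. The function name `cluster_with_queries`, the toy labels, and the `oracle` helper are all hypothetical, and the sketch assumes the expert answers consistently.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def cluster_with_queries(
    points: Sequence[Hashable],
    same_cluster: Callable[[Hashable, Hashable], bool],
) -> List[List[Hashable]]:
    """Recover the expert's partition using only same-cluster queries.

    Naive baseline (an assumption for illustration): each point is compared
    with the first member of every cluster found so far.
    """
    clusters: List[List[Hashable]] = []
    for p in points:
        for cluster in clusters:
            # One query against this cluster's representative.
            if same_cluster(cluster[0], p):
                cluster.append(p)
                break
        else:
            # No existing cluster matched, so p starts a new one.
            clusters.append([p])
    return clusters


# Hypothetical expert knowledge, hidden from the clustering routine.
labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}
queries: List[Tuple[str, str]] = []  # log of every query asked


def oracle(x: str, y: str) -> bool:
    """Answer 'do x and y belong to the same cluster?' and log the query."""
    queries.append((x, y))
    return labels[x] == labels[y]


partition = cluster_with_queries(list(labels), oracle)
print(partition, len(queries))
```

    Logging the queries makes the advice cost visible, which is the quantity the abstract calls query complexity.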

    Climatic effect of light-absorbing impurities on snow : experimental and field observations

    Snow and ice are essential components of the Earth system, modulating the energy budget by reflecting sunlight back into the atmosphere, and playing an important role in the hydrological cycle as a reservoir for fresh water. Light-absorbing impurities (LAI), such as black carbon (BC) and mineral dust (MD), have a unique role in influencing the reflectance of the cryosphere. Deposition of the anthropogenic and natural LAI constituents onto these bright surfaces initiates powerful albedo feedbacks that will accelerate melt. This is important globally, but especially for regions such as the Arctic and the Himalaya. In this thesis, observations from both ambient and laboratory experiments are presented. The overarching research goal has been to better understand the climatic effect of LAI on snow. More specifically, an emphasis has been placed on exploring the process-level interactions between LAI and snow, which will enable better comprehension of how LAI affect the cryosphere. Key findings in this thesis include investigations of the horizontal variability of BC concentrations in surface snow, which indicate larger variability at the meter scale at a pristine Arctic site than at a polluted site near a major urban area. In outdoor experiments, where LAI were used to artificially dope natural snow surfaces, the snow albedo was observed to decrease following LAI deposition. The albedo decrease was on the same order as in situ measurements of LAI and albedo conducted elsewhere. As snow melted during the experiment, the snow density was observed to decrease with increasing LAI concentration, while this effect was not observed in non-melting snow. The water retention capacity in melting snow is likely to be decreased by the presence of LAI. Measurements examining the absorption of BC indicate that BC particles in the snow have less absorbing potential compared to BC particles generated in the laboratory. Snow pit investigations on two glaciers in the Sunderdhunga valley, northern India, an area not previously examined for LAI, showed high BC and MD content, affecting the radiative balance of the glacier snow. At different points, MD may be greater than BC in absorbing light at the snow surface. Continued monitoring of LAI in the cryosphere, both at the detailed scale explored here and from the larger modelling perspective, is needed in order to understand the overall response of the cryosphere to climate change.

    Industrial Applications: New Solutions for the New Era

    This book reprints articles from the Special Issue "Industrial Applications: New Solutions for the New Age" published online in the open-access journal Machines (ISSN 2075-1702). This book consists of twelve published articles. This special edition belongs to the "Mechatronic and Intelligent Machines" section.

    Tracer and Timescale Methods for Passive and Reactive Transport in Fluid Flows

    Geophysical, environmental, and urban fluid flows (i.e., flows developing in oceans, seas, estuaries, rivers, aquifers, reservoirs, etc.) exhibit a wide range of reactive and transport processes. Therefore, identifying key phenomena, understanding their relative importance, and establishing causal relationships between them is no trivial task. Analysis of primitive variables (e.g., velocity components, pressure, temperature, concentration) is not always conducive to the most fruitful interpretations. Examining auxiliary variables introduced for diagnostic purposes is an option worth considering. In this respect, tracer and timescale methods are proving to be very effective. Such methods can help address questions such as, "where does a fluid-borne dissolved or particulate substance come from and where will it go?" or, "how fast are the transport and reaction phenomena controlling the appearance and disappearance of such substances?" These issues have been dealt with since the 19th century, essentially by means of ad hoc approaches. However, over the past three decades, methods resting on solid theoretical foundations have been developed, which permit the evaluation of tracer concentrations and diagnostic timescales (age, residence/exposure time, etc.) across space and time, using numerical models and field data. This book comprises research and review articles, introducing state-of-the-art diagnostic theories and their applications to domains ranging from shallow human-made reservoirs to lakes, river networks, marine domains, and subsurface flow.

    Phytosociology applied to wildlife management - a study on the potentiality for the reintroduction of cervids in the Montemuro-Freita-Arada mountain range

    Doctorate in Forest Engineering and Natural Resources - Instituto Superior de Agronomia. The aim of the present thesis was to assess the use of phytosociology in wildlife management. In Section II, as a case study, I investigated red deer (Cervus elaphus hispanicus) and roe deer (Capreolus capreolus) free-ranging populations occurring in the Natural Park of Montesinho, northeast Portugal, using faecal-pellet counts to assess deer use of semi-natural meadows (lameiros) and forest communities. Phytosociological classification helped explain red deer spring selective use of meadows at finer scales and performed better than other clustering criteria for classifying vegetation patches. At the landscape level, composition of the neighbouring vegetation mosaic, topography, and meadow characteristics, such as management status and dominant phytosociology, produced the best models for deer seasonal use of meadows. The forest use analysis revealed red and roe deer preference for oak forests over pine plantations, and overlapping habitat use between red and roe deer all year round. In Section III, I extrapolated the information gathered in Section II on deer use to build, for the Montemuro-Freita-Arada massif, a predictive map for roe deer use of meadows, showing a generally low use, with the exception of isolated meadows closer to oak forest patches

    Semantics-Driven Aspect-Based Sentiment Analysis

    People using the Web are constantly invited to share their opinions and preferences with the rest of the world, which has led to an explosion of opinionated blogs, reviews of products and services, and comments on virtually everything. This type of web-based content is increasingly recognized as a source of data that has added value for multiple application domains. While the large number of available reviews almost ensures that all relevant parts of the entity under review are properly covered, manually reading each and every review is not feasible. Aspect-based sentiment analysis aims to solve this issue, as it is concerned with the development of algorithms that can automatically extract fine-grained sentiment information from a set of reviews, computing a separate sentiment value for the various aspects of the product or service being reviewed. This dissertation focuses on which discriminants are useful when performing aspect-based sentiment analysis. What signals for sentiment can be extracted from the text itself and what is the effect of using extra-textual discriminants? We find that using semantic lexicons or ontologies can greatly improve the quality of aspect-based sentiment analysis, especially with limited training data. Additionally, due to semantics driving the analysis, the algorithm is less of a black box and results are easier to explain.