    Handling Imbalanced Data through Re-sampling: Systematic Review

    Handling imbalanced data is an important issue that can affect the validity and reliability of classification results. One common approach to addressing this issue is re-sampling the data. Re-sampling is a technique that allows researchers to balance the class distribution of a dataset by either over-sampling the minority class or under-sampling the majority class. Over-sampling adds copies of minority-class examples to the dataset, while under-sampling removes some majority-class examples; combining both techniques is usually called hybrid sampling. It is important to note that re-sampling techniques can affect the model's performance, so the model should be evaluated with several evaluation metrics, and alternatives such as cost-sensitive learning and anomaly detection should also be considered. Collecting more data, where feasible, can likewise improve model performance. In this systematic review, we aim to provide an overview of existing methods for re-sampling imbalanced data. We focus on methods that have been proposed in the literature and evaluate their effectiveness through a thorough examination of experimental results. The goal of this review is to give practitioners a comprehensive understanding of the available re-sampling methods, as well as their strengths and weaknesses, to help them make informed decisions when dealing with imbalanced data.
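
    The two basic strategies are simple to sketch. The following is a minimal illustration, assuming a feature matrix `X` and binary labels `y` as NumPy arrays with class 1 in the minority; these names are assumptions for illustration, and production work would typically use a library such as imbalanced-learn, which also implements synthetic approaches like SMOTE.

    ```python
    # Minimal sketch of random over- and under-sampling. `X`, `y`, and the
    # assumption that class 1 is the (smaller) minority class are illustrative
    # choices, not taken from the review itself.
    import numpy as np

    rng = np.random.default_rng(0)

    def random_oversample(X, y, minority_label=1):
        # Duplicate randomly chosen minority rows until the classes balance.
        minority = np.where(y == minority_label)[0]
        majority = np.where(y != minority_label)[0]
        extra = rng.choice(minority, size=len(majority) - len(minority),
                           replace=True)
        idx = np.concatenate([majority, minority, extra])
        return X[idx], y[idx]

    def random_undersample(X, y, minority_label=1):
        # Drop randomly chosen majority rows until the classes balance.
        minority = np.where(y == minority_label)[0]
        majority = np.where(y != minority_label)[0]
        keep = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([keep, minority])
        return X[idx], y[idx]
    ```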

    Fuzzy Set Methods for Object Recognition in Space Applications

    Progress on the following four tasks is described: (1) fuzzy set based decision methodologies; (2) membership calculation; (3) clustering methods, including derivation of pose estimation parameters; and (4) acquisition of images and testing of algorithms.
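
    The report does not specify its membership formulation, so as a purely illustrative sketch of the membership-calculation task, here is the standard fuzzy c-means update, the canonical example of such a computation:

    ```python
    # Illustrative sketch only: standard fuzzy c-means membership update,
    # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), where d_ik is the Euclidean
    # distance from point i to cluster center k and m > 1 is the fuzzifier.
    import numpy as np

    def fcm_memberships(X, centers, m=2.0):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero at a center
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        return 1.0 / ratio.sum(axis=2)  # each row sums to 1 across clusters
    ```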

    Digital Library Services for Three-Dimensional Models

    With the growth in computing, storage and networking infrastructure, it is becoming increasingly feasible for multimedia professionals (such as graphic designers in the commercial, manufacturing, scientific and entertainment areas) to work with 3D digital models of the objects they deal with in their domain. Unfortunately, most of these models exist in individual repositories and are not accessible to the geographically distributed professionals who need them. Building an efficient digital library system presents a number of challenges. In particular, the following issues need to be addressed: (1) What is the best way to represent 3D models in a digital library, so that searches can be done faster? (2) How can 3D models be compressed and delivered to reduce storage and bandwidth requirements? (3) How can we represent the user's view of the similarity between two objects? (4) What search types can be used to enhance the usability of the digital library, how can these searches be implemented, and what are the trade-offs? In this research, we have developed a digital library architecture for 3D models that addresses the above issues as well as other technical issues. We have developed a prototype of our 3D digital library (3DLIB) that supports compressed storage along with retrieval of 3D models. The prototype also supports search and discovery services that are targeted at 3D models. The key to 3DLIB is a representation of a 3D model that is based on “surface signatures”. This representation captures the shape information of any free-form surface and encodes it into a set of 2D images. We have developed a shape similarity search technique that uses the signature images to compare 3D models. One advantage of the proposed technique is that it works in the compressed domain, eliminating the need for decompression in content-based search. Moreover, we have developed an efficient discovery service consisting of a multi-level hierarchical browsing service that enables users to navigate large sets of 3D models. To implement this targeted browsing (finding an object similar to a given object in a large collection through browsing) we abstract a large set of 3D models into a small set of representative models (key models). The abstraction is based on shape similarity and uses specially tailored clustering techniques. The browsing service applies clustering recursively to limit the number of key models that a user views at any time. We have evaluated the performance of our digital library services using the Princeton Shape Benchmark (PSB), and the results show significantly better precision and recall than other approaches.
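
    As an illustration of the recursive abstraction behind such a browsing service, the sketch below clusters shape descriptors and picks one representative per cluster as a key model. The use of k-means and the names `descriptors`, `ids` and `n_keys` are assumptions made for illustration; 3DLIB's actual clustering techniques are specially tailored to its signature representation.

    ```python
    # Hedged sketch of recursive key-model abstraction for browsing. Each
    # level shows at most `n_keys` representatives; drilling into one reveals
    # the key models of its cluster. Uses scikit-learn's KMeans as a stand-in
    # for the paper's tailored clustering.
    import numpy as np
    from sklearn.cluster import KMeans

    def key_models(descriptors, ids, n_keys=9, min_size=2):
        # Base case: few enough models to show directly as leaves.
        if len(ids) <= n_keys:
            return [(i, []) for i in ids]
        km = KMeans(n_clusters=n_keys, n_init=10, random_state=0).fit(descriptors)
        level = []
        for c in range(n_keys):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(descriptors[members] - km.cluster_centers_[c],
                                   axis=1)
            key = members[np.argmin(dists)]  # model nearest the centroid
            # Recurse so the user can drill down from each key model.
            children = (key_models(descriptors[members],
                                   [ids[i] for i in members], n_keys, min_size)
                        if len(members) > min_size else [])
            level.append((ids[key], children))
        return level
    ```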

    Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science

    The purpose of the New York Workshop on Computer, Earth and Space Sciences is to bring together the New York area's finest astronomers, statisticians, computer scientists, and space and earth scientists to explore potential synergies between their respective fields. The 2011 edition (CESS2011) was a great success, and we would like to thank all of the presenters and participants for attending. This year was also special, as it included authors from the upcoming book "Advances in Machine Learning and Data Mining for Astronomy". Over two days, the latest advanced techniques used to analyze the vast amounts of information now available for the understanding of our universe and our planet were presented. These proceedings attempt to provide a small window into the current state of research in this vast interdisciplinary field, and we would like to thank the speakers who took the time to contribute to this volume.
    Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011 in New York City, Goddard Institute for Space Studies.

    Methods for fast and reliable clustering


    Evidential Evolving Gustafson-Kessel Algorithm For Online Data Streams Partitioning Using Belief Function Theory.

    A new online clustering method called E2GK (Evidential Evolving Gustafson-Kessel) is introduced. This partitional clustering algorithm is based on the concept of a credal partition, defined in the theoretical framework of belief functions. A credal partition is derived online by an algorithm adapted from the Evolving Gustafson-Kessel (EGK) algorithm, making online partitioning of data streams possible with a meaningful interpretation of the data structure. A comparative study with the original online procedure shows that E2GK outperforms EGK on different input data sets. To demonstrate the performance of E2GK, several experiments were conducted on synthetic data sets as well as on data collected from a real application problem. A study of parameter sensitivity is also carried out, and solutions are proposed to limit complexity issues.
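
    For readers unfamiliar with the underlying Gustafson-Kessel machinery, the sketch below shows the cluster-specific distance that gives GK-style algorithms their adaptive, ellipsoidal clusters. It is only an illustrative fragment: the evidential (credal) layer that distinguishes E2GK is not reproduced here.

    ```python
    # Illustrative sketch of the Gustafson-Kessel norm-inducing distance:
    # each cluster k carries its own matrix A_k = (rho * det(C_k))^(1/n) C_k^{-1}
    # built from its covariance C_k, so d^2 = (x - v)^T A_k (x - v) adapts to
    # the cluster's shape and orientation.
    import numpy as np

    def gk_distance(x, center, cov, rho=1.0):
        n = len(x)
        A = (rho * np.linalg.det(cov)) ** (1.0 / n) * np.linalg.inv(cov)
        diff = x - center
        return float(diff @ A @ diff)  # squared GK distance
    ```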

    Data clustering for circle detection

    This paper considers the problem of detecting multiple circles in given data. The problem is solved by applying a center-based clustering method: to search for a locally optimal partition, a k-closest circles algorithm modeled on the well-known k-means algorithm is constructed. The method is illustrated by several numerical examples.
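
    A hedged sketch of such a k-means style alternation for circles is shown below. The assignment uses the radial residual |dist(point, center) - radius| and the refit uses a simple algebraic (Kasa) circle fit; both are assumptions for illustration and may differ from the paper's exact formulation.

    ```python
    # Illustrative k-closest-circles iteration: alternate nearest-circle
    # assignment and per-cluster circle refitting, in the spirit of Lloyd's
    # k-means. P is an (n, 2) array of points; circles is a list of
    # (center, radius) pairs used as the initial partition.
    import numpy as np

    def fit_circle(P):
        # Algebraic (Kasa) fit: x^2 + y^2 = 2ax + 2by + c, r^2 = c + a^2 + b^2.
        A = np.column_stack([2 * P[:, 0], 2 * P[:, 1], np.ones(len(P))])
        rhs = (P ** 2).sum(axis=1)
        (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
        return np.array([a, b]), np.sqrt(c + a ** 2 + b ** 2)

    def k_closest_circles(P, circles, iters=20):
        for _ in range(iters):
            # Radial residual of every point to every circle, shape (k, n).
            res = np.stack([np.abs(np.linalg.norm(P - c, axis=1) - r)
                            for c, r in circles])
            labels = res.argmin(axis=0)
            # Refit each circle to its points; keep it if too few were assigned.
            circles = [fit_circle(P[labels == j]) if (labels == j).sum() >= 3
                       else circles[j] for j in range(len(circles))]
        return circles, labels
    ```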

    Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review

    Fifty years have gone by since the publication of the first paper on clustering based on fuzzy set theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. After only one year, the first effects of this seminal paper began to emerge, with the pioneering paper on clustering by Bellman, Kalaba, and Zadeh [33], in which they proposed a prototype clustering algorithm based on fuzzy set theory.