1,407 research outputs found

    Fast Bayesian Optimization of Needle-in-a-Haystack Problems using Zooming Memory-Based Initialization (ZoMBI)

    Needle-in-a-Haystack problems exist across a wide range of applications including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only 0.82% out of 146k total materials in the open-access Materials Project database have a negative Poisson's ratio. However, current state-of-the-art optimization algorithms are not designed with the capabilities to find solutions to these challenging multidimensional Needle-in-a-Haystack problems, resulting in slow convergence to a global optimum or pigeonholing into a local minimum. In this paper, we present a Zooming Memory-Based Initialization algorithm, entitled ZoMBI. ZoMBI actively extracts knowledge from the previously best-performing evaluated experiments to iteratively zoom in the sampling search bounds towards the global optimum "needle" and then prunes the memory of low-performing historical experiments to accelerate compute times by reducing the algorithm time complexity from $O(n^3)$ to $O(\phi^3)$ for $\phi$ forward experiments per activation, which trends to a constant $O(1)$ over several activations. Additionally, ZoMBI implements two custom adaptive acquisition functions to further guide the sampling of new experiments toward the global optimum. We validate the algorithm's optimization performance on three real-world datasets exhibiting Needle-in-a-Haystack and further stress-test the algorithm's performance on an additional 174 analytical datasets. The ZoMBI algorithm demonstrates compute time speed-ups of 400x compared to traditional Bayesian optimization as well as efficiently discovering optima in under 100 experiments that are up to 3x more highly optimized than those discovered by similar methods MiP-EGO, TuRBO, and HEBO. Comment: Paper 16 pages; SI 6 pages

    Hybrid-search and storage of semi-structured information

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 113-118). Given today's tangle of digital information, one of the hardest tasks for computer users of information systems is finding anything in the mess. For a number of well documented reasons including the amazing growth in the Internet's popularity and the drop in the cost of storage, the amount of information on the net as well as on a user's local computer, has increased dramatically in recent years. Although this readily available information should be extremely beneficial for computer users, paradoxically it is now much harder to find anything. Many different solutions have been proposed to the general information seeking task of users, but few if any have addressed the needs of individuals or have leveraged the benefit of single-user interaction. The Haystack project is an attempt to answer the needs of the individual user. Once the user's information is represented in Haystack, the types of questions users may ask are highly varied. In this thesis we will propose a means of representing information in a robust framework within Haystack. Once the information is represented we describe a mechanism by which the diverse questions of the individual can be answered. This novel method functions by using a combination of existing information systems. We will call this combined system a hybrid-search system. By Eytan Adar. M.Eng.

    Vernaculars Cross-Dressed as Universals: Globalization as North Atlantic Hegemony


    stringi: Fast and Portable Character String Processing in R

    Effective processing of character strings is required at various stages of data analysis pipelines: from data cleansing and preparation, through information extraction, to report generation. Pattern searching, string collation and sorting, normalization, transliteration, and formatting are ubiquitous in text mining, natural language processing, and bioinformatics. This paper discusses and demonstrates how and why stringi, a mature R package for fast and portable handling of string data based on ICU (International Components for Unicode), should be included in each statistician's or data scientist's repertoire to complement their numerical computing and data wrangling skills.

    The Laws of Documentation - Engineering Document Control for Telecommunication Systems

    The purpose of this field project is to create a documentation system that allows co-workers at a telecommunication company to store files on a shared LAN and retrieve that information quickly, easily and confidently when needed. A dictionary provides a good illustration of the laws of documentation that must be applied to a working documentation system. A solution is developed for storing project-specific, site-specific, and product-specific information, as well as inter-company documents such as requests for proposals, quotes and scopes of work.

    The Jenny Interviews and Other Sightings: Needle(s) in the Proverbial Haystack(s)

    Article originally published in the Pitcairn Log in July 2021. On April 28, 1789, acting Lieutenant Fletcher Christian deposed “Captain” William Bligh and 18 crew from the HMAV Bounty just off Tofua, South Pacific Ocean. Bligh’s successful open-boat journey to Timor ranks amongst the greatest survival stories in naval history. Christian’s return to Tahiti, failed settlement at Tubuai, and eventual “rediscovery” of Pitcairn Island are well known among Bounty enthusiasts. Hundreds, if not thousands, of books and articles have been written on the Bounty/Pitcairn Island Saga over the last 230 years, including those written by naval officers, early visitors, descendants (Rosalind Amelia Young, Glynn Christian), journalists, and scholars, most notably from history, but also those with credentials in anthropology, sociology, geography, and even psychology. Prior to Henry Evans Maude’s (1958) article published in The Journal of the Polynesian Society (volume 11, 1964) titled “In search of a home: From the mutiny to Pitcairn Island (1789-1790),” the Bounty’s post-mutiny peregrinations from its return to Matavai Bay, Tahiti, on 6 June 1789 to the “rediscovery” of Pitcairn Island on January 15, 1790, were sketchy at best. Maude, a former colonial administrator and subsequent research fellow at the Australian National University, located two “lost” newspaper articles pertaining to the Bounty and Pitcairn Island. These articles contained interviews with Teehuteatuaona (aka Jenny), the consort initially of mutineer Alexander Smith (John Adams), then of Isaac Martin. In these interviews Jenny provided geographic references and clues that elucidated the Bounty’s path post mutiny. Jenny’s accounts also illuminated life on Pitcairn Island, especially the violence that occurred during its first ten years.

    Significance Relations for the Benchmarking of Meta-Heuristic Algorithms

    The experimental analysis of meta-heuristic algorithm performance is usually based on comparing average performance metric values over a set of algorithm instances. When algorithms become close in performance, the additional consideration of the significance of a metric improvement comes into play. However, from this moment the comparison changes from an absolute to a relative mode. Here the implications of this paradigm shift are investigated. Significance relations are formally established. Based on this, a trade-off between increasing cycle-freeness of the relation and small maximum sets can be identified, allowing for the selection of a proper significance level and resulting ranking of a set of algorithms. The procedure is exemplified on the CEC'05 benchmark of real parameter single objective optimization problems. The significance relation here is based on awarding ranking points for relative performance gains, similar to the Borda count voting method or the Wilcoxon signed rank test. In the particular CEC'05 case, five ranks for algorithm performance can be clearly identified. Comment: 6 pages, 2 figures, 1 table