207 research outputs found

    STATS - A Point Access Method for Multidimensional Clusters

    The ubiquity of high-dimensional data in machine learning and data mining applications makes its efficient indexing and retrieval from main memory crucial. Frequently, these machine learning algorithms need to query specific characteristics of single multidimensional points. For example, given a clustered dataset, the cluster membership (CM) query retrieves the cluster to which an object belongs. To answer this type of query efficiently we have developed STATS, a novel main-memory index which scales to answer CM queries on increasingly big datasets. Current indexing methods are oblivious to the structure of clusters in the data; we thus develop STATS around the key insight that exploiting the cluster information when indexing, and preserving it in the index, will accelerate lookup. We show experimentally that STATS outperforms known methods with regard to retrieval time and scales well with dataset size for any number of dimensions.
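
    The semantics of a CM query can be sketched with a minimal example. The sketch below is not the STATS structure (whose internals are not described in this abstract); it only illustrates what a cluster-membership lookup answers, using a plain hash map from point coordinates to a cluster label.

```python
# Hypothetical illustration of a cluster-membership (CM) query: given a
# clustered dataset, return the cluster to which a query point belongs.
# A hash map over exact point coordinates stands in for a real index.

def build_cm_index(points, labels):
    """Map each point (as a tuple of coordinates) to its cluster label."""
    return {tuple(p): c for p, c in zip(points, labels)}

def cm_query(index, point):
    """Return the cluster of `point`, or None if it is not indexed."""
    return index.get(tuple(point))

points = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25]]
labels = [0, 1, 0]
idx = build_cm_index(points, labels)
print(cm_query(idx, [0.9, 0.8]))  # -> 1
```

    A real index would of course organise points so that lookup remains fast as dimensionality and dataset size grow; the abstract's point is that cluster structure can be exploited for exactly this.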

    First passage time exponent for higher-order random walks: Using Lévy flights

    We present a heuristic derivation of the first passage time exponent for the integral of a random walk [Y. G. Sinai, Theor. Math. Phys. {\bf 90}, 219 (1992)]. Building on this derivation, we construct an estimation scheme to understand the first passage time exponent for the integral of the integral of a random walk, which is numerically observed to be 0.220 ± 0.001. We discuss the implications of this estimation scheme for the n-th integral of a random walk. For completeness, we also address the n = ∞ case. Finally, we explore an application of these processes to an extended, elastic object being pulled through a random potential by a uniform applied force. In so doing, we demonstrate a time reparameterization freedom in the Langevin equation that maps nonlinear stochastic processes into linear ones. Comment: 4 figures, submitted to PR
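
    The quantity being estimated can be illustrated numerically. This is not the paper's heuristic derivation, only a brute-force Monte Carlo sketch: the survival probability P(t) that the integral of a random walk has not yet crossed zero decays as P(t) ~ t^(-θ), and θ is the first-passage exponent (θ = 1/4 for the once-integrated walk).

```python
import random

# Monte Carlo sketch: fraction of walkers whose integrated random walk
# stays positive up to time t_max. Fitting this fraction against t_max
# on a log-log scale would estimate the first-passage exponent theta.

def survival_fraction(n_walkers, t_max, seed=0):
    rng = random.Random(seed)
    alive = 0
    for _ in range(n_walkers):
        x = 0.0   # the random walk itself
        y = 1e-9  # its integral, started just above zero
        for _ in range(t_max):
            x += rng.choice((-1.0, 1.0))
            y += x
            if y <= 0.0:
                break
        else:
            alive += 1
    return alive / n_walkers
```

    Estimating the exponent for the doubly-integrated walk (the paper's 0.220 ± 0.001) follows the same pattern with a second accumulator, though far more samples are needed for that precision.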

    Modifying the Symbolic Aggregate Approximation Method to Capture Segment Trend Information

    The Symbolic Aggregate approXimation (SAX) is a very popular symbolic dimensionality reduction technique for time series data, as it has several advantages over other dimensionality reduction techniques. One of its major advantages is its efficiency, as it uses precomputed distances. The other main advantage is that in SAX the distance measure defined on the reduced space lower bounds the distance measure defined on the original space. This enables SAX to return exact results in query-by-content tasks. Yet SAX has an inherent drawback, which is its inability to capture segment trend information. Several researchers have attempted to enhance SAX by proposing modifications to include trend information. However, this comes at the expense of giving up one or more of the advantages of SAX. In this paper we investigate three modifications of SAX that add trend-capturing ability to it. These modifications retain the same features of SAX in terms of simplicity and efficiency, as well as the exact results it returns. They are simple procedures based on a different segmentation of the time series than that used in classic-SAX. We test the performance of these three modifications against classic-SAX on a classification task over 45 time series datasets of different sizes, dimensions, and nature. The results we obtained show that one of these modifications manages to outperform classic-SAX and that another one gives slightly better results than classic-SAX. Comment: International Conference on Modeling Decisions for Artificial Intelligence - MDAI 2020: Modeling Decisions for Artificial Intelligence pp 230-23
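
    For reference, classic SAX follows a standard pipeline: z-normalize the series, reduce it to w segments via Piecewise Aggregate Approximation (PAA), then map each segment mean to a symbol using breakpoints that equiprobably partition a standard normal. The sketch below assumes that standard pipeline (the paper's three modifications differ only in how the series is segmented); the breakpoint values are the usual lookup-table entries for an alphabet of size 4.

```python
import statistics

# Standard-normal breakpoints for an alphabet of size 4 (usual SAX table).
BREAKPOINTS_4 = [-0.67, 0.0, 0.67]

def sax(series, w, breakpoints=BREAKPOINTS_4):
    """Classic-SAX symbolization; assumes len(series) divisible by w."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series) or 1.0  # guard constant series
    z = [(x - mu) / sd for x in series]
    seg = len(z) // w
    # PAA: mean of each of the w equal-length segments.
    means = [statistics.fmean(z[i * seg:(i + 1) * seg]) for i in range(w)]
    symbols = []
    for m in means:
        k = sum(m > b for b in breakpoints)  # which interval m falls in
        symbols.append(chr(ord('a') + k))
    return ''.join(symbols)

print(sax([1, 2, 3, 4, 5, 6, 7, 8], w=4))  # -> 'abcd'
```

    Note how a monotonically rising series maps to rising symbols at this resolution; the drawback the paper addresses is that within a single segment only the mean survives, so an up-trend and a down-trend with equal means symbolize identically.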

    Models of plastic depinning of driven disordered systems

    Two classes of models of driven disordered systems that exhibit history-dependent dynamics are discussed. The first class incorporates local inertia in the dynamics via nonmonotonic stress transfer between adjacent degrees of freedom. The second class allows for proliferation of topological defects due to the interplay of strong disorder and drive. In mean field theory both models exhibit a tricritical point as a function of disorder strength. At weak disorder depinning is continuous and the sliding state is unique. At strong disorder depinning is discontinuous and hysteretic. Comment: 3 figures, invited talk at StatPhys 2

    Addressing robustness in time-critical, distributed task allocation algorithms

    The aim of this work is to produce and test a robustness module (ROB-M) that can be generally applied to distributed, multi-agent task allocation algorithms, as robust versions of these are scarce and not well documented in the literature. ROB-M is developed using the Performance Impact (PI) algorithm, as this has previously shown good results in deterministic trials. Different candidate versions of the module are thus bolted on to the PI algorithm and tested using two different task allocation problems under simulated uncertain conditions, and results are compared with baseline PI. It is shown that the baseline does not handle uncertainty well; the task-allocation success rate tends to decrease linearly as the degree of uncertainty increases. However, when PI is run with one of the candidate robustness modules, the failure rate becomes very low for both problems, even under high simulated uncertainty, and so its architecture is adopted for ROB-M and also applied to MIT's baseline Consensus Based Bundle Algorithm (CBBA) to demonstrate its flexibility. Strong evidence is provided to show that ROB-M can work effectively with CBBA to improve performance under simulated uncertain conditions, as long as the deterministic versions of the problems can be solved with baseline CBBA. Furthermore, the use of ROB-M does not appear to increase mean task completion time in either algorithm, and only 100 Monte Carlo samples are required compared to 10,000 in MIT's robust version of CBBA. PI with ROB-M is also tested directly against MIT's robust algorithm and demonstrates clear superiority in terms of mean number of solved tasks.
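
    The kind of Monte Carlo robustness check described above can be sketched as follows. The task model here is hypothetical (the abstract does not specify one); only the idea is illustrated: sample disturbances on task durations and count the fraction of samples in which every task in a candidate allocation still meets its deadline. The 100-sample figure comes from the text.

```python
import random

# Hypothetical robustness evaluation for one agent's ordered task list:
# an allocation "succeeds" in a sample if every task finishes before its
# deadline when each duration is perturbed by uniform multiplicative noise.

def success_rate(durations, deadlines, noise=0.2, samples=100, seed=1):
    rng = random.Random(seed)
    ok = 0
    for _ in range(samples):
        t = 0.0
        feasible = True
        for d, dl in zip(durations, deadlines):
            t += d * (1.0 + rng.uniform(-noise, noise))
            if t > dl:
                feasible = False
                break
        if feasible:
            ok += 1
    return ok / samples
```

    A robustness module would then prefer allocations whose sampled success rate stays high as the noise level grows, rather than those that are optimal only in the deterministic case.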

    Pharmacology of modulators of alternative splicing

    More than 95% of genes in the human genome are alternatively spliced to form multiple transcripts, often encoding proteins with differing or opposing function. The control of alternative splicing is now being elucidated, and with this comes the opportunity to develop modulators of alternative splicing that can control cellular function. A number of approaches have been taken to develop compounds that can experimentally, and sometimes clinically, affect splicing control, resulting in potential novel therapeutics. Here we develop the concept that targeting alternative splicing can yield relatively specific pathway inhibitors/activators that damp down physiological or pathological processes, from changes in muscle physiology to altering angiogenesis or pain. The targets and pharmacology of some of the current inhibitors/activators of alternative splicing are described and future directions discussed.

    Depinning and plasticity of driven disordered lattices

    We review in these notes the dynamics of extended condensed matter systems, such as vortex lattices in type-II superconductors and charge density waves in anisotropic metals, driven over quenched disorder. We focus in particular on the case of strong disorder, where topological defects are generated in the driven lattice. In this case the response is plastic and the depinning transition may become discontinuous and hysteretic. Comment: 21 pages, 6 figures. Proceedings of the XIX Sitges Conference on Jamming, Yielding, and Irreversible Deformations in Condensed Matter, Sitges, Barcelona, Spain, June 14-18, 200

    Computational complexity analysis of decision tree algorithms

    Decision tree is a simple but powerful learning technique, considered one of the best-known learning algorithms that have been successfully used in practice for various classification tasks. Decision trees have the advantage of producing a comprehensible classification model with satisfactory accuracy levels in several application domains. In recent years, the volume of data available for learning has been increasing dramatically. As a result, many application domains are faced with a large amount of data, posing a major bottleneck on the computability of learning techniques. There are different implementations of the decision tree using different techniques. In this paper, we theoretically and experimentally study and compare the computational complexity of the most common classical top-down decision tree algorithms (C4.5 and CART). This work can serve as part of a review analysing the computational complexity of existing decision tree classifier algorithms, to gain understanding of the operational steps with the aim of optimizing the learning algorithm for large datasets.
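
    The work that dominates the complexity of top-down induction in both C4.5 and CART is the per-node best-split search: for each of m attributes, sort the n instances (O(n log n)) and scan the candidate thresholds (O(n)), giving O(m · n log n) per node. The sketch below illustrates that step for one numeric attribute, using the Gini impurity as in CART; it is an illustrative fragment, not either algorithm's full implementation.

```python
# Best binary split on one numeric attribute, the O(n log n) inner step
# of top-down decision tree induction (Gini impurity, as in CART).

def gini(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def best_split(xs, ys):
    """Return (weighted impurity, threshold) of the best split on xs."""
    pairs = sorted(zip(xs, ys))            # O(n log n) sort
    n = len(pairs)
    classes = sorted(set(ys))
    left = {c: 0 for c in classes}
    right = {c: 0 for c in classes}
    for _, y in pairs:
        right[y] += 1
    best = (float('inf'), None)
    for i in range(n - 1):                 # O(n) scan of thresholds
        x, y = pairs[i]
        left[y] += 1
        right[y] -= 1
        if pairs[i + 1][0] == x:
            continue                       # no threshold between equal values
        w = (i + 1) / n
        score = (w * gini(list(left.values()))
                 + (1 - w) * gini(list(right.values())))
        if score < best[0]:
            best = (score, (x + pairs[i + 1][0]) / 2)
    return best

print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # -> (0.0, 6.5)
```

    Repeating this search over every attribute at every node is what makes large datasets a bottleneck, and is the step that large-scale implementations typically optimize (e.g. by presorting once or by approximating the threshold scan).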