
    Online Optimization in Dynamic Environments: A Regret Analysis for Sparse Problems

    Time-varying systems are a challenge in many scientific and engineering areas. Usually, estimation of time-varying parameters or signals must be performed online, which calls for the development of responsive online algorithms. In this paper, we consider this problem in the context of sparse optimization; specifically, we consider the Elastic-net model. Following the rationale in [1], we propose a novel online algorithm and theoretically prove that it enjoys dynamic regret guarantees. We then show an application to recursive identification of time-varying autoregressive models when the number of parameters to be estimated is unknown. Numerical results show the practical efficiency of the proposed method.
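    To illustrate the flavor of such methods, the sketch below shows a generic online proximal-gradient update for an elastic-net penalized least-squares loss on streaming data. It is an assumption-laden illustration, not the algorithm proposed in the paper; the function names, step size, and penalty weights are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def online_elastic_net_step(theta, x_t, y_t, lam1=0.1, lam2=0.1, eta=0.05):
    """One online proximal-gradient step for the elastic-net penalized loss
    f_t(theta) = 0.5*(y_t - x_t @ theta)**2 + lam1*||theta||_1 + 0.5*lam2*||theta||_2^2.
    Hypothetical illustration; not the update rule from the paper."""
    grad = -(y_t - x_t @ theta) * x_t + lam2 * theta   # gradient of the smooth part
    return soft_threshold(theta - eta * grad, eta * lam1)

# Toy usage: track a slowly drifting sparse parameter vector online.
rng = np.random.default_rng(0)
theta_hat = np.zeros(10)
theta_true = np.zeros(10)
theta_true[:3] = [0.5, -0.3, 0.2]
for t in range(1000):
    theta_true[:3] += 0.001 * rng.standard_normal(3)   # slow parameter drift
    x_t = rng.standard_normal(10)
    y_t = x_t @ theta_true + 0.01 * rng.standard_normal()
    theta_hat = online_elastic_net_step(theta_hat, x_t, y_t)
```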

    Multiple Media Correlation: Theory and Applications

    This thesis introduces multiple media correlation, a new technology for the automatic alignment of multiple media objects such as text, audio, and video. This research began with the question: what can be learned when multiple multimedia components are analyzed simultaneously? Most ongoing research in computational multimedia has focused on queries, indexing, and retrieval within a single media type: video is compressed and searched independently of audio, and text is indexed without regard to the temporal relationships it may have to other media data. Multiple media correlation provides a framework for locating and exploiting correlations between multiple, potentially heterogeneous media streams. The goal is computed synchronization: the determination of temporal and spatial alignments that optimize a correlation function and indicate commonality and synchronization between media objects. The model also provides a basis for comparison of media in unrelated domains. There are many real-world applications for this technology, including speaker localization, musical score alignment, and degraded media realignment. Two applications, text-to-speech alignment and parallel text alignment, are described in detail with experimental validation. Text-to-speech alignment computes the alignment between a textual transcript and speech-based audio. The presented solutions are effective for a wide variety of content and are useful not only for retrieval of content but also in support of automatic captioning of movies and video. Parallel text alignment provides a tool for the comparison of alternative translations of the same document that is particularly useful to the classics scholar interested in comparing translation techniques or styles. The results presented in this thesis include (a) new media models more useful in analysis applications, (b) a theoretical model for multiple media correlation, (c) two practical application solutions with widespread applicability, and (d) Xtrieve, a multimedia database retrieval system that demonstrates this new technology and its application to information retrieval. This thesis demonstrates that computed alignment of media objects is practical and can provide immediate solutions to many information retrieval and content presentation problems. It also introduces a new area for research in media data analysis.
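    As a rough illustration of computed synchronization, the sketch below aligns two media streams by dynamic programming over a user-supplied dissimilarity matrix (a DTW-style formulation). This is an assumed, simplified stand-in for the thesis's correlation model; the function name and cost construction are hypothetical.

```python
import numpy as np

def align(cost):
    """Monotonic alignment of two streams by dynamic programming, where
    cost[i, j] is the dissimilarity between element i of stream A and
    element j of stream B. Returns the minimum-cost warping path."""
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,                  # advance A
                acc[i, j - 1] if j > 0 else np.inf,                  # advance B
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,    # advance both
            )
            acc[i, j] = cost[i, j] + best_prev
    # Backtrack from the end to recover the alignment path.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((c for c in candidates if c[0] >= 0 and c[1] >= 0),
                   key=lambda c: acc[c])
        path.append((i, j))
    return path[::-1]

# Toy usage: align two short feature sequences by absolute difference.
a, b = np.array([1.0, 2.0, 3.0, 3.0]), np.array([1.0, 3.0, 3.0])
print(align(np.abs(np.subtract.outer(a, b))))
```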

    Approximate Data Analytics Systems

    Today, most modern online services make use of big data analytics systems to extract useful information from raw digital data. The data normally arrives as a continuous stream at high speed and in huge volumes, and the cost of handling this massive data can be significant. Providing interactive latency when processing the data is often impractical because the data is growing exponentially, even faster than Moore's law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications accept an approximate rather than an exact output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems:
    • StreamApprox: a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and is able to adapt to rapid fluctuations in input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error.
    • IncApprox: a data analytics system for incremental approximate computing. This system combines approximate and incremental computing in stream processing to achieve high throughput and low latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
    • PrivApprox: a data stream analytics system for privacy-preserving and approximate computing. This system supports high-utility, low-latency data analytics while preserving users' privacy, and is based on the combination of privacy-preserving data analytics and approximate computing.
    • ApproxJoin: an approximate distributed join system. This system improves the performance of joins, which are critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filters) to avoid shuffling non-joinable data items through the network, and proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output.
    Our evaluation, based on micro-benchmarks and real-world case studies, shows that these systems achieve significant performance speedups compared to state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems allow users to systematically trade off accuracy against throughput/latency and require no or only minor modifications to existing applications.
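    As a rough illustration of the sampling idea behind StreamApprox and IncApprox, the sketch below implements plain (non-adaptive) stratified reservoir sampling with one reservoir per stratum and a scaled sum estimator. It is a simplified assumption, not the adaptive, error-bounded algorithms designed in the thesis; the class and method names are hypothetical.

```python
import random
from collections import defaultdict

class StratifiedReservoir:
    """Plain stratified reservoir sampling: one classic reservoir per stratum,
    so low-rate sub-streams stay represented regardless of arrival rates.
    Simplified sketch; the thesis's algorithms additionally adapt per-stratum
    sample sizes online and bound the approximation error."""

    def __init__(self, per_stratum_size=100):
        self.k = per_stratum_size
        self.seen = defaultdict(int)       # items observed per stratum
        self.samples = defaultdict(list)   # reservoir per stratum

    def add(self, stratum, item):
        self.seen[stratum] += 1
        reservoir = self.samples[stratum]
        if len(reservoir) < self.k:
            reservoir.append(item)
        else:
            j = random.randrange(self.seen[stratum])
            if j < self.k:
                reservoir[j] = item        # replace with decreasing probability

    def estimate_sum(self):
        """Estimate the global sum by scaling each stratum's sample mean
        by the number of items seen in that stratum."""
        return sum(self.seen[s] * (sum(r) / len(r))
                   for s, r in self.samples.items() if r)
```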

    An experimental investigation of meniscus roll coating

    A two-roll apparatus is used to explore experimentally the detailed fluid mechanics of meniscus roll coating, in which inlets are starved and flow rates are small. Both forward and reverse modes of operation (with contra- and co-rotating rolls) are investigated using optical sectioning combined with dye injection and particle imaging techniques. The part of parameter space in which meniscus coating occurs is identified by varying the roll separation and roll speeds, and hence the flow rate and capillary number. Key features of the flow structures identified in the forward mode include two large eddies (each with a saddle point, separatrix and sub-eddies), a primary fluid transfer jet, and the existence of two critical flow rates associated with the switching-on of a second fluid transfer jet and the switching-off of the primary transfer jet followed by a change in the flow structure. In the reverse mode, the key features are a single large eddy consisting of two sub-eddies, a saddle point and separatrix, a primary fluid transfer jet and, once again, two critical flow rates. These correspond to (i) the switching-on of a secondary transfer jet and (ii) the disappearance of a saddle point at the nip, resulting in the merger of the primary and secondary transfer jets. Measurements of film thickness and meniscus location made over a range of speed ratios and capillary numbers are compared with theoretical predictions. A plate-roll apparatus is used to confirm the presence, for very small flow rates, of a sub-ambient, almost linear, pressure profile across the bead. The transition from inlet-starved to fully flooded roll coating as the flow rate is increased is also investigated, and the accompanying changes in flow structure and pressure profile are observed.
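    For reference, the dimensionless groups mentioned above are conventionally defined as follows in roll-coating studies; the notation is assumed here rather than taken from the thesis.

```latex
% Standard roll-coating dimensionless groups (notation assumed for reference):
% Ca - capillary number, mu - dynamic viscosity, U - roll surface speed,
% sigma - surface tension, S - speed ratio of the two rolls,
% lambda - dimensionless flow rate, with Q the flow rate per unit width
%          and H_0 the minimum gap between the rolls.
\[
  \mathrm{Ca} = \frac{\mu U}{\sigma}, \qquad
  S = \frac{U_2}{U_1}, \qquad
  \lambda = \frac{Q}{U H_0}
\]
```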

    Towards safer mining: the role of modelling software to find missing persons after a mine collapse

    Purpose. The purpose of the study is to apply science and technology to determine the most likely location of the container in which three miners were trapped after the Lily mine disaster. Following the collapse of the Crown Pillar at Lily Mine in South Africa on 5 February 2016, there was a national outcry to find the three miners who were trapped in a surface container lamp room that disappeared into the sinkhole formed during the collapse. Methods. During a visit to Lily Mine on 9 March, the Witwatersrand Mining Institute (WMI) suggested a two-way strategy for finding the container in which the miners were trapped and buried: first, to test temporal 3D modelling software technology to locate the container, which is the subject of this paper, and second, to use scientific measurement and testing technologies. The overall methodology was to ask academia and research entities within the University to supply the WMI with ideas; the resulting list was compiled as responses came in, scrutinized, and supported by a literature review in a conceptual study of which ideas were likely to work. The screening and preliminary testing of candidate software are discussed in this article. Findings. For modelling purposes the collapse was divided into three separate phases: the sinkhole failure, the failure of the western slope, and the slip hazard on the southern slopes. Software technologies capable of simulating the movement of the container during the first two phases were identified, and modelling in ParaView indicated the likely location of the container. The southern slope was analysed with ArcGIS, producing slope hazard maps for the area as well as underground rescue maps with evacuation routes. Overall, software modelling is likely to locate the present position of the container, but accurate data and a combination of different advanced software packages are required, at considerable cost. Originality. This paper presents original work on how software technology can be used to locate missing miners. Practical implications. The two approaches are unlikely to recover the miners alive because of the considerable time that has elapsed, but they will alert the rescue team and mine workers when they come into close proximity to the miners. The results of the article were obtained without the support of any projects or funding.

    A PROCRUSTEAN APPROACH TO STREAM PROCESSING

    The increasing demand for real-time data processing and the constantly growing data volume have contributed to the rapid evolution of Stream Processing Engines (SPEs), which are designed to continuously process data as it arrives. Low operational cost and timely delivery of results are both objectives of paramount importance for SPEs. Given the volatile and uncharted nature of data streams, achieving these goals under fixed resources is a challenge, which calls for adaptable SPEs that can react to fluctuations in processing demands. In the past, three techniques have been developed for improving an SPE's ability to adapt; they can be classified by whether applications require exact or approximate results: stream partitioning and re-partitioning target exact processing, while load shedding targets approximate processing. Stream partitioning strives to balance load among processors, but previous techniques neglected hidden costs of distributed execution. Load shedding lowers the accuracy of results by dropping part of the input, but previous techniques did not cope with evolving streams. Stream re-partitioning is used to reconfigure execution while processing takes place, but previous techniques did not fully utilize window semantics. In this dissertation, we put stream processing in a Procrustean bed, in terms of the manner and the degree to which processing takes place. To this end, we present new approaches for window-based aggregate operators that are applicable to both exact and approximate stream processing in modern SPEs. Our stream partitioning, re-partitioning, and load shedding solutions offer improvements in performance and accuracy on real-world data by exploiting the semantics of both data and operations. In addition, we present SPEAr, the design of an SPE that accelerates processing by delivering approximate results with accuracy guarantees and avoiding unnecessary load. Finally, we contribute a hybrid technique, ShedPart, which further improves the load balance and performance of an SPE.
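    As a rough illustration of the accuracy/load trade-off that load shedding exposes, the sketch below computes tumbling-window sums while probabilistically dropping input tuples and rescaling the result. It is a minimal assumed example, not the partitioning, re-partitioning, shedding, SPEAr, or ShedPart techniques contributed by the dissertation; the function name and defaults are hypothetical.

```python
import random

def shed_and_aggregate(stream, window_size=1000, keep_prob=0.5):
    """Tumbling-window sums with probabilistic load shedding: each tuple is
    kept with probability keep_prob, and the per-window sum of kept tuples is
    rescaled by 1/keep_prob so the estimate remains unbiased in expectation.
    Simplified illustration; not the dissertation's techniques."""
    kept, count = [], 0
    for value in stream:
        count += 1
        if random.random() < keep_prob:      # shed the remaining tuples
            kept.append(value)
        if count == window_size:
            yield sum(kept) / keep_prob      # rescaled approximate aggregate
            kept, count = [], 0

# Toy usage: approximate per-window sums over a synthetic integer stream.
approx_sums = list(shed_and_aggregate((i % 10 for i in range(10_000))))
```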

    Entropy-based parametric estimation of spike train statistics

    We consider the evolution of a network of neurons, focusing on the asymptotic behavior of spike dynamics rather than membrane potential dynamics. In this context, the spike response is not sought as a deterministic response but as a conditional probability: "reading out the code" consists of inferring such a probability. This probability is computed from empirical raster plots using the framework of thermodynamic formalism in ergodic theory. This yields a parametric statistical model in which the probability has the form of a Gibbs distribution. In this respect, the approach generalizes the seminal and profound work of Schneidman and collaborators. A minimal presentation of the formalism is reviewed here, and a general algorithmic estimation method is proposed that yields fast convergent implementations. It is also made explicit how several spike observables (entropy, rate, synchronizations, correlations) are obtained in closed form from the parametric estimation. This paradigm allows us not only to estimate the spike statistics, given a design choice, but also to compare different models, thus answering comparative questions about the neural code such as: "are correlations (or time synchrony, or a given set of spike patterns, ...) significant with respect to rate coding only?" A numerical validation of the method is proposed, and perspectives regarding spike-train code analysis are discussed.
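    Schematically, the parametric Gibbs form underlying this approach can be written as below; the notation is assumed for illustration and is not copied from the paper.

```latex
% Schematic Gibbs form of the spike-train statistics (notation assumed):
% omega is a spike block, phi_i are chosen spike observables (rates,
% pairwise correlations, synchrony terms, ...), lambda_i their conjugate
% parameters, and Z(lambda) the normalizing partition function.
\[
  P_\lambda(\omega) = \frac{1}{Z(\lambda)}
    \exp\!\Big( \sum_i \lambda_i \, \phi_i(\omega) \Big),
  \qquad
  Z(\lambda) = \sum_{\omega} \exp\!\Big( \sum_i \lambda_i \, \phi_i(\omega) \Big).
\]
% Parameters are fitted by matching the model averages of the phi_i to their
% empirical averages measured from raster plots (equivalently, by maximizing
% entropy subject to those constraints).
```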

    Outlier Detection In Big Data

    Get PDF
    The dissertation focuses on scaling outlier detection to work both on huge static and on dynamic streaming datasets. Outliers are patterns in the data that do not conform to expected behavior. Outlier detection techniques are broadly applied in applications ranging from credit fraud prevention and network intrusion detection to stock investment tactical planning. For such mission-critical applications, a timely response is often of paramount importance, yet processing outlier detection requests has high algorithmic complexity and is resource-consuming. In this dissertation we investigate the challenges of detecting outliers in big data, in particular those caused by the high velocity of streaming data, the big volume of static data, and the large cardinality of the input parameter space for tuning outlier mining algorithms. Effective optimization techniques are proposed to ensure the responsiveness of outlier detection in big data. First, we propose a novel optimization framework called LEAP to continuously detect outliers over data streams. The continuous discovery of outliers is critical for a large range of online applications that monitor high-volume, continuously evolving streaming data. LEAP encompasses two general optimization principles that exploit the rarity of outliers and the temporal priority relationships among stream data points. Leveraging these two principles, LEAP is not only able to continuously deliver outliers with respect to a set of popular outlier models, but also provides near real-time support for processing powerful outlier analytics workloads composed of large numbers of outlier mining requests with various parameter settings. Second, we develop a distributed approach to efficiently detect outliers over massive-scale static datasets. In this big data era, as the volume of data advances to new levels, the power of distributed compute clusters must be employed to detect outliers with a short turnaround time. Our approach optimizes the key factors determining the efficiency of distributed data analytics, namely communication costs and load balancing. In particular, we prove that the traditional frequency-based load balancing assumption is not effective, and we design a novel cost-driven data partitioning strategy that achieves load balancing. Furthermore, we abandon the traditional one-detection-algorithm-for-all-compute-nodes approach and instead propose a multi-tactic methodology that adaptively selects the most appropriate algorithm for each node based on the characteristics of the data partition assigned to it. Third, traditional outlier detection systems process each individual outlier detection request, instantiated with a particular parameter setting, one at a time. This is not only prohibitively time-consuming for large datasets, but also tedious for analysts as they explore the data to home in on the most appropriate parameter setting or the desired results. We therefore design an interactive outlier exploration paradigm that is not only able to answer traditional outlier detection requests in near real time, but also offers innovative outlier analytics tools to help analysts quickly extract, interpret, and understand the outliers of interest. Our experimental studies, including performance evaluations and user studies conducted on real-world datasets (stock, sensor, moving object, and geolocation data), confirm both the effectiveness and efficiency of the proposed approaches.
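    As a rough illustration of the kind of workload LEAP optimizes, the sketch below implements the classic distance-threshold outlier definition over a sliding window with a naive neighbor count. It is an assumed baseline, not LEAP's optimized algorithms; the function name, parameters, and defaults are hypothetical.

```python
import numpy as np
from collections import deque

def sliding_window_outliers(stream, window=500, k=5, radius=1.0):
    """Distance-threshold outlier detection over a sliding window: a point is
    reported as an outlier if fewer than k other points in the current window
    lie within `radius` of it. Naive O(window) check per point, shown only to
    make the outlier model concrete."""
    buf = deque(maxlen=window)
    for t, x in enumerate(stream):
        buf.append(x)
        if len(buf) < window:
            continue
        pts = np.asarray(buf)
        neighbors = int(np.sum(np.abs(pts - x) <= radius)) - 1  # exclude x itself
        if neighbors < k:
            yield t, x

# Toy usage: a 1-D stream with occasional injected spikes.
rng = np.random.default_rng(1)
data = rng.standard_normal(5000)
data[::997] += 8.0                       # injected anomalies
print(len(list(sliding_window_outliers(data))))
```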