27 research outputs found
Literature Study on Analyzing and Designing of Algorithms
The fundamental goal is problem solution under numerous limitations, such as those imposed by problem size, performance, and cost in terms of both space and time. Designing a quick, effective, and efficient solution to a problem domain is the objective. Certain problems are simple to resolve, while others are challenging; developing a fast and effective solution requires much ingenuity. New technology is required for system design, and the foundation of that new technology is the improvement of an already existing algorithm. The goal of algorithm research is to create effective algorithms that improve scalability, dependability, and availability, in addi
Adaptive AI Algorithms for Generic Hardware and Unified Hardware Acceleration Architecture
We are now in an era of the Big Bang of artificial intelligence (AI). In this wave of revolution, both industry and academia have invested substantial funds and resources. Machine learning, especially deep learning, has been widely deployed to replace traditional algorithms in many domains, from the Euclidean data domain to the non-Euclidean domain. As the complexity and scale of AI algorithms increase, the systems hosting these algorithms require more computational power and resources than before. Using the design of the modules of a video analytics platform as the use case, we analyze the workload cost for computational resources and memory allocation during the execution of the system. The video analytics platform is a complex system that comprises various computer vision and decision-making tasks. Every module accomplishing a specific task is a stage in the pipeline of the platform. With the analyses mentioned above, we synthesize adaptive AI algorithms from the availability and variability perspectives, such as optimization with tensorization or matricization. We conceive the sparse Transformer and the segmented linear Transformer as the critical components for the human action recognition task. The Constraint Satisfaction Problem is employed to assist decision-making in the scene parsing stage; to facilitate the fulfillment of this task, we designed a hybrid model for a graph-learning-based SAT solver. Graph matching is employed at the final stage for the scene understanding task, for which we implemented a hybrid model of GNN and Transformer architectures. Finally, we design a unified hardware acceleration architecture for both dense and sparse data based on the optimizations of the algorithms. Our design of the architecture targets arithmetic operation kernels, such as matrix multiplications, with the help of data transformation and rearrangement.
We first transform the inputs and weights with the Winograd transform for dense convolution operations, then feed the transformed data to the matrix multiplication accelerator. For sparse data, we need to use the indices of nonzero elements to fetch data; therefore, indexation, scattering, and gathering are crucial components, and an effective implementation dramatically improves the system's overall performance. To improve the matrix multiplication accelerator's efficiency and reduce the number of heavy arithmetic operations and memory accesses, we also implement a hardware-based recursive algorithm, namely Strassen's algorithm for matrix multiplication.
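As a concrete illustration of the recursion mentioned above, the following sketch shows the textbook seven-multiplication structure of Strassen's algorithm for power-of-two sizes. It is only a software illustration, not the hardware accelerator design described in this work, which would tile the matrices and cut over to a base case.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Multiply two n x n matrices, n a power of two, with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split both operands into quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # Seven recursive multiplications instead of the naive eight.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Recombine the quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

Trading one multiplication for extra additions per recursion level is what reduces the count of heavy arithmetic operations, the same motivation cited for the hardware implementation.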
On Bilinear Techniques for Similarity Search and Boolean Matrix Multiplication
Algorithms are the art of efficient computation: it is by the power of algorithms that solving problems becomes feasible, and that we may harness the power of computing machinery. Efficient algorithms translate directly to savings in resources, such as time, storage space, and electricity, and thus money. With the end of the exponential increase in the computational power of hardware, the value of efficient algorithms may be greater than ever.
This thesis presents advancements in multiple fields of algorithms, related through the application of bilinear techniques. Bilinear maps, functions that map elements from a pair of vector spaces to a third vector space and are linear in each of their arguments, are a ubiquitous and fundamental mathematical tool, the canonical example being matrix multiplication. We address both the applications that make use of bilinear maps and the computation of the bilinear maps themselves, Boolean matrix multiplication in particular.
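The defining property above can be checked numerically: viewing matmul(A, B) as a function of the pair (A, B), it is linear in each argument separately. A minimal sketch (helper names are illustrative):

```python
def matmul(A, B):
    """Naive matrix product, the canonical bilinear map."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def madd(A, B):
    # Entry-wise sum of two equally-sized matrices.
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    # Entry-wise scalar multiple.
    return [[c * a for a in row] for row in A]
```

Linearity in the first argument means matmul(A1 + A2, B) = matmul(A1, B) + matmul(A2, B) and matmul(c*A, B) = c*matmul(A, B), and symmetrically for the second argument.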
In the field of similarity search, we improve on Valiant's randomized algorithm [FOCS 2012; J. ACM 2015] for finding correlated vectors by (i) presenting an improved sampling scheme that enables faster processing by using fast matrix multiplication, and (ii) derandomizing Valiant's algorithm. These results are mostly of theoretical nature since they rely on fast matrix multiplication.
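To see why fast matrix multiplication helps here at all, note that all pairwise inner products of a set of vectors can be read off a single matrix product (the Gram matrix). The sketch below shows only this brute-force step, not Valiant's algorithm or the improved sampling scheme:

```python
def most_correlated_pair(X):
    """Return indices (i, j) of the pair of rows of X with largest inner product."""
    n, d = len(X), len(X[0])
    # Gram matrix G[i][j] = <X[i], X[j]>: one matrix product X @ X^T,
    # which a fast matrix multiplication routine would accelerate.
    G = [[sum(X[i][k] * X[j][k] for k in range(d)) for j in range(n)]
         for i in range(n)]
    # Scan the upper triangle for the most correlated pair.
    return max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda ij: G[ij[0]][ij[1]])
```
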
We also present (iii) an adaptive prefix-assignment method for symmetry breaking. An instantiation of McKay's canonical extension framework [J. Algorithms 1998], the method produces a set of partial assignments with respect to a prefix sequence of variables in a system of constraints, such that all generated assignments are pairwise nonisomorphic. The method breaks the symmetries completely with respect to the prefix sequence, and can benefit from an auxiliary representation of symmetries in the form of a colored graph. We also provide an implementation that works as a preprocessor for Boolean satisfiability solvers, and show experimentally that the method is of practical value and parallelizes well in a distributed computer cluster setting.
We address matrix multiplication by (iv) introducing a probabilistic extension of the notions of rank and border rank, and showing that, under this notion, the structural tensor for 2×2 matrix multiplication has strictly lower probabilistic tensor rank and border rank than its deterministic rank. We use this fact to derive a randomized algorithm for multiplying two Boolean matrices that is asymptotically faster than Strassen's algorithm [Numer. Math. 1969].
Finally, (v) using the recent result of Karstadt and Schwartz [SPAA 2017], we implement Strassen's multiplication over the binary field in an alternative basis for a multiple-GPU shared-memory system. We evaluate the implementation with one-tebibit input, and show that it exceeds the theoretical peak performance of the elementary algorithm in terms of bit operations, and also offers substantial savings in energy consumption.
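The savings in bit operations come from word-level parallelism: packing rows into machine words lets one AND act on many matrix entries at once. A minimal sketch for the Boolean (OR-AND) product of square matrices, using Python integers as bit vectors (this is an illustration, not the thesis's GPU implementation):

```python
def bool_matmul_packed(A, B):
    """Boolean matrix product C[i][j] = OR_k (A[i][k] AND B[k][j]), bit-packed."""
    n = len(A)
    # Pack each row of A and each column of B into one integer bitmask.
    arows = [sum(bit << k for k, bit in enumerate(row)) for row in A]
    bcols = [sum(B[k][j] << k for k in range(n)) for j in range(n)]
    # C[i][j] = 1 iff the two packed words share a set bit.
    return [[1 if arows[i] & bcols[j] else 0 for j in range(n)]
            for i in range(n)]
```

For the product over the binary field GF(2), the nonzero test would instead be the parity of the popcount of `arows[i] & bcols[j]`.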
A methodology for passenger-centred rail network optimisation
Optimising the allocation of limited resources, be they existing assets or investment, is an ongoing challenge for rail network managers. Recently, methodologies have been developed for optimising the timetable from the passenger perspective. However, there is a gap for a decision support tool which optimises rail networks for maximum passenger satisfaction, captures the experience of individual passengers, and can be adapted to different networks and challenges. Towards building such a tool, this thesis develops a novel methodology referred to as the Sheffield University Passenger Rail Experience Maximiser (SUPREME) framework. First, a network assessment metric is developed which captures the multi-stage nature of individual passenger journeys as well as the effect of crowding upon passenger satisfaction. Second, an agent-based simulation is developed to capture individual passenger journeys in enough detail for the network assessment metric to be calculated. Third, for the optimisation algorithm within SUPREME, the Bayesian Optimisation method is selected following an experimental investigation which indicates that it is well suited to ‘expensive-to-compute’ objective functions, such as the one found in SUPREME. Finally, in case studies that include optimising the value engineering strategy of the proposed UK High Speed Two network when saving £5 billion in initial investment costs, the SUPREME framework is found to improve network performance on the order of 10%. This thesis shows that the SUPREME framework can find ‘good’ resource allocations for a ‘reasonable’ computational cost, and is sufficiently adaptable for application to many rail network challenges. This indicates that a decision support tool developed on the SUPREME framework could be widely applied by network managers to improve passenger experience and increase ticket revenue. Novel contributions made by this thesis are: the SUPREME methodology, an international comparison between the Journey Time Metric and the Disutility Metric, and the application of the Bayesian Optimisation method for maximising the performance of a rail network
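The shape of a passenger-centred assessment metric, multi-leg journeys scored with a crowding penalty and averaged over simulated passengers, can be sketched as a toy. Every name and the form of the crowding multiplier below are illustrative assumptions, not the thesis's actual metric:

```python
def leg_disutility(minutes, load_factor):
    """Toy crowding-weighted time for one journey leg."""
    # Penalise only standing-room conditions (load_factor > 1.0);
    # the 0.5 weight is an arbitrary illustrative choice.
    crowding = 1.0 + 0.5 * max(0.0, load_factor - 1.0)
    return minutes * crowding

def journey_disutility(legs):
    # A journey is a sequence of (minutes, load_factor) legs,
    # capturing the multi-stage nature of individual journeys.
    return sum(leg_disutility(m, lf) for m, lf in legs)

def network_score(journeys):
    # Lower is better: mean disutility over all simulated passengers.
    # An optimiser would treat this as the expensive objective function.
    return sum(journey_disutility(j) for j in journeys) / len(journeys)
```

In a framework like SUPREME, evaluating such a score requires running the agent-based simulation, which is why a sample-efficient optimiser such as Bayesian Optimisation is attractive.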
An Investigation into the Performance Evaluation of Connected Vehicle Applications: From Real-World Experiment to Parallel Simulation Paradigm
A novel system was developed that provides drivers with lane-merge advisories, using vehicle trajectories obtained through Dedicated Short Range Communication (DSRC). It was successfully tested on a freeway using three vehicles, then targeted for further testing via simulation. The failure of contemporary simulators to effectively model large, complex urban transportation networks then motivated further research into distributed and parallel traffic simulation. An architecture for a closed-loop, parallel simulator was devised, using a new algorithm that accounts for boundary nodes, traffic signals, intersections, road lengths, traffic density, and lane counts; it partitions a sample Tennessee road network more efficiently than tools like METIS, which increase interprocess communication (IPC) overhead by partitioning more transportation corridors. The simulator uses logarithmic accumulation to synchronize parallel simulations, further reducing IPC. Analyses suggest this eliminates up to one-third of the IPC overhead incurred by a linear accumulation model.
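The intuition behind logarithmic accumulation can be sketched by counting dependent communication steps: folding partition results in one at a time takes n - 1 sequential steps, while combining them pairwise in a tree takes about log2(n) rounds. This is a generic illustration of tree reduction, not the simulator's actual synchronization protocol:

```python
import math

def linear_rounds(n):
    # Linear accumulation: each partition's result is folded in sequentially.
    return n - 1

def logarithmic_rounds(n):
    # Tree accumulation: pairs combine in parallel each round.
    return math.ceil(math.log2(n))

def tree_reduce(values, op):
    """Combine values pairwise per round; return (result, rounds used)."""
    rounds = 0
    while len(values) > 1:
        # Each round halves the number of active partial results;
        # an odd leftover is carried to the next round unchanged.
        values = [op(values[i], values[i + 1]) if i + 1 < len(values)
                  else values[i] for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds
```
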
Dynamic hashing technique for bandwidth reduction in image transmission
Hash functions are widely used in secure communication systems, generating message digests for the detection of unauthorized changes in files. An encrypted hashed message, or digital signature, is used in many applications, such as authentication, to ensure data integrity. It is almost impossible to ensure messages are authentic when sending over large bandwidth on a highly accessible network, especially over insecure channels. Two issues that must be addressed are the large size of the hashed message and the high bandwidth. A collaborative approach between an encoded hash message and steganography provides highly secure hidden data. The aim of this research is to propose a new method for producing a dynamic and smaller encoded hash message with reduced bandwidth. The encoded hash message is embedded into an image as a stego-image to avoid an additional file, and consequently the bandwidth is reduced. The receiver extracts the encoded hash and the dynamic hashed message from the received file at the same time. If the hash decrypted with the public key matches the hash computed from the received file, the file is considered authentic. To enhance the robustness of the hashed message, we compressed it, encoded it, or performed both operations before embedding the hashed data into the image. The proposed algorithm achieved the lowest dynamic size (1 KB), with no fixed length of the original file, compared to the MD5, SHA-1, and SHA-2 hash algorithms. The robustness of the hashed message was tested against substitution, replacement, and collision attacks to check whether the same message can be detected in the output. The results show that the probability of the same hashed message existing in the output is close to 0%, compared to the MD5 and SHA algorithms. Among the benefits of the proposed algorithm is computational efficiency; for messages smaller than 1600 bytes, the hashed file reduced the original file by up to 8.51%
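The overall pipeline (hash, compress/encode, embed in an image, extract, verify) can be sketched as follows. This uses SHA-256 plus zlib as stand-ins for the proposed dynamic hash and encoding, and a raw byte buffer as a stand-in for image pixels; it is not the paper's algorithm:

```python
import hashlib
import zlib

def make_digest(message: bytes) -> bytes:
    # Stand-in for the dynamic encoded hash: SHA-256 then zlib compression.
    return zlib.compress(hashlib.sha256(message).digest())

def embed_lsb(pixels: bytearray, payload: bytes) -> bytearray:
    """Hide payload bits in the least significant bit of each pixel byte."""
    assert len(payload) * 8 <= len(pixels), "cover image too small"
    out = bytearray(pixels)
    for i in range(len(payload) * 8):
        bit = (payload[i // 8] >> (7 - i % 8)) & 1  # MSB-first
        out[i] = (out[i] & 0xFE) | bit
    return out

def extract_lsb(pixels: bytearray, nbytes: int) -> bytes:
    """Recover nbytes of payload from the pixel LSBs."""
    out = bytearray(nbytes)
    for i in range(nbytes * 8):
        out[i // 8] = (out[i // 8] << 1) | (pixels[i] & 1)
    return bytes(out)
```

Because the digest rides inside the stego-image, no separate hash file is transmitted, which is the bandwidth-reduction idea described above.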
Non-Standard Statistical Inference Under Short and Long Range Dependence.
The work discusses different non-standard problems under different types of short- and long-range dependence.
In the first part we introduce new pointwise confidence interval estimates for monotone functions observed with additive and dependent noise. Such monotone trends are quite common in time series data. We study both short- and long-range dependence regimes for the errors. The interval estimates are obtained via inversion of certain discrepancy statistics. This approach avoids the estimation of nuisance parameters, such as the derivative of the unknown function, which other methods are forced to deal with. The resulting estimates are therefore more accurate, stable, and widely applicable in practice under mild assumptions on the trend and error structure. While motivated by earlier work in the independent context, the dependence of the errors, especially long-range dependence, leads to new phenomena and new universal limits based on convex minorant functionals of drifted fractional Brownian motion.
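The monotone trend fitting underlying such intervals is commonly done with the pool-adjacent-violators algorithm (PAVA), the standard least-squares isotonic regression; the sketch below shows that baseline step, not the thesis's inversion method:

```python
def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    # Each block stores (mean, weight); merge neighbours while the
    # monotonicity constraint is violated.
    blocks = [[v, 1] for v in y]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0] + 1e-12:
            m1, w1 = blocks[i]
            m2, w2 = blocks[i + 1]
            blocks[i] = [(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2]
            del blocks[i + 1]
            i = max(i - 1, 0)  # a merge can create a violation to the left
        else:
            i += 1
    fit = []
    for m, w in blocks:
        fit.extend([m] * w)
    return fit
```
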
In the second part we investigate M-estimation, the technique of extracting a parameter estimate by minimizing a loss function, which is used in almost every statistical problem. We focus on the general theory of such estimators in the presence of dependence in the data, a very common feature in time series and econometric applications. Unlike the case of independent and identically distributed observations, there is a lack of an overarching asymptotic theory for M-estimation under dependence. In order to develop a general theory, we prove a new triangular version of the functional central limit theorem for dependent observations, which is useful for broader applications beyond the current work. We use this general CLT, along with standard empirical process techniques, to provide the rate and asymptotic distribution of the minimizer of a general empirical process. We apply our theory to make inferences for many important problems, such as change point problems, the excess-mass-baseline inverse problem, and different regression settings including the maximum score estimator, least absolute deviation regression, and censored regression, among others.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113564/1/pramita_1.pd
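The definition of an M-estimator, a parameter value minimizing an empirical loss, can be illustrated in a few lines. The brute-force grid minimization below is purely for illustration; with absolute-error loss the location M-estimate is a sample median, and with squared loss it is the mean:

```python
def m_estimate(data, loss, grid):
    """Return the candidate theta minimizing the empirical loss sum."""
    # min() returns the first grid point attaining the minimum,
    # which matters when the minimizer is a whole interval (even n, LAD).
    return min(grid, key=lambda theta: sum(loss(x - theta) for x in data))
```

For data [1, 2, 3, 10] over an integer grid, absolute loss picks 2 (a median; any value in [2, 3] is optimal) while squared loss picks the mean 4, showing how the choice of loss shapes the estimator.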