
42 research outputs found

Exploring Adaptive Implementation of On-Chip Networks

Author: Daneshtalab Masoud
Publication venue: Annales Universitatis Turkuensis A I 429
Publication date: 25/11/2011

As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi-Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on-Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of a large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs by employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level.
Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.
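The congestion-aware output selection mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's router: the function names, the port labels, and the congestion metric (an occupancy count reported by each neighboring router) are all assumptions made for illustration, and deadlock avoidance is deliberately left out.

```python
# Hypothetical sketch of congestion-aware minimal routing in a 2D mesh.
# Names and the congestion metric are illustrative assumptions, not the thesis's design.


def minimal_directions(cur, dst):
    """Return the output ports that move a packet closer to dst."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append("E")
    elif dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    elif dy < cy:
        ports.append("S")
    return ports


def select_output(cur, dst, congestion):
    """Pick the minimal port whose neighboring router reports the least congestion.

    congestion maps a port name to an assumed occupancy count
    (for example, busy buffer slots) of the neighbor behind that port.
    """
    candidates = minimal_directions(cur, dst)
    if not candidates:
        return "LOCAL"  # the packet has arrived at its destination router
    return min(candidates, key=lambda port: congestion.get(port, 0))


print(select_output((1, 1), (3, 3), {"E": 4, "N": 1}))
```

A real router would also need turn restrictions or virtual channels so that adaptive choices like this one cannot create cyclic channel dependencies.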

UTUPub

Contrastive Learning for Lane Detection via Cross-Similarity

Author: Abadijou Sadegh
Alibeigi Mina
Daneshtalab Masoud
Zoljodi Ali
Publication venue
Publication date: 21/08/2023

Detecting road lanes is challenging because intricate lane markings are vulnerable to unfavorable conditions. Lane markings have strong shape priors, but their visibility is easily compromised. Factors like lighting, weather, vehicles, pedestrians, and aging colors challenge the detection. A large amount of data is required to train a lane detection approach that can withstand natural variations caused by low visibility, because lane shapes and natural variations are numerous. Our solution, Contrastive Learning for Lane Detection via cross-similarity (CLLD), is a self-supervised learning method that tackles this challenge by enhancing lane detection models' resilience to real-world conditions that cause low lane visibility. CLLD is a novel multitask contrastive learning method that trains lane detection approaches to detect lane markings even in low-visibility situations by integrating local feature contrastive learning (CL) with our newly proposed operation, cross-similarity. Local feature CL focuses on extracting features for small image parts, which is necessary to localize lane segments, while cross-similarity captures global features to detect obscured lane segments using their surroundings. We enhance cross-similarity by randomly masking parts of input images for augmentation. Evaluated on benchmark datasets, CLLD outperforms state-of-the-art contrastive learning, especially in visibility-impairing conditions like shadows. Compared to supervised learning, CLLD excels in scenarios like shadows and crowded scenes. Comment: 10 pages
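The random masking augmentation mentioned in this abstract might look roughly like the sketch below. The abstract gives neither a patch size nor a masking ratio, so `patch`, `ratio`, the zero fill value, and the function name are all assumptions; the image is a plain list of rows to keep the example dependency-free.

```python
import random


def mask_random_patches(image, patch, ratio, rng=None):
    """Return a copy of image with a fraction of square patches set to 0.

    image: list of equal-length rows of pixel values (hypothetical representation).
    patch: side length of each square patch (assumed parameter).
    ratio: fraction of patches to mask (assumed parameter).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Top-left corners of a non-overlapping patch grid.
    corners = [(r, c) for r in range(0, height, patch) for c in range(0, width, patch)]
    num_masked = round(ratio * len(corners))
    masked = [row[:] for row in image]  # leave the input untouched
    for r0, c0 in rng.sample(corners, num_masked):
        for r in range(r0, min(r0 + patch, height)):
            for c in range(c0, min(c0 + patch, width)):
                masked[r][c] = 0
    return masked


image = [[1] * 8 for _ in range(8)]
augmented = mask_random_patches(image, patch=4, ratio=0.5, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when comparing training runs.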

arXiv.org e-Print Archive

PR-DARTS: Pruning-Based Differentiable Architecture Search

Author: Alibeigi Mina
Daneshtalab Masoud
Loni Mohammad
Mousavi Hamid
Publication venue
Publication date: 10/10/2022

The deployment of Convolutional Neural Networks (CNNs) on edge devices is hindered by the substantial gap between performance requirements and available processing power. While recent research has made large strides in developing network pruning methods for reducing the computing overhead of CNNs, there remains considerable accuracy loss, especially at high pruning ratios. Questioning whether architectures designed for non-pruned networks are effective for pruned networks, we propose to search architectures for pruning methods by defining a new search space and a novel search objective. To improve the generalization of the pruned networks, we propose two novel operations, PrunedConv and PrunedLinear. Specifically, these operations mitigate the problem of unstable gradients by regularizing the objective function of the pruned networks. The proposed search objective enables us to train architecture parameters with respect to the pruned weight elements. Quantitative analyses demonstrate that our searched architectures outperform those used in the state-of-the-art pruning networks on CIFAR-10 and ImageNet. In terms of hardware effectiveness, PR-DARTS increases MobileNet-v2's accuracy from 73.44% to 81.35% (+7.91% improvement) and runs 3.87× faster. Comment: 18 pages with 11 figures

arXiv.org e-Print Archive

Preface from general co-chairs

Author: Aldinucci Marco
Daneshtalab Masoud
Leppä
Lilius Johan
Publication venue: IEEE Computer Society
Publication date: 01/01/2015

Institutional Research Information System University of Turin

Improving motion safety and efficiency of intelligent autonomous swarm of drones

Author: Daneshtalab Masoud
Loni Mohammad
Majd Amin
Sahebi Golnaz
Publication venue: MDPI AG
Publication date: 28/10/2022

Interest is growing in the use of autonomous swarms of drones in various mission-physical applications such as surveillance, intelligent monitoring, and rescue operations. Swarm systems should fulfill safety and efficiency constraints in order to guarantee dependable operations. To maximize motion safety, we should design the swarm system in such a way that drones do not collide with each other and/or other objects in the operating environment. On the other hand, to ensure that the drones have sufficient resources to complete the required task reliably, we should also achieve efficiency while implementing the mission, by minimizing the travelling distance of the drones. In this paper, we propose a novel integrated approach that maximizes motion safety and efficiency while planning and controlling the operation of the swarm of drones. To achieve this goal, we propose a novel parallel evolutionary-based swarm mission planning algorithm. The evolutionary computing allows us to plan and optimize the routes of the drones at run time to maximize safety while minimizing travelling distance as the efficiency objective. In order to fulfill the defined constraints efficiently, our solution promotes a holistic approach that considers the whole design process from the definition of formal requirements through the software development. The results of benchmarking demonstrate that our approach improves route efficiency by up to 10% without any crashes in controlling swarms compared to state-of-the-art solutions.

UTUPub

Path-Based partitioning methods for 3D Networks-on-Chip with minimal adaptive routing

Author: Daneshtalab Masoud
Ebrahimi Masoumeh
Flich Cardo José
Liljeberg Pasi
Plosila Juha
Tenhunen Hannu
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/03/2014

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain in Chip Multiprocessors (CMPs) architectures. As multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches and thus the multicast traffic is equally distributed among several subsets and the network latency is considerably decreased. The simulation results reveal that the RP method can achieve performance improvement across all workloads while performance can be further improved by utilizing the MAR algorithm. Nineteen percent average and 42 percent maximum latency reduction are obtained on SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.

Ebrahimi, M.; Daneshtalab, M.; Liljeberg, P.; Plosila, J.; Flich Cardo, J.; Tenhunen, H. (2014). Path-Based partitioning methods for 3D Networks-on-Chip with minimal adaptive routing. IEEE Transactions on Computers. 63(3):718-733. doi:10.1109/TC.2012.255
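The general idea of partitioning recursively until all parts hold a comparable number of switches can be sketched as follows. This is not the paper's RP method, whose details the abstract does not give: the split rule (halving along the axis with the widest coordinate spread), the `max_size` stopping threshold, and the function name are assumptions chosen only to show the recursion.

```python
def recursive_partition(nodes, max_size):
    """Recursively halve a list of (x, y, z) switches until each part has at most max_size.

    The split axis and stopping rule are illustrative assumptions.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(nodes) <= max_size:
        return [nodes]
    # Split along the axis with the largest coordinate spread (assumed heuristic).
    axis = max(
        range(3),
        key=lambda a: max(n[a] for n in nodes) - min(n[a] for n in nodes),
    )
    ordered = sorted(nodes, key=lambda n: n[axis])
    mid = len(ordered) // 2
    return recursive_partition(ordered[:mid], max_size) + recursive_partition(
        ordered[mid:], max_size
    )


# A 4x4x4 mesh has 64 switches, matching the 64-core size mentioned in the abstract.
mesh = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
parts = recursive_partition(mesh, max_size=8)
print([len(p) for p in parts])
```

Because every split is a near-even halving, part sizes differ by at most one at each level, which is one simple way to keep multicast destinations evenly spread across subsets.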

RiuNet

A novel highly adaptive routing for networks-on-chip

Author: Daneshtalab Masoud
Gaur Manoj Singh
Ko Seok-Bum
Kumar Manoj
Laxmi Vijay
Zwolinski Mark
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/12/2015

The degree of adaptiveness has a major impact on the performance of an adaptive routing method. This research work presents a novel turn-model-based routing method that provides a high degree of adaptiveness for 2D mesh. The proposed method significantly reduces restrictions on routing turns and hence can provide path diversity using additional routes (both minimal and non-minimal). Experimental results show that the proposed method provides better performance (average latency and throughput) than recent routing methods.
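For context on what a turn restriction looks like, the sketch below encodes the classic west-first turn model for a 2D mesh, which forbids the two turns into the west direction. It is a well-known baseline, not the method proposed in this record; the abstract does not list which turns the new method permits. Function names and the coordinate convention (x grows eastward, y grows northward) are assumptions.

```python
# West-first turn model: a classic baseline, shown for illustration only.
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def west_first_turn_allowed(incoming, outgoing):
    """Check a turn, where both arguments are travel directions (N, S, E, W)."""
    if incoming == outgoing:
        return True  # going straight is not a turn
    if OPPOSITE[incoming] == outgoing:
        return False  # 180-degree reversals are not permitted
    # The two prohibited turns are north-to-west and south-to-west.
    return outgoing != "W"


def west_first_candidates(cur, dst):
    """Return permitted minimal output directions under west-first routing."""
    (cx, cy), (dx, dy) = cur, dst
    if dx < cx:
        return ["W"]  # all westward hops must be taken first
    ports = []
    if dx > cx:
        ports.append("E")
    if dy > cy:
        ports.append("N")
    elif dy < cy:
        ports.append("S")
    return ports


print(west_first_candidates((1, 1), (3, 3)))
```

Packets heading east keep a choice of outputs, while packets heading west have none, which is the kind of uneven adaptiveness that less restrictive turn models aim to reduce.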

Southampton (e-Prints Soton)

Crossref