Current Issue : January - March Volume : 2012 Issue Number : 1 Articles : 5 Articles
The design of complex circuits as SoCs presents two great challenges to designers. One is the speeding up of system functionality modeling and the second is the implementation of the system in an architecture that meets performance and power consumption requirements. Thus, developing new high-level specification mechanisms for the reduction of the design effort with automatic architecture exploration is a necessity. This paper proposes an Electronic-System-Level (ESL) approach for system modeling and cache energy consumption analysis of SoCs called PCacheEnergyAnalyzer. It uses as entry a high-level UML-2.0 profile model of the system and it generates a simulation model of a multicore platform that can be analyzed for cache tuning. PCacheEnergyAnalyzer performs static/dynamic energy consumption analysis of caches on platforms that may have different processors. Architecture exploration is achieved by letting designers choose different processors for platform generation and different mechanisms for cache optimization. PCacheEnergyAnalyzer has been validated with several applications of Mibench, Mediabench, and PowerStone benchmarks, and results show that it provides analysis with reduced simulation effort....
Limits of instruction-level parallelism and higher transistor density sustain the increasing need for multiprocessor systems: they are rapidly taking over both general-purpose and embedded processor domains. Current multiprocessing systems are composed either of many homogeneous and simple cores or of complex superscalar, simultaneous multithread processing elements. As parallel applications are becoming increasingly present in embedded and general-purpose domains and multiprocessing systems must handle a wide range of different application classes, there is no consensus over which are the best hardware solutions to better exploit instruction-level parallelism (TLP) and thread-level parallelism (TLP) together. Therefore, in this work, we have expanded the DIM (dynamic instruction merging) technique to be used in a multiprocessing scenario, proving the need for an adaptable ILP exploitation even in TLP architectures. We have successfully coupled a dynamic reconfigurable system to an SPARC-based multiprocessor and obtained performance gains of up to 40%, even for applications that show a great level of parallelism at thread level....
We presented a resource- and configuration-aware floorplacement framework, tailored for Xilinx Virtex 4 and 5 FPGAs, using an objective function based on external wirelength. Our work aims at identifying groups of Reconfigurable Functional Units that are likely to be configured in the same chip area, identifying these areas based on resource requirements, device capabilities, and wirelength. Task graphs with few externally connected RRs lead to the biggest decrease, while external wirelength in task graphs with many externally connected RRs show lower improvement. The proposed approach results, as also demonstrated in the experimental results section, in a shorter external wirelength (an average reduction of 50%) with respect to purely area-driven approaches and a highly increased probability of reuse of existing links (90% reduction can be obtained in the best case)....
Despite significant performance and power advantages compared to microprocessors, widespread usage of FPGAs has been limited by increased design complexity. High-level synthesis (HLS) tools have reduced design complexity but provide limited support for verification, debugging, and timing analysis. Such tools generally rely on inaccurate software simulation or lengthy register-transfer-level simulations, which are unattractive to software developers. In this paper, we introduce HLS techniques that allow application designers to efficiently synthesize commonly used ANSI-C assertions into FPGA circuits, enabling verification and debugging of circuits generated from HLS tools, while executing in the actual FPGA environment. To verify that HLS-generated circuits meet execution timing constraints, we extend the in-circuit assertion support for testing of elapsed time for arbitrary regions of code. Furthermore, we generalize timing assertions to transparently provide hang detection that back annotates hang occurrences to source code. The presented techniques enable software developers to rapidly verify, debug, and analyze timing for FPGA applications, while reducing frequency by less than 3% and increasing FPGA resource utilization by 0.7% or less for several application case studies on the Altera Stratix-II EP2S180 and Stratix-III EP3SE260 using Impulse-C. The presented techniques reduced area overhead by as much as 3x and improved assertion performance by as much as 100% compared to unoptimized in-circuit assertions....
This paper describes a comparison of two Montgomery modular multiplication architectures: a systolic and a multiplexed. Both implementations target FPGA devices. The modular multiplication is employed in modular exponentiation processes, which are the most important operations of some public-key cryptographic algorithms, including the most popular of them, the RSA. The proposed systolic architecture presents a high-radix implementation with a one-dimensional array of Processing Elements. The multiplexed implementation is a new alternative and is composed of multiplier blocks in parallel with the new simplified Processing Elements, and it provides a pipelined operation mode. We compare the time Ã?â?? area efficiency for both architectures as well as an RSA application. The systolic implementation can run the 1024 bits RSA decryption process in just 3.23?ms, and the multiplexed architecture executes the same operation in 4.36?ms, but the second approach saves up to 28% of logical resources. These results are competitive with the state-of-the-art performance....
Loading....