Current Issue: April - June | Volume: 2016 | Issue Number: 2 | Articles: 4
On-chip multiport memory cores are crucial primitives for many modern high-performance reconfigurable architectures and multicore systems. Previous approaches to scaling memory cores come at the cost of operating frequency, communication overhead, and logic resources, without increasing the storage capacity of the memory. In this paper, we present two approaches for designing multiport memory cores that are suitable for reconfigurable accelerators with substantial on-chip memory or complex communication. Our design approaches tackle these challenges by banking RAM blocks and utilizing interconnect networks, which allows scaling without sacrificing logic resources. With banking, memory congestion is unavoidable, so we evaluate our multiport memory cores under different memory access patterns to gain insight into the design trade-offs. We demonstrate our implementation with up to 256 memory ports on a Xilinx Virtex-7 FPGA. Our experimental results report high-throughput memories with resource usage that scales with the number of ports....
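The congestion trade-off this abstract mentions can be illustrated with a tiny software model of a banked memory: when several ports address the same bank in one cycle, only one request is served and the rest stall. This is a minimal sketch under assumed parameters (bank count, low-order interleaved mapping, first-come arbitration); it is not the paper's actual design.

```python
# Illustrative model of bank conflicts in a banked multiport memory.
# The bank mapping (address mod num_banks) and the arbitration policy
# (first request wins) are assumptions for the sketch, not details
# taken from the paper.

def bank_of(addr, num_banks):
    """Low-order interleaving: bank = address modulo number of banks."""
    return addr % num_banks

def schedule_cycle(requests, num_banks):
    """Serve at most one request per bank per cycle.

    Returns (served, stalled): stalled requests would retry next cycle.
    """
    served, stalled, busy = [], [], set()
    for addr in requests:
        b = bank_of(addr, num_banks)
        if b in busy:
            stalled.append(addr)   # bank conflict
        else:
            busy.add(b)
            served.append(addr)
    return served, stalled

# Four ports issue to four banks in one cycle; addresses 0 and 4 both
# map to bank 0, so one of the two stalls.
served, stalled = schedule_cycle([0, 1, 2, 4], num_banks=4)
```

Access patterns that spread addresses across banks (e.g., unit-stride) serve all ports per cycle, while strides equal to the bank count serialize completely, which is why the authors evaluate under different access patterns.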
The Internal Configuration Access Port (ICAP) is the core component of any dynamic partial reconfigurable system implemented in Xilinx SRAM-based Field Programmable Gate Arrays (FPGAs). We developed a new high-speed ICAP controller, named AC ICAP, completely implemented in hardware. In addition to accelerating the management of partial bitstreams and frames, as similar solutions do, AC ICAP also supports run-time reconfiguration of LUTs without requiring precomputed partial bitstreams. This last characteristic was made possible by reverse engineering the bitstream. Furthermore, we adapted this hardware-based solution to provide IP cores accessible from the MicroBlaze processor. To this end, the controller was extended, and three versions were implemented to evaluate its performance when connected to the Processor Local Bus (PLB), Fast Simplex Link (FSL), and AXI interfaces of the processor. Consequently, the controller can exploit the flexibility that the processor offers while taking advantage of the hardware speed-up. It was implemented on both Virtex-5 and Kintex-7 FPGAs. Reconfiguration time results showed that run-time reconfiguration of single LUTs in Virtex-5 devices was performed in less than 5...
An FPGA has a finite routing capacity, due to which a fair number of highly dense circuits fail to map onto a slightly under-resourced architecture. The high interconnect demand in congested regions is not met by the available resources, as a result of which the circuit becomes unroutable for that particular architecture. In this paper, we present a new placement approach based on a natural process called diffusion. Our placer attempts to minimize routing congestion by evenly disseminating the interconnect demand across the FPGA chip. For the 20 MCNC benchmark circuits, our algorithm reduced the channel width for 15 circuits. The results showed on average a ~33% reduction in the standard deviation of interconnect usage at the expense of an average ~13% penalty on critical path delay. A maximum channel width gain of ~33% was also observed....
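The diffusion idea behind this placer can be sketched numerically: treat interconnect demand as a quantity on a 2D grid and repeatedly let each cell exchange demand with its neighbors, so that peaks flatten and the standard deviation of usage drops. This is a hedged illustration of the general diffusion process, with grid size, diffusion rate, and iteration count chosen arbitrarily; it is not the paper's placement algorithm.

```python
# Diffusion-style smoothing of an interconnect-demand map on a 2D grid.
# Grid dimensions, the diffusion rate, and the step count are
# illustrative assumptions, not parameters from the paper.

def diffuse(demand, rate=0.25, steps=50):
    """Move each cell toward the average of its 4-neighborhood.

    Edge cells clamp out-of-range neighbors to themselves.
    Returns a new grid; the input is not modified.
    """
    rows, cols = len(demand), len(demand[0])
    for _ in range(steps):
        nxt = [row[:] for row in demand]
        for i in range(rows):
            for j in range(cols):
                nbrs = [demand[max(i - 1, 0)][j],
                        demand[min(i + 1, rows - 1)][j],
                        demand[i][max(j - 1, 0)],
                        demand[i][min(j + 1, cols - 1)]]
                nxt[i][j] += rate * (sum(nbrs) / 4.0 - demand[i][j])
        demand = nxt
    return demand

# A single congested hot spot spreads out over the chip, lowering the
# peak demand seen by any one routing channel.
hot = [[0.0] * 5 for _ in range(5)]
hot[2][2] = 100.0
smoothed = diffuse(hot)
```

In a real placer the "demand" would be derived from net bounding boxes and the smoothing would drive block moves rather than rewrite the map directly, but the flattening effect is the same mechanism that reduces the interconnect-usage deviation reported above.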
FPGAs are known to permit huge gains in performance and efficiency for suitable applications but still require reduced design effort and shorter development cycles for wider adoption. In this work, we compare the resulting performance of two design concepts that, in different ways, promise such increased productivity. As a common starting point, we employ a kernel-centric design approach, where computational hotspots in an application are identified and individually accelerated on the FPGA. By means of a complex stereo matching application, we evaluate two fundamentally different design philosophies and approaches for implementing the required kernels on FPGAs. In the first implementation approach, we designed individually specialized dataflow kernels in a spatial programming language for a Maxeler FPGA platform; in the alternative design approach, we target a vector coprocessor with large vector lengths, which is implemented as a form of programmable overlay on the application FPGAs of a Convey HC-1. We assess both approaches in terms of overall system performance, raw kernel performance, and performance relative to invested resources. After compensating for the effects of the underlying hardware platforms, the specialized dataflow kernels on the Maxeler platform are around 3x faster than kernels executing on the Convey vector coprocessor. In our concrete scenario, due to trade-offs between reconfiguration overheads and exposed parallelism, the advantage of specialized dataflow kernels is reduced to around 2.5x....