Current Issue: October–December 2019, Issue 4, 5 Articles
Reconfigurable computing means that the logical resources in a system can be reconfigured according to the real-time changing data flow to perform different calculation functions, giving a reconfigurable computing system both the high efficiency of hardware and the universality of software. The field-programmable gate array (FPGA)-based real-time simulator for active distribution networks (ADNs) requires a long compilation time for case modification, with low efficiency and low versatility, making it inconvenient and difficult for users. To eliminate the long compile time required for a new case, this paper proposes a universal design of the FPGA-based real-time simulator for ADNs based on reconfigurable computing. It includes universal designs of the simulation parameter configuration, the simulation initial value setting, the linear-equation solving module and the simulation result output module. The proposed universal design allows cases and parameters to be modified and changed without recompiling, which further improves simulation efficiency. Simulations are conducted and compared with PSCAD/EMTDC to validate the correctness and effectiveness of the universal design...
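The time-step solve at the heart of such a simulator can be sketched in software: the nodal equations G·v = i(t) have a node admittance matrix G fixed by the network topology, while only the injection vector i(t) changes with the case and parameters, which is what lets cases change without rebuilding the solver. All numerical values below are purely illustrative, not from the paper:

```python
import numpy as np

# Node admittance matrix of a small illustrative resistive network.
# G is fixed per topology, so its inverse (or factorization) can be
# precomputed once and reused at every simulation time step.
G = np.array([[ 1.5, -0.5,  0.0],
              [-0.5,  1.2, -0.7],
              [ 0.0, -0.7,  1.1]])
G_inv = np.linalg.inv(G)  # precomputed once per topology

def step(i_t):
    """One simulation step: node voltages for injection vector i_t."""
    return G_inv @ i_t

# Only the injection vector changes between cases/time steps.
v = step(np.array([1.0, 0.0, 0.5]))
```

In practice an LU factorization would replace the explicit inverse, but the division of labor is the same: topology fixed, sources swapped per case.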
Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous works have only focused on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on field-programmable gate arrays (FPGAs), while the acceleration of 3D DCNNs has not been studied in depth, as they have higher computational complexity and sparsity than 2D DCNNs. In this paper, we focus on the acceleration of both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping 2D and 3D sparse DCNNs onto a uniform architecture. Firstly, a pruning method is used to prune unimportant network connections and increase the sparsity of the weights. After pruning, the number of parameters of the DCNNs is reduced significantly without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9×. Results show that the performance of our method on the accelerator outperforms that of our prior work by 2.5× to 3.6× in latency...
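The COO encoding of pruned weights can be sketched as follows: only the non-zero entries survive, each stored as its coordinates plus its value, which is where the storage reduction comes from. The tensor values and helper names here are illustrative, not the paper's implementation:

```python
import numpy as np

def to_coo(weights):
    """Encode a dense weight tensor in coordinate (COO) format:
    keep only non-zero entries as (coordinates, value) pairs."""
    coords = np.argwhere(weights != 0)   # N x ndim index array
    values = weights[tuple(coords.T)]    # the N non-zero values
    return coords, values

def from_coo(coords, values, shape):
    """Rebuild the dense tensor from its COO representation."""
    dense = np.zeros(shape, dtype=values.dtype)
    dense[tuple(coords.T)] = values
    return dense

# A pruned 2D weight matrix: most entries are zero after pruning.
w = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, -1.2],
              [0.3, 0.0, 0.0]])
coords, values = to_coo(w)
restored = from_coo(coords, values, w.shape)
```

The same two functions work unchanged for 3D tensors, since `np.argwhere` returns one coordinate column per dimension.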
This paper proposes a hardware realization of the crossover module in the genetic algorithm for the travelling salesman problem (TSP). In order to enhance performance, we employ a combination of pipelining and parallelization with a genetic algorithm (GA) processor to improve processing speed, as compared to a software implementation. Simulation results showed that the proposed architecture is six times faster than a similar existing architecture. The presented field-programmable gate array (FPGA) implementation of the partially mapped crossover (PMX) operator is more than 400 times faster than in software...
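For reference, the PMX operator that the paper implements in hardware can be sketched in software: a segment of one parent tour is copied directly, and the remaining positions are filled from the other parent, chasing the segment-induced mapping so the child stays a valid permutation. The cut points and example tours below are illustrative:

```python
def pmx(parent1, parent2, cut1, cut2):
    """Partially mapped crossover (PMX) for permutation-encoded tours."""
    size = len(parent1)
    child = [None] * size
    # 1. Copy the crossover segment from parent1.
    child[cut1:cut2] = parent1[cut1:cut2]
    # 2. Mapping induced by the segment: parent1's gene -> parent2's gene.
    mapping = {parent1[i]: parent2[i] for i in range(cut1, cut2)}
    # 3. Fill remaining positions from parent2; if a gene conflicts with
    #    the copied segment, follow the mapping until it no longer does.
    for i in list(range(cut1)) + list(range(cut2, size)):
        gene = parent2[i]
        while gene in mapping:
            gene = mapping[gene]
        child[i] = gene
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [3, 7, 5, 1, 6, 8, 2, 4]
child = pmx(p1, p2, 3, 6)  # segment [4, 5, 6] copied from p1
```

The mapping-chasing loop is exactly the step that benefits from parallel hardware: each position outside the segment can be resolved independently.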
Sequential Minimal Optimization (SMO) is the traditional training algorithm for Support Vector Machines (SVMs). However, SMO does not scale well with the size of the training set. For that reason, Stochastic Gradient Descent (SGD) algorithms, which have better scalability, are a better option for massive data mining applications. Furthermore, even with the use of SGD, training times can become extremely long depending on the data set. For this reason, accelerators such as field-programmable gate arrays (FPGAs) are used. This work describes a hardware implementation, using an FPGA, of a fully parallel SVM trained with stochastic gradient descent. The proposed FPGA implementation of an SVM with SGD presents speedups of more than 10,000× relative to software implementations running on a quad-core processor and up to 319× compared to state-of-the-art FPGA implementations, while requiring fewer hardware resources. The results show that the proposed architecture is a viable solution for highly demanding problems such as those present in big data analysis...
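The training loop being parallelized can be sketched as plain-Python SGD on the regularized hinge loss of a linear SVM. This is a software sketch of the algorithm family, not the authors' FPGA design, and the hyperparameters and toy data are illustrative:

```python
import random

def svm_sgd(samples, labels, lam=0.01, epochs=20, lr=0.1):
    """Train a linear SVM by SGD on the (sub)gradient of the
    regularized hinge loss:
        L(w) = lam/2 * ||w||^2 + max(0, 1 - y * (<w, x> + b))."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    idx = list(range(len(samples)))
    for _ in range(epochs):
        random.shuffle(idx)
        for i in idx:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Hinge loss active: step towards correct classification.
                w = [wj - lr * (lam * wj - y * xj) for wj, xj in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer contributes to the gradient.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# Linearly separable toy data: +1 near (2, 2), -1 near (-2, -2).
random.seed(0)
X = [[2, 2], [3, 1], [2, 3], [-2, -2], [-3, -1], [-2, -3]]
Y = [1, 1, 1, -1, -1, -1]
w, b = svm_sgd(X, Y)
```

Each inner-loop update touches every weight component independently, which is what makes the algorithm amenable to the fully parallel hardware datapath the paper describes.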
Content-addressable memory (CAM) is a type of associative memory, which returns the address of a given search input in one clock cycle. Many designs are available to emulate CAM functionality inside reconfigurable hardware, field-programmable gate arrays (FPGAs), using static random-access memory (SRAM) and flip-flops. FPGA-based CAMs are becoming popular due to the rapid growth in software-defined networks (SDNs), which use CAMs for packet classification. Emulated designs of CAM consume considerable dynamic power owing to the high amount of switching activity and computation involved in finding the address of the search key. In this paper, we present a power- and resource-efficient binary CAM architecture, Zi-CAM, which consumes less power and uses fewer resources than the available architectures of SRAM-based CAM on FPGAs. Zi-CAM consists of two main blocks: the RAM block (RB) is activated when there is a sequence of repeating zeros in the input search word; otherwise, the lookup-table (LUT) block (LB) is activated. Zi-CAM is implemented on a Xilinx Virtex-6 FPGA for a size of 64×36, improving power consumption and hardware cost by 30% and 32%, respectively, compared to the available FPGA-based CAMs...
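The CAM interface itself is easy to pin down with a behavioral software model: a regular memory maps address → word, while a CAM answers the inverse query, word → address. The single-cycle parallel hardware search is reduced here to a dictionary lookup; the class and method names are illustrative, not from the paper:

```python
class BinaryCAM:
    """Behavioral model of a binary CAM: given a search word, return the
    address storing it (or None). Hardware compares all entries in one
    clock cycle; this model only mirrors the interface."""

    def __init__(self, depth, width):
        self.depth = depth            # number of storage locations
        self.width = width            # bits per stored word
        self.table = {}               # word -> address (the "search")
        self.entries = [None] * depth # address -> word (for updates)

    def write(self, address, word):
        assert 0 <= address < self.depth
        assert len(word) == self.width and set(word) <= {"0", "1"}
        old = self.entries[address]
        if old is not None:           # overwrite: drop the stale mapping
            del self.table[old]
        self.entries[address] = word
        self.table[word] = address

    def search(self, word):
        return self.table.get(word)   # None on a miss

# Same geometry as the paper's 64x36 evaluation case.
cam = BinaryCAM(depth=64, width=36)
cam.write(5, "0" * 35 + "1")
addr = cam.search("0" * 35 + "1")
```

Words with long runs of zeros, like the one above, are exactly the inputs Zi-CAM routes to its RAM block instead of the LUT block.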