Previous studies have shown that the application of the M-coder in the H.264/AVC and H.265/HEVC video coding standards allows for highly parallel implementations without decreasing maximal frequencies. Although the primary limitation on throughput, originating from the range register update, can be eliminated, other limitations are associated with low register processing. Their negative impact is revealed at higher degrees of parallelism, leading to a gradual throughput saturation. This paper presents optimizations introduced to the generative hardware architecture to increase throughputs and hardware efficiencies. Firstly, it can process more than one bypass-mode subseries in one clock cycle. Secondly, aggregated contributions to the codestream are buffered before the low register update. Thirdly, the number of contributions used to update the low register in one clock cycle is decreased to save resources. Fourthly, the maximal one-clock-cycle renormalization shift of the low register is increased from 32 to 64 bit positions. As a result of these optimizations, the binary arithmetic coder, configured for series lengths of 27 and 2 symbols, increases the throughput from 18.37 to 37.42 symbols per clock cycle for high-quality H.265/HEVC compression. The logic consumption increases from 205.6k to 246.1k gates when synthesized on 90 nm TSMC technology. The design can operate at 570 MHz.
Loading....