Fast
Cyclic Convolution Algorithms
and their
Implementation in
Multiprocessor
Architectures
|
|
We are seeking an efficient algorithmic implementation of very large, one-dimensional, cyclic convolutions in multiprocessor workstations, clusters and others distributed computational environments. VLSI and FPGA implementations are also of interest. Certain algorithms, such as the Agarwal-Cooley cyclic convolution algorithm, transform the circulant matrix into a block circulant matrix where the sub-blocks are amenable to be independently processed in parallel. The technique is rather restrictive regarding the condition imposed to the length of the sequences (factors of the sequence length have to be mutually prime). There are also other techniques in which a one-dimensional convolution can be performed by means of a multi-dimensional convolution. The problem is that the number of points along the other dimensions has to be doubled by zero padding. We have
found, however, that if the length of the sequences is composite (no
need for the factors to be mutually prime) the circulant matrix can be
factored into a block pseudocirculant matrix. Each block is circulant
in itself and amenable to be independently processed [2], [5]. We would like to
gain further insight into these algorithms and to explore their
parallel implementation. Even
though block pseudocirculant [2], [7] and block circulant matrices [6] have been studied in the past,
we have explicitly established their relation to cyclic convolution,
stride permutations, and to common permutations of the Fourier matrix
(including those leading to the radix r decimation in time FFT [2] and
those leading to the Good Thomas Prime Factor algorithm [6]). We have also developed recursive formulations of block pseudocirculant matrices leading to reduced complexity parallel convolution and reduced complexity parallel FIR filtering [2], [3]. In the recursive process we defined the higher order block pseudocirculants as Super Block Pseudocirculants. Intrinsic to the previously mentioned architectures are the newly defined Block Pseudocyclic Shift Operator and Pseudo Delays. These constructs also lead to unfolded signal flow graphs of cyclic shifts and regular delays [1]. 1] Teixeira, M., “An Operator Based Approach to the Unfolding Transformation”. IEEE Midwest Symposium on Circuit and Systems (MWSCAS-08), Knoxville, Tennessee, August 2008. 2] Teixeira, M., and Y.I. Rodriguez, “Parallel cyclic convolution based on recursive formulations of block pseudocirculant matrices”. IEEE Transactions on Signal Processing. July 2008. (.pdf) 3] Teixeira, M., "Unified Architectures for Parallel Cyclic Convolution and Reduced Complexity Parallel FIR Filters". WORLDCOMP'08, The 2008 International Conference on Parallel and Distributed Computing and Techniques. Las Vegas, Nevada, July 13 -17, 2008. (.pdf) 4] Marvi Teixeira and Yamil Rodriguez; “A New Development for Cyclic Convolution: The Super Block Pseudocirculant Matrix”. 2007 IEEE Sarnoff Symposium, Princeton, New Jersey, May 2007. (.pdf) 5] Teixeira, M., and D. Rodriguez, 1995. A class of fast cyclic convolution algorithms based on block pseudocirculants. (pdf), (pdfv2) IEEE Signal Processing Letters, 5 (2), p 92. 6] Teixeira, M., and D. Rodriguez. A novel derivation of the Agarwal-Cooley fast cyclic convolution algorithm based on the Good Thomas prime factor algorithm. (pdf). Proceedings of the 38th Midwest Symposium on Circuits and Systems. IEEE, Brazil, August 1995. (pdfv2) 7] Teixeira, M., and D. Rodriguez. A new method mathematically links fast Fourier transform algorithm with a fast cyclic convolution algorithm. (pdf). Proceedings of the 37th Midwest Symposium on Circuits and Systems. IEEE, Lafayette, Lousiana, August 1994. (pdfv2) 8] Teixeira, M., and D. Rodriguez. An educational environment for the graphical manipulation of signal processing algorithms. (.pdf). Proceedings of the International Conference on Simulation in Engineering Education. SCS, ASEE, IEEE, Las Vegas, Jan 1995. 9] Teixeira, M. and D. Rodriguez. A Methodology for the Intelligent Implementation of VLSI Signal Processing Algorithms. (.pdf). The First World Congress on Intelligent Manufacturing. Mayaguez, Puerto Rico, 1995.
|
|
C. Cheng and K. K. Parhi, “Hardware efficient fast DCT based on novel cyclic convolution structures,” IEEE Trans. Signal Process., Vol. 54, NO. 11, pp. 4419-4434, Nov. 2006. C. Cheng and K. K. Parhi, “Low-Cost Fast VLSI Algorithm for Discrete Fourier Transform,” IEEE Trans. Circuits Syst. I, vol. 54, NO. 4, pp. 791-806, April, 2007. C. Cheng and K. K. Parhi, “A novel systolic array structure for DCT,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 52, pp. 366–369, Jul. 2005. R. C. Agarwal and C. H. Burrus," Fast One-Dimensional Digital Convolution by Multidimensional Techniques," IEEE Trans. on Acoustic, Speech, and Sig. Proc., Vol. ASSP-22, No. 1, Feb. 1974. H. K. Kwan and M. T. Tsim, " High Speed 1-D FIR Digital Filtering Architectures using Polynomial Convolution," in Proc. IEEE Int. Conf. Acoust., Speech, and Sig. Proc., Dallas, TX, Apr. 1987, pp. 1863-1866. Z. J. Mou and P. Duhamel, " Short-Length FIR Filters and Their Use in Fast Nonrecursive Filtering," IEEE Trans. on Sig. Proc. Vol. 39, No. 6, June 1991. D. A. Pitassi, " Fast Convolution using Walsh Transforms," in Proc. Symp. Application of Walsh Functions, Washington D.C. , April 1971, pp.130-133. W. F. Davis, " A Class of Efficient Convolution Algorithms," in Proc. Symp. Applications of Walsh Functions, Washington D. C., April 1971, pp 318-329. P. J. Rayner, " A Fast Cyclic Convolution Algorithm," presented at Symp. Digital Filtering, Imperial College, London, England, Aug. 1971. I. Pitas and M. G. Strintzis, " Multidimensional Cyclic Convolution Algorithms with Miminimal Multiplicative Complexity," IEEE Transactions on Acoustic, Speech, and Sig. Proc., Vol. ASSP-35, No. 3, March 1987. P.P. Vaidyanathan and S. K. Mitra, " Polyphase Networks, Block Digital Filtering, LPTV Systems, and Alias-Free QMF Banks: A Unified Approach Based on Pseudocirculants, IEEE Trans. on Acoustic, Speech, and Sig. Proc., Vol. 36, No. 3, March 1988. |