Carl Ingemarsson
I received my master's degree from Linköping University in 2009 and then
enrolled as a PhD student in the Electronics Systems division at the
Dept. of Electrical Engineering (ISY). Since summer 2014 I have instead
been enrolled in the division of Computer Engineering at the Dept. of
Electrical Engineering at Linköping University.
Research
My main research focus lies in the implementation of signal
processing algorithms and arithmetic in hardware. I have studied the
hardware implementation of matrix inversion, and am currently working
further on this in connection with the possible application of
matrix inversion in MIMO decoding. I have also spent effort studying
the mapping of hardware architectures to FPGA technology.
Teaching
During my time as a PhD student I have been involved as a lab
assistant, student project supervisor, or tutorial session teacher in
many different undergraduate courses, for instance:
TSTE12 - Konstruktion av digitala system
TSIU05 - Digitalteknik
TSEA51 - Digitalteknik
TSEA57/82 - Datorteknik
TSTE20 - Elektronik
TSTE87 - Application Specific Integrated Circuits for Digital Signal Processing
TMEL08 - Elektrotekniska system
TMMI04 - Elektroteknik
Links to the pages with my tutorial problem solutions:
Solutions - TSTE20
Solutions - TMEL08
Solutions - TMMI04
Publications
Journal papers
2018
Abstract
In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate-arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers as compared with the single-path delay feedback (SDF) architecture the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well-suited for FPGA implementation, especially when efficient implementation of short shift registers is available. This holds for at least contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.
Keywords
Fast Fourier transform (FFT), Field-programmable gate arrays (FPGAs), Pipeline FFT, FPGA optimization, Single-stream FFT, Engineering and Technology, Signal Processing, Embedded Systems
BIBTEX
@article{diva2:1245539,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{SFF--The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture}},
journal = {Journal of Signal Processing Systems},
year = {2018},
volume = {90},
number = {11},
pages = {1583--1592},
}
2017
Abstract
In this paper, an efficient mapping of the pipeline single-path delay feedback (SDF) fast Fourier transform (FFT) architecture to field-programmable gate arrays (FPGAs) is proposed. By considering the architectural features of the target FPGA, significantly better implementation results are obtained. This is illustrated by mapping an R22SDF 1024-point FFT core toward both Xilinx Virtex-4 and Virtex-6 devices. The optimized FPGA mapping is explored in detail. Algorithmic transformations that allow a better mapping are proposed, resulting in implementation achievements that by far outperform earlier published work. For Virtex-4, the results show a 350% increase in throughput per slice and 25% reduction in block RAM (BRAM) use, with the same amount of DSP48 resources, compared with the best earlier published result. The resulting Virtex-6 design sees even larger increases in throughput per slice compared with the Xilinx FFT IP core, using half as many DSP48E1 blocks and fewer BRAM resources. The results clearly show that the FPGA mapping is crucial, not only the architecture and algorithm choices.
Keywords
Algorithmic transformations; fast Fourier transform (FFT); field-programmable gate arrays (FPGAs); hardware mapping; single-path delay feedback (SDF), Natural Sciences
BIBTEX
@article{diva2:1140992,
author = {Ingemarsson, Carl and Källström, Petter and Qureshi, Fahad and Gustafsson, Oscar},
title = {{Efficient FPGA Mapping of Pipeline SDF FFT Cores}},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
year = {2017},
volume = {25},
number = {9},
pages = {2486--2497},
}
Conference papers
2021
Abstract
In this work, the effect of latency for three different positive definite matrix inversion algorithms when implemented on parallel and pipelined processing elements is considered. The work is motivated by the fact that in a massive MIMO system, matrix inversion needs to be performed between estimating the channels and producing the transmitted downlink signal, which means that the latency of the matrix inversion has a significant impact on the system performance. It is shown that, despite the algorithms having different complexity, all three algorithms can have the lowest latency for different numbers of processing elements and pipeline levels. In particular, in systems with many processing elements, the algorithm with the highest complexity has the lowest latency.
Keywords
Engineering and Technology, Telecommunications
BIBTEX
@inproceedings{diva2:1636880,
author = {Bertilsson, Erik and Ingemarsson, Carl and Gustafsson, Oscar},
title = {{Low-Latency Parallel Hermitian Positive-Definite Matrix Inversion for Massive MIMO}},
booktitle = {2021 IEEE Workshop on Signal Processing Systems (SiPS 2021)},
year = {2021},
series = {IEEE International Symposium on Biomedical Imaging},
pages = {23--28},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
2017
Oscar Gustafsson, Erik Bertilsson, Johannes Klasson, Carl Ingemarsson, "Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO? (Invited Paper)", Proceedings 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017, Proceedings Symposium on Computer Arithmetic, Vol. 2017, pp. 62-63, 2017.
Abstract
Approximate matrix inversion based on Neumann series has seen a recent increased interest motivated by massive MIMO systems. There, the matrices are in many cases diagonally dominant, and, hence, a reasonable approximation can be obtained within a few iterations of a Neumann series. In this work, we clarify that the complexity of exact methods is about the same as when three terms are used for the Neumann series, so in this case, the complexity is not lower as often claimed. The second common argument for Neumann series approximation, higher parallelism, is indeed correct. However, in most current practical use cases, such a high degree of parallelism is not required to obtain a low latency realization. Hence, we conclude that a careful evaluation, based on accuracy and latency requirements, must be performed and that exact matrix inversion is in fact viable in many more cases than the current literature claims.
Keywords
matrix inversion, complexity, parallel processing, massive MIMO, Engineering and Technology, Signal Processing, Communication Systems
BIBTEX
@inproceedings{diva2:1121300,
author = {Gustafsson, Oscar and Bertilsson, Erik and Klasson, Johannes and Ingemarsson, Carl},
title = {{Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO? (Invited Paper)}},
booktitle = {Proceedings 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017},
year = {2017},
series = {Proceedings Symposium on Computer Arithmetic},
volume = {2017},
pages = {62--63},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
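As an illustration of the Neumann-series approximation discussed in the abstract of the ARITH 2017 paper above, here is a minimal floating-point sketch in plain Python. It is not taken from the paper: the function names (`neumann_inverse`, `matmul`, `max_abs_error`), the choice of splitting the matrix into its diagonal part D and off-diagonal part E, and the use of real-valued lists are all assumptions made for the example. The approximation sums (-D^-1 E)^k D^-1 for k = 0 .. terms-1, which converges when the matrix is diagonally dominant.

```python
def matmul(x, y):
    """Plain matrix product of two square lists-of-lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_inverse(a, terms=3):
    """Approximate inv(a) with `terms` terms of a Neumann series.

    Assumption for this sketch: a = D + E with D the diagonal part of a,
    and a is diagonally dominant so that the series converges.
    """
    n = len(a)
    d_inv = [1.0 / a[i][i] for i in range(n)]
    # Iteration matrix M = -D^-1 E (zero on the diagonal).
    m = [[0.0 if i == j else -d_inv[i] * a[i][j] for j in range(n)]
         for i in range(n)]
    # k = 0 term of the series: D^-1.
    term = [[d_inv[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]
    approx = [row[:] for row in term]
    # Each further term is M times the previous one.
    for _ in range(terms - 1):
        term = matmul(m, term)
        for i in range(n):
            for j in range(n):
                approx[i][j] += term[i][j]
    return approx


def max_abs_error(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x[i][j] - y[i][j])
               for i in range(len(x)) for j in range(len(x)))
```

Calling `neumann_inverse(a, terms)` with an increasing number of terms on a small diagonally dominant matrix and comparing against a known inverse with `max_abs_error` shows how the approximation error shrinks as terms are added.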
2016
Abstract
In this paper we propose an efficient hardware architecture for computation of matrix inversion of positive definite matrices. The algorithm chosen is LDL decomposition followed directly by equation system solving using back substitution. The architecture combines a high throughput with an efficient utilization of its hardware units. We also report FPGA implementation results that show that the architecture is well tailored for implementation in real-time applications.
BIBTEX
@inproceedings{diva2:1135176,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{Hardware Architecture for Positive Definite Matrix Inversion Based on LDL Decomposition and Back-Substitution}},
booktitle = {2016 50th Asilomar Conference on Signals, Systems and Computers},
year = {2016},
series = {Conference Record of the Asilomar Conference on Signals Systems and Computers},
pages = {859--863},
publisher = {IEEE Computer Society},
}
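The algorithm named in the abstract of the Asilomar 2016 paper above, LDL decomposition followed by equation system solving, can be sketched in software as follows. This is only a floating-point illustration for real symmetric positive definite matrices, not the hardware architecture from the paper; the function names `ldl_decompose` and `ldl_inverse` and the column-by-column solve of A X = I are assumptions made for the example.

```python
def ldl_decompose(a):
    """Factor a symmetric positive definite matrix as a = L D L^T.

    Returns (l, d) with l unit lower triangular and d the diagonal of D.
    """
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / d[j]
    return l, d


def ldl_inverse(a):
    """Invert a by solving L D L^T x = e_c for each unit vector e_c."""
    n = len(a)
    l, d = ldl_decompose(a)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        # Forward substitution: L y = e_c.
        y = [0.0] * n
        for i in range(n):
            rhs = 1.0 if i == c else 0.0
            y[i] = rhs - sum(l[i][k] * y[k] for k in range(i))
        # Diagonal scaling: z = D^-1 y.
        z = [y[i] / d[i] for i in range(n)]
        # Back substitution: L^T x = z.
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            x[i] = z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))
        for i in range(n):
            inv[i][c] = x[i]
    return inv
```

Unlike Cholesky decomposition, the LDL form needs no square roots, only one division per column, which is one commonly cited reason it is attractive for hardware implementation.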
2015
Abstract
In this work we explore the trade-offs between established algorithms for symmetric matrix inversion for fixed-point hardware implementation. Inversion of symmetric positive definite matrices finds applications in many areas, e.g. in MIMO detection and adaptive filtering. We explore computational complexity and show simulation results where numerical properties are analyzed. We show that LDLT decomposition combined with equation system solving is the most promising algorithm for fixed-point hardware implementation. We further show that simply counting the number of operations does not establish a valid comparison between the algorithms as the required word lengths differ significantly.
Keywords
matrix inversion, fixed point arithmetic, symmetric matrix, Engineering and Technology
BIBTEX
@inproceedings{diva2:974167,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{On fixed-point implementation of symmetric matrix inversion}},
booktitle = {Proceedings of the European Conference on Circuit Theory and Design (ECCTD)},
year = {2015},
pages = {440--443},
publisher = {IEEE},
address = {Piscataway, NJ, USA},
}
Abstract
In this work we explore the trade-offs between established algorithms for symmetric matrix inversion for fixed-point hardware implementation. Inversion of symmetric positive definite matrices finds applications in many areas, e.g. in MIMO detection and adaptive filtering. We explore computational complexity and show simulation results where numerical properties are analyzed. We show that LDLT decomposition combined with equation system solving is the most promising algorithm for fixed-point hardware implementation. We further show that simply counting the number of operations does not establish a valid comparison between the algorithms as the required word lengths differ significantly.
Keywords
matrix inversion, fixed point arithmetic, symmetric matrix, Engineering and Technology
BIBTEX
@inproceedings{diva2:898187,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{On fixed-point implementation of symmetric matrix inversion}},
booktitle = {Proceedings of the European Conference on Circuit Theory and Design (ECCTD)},
year = {2015},
pages = {1--4},
publisher = {IEEE},
address = {Piscataway, NJ, USA},
}
2012
Abstract
Many contemporary FPGAs have introduced a pre-adder before the hard multipliers, primarily aimed at linear-phase FIR filters. In this work, structural modifications are proposed with the aim of reducing the LUT resource utilization and, finally, using the pre-adder for implementing single path delay feedback pipeline FFTs. The results show that two thirds of the LUT resources can be saved when the pre-adder has bypass functionality, as in the Xilinx 6 and 7 series, compared to a direct mapping.
Keywords
Engineering and Technology
BIBTEX
@inproceedings{diva2:618605,
author = {Ingemarsson, Carl and Källström, Petter and Gustafsson, Oscar},
title = {{Using DSP block pre-adders in pipeline SDF FFT implementations in contemporary FPGAs}},
booktitle = {22nd International Conference on Field Programmable Logic and Applications (FPL)},
year = {2012},
pages = {71--74},
publisher = {IEEE Communications Society},
address = {Piscataway, NJ, USA},
}
2011
Abstract
Matrix inversion is sensitive towards the number representation used. In this paper simulations of matrix inversion with numbers represented in the fixed-point and logarithmic number systems (LNS) are presented. A software framework has been implemented to allow extensive simulation of finite wordlength matrix inversion. Six different algorithms have been used and results on matrix condition number, wordlength, and to some extent matrix size are presented. The simulations among other things show that the wordlength requirements differ significantly between different algorithms in both fixed-point and LNS representations. The results can be used as a starting point for a matrix inversion hardware implementation.
Keywords
Engineering and Technology
BIBTEX
@inproceedings{diva2:461965,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{Finite wordlength properties of matrix inversion algorithms in fixed-point and logarithmic number system}},
booktitle = {2011 20th European Conference on Circuit Theory and Design (ECCTD)},
year = {2011},
pages = {673--676},
publisher = {IEEE},
address = {Piscataway, NJ, USA},
}
Abstract
Matrix inversion is a key operation in, for instance, adaptive filters and MIMO communication system receivers. For ill-conditioned channel matrices long wordlengths are required for fixed-point implementation of matrix inversion. In this work, the wordlength/error tradeoffs for matrix inversion using different algorithms with fixed-point and logarithmic number systems (LNS) are considered. LNS provides higher resolution for small numbers and a larger dynamic range. Also, it will alter the cost of the basic operations in the algorithms. The results show that also the wordlength required to achieve a comparable error differs significantly between different algorithms and for most algorithms is reduced for LNS compared to fixed-point.
Keywords
Matrix Inversion, Logarithmic number system, LNS, Engineering and Technology
BIBTEX
@inproceedings{diva2:447684,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{On Using the Logarithmic Number System for Finite Wordlength Matrix Inversion}},
booktitle = {The 54th IEEE International Midwest Symposium on Circuits and Systems},
year = {2011},
series = {Midwest Symposium on Circuits and Systems. Conference Proceedings},
pages = {1--4},
publisher = {IEEE},
address = {Piscataway, NJ, USA},
}