Publications, Computer Engineering
Journal papers
In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator is used to compute the tensor operations, while the dynamically reconfigurable approach can be used to switch precision between int8 and float modes. This dynamic reconfiguration enables better performance, by allowing more cores to be mapped to the resource-constrained device, and lower power consumption compared with supporting both arithmetic precisions simultaneously. We compare the proposed hardware with a high-performance systolic architecture for dense matrices, obtaining 25% better performance in dense mode with half the DSP blocks in the same technology. In sparse mode, we show that the core can outperform dense mode even at low sparsity levels, and a single core achieves up to 20x acceleration over the software-optimized NEON RUY library.
@article{diva2:1740275,
author = {Nunez-Yanez, Jose Luis and Otero, Andres and de la Torre, Eduardo},
title = {{Dynamically reconfigurable variable-precision sparse-dense matrix acceleration in Tensorflow Lite}},
journal = {Microprocessors and microsystems},
year = {2023},
volume = {98},
}
In this paper we present a hardware architecture optimized for sparse and dense matrix processing in TensorFlow Lite and compatible with embedded heterogeneous devices that integrate CPU and FPGA resources. The FADES (Fused Architecture for DEnse and Sparse matrices) design offers multiple configuration options that trade off parallelism and complexity, and uses a dataflow model to create four stages that read, compute, scale and write results. All stages are designed to support TensorFlow Lite operations including asymmetric quantized activations, column-major matrix write, per-filter/per-axis bias values and current scaling specifications. The configurable accelerator is integrated with the TensorFlow Lite inference engine running on the ARMv8 processor. We compare performance/power/energy with the state-of-the-art RUY software multiplication library, showing up to 18x and 48x acceleration in dense and sparse modes, respectively. The sparse mode benefits from structural pruning to fully utilize the DSP blocks present in the FPGA device.
@article{diva2:1699269,
author = {Nunez-Yanez, Jose Luis},
title = {{Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite}},
journal = {IEEE Micro},
year = {2022},
volume = {42},
number = {6},
pages = {55--66},
}
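The TensorFlow Lite operations the two FADES papers above accelerate (asymmetric quantized activations, per-filter bias, per-axis scales) follow the standard int8 quantization scheme. The following NumPy sketch shows that reference arithmetic only, not the FADES hardware; the function name and the row-major per-filter layout are illustrative assumptions:

```python
import numpy as np

def quantized_matmul(w_int8, x_int8, x_zero_point, bias_int32, scales, out_zero_point):
    """Reference int8 matmul in the TensorFlow Lite style: asymmetric
    activation zero point, per-filter (per-row) bias and per-axis scale.
    Illustrative sketch, not the FADES accelerator itself."""
    # Accumulate in int32 after removing the activation zero point.
    acc = w_int8.astype(np.int32) @ (x_int8.astype(np.int32) - x_zero_point)
    acc += bias_int32[:, None]                              # per-filter bias
    out = np.round(acc * scales[:, None]) + out_zero_point  # per-axis rescale
    return np.clip(out, -128, 127).astype(np.int8)
```

The int32 accumulation followed by a per-axis floating-point rescale mirrors the split between the compute and scale stages described in the abstracts.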
Distributed arithmetic (DA) is an efficient look-up table (LUT) based approach. The throughput of DA-based implementations is limited by the LUT size. This paper presents two high-throughput architectures (Type I and II) of non-pipelined DA-based least-mean-square (LMS) adaptive filters (ADFs) using two's complement (TC) and offset-binary coding (OBC), respectively. We formulate the LMS algorithm using the steepest-descent approach, with a possible extension to its power-normalized version, followed by its convergence properties. The coefficient update equation of the LMS algorithm is then transformed via TC DA and OBC DA to design and develop non-pipelined architectures of ADFs. The proposed structures employ the LUT pre-decomposition technique to increase the throughput performance. It enables the same mapping scheme for concurrent update of the decomposed LUTs. An efficient fixed-point quantization model for the evaluation of the proposed structures from a realistic point of view is also presented. It is found that the Type II structure provides higher throughput than the Type I structure at the expense of a slower convergence rate with almost the same steady-state mean square error. Unlike existing non-pipelined LMS ADFs, the proposed structures offer very high throughput performance, especially with large-order DA base units. Furthermore, they perform fewer additions in every filter cycle. Based on the simulation results, it is found that a 256th-order filter with an 8th-order DA base unit using the Type I structure provides 9.41x higher throughput, while the Type II structure provides 16.68x higher throughput, as compared to the best existing design.
Synthesis results show that a 32nd-order filter with an 8th-order DA base unit using the Type I structure achieves a 38.76% lower minimum sampling period (MSP), occupies 28.62% more area, consumes 67.18% more power, utilizes 49.06% more slice LUTs and 3.31% more flip-flops (FFs), whereas the Type II structure achieves a 51.25% lower MSP, occupies 21.42% more area, consumes 47.84% more power, utilizes 29.10% more slice LUTs and 1.47% fewer FFs, as compared to the best existing design.
@article{diva2:1691004,
author = {Khan, Mohd. Tasleem and Alhartomi, Mohammed A. and Alzahrani, Saeed and Shaik, Rafi Ahamed and Alsulami, Ruwaybih},
title = {{Two Distributed Arithmetic Based High Throughput Architectures of Non-Pipelined LMS Adaptive Filters}},
journal = {IEEE Access},
year = {2022},
volume = {10},
pages = {76693--76706},
}
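The coefficient update that the paper above maps to DA hardware is the standard LMS recursion w(n+1) = w(n) + mu·e(n)·x(n). A plain software sketch of that recursion for system identification follows; the DA architectures compute the same update with look-up tables instead of multipliers, which is not shown here:

```python
import numpy as np

def lms_identify(x, d, num_taps, mu):
    """Textbook LMS adaptive filter identifying an unknown FIR system.
    Software sketch of the recursion the DA architectures implement."""
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        xv = x[n - num_taps + 1:n + 1][::-1]  # tap-delay line, newest sample first
        e = d[n] - w @ xv                     # a-priori error
        w = w + mu * e * xv                   # steepest-descent update
    return w
```

With a sufficiently small step size mu, the weights converge to the unknown system's impulse response.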
User activity detection in grant-free random access massive machine-type communication (mMTC) using pilot-hopping sequences can be formulated as solving a non-negative least squares (NNLS) problem. In this work, two architectures using different algorithms to solve the NNLS problem are proposed. The algorithms are implemented using a fully parallel approach and fixed-point arithmetic, leading to high detection rates and low power consumption. The first algorithm, fast projected gradients, converges faster to the optimal value. The second algorithm, multiplicative updates, is partially implemented in the logarithmic domain, and provides a smaller chip area and lower power consumption. For a detection rate of about one million detections per second, the chip area for the fast algorithm is about 0.7 mm² compared to about 0.5 mm² for the multiplicative algorithm when implemented in a 28 nm FD-SOI standard cell process at 1 V power supply voltage. The energy consumption is about 300 nJ/detection for the fast projected gradient algorithm using 256 iterations, leading to convergence close to the theoretical limit. With 128 iterations, about 250 nJ/detection is required, with a detection performance on par with 192 iterations of the multiplicative algorithm, for which about 100 nJ/detection is required.
@article{diva2:1599759,
author = {Mohammadi Sarband, Narges and Becirovic, Ema and Krysander, Mattias and Larsson, Erik G. and Gustafsson, Oscar},
title = {{Massive Machine-Type Communication Pilot-Hopping Sequence Detection Architectures Based on Non-Negative Least Squares for Grant-Free Random Access}},
journal = {IEEE Open Journal of Circuits and Systems},
year = {2021},
volume = {2},
pages = {253--264},
}
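Both algorithms in the paper above minimize ||Ax − b||² subject to x ≥ 0. The textbook versions of the two iterations can be sketched as follows; these are floating-point software sketches, not the fixed-point hardware, and the multiplicative rule additionally assumes entrywise non-negative A and b, as in the pilot-hopping formulation:

```python
import numpy as np

def nnls_fpg(A, b, iters=256):
    """Fast (Nesterov-accelerated) projected gradient for min ||Ax-b||^2, x >= 0."""
    L = np.linalg.norm(A.T @ A, 2)  # Lipschitz constant of the gradient
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        x_new = np.maximum(y - (A.T @ (A @ y - b)) / L, 0.0)  # step + projection
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + (t - 1) / t_new * (x_new - x)             # momentum
        x, t = x_new, t_new
    return x

def nnls_mu(A, b, iters=256):
    """Multiplicative updates: x stays non-negative by construction
    (assumes A and b entrywise non-negative)."""
    x = np.ones(A.shape[1])
    AtB, AtA = A.T @ b, A.T @ A
    for _ in range(iters):
        x *= AtB / np.maximum(AtA @ x, 1e-12)
    return x
```

The iteration counts (128, 192, 256) quoted in the abstract correspond to the `iters` parameter here.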
Predictive maintenance aims to predict failures in components of a system, a heavy-duty vehicle in this work, and do maintenance before any actual fault occurs. Predictive maintenance is increasingly important in the automotive industry due to the development of new services and of autonomous vehicles with no driver to notice the first signs of a component problem. The lead-acid battery in a heavy vehicle is mostly used during engine starts, but also for heating and cooling the cockpit, and is an important part of the electrical system that is essential for reliable operation. This paper develops and evaluates two machine-learning based methods for battery prognostics, one based on Long Short-Term Memory (LSTM) neural networks and one on Random Survival Forest (RSF). The objective is to estimate the time of battery failure based on sparse and non-equidistant vehicle operational data, obtained from workshop visits or over-the-air readouts. The dataset has three characteristics: 1) no sensor measurements are directly related to battery health, 2) the number of data readouts varies from one vehicle to another, and 3) readouts are collected at different time periods. Missing data is common and is addressed by comparing different imputation techniques. RSF- and LSTM-based models are proposed and evaluated for the case of sparse multiple readouts. How to measure model performance and how the amount of vehicle information influences performance are also discussed.
@article{diva2:1512684,
author = {Voronov, Sergii and Krysander, Mattias and Frisk, Erik},
title = {{Predictive Maintenance of Lead-Acid Batteries with Sparse Vehicle Operational Data}},
journal = {International Journal of Prognostics and Health Management},
year = {2020},
volume = {11},
number = {1},
}
In this work, we present an approach to exploit the potential benefit of adder graph algorithms by solving the transposed form of the problem and then transposing the solution. The key contribution is a systematic way to obtain the transposed realization with a minimum number of cascaded adders, given the input realization. In this way, wide and low constant matrix multiplication problems, with sums of products as a special case, which are normally exceptionally time-consuming to solve using adder graph algorithms, can be solved by first transposing the matrix and then transposing the solution. Examples show that, while the relation between the adder depth of the solution to the transposed problem and that of the original problem is not straightforward, there are many cases where the reduction in adder cost more than compensates for the potential increase in adder depth and results in implementations with reduced power consumption compared to using sub-expression sharing algorithms, which can both solve the original problem directly in reasonable time and guarantee a minimum adder depth.
@article{diva2:1461839,
author = {Mohammadi Sarband, Narges and Gustafsson, Oscar and Garrido Gálvez, Mario},
title = {{Using Transposition to Efficiently Solve Constant Matrix-Vector Multiplication and Sum of Product Problems}},
journal = {Journal of Signal Processing Systems},
year = {2020},
volume = {92},
number = {10},
pages = {1075--1089},
}
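The transposition idea above rests on a simple algebraic fact: if a constant matrix multiplication is realized as a cascade of elementary stages (adders and shifts, each a linear map), then transposing each stage and reversing the cascade realizes the transposed matrix. A tiny NumPy demonstration; the 4x2 example stages are illustrative only, not from the paper:

```python
import numpy as np

# A constant matrix multiplication realized as a cascade of elementary
# stages; each stage (adders, then power-of-two shifts) is a linear map.
stage1 = np.array([[1, 0], [1, 1], [0, 1], [1, -1]])  # adders: x0, x0+x1, x1, x0-x1
stage2 = np.diag([1, 2, 4, 8])                        # shifts (powers of two)

A = stage2 @ stage1      # the 4x2 constant matrix this cascade computes

# Transposed realization: reversed cascade of transposed stages computes A^T,
# so a solution for the tall-thin problem yields one for the wide-low problem.
A_T = stage1.T @ stage2.T
assert np.array_equal(A_T, A.T)
```

This is why solving the (easier) transposed problem and then transposing the adder graph gives a valid realization of the original wide, low matrix.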
The life and condition of a mine truck frame are related to how the machine is used. Damage from stress cycles is accumulated over time, and measurements throughout the life of the machine are needed to monitor the condition. This results in high demands on the durability of sensors, especially in a harsh mining application. To make a monitoring system cheap and robust, sensors already available on the vehicles are preferred over additional strain gauges. The main question in this work is whether the existing on-board sensors can give the required information to estimate stress signals and calculate accumulated damage of the frame. Model complexity requirements and sensor selection are also considered. A final question is whether the accumulated damage can be used for prognostics and to increase reliability. The investigation is performed using a large data set from two vehicles operating in real mine applications. Coherence analysis, ARX models, and rainflow counting are the techniques used. The results show that a small number of available on-board sensors, like load cells, damper cylinder positions, and angle transducers, can give enough information to recreate some of the measured stress signals. The models are also used to show significant differences in usage between operators, and their effect on the accumulated damage.
@article{diva2:1431155,
author = {Jakobsson, Erik and Pettersson, Robert and Frisk, Erik and Krysander, Mattias},
title = {{Fatigue Damage Monitoring for Mining Vehicles using Data Driven Models}},
journal = {International Journal of Prognostics and Health Management},
year = {2020},
volume = {11},
number = {1},
}
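The stress-signal models in the paper above are ARX models fit to on-board sensor data. A minimal least-squares ARX fit can be sketched as follows; the function and the single-input, first-order example are illustrative, not the paper's actual models:

```python
import numpy as np

def fit_arx(u, y, na, nb):
    """Fit an ARX model y[k] = sum_i a_i*y[k-i] + sum_j b_j*u[k-j] by least
    squares. Illustrative stand-in for the stress-estimation models."""
    n = max(na, nb)
    rows = []
    for k in range(n, len(y)):
        # Regressor: past outputs then past inputs, newest first.
        rows.append(np.concatenate([y[k - na:k][::-1], u[k - nb:k][::-1]]))
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[n:], rcond=None)
    return theta[:na], theta[na:]
```

On noiseless simulated data the true coefficients are recovered exactly; on real sensor data the residual indicates model quality, analogous to the coherence analysis mentioned above.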
The study of fault diagnosis of automotive engine systems has been an interesting and ongoing topic for many years. Numerous research projects have been conducted by automakers and research institutions to discover new and more advanced diagnosis methods for better fault isolation (FI). Some of the research in this field has been reported in the literature.
@article{diva2:1424739,
author = {Ng, Kok Yew and Frisk, Erik and Krysander, Mattias and Eriksson, Lars},
title = {{A Realistic Simulation Testbed of a Turbocharged Spark-Ignited Engine System: A Platform for the Evaluation of Fault Diagnosis Algorithms and Strategies}},
journal = {IEEE CONTROL SYSTEMS MAGAZINE},
year = {2020},
volume = {40},
number = {2},
pages = {56--83},
}
The most challenging aspect of particle filtering hardware implementation is the resampling step. This is because of its high latency, as it can only be partially executed in parallel with the other steps of particle filtering and has no inherent parallelism inside it. To reduce the latency, an improved resampling architecture is proposed that pre-fetches from the weight memory in parallel with fetching a value from a random function generator, along with architectures for realizing the pre-fetch technique. This enables a particle filter using M particles with otherwise streaming operation to get new inputs more often than every 2M cycles, which the previously best approach gives. Results show that a pre-fetch buffer of five values achieves the best area-latency trade-off, on average achieving an 85% reduction in latency for the resampling step and a sample time reduction of more than 40%. We also propose a generic division-free architecture for the resampling steps. It also removes the need for explicitly ordering the random values for an efficient multinomial resampling implementation. In addition, on-the-fly computation of the cumulative sum of weights is proposed, which helps reduce the word length of the particle weight memory. FPGA implementation results show that the memory size is reduced by up to 50%.
@article{diva2:1417285,
author = {Alam, Syed Asad and Gustafsson, Oscar},
title = {{Improved Particle Filter Resampling Architectures}},
journal = {Journal of Signal Processing Systems},
year = {2020},
volume = {92},
number = {6},
pages = {555--568},
}
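Multinomial resampling, as discussed above, maps sorted uniform draws onto the cumulative sum of the particle weights. A software sketch of that baseline follows; note it relies on explicitly sorting the random values, which is exactly the requirement the paper's architecture removes (the hardware details are not reproduced here):

```python
import random

def multinomial_resample(weights, rng=None):
    """Baseline multinomial resampling via the cumulative sum of weights.
    Software sketch only; the paper's hardware computes the cumulative
    sum on the fly and avoids both the sort and the normalizing division."""
    rng = rng or random.Random(0)
    total = sum(weights)
    # Draw sorted uniforms so one pass over the cumulative sum suffices.
    us = sorted(rng.uniform(0, total) for _ in weights)
    indices, cum, i = [], weights[0], 0
    for u in us:
        while u > cum and i + 1 < len(weights):
            i += 1
            cum += weights[i]
        indices.append(i)
    return indices
```

Drawing against the unnormalized cumulative sum (rather than dividing every weight by the total) is the same division-free idea the abstract mentions.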
Finding the cheapest, or smallest, set of sensors such that a specified level of diagnosis performance is maintained is important to decrease cost while controlling performance. Algorithms have been developed to find sets of sensors that make faults detectable and isolable under ideal circumstances. However, due to model uncertainties and measurement noise, different sets of sensors result in different achievable diagnosability performance in practice. In this paper, the sensor selection problem is formulated to ensure that the set of sensors fulfils required performance specifications when model uncertainties and measurement noise are taken into consideration. However, the algorithms for finding the guaranteed global optimal solution are intractable without exhaustive search. To overcome this problem, a greedy stochastic search algorithm is proposed to solve the sensor selection problem. A case study demonstrates the effectiveness of the greedy stochastic search in finding sets close to the global optimum in short computational time.
@article{diva2:806672,
author = {Jung, Daniel and Dong, Yi and Frisk, Erik and Krysander, Mattias and Biswas, Gautam},
title = {{Sensor selection for fault diagnosis in uncertain systems}},
journal = {International Journal of Control},
year = {2020},
volume = {93},
number = {3},
pages = {629--639},
}
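The greedy stochastic search described above can be sketched in a few lines: grow the sensor set by adding one sensor at a time, sampled from the best few candidates rather than always the single best. The performance function below is a toy stand-in for the paper's quantitative diagnosability measure, and all names are illustrative:

```python
import random

def greedy_sensor_selection(sensors, performance, required, rng=None):
    """Greedy stochastic search sketch: repeatedly add one sensor sampled
    from the top candidates until the performance requirement is met."""
    rng = rng or random.Random(0)
    selected = set()
    while performance(selected) < required:
        candidates = sorted(sensors - selected,
                            key=lambda s: performance(selected | {s}),
                            reverse=True)
        if not candidates:
            raise ValueError("requirement not reachable with these sensors")
        selected.add(rng.choice(candidates[:2]))  # stochastic: top-2, not always best
    return selected
```

Restarting this search several times with different seeds and keeping the cheapest result approximates the stochastic exploration used to escape poor greedy choices.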
In this paper, we present the first implementation of a 1 million-point fast Fourier transform (FFT) completely integrated on a single field-programmable gate array (FPGA), without the need for external memory or multiple interconnected FPGAs. The proposed architecture is a pipelined single-delay feedback (SDF) FFT. The architecture includes a specifically designed 1 million-point rotator with high accuracy and a thorough study of the word length at the different FFT stages in order to increase the signal-to-quantization-noise ratio (SQNR) and keep the area low. This also results in low power consumption.
@article{diva2:1367440,
author = {Kanders, Hans and Mellqvist, Tobias and Garrido Gálvez, Mario and Palmkvist, Kent and Gustafsson, Oscar},
title = {{A 1 Million-Point FFT on a Single FPGA}},
journal = {IEEE Transactions on Circuits and Systems Part 1},
year = {2019},
volume = {66},
number = {10},
pages = {3863--3873},
}
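A pipelined SDF architecture like the one above computes the FFT one radix-2 stage at a time. A minimal recursive radix-2 decimation-in-time sketch shows the stage structure and the rotators (twiddle factors); this is an algorithmic illustration only, not the 1 million-point hardware:

```python
import cmath

def fft_radix2(x):
    """Minimal recursive radix-2 DIT FFT; each recursion level corresponds
    to one butterfly-and-rotator stage of a pipelined (e.g. SDF) design."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # rotator (twiddle factor)
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The word-length study in the paper concerns exactly these rotator multiplications and butterfly additions, whose quantization noise accumulates across the log2(N) stages.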
A direct digital-to-RF converter (DRFC) is presented in this work. Due to its digital-in-nature design, the DRFC benefits from technology scaling and can be monolithically integrated into advanced digital VLSI systems. A fourth-order single-bit-quantizer bandpass digital sigma-delta modulator is used preceding the DRFC, resulting in a high in-band signal-to-noise ratio (SNR). The out-of-band spectrally-shaped quantization noise is attenuated by an embedded semi-digital FIR filter (SDFIR). The RF output frequencies are synthesized by a novel configurable voltage-mode RF DAC solution with high linearity performance. The configurable RF DAC directly synthesizes RF signals up to 10 GHz in the first or second Nyquist zone. The proposed DRFC is designed in a 22 nm FDSOI CMOS process and, with the aid of Monte-Carlo simulation, shows 78.6 dBc and 63.2 dBc worst-case third-order intermodulation distortion (IM3) under process mismatch at 2.5 GHz and 7.5 GHz output frequencies, respectively.
@article{diva2:1334868,
author = {Sadeghifar, Mohammad Reza and Bengtsson, Hakan and Wikner, Jacob and Gustafsson, Oscar},
title = {{Direct digital-to-RF converter employing semi-digital FIR voltage-mode RF DAC}},
journal = {Integration},
year = {2019},
volume = {66},
pages = {128--134},
}
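The noise shaping that precedes the RF DAC above can be illustrated with the simplest possible case: a first-order, lowpass, single-bit sigma-delta modulator (the paper's modulator is fourth-order and bandpass, which this sketch does not attempt to reproduce):

```python
def sigma_delta_1st(x):
    """First-order single-bit sigma-delta modulator (lowpass). The single-bit
    output stream carries the input in its average while pushing quantization
    noise to high frequencies. Minimal illustration only."""
    v, out = 0.0, []
    for sample in x:               # input assumed in [-1, 1]
        y = 1.0 if v >= 0.0 else -1.0  # single-bit quantizer
        out.append(y)
        v += sample - y                # integrate the quantization error
    return out
```

Averaging the ±1 output recovers the input amplitude, which is why the out-of-band shaped noise can then be removed by the semi-digital FIR filter.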
In this paper, we present a systematic approach to design hardware circuits for bit-dimension permutations. The proposed approach is based on decomposing any bit-dimension permutation into elementary bit-exchanges. Such decomposition is proven to achieve the theoretical minimum number of delays required for the permutation. This offers optimum solutions for multiple well-known problems in the literature that make use of bit-dimension permutations. This includes the design of permutation circuits for the fast Fourier transform, bit reversal, matrix transposition, stride permutations, and Viterbi decoders.
@article{diva2:1333835,
author = {Garrido, Mario and Grajal, Jesus and Gustafsson, Oscar},
title = {{Optimum Circuits for Bit-Dimension Permutations}},
journal = {IEEE Transactions on Very Large Scale Integration (vlsi) Systems},
year = {2019},
volume = {27},
number = {5},
pages = {1148--1160},
}
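A bit-dimension permutation, as studied above, rearranges data according to a permutation of the bits of the index; an elementary bit-exchange is the special case that swaps just two bit positions. A small software model of the general operation (the hardware delay/multiplexer circuits are not modeled):

```python
def bit_permute(data, perm):
    """Apply a bit-dimension permutation: bit b of the destination index
    equals bit perm[b] of the source index. Bit reversal is the special
    case perm = [n-1, ..., 1, 0]."""
    n_bits = len(perm)
    assert len(data) == 1 << n_bits
    out = [None] * len(data)
    for i, v in enumerate(data):
        j = 0
        for b in range(n_bits):
            j |= ((i >> perm[b]) & 1) << b
        out[j] = v
    return out
```

Any such `perm` factors into transpositions of two bit positions, which is the decomposition into elementary bit-exchanges that the paper proves delay-optimal.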
The optimization problem formulation for a semi-digital FIR digital-to-analog converter (SDFIR DAC) is investigated in this work. Magnitude and energy metrics with variable coefficient precision are defined for cascaded digital sigma-delta modulators, the semi-digital FIR filter, and the sinc roll-off frequency response of the DAC. A set of analog metrics representing hardware cost is also defined and included in the SDFIR DAC optimization problem formulation. It is shown that the hardware cost of the SDFIR DAC can be significantly reduced by introducing flexible coefficient precision, without over-designing the SDFIR DAC. Different use cases are selected to demonstrate the optimization problem formulations. A combination of the magnitude metric, energy metric, coefficient precision and analog metrics is used in the different use cases, and the problems are solved to find the optimum set of analog FIR taps. A new method introducing variable coefficient precision into the optimization procedure is proposed to avoid non-convex optimization problems. It is shown that up to 22% of the total number of unit elements of the SDFIR filter can be saved when targeting the analog metric as the optimization objective, subject to magnitude constraints in the pass-band and stop-band.
@article{diva2:1333834,
author = {Sadeghifar, Mohammad Reza and Gustafsson, Oscar and Wikner, Jacob},
title = {{Optimization problem formulation for semi-digital FIR digital-to-analog converter considering coefficients precision and analog metrics}},
journal = {Analog Integrated Circuits and Signal Processing},
year = {2019},
volume = {99},
number = {2},
pages = {287--298},
}
This brief presents novel circuits for calculating the bit reversal of parallel data. The circuits consist of delays/memories and multiplexers, and have the advantage that they require the minimum number of multiplexers among parallel bit reversal circuits proposed so far, as well as a small total memory.
@article{diva2:1300795,
author = {Garrido, Mario},
title = {{Multiplexer and Memory-Efficient Circuits for Parallel Bit Reversal}},
journal = {IEEE Transactions on Circuits and Systems - II - Express Briefs},
year = {2019},
volume = {66},
number = {4},
pages = {657--661},
}
This paper presents the fastest fast Fourier transform (FFT) hardware architectures so far. The architectures are based on a fully parallel implementation of the FFT algorithm. In order to obtain the highest throughput while keeping the resource utilization low, we base our design on making use of advanced shift-and-add techniques to implement the rotators and on selecting the most suitable FFT algorithms for these architectures. Apart from high throughput and resource efficiency, we also guarantee high accuracy in the proposed architectures. For the implementation, we have developed an automatic tool that generates the architectures as a function of the FFT size, input word length and accuracy of the rotations. We provide experimental results covering various FFT sizes, FFT algorithms, and field-programmable gate array boards. These results show that it is possible to break the barrier of 100 GS/s for FFT calculation.
@article{diva2:1300794,
author = {Garrido, Mario and Möller, K. and Kumm, M.},
title = {{World's Fastest FFT Architectures: Breaking the Barrier of 100 GS/s}},
journal = {IEEE Transactions on Circuits and Systems Part 1},
year = {2019},
volume = {66},
number = {4},
pages = {1507--1516},
}
An all-digital pulse width modulated (PWM) transmitter using outphasing is proposed. The transmitter uses PWM to encode the amplitude, and outphasing for enhanced phase control. In this way, the phase resolution of the transmitter is doubled. The proposed scheme was implemented using a Stratix IV FPGA and class-D PAs fabricated in a 130 nm standard CMOS process. From the measurement results, a spectral performance improvement is observed due to the enhanced phase resolution. Compared to an all-digital polar PWM transmitter, the error vector magnitude of the proposed transmitter is reduced by 4.1% and the adjacent channel leakage ratio shows an improvement of 5.6 dB for a 1.4 MHz LTE up-link signal at a carrier frequency of 700 MHz and a saturated output power of 25 dBm.
@article{diva2:1265262,
author = {Pasha, Muhammad Touqir and Fahim Ul Haque, Muhammad and Ahmad, Jahanzeb and Johansson, Ted},
title = {{An All-Digital PWM Transmitter With Enhanced Phase Resolution}},
journal = {IEEE Transactions on Circuits and Systems - II - Express Briefs},
year = {2018},
volume = {65},
number = {11},
pages = {1634--1638},
}
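Outphasing, used in the transmitter above, represents an amplitude-modulated signal as the sum of two constant-envelope signals that a switch-mode PA can amplify efficiently. A NumPy sketch of the decomposition, assuming amplitudes normalized to [0, 1] (illustrative, not the FPGA implementation):

```python
import numpy as np

def outphase(a, phi):
    """Outphasing decomposition: a*e^{j*phi} = s1 + s2 with
    s1 = 0.5*e^{j(phi+theta)}, s2 = 0.5*e^{j(phi-theta)}, theta = arccos(a).
    Both components have constant envelope 0.5."""
    theta = np.arccos(np.clip(a, 0.0, 1.0))
    s1 = 0.5 * np.exp(1j * (phi + theta))
    s2 = 0.5 * np.exp(1j * (phi - theta))
    return s1, s2
```

Since s1 + s2 = e^{j*phi}·cos(theta) = a·e^{j*phi}, the amplitude is encoded purely in the phase difference between the two branches, which is what the enhanced phase resolution improves.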
This brief presents a novel pipelined architecture to compute the fast Fourier transform of real input signals in a serial manner, i.e., one sample is processed per cycle. The proposed architecture, referred to as real-valued serial commutator, achieves full hardware utilization by mapping each stage of the fast Fourier transform (FFT) to a half-butterfly operation that operates on real input signals. Prior serial architectures to compute the FFT of real signals only achieved 50% hardware utilization. Novel data-exchange and data-reordering circuits are also presented. The complete serial commutator architecture requires 2 log₂(N) − 2 real adders, log₂(N) − 2 real multipliers, and N + 9 log₂(N) − 19 real delay elements, where N represents the size of the FFT.
@article{diva2:1265261,
author = {Garrido Gálvez, Mario and Unnikrishnan, Nanda K. and Parhi, Keshab K.},
title = {{A Serial Commutator Fast Fourier Transform Architecture for Real-Valued Signals}},
journal = {IEEE Transactions on Circuits and Systems - II - Express Briefs},
year = {2018},
volume = {65},
number = {11},
pages = {1693--1697},
}
@article{diva2:1259455,
author = {Garrido Gálvez, Mario and Lopez-Vallejo, Maria Luisa and Chen, Sau-Gee},
title = {{Guest Editorial: Special Section on Fast Fourier Transform (FFT) Hardware Implementations}},
journal = {Journal of Signal Processing Systems},
year = {2018},
volume = {90},
number = {11},
pages = {1581--1582},
}
Machine learning can be used to automatically process sensor data and create data-driven models for prediction and classification. However, in applications such as fault diagnosis, faults are rare events and learning models for fault classification is complicated because of lack of relevant training data. This paper proposes a hybrid diagnosis system design which combines model-based residuals with incremental anomaly classifiers. The proposed method is able to identify unknown faults and also classify multiple-faults using only single-fault training data. The proposed method is verified using a physical model and data collected from an internal combustion engine.
@article{diva2:1248561,
author = {Jung, Daniel and Ng, Kok Yew and Frisk, Erik and Krysander, Mattias},
title = {{Combining model-based diagnosis and data-driven anomaly classifiers for fault isolation}},
journal = {Control Engineering Practice},
year = {2018},
volume = {80},
pages = {146--156},
}
In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers, as compared with the single-path delay feedback (SDF) architecture, the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well suited for FPGA implementation, especially when efficient implementation of short shift registers is available. This holds for at least contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.
@article{diva2:1245539,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{SFF--The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture}},
journal = {Journal of Signal Processing Systems},
year = {2018},
volume = {90},
number = {11},
pages = {1583--1592},
}
Single constant coefficient multiplication is a frequently used operation in many numeric algorithms. Extensive previous work is available on how to reduce constant multiplications to additions, subtractions, and bit shifts. However, in previous work, only common two-input adders were used. As modern field-programmable gate arrays (FPGAs) support efficient ternary adders, i.e., adders with three inputs, this brief investigates constant multiplications built from ternary adders in an optimal way. The results show that multiplication by any constant of up to 22 bits can be realized by only three ternary adders. Average adder reductions of more than 33% compared to optimal constant multiplication circuits using two-input adders are achieved for coefficient word sizes of more than five bits. Synthesis experiments show average FPGA slice reductions on the order of 25% and a similar or higher speed than their two-input adder counterparts.
@article{diva2:1234449,
author = {Kumm, Martin and Gustafsson, Oscar and Garrido Gálvez, Mario and Zipf, Peter},
title = {{Optimal Single Constant Multiplication Using Ternary Adders}},
journal = {IEEE Transactions on Circuits and Systems - II - Express Briefs},
year = {2018},
volume = {65},
number = {7},
pages = {928--932},
}
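The shift-and-add idea above is easy to see on a concrete constant: 45 = 101101₂ = 32 + 8 + 4 + 1 has four nonzero bits, so it needs three two-input adders but only two ternary (three-input) adders. A sketch with an illustrative decomposition (the paper finds optimal ones for all constants):

```python
def mul45(x):
    """x * 45 using two ternary additions of shifted copies:
    t = (x<<5) + (x<<3) + (x<<2) = 44x, then t + x = 45x.
    Illustrative decomposition; shifts are free in hardware."""
    t = (x << 5) + (x << 3) + (x << 2)  # first (ternary) adder: three inputs
    return t + x                        # second adder
```

In hardware the shifts cost only wiring, so the adder count is the dominant cost that the ternary-adder formulation reduces.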
Maintenance planning is important in the automotive industry as it allows fleet owners or regular customers to avoid unexpected failures of components. One cause of unplanned stops of heavy-duty trucks is failure of the lead-acid starter battery. High availability of the vehicles can be achieved by changing the battery frequently, but such an approach is expensive due both to the frequent workshop visits and to the component cost. Here, a data-driven method based on random survival forest (RSF) is proposed for predicting the reliability of the batteries. The dataset available for the study, covering more than 50 000 trucks, has two important properties. First, it does not contain measurements related directly to battery health; second, there are no time series of measurements for every vehicle. In this paper, the RSF method is used to predict the reliability function for a particular vehicle using data from the fleet of vehicles, given that only one set of measurements per vehicle is available. A theory for confidence bands for the RSF method is developed, which is an extension of an existing technique for variance estimation in the random forest method. Adding confidence bands to the RSF method gives an engineer the opportunity to evaluate the confidence of the model prediction. Some aspects of the confidence bands are considered: their asymptotic behavior and their usefulness in model selection. The problem of including time-related variables is addressed, with arguments for why it is a good choice not to add them to the model. Metrics for performance evaluation are suggested, which show that the model can be used to schedule and optimize the cost of battery replacement. The approach is illustrated extensively using a real-life truck data case study.
@article{diva2:1229803,
author = {Voronov, Sergii and Frisk, Erik and Krysander, Mattias},
title = {{Data-Driven Battery Lifetime Prediction and Confidence Estimation for Heavy-Duty Trucks}},
journal = {IEEE Transactions on Reliability},
year = {2018},
volume = {67},
number = {2},
pages = {623--639},
}
This paper presents a new method called optimal shift reassignment (OSR), used for reconfigurable multiplication circuits. These circuits consist of adders, subtractors, shifts, and multiplexers (MUXs). They calculate the multiplication of an input number by one out of several constants, which can be selected dynamically at run-time. The OSR method is based on the idea that shifts can be placed at different positions along the circuit while the calculated output constant stays the same. Previous approaches were limited by the fact that all constants within the constant multiplier were forced to be odd; the OSR method removes this restriction. As a result, the number of required MUXs in the circuit can be reduced. This happens when the shift reassignment aligns the shift values of different inputs of a MUX. Experimental results show MUX savings of up to 50%, and average savings between 11% and 16%, using the OSR method compared to previous approaches.
@article{diva2:1192552,
author = {Moeller, Konrad and Kumm, Martin and Garrido Gálvez, Mario and Zipf, Peter},
title = {{Optimal Shift Reassignment in Reconfigurable Constant Multiplication Circuits}},
journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
year = {2018},
volume = {37},
number = {3},
pages = {710--714},
}
In this paper, we present new feedforward FFT hardware architectures based on rotator allocation. The rotator allocation approach consists of distributing the rotations of the FFT in such a way that both the number of edges in the FFT that need rotators and the complexity of the rotators are reduced. Radix-2 and radix-2^k feedforward architectures based on rotator allocation are presented in this paper. Experimental results show that the proposed architectures reduce the hardware cost significantly with respect to previous FFT architectures.
@article{diva2:1188342,
author = {Garrido Gálvez, Mario and Huang, Shen-Jui and Chen, Sau-Gee},
title = {{Feedforward FFT Hardware Architectures Based on Rotator Allocation}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2018},
volume = {65},
number = {2},
pages = {581--592},
}
This paper presents an all-digital polar pulsewidth modulated (PWM) transmitter for wireless communications. The transmitter combines baseband PWM and outphasing to compensate for the amplitude error in the transmitted signal due to aliasing and image distortion. The PWM is implemented in a field programmable gate array (FPGA) core. The outphasing is implemented as pulse-position modulation using the FPGA transceivers, which drive two switch-mode power amplifiers fabricated in 130-nm standard CMOS. The transmitter has an all-digital implementation that offers the flexibility to adapt it to multi-standard and multi-band signals. As the proposed transmitter compensates for aliasing and image distortion, an improvement in the linearity and spectral performance is observed as compared with a digital-PWM transmitter. For a 20-MHz LTE uplink signal, the measurement results show an improvement of up to 6.9 dBc in the adjacent channel leakage ratio.
@article{diva2:1188339,
author = {Pasha, Muhammad Touqir and Fahim Ul Haque, Muhammad and Ahmad, Jahanzeb and Johansson, Ted},
title = {{A Modified All-Digital Polar PWM Transmitter}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2018},
volume = {65},
number = {2},
pages = {758--768},
}
In this paper, an efficient mapping of the pipeline single-path delay feedback (SDF) fast Fourier transform (FFT) architecture to field-programmable gate arrays (FPGAs) is proposed. By considering the architectural features of the target FPGA, significantly better implementation results are obtained. This is illustrated by mapping an R2²SDF 1024-point FFT core toward both Xilinx Virtex-4 and Virtex-6 devices. The optimized FPGA mapping is explored in detail. Algorithmic transformations that allow a better mapping are proposed, resulting in implementation results that by far outperform earlier published work. For Virtex-4, the results show a 350% increase in throughput per slice and a 25% reduction in block RAM (BRAM) use, with the same amount of DSP48 resources, compared with the best earlier published result. The resulting Virtex-6 design sees even larger increases in throughput per slice compared with the Xilinx FFT IP core, while using half as many DSP48E1 blocks and fewer BRAM resources. The results clearly show that the FPGA mapping is crucial, not only the architecture and algorithm choices.
@article{diva2:1140992,
author = {Ingemarsson, Carl and Källström, Petter and Qureshi, Fahad and Gustafsson, Oscar},
title = {{Efficient FPGA Mapping of Pipeline SDF FFT Cores}},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
year = {2017},
volume = {25},
number = {9},
pages = {2486--2497},
}
Computational complexity evaluation is necessary for software-defined forward error correction (FEC) decoders. However, little of the literature addresses FEC complexity evaluation using analytical methods. In this paper, three highly efficient coding schemes, Turbo, QC-LDPC, and convolutional codes (CC), are investigated. Hardware-friendly decoding pseudo-codes are provided with explicit parallel execution and memory access procedures. For each step of the pseudo-codes, the parallelism and the operations in each processing element are given, from which the total number of operations is derived. The decoding complexity of these FEC algorithms is compared, and the percentage of each computation step is illustrated. The requirements for attaining the evaluated results and reference hardware platforms are provided, and benchmarks of state-of-the-art SDR platforms are compared with the proposed evaluations. The analytical FEC complexity results are beneficial for the design and optimization of high-throughput software-defined FEC decoding platforms.
@article{diva2:1140159,
author = {Wu, Zhenzhi and Gong, Chen and Liu, Dake},
title = {{Computational Complexity Analysis of FEC Decoding on SDR Platforms}},
journal = {Journal of Signal Processing Systems},
year = {2017},
volume = {89},
number = {2},
pages = {209--224},
}
This paper presents a novel pulse-width modulation (PWM) transmitter architecture that compensates for aliasing distortion by combining PWM and outphasing. The proposed transmitter can use either switch-mode PAs (SMPAs) or linear PAs at peak power, ensuring maximum efficiency. The transmitter shows better linearity, improved spectral performance and increased dynamic range compared to other polar PWM transmitters as it does not suffer from AM-AM distortion of the PAs and aliasing distortion due to digital PWM. Measurement results show that the proposed architecture achieves an improvement of 8 dB and 4 dB in the dynamic range compared to the digital polar PWM transmitter (PPWMT) and the aliasing-free PWM transmitter (AF-PWMT), respectively. The proposed architecture also shows better efficiency compared to the AF-PWMT.
@article{diva2:1091646,
author = {Haque, Muhammad Fahim Ul and Pasha, Muhammad Touqir and Johansson, Ted},
title = {{Aliasing-Compensated Polar PWM Transmitter}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2017},
volume = {64},
number = {8},
pages = {912--916},
}
This brief presents a novel 4096-point radix-4 memory-based fast Fourier transform (FFT). The proposed architecture follows a conflict-free strategy that only requires a total memory of size N and a few additional multiplexers. The control is also simple, as it is generated directly from the bits of a counter. Apart from the low complexity, the FFT has been implemented on a Virtex-5 field-programmable gate array (FPGA) using DSP slices. The goal has been to reduce the use of distributed logic, which is scarce in the target FPGA. For this purpose, most of the hardware has been implemented in DSP48E slices. As a result, the proposed FFT implementation is efficient in terms of hardware resources, as shown by the experimental results.
@article{diva2:1086602,
author = {Garrido Gálvez, Mario and Angel Sanchez, Miguel and Luisa Lopez-Vallejo, Maria and Grajal, Jesus},
title = {{A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices}},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
year = {2017},
volume = {25},
number = {1},
pages = {375--379},
}
This brief presents a new type of fast Fourier transform (FFT) hardware architecture called the serial commutator (SC) FFT. The SC FFT is characterized by the use of circuits for the bit-dimension permutation of serial data. The proposed architectures are based on the observation that, in the radix-2 FFT algorithm, only half of the samples at each stage must be rotated. This fact, together with proper data management, makes it possible to allocate rotations only every other clock cycle. This allows for simplifying the rotator, halving its complexity with respect to conventional serial FFT architectures. Likewise, the proposed approach halves the number of adders in the butterflies with respect to previous architectures. As a result, the proposed architectures use the minimum number of adders, rotators, and memory that is necessary for a pipelined FFT of serial data, with a 100% utilization ratio.
@article{diva2:1046404,
author = {Garrido Gálvez, Mario and Huang, Shen-Jui and Chen, Sau-Gee and Gustafsson, Oscar},
title = {{The Serial Commutator FFT}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2016},
volume = {63},
number = {10},
pages = {974--978},
}
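The observation that only half of the samples per stage need rotation can be seen directly in the radix-2 decimation-in-frequency (DIF) algorithm. The sketch below is a plain software model of that algorithm, not of the SC hardware: only the subtraction branch of each butterfly, i.e. half the samples, is multiplied by a twiddle factor.

```python
import cmath

def fft_dif(x):
    # Radix-2 DIF FFT: at every stage the sum branch passes through
    # unrotated, while only the difference branch (half the samples)
    # is multiplied by a twiddle factor.
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    top = [x[k] + x[k + half] for k in range(half)]              # no rotation
    bot = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
           for k in range(half)]                                  # rotated half
    evens, odds = fft_dif(top), fft_dif(bot)
    return [v for pair in zip(evens, odds) for v in pair]         # natural order
```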
In this paper we propose a new representation for FFT algorithms called the triangular matrix representation. This representation is more general than the binary tree representation and, therefore, it introduces new FFT algorithms that had not been discovered before. Furthermore, the new representation has the advantage that it is simple and easy to understand, as each FFT algorithm consists of a single triangular matrix. Moreover, the new representation makes it easy to obtain the exact twiddle factor values in the FFT flow graph, which facilitates the design of FFT hardware architectures. As a result, the triangular matrix representation is an excellent alternative for representing FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.
@article{diva2:1046403,
author = {Garrido Gálvez, Mario},
title = {{A New Representation of FFT Algorithms Using Triangular Matrices}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2016},
volume = {63},
number = {10},
pages = {1737--1745},
}
In this brief, we propose a novel approach to implement multiplierless unity-gain single-delay feedback fast Fourier transforms (FFTs). Previous methods achieve unity-gain FFTs by using either complex multipliers or nonunity-gain rotators with additional scaling compensation. Conversely, this brief proposes unity-gain FFTs without compensation circuits, even when using nonunity-gain rotators. This is achieved by a joint design of rotators, so that the entire FFT is scaled by a power of two, which is then shifted to unity. This reduces the amount of hardware resources of the FFT architecture, while having high accuracy in the calculations. The proposed approach can be applied to any FFT size, and various designs for different FFT sizes are presented.
@article{diva2:1046263,
author = {Garrido Gálvez, Mario and Andersson, Rikard and Qureshi, Fahad and Gustafsson, Oscar},
title = {{Multiplierless Unity-Gain SDF FFTs}},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
year = {2016},
volume = {24},
number = {9},
pages = {3003--3007},
}
This brief presents the feedforward short-time Fourier transform (STFT). This new approach is based on reusing the calculations of the STFT at consecutive time instants. This leads to significant savings in hardware components with respect to fast Fourier transform based STFTs. Furthermore, the feedforward STFT does not have the accumulative error of iterative STFT approaches. As a result, the proposed feedforward STFT presents an excellent tradeoff between hardware utilization and performance.
@article{diva2:1014928,
author = {Garrido Gálvez, Mario},
title = {{The Feedforward Short-Time Fourier Transform}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2016},
volume = {63},
number = {9},
pages = {868--872},
}
In this brief, we present the CORDIC II algorithm. Like previous CORDIC algorithms, the CORDIC II calculates rotations by breaking down the rotation angle into a series of microrotations. However, the CORDIC II algorithm uses a novel angle set, different from the angles used in previous CORDIC algorithms. The new angle set provides a faster convergence that reduces the number of adders with respect to previous approaches.
@article{diva2:912029,
author = {Garrido Gálvez, Mario and Källström, Petter and Kumm, Martin and Gustafsson, Oscar},
title = {{CORDIC II: A New Improved CORDIC Algorithm}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2016},
volume = {63},
number = {2},
pages = {186--190},
}
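For reference, the conventional CORDIC that CORDIC II improves on can be sketched in a few lines of Python. The iteration count and the final gain compensation by division (rather than the shift-and-add scaling a hardware design would use) are simplifications for illustration; the CORDIC II angle set itself is not modeled.

```python
import math

def cordic_rotate(x, y, angle, iters=24):
    # Classic CORDIC: decompose 'angle' into microrotations by
    # +/- atan(2^-i). CORDIC II replaces this angle set with one
    # that converges faster, reducing the number of adders.
    z = angle
    for i in range(iters):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atan(2.0**-i)
    # compensate the accumulated microrotation gain
    gain = math.prod(math.sqrt(1 + 4.0**-i) for i in range(iters))
    return x / gain, y / gain
```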
The complexity of narrow transition band finite-length impulse response (FIR) filters is high, but can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of a periodic model filter and, possibly periodic, masking filters. Time-multiplexing is in general beneficial, since the maximum clock frequency obtainable in a technology and the sample rate required by the application only rarely correspond. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.
We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered, and the trade-off between the number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.
@article{diva2:896484,
author = {Alam, Syed Asad and Gustafsson, Oscar},
title = {{On the implementation of time-multiplexed frequency-response masking filters}},
journal = {IEEE Transactions on Signal Processing},
year = {2016},
volume = {64},
number = {15},
pages = {3933--3944},
}
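The sparsity of the periodic model filter is easy to demonstrate: expanding a prototype filter by inserting L-1 zeros between its taps compresses and periodizes its frequency response, leaving only one in every L taps nonzero. The prototype coefficients below are arbitrary example values, not from the paper.

```python
import cmath

def freq_resp(h, w):
    # Frequency response of an FIR filter at angular frequency w.
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

def expand(h, L):
    # Periodic model filter of FRM: insert L-1 zeros between taps.
    # The result satisfies H_L(w) = H(L*w), and L-1 out of every L
    # taps are zero -- the sparsity the architectures exploit.
    hL = []
    for c in h[:-1]:
        hL.extend([c] + [0.0] * (L - 1))
    return hL + [h[-1]]
```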
A commonly used signal for engine misfire detection is the crankshaft angular velocity measured at the flywheel. However, flywheel manufacturing errors result in vehicle-to-vehicle variations in the measurements and have a negative impact on the misfire detection performance, where the negative impact is quantified for a number of vehicles. A misfire detection algorithm is proposed with flywheel error adaptation in order to increase robustness and reduce the number of mis-classifications. Since the available computational power is limited in a vehicle, a filter with low computational load, a Constant Gain Extended Kalman Filter, is proposed to estimate the flywheel errors. Evaluations using measurements from vehicles on the road show that the number of mis-classifications is significantly reduced when taking the estimated flywheel errors into consideration.
@article{diva2:806675,
author = {Jung, Daniel and Frisk, Erik and Krysander, Mattias},
title = {{A flywheel error compensation algorithm for engine misfire detection}},
journal = {Control Engineering Practice},
year = {2016},
volume = {47},
pages = {37--47},
}
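The low-cost estimation idea can be illustrated with a scalar constant-gain filter: the Kalman gain is frozen at a fixed value, so each update is a single multiply-add. This is a simplified scalar stand-in with made-up parameter values, not the paper's Constant Gain Extended Kalman Filter for flywheel errors.

```python
def constant_gain_filter(measurements, gain=0.1, est0=0.0):
    # Constant-gain observer: est <- est + K*(z - est) with a fixed K,
    # trading estimation optimality for a very low computational load.
    est = est0
    for z in measurements:
        est += gain * (z - est)
    return est
```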
Trellis codes, including Low-Density Parity-Check (LDPC), turbo, and convolutional codes (CC), are widely adopted in advanced wireless standards to offer high-throughput forward error correction (FEC). Designing a multistandard FEC decoder is highly challenging. In this paper, a trellis application-specific instruction-set processor (TASIP) is presented for multistandard trellis decoding. A unified forward-backward recursion kernel with an eight-state parallel trellis structure is proposed. Based on the kernel, a datapath for multiple algorithms and a shared memory subsystem are introduced. Flexibility and compatibility are ensured by a programmable decoding flow and the trellis decoding instruction set. Synthesis results show an area consumption of 2.12 mm² (65 nm). TASIP provides trimode FEC decoding with throughputs of 533, 186, and 225 Mb/s for LDPC, turbo, and 64-state CC at a clock frequency of 200 MHz, which outperforms other trimode proposals in both area efficiency and recursion efficiency. TASIP provides high-throughput decoding for current standards, including 3GPP Long Term Evolution, 802.16e, and 802.11n, with a unified architecture and high compatibility.
@article{diva2:886289,
author = {Wu, Zhenzhi and Liu, Dake},
title = {{High-Throughput Trellis Processor for Multistandard FEC Decoding}},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
year = {2015},
volume = {23},
number = {12},
pages = {2757--2767},
}
@article{diva2:820495,
author = {Liu, Dake and Wang, Zhihua and Luo, Li},
title = {{Editorial Material: SPECIAL ISSUE ON COMMUNICATION IC in CHINA COMMUNICATIONS, vol 12, issue 5, pp III-VI}},
journal = {China Communications},
year = {2015},
volume = {12},
number = {5},
pages = {III--VI},
}
A model-based misfire detection algorithm is proposed. The algorithm is able to detect misfires and identify the failing cylinder under different conditions, such as cylinder-to-cylinder variations, cold starts, and different engine behavior in different operating points. Also, a method is proposed for automatic tuning of the algorithm based on training data. The misfire detection algorithm is evaluated using data from several vehicles on the road, and the results show that a low misclassification rate is achieved even during difficult conditions.
@article{diva2:786796,
author = {Jung, Daniel and Eriksson, Lars and Frisk, Erik and Krysander, Mattias},
title = {{Development of misfire detection algorithm using quantitative FDI performance analysis}},
journal = {Control Engineering Practice},
year = {2015},
volume = {34},
pages = {49--60},
}
This paper presents the design of a 10-bit, 50 MS/s successive approximation register (SAR) analog-to-digital converter (ADC) with an on-chip reference voltage buffer, implemented in a 65 nm CMOS process. The speed limitation of SAR ADCs with an off-chip reference voltage and the necessity of a fast-settling reference voltage buffer are elaborated. Design details of a high-speed reference voltage buffer which ensures precise settling of the DAC output voltage in the presence of bondwire inductances are provided. The ADC uses bootstrapped switches for input sampling, a double-tail high-speed dynamic comparator, and split binary-weighted capacitive array charge redistribution DACs. The split binary-weighted array DAC topology helps achieve low area and a smaller capacitive load, and thus enhances power efficiency. Top-plate sampling is utilized in the DAC to reduce the number of switches. In post-layout simulation, which includes the entire pad frame and associated parasitics, the ADC achieves an ENOB of 9.25 bits at a supply voltage of 1.2 V, typical process corner, and a sampling frequency of 50 MS/s for a near-Nyquist input. Excluding the reference voltage buffer, the ADC consumes 697 μW and achieves an energy efficiency of 25 fJ/conversion-step while occupying a core area of 0.055 mm².
@article{diva2:762360,
author = {Harikumar, Prakash and Wikner, Jacob},
title = {{A 10-bit 50 MS/s SAR ADC in 65 nm CMOS with On-Chip Reference Voltage Buffer}},
journal = {Integration},
year = {2015},
volume = {50},
pages = {28--38},
}
We present the design of an integrated multiplexer and a dc clamp for the input analog interface of a high-speed video digitizer in a 1.1-V, 65-nm complementary metal-oxide-semiconductor process. The ac-coupled video signal is dc-restored using a novel all-digital current-mode charge pump. An eight-input multiplexer is realized with T-switches, each containing two series-connected bootstrapped switches. A T-switch's grounding branch is merged with the pull-down end of the clamping charge pump. An adaptive digital feedback loop encompassing a video analog-to-digital converter (ADC) controls the clamp charge pump. The bootstrapped switches have been adapted to suit the video environment, allowing on-the-fly recharging. The varying ON-resistance of the conventional bootstrapped switch is utilized to linearize the multiplexer response by canceling the effect of the nonlinear load capacitance contributed by the clamp transistors. Under worst-case conditions, the multiplexer maintains a 62-85 dB spurious-free dynamic range over a range of known input video frequencies, and it reduces the second-order harmonic component upon optimization. The dc clamp provides 12-bit precision over the full range of the video ADC and can set the dc at the target level within at most 194 video lines.
@article{diva2:778921,
author = {Angelov, Pavel and Ahmed Aamir, Syed and Wikner, Jacob},
title = {{A 1.1-V Analog Multiplexer With an Adaptive Digital Clamp for CMOS Video Digitizers}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2014},
volume = {61},
number = {11},
pages = {860--864},
}
Synthesizable all-digital ADCs that can be designed, verified, and taped out using a digital design flow are of interest due to the consequent reduction in design cost and improved technology portability. As a step towards high-performance synthesizable ADCs built using generic and low-accuracy components, an ADC designed exclusively with standard digital cell library components is presented. The proposed design is a time-mode circuit employing a VCO-based multi-bit quantizer. The ADC has first-order noise shaping due to the inherent error feedback of the oscillator and sinc anti-aliasing filtering due to continuous-time sampling. The proposed architecture employs a Gray-counter-based quantizer design, which mitigates the problem of partial sampling of digital data in multi-bit VCO-based quantizers. Furthermore, digital correction employing polynomial-fit estimation is proposed to correct for VCO nonlinearity. The design occupies 0.026 mm² when fabricated in a 65 nm CMOS process and delivers an ENOB of 8.1 bits over a signal bandwidth of 25.6 MHz while sampling at 205 MHz. The performance is comparable to that of recently reported custom-designed single-ended open-loop VCO-based ADCs, while being designed exclusively with standard cells and consuming a relatively low average power of 3.3 mW, achieving an FoM of 235 fJ/step.
@article{diva2:778352,
author = {Unnikrishnan, Vishnu and Vesterbacka, Mark},
title = {{Time-Mode Analog-to-Digital Conversion Using Standard Cells}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2014},
volume = {61},
number = {12},
pages = {3348--3357},
}
This paper presents a bit reversal circuit for continuous-flow parallel pipelined FFT processors. In addition to two flexible commutators, the circuit consists of two memory groups, where each group has P memory banks. To achieve both low delay and low area complexity, a novel write/read scheduling mechanism is devised, so that FFT outputs can be stored in the memory banks in an optimized way. The proposed scheduling mechanism writes the successively generated FFT output samples to memory locations immediately after they are released by the previous symbol. Therefore, a total memory space of only N data samples is enough for continuous-flow FFT operation. Since read operations never overlap with write operations, only single-port memory is required, which leads to a great area reduction. The proposed bit-reversal circuit architecture can generate natural-order FFT output and supports variable power-of-2 FFT lengths.
@article{diva2:763891,
author = {Chen, Sau-Gee and Huang, Shen-Jui and Garrido Gálvez, Mario and Jou, Shyh-Jye},
title = {{Continuous-flow Parallel Bit-Reversal Circuit for MDF and MDC FFT Architectures}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2014},
volume = {61},
number = {10},
pages = {2869--2877},
}
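The permutation such circuits realize is bit reversal of the sample index. A plain software model of the permutation only (not of the memory-bank write/read scheduling) is:

```python
def bit_reverse_order(x):
    # Reorder a power-of-two-length sequence so that element i moves
    # to the position given by reversing the bits of its index i.
    n = len(x)
    bits = n.bit_length() - 1
    out = [None] * n
    for i in range(n):
        r = int(format(i, f'0{bits}b')[::-1], 2)
        out[r] = x[i]
    return out
```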
In this brief, we propose how the hardware complexity of arbitrary-order digital multibit error-feedback delta-sigma modulators can be reduced. This is achieved by splitting the combinatorial circuitry of the modulators into two parts: one producing the modulator output and another producing the error signal that is fed back. The part producing the modulator output is removed by utilizing a unit-element-based digital-to-analog converter. To illustrate the reduced complexity and power consumption, we compare the synthesized results with those of conventional structures. Fourth-order modulators implemented with the proposed technique use up to 26% less area compared with conventional implementations. Due to the area reduction, the designs consume up to 33% less dynamic power. Furthermore, they can operate at a frequency 100 MHz higher than that of the conventional designs.
@article{diva2:755591,
author = {Afzal, Nadeem and Wikner, Jacob and Gustafsson, Oscar},
title = {{Reducing Complexity and Power of Digital Multibit Error-Feedback Delta Sigma Modulators}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2014},
volume = {61},
number = {9},
pages = {641--645},
}
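The error-feedback principle can be sketched as a first-order software model: the quantization error of each sample is added back to the next input, shaping the noise by (1 - z^-1). The split into output- and error-producing parts and the unit-element DAC trick of the paper are not modeled; the mid-tread quantizer and names are illustrative.

```python
import math

def error_feedback_dsm(x, step=1.0):
    # First-order error-feedback delta-sigma modulator:
    #   u[n] = x[n] + e[n-1];  y[n] = Q(u[n]);  e[n] = u[n] - y[n]
    y, e = [], 0.0
    for s in x:
        u = s + e                                  # feed back previous error
        q = step * math.floor(u / step + 0.5)      # mid-tread quantizer
        y.append(q)
        e = u - q                                  # error for next sample
    return y
```

For a dc input, the average of the coarse output converges to the input value, while the quantization noise is pushed to high frequencies.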
This paper presents a new approach to the design of multiplierless constant rotators. The approach is based on a combined coefficient selection and shift-and-add implementation (CCSSI) for the design of the rotators. First, complete freedom is given to the selection of the coefficients, i.e., no constraints on the coefficients are set in advance and all alternatives are taken into account. Second, the shift-and-add implementation uses advanced single constant multiplication (SCM) and multiple constant multiplication (MCM) techniques that lead to low-complexity multiplierless implementations. Third, the design of the rotators is done by a joint optimization of the coefficient selection and the shift-and-add implementation. As a result, the CCSSI provides an extended design space that offers a larger number of alternatives with respect to previous works. Furthermore, the design space is explored in a simple and efficient way. The proposed approach has wide applications in numerous hardware scenarios, including rotations by single or multiple angles, rotators in single or multiple branches, and different scalings of the outputs. Experimental results for various scenarios are provided. In all of them, the proposed approach achieves significant improvements with respect to the state of the art.
@article{diva2:738039,
author = {Garrido Gálvez, Mario and Qureshi, Fahad and Gustafsson, Oscar},
title = {{Low-Complexity Multiplierless Constant Rotators Based on Combined Coefficient Selection and Shift-and-Add Implementation (CCSSI)}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2014},
volume = {61},
number = {7},
pages = {2002--2012},
}
This paper introduces add-equalize structures for the implementation of linear-phase Nyquist (Lth-band) finite-length impulse response (FIR) filter interpolators and decimators. The paper also introduces a systematic design technique for these structures based on iteratively reweighted l1-norm minimization. In the proposed structures, the polyphase components share common parts, which leads to a considerably lower implementation complexity as compared to conventional single-stage converter structures. The complexity is comparable to that of multi-stage Nyquist structures. A main advantage of the proposed structures is that they work equally well for all integer conversion factors, thus including prime numbers, which cannot be handled by the regular multi-stage Nyquist converters. Moreover, the paper shows how to utilize the frequency-response masking approach to further reduce the complexity for sharp-transition specifications. It also shows how the proposed structures can be used to reduce the complexity of reconfigurable sampling rate converters. Several design examples are included to demonstrate the effectiveness of the proposed structures.
@article{diva2:732913,
author = {Johansson, Håkan and Eghbali, Amir},
title = {{Add-Equalize Structures for Linear-Phase Nyquist FIR Filter Interpolators and Decimators}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2014},
volume = {61},
number = {6},
pages = {1766--1777},
}
This paper introduces two polynomial finite-length impulse response (FIR) digital filter structures with simultaneously variable fractional delay (VFD) and phase shift (VPS). The structures are reconfigurable (adaptable) online without redesign and do not exhibit transients when the VFD and VPS parameters are altered. The structures can be viewed as generalizations of VFD structures in the sense that they offer a VPS in addition to the regular VFD. The overall filters are composed of a number of fixed subfilters and a few variable multipliers whose values are determined by the desired FD and PS values. A systematic design algorithm, based on iteratively reweighted l1-norm minimization, is proposed. It generates fixed subfilters with many zero-valued coefficients, typically located in the impulse response tails. The paper considers two different structures, referred to as the basic structure and the common-subfilters structure, and compares these proposals, as well as the existing cascaded VFD and VPS structures, in terms of arithmetic complexity, delay, memory cost, and transients. In general, the common-subfilters structure is superior when all of these aspects are taken into account. Further, the paper shows and exemplifies that the VFD-PS filters under consideration can be used for simultaneous resampling and frequency shift of signals.
@article{diva2:722040,
author = {Johansson, Håkan and Eghbali, Amir},
title = {{Two Polynomial FIR Filter Structures With Variable Fractional Delay and Phase Shift}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2014},
volume = {61},
number = {5},
pages = {1355--1365},
}
This paper describes the front-end of a fully integrated analog interface for 300 MS/s, high-definition video digitizers in a system-on-chip environment. The analog interface is implemented in a 1.2 V, 65-nm digital CMOS process, and the design minimizes the number of power domains using core transistors only. Each analog video receiver channel contains an integrated multiplexer with a current-mode dc clamp, a programmable gain amplifier (PGA), and a pseudo second-order RC low-pass filter. The digital charge-pump clamp is integrated with low-voltage bootstrapped tee-switches inside the multiplexer, while restoring the dc component of ac-coupled inputs. The PGA contains a four-stage, fully symmetric pseudo-differential amplifier with common-mode feedforward and inherent common-mode feedback, utilized in a closed-loop capacitive feedback configuration. The amplifier features offset cancellation during the horizontal blanking. The video interface is evaluated using a unique test signal over a range of video formats for INL+/DNL+ and INL-/DNL-. The 0.07-0.39 mV INL, 2-70 μV DNL, and 66-74 dB SFDR enable us to target various formats for 9-12 bit low-voltage digitizers.
@article{diva2:714042,
author = {Ahmed Aamir, Syed and Angelov, Pavel and Wikner, Jacob},
title = {{1.2-V Analog Interface for a 300-MSps HD Video Digitizer in Core 65-nm CMOS}},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
year = {2014},
volume = {22},
number = {4},
pages = {888--898},
}
This paper proposes optimal finite-length impulse response (FIR) digital filters, in the least-squares (LS) sense, for compensation of chromatic dispersion (CD) in digital coherent optical receivers. The proposed filters are based on the convex minimization of the energy of the complex error between the frequency responses of the actual CD compensation filter and the ideal CD compensation filter. The paper utilizes the fact that pulse shaping filters limit the effective bandwidth of the signal. Then, the filter design for CD compensation needs to be performed over a smaller frequency range, as compared to the whole frequency band in the existing CD compensation methods. By means of design examples, we show that our proposed optimal LS FIR CD compensation filters outperform the existing filters in terms of performance, implementation complexity, and delay.
@article{diva2:712955,
author = {Eghbali, Amir and Johansson, Håkan and Gustafsson, Oscar and Savory, Seb J.},
title = {{Optimal Least-Squares FIR Digital Filters for Compensation of Chromatic Dispersion in Digital Coherent Optical Receivers}},
journal = {Journal of Lightwave Technology},
year = {2014},
volume = {32},
number = {8},
pages = {1449--1456},
}
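The band-limited least-squares idea above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's design procedure: the ideal CD compensator is modeled as the all-pass response exp(j*K*w^2), and the dispersion parameter K, filter length, and effective bandwidth wc are illustrative values. Restricting the design grid to |w| <= wc (the band the pulse shaping actually occupies) can only reduce the LS residual there compared with a full-band fit.

```python
import numpy as np

# Hedged sketch: LS FIR fit to an ideal all-pass CD compensator
# D(w) = exp(j*K*w^2) over the effective signal band only.
# K, N, and wc below are illustrative assumptions.
K = 2.0           # accumulated-dispersion parameter (assumed)
N = 31            # filter length
wc = 0.8 * np.pi  # effective bandwidth left by the pulse shaping

w = np.linspace(-wc, wc, 512)              # band-limited design grid
n = np.arange(N) - (N - 1) / 2             # centered tap indices
A = np.exp(-1j * np.outer(w, n))           # frequency-response matrix
d = np.exp(1j * K * w**2)                  # ideal CD compensator
h, *_ = np.linalg.lstsq(A, d, rcond=None)  # complex LS taps

# Compare against a conventional full-band LS design on the same band.
wf = np.linspace(-np.pi, np.pi, 512)
Af = np.exp(-1j * np.outer(wf, n))
hf, *_ = np.linalg.lstsq(Af, np.exp(1j * K * wf**2), rcond=None)
band_err = np.linalg.norm(A @ h - d)   # residual of band-limited design
full_err = np.linalg.norm(A @ hf - d)  # full-band design, same band
```

By construction the band-limited fit minimizes the residual on that grid, so `band_err` never exceeds `full_err`, which is the complexity/performance lever the paper exploits.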
The logarithmic number system (LNS) is an attractive alternative for realizing finite-length impulse response filters because multiplication in the linear domain becomes only addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by their logarithmic equivalents. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. A branch and bound algorithm is implemented based on LNS integers and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as with rounding. Results show that the proposed method naturally provides a smaller approximation error compared to rounding. Furthermore, the results provide insights into the finite word length properties of FIR filter coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error compared to standard FIR filters.
@article{diva2:711604,
author = {Alam, Syed Asad and Gustafsson, Oscar},
title = {{Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain}},
journal = {VLSI design (Print)},
year = {2014},
volume = {2014},
number = {217495},
}
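The core LNS property the paper builds on can be shown in a few lines: a multiplication becomes an integer addition of fixed-point log2 magnitudes. This is a generic illustration of the representation, not the paper's branch-and-bound optimization; the number of fractional bits F is an assumed parameter.

```python
import math

# Hedged sketch of the LNS representation: |x| is stored as the
# integer round(log2|x| * 2^F), so multiplying two values is just
# adding their LNS integers. F = 8 fractional bits is illustrative.
F = 8

def to_lns(x):
    """Quantize |x| to an LNS integer (sign handling omitted)."""
    return round(math.log2(abs(x)) * (1 << F))

def lns_mul(ex, ey):
    """LNS multiplication = integer addition of the exponents."""
    return ex + ey

def from_lns(e):
    """Convert an LNS integer back to the linear domain."""
    return 2.0 ** (e / (1 << F))

a, b = 0.8125, 0.4375  # two positive example coefficients
prod = from_lns(lns_mul(to_lns(a), to_lns(b)))
rel_err = abs(prod - a * b) / (a * b)  # quantization-induced error
```

The residual `rel_err` is the rounding error the paper's direct LNS-domain optimization targets, instead of simply rounding linear-domain coefficients.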
The paper proposes an optimization technique for the design of variable digital filters with simultaneously tunable bandedge and fractional delay using a fast filter bank (FFB) approach. In the FFB approach, full-band signals are split into multiple bands, and each band is multiplied by a proper phase shift to realize the variable fractional delay. In the proposed technique, in the formulation of the optimization of the 0th stage prototype filter of the FFB, the ripples of the filters in the subsequent stages are all taken into consideration. In addition, a shaping filter is applied to the last retained band of the FFB to form the transition band of the variable filter, such that the transition width of each band in the FFB can be relaxed to reduce the computational complexity. In total, three shaping filters, constructed from a prototype filter, can be shared by different bands, so that the extra cost incurred due to the shaping filters is low.
@article{diva2:710330,
author = {Jing Xu, Wei and Jun Yu, Ya and Johansson, Håkan},
title = {{Improved Filter Bank Approach for the Design of Variable Bandedge and Fractional Delay Filters}},
journal = {IEEE Transactions on Circuits and Systems Part 1},
year = {2014},
volume = {61},
number = {3},
pages = {764--777},
}
This paper proposes a method for designing high-order linear-phase finite-length impulse response (FIR) filters which are required as, e.g., the prototype filters in filter banks (FBs) and transmultiplexers (TMUXs) with a large number of channels. The proposed method uses the Farrow structure to express the polyphase components of the desired filter. Thereby, the only unknown parameters, in the filter design, are the coefficients of the Farrow subfilters. The number of these unknown parameters is considerably smaller than that of the direct filter design methods. Besides these unknown parameters, the proposed method needs some predefined multipliers. Although the number of these multipliers is larger than the number of unknown parameters, they are known a priori. The proposed method is generally applicable to any linear-phase FIR filter irrespective of its order being high, low, even, or odd as well as the impulse response being symmetric or antisymmetric. However, it is more efficient for filters with high orders as the conventional design of such filters is more challenging. For example, to design a linear-phase FIR lowpass filter of order 131071 with a stopband attenuation of about 55 dB, which is used as the prototype filter of a cosine modulated filter bank (CMFB) with 8192 channels, our proposed method requires only 16 unknown parameters. The paper gives design examples for individual lowpass filters as well as the prototype filters for fixed and flexible modulated FBs.
@article{diva2:708863,
author = {Eghbali, Amir and Johansson, Håkan},
title = {{On Efficient Design of High-Order Filters With Applications to Filter Banks and Transmultiplexers With Large Number of Channels}},
journal = {IEEE Transactions on Signal Processing},
year = {2014},
volume = {62},
number = {5},
pages = {1198--1209},
}
This paper presents formulas for the number of optimization parameters (degrees of freedom) when designing Type I linear-phase finite-length impulse response (FIR) Lth-band filters of order 2N as cascades of identical linear-phase FIR spectral factors of order N. We deal with two types of degrees of freedom referred to as (i) the total degrees of freedom, D_T, and (ii) the remaining degrees of freedom, D_R. Due to the symmetries or antisymmetries in the impulse responses of the spectral factors, D_T roughly equals N/2. Some of these parameters are specifically needed to meet the Lth-band conditions because, in an Lth-band filter, every Lth coefficient is zero and the center tap equals 1/L. The remaining D_R parameters can then be used to improve the stopband characteristics of the overall Lth-band filter. We derive general formulas for D_R with given pairs of L and N. It is shown that for a fixed L, the choices of N, in a close neighborhood, may even decrease D_R despite increasing the arithmetic complexity, order, and the delay.
@article{diva2:706693,
author = {Eghbali, Amir and Saramaki, Tapio and Johansson, Håkan},
title = {{Conditions for Lth-band filters of order 2N as cascades of identical linear-phase FIR spectral factors of order N}},
journal = {Signal Processing},
year = {2014},
volume = {97},
number = {April},
}
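The Lth-band condition mentioned above is easy to verify numerically. The sketch below builds a standard Hamming-windowed-sinc Lth-band filter (an illustrative construction, not the paper's cascade of spectral factors; L and N are assumed values) and checks that every Lth tap is zero and the center tap equals 1/L.

```python
import numpy as np

# Hedged illustration of the Lth-band property: in an Lth-band filter
# of order 2N, every Lth impulse-response coefficient is zero and the
# center tap equals 1/L. Windowed sinc is a textbook construction.
L, N = 4, 16                       # band count and half-order (assumed)
n = np.arange(-N, N + 1)           # symmetric tap indices
h = np.sinc(n / L) / L * np.hamming(2 * N + 1)

center = h[N]                      # should equal 1/L
zeros = h[N + L::L]                # every Lth tap right of center
```

The sinc zeros at integer multiples of L survive any symmetric windowing, which is why these taps cost no degrees of freedom: they are fixed by the Lth-band (Nyquist) condition itself.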
This paper introduces a class of reconfigurable two-stage Nyquist filters where the Farrow structure realizes the polyphase components of linear-phase finite-length impulse response (FIR) filters. By adjusting the variable predetermined multipliers of the Farrow structure, various linear-phase FIR Nyquist filters and integer interpolation/decimation structures are obtained online. However, the filter design problem is solved only once, offline. Design examples, based on the reweighted l1-norm minimization, illustrate the proposed method. Savings in the arithmetic complexity are obtained when compared to the reconfigurable single-stage structures.
@article{diva2:698189,
author = {Eghbali, Amir and Johansson, Håkan},
title = {{A class of reconfigurable and low-complexity two-stage Nyquist filters}},
journal = {Signal Processing},
year = {2014},
volume = {96},
pages = {164--172},
}
Information about wheel loader usage can be used in several ways to optimize customer adaptation. First, optimizing the configuration and component sizing of a wheel loader to customer needs can lead to a significant improvement in e.g. fuel efficiency and cost. Second, relevant driving cycles to be used in the development of wheel loaders can be extracted from usage data. Third, on-line usage identification opens up for the possibility of implementing advanced look-ahead control strategies for wheel loader operation. The main objective here is to develop an on-line algorithm that automatically, using production sensors only, can extract information about the usage of a machine. Two main challenges are that the sensors are not located with respect to this task and that significant usage disturbances typically occur during operation. The proposed method is based on a combination of several individually simple techniques using signal processing, state automaton techniques, and parameter estimation algorithms. The approach is found to be robust when evaluated on measured data of wheel loaders loading gravel and shot rock.
@article{diva2:620352,
author = {Nilsson, Tomas and Nyberg, Peter and Sundström, Christofer and Frisk, Erik and Krysander, Mattias},
title = {{Robust Driving Pattern Detection and Identification with a Wheel Loader Application}},
journal = {International journal of vehicle systems modelling and testing},
year = {2014},
volume = {9},
number = {1},
pages = {56--76},
}
@article{diva2:790461,
author = {Ashrafi, Ashkan and Strollo, Antonio G. M. and Gustafsson, Oscar},
title = {{Hardware implementation of digital signal processing algorithms}},
journal = {Journal of Electrical and Computer Engineering},
year = {2013},
volume = {2013},
number = {782575},
pages = {1--2},
}
A pipelined circuit to calculate linear regression is presented. The proposed circuit has the advantages that it can process a continuous flow of data, it does not need memory to store the input samples, and it supports a variable length that can be reconfigured at run time. The circuit is efficient in area, as it consists of a small number of adders, multipliers and dividers. These features make it very suitable for real-time applications, as well as for calculating the linear regression of a large number of samples.
@article{diva2:692306,
author = {Garrido Gálvez, Mario and Grajal, J.},
title = {{Continuous-flow variable-length memoryless linear regression architecture}},
journal = {Electronics Letters},
year = {2013},
volume = {49},
number = {24},
pages = {1567--1568},
}
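The memoryless property claimed above follows from the closed-form least-squares line fit: only four running accumulators are needed, so no sample memory is required and the window length can change at run time. The sketch below is a software model of that idea, not the hardware architecture itself.

```python
# Hedged software model of continuous-flow, memoryless linear
# regression: slope and intercept depend only on the running sums
# n, sum(x), sum(y), sum(x*y), sum(x*x), where x is the sample index.
def streaming_linreg(samples):
    Sx = Sy = Sxy = Sxx = 0.0
    n = 0
    for y in samples:                 # one pass, nothing stored
        x = n                         # abscissa = sample index
        Sx += x; Sy += y; Sxy += x * y; Sxx += x * x
        n += 1
    # Closed-form least-squares solution from the accumulators only.
    slope = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
    intercept = (Sy - slope * Sx) / n
    return slope, intercept

# An exact line y = 3x + 1 is recovered without keeping any samples.
m, c = streaming_linreg([3 * i + 1 for i in range(10)])
```

In hardware the same accumulators map to the small set of adders and multipliers the abstract mentions, with the two divisions performed once per output.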
In time-interleaved analog-to-digital converters (TI-ADCs), the timing mismatches between the channels result in a periodically nonuniformly sampled sequence at the output. Such nonuniformly sampled output limits the achievable resolution of the TI-ADC. In order to correct the errors due to timing mismatches, the output of the TI-ADC is passed through a digital time-varying finite-length impulse response reconstructor. Such reconstructors convert the nonuniformly sampled output sequence to a uniformly spaced output. Since the reconstructor runs at the output rate of the TI-ADC, it is beneficial to reduce the number of coefficient multipliers in the reconstructor. Also, it is advantageous to have as few coefficient updates as possible when the timing errors change. Reconstructors that reduce the number of multipliers to be updated online do so at a cost of increased number of multiplications per corrected output sample. This paper proposes a technique which can be used to reduce the number of reconstructor coefficients that need to be updated online without increasing the number of multiplications per corrected output sample.
@article{diva2:664481,
author = {Pillai, Anu Kalidas Muralidharan and Johansson, Håkan},
title = {{Efficient signal reconstruction scheme for \emph{ M}-channel time-interleaved ADCs}},
journal = {Analog Integrated Circuits and Signal Processing},
year = {2013},
volume = {77},
number = {2},
pages = {113--122},
}
We propose a non-binary stochastic decoding algorithm for low-density parity-check (LDPC) codes over GF(q) with degree two variable nodes, called Adaptive Multiset Stochastic Algorithm (AMSA). The algorithm uses multisets, an extension of sets that allows multiple occurrences of an element, to represent probability mass functions that simplifies the structure of the variable nodes. The run-time complexity of one decoding cycle using AMSA is O(q) for conventional memory architectures, and O(1) if a custom memory architecture is used. Two fully-parallel AMSA decoders are implemented on FPGA for two (192,96) (2,4)-regular codes over GF(64) and GF(256), both achieving a maximum clock frequency of 108 MHz. The GF(64) decoder has a coded throughput of 65 Mb/s at E_b/N_0 = 2.4 dB when using conventional memory, while a decoder using the custom memory version can achieve 698 Mb/s at the same E_b/N_0. At a frame error rate (FER) of 2x10^-6 the GF(64) version of the algorithm is only 0.04 dB away from the floating-point SPA performance, and for the GF(256) code the difference is 0.2 dB. To the best of our knowledge, this is the first fully parallel non-binary LDPC decoder over GF(256) reported in the literature.
@article{diva2:642979,
author = {Ciobanu, Alexandru and Hemati, Saied and Gross, Warren J.},
title = {{Adaptive Multiset Stochastic Decoding of Non-Binary LDPC Codes}},
journal = {IEEE Transactions on Signal Processing},
year = {2013},
volume = {61},
number = {16},
pages = {4100--4113},
}
This paper proposes a method to design variable fractional-delay (FD) filters using the Farrow structure. In the transfer function of the Farrow structure, different subfilters are weighted by different powers of the FD value. As both the FD value and its powers are smaller than 0.5, our proposed method uses them as diminishing weighting functions. The approximation error, for each subfilter, is then increased in proportion to the power of the FD value. This gives a new distribution for the orders of the Farrow subfilters which has not been utilized before. This paper also includes these diminishing weighting functions in the filter design so as to obtain their optimal values, iteratively. We consider subfilters of both even and odd orders. Examples illustrate our proposed method and comparisons, to various earlier designs, show a reduction of the arithmetic complexity.
@article{diva2:640719,
author = {Eghbali, Amir and Johansson, Håkan and Saramaki, Tapio},
title = {{A method for the design of Farrow-structure based variable fractional-delay FIR filters}},
journal = {Signal Processing},
year = {2013},
volume = {93},
number = {5},
pages = {1341--1348},
}
In this paper, the fixed-point implementation of adjustable fractional-delay filters using the Farrow structure is considered. Based on the observation that the sub-filters approximate differentiators, closed-form expressions for the L2-norm scaling values at the outputs of each sub-filter as well as at the inputs of each delay multiplier are derived. The scaling values can then be used to derive suitable word lengths by also considering the round-off noise analysis and optimization. Different approaches are proposed to derive suitable word lengths, including one based on integer linear programming, which always gives an optimal allocation. Finally, a new approach for multiplierless implementation of the sub-filters in the Farrow structure is suggested. This is shown to reduce register complexity and, for most word lengths, require fewer adders and subtracters when compared to existing approaches.
@article{diva2:640713,
author = {Abbas, Muhammad and Gustafsson, Oscar and Johansson, Håkan},
title = {{On the Fixed-Point Implementation of Fractional-Delay Filters Based on the Farrow Structure}},
journal = {IEEE Transactions on Circuits and Systems Part 1},
year = {2013},
volume = {60},
number = {4},
pages = {926--937},
}
Despite the outstanding performance of non-binary low-density parity-check (LDPC) codes over many communication channels, they are not in widespread use yet. This is due to the high implementation complexity of their decoding algorithms, even those that compromise performance for the sake of simplicity. In this paper, we present three algorithms based on stochastic computation to reduce the decoding complexity. The first is a purely stochastic algorithm with error-correcting performance matching that of the sum-product algorithm (SPA) for LDPC codes over Galois fields with low order and a small variable node degree. We also present a modified version which reduces the number of decoding iterations required while remaining purely stochastic and having a low per-iteration complexity. The second algorithm, relaxed half-stochastic (RHS) decoding, combines elements of the SPA and the stochastic decoder and uses successive relaxation to match the error-correcting performance of the SPA. Furthermore, it uses fewer iterations than the purely stochastic algorithm and does not have limitations on the field order and variable node degree of the codes it can decode. The third algorithm, NoX, is a fully stochastic specialization of RHS for codes with a variable node degree of 2 that offers similar performance, but at a significantly lower computational complexity. We study the performance and complexity of the algorithms, noting that all have lower per-iteration complexity than the SPA, that RHS can have comparable average per-codeword computational complexity, and that NoX has a lower one.
@article{diva2:627330,
author = {Sarkis, Gabi and Hemati, Saied and Mannor, Shie and Gross, Warren J.},
title = {{Stochastic Decoding of LDPC Codes over GF(q)}},
journal = {IEEE Transactions on Communications},
year = {2013},
volume = {61},
number = {3},
pages = {939--950},
}
This paper considers two-rate based structures for variable fractional-delay (VFD) finite-length impulse response (FIR) filters. They are single-rate structures but derived through a two-rate approach. The basic structure considered hitherto utilizes a regular half-band (HB) linear-phase filter and the Farrow structure with linear-phase subfilters. Especially for wide-band specifications, this structure is computationally efficient because most of the overall arithmetic complexity is due to the HB filter which is common to all Farrow-structure subfilters. This paper extends and generalizes existing results. Firstly, frequency-response masking (FRM) HB filters are utilized which offer further complexity reductions. Secondly, both linear-phase and low-delay subfilters are treated and combined which offers trade-offs between the complexity, delay, and magnitude response overshoot which is typical for low-delay filters. Thirdly, the HB filter is replaced by a general filter which enables additional frequency-response constraints in the upper frequency band which normally is treated as a don't-care band. Wide-band design examples (90, 95, and 98% of the Nyquist band) reveal arithmetic complexity savings between some 20 and 85% compared with other structures, including infinite-length impulse response structures. Hence, the VFD filter structures proposed in this paper exhibit the lowest arithmetic complexity among all hitherto published VFD filter structures.
@article{diva2:604073,
author = {Johansson, Håkan and Hermanowicz, Ewa},
title = {{Two-Rate Based Low-Complexity Variable Fractional-Delay FIR Filter Structures}},
journal = {IEEE Transactions on Circuits and Systems Part 1},
year = {2013},
volume = {60},
number = {1},
pages = {136--149},
}
The appearance of radix-2^2 was a milestone in the design of pipelined FFT hardware architectures. Later, radix-2^2 was extended to radix-2^k. However, radix-2^k was only proposed for single-path delay feedback (SDF) architectures, but not for feedforward ones, also called multi-path delay commutator (MDC). This paper presents the radix-2^k feedforward (MDC) FFT architectures. In feedforward architectures radix-2^k can be used for any number of parallel samples which is a power of two. Furthermore, both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be used. In addition to this, the designs can achieve very high throughputs, which makes them suitable for the most demanding applications. Indeed, the proposed radix-2^k feedforward architectures require fewer hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when several samples in parallel must be processed. As a result, the proposed radix-2^k feedforward architectures not only offer an attractive solution for current applications, but also open up a new research line on feedforward structures.
@article{diva2:602884,
author = {Garrido Gálvez, Mario and Grajal, J. and Sanchez, M. A. and Gustafsson, Oscar},
title = {{Pipelined Radix-2^k Feedforward FFT Architectures}},
journal = {IEEE Transactions on Very Large Scale Integration (vlsi) Systems},
year = {2013},
volume = {21},
number = {1},
pages = {23--32},
}
A unified hardware architecture that can be reconfigured to calculate 2, 3, 4, 5, or 7-point DFTs is presented. The architecture is based on the Winograd Fourier transform algorithm and the complexity is equal to a 7-point DFT in terms of adders/subtractors and multipliers plus only seven multiplexers introduced to enable reconfigurability. The processing element finds potential use in memory-based FFTs, where non-power-of-two sizes are required such as in DMB-T.
@article{diva2:492039,
author = {Qureshi, Fahad and Garrido, Mario and Gustafsson, Oscar},
title = {{Unified architecture for 2, 3, 4, 5, and 7-point DFTs based on Winograd Fourier transform algorithm}},
journal = {Electronics Letters},
year = {2013},
volume = {49},
number = {5},
pages = {348--U60},
}
This paper deals with time-varying finite-length impulse response (FIR) filters used for reconstruction of two-periodic nonuniformly sampled signals. The complexity of such reconstructors increases as their bandwidth approaches the whole Nyquist band. Reconstructor design that yields minimum reconstructor order requires expensive online redesign, while those methods that simplify online redesign result in higher reconstructor complexity. This paper utilizes a two-rate approach to derive a single-rate structure where part of the complexity of the reconstructor is moved to a symmetric filter so as to reduce the number of multipliers. The symmetric filter is designed such that it can be used for all time-skew errors within a certain range, thereby reducing the number of coefficients that need online redesign. The basic two-rate based reconstructor is further extended to completely remove the need for online redesign at the cost of a slight increase in the total number of multipliers.
@article{diva2:707576,
author = {Pillai, Anu Kalidas Muralidharan and Johansson, Håkan},
title = {{Two-rate based low-complexity time-varying discrete-time FIR reconstructors for two-periodic nonuniformly sampled signals}},
journal = {Sampling Theory in Signal and Image Processing},
year = {2012},
volume = {11},
number = {2-3},
pages = {195--220},
}
A Design-Build-Test (DBT) course in electronics is presented. The course is designed based on the CDIO (Conceive-Design-Implement-Operate) framework for engineering education. It is part of the curriculum of two engineering programs at Linköping University, Sweden, where it has been given successfully for a number of years. The cornerstones of the course consist of carefully designed learning outcomes based on the CDIO Syllabus, a structured project management model such that the project tasks are carried out according to professional and industry-like routines, with well-designed organisation of the staff supporting the course, and challenging project tasks.
@article{diva2:647743,
author = {Svensson, Tomas and Gunnarsson, Svante},
title = {{A Design-Build-Test course in electronics based on the CDIO framework for engineering education}},
journal = {International Journal of Electrical Engineering Education},
year = {2012},
volume = {49},
number = {4},
pages = {349--364},
}
This brief considers fractional-delay finite-length impulse response (FIR) filters and a class of supersymmetric Mth-band linear-phase FIR filters utilizing partially symmetric and partially antisymmetric impulse responses. Design examples reveal significant multiplication savings, depending on the specification, as compared to traditional filters.
@article{diva2:544971,
author = {Johansson, Håkan},
title = {{Fractional-Delay and Supersymmetric Mth-Band Linear-Phase FIR Filters Utilizing Partially Symmetric and Antisymmetric Impulse Responses}},
journal = {IEEE Transactions on Circuits and Systems - II - Express Briefs},
year = {2012},
volume = {59},
number = {6},
pages = {366--370},
}
This paper introduces multimode transmultiplexers (TMUXs) in which the Farrow structure realizes the polyphase components of general lowpass interpolation/decimation filters. As various lowpass filters are obtained by one set of common Farrow subfilters, only one offline filter design enables us to cover different integer sampling rate conversion (SRC) ratios. A model of general rational SRC is also constructed where the same fixed subfilters perform rational SRC. These two SRC schemes are then used to construct multimode TMUXs. Efficient implementation structures are introduced and different filter design techniques such as minimax and least-squares (LS) are discussed. By means of simulation results, it is shown that the performance of the transmultiplexer (TMUX) depends on the ripples of the filters. With the error vector magnitude (EVM) as the performance metric, the LS method has a superiority over the minimax approach.
@article{diva2:526368,
author = {Eghbali, Amir and Johansson, Håkan and Löwenborg, Per},
title = {{A Class of Multimode Transmultiplexers Based on the Farrow Structure}},
journal = {Circuits, systems, and signal processing},
year = {2012},
volume = {31},
number = {3},
pages = {961--985},
}
We present a radix-4 static CMOS full adder circuit that reduces the propagation delay, PDP, and EDP in carry-based adders compared with using a standard radix-2 full adder solution. The improvements are obtained by employing the carry look-ahead technique at the transistor level. SPICE simulations using 45 nm CMOS technology parameters with a power supply voltage of 1.1 V indicate that the radix-4 circuit is 24% faster than a 2-bit radix-2 ripple carry adder with slightly larger transistor count, whereas the power consumption is almost the same. A second scheme for radix-2 and radix-4 adders that have a reduced number of transistors in the carry path is also investigated. Simulation results also confirm that the radix-4 adder gives better performance as compared to a standard 2-bit CLA. 32-bit ripple carry, 2-stage carry select, variable size carry select, and carry skip adders are implemented with the different full adders as building blocks. There are PDP savings, with one exception, for the 32-bit adders in the range 8-18% and EDP savings in the range 21-53% using radix-4 as compared to radix-2.
@article{diva2:515448,
author = {Asif, Shahzad and Vesterbacka, Mark},
title = {{Performance analysis of radix-4 adders}},
journal = {Integration},
year = {2012},
volume = {45},
number = {2},
pages = {111--120},
}
We describe a high-rate energy-resolving photon-counting ASIC aimed for spectral computed tomography. The chip has 160 channels and 8 energy bins per channel. It demonstrates a noise level of ENC= electrons at 5 pF input load at a power consumption of <5 mW/channel. Maximum count rate is 17 Mcps at a peak time of 40 ns, made possible through a new filter reset scheme, and maximum read-out frame rate is 37 kframe/s.
@article{diva2:508710,
author = {Gustavsson, Mikael and Ul Amin, Farooq and Bjorklid, Anders and Ehliar, Andreas and Xu, Cheng and Svensson, Christer},
title = {{A High-Rate Energy-Resolving Photon-Counting ASIC for Spectral Computed Tomography}},
journal = {IEEE Transactions on Nuclear Science},
year = {2012},
volume = {59},
number = {1},
pages = {30--39},
}
This paper presents a unified, radix-4 implementation of a turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck when using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with the design of a radix-4 parallel interleaver to reach a flexible turbo-decoder architecture. Radix-4 parallel interleaver algorithms and their mapping onto a hardware architecture are presented for multi-mode operation. The overheads associated with hardware multiplexing are found to be the least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm^2 area in total in a 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput with a silicon efficiency of 0.005 mm^2/Mbps and an energy efficiency of 0.55 nJ/b/iter.
@article{diva2:504548,
author = {Asghar, Rizwan and Wu, Di and Saeed, Ali and Huang, Yulin and Liu, Dake},
title = {{Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support}},
journal = {Journal of Signal Processing Systems},
year = {2012},
volume = {66},
number = {1},
pages = {25--41},
}
The coefficient decimation technique for reconfigurable FIR filters was recently proposed as a filter structure with low computational complexity. In this brief, we propose to design these filters using linear programming taking all configuration modes into account, instead of only considering the initial reconfiguration mode as in previous works. Minimax solutions with significantly lower approximation errors compared to the straightforward design method in earlier works are obtained. In addition, some new insights that are useful when designing coefficient decimation filters are provided.
@article{diva2:503634,
author = {Sheikh, Zaka Ullah and Gustafsson, Oscar},
title = {{Linear Programming Design of Coefficient Decimation FIR Filters}},
journal = {IEEE Transactions on Circuits and Systems - II - Express Briefs},
year = {2012},
volume = {59},
number = {1},
pages = {60--64},
}
This correspondence introduces efficient realizations of wide-band LTI systems. They are single-rate realizations but derived via multirate techniques and sparse bandpass filters. The realizations target mid-band systems with narrow don’t-care bands near the zero and Nyquist frequencies. Design examples for fractional-order differentiators demonstrate substantial complexity savings as compared to the conventional minimax-optimal direct-form realizations.
@article{diva2:503631,
author = {Sheikh, Zaka Ullah and Johansson, Håkan},
title = {{Efficient Wide-Band FIR LTI Systems Derived Via Multi-Rate Techniques and Sparse Bandpass}},
journal = {IEEE Transactions on Signal Processing},
year = {2012},
volume = {60},
number = {7},
pages = {3859--3863},
}
This correspondence introduces a technique for efficient realization of wide-band finite-length impulse response (FIR) linear and time-invariant (LTI) systems. It divides the overall frequency region into three subregions through lowpass, bandpass, and highpass filters realized in terms of only one filter. The actual function to be approximated is, in the low- and high-frequency regions, realized using periodic subsystems. In this way, one can realize an overall wide-band LTI function in terms of three low-cost subblocks, leading to a reduced overall arithmetic complexity as compared to the regular realization. A systematic design technique is provided and a detailed example shows multiplication and addition savings of 62 and 48 percent, respectively, for a fractional-order differentiator with a 96 percent utilization of the bandwidth. Another example shows that the savings increase/decrease with increased/decreased bandwidth.
@article{diva2:503629,
author = {Sheikh, Zaka Ullah and Johansson, Håkan},
title = {{A Technique for Efficient Realization of Wide-Band FIR LTI Systems}},
journal = {IEEE Transactions on Signal Processing},
year = {2012},
volume = {60},
number = {3},
pages = {1482--1486},
}
This correspondence outlines a method for designing two-stage Nyquist filters. The Nyquist filter is split into two equal and linear-phase finite-length impulse response spectral factors. The per-time-unit multiplicative complexity, of the overall structure, is included as the objective function. Examples are then provided where Nyquist filters are designed so as to minimize the multiplicative complexity subject to the constraints on the overall Nyquist filter. In comparison to the single-stage case, the two-stage realization reduces the multiplicative complexity by an average of 48%. For two-stage sampling rate conversion (SRC), the correspondence shows that it is better to have a larger SRC ratio in the first stage.
@article{diva2:480901,
author = {Eghbali, Amir and Saramaki, T. and Johansson, Håkan},
title = {{On two-stage Nyquist pulse shaping filters}},
journal = {IEEE Transactions on Signal Processing},
year = {2012},
volume = {60},
number = {1},
pages = {483--488},
}
The ePUMA architecture is a novel parallel architecture being developed as a platform for low-power computing, typically for embedded or hand-held devices. It was originally designed for radio baseband processors in hand-held devices and radio base stations. It has also been adapted for executing high-definition video CODECs. In this paper, we investigate the possibilities and limitations of the platform for real-time graphics, with a focus on hand-held gaming.
@article{diva2:437260,
author = {Ragnemalm, Ingemar and Liu, Dake},
title = {{Adapting the ePUMA Architecture for Hand-held Video Games}},
journal = {International Journal of Computer Information Systems and Industrial Management Applications},
year = {2012},
volume = {4},
pages = {153--160},
}
In this work we consider optimized twiddle factor multipliers based on shift-and-add multiplication. We propose a low-complexity structure for twiddle factors with a resolution of 32 points. Furthermore, we propose a slightly modified version of a previously reported multiplier for a resolution of 16 points with lower round-off noise. For completeness we also include results on optimal coefficients for a resolution of eight points. We perform finite-word-length analysis for both coefficients and round-off errors and derive optimized coefficients with minimum complexity for varying requirements.
@article{diva2:462835,
author = {Qureshi, Fahad and Gustafsson, Oscar},
title = {{Low-Complexity Constant Multiplication Based on Trigonometric Identities with Applications to FFTs}},
journal = {IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences},
year = {2011},
volume = {E94A},
number = {11},
pages = {2361--2368},
}
This brief presents a novel approach for improving the accuracy of rotations implemented by complex multipliers, based on scaling the complex coefficients that define these rotations. A method for obtaining the optimum coefficients that lead to the lowest error is proposed. This approach can be used to get more accurate rotations without increasing the coefficient word length and to reduce the word length without increasing the rotation error. This brief analyzes two different situations where the optimization method can be applied: rotations that can be optimized independently and sets of rotations that require the same scaling. These cases appear in important signal processing algorithms such as the discrete cosine transform and the fast Fourier transform (FFT). Experimental results show that the use of scaling for the coefficients clearly improves the accuracy of the algorithms. For instance, improvements of about 8 dB in the Frobenius norm of the FFT are achieved with respect to using non-scaled coefficients.
@article{diva2:453944,
author = {Garrido Gálvez, Mario and Gustafsson, Oscar and Grajal, Jesus},
title = {{Accurate Rotations Based on Coefficient Scaling}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2011},
volume = {58},
number = {10},
pages = {662--666},
}
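To illustrate the coefficient-scaling idea behind the brief above, a hypothetical brute-force sketch in Python: it searches over integer scalings of a quantized rotation (c, s) ≈ R·(cos θ, sin θ) and keeps the pair whose normalized direction is closest to the ideal rotation. The exhaustive search and the error metric are illustrative assumptions, not the optimization method of the paper.

```python
import math

def best_scaled_rotation(theta, wordlength):
    """Search scaled, quantized coefficient pairs (c, s) approximating a
    rotation by theta, minimizing the direction error after normalization.

    wordlength is the two's-complement coefficient word length, so the
    magnitudes of c and s are bounded by 2**(wordlength-1) - 1.
    Returns (error, c, s, R) for the best scaling factor R found.
    """
    max_mag = 2 ** (wordlength - 1) - 1
    best = None
    for R in range(1, max_mag + 1):
        c = round(R * math.cos(theta))
        s = round(R * math.sin(theta))
        if abs(c) > max_mag or abs(s) > max_mag:
            continue
        mag = math.hypot(c, s)
        if mag == 0:
            continue
        # Distance between the normalized quantized rotation and the ideal one
        err = math.hypot(c / mag - math.cos(theta), s / mag - math.sin(theta))
        if best is None or err < best[0]:
            best = (err, c, s, R)
    return best
```

By construction the search includes the full-scale quantization R = 2^(wordlength-1) - 1, so the result is never worse than rounding the unscaled coefficients.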
This brief presents novel circuits for calculating bit reversal on a series of data. The circuits are simple and consist of buffers and multiplexers connected in series. The circuits are optimum in two senses: they use the minimum number of registers that are necessary for calculating the bit reversal and have minimum latency. This makes them very suitable for calculating the bit reversal of the output frequencies in hardware fast Fourier transform (FFT) architectures. This brief also proposes optimum solutions for reordering the output frequencies of the FFT when different common radices are used, including radix-2, radix-2(k), radix-4, and radix-8.
@article{diva2:453943,
author = {Garrido Gálvez, Mario and Grajal, Jesus and Gustafsson, Oscar},
title = {{Optimum Circuits for Bit Reversal}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2011},
volume = {58},
number = {10},
pages = {657--661},
}
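As background to the bit-reversal circuits above, a minimal software reference for the radix-2 bit-reversal permutation itself; the paper's contribution is the streaming buffer/multiplexer hardware, which this sketch does not model.

```python
def bit_reverse_permutation(x):
    """Reorder a sequence from bit-reversed to natural index order (radix-2).

    len(x) must be a power of two. Reversing the binary digits of each index
    is its own inverse, so the same function maps natural order to
    bit-reversed order as well.
    """
    n = len(x)
    bits = n.bit_length() - 1  # number of index bits, e.g. 3 for n = 8
    out = [None] * n
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2)  # reverse the index bits
        out[i] = x[j]
    return out
```

For example, an 8-point FFT producing outputs in bit-reversed order would be reordered with this permutation: index 1 (binary 001) exchanges with index 4 (binary 100), and so on.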
This letter describes an efficient architecture for the computation of fast Fourier transform (FFT) algorithms with single-bit input. The proposed architecture is aimed at the first stages of pipelined FFT architectures processing one sample per clock cycle, hence making it suitable for real-time FFT computation. Since natural-input-order pipeline FFTs use large memories in the early stages, it is important to keep the word length short at the beginning of the pipeline. By replacing the initial butterflies and rotators of an architecture with the proposed block, the memory requirements can be significantly reduced. Comparisons with the commonly used single delay feedback (SDF) architecture show that more than 50% of the required memory can be saved in some cases.
@article{diva2:451816,
author = {Athar, Saima and Gustafsson, Oscar and Qureshi, Fahad and Kale, Izzet},
title = {{On the efficient computation of single-bit input word length pipelined FFTs}},
journal = {IEICE Electronics Express},
year = {2011},
volume = {8},
number = {17},
pages = {1437--1443},
}
This paper presents a digital background calibration technique that measures and cancels offset, linear, and nonlinear errors in each stage of a pipelined analog-to-digital converter (ADC) using a single algorithm. A simple two-step subranging ADC architecture is used as an extra ADC to extract the data points of the stage under calibration and perform the correction process without imposing any changes on the main ADC architecture, which is a main aim of the present work. Contrary to conventional calibration methods that use high-resolution reference ADCs, averaging and chopping concepts are used in this work to allow the resolution of the extra ADC to be lower than that of the main ADC.
@article{diva2:440993,
author = {Jalili, Armin and Sayedi, S. M. and Wikner, Jacob and Zeidaabadi Nezhad, Abolghasem},
title = {{A nonlinearity error calibration technique for pipelined ADCs}},
journal = {Integration},
year = {2011},
volume = {44},
number = {3},
pages = {229--241},
}
This paper presents a digital background calibration technique to compensate inter-channel gain and offset errors in parallel, pipelined analog-to-digital converters (ADCs). By using an extra analog path, calibration of each ADC channel is done without imposing any changes on the digitizing structure, i.e., keeping each channel completely intact. The extra analog path is simplified using averaging and chopping concepts, and it is realized in a standard 0.18-μm CMOS technology. The complexity of the analog part of the proposed calibration system is the same regardless of the number of channels.
Simulation results of a behavioral 12-bit, dual channel, pipelined ADC show that offset and gain error tones are improved from −56.5 and −58.3 dB before calibration to about −86.7 and −103 dB after calibration, respectively.
@article{diva2:440992,
author = {Jalili, Armin and Sayedi, Sayed Masoud and Wikner, Jacob},
title = {{Inter-channel offset and gain mismatch correction for time-interleaved pipelined ADCs}},
journal = {Microelectronics Journal},
year = {2011},
volume = {42},
number = {1},
pages = {158--164},
}
This paper introduces a class of wide-band linear-phase finite-length impulse response (FIR) differentiators. It is based on two-rate and frequency-response masking techniques. It is shown how to use these techniques to obtain all four types of linear-phase FIR differentiators. Design examples demonstrate that differentiators in this class can achieve substantial savings in arithmetic complexity in comparison with conventional direct-form linear-phase FIR differentiators. The savings achievable depend on the bandwidth and increase with increasing bandwidth beyond the break-even points which are in the neighborhood of 90% (80%) of the whole bandwidth for Type II and III (Type I and IV) differentiators. The price to pay for the savings is a moderate increase in the delay and number of delay elements. Further, in terms of structural arithmetic operations, the proposed filters are comparable to filters based on piecewise-polynomial impulse responses. The advantage of the proposed filters is that they can be implemented using non-recursive structures as opposed to the polynomial-based filters which are implemented with recursive structures.
@article{diva2:436994,
author = {Ullah Sheikh, Zaka and Johansson, Håkan},
title = {{A Class of Wide-Band Linear-Phase FIR Differentiators Using a Two-Rate Approach and the Frequency-Response Masking Technique}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2011},
volume = {58},
number = {8},
pages = {1827--1839},
}
@article{diva2:433314,
author = {Laddomada, Massimiliano and Jovanovic Dolecek, Gordana and Yong Ching, Lim and Luo, Fa-Long and Renfors, Markku and Wanhammar, Lars},
title = {{Advanced techniques on multirate signal processing for digital information processing}},
journal = {IET Signal Processing},
year = {2011},
volume = {5},
number = {3},
pages = {313--315},
}
@article{diva2:416865,
author = {Larsson, Erik G and Gustafsson, Oscar},
title = {{The Impact of Dynamic Voltage and Frequency Scaling on Multicore DSP Algorithm Design}},
journal = {IEEE Signal Processing Magazine},
year = {2011},
volume = {28},
number = {3},
}
This paper introduces reconfigurable nonuniform transmultiplexers (TMUXs) based on fixed uniform modulated filter banks (FBs). The TMUXs use parallel processing where polyphase components, of any user, are processed by a number of synthesis FB and analysis FB branches. One branch represents one granularity band, and any user can occupy integer multiples of a granularity band. The proposed TMUX also requires adjustable commutators so that any user occupies any portion of the frequency spectrum. The location and width of this portion can be modified without additional arithmetic complexity or filter redesign. This paper considers both cosine modulated and modified discrete Fourier transform FBs. It discusses the filter design, TMUX realization, and the parameter selection. It is shown that one can indeed decrease the arithmetic complexity by proper choice of system parameters. For the critically sampled case and if the number of channels is higher than necessary, we can reduce the arithmetic complexity. In case of an oversampled system, the arithmetic complexity can be reduced by proper choice of the number of channels and the roll-off factor of the prototype filter. The proposed TMUX is compared to existing reconfigurable TMUXs, and examples are provided for illustration.
@article{diva2:403125,
author = {Eghbali, Amir and Johansson, Håkan and Löwenborg, Per},
title = {{Reconfigurable Nonuniform Transmultiplexers Using Uniform Modulated Filter Banks}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2011},
volume = {58},
number = {3},
pages = {539--547},
}
The generation of a canonical signed digit representation from a binary representation is revisited. Based on the property that each nonzero digit is surrounded by a zero digit, a hardware-efficient conversion method using bypass instead of carry propagation is proposed. The proposed method requires less area per digit and the required bypass signal can be generated or propagated with only a single NOR gate. It is shown that the proposed converter outperforms previous converters and a look-ahead circuitry to speed up the generation of bypass signals is also proposed.
@article{diva2:398513,
author = {Faust, M and Gustafsson, Oscar and Chang, C-H},
title = {{Fast and VLSI efficient binary-to-CSD encoder using bypass signal}},
journal = {Electronics Letters},
year = {2011},
volume = {47},
number = {1},
pages = {18--19},
}
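For reference, the canonical signed digit (CSD) representation discussed above can be generated in software as follows. This sketch produces the representation itself, in which every nonzero digit is surrounded by zeros (the property the bypass-based converter exploits); it does not model the letter's hardware encoder.

```python
def to_csd(n):
    """Convert a non-negative integer to canonical signed digit (CSD) form.

    Returns the digits in {-1, 0, 1}, least-significant first. CSD is the
    minimal signed-digit representation in which no two adjacent digits
    are nonzero.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
            n //= 2
        else:
            d = 2 - (n % 4)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            digits.append(d)
            n = (n - d) // 2
    return digits
```

For example, 23 = 10111 in binary (four nonzero bits) becomes 1 0 0 -1 0 -1 in CSD, i.e. 32 - 8 - 1, with only three nonzero digits.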
A class of Farrow-structure-based reconfigurable bandpass finite-length impulse response (FIR) filters for integer sampling rate conversion is introduced. The converters are realized in terms of a number of fixed linear-phase FIR subfilters and two sets of reconfigurable multipliers that determine the passband location and conversion factor, respectively. Both Mth-band and general FIR filters can be realized, and the filters work equally well for any integer factor and passband location. Design examples are included demonstrating their efficiency compared to modulated regular filters. In addition, in contrast to regular filters, the proposed ones have considerably fewer filter coefficients that need to be determined in the filter design process.
@article{diva2:397480,
author = {Johansson, Håkan},
title = {{Farrow-structure-based reconfigurable bandpass linear-phase FIR filters for integer sampling rate conversion}},
journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
year = {2011},
volume = {58},
number = {1},
pages = {46--50},
}
This paper discusses two approaches for the baseband processing part of cognitive radios. These approaches can be used depending on the availability of (i) a composite signal comprising several user signals or (ii) the individual user signals. The aim is to introduce solutions that can support different bandwidths and center frequencies for a large set of users, at the cost of only simple modifications on the same hardware platform. Such structures have previously been used for satellite-based communication systems, and the paper aims to outline their possible applications in the context of cognitive radios. For this purpose, dynamic frequency-band allocation (DFBA) and reallocation (DFBR) structures based on multirate building blocks are introduced, and their reconfigurability with respect to the measures required in cognitive radios is discussed.
@article{diva2:272252,
author = {Eghbali, Amir and Johansson, Håkan and Löwenborg, Per and Göckler, Heinz G},
title = {{Dynamic Frequency-Band Reallocation and Allocation: from Satellite-Based Communication Systems to Cognitive Radios}},
journal = {Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology},
year = {2011},
volume = {62},
number = {2},
pages = {187--203},
}
This paper presents a programmable MMSE soft-output MIMO symbol detector that supports the 600 Mbps data rate defined in 802.11n. The detector is implemented using a multi-core floating-point processor and a configurable soft-bit demapper. Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of channel processing with sufficient numerical stability for large channel matrices. Compared to several existing fixed-functional solutions, the detector proposed in this paper is smaller and faster. More importantly, it is programmable and configurable, so it can support various MIMO transmission schemes defined by different standards.
@article{diva2:271871,
author = {Wu, Di and Eilert, Johan and Liu, Dake},
title = {{Implementation of a High-Speed MIMO Soft-Output Symbol Detector for Software Defined Radio}},
journal = {Journal of Signal Processing Systems},
year = {2011},
volume = {63},
number = {1},
pages = {27--37},
}
We propose a method of reducing the switching noise in the substrate of an integrated circuit. The main idea is to design the digital circuits to obtain a periodic supply current with the same period as the clock. This property locates the frequency components of the switching noise above the clock frequency. Differential return-to-zero signaling is used to reduce the data dependency of the current. Circuits are implemented in symmetrical precharged DCVS logic with internally asynchronous D registers. A chip was fabricated in a standard 130-nm CMOS technology holding two versions of a pipelined 16-bit adder. The first version employed the proposed method, and the second used conventional static CMOS logic circuits and TSPC registers. The respective device counts are 1190 and 684, and the maximal operating frequencies are 450 and 375 MHz. Frequency-domain measurements were performed at the substrate node with on-chip generated sinusoidal and pseudo-random data at a clock frequency of 300 MHz. The sinusoidal case resulted in the largest frequency components, where an 8.5 dB/Hz decrease in maximal power is measured for the proposed circuitry at the cost of three times larger power consumption.
@article{diva2:355843,
author = {Yasser Sherazi, Syed Muhammad and Asif, Shahzad and Backenius, Erik and Vesterbacka, Mark},
title = {{Reduction of Substrate Noise in Sub Clock Frequency Range}},
journal = {IEEE Transactions on Circuits and Systems I: Regular Papers},
year = {2010},
volume = {57},
number = {6},
pages = {1287--1297},
}
Doubly resistively terminated LC filters are optimal from an element-sensitivity point of view and are therefore used as reference filters for high-performance active filters. The latter inherit the sensitivity properties of the LC filter; hence it is important to design the reference filter to have minimal element sensitivity. In this paper, we first review the mechanism behind the low sensitivity and give an upper bound on the deviation in the passband attenuation. Next we compare classical lowpass approximations with respect to their influence on the sensitivity and propose the use of diminishing ripple in the passband to further reduce the sensitivity. Finally, we propose a design strategy for doubly resistively terminated LC filters with low sensitivity.
@article{diva2:342947,
author = {Wanhammar, Lars},
title = {{Synthesis of Low-Sensitivity Analog Filters}},
journal = {Analog Circuit Design},
year = {2010},
pages = {129--145},
}
This paper presents a flexible interleaver architecture supporting multiple standards like WLAN, WiMAX, HSPA+, 3GPP-LTE, and DVB. Algorithmic-level optimizations like 2D transformation and realization of recursive computation are applied, which appear to be the key to reaching efficient hardware multiplexing among different interleaver implementations. The presented hardware enables the mapping of vital types of interleavers, including multiple block interleavers and the convolutional interleaver, onto a single architecture. By exploiting the hardware-reuse methodology the silicon cost is reduced; the fully reconfigurable architecture occupies 0.126 mm2 in total in a 65 nm CMOS process. It can operate at a frequency of 166 MHz, providing a maximum throughput of up to 664 Mbps for a multi-stream system and 166 Mbps for single-stream communication systems. One of the vital requirements for multimode operation is fast switching between different standards, which this hardware supports with minimal cycle-cost overheads. Maximum flexibility and fast switchability among multiple standards at run time make the proposed architecture a strong choice for a radio baseband processing platform.
@article{diva2:315973,
author = {Asghar, Rizwan and Liu, Dake},
title = {{Multimode flex-interleaver core for baseband processor platform}},
journal = {Journal of Computer Systems, Networks and Communications},
year = {2010},
volume = {2010},
pages = {1--16},
}
This paper introduces two classes of cosine-modulated causal and stable filter banks (FBs) with near perfect reconstruction (NPR) and low implementation complexity. Both classes have the same infinite-length impulse response (IIR) analysis FB but different synthesis FBs utilizing IIR and finite-length impulse response (FIR) filters, respectively. The two classes are preferable for different types of specifications. The IIR/FIR FBs are preferred if small phase errors relative to the magnitude error are desired, and vice versa. The paper provides systematic design procedures so that PR can be approximated as closely as desired. It is demonstrated through several examples that the proposed FB classes, depending on the specification, can have a lower implementation complexity compared to existing FIR and IIR cosine-modulated FBs (CMFBs). The price to pay for the reduced complexity is generally an increased delay. Furthermore, two additional attractive features of the proposed FBs are that they are asymmetric in the sense that one of the analysis and synthesis banks has a lower computational complexity compared to the other, which can be beneficial in some applications, and that the number of distinct coefficients is small, which facilitates the design of FBs with large numbers of channels.
@article{diva2:292213,
author = {Rosenbaum, Linnea and Löwenborg, Per and Johansson, Håkan},
title = {{Two Classes of Cosine-Modulated IIR/IIR and IIR/FIR NPR Filter Banks}},
journal = {Circuits, Systems, and Signal Processing},
year = {2010},
volume = {29},
number = {1},
pages = {103--133},
}
Analog-to-digital converters based on sigma-delta modulation have shown promising performance, with steadily increasing bandwidth. However, associated with the increasing bandwidth is an increasing modulator sampling rate, which becomes costly to decimate in the digital domain. Several architectures exist for the digital decimation filter, and among the more common and efficient are polyphase decomposed finite-length impulse response (FIR) filter structures. In this paper, we consider such filters implemented with partial product generation for the multiplications, and carry-save adders to merge the partial products. The focus is on the efficient pipelined reduction of the partial products, which is done using a bit-level optimization algorithm for the tree design. However, the method is not limited only to filter design, but may also be used in other applications where high-speed reduction of partial products is required. The presentation of the reduction method is carried out through a comparison between the main architectural choices for FIR filters: the direct-form and transposed direct-form structures. For the direct-form structure, usage of symmetry adders for linear-phase filters is investigated, and a new scheme utilizing partial symmetry adders is introduced. The optimization results are complemented with energy dissipation and cell area estimations for a 90 nm CMOS process.
@article{diva2:292214,
author = {Blad, Anton and Gustafsson, Oscar},
title = {{Integer Linear Programming-Based Bit-Level Optimization for High-Speed FIR Decimation Filter Architectures}},
journal = {Circuits, Systems, and Signal Processing},
year = {2010},
volume = {29},
number = {1},
pages = {81--101},
}
This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes, widely used for error correction in today's consumer electronics, are prone to introduce higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput, but the interleaving algorithms used in different standards do not freely allow using them due to a high percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and the use of small FIFO buffers. The proposed flexible architecture is low cost and occupies 0.085 mm2 in a 65 nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher-throughput systems based on parallel SISO processors.
@article{diva2:271855,
author = {Asghar, Rizwan and Wu, Di and Eilert, Johan and Liu, Dake},
title = {{Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding}},
journal = {Journal of Signal Processing Systems for Signal, Image, and Video Technology},
year = {2010},
volume = {60},
number = {1},
pages = {15--29},
}
Book chapters
In this chapter fundamentals of arithmetic operations and number representations used in DSP systems are discussed. Different relevant number systems are outlined with a focus on fixed-point representations. Structures for accelerating the carry-propagation of addition are discussed, as well as multi-operand addition. For multiplication, different schemes for generating and accumulating partial products are presented. In addition to that, optimization for constant coefficient multiplication is discussed. Division and square-rooting are also briefly outlined. Furthermore, floating-point arithmetic and the IEEE 754 floating-point arithmetic standard are presented. Finally, some methods for computing elementary functions, e.g., trigonometric functions, are presented.
@incollection{diva2:1598793,
author = {Gustafsson, Oscar and Wanhammar, Lars},
title = {{Arithmetic}},
booktitle = {Handbook of signal processing systems},
year = {2019},
pages = {381--426},
publisher = {Springer},
address = {Cham},
}
The fast Fourier transform (FFT) is a widely used algorithm in signal processing applications. FFT hardware architectures are designed to meet the requirements of the most demanding applications in terms of performance, circuit area, and/or power consumption. This chapter summarizes the research on FFT hardware architectures by presenting the FFT algorithms, the building blocks in FFT hardware architectures, the architectures themselves, and the bit reversal algorithm.
@incollection{diva2:1598788,
author = {Garrido, Mario and Qureshi, Fahad and Takala, Jarmo and Gustafsson, Oscar},
title = {{Hardware architectures for the fast Fourier transform}},
booktitle = {Handbook of signal processing systems},
year = {2019},
pages = {613--647},
publisher = {Springer},
address = {Cham},
}
The optimization of shift‐and‐add network for constant multiplications is found to have great potential for reducing the area, delay, and power consumption of implementation of multiplications in several computation‐intensive applications not only in dedicated hardware but also in programmable computing systems. To simplify the shift‐and‐add network in single constant multiplication (SCM) circuits, this chapter discusses three design approaches, including direct simplification from a given number representation, simplification by redundant signed digit (SD) representation, and simplification by adder graph. Examples of the multiple constant multiplication (MCM) methods are constant matrix multiplication, discrete cosine transform (DCT) or fast Fourier transform (FFT), and polyphase finite impulse response (FIR) filters and filter banks. The given constant multiplication methods can be used for matrix multiplications and inner‐product; and can be applied easily to image/video processing and graphics applications. The chapter further discusses some of the shortcomings in the current research on constant multiplications, and possible scopes of improvement.
@incollection{diva2:1245407,
author = {Meher, Pramod Kumar and Chang, Chip-Hong and Gustafsson, Oscar and Vinod, A.P. and Faust, Mattias},
title = {{Shift-Add Circuits for Constant Multiplications}},
booktitle = {Arithmetic Circuits for DSP Applications},
year = {2017},
pages = {33--76},
publisher = {John Wiley \& Sons},
}
General-purpose DSP processors, application-specific processors, and algorithm-specific processors are used to implement different types of DSP systems or subsystems. General-purpose processors are typically used in applications involving complex and irregular algorithms, while application-specific processors provide lower unit cost and higher performance for a specific application, particularly when the volume of production is high. Most DSP applications use fractional arithmetic instead of integer arithmetic. Multimedia and communication applications involve real-time audio and video/image processing, which very often requires sum-of-products (SOP) computation. The need for computing non-linear functions arises in many different applications. The straightforward method of approximating an elementary function, simply storing the values in a look-up table, typically leads to large tables, even though the resulting area from standard-cell synthesis grows more slowly than the number of memory bits. It is therefore of interest to approximate elementary functions using a trade-off between arithmetic operations and look-up tables.
@incollection{diva2:1245375,
author = {Gustafsson, Oscar and Wanhammar, Lars},
title = {{Basic Arithmetic Circuits}},
booktitle = {Arithmetic Circuits for DSP Applications},
year = {2017},
pages = {1--32},
publisher = {John Wiley \& Sons},
}
@incollection{diva2:797158,
author = {Gustafsson, Oscar and Wanhammar, Lars},
title = {{Arithmetic}},
booktitle = {Handbook of signal processing systems},
year = {2013},
pages = {593--637},
publisher = {Springer},
address = {New York},
}
Digital filters, together with signal processing, are employed in new technologies and information systems and are implemented in many different areas and applications. They can be realized cost-effectively and adapted to different cases with great flexibility and reliability. This book presents advanced developments in digital filters and signal processing methods, covering different case studies. The chapters capture the essence of the subject, presenting the principal approaches alongside the most recent mathematical models employed worldwide.
@incollection{diva2:588996,
author = {Johansson, Håkan and Gustafsson, Oscar},
title = {{Two-Rate Based Structures for Computationally Efficient Wide-Band FIR Systems}},
booktitle = {Digital Filters and Signal Processing},
year = {2013},
pages = {189--212},
publisher = {InTech},
}
In this work we discuss the realization of constant multiplication using a minimum number of carry-save adders. We consider both non-redundant and carry-save representation for the input data. For both cases we present all possible interconnection topologies, using up to six and five adders, respectively. These are sufficient to realize constant multiplications for all coefficients with a wordlength up to 19 bits.
@incollection{diva2:582002,
author = {Gustafsson, Oscar and Wanhammar, Lars},
title = {{Low-complexity and high-speed constant multiplications for digital filters using carry-save arithmetic}},
booktitle = {Digital Filters},
year = {2011},
pages = {241--256},
publisher = {InTech},
address = {Rijeka, Croatia},
}
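To illustrate the shift-and-add principle underlying the chapter above (using plain integer arithmetic rather than the carry-save adders the chapter considers), a minimal hypothetical sketch: each nonzero digit of the constant's signed-digit representation contributes one shifted addition or subtraction.

```python
def shift_add_multiply(x, csd_digits):
    """Multiply x by a constant expressed as signed digits (LSB first,
    digits in {-1, 0, 1}) using only shifts, additions, and subtractions.

    The number of add/subtract operations equals the number of nonzero
    digits, which is why minimal representations such as CSD reduce cost.
    """
    acc = 0
    for i, d in enumerate(csd_digits):
        if d == 1:
            acc += x << i  # add x shifted left by the digit position
        elif d == -1:
            acc -= x << i  # subtract the shifted term
    return acc
```

For example, the constant 23 in CSD is -1 0 0 -1 0 1 (LSB first), i.e. 32 - 8 - 1, so multiplying by 23 costs two subtractions and one addition instead of the three additions of its binary form.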
@incollection{diva2:797154,
author = {Liu, Dake},
title = {{Application specific instruction set DSP processors}},
booktitle = {Handbook of signal processing systems},
year = {2010},
pages = {415--447},
publisher = {Springer},
address = {New York},
}
Handbook of Signal Processing Systems is organized in three parts. The first part motivates representative applications that drive and apply state-of-the-art methods for the design and implementation of signal processing systems; the second part discusses architectures for implementing these applications; the third part focuses on compilers and simulation tools, describing models of computation and their associated design tools and methodologies. This handbook is an essential tool for professionals in many fields and researchers of all levels.
@incollection{diva2:395770,
author = {Gustafsson, Oscar and Wanhammar, Lars},
title = {{Arithmetic}},
booktitle = {Handbook of signal processing systems},
year = {2010},
pages = {283--327},
publisher = {Springer},
}
Conference papers
Graph neural networks (GNNs) combine sparse and dense data compute requirements that are challenging to meet in resource-constrained embedded hardware. In this paper, we investigate a dataflow of dataflows architecture that optimizes data access and processing element utilization. The architecture is described with high-level synthesis and offers multiple configuration options including varying the number of independent hardware threads, the interface data width and the number of compute units per thread. Each hardware thread uses a fine-grained dataflow to stream words with a bit-width that depends on the network precision while a coarse-grained dataflow links the thread stages streaming partially-computed matrix tiles. The accelerator is mapped to the programmable logic of a Zynq Ultrascale device whose processing system runs Pytorch extended with PYNQ overlays. Results based on the citation networks show a performance gain of up to 140x with multi-threaded hardware configurations compared with the optimized software implementation available in Pytorch. The results also show competitive performance of the embedded hardware compared with other high-performance state-of-the-art hardware accelerators.
@inproceedings{diva2:1842600,
author = {Nunez-Yanez, Jose Luis},
title = {{Accelerating Graph Neural Networks in Pytorch with HLS and Deep Dataflows}},
booktitle = {APPLIED RECONFIGURABLE COMPUTING. ARCHITECTURES, TOOLS, AND APPLICATIONS, ARC 2023},
year = {2023},
series = {Lecture Notes in Computer Science},
pages = {131--145},
publisher = {SPRINGER INTERNATIONAL PUBLISHING AG},
}
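At the heart of the GNN workload above is the multiplication of a sparse matrix (the graph structure) by a dense feature matrix. As a point of reference only, not the paper's hardware design, a minimal CSR sparse-times-dense multiply can be sketched as follows (the function name and CSR layout are illustrative assumptions):

```python
import numpy as np

def csr_spmm(indptr, indices, data, B):
    """Multiply a CSR-format sparse matrix A (n_rows x k) by a dense matrix
    B (k x n_cols); indptr/indices/data are the standard CSR arrays."""
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, B.shape[1]))
    for row in range(n_rows):
        # Only the stored (nonzero) entries of this row contribute
        for idx in range(indptr[row], indptr[row + 1]):
            out[row] += data[idx] * B[indices[idx]]
    return out
```

A hardware thread in the described architecture would stream tiles of `B` and partially computed rows of `out` rather than materializing them, but the underlying arithmetic is the same.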
This paper is focused on fault detection and isolation of component-based multi-mode systems, i.e., systems that can be operated in different continuous modes. As the system mode changes, the structure of the system also changes, which impacts diagnosability analysis and synthesis. To meet this challenge, diagnosis based on a structural approach is modified to also detect and isolate faults when modes change. Here, definitions of some important diagnosis concepts are extended to also cover multi-mode systems. Then, a method for hierarchical diagnosis of component-based systems is proposed. The method is exemplified on a Li-ion battery pack to show its effectiveness.
@inproceedings{diva2:1840245,
author = {Hashemniya, Fatemeh and Frisk, Erik and Krysander, Mattias},
title = {{Hierarchical Diagnosis Algorithm for Component-Based Multi-Mode Systems}},
booktitle = {22nd IFAC World Congress: Yokohama, Japan, July 9-14, 2023},
year = {2023},
series = {IFAC papersonline},
pages = {11317--11323},
}
Spiking Neural Networks (SNNs) constitute a representative example of neuromorphic computing in which event-driven computation is mapped to neuron spikes, reducing power consumption. A challenge that limits the general adoption of SNNs is the need for mature training algorithms compared with other artificial neural networks, such as multi-layer perceptrons or convolutional neural networks. This paper explores the use of evolutionary algorithms as a black-box solution for training SNNs. The selected SNN model relies on the Izhikevich neuron model implemented in hardware. Differently from the state of the art, the approach followed in this paper integrates within the same System-on-a-Chip (SoC) both the training algorithm and the SNN fabric, enabling continuous network adaptation in-field and, thus, eliminating the barrier between offline (training) and online (inference). A novel encoding approach for the inputs based on receptive fields is also provided to improve network accuracy. Experimental results demonstrate that these techniques perform similarly to other algorithms in the literature without dynamic adaptability for classification and control problems.
@inproceedings{diva2:1823745,
author = {Otero, Andr\'{e}s and Sanllorente, Guillermo and de la Torre, Eduardo and Nunez-Yanez, Jose Luis},
title = {{Evolutionary FPGA-based Spiking Neural Networks for Continual Learning}},
booktitle = {APPLIED RECONFIGURABLE COMPUTING. ARCHITECTURES, TOOLS, AND APPLICATIONS, ARC 2023},
year = {2023},
series = {Lecture Notes in Computer Science},
volume = {14251},
publisher = {SPRINGER INTERNATIONAL PUBLISHING AG},
}
Graph neural networks (GNNs) combine sparse and dense data compute requirements that are challenging to meet in resource-constrained embedded hardware. In this paper, we investigate a dataflow of dataflows architecture that optimizes data access and processing element utilization. The architecture is described with high-level synthesis and offers multiple configuration options including varying the number of independent hardware threads, the interface data width and the number of compute units per thread. Each hardware thread uses a fine-grained dataflow to stream words with a bit-width that depends on the network precision while a coarse-grained dataflow links the thread stages streaming partially-computed matrix tiles. The accelerator is mapped to the programmable logic of a Zynq Ultrascale device whose processing system runs Pytorch extended with PYNQ overlays. Results based on the citation networks show a performance gain of up to 140x with multi-threaded hardware configurations compared with the optimized software implementation available in Pytorch. The results also show competitive performance of the embedded hardware compared with other high-performance state-of-the-art hardware accelerators.
@inproceedings{diva2:1823736,
author = {Nunez-Yanez, Jose Luis},
title = {{Accelerating Graph Neural Networks in Pytorch With HLS and Deep Dataflows}},
booktitle = {Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2023)},
series = {Lecture Notes in Computer Science},
volume = {14251},
publisher = {Springer},
year = {2023},
}
In this work, we analyze the step-size approximation for fixed-point least-mean-square (LMS) and block LMS (BLMS) algorithms. Our primary focus is on investigating how step-size approximation impacts the convergence rate and steady-state mean square error (MSE) across varying block sizes and filter lengths. We consider three different fixed-point quantized LMS and BLMS algorithms. The results demonstrate that the algorithm with two quantizers in single precision behaves approximately the same as one quantizer under quantized weights, regardless of block size and filter length. Subsequently, we explore the approximation effects of nearest power-of-two step sizes and their combinations with different design parameters on the convergence performance. Simulation results within the context of a system identification problem reveal intriguing insights under these approximations. For instance, a single-quantizer algorithm without quantized error is more robust than its counterpart under these approximations. Additionally, both single-quantizer algorithms with combined power-of-two approximations match the behavior of the actual step size.
@inproceedings{diva2:1814332,
author = {Khan, Mohd Tasleem and Gustafsson, Oscar},
title = {{Analyzing Step-Size Approximation for Fixed-Point Implementation of LMS and BLMS Algorithms}},
booktitle = {2023 IEEE Nordic Circuits and Systems Conference (NorCAS)},
year = {2023},
publisher = {IEEE},
}
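To make the power-of-two step-size approximation above concrete: when mu = 2^-k, the multiplication by mu in the LMS weight update degenerates to an arithmetic right shift in fixed-point hardware. A floating-point sketch of such an LMS system identification follows; the filter length, shift value and signal sizes are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def lms_identify(x, d, n_taps, shift):
    """LMS system identification with step size mu = 2**-shift, so that the
    step-size scaling can be realized with an arithmetic shift in hardware."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)  # delay line: [x[n], x[n-1], ...]
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e = d[n] - w @ buf            # a priori error
        w += (e * buf) * 2.0 ** -shift  # power-of-two step size
    return w
```

For a noise-free identification of a short FIR channel, the weights converge essentially exactly, which makes the effect of coarser step-size approximations easy to study against this baseline.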
FIR filtering realized in the frequency domain can use different FFT sizes, leading to different arithmetic complexities. The implementation results indicate that arithmetic complexity is not the only factor that must be considered for minimal power consumption.
@inproceedings{diva2:1814213,
author = {Bae, Cheolyong and Gustafsson, Oscar},
title = {{FFT-Size Implementation Tradeoffs for Chromatic Dispersion Compensation Filters}},
booktitle = {Signal Processing in Photonic Communications 2023, Busan, Republic of Korea, 9--13 July, 2023.},
year = {2023},
}
Matrix transposition, the procedure of swapping the rows and columns of a matrix, arises in various signal processing applications, such as massive multiple-input multiple-output (MIMO) communication systems, data compression, and multidimensional fast Fourier transforms, which are used in MIMO radar systems. In low-latency, high-throughput streaming applications, specialized circuits for matrix transposition are needed in order to perform transposition in real-time. This is in contrast to "slower" applications, where transposition can be adequately performed by storing a matrix in a shared memory and afterward reading it back in transposed order. In this paper, a design procedure for streaming matrix transposition on field-programmable gate arrays (FPGAs) using distributed memories is presented. It is shown that significantly fewer FPGA resources are required for small- to medium-sized streaming matrix transpositions compared to recent related works.
@inproceedings{diva2:1810365,
author = {Henriksson, Mikael and Gustafsson, Oscar},
title = {{Streaming Matrix Transposition on FPGAs Using Distributed Memories}},
booktitle = {Proceedings of the IEEE Nordic Circuits and Systems Conference (NorCAS)},
year = {2023},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Aalborg, Denmark},
}
Spade is an HDL that enhances the productivity of HDL designers by adding useful abstractions for hardware design. These abstractions are zero- or low-cost, meaning that the designer still has full control over what hardware gets generated.
@inproceedings{diva2:1802722,
author = {Skarman, Frans and Gustafsson, Oscar},
title = {{Abstraction in the Spade Hardware Description Language}},
booktitle = {LATTE '23 - Workshop on Languages, Tools, and Techniques for Accelerator Design, Vancouver, BC, Canada, March 26, 2023},
year = {2023},
}
Spade is a new open source hardware description language (HDL) designed to increase developer productivity without sacrificing the low-level control offered by HDLs. It is a standalone language which takes inspiration from modern software languages, and adds useful abstractions for common hardware constructs. It also comes with a convenient set of tooling, such as a helpful compiler, a build system with dependency management, tools for debugging, and editor integration.
@inproceedings{diva2:1802720,
author = {Skarman, Frans and Gustafsson, Oscar},
title = {{Spade: An Expression-Based HDL With Pipelines}},
booktitle = {Proceedings of the 3rd Workshop on Open-Source Design Automation (OSDA), 2023},
year = {2023},
series = {arXiv.org},
pages = {7--12},
}
This paper investigates an indoor multiple human tracking and fall detection system based on the usage of multiple millimeter-wave radars from Texas Instruments. We propose a real-time system framework to merge the signals received from radars and track the position and body status of human objects. In order to guarantee the overall accuracy of our system, we develop novel strategies such as dynamic DBSCAN clustering based on signal energy levels and a possibility matrix for multiple object tracking. Our prototype system, which employs three radars placed on x-y-z surfaces, demonstrates higher accuracy than the solution in [1] (90%), with 98.5% and 98.2% accuracy in multiple human tracking and fall detection respectively. The accuracy reaches 99.7% for single human tracking.
@inproceedings{diva2:1784817,
author = {Shen, Z. and Nunez-Yanez, Jose Luis and Dahnoun, N.},
title = {{Multiple Human Tracking and Fall Detection Real-Time System Using Millimeter-Wave Radar and Data Fusion}},
booktitle = {2023 12th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro},
pages = {1--6},
year = {2023},
}
This paper presents EnergyAnalyzer, a code-level static analysis tool for estimating the energy consumption of embedded software based on statically predictable hardware events. The tool utilises techniques usually used for worst-case execution time (WCET) analysis together with bespoke energy models developed for two predictable architectures - the ARM Cortex-M0 and the Gaisler LEON3 - to perform energy usage analysis. EnergyAnalyzer has been applied in various use cases, such as selecting candidates for an optimised convolutional neural network, analysing the energy consumption of a camera pill prototype, and analysing the energy consumption of satellite communications software. The tool was developed as part of a larger project called TeamPlay, which aimed to provide a toolchain for developing embedded applications where energy properties are first-class citizens, allowing the developer to reflect directly on these properties at the source code level. The analysis capabilities of EnergyAnalyzer are validated across a large number of benchmarks for the two target architectures and the results show that the statically estimated energy consumption has, with a few exceptions, less than 1% difference compared to the underlying empirical energy models which have been validated on real hardware.
@inproceedings{diva2:1784585,
author = {Nunez-Yanez, Jose Luis},
title = {{EnergyAnalyzer: Using Static WCET Analysis Techniques to Estimate the Energy Consumption of Embedded Applications}},
booktitle = {21st International Workshop on Worst-Case Execution Time Analysis (WCET 2023)},
year = {2023},
}
Spade is a new hardware description language which aims to make hardware description easier and less error prone. It does this by taking lessons from software programming languages, and adding language level support for common hardware constructs, all without compromising the low level control over what hardware gets generated.
@inproceedings{diva2:1742758,
author = {Skarman, Frans and Gustafsson, Oscar},
title = {{Spade: An HDL Inspired by Modern Software Languages}},
booktitle = {2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)},
year = {2022},
series = {International Conference on Field Programmable Logic and Applications},
pages = {454--455},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
Energy modelling can enable energy-aware software development and assist the developer in meeting an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, nor are they suitable for static energy consumption estimation. This paper introduces a set of comprehensive energy models for Arm's Cortex-M0 processor, ready to support energy-aware development of edge computing applications using either profiling- or static-analysis-based energy consumption estimation. We use a commercially representative physical platform together with a custom modified Instruction Set Simulator to obtain the physical data and system state markers used to generate the models. The models account for different processor configurations, which all have a significant impact on the execution time and energy consumption of edge computing applications. Unlike existing works, which target a very limited set of applications, all developed models are generated and validated using a very wide range of benchmarks from a variety of emerging IoT application areas, including machine learning, and have a prediction error of less than 5%.
@inproceedings{diva2:1727113,
author = {Nikov, Kris and Georgiou, Kyriakos and Chamski, Zbigniew and Eder, Kerstin and Nunez-Yanez, Jose Luis},
title = {{Accurate Energy Modelling on the Cortex-M0 Processor for Profiling and Static Analysis}},
booktitle = {2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS)},
year = {2022},
pages = {1--4},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
In this paper, we investigate the application of early-exit strategies to quantized neural networks with binarized weights, mapped to low-cost FPGA SoC devices. The increasing complexity of network models means that hardware reuse and heterogeneous execution are needed, and this opens the opportunity to evaluate the prediction confidence level early on. We apply the early-exit strategy to a network model suitable for ImageNet classification that combines weights with floating-point and binary arithmetic precision. The experiments show an improvement in inference speed of around 20% using an early-exit network, compared with using a single primary neural network, with a negligible accuracy drop of 1.56%.
@inproceedings{diva2:1724502,
author = {Kong, Minxuan and Nikov, Kris and Nunez-Yanez, Jose Luis},
title = {{Evaluation of Early-exit Strategies in Low-cost FPGA-based Binarized Neural Networks}},
booktitle = {2022 25th Euromicro Conference on Digital System Design (DSD)},
year = {2022},
pages = {01--08},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
High-precision time-to-digital converters (TDCs) are key components for controlling quantum systems, and FPGAs have gained popularity for this task thanks to their low cost and flexibility compared with Application-Specific Integrated Circuits (ASICs). This paper investigates a novel FPGA-based TDC architecture that combines a wave union launcher and delay lines constructed with DSP blocks. The configuration achieves an 8.07 ps RMS resolution on a low-cost Zynq FPGA with a power usage of only 0.628 W. The low power consumption is achieved thanks to a combination of operating frequency and logic resource usage that are lower than for other methods, such as multi-chain DSP-based TDCs and multi-chain CARRY4-based TDCs.
@inproceedings{diva2:1724501,
author = {Wang, Zijie and Lu, Jiajun and Nunez-Yanez, Jose Luis},
title = {{A Low-complexity FPGA TDC based on a DSP Delay Line and a Wave Union Launcher}},
booktitle = {2022 25th Euromicro Conference on Digital System Design (DSD)},
year = {2022},
series = {EUROMICRO Conference Proceedings},
pages = {101--108},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
Graph processing is an area that has received significant attention in recent years due to the substantial expansion of industries relying on data analytics. Alongside the vital role of finding relations in social networks, graph processing is also widely used in transportation to find optimal routes and in biological networks to analyse sequences. The main bottleneck in graph processing is irregular memory accesses rather than computation intensity. Since computational intensity is not a driving factor, we propose a method to perform graph processing at the edge more efficiently. We believe current cloud computing solutions are still very costly and have latency issues. The results demonstrate the benefits of a dedicated sparse graph processing algorithm compared with dense graph processing when analysing data with low density. As graph datasets grow exponentially, traversal algorithms such as breadth-first search (BFS), fundamental to many graph processing applications and metrics, become more costly to compute. Our work focuses on reviewing other implementations of breadth-first search algorithms designed for low-power systems and proposing our solution, which utilises advanced enhancements to achieve up to 9.2x better performance in terms of MTEPS compared to other state-of-the-art solutions, with a power usage of 2.32 W.
@inproceedings{diva2:1724472,
author = {Olgu, Kaan and Nikov, Kris and Nunez-Yanez, Jose Luis},
title = {{Analysis of Graph Processing in Reconfigurable Devices for Edge Computing Applications}},
booktitle = {2022 25th Euromicro Conference on Digital System Design (DSD)},
year = {2022},
series = {EUROMICRO Conference Proceedings},
pages = {16--23},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
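For context, the breadth-first search traversal that dominates such graph workloads is, in its simplest software form, a level-synchronous frontier expansion; the MTEPS figure above then measures millions of traversed edges per second. A plain Python sketch of the kernel, not the accelerated implementation:

```python
from collections import deque

def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency list; returns the distance of
    each vertex from source, or -1 for unreachable vertices."""
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if dist[v] == -1:          # first visit decides the BFS level
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist
```

The irregular, data-dependent accesses into `adj` and `dist` are exactly the memory bottleneck the abstract refers to.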
Approaches to the shift-and-add realization of time-domain chromatic dispersion compensation FIR filters are considered. The coefficient word length has a larger impact than the filter length on both the BER penalty and the adder complexity.
@inproceedings{diva2:1710277,
author = {Gustafsson, Oscar and Bae, Cheolyong},
title = {{Shift-and-Add Realization Trade-Offs for Chromatic Dispersion Compensation FIR Filters}},
booktitle = {Optica Advanced Photonics Congress 2022},
year = {2022},
publisher = {Optical Society of America},
}
In this paper, we investigate the application of early-exit strategies to fully quantized neural networks, mapped to low-complexity FPGA SoC devices. The challenge of an accuracy drop with a low-bitwidth quantized first convolutional layer and fully connected layers has been resolved. We apply an early-exit strategy to a network model that combines weights and activations with extremely low-bitwidth and binary arithmetic precision, based on the ImageNet dataset. We use entropy calculations to decide which branch of the early-exit network to take. The experiments show an improvement in inference speed of 1.52x using an early-exit system, compared with using a single primary neural network, with a slight accuracy decrease of 1.64%.
@inproceedings{diva2:1709087,
author = {Kong, Minxuan and Nunez-Yanez, Jose Luis},
title = {{Entropy-Based Early-Exit in a FPGA-Based Low-Precision Neural Network}},
booktitle = {Applied Reconfigurable Computing. Architectures, Tools, and Applications},
year = {2022},
series = {Lecture Notes in Computer Science},
volume = {13569},
pages = {72--86},
publisher = {Springer Nature},
}
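The entropy-based exit decision described above can be sketched in a few lines: compute the normalized entropy of the softmax distribution at the early branch and exit when it falls below a threshold. The threshold value here is an arbitrary illustration, not the one used in the paper:

```python
import math

def softmax_entropy(logits):
    """Normalized entropy (in [0, 1]) of the softmax distribution over logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # shift by max for stability
    s = sum(exps)
    probs = [v / s for v in exps]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(logits))

def take_early_exit(logits, threshold=0.5):
    """Exit at the early branch when the prediction is confident enough
    (low entropy). The threshold is a hypothetical example value."""
    return softmax_entropy(logits) < threshold
```

A confidently peaked output (one dominant logit) yields near-zero normalized entropy and triggers the exit; a near-uniform output yields entropy close to 1 and falls through to the full network.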
In this work, implementation trade-offs for the ASIC implementation of least-mean-square (LMS) and block LMS (BLMS) adaptive filters are presented. We explore the design trade-offs by increasing the block size and/or relying on the synthesis tool for an increased sample rate. For area, a smaller block size is advantageous as long as the synthesis tool can meet timing. The energy optimum is, however, found at a different point in the design space. Simulation confirms that larger block sizes lead to lower MSE for an identical step size. Hence, the design point should be decided based on weighted requirements for area, energy and MSE.
@inproceedings{diva2:1698989,
author = {Khan, Mohd Tasleem and Gustafsson, Oscar},
title = {{ASIC Implementation Trade-Offs for High-Speed LMS and Block LMS Adaptive Filters}},
booktitle = {65th International Midwest Symposium on Circuits and Systems (MWSCAS)},
year = {2022},
series = {Midwest Symposium on Circuits and Systems. Conference Proceedings},
pages = {1--4},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Fukuoka, Japan},
}
Data-driven fault diagnosis requires training data that is representative of the different operating conditions of the system to capture its behavior. If training data is limited, one solution is to incorporate physical insights into machine learning models to improve their effectiveness. However, while previous works show the usefulness of hybrid approaches for the isolation of faults, the impact of training data must be taken into consideration when drawing conclusions from data-driven residuals in a consistency-based diagnosis framework. By giving an understanding of the physical interaction between the signals, a hybrid fault diagnosis approach can enforce model properties of residual generators to isolate faults that are not represented in training data. The objective of this work is to analyze the impact of limited training data when training neural network-based residual generators. It is also investigated how the use of structural information when selecting the network structure can address limited training data and improve the performance of hybrid approaches in the face of this challenge.
@inproceedings{diva2:1693759,
author = {Mohammadi, Arman and Krysander, Mattias and Jung, Daniel},
title = {{Analysis of grey-box neural network-based residuals for consistency-based fault diagnosis}},
booktitle = {11th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes SAFEPROCESS 2022. Pafos, Cyprus, 8-10 June 2022},
year = {2022},
series = {IFAC papers online},
volume = {6},
pages = {1--6},
publisher = {Elsevier},
}
With trends such as IoT and increased connectivity, the availability of data is consistently increasing and its automated processing with, e.g., machine learning becomes more important. This is certainly true for the area of fault diagnostics and prognostics. However, for rare events like faults, the availability of meaningful data will stay inherently sparse, making a pure data-driven approach more difficult. In this paper, the question of when to use model-based techniques, data-driven techniques, or a combined approach for fault diagnosis is discussed using real-world data from a permanent magnet synchronous machine. Key properties of the different approaches are discussed in a diagnosis context, performance is quantified, and the benefits of a combined approach are demonstrated.
@inproceedings{diva2:1693751,
author = {Frisk, Erik and Jarmolowitz, Fabian and Jung, Daniel and Krysander, Mattias},
title = {{Fault Diagnosis Using Data, Models, or Both -- An Electrical Motor Use-Case}},
booktitle = {11th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes SAFEPROCESS 2022. Pafos, Cyprus, 8-10 June 2022},
year = {2022},
series = {IFAC papers online},
pages = {533--538},
publisher = {Elsevier},
}
Finite word length effects for the frequency-domain implementation of chromatic dispersion compensation are analyzed. The results show a significant difference for the different factors when it comes to power consumption and receiver penalty.
@inproceedings{diva2:1636886,
author = {Bae, Cheolyong and Gustafsson, Oscar},
title = {{Finite Word Length Analysis for FFT-Based Chromatic Dispersion Compensation Filters}},
booktitle = {Signal Processing in Photonic Communications 2021},
year = {2021},
publisher = {OPTICA},
}
In this work, the effect of latency for three different positive-definite matrix inversion algorithms when implemented on parallel and pipelined processing elements is considered. The work is motivated by the fact that in a massive MIMO system, matrix inversion needs to be performed between estimating the channels and producing the transmitted downlink signal, which means that the latency of the matrix inversion has a significant impact on the system performance. It is shown that, despite the algorithms having different complexity, each of the three algorithms can have the lowest latency for different numbers of processing elements and pipeline levels. In particular, in systems with many processing elements, the algorithm with the highest complexity has the lowest latency.
@inproceedings{diva2:1636880,
author = {Bertilsson, Erik and Ingemarsson, Carl and Gustafsson, Oscar},
title = {{Low-Latency Parallel Hermitian Positive-Definite Matrix Inversion for Massive MIMO}},
booktitle = {2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021)},
year = {2021},
series = {IEEE Workshop on Signal Processing Systems},
pages = {23--28},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
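As background, a standard route to inverting a Hermitian positive-definite matrix, and a natural target for the parallel processing elements discussed above, is Cholesky factorization followed by two triangular solves. A generic NumPy sketch (an illustration only, not one of the three algorithms compared in the paper):

```python
import numpy as np

def hpd_inverse(A):
    """Invert a Hermitian positive-definite matrix via A = L L^H:
    solve L Y = I (forward substitution) and then L^H X = Y (back
    substitution). np.linalg.solve is used here for brevity; a hardware
    implementation would exploit the triangular structure directly."""
    L = np.linalg.cholesky(A)
    I = np.eye(A.shape[0], dtype=A.dtype)
    Y = np.linalg.solve(L, I)
    return np.linalg.solve(L.conj().T, Y)
```

The dependency chains in the factorization and the substitutions are what determine how well latency scales with the number of processing elements and pipeline stages.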
Floating-point numbers represented using a hidden one can readily be approximately converted to the logarithmic domain using Mitchell's approximation. Once in the logarithmic domain, several arithmetic operations, including multiplication, division, and square root, can easily be computed using the integer arithmetic unit. This has earlier been used in fast reciprocal square-root algorithms, sometimes referred to as magic number algorithms. The proposed approximate operations are realized by performing an integer operation on floating-point data using an integer unit and adding an integer constant to obtain the approximate floating-point result. In this work, we derive easy-to-use equations and constants for multiple floating-point formats and operations.
@inproceedings{diva2:1636876,
author = {Gustafsson, Oscar and Hellman, Noah},
title = {{Approximate Floating-Point Operations with Integer Units by Processing in the Logarithmic Domain}},
booktitle = {2021 IEEE 28th Symposium on Computer Arithmetic (ARITH)},
year = {2021},
pages = {45--52},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
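The best-known instance of the magic-number idea above is the fast reciprocal square root for binary32, with the constant 0x5f3759df: reinterpret the float bits as an integer, shift and subtract from the constant, then optionally refine with one Newton-Raphson step. A Python sketch of this classic special case (the paper derives constants for many more formats and operations):

```python
import struct

def f2i(x):
    """Reinterpret a binary32 float's bit pattern as a uint32."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def i2f(i):
    """Reinterpret a uint32 bit pattern as a binary32 float."""
    return struct.unpack('<f', struct.pack('<I', i & 0xFFFFFFFF))[0]

def approx_rsqrt(x):
    """Approximate 1/sqrt(x): integer shift-and-subtract in the 'logarithmic
    domain' of the float bits, plus one Newton-Raphson refinement."""
    y = i2f(0x5f3759df - (f2i(x) >> 1))
    return y * (1.5 - 0.5 * x * y * y)
```

With the single Newton step the relative error stays below roughly 0.2% over the normal range, which is why the trick survives in latency-critical code to this day.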
This work presents a method for on-line condition monitoring of a hydraulic rock drill, though some of the findings can likely be applied in other applications. A fundamental difficulty for the rock drill application is discussed, namely the similarity between the frequencies of internal standing waves and rock drill operation. This results in unpredictable pressure oscillations and superposition, which makes synchronization between measurement and model difficult. To overcome this, a data-driven approach is proposed. The number and types of sensors are restricted due to harsh environmental conditions, and only operational data is available. Some faults are shown to be detectable using hand-crafted engineering features with a direct physical connection to the fault of interest. Such features are easily interpreted and are shown to be robust against disturbances. Other faults are detected by classifying measured signals against a known reference. Dynamic Time Warping is shown to be an efficient way to measure similarity for cyclic signals with stochastic elements from disturbances, wave propagation and different durations, and also for cases with very small differences in measured pressure signals. Together, the two methods enable a step towards condition monitoring of a rock drill, robustly detecting very small changes in behaviour using a minimum number of sensors.
@inproceedings{diva2:1613877,
author = {Jakobsson, Erik and Frisk, Erik and Krysander, Mattias and Pettersson, Robert},
title = {{Fault Identification in Hydraulic Rock Drills from Indirect Measurement During Operation}},
booktitle = {IFAC PAPERSONLINE},
year = {2021},
pages = {73--78},
publisher = {ELSEVIER},
}
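Dynamic Time Warping, used above to compare cyclic pressure signals against a known reference, is a dynamic program over all monotone alignments of two sequences. A textbook sketch, not the authors' implementation:

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two 1-D sequences,
    allowing elastic stretching and compression along the time axis."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The elasticity is exactly what makes the measure robust to cycles of slightly different durations, as in the rock drill signals.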
Overlap-save and overlap-add methods enable efficient implementation of FIR filters. In this paper, a compact method for handling the overlap and shuffle of samples for real-time processing using pipelined FFT architectures is presented. It is suitable for cases when the sample rate is equal to or higher than the clock frequency.
@inproceedings{diva2:1593074,
author = {Bae, Cheolyong and Gustafsson, Oscar},
title = {{Overlap-Save Commutators for High-Speed Streaming Data Filtering}},
booktitle = {2021 IEEE International Symposium on Circuits and Systems (ISCAS)},
year = {2021},
series = {IEEE International Symposium on Circuits and Systems (ISCAS)},
publisher = {IEEE},
}
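The overlap-save scheme that the proposed commutators feed operates on FFT blocks that overlap by M - 1 samples (the filter length minus one), discarding the circularly corrupted prefix of each inverse FFT. A plain NumPy sketch of the block processing (the FFT size and signal lengths are illustrative):

```python
import numpy as np

def overlap_save(x, h, nfft):
    """Causal FIR filtering of x with h via the overlap-save method.
    Each FFT block of nfft samples yields nfft - len(h) + 1 valid outputs."""
    M = len(h)
    L = nfft - M + 1                    # new samples consumed per block
    H = np.fft.fft(h, nfft)
    x_pad = np.concatenate([np.zeros(M - 1), x])  # prime the overlap
    y = []
    for start in range(0, len(x), L):
        block = x_pad[start:start + nfft]
        if len(block) < nfft:           # zero-pad the final partial block
            block = np.concatenate([block, np.zeros(nfft - len(block))])
        Y = np.fft.ifft(np.fft.fft(block) * H)
        y.append(np.real(Y[M - 1:]))    # drop the circularly corrupted prefix
    return np.concatenate(y)[:len(x)]
```

The commutator described in the paper implements the `x_pad[start:start + nfft]` overlap and the sample shuffling in hardware, for the streaming case where samples arrive faster than the clock.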
We perform exploratory ASIC design of key DSP and FEC units for 400-Gbit/s coherent data-center interconnect receivers. In 22-nm CMOS, the considered units together dissipate 5 W, suggesting implementation feasibility in power-constrained form factors.
@inproceedings{diva2:1588494,
author = {Fougstedt, Christoffer and Gustafsson, Oscar and Bae, Cheolyong and Borjeson, Erik and Larsson-Edefors, Per},
title = {{ASIC Design Exploration for DSP and FEC of 400-Gbit/s Coherent Data-Center Interconnect Receivers}},
booktitle = {2020 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXPOSITION (OFC)},
year = {2020},
publisher = {IEEE},
}
When optimising the vehicle trajectory and powertrain energy management of hybrid electric vehicles, it is important to include look-ahead information such as road conditions and other traffic. One method for doing so is dynamic programming, but the execution time of such an algorithm on a general-purpose CPU is too slow for it to be usable in real time. Significant improvements in execution time can be achieved by utilising parallel computations, for example, using a Field-Programmable Gate Array (FPGA). A tool for automatically converting a vehicle model written in C++ into code that can be executed on an FPGA for dynamic programming-based control is presented in this paper. A vehicle model with a mild-hybrid powertrain is used as a case study to evaluate the developed tool and the output quality and execution time of the resulting hardware.
@inproceedings{diva2:1574085,
author = {Skarman, Frans and Gustafsson, Oscar and Jung, Daniel and Krysander, Mattias},
title = {{A Tool to Enable FPGA-Accelerated Dynamic Programming for Energy Management of Hybrid Electric Vehicles}},
booktitle = {IFAC PAPERSONLINE},
year = {2020},
pages = {15104--15109},
publisher = {ELSEVIER},
}
The life of a vehicle is heavily influenced by how it is used, and usage information is critical to predict the future condition of the machine. In this work we present a method to categorize what task an earthmoving vehicle is performing, based on a data-driven model and a single standalone accelerometer. By training a convolutional neural network using a couple of weeks of labeled data, we show that a three-axis accelerometer is sufficient to correctly classify between 5 different classes with an accuracy over 96% for a balanced dataset, with no manual feature generation. The results are also compared against some other machine learning techniques, showing that the convolutional neural network has the highest performance, although other techniques are not far behind. An important conclusion is that methods and ideas from the area of Human Activity Recognition (HAR) are also applicable to vehicles.
@inproceedings{diva2:1572002,
author = {Jakobsson, Erik and Frisk, Erik and Krysander, Mattias and Pettersson, R.},
title = {{Automated Usage Characterization of Mining Vehicles For Life Time Prediction}},
booktitle = {IFAC PAPERSONLINE},
year = {2020},
pages = {11950--11955},
publisher = {ELSEVIER},
}
An implementation of activity detection for grant-free massive machine-type communication is presented. The implemented algorithm is based on coordinate descent, which exhibits a rapid convergence time. A number of modifications to the original algorithm are proposed to allow efficient implementation in hardware. In addition, the implementation is based on fixed-point representation, and, hence, exhaustive word-length simulations have been performed for the different processing steps.
@inproceedings{diva2:1562575,
author = {Henriksson, Mikael and Gustafsson, Oscar and Kunnath Ganesan, Unnikrishnan and Larsson, Erik G.},
title = {{An Architecture for Grant-Free Random Access Massive Machine Type Communication Using Coordinate Descent}},
booktitle = {Proceedings of Fifty-Fourth Asilomar Conference on Signals, Systems and Computers},
year = {2020},
series = {Asilomar Conference on Signals, Systems and Computers},
volume = {54},
pages = {1112--1116},
publisher = {IEEE},
address = {Pacific Grove, CA, USA},
}
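The detector above is built around coordinate descent for a non-negative least squares (NNLS) problem. As a rough floating-point sketch of that core computation (the paper's implementation is fixed-point and hardware-specific; the function name and loop structure here are illustrative only):

```python
import numpy as np

def nnls_coordinate_descent(A, b, iters=100):
    """Solve min ||A x - b||^2 subject to x >= 0 with cyclic coordinate descent.
    Each coordinate has a closed-form minimizer, clipped to be non-negative."""
    n = A.shape[1]
    x = np.zeros(n)
    G = A.T @ A              # Gram matrix, computed once
    c = A.T @ b
    for _ in range(iters):
        for j in range(n):
            # correlation with coordinate j, excluding its own contribution
            r = c[j] - G[j] @ x + G[j, j] * x[j]
            x[j] = max(0.0, r / G[j, j])
    return x
```

Each coordinate update is a multiply-accumulate followed by a clip, which is what makes this style of algorithm attractive for a hardware datapath.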
This paper presents a method to enhance fault isolation, without adding physical sensors, on a turbocharged spark-ignited petrol engine system by designing additional residuals from an initial observer-based residual setup. The best candidates from all potential additional residuals are selected using the concept of sequential residual generation to ensure the best fault isolation performance for the least number of additional residuals required. A simulation testbed is used to generate realistic engine data for the design of the additional residuals, and the fault isolation performance is verified using a structural analysis method.
@inproceedings{diva2:1555427,
author = {Ng, Kok Yew and Frisk, Erik and Krysander, Mattias},
title = {{Design and Selection of Additional Residuals to Enhance Fault Isolation of a Turbocharged Spark Ignited Engine System}},
booktitle = {2020 7TH INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT20), VOL 1},
year = {2020},
series = {International Conference on Control Decision and Information Technologies},
pages = {76--81},
publisher = {IEEE},
}
Prime factor algorithms are beneficial in fully parallel frequency-domain implementations of CDC filters and enable a more continuous scaling of filter lengths. ASIC implementation results in 28-nm CMOS for 60 GBd are provided.
@inproceedings{diva2:1516425,
author = {Bae, Cheolyong and Larsson-Edefors, Per and Gustafsson, Oscar},
title = {{Benefit of Prime Factor FFTs in Fully Parallel 60 GBaud CDC Filters}},
booktitle = {OSA Advanced Photonics Congress (AP) 2020 (IPR, NP, NOMA, Networks, PVLED, PSC, SPPCom, SOF), Washington, DC United States, 13--16 July 2020},
year = {2020},
}
Chromatic dispersion is one of the error sources limiting the transmission capacity in coherent optical communication that can be mitigated with digital signal processing. In this paper, the current status and plans of implementation of chromatic dispersion compensation (CDC) filters on FPGAs are discussed. As these high-speed filters are most efficiently implemented in the frequency-domain, different approaches for high-speed FFT-based architectures are considered and preliminary results of fully parallel FFT implementation by utilizing FPGA hardware features are presented.
@inproceedings{diva2:1516412,
author = {Bae, Cheolyong and Gustafsson, Oscar},
title = {{High-Speed Chromatic Dispersion Compensation Filtering in FPGAs for Coherent Optical Communication}},
booktitle = {2020 30th International Conference on Field-Programmable Logic and Applications (FPL)},
year = {2020},
series = {International Conference on Field-Programmable Logic and Applications (FPL)},
pages = {357--358},
publisher = {IEEE},
}
By running simulation models on FPGAs, their execution speed can be significantly improved, at the cost of increased development effort. This paper describes a project to develop a tool which converts simulation models written in high level languages into fast FPGA hardware. The tool currently converts code written using custom C++ data types into Verilog. A model of a hybrid electric vehicle is used as a case study, and the resulting hardware runs significantly faster than on a general purpose CPU.
@inproceedings{diva2:1500582,
author = {Skarman, Frans and Gustafsson, Oscar and Jung, Daniel and Krysander, Mattias},
title = {{Acceleration of Simulation Models Through Automatic Conversion to FPGA Hardware}},
booktitle = {2020 30th International Conference on Field-Programmable Logic and Applications (FPL)},
year = {2020},
pages = {359--360},
publisher = {IEEE},
}
We perform exploratory ASIC design of key DSP and FEC units for 400-Gbit/s coherent data-center interconnect receivers. In 22-nm CMOS, the considered units together dissipate 5 W, suggesting implementation feasibility in power-constrained form factors.
@inproceedings{diva2:1484035,
author = {Fougstedt, Christoffer and Gustafsson, Oscar and Bae, Cheolyong and Börjesson, Erik and Larsson-Edefors, Per},
title = {{ASIC Design Exploration for DSP and FEC of 400-Gbit/s Coherent Data-Center Interconnect Receivers}},
booktitle = {Optical Fiber Communications Conference and Exhibition (OFC), San Diego, California, United States, 8--12 March, 2020},
year = {2020},
publisher = {Optical Society of America},
}
In this work, an implementation of a pilot-hopping sequence detector for massive machine-type communication is presented. The architecture is based on the solution of a non-negative least squares problem. The results show that the architecture, supporting 1024 users, can perform more than one million detections per second with a power consumption of less than 70 mW when implemented in a 28 nm FD-SOI process.
@inproceedings{diva2:1471714,
author = {Mohammadi Sarband, Narges and Becirovic, Ema and Krysander, Mattias and Larsson, Erik G. and Gustafsson, Oscar},
title = {{Pilot-Hopping Sequence Detection Architecture for Grant-Free Random Access using Massive MIMO}},
booktitle = {2020 IEEE International Symposium on Circuits and Systems (ISCAS)},
year = {2020},
series = {International Symposium on Circuits and Systems (ISCAS)},
publisher = {IEEE},
}
In this work, a processing architecture for grant-free machine-type communication based on compressive sensing is proposed. The architecture can be adapted via a number of parameters. An instantiation for 128 terminals and 96 antennas is implemented. Without memories, it consumes 1.52 W and occupies an area of 5.1 mm^2 in a 28 nm SOI CMOS process. The implemented instance can process about 10k messages per second, each containing four bits.
@inproceedings{diva2:1459392,
author = {Tran, Markus and Gustafsson, Oscar and Källström, Petter and Senel, Kamil and Larsson, Erik G},
title = {{An Architecture for Grant-Free Massive MIMO MTC Based on Compressive Sensing}},
booktitle = {CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS \& COMPUTERS},
year = {2019},
series = {Conference Record of the Asilomar Conference on Signals Systems and Computers},
pages = {901--905},
publisher = {IEEE},
}
Modern applications for DSP systems are increasingly constrained by tight area and power requirements. Therefore, it is imperative to analyze effective strategies that work within these requirements. This paper studies the impact of finite word-length arithmetic on the signal-to-quantization-noise ratio (SQNR), power and area for a real-valued serial FFT implementation. An experiment is set up using a hardware description language (HDL) to empirically determine the tradeoffs that the following parameters have on performance (SQNR), power and area: (i) the input word-length, (ii) the word-length of the rotation coefficients, and (iii) the length of the FFT. The results of this paper can be used to make design decisions by careful selection of word-lengths to achieve a reduction in area and power for an acceptable loss in SQNR.
@inproceedings{diva2:1355900,
author = {Unnikrishnan, Nanda K. and Garrido Gálvez, Mario and Parhi, Keshab K.},
title = {{Effect of Finite Word-Length on SQNR, Area and Power for Real-Valued Serial FFT}},
booktitle = {2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)},
year = {2019},
series = {IEEE International Symposium on Circuits and Systems},
publisher = {IEEE},
}
Predictive maintenance of components has the potential to significantly reduce costs for maintenance and to reduce unexpected failures. Failure prognostics for heavy-duty truck lead-acid batteries is considered with a multilayer perceptron (MLP) predictive model. Data used in the study contain information about how approximately 46,000 vehicles have been operated from the delivery date until the date when they come to the workshop. The model estimates a reliability and lifetime probability function for a vehicle entering a workshop. First, this work demonstrates how heterogeneous data is handled, then the architectures of the MLP models are discussed. The main contributions are a battery maintenance planning method and predictive performance evaluation based on reliability and lifetime functions, a new model for the reliability function when its true shape is unknown, an improved objective function for training MLP models, and the handling of imbalanced data together with a comparison of the performance of different neural network architectures. The evaluation shows significant improvements of the model compared to simpler, time-based maintenance plans.
@inproceedings{diva2:1388334,
author = {Voronov, Sergii and Frisk, Erik and Krysander, Mattias},
title = {{Lead-acid battery maintenance using multilayer perceptron models}},
booktitle = {2018 IEEE International Conference on Prognostics and Health Management (ICPHM)},
year = {2018},
pages = {1--8},
}
In optical communication, the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices, and that two overlapping FFTs are not required to obtain continuous filtering. In addition, efficient highly parallel implementation of FFTs is discussed and an improved FFT compared to our earlier work is proposed. The results are compared to an approach using a shorter FFT and FIR filters.
@inproceedings{diva2:1332814,
author = {Bae, Cheolyong and Gokhale, Madhur and Gustafsson, Oscar and Garrido Gálvez, Mario},
title = {{Improved Implementation Approaches for 512-tap 60 GSa/s Chromatic Dispersion FIR Filters}},
booktitle = {2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS},
year = {2018},
series = {Conference Record of the Asilomar Conference on Signals Systems and Computers},
pages = {213--217},
publisher = {IEEE},
}
Massive MIMO is a key technology for the upcoming fifth-generation cellular networks (5G), promising high spectral efficiency, low power consumption, and the use of cheap hardware to reduce costs. Previous work has shown how to create a distributed processing architecture, where each node in a network performs the computations related to one or more antennas. The required total number of antennas, M, at the base station depends on the number of simultaneously operating terminals, K. In this work, a flexible node architecture is presented, where the number of terminals can be traded for additional antennas at the same node. This means that the same node can be used with a wide range of system configurations. The computational complexity, along with the order in which to compute incoming and outgoing symbols, is explored.
@inproceedings{diva2:1332801,
author = {Bertilsson, Erik and Gustafsson, Oscar and Larsson, Erik G},
title = {{A Modular Base Station Architecture for Massive MIMO with Antenna and User Scalability per Processing Node}},
booktitle = {2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS},
year = {2018},
series = {Conference Record of the Asilomar Conference on Signals Systems and Computers},
pages = {1649--1653},
publisher = {IEEE},
}
@inproceedings{diva2:1306311,
author = {Touqir Pasha, Muhammad and Haque, Muhammad Fahim Ul and Ahmad, Jahanzeb and Johansson, Ted},
title = {{An All-Digital Polar PWM Transmitter}},
booktitle = {Gigahertz 2018 symposium, Lund, Sweden, May 24-25, 2018},
year = {2018},
}
This paper presents an area-efficient fast Fourier transform (FFT) processor for orthogonal frequency-division multiplexing systems based on multi-path delay commutator architecture. This paper proposes a data scheduling scheme to reduce the number of complex constant multipliers. The proposed mixed-radix multi-path delay commutator FFT processor can support 128-, 256-, and 512-point FFT sizes. The proposed processor was synthesized using the Samsung 65-nm CMOS standard cell library. The proposed processor with eight parallel data paths can achieve a high throughput rate of up to 2.64 GSample/s at 330 MHz.
@inproceedings{diva2:1291938,
author = {Jang, Jeong Keun and Kim, Ho Keun and Sunwoo, Myung Hoon and Gustafsson, Oscar},
title = {{Area-Efficient Scheduling Scheme Based FFT Processor for Various OFDM Systems}},
booktitle = {2018 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2018)},
year = {2018},
pages = {338--341},
publisher = {IEEE},
}
In this work, an approach for transposing solutions to the multiple constant multiplication (MCM) problem to obtain a sum of products (SOP) computation with minimum depth is proposed. The reason for doing this is that solving the SOP problem directly is highly computationally intensive when adder graph algorithms are used. It is shown that the proposed approach, as expected, results in lower complexity for the SOP than directly applying subexpression sharing algorithms, which have a lower computational complexity, to the SOP problem. It is also shown that there is no obvious way to construct the MCM solution such that the SOP solution has the minimum theoretical depth. However, the proposed approach guarantees minimum depth subject to the MCM solution given as input.
@inproceedings{diva2:1262172,
author = {Mohammadi Sarband, Narges and Gustafsson, Oscar and Garrido, Mario},
title = {{Obtaining Minimum Depth Sum of Products from Multiple Constant Multiplication}},
booktitle = {PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), IEEE},
year = {2018},
series = {IEEE Workshop on Signal Processing Systems},
pages = {134--139},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
To leverage model-based engineering for fault diagnosis, it is useful to be able to do direct analysis of general-purpose modelling languages for engineering systems. In this work, it is demonstrated how non-trivial Modelica models, for example utilizing the Modelica standard library, can be automatically transformed into a format where existing fault diagnosis analysis techniques are applicable. The procedure is demonstrated on a model of an air cooling system in the Gripen fighter aircraft developed by Saab, Sweden. It is discussed why the Modelica language is well suited for diagnosability analysis, and a number of non-trivial diagnosability analyses show the efficacy of the approach. The methods extract the model structure, which gives additional insight into the system, e.g., highlighting model connections and possible model decompositions. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
@inproceedings{diva2:1262058,
author = {Krysander, Mattias and Frisk, Erik and Lind, Ingela and Nilsson, Ylva},
title = {{Diagnosis Analysis of Modelica Models}},
booktitle = {IFAC PAPERSONLINE},
year = {2018},
series = {IFAC papers online},
pages = {153--159},
publisher = {ELSEVIER SCIENCE BV},
}
A common architecture of model-based diagnosis systems is to use a set of residuals to detect and isolate faults. In the paper it is motivated that in many cases there are more candidate residuals than needed for detection and single-fault isolation, and that key sources of varying performance among the candidate residuals are model errors and noise. This paper formulates a systematic method for selecting, from a set of candidate residuals, a subset with good diagnosis performance. A key contribution is the combination of a machine learning model, here a random forest model, with diagnosis-specific performance specifications to select a high-performing subset of residuals. The approach is applied to an industrial use case, an automotive engine, and it is shown how the trade-off between diagnosis performance and the number of residuals can easily be controlled. The number of residuals used is reduced from the original 42 to only 12 without losing significant diagnosis performance. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
@inproceedings{diva2:1262059,
author = {Frisk, Erik and Krysander, Mattias},
title = {{Residual Selection for Consistency Based Diagnosis Using Machine Learning Models}},
booktitle = {IFAC PAPERSONLINE},
year = {2018},
series = {IFAC papers online},
pages = {139--146},
publisher = {ELSEVIER SCIENCE BV},
}
This work presents an extension of Karatsuba's method to efficiently use rectangular multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work are the embedded 18x25-bit signed multipliers found in the DSP blocks of recent Xilinx FPGAs: the traditional Karatsuba approach must under-use them as square 18x18 ones. This work shows that rectangular multipliers can be efficiently exploited in a modified Karatsuba method if their input word sizes have a large greatest common divisor. In the Xilinx FPGA case, this can be obtained by using the embedded multipliers as 16x24 unsigned and as 17x25 signed ones. The obtained architectures are implemented with due attention to architectural features such as the pre-adders and post-adders available in Xilinx DSP blocks. They are synthesized and compared with traditional Karatsuba, but also with (non-Karatsuba) state-of-the-art tiling techniques that make use of the full rectangular multipliers. The proposed technique improves resource consumption and performance for multipliers of numbers larger than 64 bits.
@inproceedings{diva2:1245437,
author = {Kumm, Martin and Gustafsson, Oscar and de Dinechin, Florent and Kappauf, Johannes and Zipf, Peter},
title = {{Karatsuba with Rectangular Multipliers for FPGAs}},
booktitle = {2018 IEEE 25TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH)},
year = {2018},
series = {International Symposium on Computer Arithmetic},
pages = {13--20},
publisher = {IEEE},
}
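To illustrate the splitting idea behind Karatsuba-based large multipliers, here is a minimal software sketch that recurses down to a fixed-width base multiplier. Note this is the classic square-split Karatsuba on integers; the paper's contribution, the rectangular 16x24/17x25 split mapped to DSP blocks, is not reproduced here:

```python
def karatsuba(x, y, w=16):
    """Multiply non-negative integers by Karatsuba recursion, stopping when
    both operands fit a w-bit 'hardware' base multiplier."""
    if x < (1 << w) and y < (1 << w):
        return x * y                                  # base multiplier
    h = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> h, x & ((1 << h) - 1)               # split at bit h
    y1, y0 = y >> h, y & ((1 << h) - 1)
    z2 = karatsuba(x1, y1, w)
    z0 = karatsuba(x0, y0, w)
    z1 = karatsuba(x1 + x0, y1 + y0, w) - z2 - z0     # three products, not four
    return (z2 << (2 * h)) + (z1 << h) + z0
```

The key saving is that each recursion level uses three base products instead of four, at the cost of extra additions; the pre-adders in DSP blocks absorb part of that cost.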
The life and condition of a MT65 mine truck frame is to a large extent related to how the machine is used. Damage from different stress cycles in the frame is accumulated over time, and measurements throughout the life of the machine are needed to monitor the condition. This results in high demands on the durability of the sensors used. To make a monitoring system cheap and robust enough for a mining application, a small number of robust sensors are preferred over a multitude of local sensors such as strain gauges. The main question to be answered is whether a low number of robust on-board sensors can give the required information to recreate stress signals at various locations of the frame. The choice of sensors, among many different locations and kinds, is also considered. A final question is whether the data could also be used to estimate road condition. Using accelerometer, gyroscope and strain gauge data from field tests of an Atlas Copco MT65 mine truck, coherence and Lasso regression were evaluated as means to select which signals to use. ARX models for stress estimation were created using the same data. By simulating stress signals using the models, rain flow counting and damage accumulation calculations were performed. The results showed that a low number of on-board sensors such as accelerometers and gyroscopes could give enough information to recreate some of the measured stress signals. Together with a linear model, the estimated stress was accurate enough to evaluate the accumulated fatigue damage in a mining truck. The accumulated damage was also used to estimate the condition of the road on which the truck was traveling. To make a useful road monitoring system, some more work is required, in particular regarding how vehicle speed influences damage accumulation.
@inproceedings{diva2:1259820,
author = {Jakobsson, Erik and Frisk, Erik and Pettersson, Robert and Krysander, Mattias},
title = {{Data driven modeling and estimation of accumulated damage in mining vehicles using on-board sensors}},
booktitle = {PHM 2017. Proceedings of the Annual Conference of the Prognostics and Health Management Society 2017, St. Petersburg, Florida, USA, October 2--5, 2017},
year = {2017},
series = {Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM},
pages = {98--107},
publisher = {Prognostics and Health Management Society},
}
Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann method is higher than for the lowest-complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By only partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that, when limiting the total number of operations performed, the BER performance of the three different preconditioning matrices is the same.
@inproceedings{diva2:1248917,
author = {Bertilsson, Erik and Gustafsson, Oscar and Larsson, Erik G.},
title = {{Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO}},
booktitle = {2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS},
year = {2017},
pages = {466--469},
}
In optical communication the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. Furthermore, the theoretical multiplication complexity estimates are not good predictors for the energy consumption. The results show that the implementation with 10% more multiplications per sample has half the power consumption and one third of the area consumption. The best architecture for this specification results in a power consumption of 3.12 W in a 65 nm technology, corresponding to an energy per complex filter tap of 0.10 mW/GHz.
@inproceedings{diva2:1245365,
author = {Kovalev, Anton and Gustafsson, Oscar and Garrido, Mario},
title = {{Implementation approaches for 512-tap 60 GSa/s chromatic dispersion FIR filters}},
booktitle = {Conference Record of The Fifty-First Asilomar Conference on Signals, Systems \& Computers},
year = {2017},
series = {Signals, Systems, and Computers},
volume = {2017},
pages = {1779--1783},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
To facilitate the use of advanced fault diagnosis analysis and design techniques on industrial-sized systems, there is a need for computer support. This paper describes a Matlab toolbox and evaluates the software on a challenging industrial problem, air-path diagnosis in an automotive engine. The toolbox includes tools for analysis and design of model-based diagnosis systems for large-scale differential algebraic models. The software package supports a complete tool-chain from modeling a system to generating C-code for residual generators. Major design steps supported by the tool are modeling, fault diagnosability analysis, sensor selection, residual generator analysis, test selection, and code generation. Structural methods based on efficient graph-theoretical algorithms are used in several steps. In the automotive diagnosis example, a diagnosis system is generated and evaluated using measurement data, both in fault-free operation and with faults injected in the control loop. The results clearly show the benefit of the toolbox in a model-based design of a diagnosis system. The latest version of the toolbox can be downloaded at faultdiagnosistoolbox.github.io. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
@inproceedings{diva2:1205862,
author = {Frisk, Erik and Krysander, Mattias and Jung, Daniel},
title = {{A Toolbox for Analysis and Design of Model Based Diagnosis Systems for Large Scale Models}},
booktitle = {IFAC PAPERSONLINE},
year = {2017},
series = {IFAC Papers Online},
pages = {3287--3293},
publisher = {ELSEVIER SCIENCE BV},
}
The charge-sustaining mode of a hybrid electric vehicle maintains the state of charge of the battery within a predetermined narrow band. Due to the poor system observability in this range, state of charge estimation is difficult, and inadequate prior knowledge of the system uncertainties can lead to deterioration and divergence of the estimates. In this paper, a comparative study of three estimators tuned based on the noise covariance matching technique is presented in order to analyze their robustness in state of charge estimation. Simulation results show a significant enhancement of filter accuracy using this adaptation. The adaptive particle filter has the best estimation results, but it is vulnerable to model parameter uncertainties and, furthermore, time-consuming. On the other hand, the adaptive unscented Kalman filter and the adaptive extended Kalman filter show sufficient estimation accuracy, robustness to model uncertainty, and simplicity of implementation. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
@inproceedings{diva2:1197309,
author = {Mansour, Imene and Frisk, Erik and Jemni, Adel and Krysander, Mattias and Liouane, Noureddine},
title = {{State of Charge Estimation Accuracy in Charge Sustainable Mode of Hybrid Electric Vehicles}},
booktitle = {IFAC PAPERSONLINE},
year = {2017},
series = {IFAC Papersonline},
pages = {2158--2163},
publisher = {ELSEVIER SCIENCE BV},
}
Structural approaches have been shown to be useful for analyzing and designing diagnosis systems for industrial systems. In the simulation and estimation literature, related theories about the differential index have been developed and, also there, structural methods have been successfully applied for simulating large-scale differential algebraic models. A main contribution of this paper is to connect those theories, thus making the tools from the simulation and estimation literature available for model-based diagnosis design. A key step in the unification is an extension of the notion of differential index from exactly determined systems of equations to overdetermined systems of equations. A second main contribution is how the differential index can be used in diagnosability analysis, and also in the design stage, where an exponentially sized search space is significantly reduced. This allows focusing on residual generators where basic design techniques, such as standard state-observation techniques and sequential residual generation, are directly applicable. The developed theory has direct industrial relevance, which is illustrated with discussions on an automotive engine example. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
@inproceedings{diva2:1187160,
author = {Frisk, Erik and Krysander, Mattias and Åslund, Jan},
title = {{Analysis and Design of Diagnosis Systems Based on the Structural Differential Index}},
booktitle = {20th IFAC World Congress},
year = {2017},
series = {IFAC PAPERSONLINE},
pages = {12236--12242},
publisher = {ELSEVIER SCIENCE BV},
}
Approximate matrix inversion based on Neumann series has seen a recent increase in interest motivated by massive MIMO systems. There, the matrices are in many cases diagonally dominant, and, hence, a reasonable approximation can be obtained within a few iterations of a Neumann series. In this work, we clarify that the complexity of exact methods is about the same as when three terms are used for the Neumann series, so in this case the complexity is not lower, as often claimed. The second common argument for Neumann series approximation, higher parallelism, is indeed correct. However, in most current practical use cases, such a high degree of parallelism is not required to obtain a low-latency realization. Hence, we conclude that a careful evaluation based on accuracy and latency requirements must be performed, and that exact matrix inversion is in fact viable in many more cases than the current literature claims.
@inproceedings{diva2:1121300,
author = {Gustafsson, Oscar and Bertilsson, Erik and Klasson, Johannes and Ingemarsson, Carl},
title = {{Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO? (Invited Paper)}},
booktitle = {Proceedings 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017},
year = {2017},
series = {Proceedings Symposium on Computer Arithmetic},
volume = {2017},
pages = {62--63},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
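As a small numerical illustration of the trade-off discussed above, the following sketch (illustrative only, with a diagonal preconditioner D = diag(A)) computes a k-term Neumann approximation of a matrix inverse; accuracy improves with the number of terms, with the exact inverse as the limit:

```python
import numpy as np

def neumann_inverse(A, terms=3):
    """k-term Neumann approximation of A^-1 with a diagonal preconditioner:
    A^-1 ~ sum_{k < terms} (I - D^-1 A)^k D^-1, where D = diag(A).
    Valid when the spectral radius of (I - D^-1 A) is below one,
    e.g. for sufficiently diagonally dominant A."""
    D_inv = np.diag(1.0 / np.diag(A))
    X = np.eye(A.shape[0]) - D_inv @ A
    term = D_inv.copy()
    approx = np.zeros_like(A)
    for _ in range(terms):
        approx += term
        term = X @ term      # next power of the series
    return approx
```

With three or more terms, the matrix-matrix products here cost about as much as an exact LDL-based inversion, which is the complexity point made in the paper.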
Lifting-based complex multiplications and rotations are integer invertible, i.e., an integer input value is mapped to the same integer output value when rotating forward and backward. This is an important aspect for lossless transform-based source coding, but since the structure only requires three real-valued multiplications and three real-valued additions, it is also a potentially attractive way to perform complex multiplications when the coefficient has unity magnitude. In this work, we consider two aspects of these structures. First, we show that both the magnitude and angular errors depend on the angle of the input value, and derive both exact and approximated expressions for these. Second, we discuss how to design such structures without the typical separation into three subsequent matrix multiplications. It is shown that the proposed design method allows many more values that are integer invertible but cannot be separated into three subsequent matrix multiplications with fixed-point values. The results show good correspondence between the error approximations and the actual error as well as a significantly increased design space.
@inproceedings{diva2:1121297,
author = {Gustafsson, Oscar},
title = {{On Lifting-Based Fixed-Point Complex Multiplications and Rotations}},
booktitle = {Proceedings 24th IEEE Symposium on Computer Arithmetic 24--26 July 2017 London, United Kingdom},
year = {2017},
series = {Proceedings Symposium on Computer Arithmetic},
volume = {2017},
pages = {43--49},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
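A minimal sketch of the classical three-shear lifting rotation referred to above, with the standard coefficient choice p = (cos θ − 1)/sin θ and u = sin θ; the rounding inside each shear is what makes the integer mapping exactly invertible. The function names and test point are illustrative, not from the paper.

```python
import math

# Lifting-based rotation (illustrative sketch):
# [[c,-s],[s,c]] = [[1,p],[0,1]] [[1,0],[u,1]] [[1,p],[0,1]],
# p = (c-1)/s, u = s. Rounding each shear output keeps integers.

def lift_rotate(x, y, theta):
    c, s = math.cos(theta), math.sin(theta)
    p, u = (c - 1.0) / s, s
    x = x + round(p * y)   # shear 1
    y = y + round(u * x)   # shear 2
    x = x + round(p * y)   # shear 3
    return x, y

def lift_rotate_inv(x, y, theta):
    c, s = math.cos(theta), math.sin(theta)
    p, u = (c - 1.0) / s, s
    x = x - round(p * y)   # undo shear 3
    y = y - round(u * x)   # undo shear 2
    x = x - round(p * y)   # undo shear 1
    return x, y

pt = lift_rotate(100, 37, math.pi / 5)     # approximates the exact rotation
back = lift_rotate_inv(*pt, math.pi / 5)   # recovers (100, 37) exactly
```

Because each shear subtracts back exactly the value it added, the inverse is bit-exact regardless of the rounding error in the forward rotation.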
Detecting changes in residuals is important for fault detection and is commonly performed by thresholding the residual using, for example, a CUSUM test. However, detecting variations in the residual distribution that do not cause a change of bias or increased variance is difficult using these methods. A plug-and-play residual change detection approach is proposed based on sequential quantile estimation to detect changes in the residual cumulative distribution function. An advantage of the proposed algorithm is that it is non-parametric and has low computational cost and memory usage, which makes it suitable for on-line implementations where computational power is limited.
@inproceedings{diva2:1109456,
author = {Jung, Daniel and Frisk, Erik and Krysander, Mattias},
title = {{Residual change detection using low-complexity sequential quantile estimation}},
booktitle = {20th IFAC World Congress},
year = {2017},
series = {IFAC-PapersOnLine},
pages = {14064--14069},
}
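For flavor, a sketch in the spirit of the abstract above: a stochastic-approximation quantile tracker that needs only O(1) memory per tracked quantile, so a residual's empirical CDF can be monitored cheaply on-line. This is our illustration of sequential quantile estimation in general, not the paper's specific estimator; the step size and fault model are assumptions.

```python
import random

# On-line quantile tracking (illustrative sketch): the estimate moves up
# by step*tau when a sample exceeds it and down by step*(1-tau) otherwise,
# converging to the tau-quantile of the residual distribution.

def track_quantile(samples, tau, step=0.05, q0=0.0):
    q = q0
    for x in samples:
        q += step * (tau - (1.0 if x <= q else 0.0))
    return q

random.seed(1)
nominal = [random.gauss(0.0, 1.0) for _ in range(5000)]   # fault-free residual
faulty = [random.gauss(1.0, 1.0) for _ in range(5000)]    # shifted residual

q_nom = track_quantile(nominal, tau=0.5)            # tracks median near 0
q_fault = track_quantile(faulty, tau=0.5, q0=q_nom) # drifts toward 1 -> alarm
```

A change detector would compare tracked quantiles against their nominal values; a persistent drift signals a change in the residual distribution even when mean and variance tests stay quiet for other quantile levels.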
In this paper we propose an efficient hardware architecture for computing the inverse of positive definite matrices. The chosen algorithm is LDL decomposition followed directly by equation system solving using back-substitution. The architecture combines high throughput with efficient utilization of its hardware units. We also report FPGA implementation results that show that the architecture is well tailored for implementation in real-time applications.
@inproceedings{diva2:1135176,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{Hardware Architecture for Positive Definite Matrix Inversion Based on LDL Decomposition and Back-Substitution}},
booktitle = {2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS},
year = {2016},
series = {Conference Record of the Asilomar Conference on Signals Systems and Computers},
pages = {859--863},
publisher = {IEEE COMPUTER SOC},
}
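The algorithmic core of the entry above can be sketched in a few lines: factor a symmetric positive definite A as L·D·Lᵀ, then obtain each column of A⁻¹ by solving L·D·Lᵀx = eᵢ with forward/back substitution. This is our reference-model sketch of the textbook algorithm, not the paper's hardware datapath.

```python
# LDL^T decomposition + substitution (illustrative sketch).

def ldl(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D

def solve_ldl(L, D, b):
    n = len(b)
    y = list(b)
    for i in range(n):                       # forward: L y = b
        y[i] -= sum(L[i][k] * y[k] for k in range(i))
    x = [y[i] / D[i] for i in range(n)]      # diagonal: D z = y
    for i in reversed(range(n)):             # back: L^T x = z
        x[i] -= sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x

A = [[4.0, 2.0], [2.0, 3.0]]
L, D = ldl(A)
# Columns of A^(-1), one unit vector at a time:
inv_cols = [solve_ldl(L, D, [1.0 if i == j else 0.0 for i in range(2)]) for j in range(2)]
```

Compared with Cholesky, LDLᵀ avoids square roots entirely, which is one reason it is attractive for fixed-point hardware.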
Massive MIMO systems have received considerable attention in recent years as an enabler of future wireless communication systems. As the idea is based on having a large number of antennas at the base station, it is important to have both a scalable and a distributed realization of such a system to ease deployment. Most work so far has focused on the theoretical aspects, although a few demonstrators have been reported. In this work, we propose a base station architecture based on connecting the processing nodes in a K-ary tree, allowing simple scalability. Furthermore, it is shown that most of the processing can be performed locally in each node. Further analysis of the node processing shows that it should be enough that each node contains one or two complex multipliers and a few complex adders/subtractors operating at some hundred MHz. It is also shown that a communication link of a few Gbps is required between the nodes, and, hence, it is fully feasible to have one or a few links between the nodes to cope with the communication requirements.
@inproceedings{diva2:1135164,
author = {Bertilsson, Erik and Gustafsson, Oscar and Larsson, Erik G},
title = {{A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing}},
booktitle = {2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS},
year = {2016},
series = {Conference Record of the Asilomar Conference on Signals Systems and Computers},
pages = {864--868},
publisher = {IEEE COMPUTER SOC},
address = {Washington},
}
A hybrid diagnosis system design is proposed that combines model-based and data-driven diagnosis methods for fault isolation. A set of residuals is used to detect if there is a fault in the system, and a consistency-based fault isolation algorithm is used to compute all diagnosis candidates that can explain the triggered residuals. To improve fault isolation, diagnosis candidates are ranked by evaluating the residuals using a set of one-class support vector machines trained using data from different faults. The proposed diagnosis system design is evaluated using simulations of a model describing the air flow in an internal combustion engine.
@inproceedings{diva2:1074340,
author = {Jung, Daniel and Yew Ng, Kok and Frisk, Erik and Krysander, Mattias},
title = {{A combined diagnosis system design using model-based and data-driven methods}},
booktitle = {2016 3RD CONFERENCE ON CONTROL AND FAULT-TOLERANT SYSTEMS (SYSTOL)},
year = {2016},
series = {Conference on Control and Fault-Tolerant Systems},
pages = {177--182},
publisher = {IEEE},
}
Most modern FPGAs have highly optimised carry logic for efficient implementation of ripple carry adders (RCAs). Some FPGAs also have a six-input look-up table (LUT) per cell, of which two inputs are used during normal addition. In this paper we present an architecture that compresses the carry chain length to N/2 in recent Xilinx FPGAs by utilising the LUTs better. This carry compression is implemented by letting some cells calculate the carry chain two bits per cell, while others calculate the sum output bits. In total, the proposed design uses no more hardware than the normal adder. The results show that the proposed adder is faster than a normal adder for word lengths larger than 64 bits in Virtex-6 FPGAs.
@inproceedings{diva2:967655,
author = {Källström, Petter and Gustafsson, Oscar},
title = {{Fast and Area Efficient Adder for Wide Data in Recent Xilinx FPGAs}},
booktitle = {26th International Conference on Field-Programmable Logic and Applications},
year = {2016},
series = {Field Programmable Logic and Applications, International Conference on},
pages = {338--341},
publisher = {IEEE},
address = {Lausanne},
}
With the arrival of heterogeneous manycores comprising various features to support task, data and instruction-level parallelism, developing applications that take full advantage of the hardware parallel features has become a major challenge. In this paper, we present an extension to our CAL compilation framework (CAL2Many) that supports data parallelism in the CAL Actor Language. Our compilation framework makes it possible to program architectures with SIMD support using a high-level language and provides efficient code generation. We support general SIMD instructions, but the code generation backend is currently implemented for two custom architectures, namely ePUMA and EIT. Our experiments were carried out for two custom SIMD processor architectures using two applications. The experiments show that performance comparable to hand-written machine code can be achieved with much less programming effort.
@inproceedings{diva2:956510,
author = {Gebrewahid, Essayas and Ali Arslan, Mehmet and Karlsson, Andr\'{e}as and Ul-Abdin, Zain},
title = {{Support for Data Parallelism in the CAL Actor Language}},
booktitle = {PROCEEDINGS OF THE 2016 3RD WORKSHOP ON PROGRAMMING MODELS FOR SIMD/VECTOR PROCESSING (WPMVP 2016)},
year = {2016},
pages = {1--8},
publisher = {Association for Computing Machinery (ACM)},
address = {New York, NY},
}
@inproceedings{diva2:928615,
author = {Ul Haque, Muhammad Fahim and Johansson, Ted and Liu, Dake},
title = {{Large dynamic range PWM transmitter}},
booktitle = {Swedish Microwave Days - GigaHertz and AntennEMB Symposium, 15-16 March 2016, Konsert \& Kongress Center, Linköping},
year = {2016},
pages = {34--},
address = {Linkoping},
}
@inproceedings{diva2:928620,
author = {Ul Haque, Muhammad Fahim and Johansson, Ted and Liu, Dake},
title = {{Power Efficient Band-limited Pulse Width Modulated Transmitter}},
booktitle = {Swedish System on Chip Conference (SSoCC 2015), 4-5 May 2015, Novotel Hotel, Göteborg},
year = {2015},
address = {Gothenburg},
}
In this work we explore the trade-offs between established algorithms for symmetric matrix inversion for fixed-point hardware implementation. Inversion of symmetric positive definite matrices finds applications in many areas, e.g. in MIMO detection and adaptive filtering. We explore computational complexity and show simulation results where numerical properties are analyzed. We show that LDLT decomposition combined with equation system solving are the most promising algorithm for fixed-point hardware implementation. We further show that simply counting the number of operations does not establish a valid comparison between the algorithms as the required word lengths differ significantly.
@inproceedings{diva2:974167,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{On fixed-point implementation of symmetric matrix inversion}},
booktitle = {Proceedings of the European Conference on Circuit Theory and Design (ECCTD)},
year = {2015},
pages = {440--443},
publisher = {IEEE},
address = {Piscataway, NJ, USA},
}
This paper considers frequency-domain implementation of finite-length impulse response filters. In practical fixed-point arithmetic implementations, the overall system corresponds to a time-varying system, which can be represented either as a multirate filter bank with the corresponding distortion and aliasing functions, or as a periodic time-varying impulse response, equivalently a set of impulse responses and the corresponding frequency responses. The paper provides systematic derivations and analyses of these representations along with design examples. These representations are useful when analyzing the effect of coefficient quantization as well as the use of shorter DFT lengths than theoretically required.
@inproceedings{diva2:894991,
author = {Johansson, Håkan and Gustafsson, Oscar},
title = {{On frequency-domain implementation of digital FIR filters}},
booktitle = {IEEE International Conference on Digital Signal Processing (DSP), 2015},
year = {2015},
pages = {315--318},
publisher = {IEEE},
}
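Frequency-domain FIR filtering of the kind analyzed above is typically realized with block convolution; a sketch of one standard scheme, overlap-save (our choice for illustration, the abstract does not name a specific scheme): each length-N DFT block yields N − (M−1) valid output samples for an M-tap filter, and the first M−1 circularly wrapped samples are discarded.

```python
import cmath

# Overlap-save frequency-domain FIR filtering (illustrative sketch).

def fft(x, inverse=False):
    # Recursive radix-2 (un-normalized inverse when inverse=True).
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1j if inverse else -1j
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2 * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def overlap_save(x, h, nfft=8):
    m = len(h)
    hop = nfft - (m - 1)                     # valid samples per block
    H = fft(list(h) + [0.0] * (nfft - m))
    xp = [0.0] * (m - 1) + list(x)           # prepend M-1 zeros of "history"
    y, pos = [], 0
    while pos < len(x):
        seg = xp[pos:pos + nfft]
        seg += [0.0] * (nfft - len(seg))
        spec = [a * b for a, b in zip(fft(seg), H)]
        blk = fft(spec, inverse=True)
        y.extend((v / nfft).real for v in blk[m - 1:])  # drop wrapped samples
        pos += hop
    return y[:len(x)]

y = overlap_save([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], nfft=4)
```

Quantization of h shows up directly in H, which is exactly the setting where the distortion/aliasing analysis in the paper applies.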
This paper introduces all-digital flexible channelizers and aggregators for multi-standard video distribution. The overall problem is to aggregate a number of narrow-band subsignals with different bandwidths (6, 7, or 8 MHz) into one composite wide-band signal. In the proposed scheme, this is carried out through a set of analysis filter banks (FBs) that channelize the subsignals into 1/2-MHz subbands, which are subsequently aggregated through one synthesis FB. In this way, full flexibility with a low computational complexity and maintained quality is enabled. The proposed solution offers orders-of-magnitude complexity reductions as compared with a straightforward alternative. Design examples are included that demonstrate the functionality, flexibility, and efficiency.
@inproceedings{diva2:891968,
author = {Johansson, Håkan and Gustafsson, Oscar},
title = {{Filter-Bank Based All-Digital Channelizers and Aggregators for Multi-Standard Video Distribution}},
booktitle = {IEEE International Conference on Digital Signal Processing (DSP), 2015},
year = {2015},
pages = {1117--1120},
publisher = {IEEE},
}
@inproceedings{diva2:871994,
author = {Haque, Muhammad Fahim Ul and Johansson, Ted and Liu, Dake},
title = {{Combined RF and Multiphase PWM Transmitter}},
booktitle = {Swedish System on Chip Conference (SSoCC'15), Göteborg, Sweden, May 4-5 2015},
year = {2015},
}
This paper presents two novel transmitter architectures based on the combination of radio-frequency pulse-width modulation and multiphase pulse-width modulation. The proposed transmitter architectures provide good amplitude resolution and large dynamic range at high carrier frequency, which is problematic with existing radio-frequency pulse-width modulation based transmitters. They also have better power efficiency and smaller chip area compared to multiphase pulse-width modulation based transmitters.
@inproceedings{diva2:871760,
author = {Haque, Muhammad Fahim Ul and Johansson, Ted and Liu, Dake},
title = {{Combined RF and Multiphase PWM Transmitter}},
booktitle = {2015 European Conference on Circuit Theory and Design (ECCTD)},
year = {2015},
pages = {264--267},
publisher = {IEEE},
}
@inproceedings{diva2:871759,
author = {Haque, Muhammad Fahim Ul and Johansson, Ted and Liu, Dake},
title = {{Modified Band-limited Pulse-Width Modulated Polar Transmitter}},
booktitle = {15th International Symposium on Microwave and Optical Technology (ISMOT 2015), Dresden, Germany, June 29-July 1 2015},
year = {2015},
}
The most challenging step of implementing particle filtering is the resampling step, which replicates particles with large weights and discards those with small weights. In this paper, we propose a generic architecture for resampling that uses double multipliers to avoid normalization divisions and makes the architecture equally efficient for non-power-of-two numbers of particles. Furthermore, the complexity of resampling is greatly affected by the size of the memories used to store weights. We illustrate that storing the original weights instead of their cumulative sum, and computing the sum online, reduces the total complexity in terms of area by 21% to 45%, while giving up to 50% reduction in memory usage.
@inproceedings{diva2:862758,
author = {Alam, Syed Asad and Gustafsson, Oscar},
title = {{Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters}},
booktitle = {2015 European Conference on Circuit Theory and Design (ECCTD)},
year = {2015},
pages = {416--419},
publisher = {IEEE Press},
}
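A behavioral sketch of division-free systematic resampling in the spirit of the entry above: instead of normalising each weight by W = Σw, each comparison threshold (i + u0)/N is multiplied by W, so the inner loop needs only multiplications and comparisons. The function and example values are ours; the paper's datapath details (double multipliers, memory layout) are not modeled.

```python
# Division-free systematic resampling (illustrative sketch).
# u0 would normally be drawn uniformly from [0, 1); fixed here for clarity.

def systematic_resample(weights, u0):
    n = len(weights)
    W = sum(weights)                       # no per-weight division by W
    indices = []
    cum, j = weights[0], 0                 # running (unnormalised) cumsum
    for i in range(n):
        threshold = (i + u0) / n * W       # scale threshold up instead
        while cum < threshold:
            j += 1
            cum += weights[j]
        indices.append(j)
    return indices

even = systematic_resample([1.0, 2.0, 1.0], u0=0.5)        # balanced weights
skew = systematic_resample([0.1, 0.1, 3.0, 0.1], u0=0.5)   # one dominant particle
```

Accumulating the running sum on the fly is also what lets the original weights, rather than their cumulative sum, be the stored quantity.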
This paper presents the novel heterogeneous DSP architecture ePUMA and demonstrates its features through an implementation of sorting of larger data sets. We derive a sorting algorithm with fixed-size merging tasks suitable for distributed memory architectures, which allows very simple scheduling and predictable data-independent sorting time. The implementation on ePUMA utilizes the architecture's specialized compute cores and control cores, and local memory parallelism, to separate and overlap sorting with data access and control for close to stall-free sorting. Penalty-free unaligned and out-of-order local memory access is used in combination with proposed application-specific sorting instructions to derive highly efficient local sorting and merging kernels used by the system-level algorithm. Our evaluation shows that the proposed implementation can rival the sorting performance of high-performance commercial CPUs and GPUs, with two orders of magnitude higher energy efficiency, which would allow high-performance sorting on low-power devices.
@inproceedings{diva2:844273,
author = {Karlsson, Andreas and Sohl, Joar and Liu, Dake},
title = {{Energy-efficient sorting with the distributed memory architecture ePUMA}},
booktitle = {IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)},
year = {2015},
pages = {116--123},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
Discrete Fourier transforms of 3 and 5 points are essential building blocks in FFT implementations for standards such as 3GPP-LTE. In addition to being more complex than 2- and 4-point DFTs, these DFTs also cause problems with data access in SDR-DSPs, since the data access width, in general, is a power of 2. This work derives mappings of these DFTs to a 4-way SIMD datapath that has been designed with 2- and 4-point DFTs in mind. Our instruction set proposals, based on the modified Winograd DFT, achieve single-cycle execution of 3-point DFTs and 2.25-cycle average execution of 5-point DFTs in a cost-effective manner by reutilizing the already available arithmetic units. This represents an approximate speed-up of 3 times compared to an SDR-DSP with only MAC support. In contrast to our more general design, we also demonstrate that a typical single-purpose FFT-specialized 5-way architecture only delivers 9% to 25% extra performance on average, while requiring 85% more arithmetic units and a more expensive memory subsystem.
@inproceedings{diva2:844270,
author = {Karlsson, Andreas and Sohl, Joar and Liu, Dake},
title = {{Cost-efficient Mapping of 3- and 5-point DFTs to General Baseband Processors}},
booktitle = {International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015},
year = {2015},
pages = {780--784},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
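A reference sketch of the Winograd-style 3-point DFT kernel underlying the entry above: only two distinct real coefficients (1/2 and √3/2) and a handful of additions, which is what makes it attractive to map onto a small SIMD datapath. This shows the standard textbook kernel, not the paper's instruction-level mapping.

```python
import math
import cmath

def dft3(x0, x1, x2):
    # Winograd-style 3-point DFT: coefficients 1/2 and sqrt(3)/2 only.
    u = x1 + x2
    v = x1 - x2
    X0 = x0 + u
    s = x0 - 0.5 * u                    # shared real part of X1 and X2
    m = -1j * (math.sqrt(3) / 2) * v    # shared imaginary contribution
    return X0, s + m, s - m

def dft_direct(x):
    # Brute-force DFT for reference.
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

X = dft3(1 + 0j, 2 + 0j, 3 + 0j)
ref = dft_direct([1, 2, 3])
```

Since X1 and X2 share both s and m, the kernel reuses the same partial results across the two output butterflies.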
Since the breakdown of Dennard scaling, the primary design goal for processor designs has shifted from increasing performance to increasing performance per Watt. The ePUMA platform is a flexible and configurable DSP platform that tries to address many of the problems with traditional DSP designs, to increase performance but use less power. We trade the flexibility of traditional VLIW DSP designs for a simpler single instruction issue scheme and instead make sure that each instruction can perform more work. Multi-cycle instructions can operate directly on vectors and matrices in memory, and the datapaths implement common DSP subgraphs directly in hardware, for high compute throughput. Memory bottlenecks, which are common in other architectures, are handled with flexible LUT-based multi-bank memory addressing and memory parallelism. A major contributor to energy consumption, data movement, is reduced by using a heterogeneous interconnect and clustering compute resources around local memories for simple data sharing. To evaluate ePUMA we have implemented the majority of the kernel library from a commercial VLIW DSP manufacturer for comparison. Our results not only show good performance, but also an order of magnitude increase in energy and area efficiency. In addition, the kernel code size is reduced by 91% on average compared to the VLIW DSP. These benefits make ePUMA an attractive solution for future DSPs.
@inproceedings{diva2:844264,
author = {Karlsson, Andreas and Sohl, Joar and Liu, Dake},
title = {{ePUMA: A Processor Architecture for Future DSP}},
booktitle = {International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015},
year = {2015},
pages = {253--257},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
This paper demonstrates how QPP interleaving and de-interleaving for Turbo decoding in 3GPP-LTE can be implemented efficiently on baseband processors with lookup-table (LUT) based addressing support of multi-bank memory. We introduce a LUT-compression technique that reduces LUT size to 1% of what would otherwise be needed to store the full data access patterns for all LTE block sizes. By reusing the already existing program memory of a baseband processor to store LUTs and using our proposed general address generator, our 8-way data access path can reach the same throughput as a dedicated 8-way interleaving ASIC implementation. This avoids the addition of a dedicated interleaving address generator to a processor which, according to ASIC synthesis, would be 75% larger than our proposed address generator. Since our software implementation only involves the address generator, the processor's datapaths are free to perform the other operations of Turbo decoding in parallel with interleaving. Our software implementation ensures programmability and flexibility and is the fastest software-based implementation of QPP interleaving known to us.
@inproceedings{diva2:844262,
author = {Karlsson, Andreas and Sohl, Joar and Liu, Dake},
title = {{Software-based QPP Interleaving for Baseband DSPs with LUT-accelerated Addressing}},
booktitle = {International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015},
year = {2015},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
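The QPP interleaver addressed above is defined by π(i) = (f1·i + f2·i²) mod K and is commonly generated recursively with additions only, since π(i+1) − π(i) changes by the constant 2·f2 modulo K. A short sketch (our illustration of the standard recursion, not the paper's LUT-based address generator):

```python
# QPP interleaver address generation (illustrative sketch):
# pi(i+1) = (pi(i) + g(i)) mod K,  g(i+1) = (g(i) + 2*f2) mod K,
# with pi(0) = 0 and g(0) = f1 + f2 -- no multiplications in the loop.

def qpp_addresses(K, f1, f2):
    addrs = []
    pi, g = 0, (f1 + f2) % K
    for _ in range(K):
        addrs.append(pi)
        pi = (pi + g) % K
        g = (g + 2 * f2) % K
    return addrs

# K = 40 uses (f1, f2) = (3, 10) per the 3GPP-LTE parameter table (TS 36.212).
addrs = qpp_addresses(40, 3, 10)
```

A valid (f1, f2) pair makes π a permutation of {0, …, K−1}, which is what allows conflict-free parallel access when the recursion is replicated per memory bank.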
Being able to evaluate quantitative fault diagnosability performance in model-based diagnosis is useful during the design of a diagnosis system. Different fault realizations are more or less likely to occur, and the fault diagnosis problem is complicated by model uncertainties and noise. Thus, it is not obvious how to evaluate performance when all of this information is taken into consideration. Four candidates for quantifying fault diagnosability performance between fault modes are discussed. The proposed measure, called expected distinguishability, is based on the previous distinguishability measure, and two methods to compute expected distinguishability are presented.
@inproceedings{diva2:806669,
author = {Jung, Daniel and Frisk, Erik and Krysander, Mattias},
title = {{Quantitative isolability analysis of different fault modes}},
booktitle = {9th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes SAFEPROCESS 2015 -- Paris, 2--4 September 2015},
year = {2015},
series = {IFAC-PapersOnLine},
pages = {1275--1282},
publisher = {Elsevier},
}
For high-speed delta-sigma modulators, the decimation filters are typically polyphase FIR filters, as the recursive CIC filters cannot be implemented because of the iteration period bound. In addition, the high clock frequency and short input word length make multiple constant multiplication techniques less beneficial. Instead, a realistic complexity measure in this setting is the number of non-zero digits of the FIR filter tap coefficients. As there is limited control of the passband approximation error for CIC-based filters, these must in most cases be compensated to meet a passband specification. In this work we investigate the complexity of decimation filters meeting CIC-like stopband behavior, but with a well-defined passband approximation error. It is found that the general approach can in many cases produce filters with a much smaller passband approximation error at a similar complexity.
@inproceedings{diva2:790454,
author = {Gustafsson, Oscar and Johansson, Håkan},
title = {{Decimation Filters for High-Speed Delta-Sigma Modulators With Passband Constraints: General Versus CIC-Based FIR Filters}},
booktitle = {2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)},
year = {2015},
series = {IEEE International Symposium on Circuits and Systems},
pages = {2205--2208},
publisher = {IEEE conference proceedings},
}
An 8-bit time-to-digital converter (TDC) for all-digital frequency-locked loops is presented. The selected architecture uses a Vernier delay line where the commonly used D flip-flops are replaced with a single enable transistor in the delay elements. This architecture allows for an area-efficient and power-efficient implementation. The target application for the TDC is an all-digital frequency-locked loop, which is also overviewed in the paper. A prototype chip has been implemented in a 65 nm CMOS process with an active core area of 75 μm × 120 μm. The time resolution is 5.7 ps with a power consumption of 1.85 mW measured at 50 MHz sampling frequency.
@inproceedings{diva2:768523,
author = {Andersson, Niklas and Vesterbacka, Mark},
title = {{Power-efficient time-to-digital converter for all-digital frequency locked loops}},
booktitle = {2015 EUROPEAN CONFERENCE ON CIRCUIT THEORY AND DESIGN (ECCTD)},
year = {2015},
pages = {300--303},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
}
A number of residual generation methods have been developed for robust model-based fault detection and isolation (FDI). There have also been a number of offline (i.e., design-time) methods that focus on optimizing FDI performance (e.g., trading off detection performance versus cost). However, design-time algorithms are not tuned to optimize performance for different operating regions of system behavior. To do this, one would need to define online measures of sensitivity and robustness and use them to select the best residual set online as system behavior transitions between operating regions. In this paper we develop a quantitative measure of residual performance, called the detectability ratio, that applies to additive and multiplicative uncertainties when determining the best residual set in different operating regions. We discuss this methodology and demonstrate its effectiveness using a case study.
@inproceedings{diva2:1095567,
author = {Khorasgani, Hamed and Jung, Daniel and Biswas, Gautam and Frisk, Erik and Krysander, Mattias},
title = {{Robust Residual Selection for Fault Detection}},
booktitle = {2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC)},
year = {2014},
pages = {5764--5769},
publisher = {IEEE},
}
Subspace identification is a classical and very well studied problem in system identification. The problem was recently posed as a convex optimization problem via the nuclear norm relaxation. Inspired by robust PCA, we extend this framework to handle outliers. The proposed framework takes the form of a convex optimization problem with an objective that trades off fit, rank and sparsity. As in robust PCA, it can be problematic to find a suitable regularization parameter. We show how the space in which a suitable parameter should be sought can be limited to a bounded open set of the two-dimensional parameter space. In practice, this is very useful since it restricts the parameter space that needs to be surveyed.
@inproceedings{diva2:1095566,
author = {Sadigh, Dorsa and Ohlsson, Henrik and Shankar Sastry, S. and Seshia, Sanjit A.},
title = {{Robust Subspace System Identification via Weighted Nuclear Norm Optimization}},
booktitle = {IFAC PAPERSONLINE},
year = {2014},
pages = {9510--9515},
publisher = {ELSEVIER SCIENCE BV},
}
Piecewise affine (PWA) models serve as an important class of models for nonlinear systems. The identification of PWA models is known to be a difficult task and often implies solving a non-convex combinatorial optimization problem. In this paper, we revisit a recently proposed PWA identification method. We do this to give a novel derivation of the identification method and to show that, under certain conditions, the method is optimal in the sense that it finds the PWA function that passes through the measurements and has the least number of hinges. We also show how the alternating direction method of multipliers (ADMM) can be used to solve the underlying convex optimization problem.
@inproceedings{diva2:1095565,
author = {Maruta, Ichiro and Ohlsson, Henrik},
title = {{Compression Based Identification of PWA Systems}},
booktitle = {IFAC PAPERSONLINE},
year = {2014},
pages = {4985--4992},
publisher = {ELSEVIER SCIENCE BV},
}
This paper analyzes the limits of FFT performance on FPGAs. For this purpose, an FFT generation tool has been developed. This tool is highly parameterizable and allows for generating FFTs with different sizes and amounts of parallelization. Experimental results for FFT sizes from 16 to 65536, and 4 to 64 parallel samples, have been obtained. They show that even the largest FFT architectures fit well in today's FPGAs, achieving throughput rates from several GSamples/s to tens of GSamples/s.
@inproceedings{diva2:927920,
author = {Garrido, Mario and Acevedo, Miguel and Ehliar, Andreas and Gustafsson, Oscar},
title = {{Challenging the Limits of FFT Performance on FPGAs}},
booktitle = {2014 International Symposium on Integrated Circuits (ISIC) 10-12 December 2014 Singapore},
year = {2014},
pages = {172--175},
publisher = {IEEE},
}
In today's multi-standard wireless basebands, convolutional codes (CC), Turbo codes and LDPC codes are widely applied and need to be integrated within one FEC module. Since memory occupies half or even more of the decoder area, memory sharing techniques are worth considering for area savings. In this work, several memory merging techniques are proposed, including a non-conflict access technique for the merged path metric buffer. The results show that 41% of the total memory bits are saved when integrating three different decoding schemes: CC (802.11a/g/n), LDPC (802.11n and 802.16e) and Turbo (3GPP-LTE). Synthesis results in a 65 nm process show that the merged memory blocks consume merely 1.06 mm(2) of chip area.
@inproceedings{diva2:861944,
author = {Wu, Zhenzhi and Liu, Dake},
title = {{Memory Sharing Techniques for Multi-standard High-throughput FEC Decoder}},
booktitle = {2014 INTERNATIONAL CONFERENCE ON EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION (SAMOS XIV)},
year = {2014},
pages = {93--98},
publisher = {IEEE},
}
@inproceedings{diva2:792292,
author = {Haque, Muhammad Fahim Ul and Johansson, Ted and Liu, Dake},
title = {{Modified Multilevel PWM Switch Mode Power Amplifier}},
booktitle = {SSoCC14, Vadstena, Sweden, May 12-13, 2014},
year = {2014},
}
In this paper we describe an open source floating-point adder and multiplier implemented using a 36-bit custom number format based on radix-16 and optimized for the 7-series FPGAs from Xilinx. Although this number format is not identical to the single-precision IEEE-754 format, the floating-point operators are designed in such a way that the numerical results for a given operation will be identical to the result from an IEEE-754 compliant operator with support for round-to-nearest even, NaNs and Infs, and subnormal numbers. The drawback of this number format is that the rounding step is more involved than in a regular, radix-2 based operator. On the other hand, the use of a high radix means that the area cost associated with normalization and denormalization can be reduced, leading to a net area advantage for the custom number format, under the assumption that support for subnormal numbers is required.
The area of the floating-point adder in a Kintex-7 FPGA is 261 slice LUTs and the area of the floating-point multiplier is 235 slice LUTs and 2 DSP48E blocks. The adder can operate at 319 MHz and the multiplier can operate at a frequency of 305 MHz.
@inproceedings{diva2:789250,
author = {Ehliar, Andreas},
title = {{Area Efficient Floating-Point Adder and Multiplier with IEEE-754 Compatible Semantics}},
booktitle = {ICFPT2014: The 2014 International Conference on Field-Programmable Technology},
year = {2014},
}
An oversampled digital-to-analog converter comprising a digital Sigma Delta modulator and a semi-digital FIR filter can be employed in the transmitter of the VDSL2 technology. To select the optimum set of coefficients for the semi-digital FIR filter, an integer optimization problem is formulated in this work, where the model includes the FIR filter magnitude metrics as well as the Sigma Delta modulator noise transfer function. The semi-digital FIR filter is optimized with respect to magnitude constraints according to the International Telecommunication Union Power Spectral Density mask for VDSL2, with the analog cost as the objective function to minimize. By utilizing the semi-digital FIR filter with one-bit DACs, the high linearity required in high-bandwidth profiles of VDSL2 can be achieved. The resolution of conventional DACs is limited by the mismatch between DAC unit elements; with one-bit DACs in the semi-digital FIR filter, there is less degradation caused by this mismatch. The optimization problem is solved under two conditions: fixed passband gain and variable passband gain. It is shown in this paper that a 38% saving in the total number of unit elements can be achieved by employing variable passband gain in the optimization problem.
@inproceedings{diva2:785064,
author = {Sadeghifar, Mohammad Reza and Wikner, Jacob and Gustafsson, Oscar},
title = {{Linear Programming Design of Semi-Digital FIR Filter and Sigma Delta Modulator for VDSL2 Transmitter}},
booktitle = {2014 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)},
year = {2014},
pages = {2465--2468},
publisher = {IEEE},
}
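The signal chain discussed above — a digital Sigma Delta modulator producing a one-bit stream that drives weighted one-bit DACs acting as FIR taps — can be sketched behaviorally. This is a first-order error-feedback modulator for illustration only; the paper's modulator order, optimized tap values, and ITU mask constraints are not reproduced here:

```python
def sigma_delta_1bit(x):
    """First-order error-feedback Sigma Delta modulator: |x[n]| <= 1 in,
    a +/-1 one-bit stream out, with the quantization error shaped to
    high frequencies."""
    y, e = [], 0.0
    for xn in x:
        v = xn + e                      # add back the previous quantization error
        q = 1.0 if v >= 0.0 else -1.0   # one-bit quantizer
        y.append(q)
        e = v - q                       # error fed back (noise shaping)
    return y

def semi_digital_fir(bits, h):
    """Semi-digital FIR: each tap h[k] models a weighted one-bit DAC
    driven by the delayed one-bit stream."""
    return [sum(h[k] * bits[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(bits))]
```

Averaging the one-bit stream recovers the DC input, which is the property the semi-digital FIR exploits while its response suppresses the shaped out-of-band noise.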
Designing decoders for forward error correction (FEC) is increasingly challenging because of the requirement to support various wireless standards simultaneously within one IC module. Flexibility, silicon cost, and throughput efficiency must all be traded off. In this paper, using an ASIP methodology, software-hardware co-design is introduced to offer sufficient flexibility for FEC decoding. The decoding procedure is programmable for QC-LDPC, Turbo, and convolutional codes. First, the common features of all the mentioned algorithms and their corresponding datapaths are analyzed, and a unified multi-standard datapath is introduced. Based on it, an application-specific instruction set is proposed and an ASIP (Application Specific Instruction-set Processor) for the FEC algorithms is designed. FEC firmware is developed to adapt to the standards. Synthesis results show that the proposed FEC processor occupies 1.54 mm² in a 65 nm CMOS process. It offers QC-LDPC decoding for WiMAX, Turbo decoding for 3GPP-LTE, and 64-state convolutional code (CC) decoding at throughputs of 193 Mbps, 62 Mbps, and 60 Mbps, respectively, at a clock frequency of 200 MHz. The proposed ASIP provides programmable high throughput compared to other tri-mode hardware modules.
@inproceedings{diva2:779289,
author = {Wu, Zhenzhi and Liu, Dake},
title = {{Flexible Multistandard FEC Processor Design With ASIP Methodology}},
booktitle = {PROCEEDINGS OF THE 2014 IEEE 25TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2014)},
year = {2014},
series = {Proceedings IEEE International Conference of Application-Specific Systems Architectures and Processors},
pages = {210--218},
publisher = {IEEE},
}
In this paper, we present LDPC decoder designs based on gear-shift algorithms, which can use multiple decoding algorithms or update rules over the course of decoding a single frame. By first attempting to decode using low-complexity algorithms, followed by high-complexity algorithms, we increase energy efficiency without sacrificing error correction performance. We present the GSP and IGSP algorithms, and ASIC designs of these algorithms for the 10 Gbps Ethernet (2048,1723) LDPC code. In 65 nm CMOS, our pipelined GSP decoder achieves a core area of 5.29 mm², throughput of 88.1 Gbps, and energy efficiency of 39.3 pJ/bit, while our IGSP decoder achieves a core area of 6.00 mm², throughput of 100.3 Gbps, and energy efficiency of 14.6 pJ/bit. Both algorithms achieve error correction performance equivalent to the offset min-sum algorithm. The throughput per unit area and energy efficiency of these decoders improve upon state-of-the-art decoders with comparable error correction performance.
@inproceedings{diva2:779276,
author = {Cushon, Kevin and Hemati, Saied and Mannor, Shie and Gross, Warren J.},
title = {{Energy-Efficient Gear-Shift LDPC Decoders}},
booktitle = {PROCEEDINGS OF THE 2014 IEEE 25TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2014)},
year = {2014},
series = {Proceedings IEEE International Conference of Application-Specific Systems Architectures and Processors},
pages = {219--223},
publisher = {IEEE},
}
In this paper, we explore two nonrecursive reconstructors which recover the uniform-grid samples from the output of a time-interleaved analog-to-digital converter (TI-ADC) that uses some of the sampling instants for estimating the mismatches in the TI-ADC. Nonuniform sampling occurs due to timing mismatches between the individual channel ADCs and also due to missing input samples. Compared to a previous solution, the reconstructors presented here offer substantially lower computational complexity.
@inproceedings{diva2:713080,
author = {Pillai, Anu Kalidas Muralidharan and Johansson, Håkan},
title = {{Two reconstructors for M-channel time-interleaved ADCs with missing samples}},
booktitle = {IEEE 12th International New Circuits and Systems Conference (NEWCAS), 2014},
year = {2014},
pages = {41--44},
publisher = {IEEE conference proceedings},
}
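The time-skew problem these reconstructors address can be illustrated with a two-channel model: odd samples are taken slightly late, and a first-order correction x(nT) ≈ x(t_n) − skew·x'(t_n) removes most of the error. The sketch below uses the analytically known derivative purely to illustrate the skew model; the papers derive digital time-varying filters that estimate this correction from the samples themselves:

```python
import math

# two-channel TI-ADC model: odd-channel samples are late by `skew` periods
f0, skew, N = 0.1, 0.05, 512

ideal    = [math.sin(2 * math.pi * f0 * n) for n in range(N)]
measured = [math.sin(2 * math.pi * f0 * (n + (skew if n % 2 else 0.0)))
            for n in range(N)]

# first-order Taylor correction; the analytic derivative stands in for the
# digital differentiating reconstructor designed in the papers
corrected = [m - (skew if n % 2 else 0.0) * 2 * math.pi * f0
             * math.cos(2 * math.pi * f0 * (n + (skew if n % 2 else 0.0)))
             for n, m in enumerate(measured)]
```

The raw skew error is first order in the skew, while the corrected error is second order, which is why even a low-order reconstructor recovers most of the lost resolution.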
This paper proposes a scheme for the recovery of a uniformly sampled sequence from the output of a time-interleaved analog-to-digital converter (TI-ADC) with static time-skew errors and missing samples. Nonuniform sampling occurs due to timing mismatches between the individual channel ADCs and also due to missing input samples as some of the sampling instants are reserved for estimating the mismatches in the TI-ADC. In addition to using a non-recursive structure, the proposed reconstruction scheme supports online reconfigurability and reduces the computational complexity of the reconstructor as compared to a previous solution.
@inproceedings{diva2:693171,
author = {Pillai, Anu Kalidas Muralidharan and Johansson, Håkan},
title = {{A sub-band based reconstructor for \emph{M}-channel time-interleaved ADCs with missing samples}},
booktitle = {2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)},
year = {2014},
series = {International Conference on Acoustics Speech and Signal Processing ICASSP},
publisher = {IEEE conference proceedings},
}
This paper presents new radix-2 and radix-2² constant geometry fast Fourier transform (FFT) algorithms for graphics processing units (GPUs). The algorithms combine the use of constant geometry with special scheduling of operations and distribution among the cores. Performance tests on current GPUs show significant improvements compared to the most recent version of NVIDIA’s well-known CUFFT, achieving speedups of up to 5.6x.
@inproceedings{diva2:927926,
author = {Ambuluri, Sreehari and Garrido, Mario and Caffarena, Gabriel and Ogniewski, Jens and Ragnemalm, Ingemar},
title = {{New Radix-2 and Radix-2$^{2}$ Constant Geometry Fast Fourier Transform Algorithms For GPUs}},
booktitle = {IADIS Computer Graphics, Visualization, Computer Vision and Image Processing},
year = {2013},
pages = {59--66},
}
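The constant-geometry property means every stage reads pairs (k, k + N/2) and writes consecutive pairs (2k, 2k + 1), so the memory access pattern is identical in all stages — convenient for fixed scheduling across GPU cores. A radix-2 decimation-in-frequency sketch of this (Pease-style, output in bit-reversed order; the radix-2² variant and the GPU work distribution are not modeled):

```python
import cmath

def cg_fft(x):
    """Radix-2 constant-geometry (Pease) DIF FFT; len(x) must be a power of 2.
    Every stage reads pairs (k, k + N/2) and writes pairs (2k, 2k + 1);
    the output comes out in bit-reversed order."""
    N = len(x)
    n = N.bit_length() - 1
    x = list(x)
    for t in range(n):
        y = [0j] * N
        for k in range(N // 2):
            a, b = x[k], x[k + N // 2]
            # twiddle exponent (k >> t) << t gives the DIF twiddles in
            # constant-geometry order
            w = cmath.exp(-2j * cmath.pi * ((k >> t) << t) / N)
            y[2 * k] = a + b
            y[2 * k + 1] = (a - b) * w
        x = y
    return x
```

Because the read/write pattern never changes, the same shuffle network (or the same GPU indexing code) serves every stage.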
A digital-to-RF converter (DRFC) architecture for IQ modulators is proposed in this paper. The digital-RF converter utilizes the mixer-DAC concept, but a discrete-time oscillatory signal is applied to the digital-RF converter instead of a conventional continuous-time LO. The architecture utilizes a low-pass Sigma Delta modulator and a semi-digital FIR filter. The digital Sigma Delta modulator provides a single-bit data stream to a current-mode SDFIR filter in each branch of the IQ modulator. The filter taps are realized as weighted one-bit DACs, and the filter response attenuates the out-of-band shaped quantization noise generated by the Sigma Delta modulator. To find the semi-digital FIR filter response, an optimization problem is formulated: the out-of-band magnitude metric is set as the optimization constraint, and the total number of unit elements required for the DAC/mixer is set as the objective function. The proposed architecture and the design technique are described at the system level, and simulation results are presented to support the feasibility of the solution.
@inproceedings{diva2:741526,
author = {Sadeghifar, Mohammad Reza and Afzal, Nadeem and Wikner, Jacob},
title = {{A Digital-RF Converter Architecture for IQ Modulator with Discrete-Time Low Resolution Quadrature LO}},
booktitle = {2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS)},
year = {2013},
pages = {641--644},
publisher = {IEEE},
}
This paper presents ePUMA, a master-slave heterogeneous DSP processor for communications and multimedia. We introduce the ePUMA VPE, a vector processing slave core designed for heavy DSP workloads, and demonstrate how its features can be used to implement DSP kernels that efficiently overlap computing, data access and control to achieve maximum datapath utilization. The efficiency is evaluated by implementing a basic set of kernels commonly used in SDR. The experiments show that all kernels asymptotically reach above 90% effective datapath utilization, while many approach 100%; thus the design effectively overlaps computing, data access and control. Compared to popular VLIW solutions, the need for a large register file with many ports is eliminated, thus saving power and chip area. When compared to a commercial VLIW solution, our solution also achieves code size reductions of up to 30 times and a significantly simplified kernel implementation.
@inproceedings{diva2:737549,
author = {Karlsson, Andr\'{e}as and Sohl, Joar and Wang, Jian and Liu, Dake},
title = {{ePUMA: A unique memory access based parallel DSP processor for SDR and CR}},
booktitle = {Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE},
year = {2013},
pages = {1234--1237},
publisher = {IEEE},
}
For high-performance computation, memory access is a major issue. Whether in a supercomputer, a GPGPU device, or an Application Specific Instruction-set Processor (ASIP) for Digital Signal Processing (DSP), parallel execution is a necessity. A high rate of computation puts pressure on memory access, and it is often non-trivial to maximize the data rate to the execution units. Many algorithms that from a computational point of view can be implemented efficiently on parallel architectures fail to achieve significant speed-ups. The reason is very often that the available execution units are poorly utilized due to inefficient data access. This paper shows a method for improving the access time for data access sequences that are completely static, at the cost of extra memory. This is done by resolving memory conflicts using padding. The method can be applied automatically, and it is shown to significantly reduce the data access time for sorting and FFTs. The execution time is improved by up to a factor of 3.4 for the FFT and up to a factor of 8 for sorting.
@inproceedings{diva2:720009,
author = {Sohl, Joar and Karlsson, Andr\'{e}as and Liu, Dake},
title = {{Conflict-free data access for multi-bank memory architectures using padding}},
booktitle = {High Performance Computing (HiPC), 2013},
year = {2013},
pages = {425--432},
publisher = {IEEE},
}
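The padding idea can be illustrated with the classic column-access conflict: in a row-major matrix spread over as many banks as there are columns, reading a column hits a single bank every cycle, while padding each row with one dummy word spreads the column accesses over all banks. A minimal sketch (the paper automates the choice of padding for arbitrary static access patterns, which is not shown):

```python
from collections import Counter

def max_bank_conflicts(addresses, n_banks):
    """Worst-case number of simultaneous accesses mapping to one bank
    (1 means the access set is conflict-free)."""
    return max(Counter(a % n_banks for a in addresses).values())

# row-major 8x8 matrix in an 8-bank memory
n_banks, rows, cols = 8, 8, 8
col_unpadded = [r * cols for r in range(rows)]        # stride 8: all in bank 0
col_padded   = [r * (cols + 1) for r in range(rows)]  # stride 9: one per bank
```

With the pad, consecutive rows start in different banks, so a column access becomes fully parallel at the cost of one extra word per row.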
This paper introduces a finite-length impulse response (FIR) digital filter having both a variable fractional delay (VFD) and a variable phase shift (VPS). The realization is reconfigurable online without redesign and without transients. It can be viewed as a generalization of the VFD Farrow structure that offers a VPS in addition to the regular VFD. The overall filter is composed of a number of fixed subfilters and a few variable multipliers whose values are determined by the desired FD and PS values. It is designed offline in an iterative manner, utilizing reweighted ℓ1-norm minimization. This design procedure generates fixed subfilters with many zero-valued coefficients, typically located in the impulse response tails.
@inproceedings{diva2:716605,
author = {Johansson, Håkan and Eghbali, Amir},
title = {{FIR Filter With Variable Fractional Delay and Phase Shift: Efficient Realization and Design Using Reweighted $\ell_1$-Norm Minimization}},
booktitle = {2013 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)},
year = {2013},
series = {INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)},
volume = {2013},
pages = {81--84},
publisher = {IEEE},
}
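The Farrow structure evaluates a polynomial in the delay parameter d whose "coefficients" are the outputs of fixed subfilters, which is why d can change online without redesign or transients. A first-order (linear-interpolation) Farrow sketch for illustration only — the paper's filters are of higher order and add the variable phase shift, which is not modeled here:

```python
def farrow_fd(x, d):
    """First-order Farrow fractional delay: y[n] ~ x(n - 1 + d), 0 <= d < 1.
    Fixed subfilters C0(z) = z^-1 and C1(z) = 1 - z^-1; only the single
    variable multiplier d changes when the delay is reconfigured."""
    y = []
    for n in range(1, len(x)):
        c0 = x[n - 1]            # fixed subfilter 0 output
        c1 = x[n] - x[n - 1]     # fixed subfilter 1 output
        y.append(c0 + d * c1)    # Horner evaluation in d
    return y
```

On a ramp input the linear interpolator is exact, so the output reproduces the ramp delayed by the fractional amount.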
This paper presents a reconfigurable FFT architecture for the variable-length, multi-streaming WiMax wireless standard. The architecture processes 1 stream of 2048-point FFT, up to 2 streams of 1024-point FFT, or up to 4 streams of 512-point FFT. The architecture consists of a modified radix-2 single delay feedback (SDF) FFT. The sampling frequency of the system is varied in accordance with the FFT length. The latch-free clock gating technique is used to reduce power consumption. The proposed architecture has been synthesized for the Virtex-6 XCVLX760 FPGA. Experimental results show that the architecture achieves the throughput required by the WiMax standard and offers additional features compared to previous approaches. The design uses 1% of the total available FPGA resources, and a maximum clock frequency of 313.67 MHz is achieved. Furthermore, this architecture can be expanded to suit other wireless standards.
@inproceedings{diva2:716596,
author = {Boopal, Padma Prasad and Garrido Gálvez, Mario and Gustafsson, Oscar},
title = {{A Reconfigurable FFT Architecture for Variable-Length and Multi-Streaming OFDM Standards}},
booktitle = {IEEE International Symposium on Circuits and Systems (ISCAS), 2013},
year = {2013},
series = {Circuits and Systems (ISCAS)},
pages = {2066--2070},
publisher = {IEEE},
}
Nonuniform sampling occurs in time-interleaved analog-to-digital converters (TI-ADCs) due to timing mismatches between the individual channel analog-to-digital converters (ADCs). Such a nonuniformly sampled output degrades the achievable resolution of a TI-ADC. To restore the degraded performance, digital time-varying reconstructors can be used at the output of the TI-ADC, which, in principle, convert the nonuniformly sampled output sequence to a uniformly sampled one. As the bandwidth of these reconstructors increases, their complexity also increases rapidly. Also, since the timing errors change occasionally, it is important to have a reconstructor architecture that requires fewer coefficient updates when the value of the timing error changes. A multivariate polynomial impulse response reconstructor is an attractive option for an M-channel reconstructor: if the channel timing error varies within a certain limit, these reconstructors do not need any online redesign of their impulse response coefficients. This paper proposes a technique that can be applied to multivariate polynomial impulse response reconstructors in order to further reduce the number of fixed-coefficient multipliers, and thereby reduce the implementation complexity.
@inproceedings{diva2:716591,
author = {Pillai, Anu Kalidas and Johansson, Håkan},
title = {{Low-complexity two-rate based multivariate impulse response reconstructor for time-skew error correction in m-channel time-interleaved ADCs}},
booktitle = {IEEE International Symposium on Circuits and Systems (ISCAS), 2013},
year = {2013},
series = {IEEE International Symposium on Circuits and Systems},
pages = {2936--2939},
publisher = {IEEE},
}
This paper presents a novel power amplifier (PA) architecture based on the combination of radio frequency pulse width modulation (RFPWM) and multilevel PWM. The architecture provides better dynamic range at high carrier frequency compared to RFPWM. The benefits of this architecture over multilevel PWM are that it only requires a single PA and no combiner. The average efficiency for an 802.11g baseband signal is better than that of multilevel PWM. Our results also show that the proposed technique exhibits a constant dynamic range at carrier frequencies of 3, 4 and 5 GHz, in contrast to RFPWM, which shows a decrease in dynamic range with increasing carrier frequency.
@inproceedings{diva2:684509,
author = {Haque, Muhammad Fahim Ul and Johansson, Ted and Liu, Dake},
title = {{Combined RF and Multilevel PWM Switch Mode Power Amplifier}},
booktitle = {Norchip Conference},
year = {2013},
pages = {1--4},
publisher = {IEEE},
}
Sub-Nyquist sampling makes use of sparsities in analog signals to sample them at a rate lower than the Nyquist rate. The reduction in sampling rate, however, comes at the cost of additional digital signal processing (DSP) which is required to reconstruct the uniformly sampled sequence at the output of the sub-Nyquist sampling analog-to-digital converter. At present, this additional processing is computationally intensive and time consuming and offsets the gains obtained from the reduced sampling rate. This paper focuses on sparse multi-band signals where the user band locations can change from time to time and the reconstructor requires real-time redesign. We propose a technique that can reduce the computational complexity of the reconstructor. At the same time, the proposed scheme simplifies the online reconfigurability of the reconstructor.
@inproceedings{diva2:664487,
author = {Pillai, Anu Kalidas Muralidharan and Johansson, Håkan},
title = {{Efficient reconfigurable scheme for the recovery of sub-Nyquist sampled sparse multi-band signals}},
booktitle = {IEEE Global Conference on Signal and Information Processing (GlobalSIP 2013), December 3-5, 2013, Austin, Texas, USA},
year = {2013},
pages = {1294--1297},
publisher = {IEEE conference proceedings},
}
In this work, we present an analytical study of the aliasing image spur problem in digital-RF modulators. The inherently finite image rejection ratio of this type of modulator is conceptually discussed. A pulse amplitude modulation (PAM) model of the converter is used in the theoretical discussion, and behavioral-level simulation of the digital-RF converter model is included. Finite image rejection is a limiting issue in this architecture; digital-IF mixing, which is also reviewed and simulated, is used to alleviate the problem.
@inproceedings{diva2:664207,
author = {Sadeghifar, Mohammad Reza and Wikner, Jacob},
title = {{Modeling and analysis of aliasing image spurs problem in digital-RF-converter-based IQ modulators}},
booktitle = {ISCAS 2013},
year = {2013},
series = {IEEE International Symposium on Circuits and Systems. Proceedings},
pages = {578--581},
publisher = {IEEE},
}
This paper presents the frequency compensation of high-speed, low-voltage multistage amplifiers. Two frequency compensation techniques, Nested Miller Compensation with Nulling Resistors (NMCNR) and Reversed Nested Indirect Compensation (RNIC), are discussed and employed on two multistage amplifier architectures. A four-stage pseudo-differential amplifier with CMFF and CMFB is designed in a 1.2-V, 65-nm CMOS process. With NMCNR, it achieves a phase margin (PM) of 59° with a DC gain of 75 dB and unity-gain frequency (fug) of 712 MHz. With RNIC, the same four-stage amplifier achieves a phase margin of 84°, DC gain of 76 dB and fug of 2 GHz. Further, a three-stage single-ended amplifier is designed in a 1.1-V, 40-nm CMOS process. The three-stage OTA with RNIC achieves a PM of 81°, DC gain of 80 dB and fug of 770 MHz. The same OTA achieves a PM of 59° with NMCNR, while maintaining a DC gain of 75 dB and fug of 262 MHz. Pole-splitting, to achieve increased stability, is illustrated for both compensation schemes. Simulations illustrate that the RNIC scheme achieves much higher PM and fug for lower values of compensation capacitance compared to NMCNR, despite the increasing number of low-voltage amplifier stages.
@inproceedings{diva2:601005,
author = {Ahmed Aamir, Syed and Harikumar, Prakash and Wikner, Jacob J},
title = {{Frequency compensation of high-speed, low-voltage CMOS multistage amplifiers}},
booktitle = {IEEE International Symposium on Circuits and Systems (ISCAS), 2013},
year = {2013},
series = {International Symposium on Circuits and Systems (ISCAS)},
volume = {2013},
pages = {381--384},
publisher = {IEEE conference proceedings},
}
@inproceedings{diva2:581995,
author = {Gustafsson, Oscar and Ehliar, Andreas},
title = {{Low-complexity general FIR filters based on Winograd's inner product algorithm}},
booktitle = {IEEE International Symposium on Circuits and Systems (ISCAS 2013), 19-23 May 2013, Beijing, China},
year = {2013},
publisher = {IEEE conference proceedings},
}
A hardware-efficient arrangement of digital-to-analog conversion blocks is presented, based on segmenting the digital-to-analog converter (DAC). The segmentation is performed using a bus-split design of the digital sigma-delta modulator (DSDM). The reduction in the input word length to both the DSDM and the DAC is analyzed with respect to performance, since the input word length determines the complexity of these components. We show that effective performance can be achieved with the presented hardware-efficient arrangement. All conclusions are drawn based on theory and simulations.
@inproceedings{diva2:773513,
author = {Afzal, Nadeem and Wikner, J. Jacob},
title = {{Power efficient arrangement of oversampling sigma-delta DAC}},
booktitle = {NORCHIP, 2012},
year = {2012},
pages = {1--4},
publisher = {IEEE},
}
The Single Instruction Multiple Data (SIMD) architecture has proven to be a suitable parallel processor architecture for media and communication signal processing. However, computing overheads such as memory access latency and vector data permutation limit the performance of conventional SIMD processors. Solutions such as combined VLIW and SIMD architectures come at the cost of increased complexity for compiler design and assembly programming. This paper introduces the SIMD processor in the ePUMA platform, which uses a deep execution pipeline and flexible parallel memory to achieve high computing performance. Its deep pipeline can execute combined operations in one cycle, and the parallel memory architecture supports conflict-free parallel data access. It solves the problem of large vector permutations on a short-vector SIMD machine more efficiently than conventional vector permutation instructions. We evaluate the architecture by implementing the soft-decision Viterbi algorithm for convolutional decoding. The result is compared with other architectures, including the TI C54x, CEVA TeakLite III, and PowerPC AltiVec, to show ePUMA's computing efficiency advantage.
@inproceedings{diva2:661711,
author = {Wang, Jian and Karlsson, Andr\'{e}as and Sohl, Joar and Liu, Dake},
title = {{Convolutional Decoding on Deep-pipelined SIMD Processor with Flexible Parallel Memory}},
booktitle = {Digital System Design (DSD), 2012},
year = {2012},
pages = {529--532},
publisher = {IEEE},
}
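The evaluation kernel above is Viterbi decoding of a convolutional code. A hard-decision reference model for the classic rate-1/2, constraint-length-3 code (generators 7 and 5 octal) is sketched below; the paper implements the soft-decision variant on the vector datapath, which is not modeled here:

```python
G = (0b111, 0b101)  # generator polynomials 7 and 5 (octal), constraint length 3

def parity(x):
    return bin(x).count("1") & 1

def encode(bits):
    """Rate-1/2 convolutional encoder; 3-bit register, newest bit in the LSB."""
    s, out = 0, []
    for b in bits:
        reg = (s << 1) | b
        out += [parity(reg & g) for g in G]
        s = reg & 0b11              # next state: the two most recent bits
    return out

def viterbi(rx):
    """Hard-decision Viterbi decoder over the 4-state trellis."""
    INF = float("inf")
    pm = [0.0, INF, INF, INF]       # path metrics; encoder starts in state 0
    paths = [[] for _ in range(4)]
    for i in range(0, len(rx), 2):
        r0, r1 = rx[i], rx[i + 1]
        new_pm, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if pm[s] == INF:
                continue
            for b in (0, 1):        # branch for each possible input bit
                reg = (s << 1) | b
                ns = reg & 0b11
                m = pm[s] + (parity(reg & G[0]) != r0) + (parity(reg & G[1]) != r1)
                if m < new_pm[ns]:  # keep the surviving (minimum-metric) path
                    new_pm[ns] = m
                    new_paths[ns] = paths[s] + [b]
        pm, paths = new_pm, new_paths
    return paths[pm.index(min(pm))]
```

With a free distance of 5, this code corrects any single bit error in the coded stream, which the test below exercises.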
A significant portion of the execution time on current SIMD and VLIW processors is spent on data access rather than instructions that perform actual computations. The ePUMA architecture provides features that allow arbitrary data elements to be accessed in parallel as long as the elements reside in different memory banks. Using permutation to move data elements that are accessed in parallel, the overhead from memory access can be greatly reduced and, in many cases, completely removed. This paper presents a practical method for automatic permutation based on Integer Linear Programming (ILP). No assumptions are made about the structure of the access patterns other than their static nature. Methods for speeding up the solution time for periodic access patterns and for reusing existing solutions are also presented. Benchmarks for, e.g., FFTs show speedups of up to 3.4 when using permutation compared to regular implementations.
@inproceedings{diva2:661698,
author = {Sohl, Joar and Wang, Jian and Karlsson, Andr\'{e}as and Liu, Dake},
title = {{Automatic Permutation for Arbitrary Static Access Patterns}},
booktitle = {Parallel and Distributed Processing with Applications (ISPA), 2012},
year = {2012},
pages = {215--222},
publisher = {IEEE},
}
This paper introduces a new class of linear-phase Nyquist (Mth-band) FIR interpolators and decimators based on tree structures. Through design examples, it is shown that the proposed converter structures have a substantially lower computational complexity than the conventional single-stage converter structures. The complexity is comparable to that of multi-stage Nyquist converters, although the proposed ones tend to have a somewhat higher complexity. A main advantage of the proposed structures is however that they can be used for arbitrary integer conversion factors, thus including prime numbers which cannot be handled by the regular multi-stage Nyquist converters.
@inproceedings{diva2:642287,
author = {Johansson, Håkan and Eghbali, Amir and Lahti, Jimmie},
title = {{Tree-Structured Linear-Phase Nyquist FIR Filter Interpolators and Decimators}},
booktitle = {2012 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 2012)},
year = {2012},
pages = {2329--2332},
publisher = {IEEE},
}
Many contemporary FPGAs have introduced a pre-adder before the hard multipliers, primarily aimed at linear-phase FIR filters. In this work, structural modifications are proposed with the aim of reducing the LUT resource utilization and, finally, using the pre-adder for implementing single-path delay feedback (SDF) pipeline FFTs. The results show that two thirds of the LUT resources can be saved when the pre-adder has bypass functionality, as in the Xilinx 6 and 7 series, compared to a direct mapping.
@inproceedings{diva2:618605,
author = {Ingemarsson, Carl and Källström, Petter and Gustafsson, Oscar},
title = {{Using DSP block pre-adders in pipeline SDF FFT implementations in contemporary FPGAs}},
booktitle = {22nd International Conference on Field Programmable Logic and Applications (FPL)},
year = {2012},
pages = {71--74},
publisher = {IEEE Communications Society},
address = {Piscataway, NJ, USA},
}
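The pre-adder targets linear-phase filters because tap symmetry h[i] = h[N-1-i] lets two delayed samples be added before the single multiply, halving the number of DSP multipliers. A behavioral sketch of that transformation for the even-length, even-symmetric case:

```python
def fir_symmetric_preadd(x, h):
    """Linear-phase FIR with even-symmetric, even-length taps, computed as
    a DSP-block pre-adder would: pre-add the two samples sharing a tap
    value, then one multiply per tap pair (half the multipliers)."""
    N = len(h)
    assert N % 2 == 0 and all(h[i] == h[N - 1 - i] for i in range(N // 2))
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(N // 2):
            a = x[n - i] if n - i >= 0 else 0.0                  # early tap
            b = x[n - (N - 1 - i)] if n - (N - 1 - i) >= 0 else 0.0  # mirror tap
            acc += h[i] * (a + b)    # pre-add, then a single multiply
        y.append(acc)
    return y
```

Feeding an impulse through the structure reproduces the full impulse response, confirming the pre-added form equals the direct form for symmetric taps.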
This paper introduces a realization of finite-length impulse response (FIR) filters with simultaneously variable bandwidth and fractional delay (FD). The realization makes use of impulse responses which are two-dimensional polynomials in the bandwidth and FD parameters. Unlike previous polynomial-based realizations, it utilizes the fact that a variable FD filter is typically much less complex than a variable-bandwidth filter. By separating the corresponding subfilters in the overall realization, significant savings are thereby achieved. A design example, included in the paper, shows about 65 percent multiplication and addition savings compared to the previous polynomial-based realizations. Moreover, compared to a recently introduced alternative fast filter bank approach, the proposed method offers significantly smaller group delays and group delay errors.
@inproceedings{diva2:600988,
author = {Johansson, Håkan and Eghbali, Amir},
title = {{A realization of FIR filters with simultaneously variable bandwidth and fractional delay}},
booktitle = {Signal Processing Conference (EUSIPCO), 2012},
year = {2012},
pages = {2178--2182},
publisher = {IEEE},
}
The telecommunication industry has been successful in turning the Internet into a mobile service and stimulating the creation of a new set of networked, remote services. In this paper we argue that embracing cloud computing solutions is fundamental for the telecommunication industry to remain competitive. However, there are legal, regulatory, business, market-related and technical challenges that must be considered. In this paper we list such challenges and define a set of privacy, security and trust requirements that must be taken into account before cloud computing solutions can be fully integrated and deployed by telecommunication providers.
@inproceedings{diva2:589579,
author = {Martucci, Leonardo and Zuccato, Albin and Smeets, Ben and Habib, Sheikh M. and Johansson, Thomas and Shahmehri, Nahid},
title = {{Privacy, Security and Trust in Cloud Computing: The Perspective of the Telecommunication Industry}},
booktitle = {Ubiquitous Intelligence \& Computing and 9th International Conference on Autonomic \& Trusted Computing (UIC/ATC), 2012},
year = {2012},
pages = {627--632},
publisher = {IEEE COMPUTER SOC},
}
Time-interleaved analog-to-digital converters (ADCs) exhibit offset, gain, and time-skew errors due to channel mismatches. The time skews give rise to a nonuniformly sampled signal instead of the desired uniformly sampled signal. This introduces the need for a digital signal reconstructor that takes the "nonuniform samples" and generates the "uniform samples". In the general case, the time skews are frequency dependent, in which case a generalization of nonuniform sampling applies. When the bandwidth of a digital reconstructor approaches the whole Nyquist band, the computational complexity may become prohibitive. This paper introduces a new scheme with reduced complexity. The idea stems from recent multirate-based efficient realizations of linear and time-invariant systems. However, a time-interleaved ADC (without correction) is a time-varying system which means that these multirate-based techniques cannot be used straightforwardly but need to be appropriately analyzed and extended for this context.
@inproceedings{diva2:589546,
author = {Pillai, Anu Kalidas Muralidharan and Johansson, Håkan},
title = {{Efficient signal reconstruction scheme for time-interleaved ADCs}},
booktitle = {Proc. IEEE 10th Int. New Circuits and Systems Conf. (NEWCAS)},
year = {2012},
pages = {357--360},
publisher = {IEEE},
}
In this paper we present a study and simulation results of the structure and design of a redundant finite-impulse response (FIR) filter. The filter has been selected as an illustrative example of biologically-inspired circuits, but the structure can be generalized to cover other signal processing systems. In the presented study, we elaborate on the signal processing properties of the filter when a redundant architecture is applied, in which different computing paths can be utilized, as inspired by biological architectures (BIAs). We present typical simulation results for a low-pass filter illustrating the trade-offs and costs associated with this architecture.
@inproceedings{diva2:578514,
author = {Alvbrant, Joakim and Wikner, J Jacob},
title = {{Study and Simulation Example of a Redundant FIR Filter}},
booktitle = {Proceedings 30th Norchip Conference},
year = {2012},
pages = {1--4},
publisher = {IEEE},
}
This paper proposes a method to design low-delay fractional delay (FD) filters using the Farrow structure. The proposed method employs both linear-phase and nonlinear-phase finite-length impulse response (FIR) subfilters, in contrast to conventional methods that utilize only nonlinear-phase FIR subfilters. Two design cases are considered. The first case uses nonlinear-phase FIR filters in all branches of the Farrow structure. The second case uses linear-phase FIR filters in every second branch; these branches have milder restrictions on the approximation error, so the order of the linear-phase FIR filters can be reduced without affecting the approximation error. The arithmetic complexity, in terms of the number of distinct multiplications, is thereby reduced by an average of 30%. Design examples illustrate the method.
@inproceedings{diva2:562678,
author = {Eghbali, Amir and Johansson, Håkan},
title = {{Complexity reduction in low-delay Farrow-structure-based variable fractional delay FIR filters utilizing linear-phase subfilters}},
booktitle = {Eur. Conf. Circuit Theory Design},
year = {2012},
publisher = {IEEE conference proceedings},
}
This paper introduces reconfigurable two-stage finite-length impulse response (FIR) Nyquist filters. In both stages, the Farrow structure realizes reconfigurable lowpass linear-phase FIR Nyquist filters. By adjusting the variable multipliers of the Farrow structure, various FIR Nyquist filters and integer interpolation/decimation structures are obtained, online. However, the filter design problem is solved only once, offline. Design examples illustrate the method.
@inproceedings{diva2:562673,
author = {Eghbali, Amir and Johansson, Håkan},
title = {{Reconfigurable two-stage Nyquist filters utilizing the Farrow structure}},
booktitle = {IEEE Int. Symp. Circuits Syst.},
year = {2012},
series = {Circuits and Systems (ISCAS), IEEE},
pages = {3186--3189},
publisher = {IEEE conference proceedings},
}
Rotations by angles that are fractions of the unit circle find applications in, e.g., fast Fourier transform (FFT) architectures. In this work we propose a new rotator that consists of a series of stages, each calculating a micro-rotation by an angle corresponding to a power-of-three fractional part of the unit circle. Using a continuous range of powers of three, it is possible to carry out all required rotations. In addition, the proposed rotators are compared to previous approaches based on shift-and-add algorithms, showing improvements in accuracy and number of adders.
@inproceedings{diva2:558647,
author = {Källström, Petter and Garrido Gálvez, Mario and Gustafsson, Oscar},
title = {{Low-Complexity Rotators for the FFT Using Base-3 Signed Stages}},
booktitle = {APCCAS 2012 : 2012 IEEE Asia Pacific Conference on Circuits and Systems},
year = {2012},
pages = {519--522},
publisher = {IEEE},
address = {Piscataway, N.J., USA},
}
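The power-of-three micro-rotation idea above can be sketched in software. The actual rotator is a shift-and-add hardware design; this toy (with a hypothetical stage count and floating-point complex arithmetic) only illustrates how an angle, expressed as a fraction of the unit circle, decomposes into signed base-3 digits, one per stage:

```python
import cmath

def balanced_ternary_fraction(frac, stages):
    """Decompose a fraction of the unit circle in [-0.5, 0.5) into
    signed base-3 digits d_k in {-1, 0, 1} so that
    frac ~= sum(d_k * 3**-k for k = 1..stages)."""
    digits = []
    r = frac
    for _ in range(stages):
        r *= 3.0
        d = int(round(r))
        d = max(-1, min(1, d))  # clip to the balanced-ternary digit set
        r -= d                  # residual stays in [-0.5, 0.5]
        digits.append(d)
    return digits

def rotate(x, frac, stages=8):
    """Rotate complex sample x by 2*pi*frac using a cascade of
    micro-rotations, each by a power-of-three fraction of the circle."""
    for k, d in enumerate(balanced_ternary_fraction(frac, stages), start=1):
        if d:
            x *= cmath.exp(1j * 2 * cmath.pi * d * 3.0 ** -k)
    return x
```

With enough stages the residual angle shrinks by a factor of three per stage, e.g. `rotate(1+0j, 0.25, 20)` lands very close to a quarter-circle rotation of the input.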
Even though time-interleaved analog-to-digital converters (ADCs) help to achieve higher bandwidth with simpler individual ADCs, gain, offset, and time-skew mismatch between the channels degrade the achievable resolution. Of particular interest is the time-skew error between channels, which results in nonuniform samples and thereby introduces distortion tones at the output of the time-interleaved ADC. Time-varying digital reconstructors can be used to correct the time-skew errors between the channels in a time-interleaved ADC. However, the complexity of such reconstructors increases as their bandwidth approaches the Nyquist band. In addition, the reconstructor needs to be redesigned online every time the time-skew error varies. Design methods that result in minimum reconstructor order require expensive online redesign, while methods that simplify online redesign result in higher reconstructor complexity. This paper proposes a technique that simplifies the online redesign and achieves a low-complexity reconstructor at the same time.
@inproceedings{diva2:558654,
author = {Pillai, Anu Kalidas Muralidharan and Johansson, Håkan},
title = {{Time-skew error correction in two-channel time-interleaved ADCs based on a two-rate approach and polynomial impulse responses}},
booktitle = {Proc. IEEE 55th Int. Midwest Symp. Circuits Syst. (MWSCAS)},
year = {2012},
pages = {1136--1139},
}
This paper presents an analog receiver front-end (AFE) design for a capacitive body-coupled digital baseband receiver. The most important theoretical aspects of the electrical model of the human body, from the perspective of capacitive body-coupled communication (BCC), are also discussed, and the constraints imposed by gain and input-referred noise on the receiver front-end are derived from digital communication theory. Three different AFE topologies have been designed in an ST 40-nm CMOS technology node, which is selected to enable easy integration in today's system-on-chip environments. Simulation results show that the best AFE topology, consisting of a multi-stage AC-coupled preamplifier followed by a Schmitt trigger, achieves 57.6 dB gain with an input-referred noise PSD of 4.4 nV/√Hz while consuming 6.8 mW.
@inproceedings{diva2:558641,
author = {Harikumar, Prakash and Kazim, Muhammad Irfan and Wikner, Jacob},
title = {{An Analog Receiver Front-End for Capacitive Body-Coupled Communication}},
booktitle = {NORCHIP, 2012},
year = {2012},
pages = {1--4},
publisher = {IEEE},
}
A Design-Build-Test (DBT) project course in electronics is presented. The course was developed during the first years of the CDIO Initiative, and it has been given successfully for almost ten years within two engineering programs at Linköping University. More than 2000 students have passed the course, and it is considered to be one of the most popular and also demanding courses within these programs. The key factors that have contributed to the success of the course are:
- Clearly defined learning outcomes.
- A suitable and well working course organization.
- A systematic method for project management.
- Challenging project tasks of sufficient complexity.
- Laboratory workspaces with modern equipment and high availability.
The aim of the paper is to describe these key factors in more detail based on the experiences that have been gained during the almost ten years the course has been given.
@inproceedings{diva2:543787,
author = {Svensson, Tomas and Gunnarsson, Svante},
title = {{Teaching Project Courses in Large Scale Using Industry Like Methods - Experiences After Ten Years}},
booktitle = {8th International CDIO Conference, Brisbane, Australia, July 1-4},
year = {2012},
}
In this paper we discuss how a typical Block RAM in an FPGA can be extended to enable the implementation of more efficient caches in FPGAs with very minor modifications to the existing Block RAM architectures. In addition, the modifications also allow other components, such as hash tables, to be implemented more efficiently.
@inproceedings{diva2:546551,
author = {Ehliar, Andreas},
title = {{EBRAM - Extending the BlockRAMs in FPGAs to support caches and hash tables in an efficient manner}},
booktitle = {IEEE 20th International Symposium on Field-Programmable Custom Computing Machines, April 29 - May 1 2012, Toronto, ON, Canada},
year = {2012},
pages = {242--242},
publisher = {IEEE Computer Society},
}
This paper presents a 512-point feedforward FFT architecture for wireless personal area network (WPAN). The architecture processes a continuous flow of 8 samples in parallel, leading to a throughput of 2.64 GSamples/s. The FFT is computed in three stages that use radix-8 butterflies. This radix significantly reduces the number of rotators compared to previous approaches based on radix-2. In addition, the proposed architecture uses the minimum memory that is required for a 512-point 8-parallel FFT. Experimental results show that besides its high throughput, the design is efficient in area and power consumption, improving on previous approaches. Specifically, for a wordlength of 16 bits, the proposed design consumes 61.5 mW and its area is 1.43 mm2.
@inproceedings{diva2:927931,
author = {Ahmed, Tanvir and Garrido, Mario and Gustafsson, Oscar},
title = {{A 512-point 8-parallel pipelined feedforward FFT for WPAN}},
booktitle = {2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)},
year = {2011},
pages = {981--984},
publisher = {IEEE},
}
An analysis of frequency control techniques for inverter-based ring oscillators is presented. The aim of this study is to aid the circuit designer in selecting an architecture appropriate for a specific application. A brief discussion of ring oscillators is presented, followed by an overview of the various control schemes. The circuits are realized in a 40 nm CMOS technology and simulated using Spectre. Based on the simulation results, the different control schemes are characterized in terms of power consumption, tuning range and noise performance, so as to guide the designer in control scheme selection.
@inproceedings{diva2:780452,
author = {Touqir Pasha, Muhammad and Vesterbacka, Mark},
title = {{Frequency control schemes for single ended ring oscillators}},
booktitle = {20th European Conference on Circuit Theory and Design (ECCTD), 2011, August 29-31, Linköping, Sweden},
year = {2011},
pages = {361--364},
publisher = {IEEE},
}
Predictable computing is common in embedded signal processing, which has communication characteristics of data-independent memory access and long streaming data transfers. This paper presents a streaming network-on-chip (NoC), StreamNet, for a chip multiprocessor (CMP) platform targeting predictable signal processing. The network is based on circuit switching and uses a two-level arbitration scheme. The first level uses fast hardware arbitration, and the second level is programmable software arbitration. Its communication protocol is designed to support a free choice of network topology. Together with its scheduling tool, the network can achieve high communication efficiency and improve parallel computing performance. This NoC architecture is used to design the Ring network in the ePUMA multiprocessor DSP. The evaluation with a multi-user signal processing application for an LTE base station shows the low parallel computing overhead of the ePUMA multiprocessor platform.
@inproceedings{diva2:661722,
author = {Wang, Jian and Karlsson, Andr\'{e}as and Sohl, Joar and Pettersson, Magnus and Liu, Dake},
title = {{A multi-level arbitration and topology free streaming network for chip multiprocessor}},
booktitle = {ASIC (ASICON), 2011},
year = {2011},
pages = {153--158},
publisher = {IEEE},
}
Computing up to 100 GOPS without cooling is essential for high-end embedded systems and much in demand by the market. A novel master-slave multi-SIMD architecture and its kernel (template) based parallel programming flow are therefore introduced as a parallel signal processing platform: ePUMA, an embedded Parallel DSP processor with Unique Memory Access. It is an on-chip multi-DSP processor (CMP) targeting predictable signal processing for communications and multimedia. The essential techniques are to separate the processing of the control stream from parallel computing, and to separate parallel data access from the parallel arithmetic computing kernels. Through these separations, computation and data access can be orthogonal both in hardware and in programs; orthogonal operations can therefore be executed in parallel and the run-time cost of data access can be minimized. Benchmarks show that the computing performance reaches about 80% of the hardware limit, whereas normal processors reach less than 40%. The unique SIMD memory subsystem architecture offers programmable conflict-free parallel data accesses. A programming flow and tools have also been developed to support coding on the unique hardware architecture. A prototype on FPGA shows especially high performance relative to silicon cost.
@inproceedings{diva2:661720,
author = {Liu, Dake and Karlsson, Andr\'{e}as and Sohl, Joar and Wang, Jian and Petersson, Magnus and Zhou, Wenbiao},
title = {{ePUMA embedded parallel DSP processor with Unique Memory Access}},
booktitle = {Information, Communications and Signal Processing (ICICS), 2011},
year = {2011},
pages = {1--5},
publisher = {IEEE},
}
As more and more computing components are integrated into one digital signal processing (DSP) system to achieve high computing power by executing tasks in parallel, the inter-processor and processor-to-memory communication overheads soon become the performance bottleneck and limit the scalability of a multi-processor platform. For chip multiprocessor (CMP) DSP systems targeting predictable computing, an appreciation of the communication characteristics is essential to design an efficient interconnection architecture and improve performance. This paper presents a Star network designed for the ePUMA multi-core DSP processor, based on an analysis of the network communication models. As part of ePUMA's multi-layer interconnection network, the Star network handles core to off-chip memory communication for kernel computing on the slave processors. The network has short setup latency, easy multiprocessor synchronization, rich memory addressing patterns, and power-efficient streaming data transfer. The improved network efficiency is evaluated in comparison with a previous study.
@inproceedings{diva2:661719,
author = {Wang, Jian and Sohl, Joar and Karlsson, Andr\'{e}as and Liu, Dake},
title = {{An Efficient Streaming Star Network for Multi-core Parallel DSP Processor}},
booktitle = {Networking and Computing (ICNC), 2011},
year = {2011},
pages = {332--336},
publisher = {IEEE},
}
The complexity of narrow transition band FIR filters is high and can be reduced by using frequency response masking (FRM) techniques. These techniques use a combination of periodic model filters and masking filters. In this paper, we show that time-multiplexed FRM filters achieve lower complexity, not only in terms of multipliers, but also logic elements compared to time-multiplexed single-stage filters. The reduced complexity also leads to a lower power consumption. Furthermore, we show that the optimal period of the model filter is dependent on the time-multiplexing factor.
@inproceedings{diva2:616969,
author = {Alam, Syed Asad and Gustafsson, Oscar},
title = {{Implementation of Narrow-Band Frequency-Response Masking for Efficient Narrow Transition Band FIR Filters on FPGAs}},
booktitle = {NORCHIP, 2011},
year = {2011},
pages = {1--4},
publisher = {IEEE conference proceedings},
}
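The FRM principle behind the periodic model filters mentioned above can be sketched numerically: replacing z by z^L in a model filter (inserting L-1 zeros between its taps) compresses its frequency response by a factor L, producing sharp but periodic passbands for a masking filter to select. The 5-tap filter below is a hypothetical toy, not from the paper:

```python
import cmath

def freq_response(h, w):
    """Frequency response of an FIR filter h at angular frequency w."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

def upsample_coeffs(h, L):
    """Replace z by z^L: insert L-1 zeros between taps, which compresses
    the frequency response by a factor L and makes it periodic with
    period 2*pi/L -- the transition band shrinks without new nonzero taps."""
    g = []
    for c in h:
        g.append(c)
        g.extend([0.0] * (L - 1))
    return g[:-(L - 1)] if L > 1 else g

# A toy 5-tap lowpass model filter (hypothetical coefficients).
h = [0.1, 0.25, 0.3, 0.25, 0.1]
L = 4
g = upsample_coeffs(h, L)

# The periodic filter evaluated at w/L equals the model filter at w.
w = 0.3
assert abs(freq_response(g, w / L) - freq_response(h, w)) < 1e-12
```

The zero taps are exactly the sparseness that the paper's time-multiplexed implementation exploits.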
This work is focused on structural approaches to studying diagnosability properties given a system model, taking into account, either simultaneously or separately, integral and differential causal interpretations for differential constraints. We develop a model characterization and corresponding algorithms for studying system diagnosability using a structural decomposition that avoids generating the full set of system ARRs. Simultaneous application of integral and differential causal interpretations for differential constraints results in a mixed causality interpretation for the system. The added power of mixed causality is demonstrated using a case study. Finally, we summarize our work and provide a discussion of the advantages of mixed causality over derivative-only or integral-only causality.
@inproceedings{diva2:613891,
author = {Åslund, Jan and Bregon, A. and Krysander, Mattias and Frisk, Erik and Pulido, B. and Biswas, G.},
title = {{Structural diagnosability analysis of dynamic models}},
booktitle = {Proceedings of the 18th IFAC World Congress, 2011},
year = {2011},
series = {IFAC Proceedings Volumes (IFAC-PapersOnline)},
pages = {4082--4088},
publisher = {Elsevier},
address = {Milano, Italy},
}
In this paper, modified, hybrid architectures for digital, oversampled sigma-delta digital-to-analog converters (ΣΔDACs) are explored in terms of signal-to-noise ratio (SNR) and power consumption. Two different architectures are investigated, both have variable configurations of the input and output word-length (i.e., the physical resolution of the DAC). A modified architecture, termed in this work as a composite architecture (CA), shows about 9 dB increase in SNR while maintaining a power-consumption at the same level as that of a so-called hybrid architecture (HA). The power estimation is done for modulators on the RTL level using a standard cell library in a 65-nm technology. The modulators are operated at a sampling frequency of 2 GHz.
@inproceedings{diva2:578668,
author = {Afzal, Nadeem and Sadeghifar, Reza and Wikner, Jacob},
title = {{A study on power consumption of modified noise-shaper architectures for Sigma-Delta DACs}},
booktitle = {Circuit Theory and Design (ECCTD), 2011},
year = {2011},
pages = {274--277},
publisher = {IEEE},
}
This paper introduces a new structure for reconfigurable two-stage finite-length impulse response (FIR) Nyquist filters using the Farrow structure. The Nyquist filter is split into two equal and linear-phase FIR spectral factors. In the first stage, the Farrow structure realizes reconfigurable lowpass linear-phase FIR interpolation/decimation filters whereas the second stage is composed of a fixed lowpass linear-phase FIR filter. By adjusting the variable multipliers of the Farrow structure, the overall filter can be modified. Hence, various FIR Nyquist filters and integer interpolation/decimation structures are obtained. However, the filter design problem is solved only once and offline. Design examples illustrate the method.
@inproceedings{diva2:562680,
author = {Eghbali, Amir and Johansson, Håkan and Saramäki, Tapio},
title = {{A new structure for reconfigurable two-stage Nyquist pulse shaping filters}},
booktitle = {Circuits and Systems (MWSCAS), 2011},
year = {2011},
series = {Midwest Symposium on Circuits and Systems. Conference Proceedings},
pages = {1--4},
publisher = {IEEE},
address = {Piscataway, NJ, United States},
}
Pipelined analog-to-digital converters (ADCs) achieve low to moderate resolutions at high bandwidths, while sigma-delta (ΣΔ) ADCs provide high resolution at moderate bandwidths. A switched-capacitor (SC) block which can function as an integrator or an MDAC can be used to implement a reconfigurable ADC (R-ADC) which supports both these types of architectures. Through the use of high-level models, this work derives the capacitance and critical opamp parameters, such as DC gain and bandwidth, of the SC blocks in a reconfigurable ADC. Scaling of capacitance afforded by the noise-shaping property of ΣΔ loops, as well as by the inter-stage gain of pipelined ADCs, is used to minimize the total capacitance. This work can be used as reference material to understand some of the design trade-offs in R-ADCs.
@inproceedings{diva2:558669,
author = {Harikumar, Prakash and Pillai, Anu Kalidas Muralidharan and Wikner, Jacob J},
title = {{A Study on Switched-Capacitor Blocks for Reconfigurable ADCs}},
booktitle = {Electronics, Circuits and Systems (ICECS), 2011},
year = {2011},
pages = {649--652},
}
Commonly used procedures for the design of digital differentiators are based on various optimization techniques and are iterative in nature. Order estimation for differentiators is important from a design point of view, as it can reduce the design time by providing a good initial guess of the order to the iterative design procedures. Moreover, order estimation gives a fairly good estimate of the computational complexity of the overall design. This paper presents linear-phase, finite-length impulse response (FIR) filter order estimation for integral-degree differentiators of up to fourth degree. A minimax optimization-based technique is used for the filter design, together with curve fitting.
@inproceedings{diva2:503675,
author = {Sheikh, Zaka Ullah and Eghbali, Amir and Johansson, Håkan},
title = {{Linear-Phase FIR Digital Differentiator Order Estimation}},
booktitle = {Proceedings of The 20th European Conference on Circuit Theory and Design, ECCTD2011},
year = {2011},
pages = {310--313},
}
In this work a systematic method to generate all possible fast Fourier transform (FFT) algorithms is proposed, based on the relation to binary trees. The binary tree is used to represent the decomposition of a discrete Fourier transform (DFT) into sub-DFTs. The radix is adaptively changed to compute the sub-DFTs in the proposed decomposition. We determine the number of possible algorithms for 2n-point FFTs with radix-2 butterfly operations and propose a simple method to determine the twiddle factor indices for each algorithm based on the binary tree representation.
@inproceedings{diva2:491952,
author = {Qureshi, Fahad and Gustafsson, Oscar},
title = {{Generation of All Radix-2 Fast Fourier Transform Algorithms Using Binary Trees and Its Analysis}},
booktitle = {Proceedings of ECCTD 2011: 20th European Conference on Circuit Theory and Design (ECCTD)},
year = {2011},
pages = {677--680},
publisher = {IEEE},
}
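Under the binary-tree reading of the abstract, counting the decompositions is a short recursion: each tree node splits a 2**n-point DFT into sub-DFTs of sizes 2**p and 2**(n-p). A sketch (an illustration of the counting idea, not the paper's twiddle-factor method):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_fft_algorithms(n):
    """Count binary trees with n leaves, where each tree describes how a
    2**n-point DFT is recursively split into two sub-DFTs of sizes
    2**p and 2**(n-p). A single radix-2 butterfly stage (n = 1) is
    the base case."""
    if n == 1:
        return 1
    return sum(num_fft_algorithms(p) * num_fft_algorithms(n - p)
               for p in range(1, n))

# The counts follow the Catalan numbers: 1, 1, 2, 5, 14, 42, ...
print([num_fft_algorithms(n) for n in range(1, 7)])
```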
In this work, we consider the computational complexity of different polynomial evaluation schemes. By considering the number of operations of different types, critical path, pipelining complexity, and latency after pipelining, high-level comparisons are obtained. These can then be used to short list suitable candidates for an implementation given the specifications. Not only multiplications are considered, but they are divided into data-data multiplications, squarers, and data-coefficient multiplications, as the latter can be optimized depending on implementation architecture and application.
@inproceedings{diva2:478816,
author = {Abbas, Muhammad and Gustafsson, Oscar},
title = {{Computational and Implementation Complexity of Polynomial Evaluation Schemes}},
booktitle = {Proceedings of NORCHIP 2011, 14-15 Nov. 2011},
year = {2011},
pages = {1--6},
publisher = {IEEE conference proceedings},
}
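Two of the classic evaluation schemes that such a comparison covers can be sketched as follows (the paper compares more schemes and counts operation types; the split into data-data multiplications, squarers, and data-coefficient multiplications is visible in the comments):

```python
def horner(coeffs, x):
    """Horner's scheme: n data-coefficient multiplications and n
    additions, but a fully sequential critical path."""
    acc = 0.0
    for c in coeffs:          # coeffs ordered from highest degree down
        acc = acc * x + c
    return acc

def estrin(coeffs, x):
    """Estrin's scheme: pair up coefficients and square x each round.
    Uses extra squarers but halves the depth each round, which eases
    pipelining in hardware."""
    c = list(coeffs[::-1])    # lowest degree first
    while len(c) > 1:
        if len(c) % 2:
            c.append(0.0)
        c = [c[i] + x * c[i + 1] for i in range(0, len(c), 2)]
        x = x * x             # data-data multiplication (a squarer)
    return c[0]

# p(x) = 2x^3 + 3x^2 + 4x + 5 at x = 2: 16 + 12 + 8 + 5 = 41
assert horner([2, 3, 4, 5], 2.0) == estrin([2, 3, 4, 5], 2.0) == 41.0
```

For a degree-n polynomial, Horner needs a depth of n multiply-add steps, while Estrin's tree needs roughly log2(n) rounds, which is the latency-after-pipelining trade-off the abstract refers to.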
Matrix inversion is sensitive to the number representation used. In this paper, simulations of matrix inversion with numbers represented in fixed-point and logarithmic number systems (LNS) are presented. A software framework has been implemented to allow extensive simulation of finite-wordlength matrix inversion. Six different algorithms have been used, and results on matrix condition number, wordlength, and to some extent matrix size are presented. The simulations show, among other things, that the wordlength requirements differ significantly between different algorithms in both fixed-point and LNS representations. The results can be used as a starting point for a matrix inversion hardware implementation.
@inproceedings{diva2:461965,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{Finite wordlength properties of matrix inversion algorithms in fixed-point and logarithmic number system}},
booktitle = {2011 20th European Conference on Circuit Theory and Design (ECCTD)},
year = {2011},
pages = {673--676},
publisher = {IEEE},
address = {Piscataway, NJ, USA},
}
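The key representational difference the abstract relies on, that LNS gives uniform relative precision while fixed-point gives uniform absolute precision, can be sketched with two toy quantizers (a hypothetical illustration, not the paper's simulation framework):

```python
import math

def q_fixed(x, frac_bits):
    """Fixed-point quantization: round to the nearest multiple of
    2**-frac_bits. Absolute error is bounded; relative error blows
    up for small values."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step

def q_lns(x, frac_bits):
    """LNS quantization: quantize log2|x| to frac_bits fractional bits,
    keeping the sign separately. Relative precision is uniform over
    the whole dynamic range."""
    if x == 0.0:
        return 0.0
    step = 2.0 ** -frac_bits
    e = round(math.log2(abs(x)) / step) * step
    return math.copysign(2.0 ** e, x)

# For a small value, LNS keeps the relative error small while
# fixed-point loses most of the significant information.
x = 3e-3
rel = lambda q: abs(q - x) / x
assert rel(q_lns(x, 8)) < rel(q_fixed(x, 8))
```

This is exactly why ill-conditioned matrices, whose inversion mixes very large and very small intermediate values, tend to need shorter LNS wordlengths than fixed-point ones.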
When generating a sine table to be used in, e.g., frequency synthesis circuits, a widely used way to assign the table content is to simply take a sine wave with the desired amplitude and quantize it using rounding. This results in uncontrolled rounding errors of up to 0.5 LSB, causing some noise. In this paper we present a method for increasing the signal quality simply by adjusting the amplitude within ±0.5 LSB of the intended value. This does not affect the maximum value of the sinusoid, but can increase the spurious-free dynamic range by a few dB.
@inproceedings{diva2:457452,
author = {Källström, Petter and Gustafsson, Oscar},
title = {{Magnitude Scaling for Increased SFDR in DDFS}},
booktitle = {29th Norchip Conference, Lund, Sweden, 14-15 November 2011},
year = {2011},
pages = {1--4},
publisher = {IEEE},
}
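The amplitude-adjustment idea can be sketched with a brute-force sweep: quantize one period of a sine for each candidate amplitude within ±0.5 LSB of the nominal one and keep the table whose DFT has the best SFDR. The table size, nominal amplitude, and sweep granularity below are hypothetical choices for illustration:

```python
import cmath, math

def sfdr_db(table):
    """Spurious-free dynamic range of a one-period table, via a DFT:
    fundamental (bin 1) over the largest remaining spur."""
    N = len(table)
    spec = [abs(sum(v * cmath.exp(-2j * math.pi * k * n / N)
                    for n, v in enumerate(table))) for k in range(N // 2)]
    fund = spec[1]
    spur = max(s for k, s in enumerate(spec) if k not in (0, 1))
    return 20 * math.log10(fund / spur)

def sine_table(amplitude, N=64):
    """One period of a sine, quantized by rounding to integers."""
    return [round(amplitude * math.sin(2 * math.pi * n / N)) for n in range(N)]

# Sweep the amplitude within +/-0.5 LSB of the nominal value and keep
# the amplitude giving the best SFDR.  The peak table value stays the
# same for the whole sweep, as the paper notes.
nominal = 127.0
best = max((sfdr_db(sine_table(nominal + d / 100)), nominal + d / 100)
           for d in range(-50, 51))
print("nominal SFDR: %.1f dB, best: %.1f dB at A = %.2f"
      % (sfdr_db(sine_table(nominal)), best[0], best[1]))
```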
This paper discusses the use of partial reconfigurability in Xilinx FPGA designs in order to aid debugging. A debugging framework is proposed where partial reconfigurability allows for added flexibility by letting a debugger decide at run time which debugging module to use. This paper also presents an open source debugging tool which allows a user to read out the contents of memory blocks in Xilinx designs without needing a JTAG adapter. This allows a user to debug an FPGA in situations which would otherwise be difficult, e.g., in the field.
@inproceedings{diva2:458860,
author = {Ehliar, Andreas and Siverskog, Jacob},
title = {{Using Partial Reconfigurability to aid Debugging of FPGA Designs}},
booktitle = {VII Southern Conference on Programmable Logic (SPL)},
year = {2011},
}
Matrix inversion is a key operation in, for instance, adaptive filters and MIMO communication system receivers. For ill-conditioned channel matrices, long wordlengths are required for fixed-point implementation of matrix inversion. In this work, the wordlength/error tradeoffs for matrix inversion using different algorithms with fixed-point and logarithmic number systems (LNS) are considered. LNS provides higher resolution for small numbers and a larger dynamic range. Also, it alters the cost of the basic operations in the algorithms. The results show that the wordlength required to achieve a comparable error differs significantly between algorithms, and that for most algorithms it is reduced with LNS compared to fixed-point.
@inproceedings{diva2:447684,
author = {Ingemarsson, Carl and Gustafsson, Oscar},
title = {{On Using the Logarithmic Number System for Finite Wordlength Matrix Inversion}},
booktitle = {The 54th IEEE International Midwest Symposium on Circuits and Systems},
year = {2011},
series = {Midwest Symposium on Circuits and Systems. Conference Proceedings},
pages = {1--4},
publisher = {IEEE},
address = {Piscataway, NJ, USA},
}
This paper shows the implementation of a digital FIR filter using ultra-low power logic components. Source-coupled logic operated in the sub-threshold region is used to achieve low power consumption while keeping a satisfactory output swing. The STSCL (sub-threshold source-coupled logic) circuit is extended with a controllable voltage-level feature to minimize the overall leakage current, including both gate and sub-threshold leakage. Seven-stage ring oscillators are implemented in CMOS, STSCL and the proposed logic at similar supply voltages to compare their power consumption; for the proposed technique it is in the nW range. The FIR filter was then designed in both CMOS and the proposed logic, with measurement results shown in the paper. All results are obtained in a 65 nm process technology at a supply voltage of 0.5 V.
@inproceedings{diva2:442314,
author = {Roy, Sajib and Nipun, Md Murad Kabir. and Wikner, Jacob},
title = {{Ultra-low power FIR filter using STSC-CVL logic}},
booktitle = {2011 IEEE International Conference on Integrated Circuit Design and Technology, ICICDT 2011},
year = {2011},
pages = {1--4},
publisher = {IEEE},
}
The use of rate-compatible error correcting codes offers several advantages as compared to the use of fixed-rate codes: a smooth adaptation to the channel conditions, the possibility of incremental Hybrid ARQ schemes, as well as simplified code representations in the encoder and decoder. In this paper, the implementation of a decoder for rate-compatible quasi-cyclic LDPC codes is considered. The decoder uses check node merging to increase the convergence speed of the algorithm. Check node merging allows the decoder to achieve the same performance with a significantly lower number of iterations, thereby increasing the throughput.
The feasibility of a check node merging decoder is investigated for codes from IEEE 802.16e and IEEE 802.11n. The faster convergence rate of the check node merging algorithm allows the decoder to be implemented using lower parallelization factors, thereby reducing the logic complexity. The designs have been synthesized to an Altera Cyclone II FPGA, and results show significant increases in throughput at high SNR.
@inproceedings{diva2:441906,
author = {Blad, Anton and Gustafsson, Oscar},
title = {{FPGA implementation of rate-compatible QC-LDPC code decoder}},
booktitle = {European Conference on Circuit Theory and Design, August 29-31, Linköping, Sweden},
year = {2011},
pages = {777--780},
}
More and more embedded systems are gaining multimedia capabilities, including computer graphics. Although this is mainly due to their increasing computational capability, optimization of algorithms and data structures is important as well, since these systems have to fulfill a variety of constraints and cannot be geared solely towards performance. In this paper, the two most popular texture compression methods (DXT1 and PVRTC) are compared in terms of both image quality and decoding performance. For this, both have been ported to the ePUMA platform, which is used as an example of an energy-consumption-optimized embedded system. Furthermore, a new DXT1 encoder has been developed which reaches higher image quality than existing encoders.
@inproceedings{diva2:437292,
author = {Ogniewski, Jens and Karlsson, Andr\'{e}as and Ragnemalm, Ingemar},
title = {{Texture Compression in Memory and Performance-Constrained Embedded Systems}},
booktitle = {Computer Graphics, Visualization, Computer Vision and Image Processing 2011},
year = {2011},
pages = {19--26},
}
The ePUMA architecture is a novel parallel architecture being developed as a platform for low-power computing, typically for embedded or hand-held devices. As part of the exploration of the platform, we have implemented the Euclidean Distance Transform. We outline the ePUMA architecture and describe how the algorithm was implemented.
@inproceedings{diva2:437278,
author = {Ragnemalm, Ingemar and Karlsson, Andr\'{e}as},
title = {{Computing The Euclidean Distance Transform on the ePUMA Parallel Hardware}},
booktitle = {Computer Graphics, Visualization, Computer Vision and Image Processing 2011},
year = {2011},
pages = {228--232},
}
Frequency-response masking (FRM) is a set of techniques for lowering the computational complexity of narrow transition band FIR filters. These techniques use a combination of sparse periodic filters and non-sparse masking filters. In this work we consider the implementation of these filters in a time-multiplexed manner on FPGAs. It is shown that the proposed architectures produce lower-complexity realizations compared to the vendor-provided IP blocks, which do not take the sparseness into consideration. The designs are implemented on a Virtex-6 device utilizing the built-in DSP blocks.
@inproceedings{diva2:402334,
author = {Alam, Syed Asad and Gustafsson, Oscar},
title = {{Implementation of Time-Multiplexed Sparse Periodic FIR Filters for FRM on FPGAs}},
booktitle = {IEEE International Symposium on Circuits and Systems (ISCAS) 2011, 15-18 May, Rio de Janeiro, Brazil},
year = {2011},
pages = {661--664},
publisher = {IEEE},
}
In this work, a design method for narrow-band and wide-band frequency-response masking FIR filters is proposed. As opposed to most previous works, the design method is not based on a periodic model filter. Instead, the masking filter is designed for a given stopband edge. The model filter design is based on optimizing the sparseness of the filter, and, hence, the resulting model filter is not required to be periodic.
@inproceedings{diva2:503640,
author = {Sheikh, Zaka Ullah and Gustafsson, Oscar and Wanhammar, Lars},
title = {{Design of sparse non-periodic narrow-band and wide-band FRM-like FIR filters}},
booktitle = {Proceedings of the International Conference on Green Circuits and Systems (ICGCS), 2010},
year = {2010},
pages = {279--282},
}
In this work a new technique for design of narrow-band and wide-band linear-phase finite-length impulse response (FIR) frequency-response masking based filters is introduced. The technique is based on a sparse FIR filter design method for both the model (bandedge shaping) filter as well as the masking filter using mixed integer linear programming optimization. The proposed technique shows promising results for realization of efficient low arithmetic complexity structures.
@inproceedings{diva2:503639,
author = {Sheikh, Zaka Ullah and Gustafsson, Oscar},
title = {{Design of Narrow-Band and Wide-Band Frequency-Response Masking Filters Using Sparse Non-Periodic Sub-Filters}},
booktitle = {18th European Signal Processing Conference (EUSIPCO-2010), August 23-27, Aalborg, Denmark},
year = {2010},
series = {European Signal Processing Conference (EUSIPCO)},
}
This paper proposes a systematic method to design adjustable fractional delay (FD) filters using the Farrow structure. The Farrow structure has even-order subfilters and the maximum magnitude approximation error determines the number of these subfilters. In the Farrow structure, different powers of the FD value are multiplied by the subfilters. As both the FD value and its powers are smaller than unity, they are considered as weighting functions. The approximation error for each subfilter can then increase in proportion to the power of the FD value. With the proposed design method, the first Farrow subfilter is a pure delay whereas the remaining subfilters are digital differentiators. Examples illustrate the proposed design method and comparison to some earlier designs shows an average reduction of 20% in arithmetic complexity.
@inproceedings{diva2:397036,
author = {Eghbali, Amir and Johansson, Håkan and Saramäki, Tapio and Löwenborg, Per},
title = {{On the design of adjustable fractional delay FIR filters using digital differentiators}},
booktitle = {Proc. IEEE Int. Conf. Green Circuits Syst.},
year = {2010},
pages = {289--292},
publisher = {IEEE},
}
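The Farrow structure the abstract builds on can be sketched in a few lines: each subfilter filters the input once, and the branch outputs are combined with Horner's rule in the fractional delay d, so only the multiplications by d change when the delay is adjusted. The first-order coefficients below are a toy (linear interpolation), not the paper's differentiator-based design:

```python
def fir(h, x):
    """Direct-form FIR filtering of sequence x with taps h (zero past)."""
    return [sum(h[k] * (x[n - k] if n - k >= 0 else 0.0)
                for k in range(len(h))) for n in range(len(x))]

def farrow(subfilters, x, d):
    """Farrow structure: y(n) = sum_m d**m * (C_m * x)(n), evaluated with
    Horner's rule.  The subfilters C_m are fixed; adjusting the fractional
    delay d only changes the variable multipliers."""
    branches = [fir(c, x) for c in subfilters]
    y = [0.0] * len(x)
    for branch in reversed(branches):
        y = [d * yn + bn for yn, bn in zip(y, branch)]
    return y

# Toy first-order Farrow filter: C_0 is a pure delay and C_1 a
# first-difference differentiator, giving
#   y(n) = x(n-1) + d * (x(n) - x(n-1)),
# i.e. linear interpolation between consecutive samples.
x = [0.0, 1.0, 2.0, 3.0]
y = farrow([[0.0, 1.0], [1.0, -1.0]], x, d=0.25)
assert y[2] == 1.25   # a quarter of the way between x(1)=1 and x(2)=2
```

The paper's method keeps this structure but uses a pure delay for the first subfilter and digital differentiators for the rest, which is where the reported complexity reduction comes from.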
This paper introduces reconfigurable nonuniform transmultiplexers (TMUXs) based on uniform modulated filter banks (FBs). The polyphase components of any user are processed by a number of synthesis FB and analysis FB branches of a uniform TMUX. One branch of the TMUX represents one granularity band, and any user occupies integer multiples of a granularity band. By adjusting the number of branches assigned to each user, a nonuniform TMUX is obtained. This only requires adjustable commutators, which add no extra arithmetic complexity. The application of both cosine-modulated and modified discrete Fourier transform FBs is considered, and the formulations related to the appropriate choice of parameters are outlined. Examples are provided for illustration.
@inproceedings{diva2:397033,
author = {Eghbali, Amir and Johansson, Håkan and Löwenborg, Per},
title = {{Reconfigurable nonuniform transmultiplexers based on uniform filter banks}},
booktitle = {Proc. IEEE Int. Symp. Circuits Syst., Paris, France, May 30-June 2, 2010},
year = {2010},
pages = {2123--2126},
}
The use of rate-compatible error correcting codes offers several advantages as compared to the use of fixed-rate codes: a smooth adaptation to the channel conditions, the possibility of incremental Hybrid ARQ schemes, as well as sharing of the encoder and decoder implementations between the codes of different rates. In this paper, the implementation of a decoder for rate-compatible quasi-cyclic LDPC codes is considered. Assuming the use of a code ensemble obtained through puncturing of a low-rate mother code, the decoder achieves significantly reduced convergence rates by merging the check node neighbours of the punctured variable nodes. The architecture uses the min-sum algorithm with serial node processing elements to efficiently handle the wide spread of node degrees that results from the merging of the check nodes.
@inproceedings{diva2:396049,
author = {Blad, Anton and Gustafsson, Oscar and Zheng, Meng and Fei, Zesong},
title = {{Rate-compatible LDPC code decoder using check-node merging}},
booktitle = {Proceedings of Asilomar Conference on Signals, Systems and Computers},
year = {2010},
pages = {1119--1123},
publisher = {IEEE},
}
An optimization algorithm for the design of puncturing patterns for low-density parity-check codes is proposed. The algorithm is applied to the base matrix of a quasi-cyclic code, and is expanded for each block size used. Thus, storing puncturing patterns specific to each block size is not required. Using the optimization algorithm, the number of 1-step recoverable nodes in the base matrix is maximized. The obtained sequence is then used as a base to obtain longer puncturing sequences by a sequential increase of the allowed recovery delay. The proposed algorithm is compared to one previous greedy algorithm, and shows superior performance for high rates when the heuristics are applied to the base matrix in order to create block size-independent puncturing patterns.
@inproceedings{diva2:396045,
author = {Blad, Anton and Gustafsson, Oscar and Zheng, Meng and Fei, Zesong},
title = {{Integer linear programming based optimization of puncturing sequences for quasi-cyclic low-density parity-check codes}},
booktitle = {Proceedings of International Symposium on Turbo Codes and Iterative Information Processing},
year = {2010},
publisher = {IEEE},
}
@inproceedings{diva2:396039,
author = {Zheng, Meng and Fei, Zesong and Chen, Xiang and Kuang, Jingming and Blad, Anton},
title = {{Power Efficient Partial Repeated Cooperation Scheme with Regular LDPC Code}},
booktitle = {Proceedings of Vehicular Technology Conference, Spring},
year = {2010},
publisher = {IEEE},
}
In this work we consider high-speed FIR filter architectures implemented using, possibly pipelined, carry-save adder trees for accumulating the partial products. In particular we focus on the mapping between partial products and full adders and propose a technique to reduce the number of carry-save adders based on the inherent redundancy of the partial products. The redundancy reduction is performed on the bit-level to also work for short wordlength data such as those obtained from sigma-delta modulators.
@inproceedings{diva2:396026,
author = {Blad, Anton and Gustafsson, Oscar},
title = {{Redundancy reduction for high-speed FIR filter architectures based on carry-save adder trees}},
booktitle = {International Symposium on Circuits and Systems},
year = {2010},
publisher = {IEEE},
}
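The carry-save accumulation underlying the paper above can be illustrated at word level. The following Python sketch is a textbook 3:2-compressor reduction, not the paper's bit-level redundancy-reduction mapping: operands are compressed three-at-a-time into a sum word and a carry word until only two remain, and a single carry-propagate addition finishes the job.

```python
def csa(a, b, c):
    """3:2 compressor at word level: three operands in, two out.

    XOR gives the bitwise sum, the majority function gives the
    carries (shifted left one position). No carry propagates, so
    the delay is one full-adder level regardless of word length.
    """
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def csa_tree_sum(operands):
    """Reduce a list of operands with 3:2 compressors until two
    remain, then perform one carry-propagate addition."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    return ops[0] + ops[1] if len(ops) == 2 else ops[0]

partial_products = [3, 5, 7, 9, 11]
print(csa_tree_sum(partial_products))  # 35
```

Each compressor step preserves the total (sum word + carry word = sum of the three inputs), which is the invariant that makes carry-save trees correct.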
The core of many DSP tasks, such as digital filtering, image processing, the DCT and the DFT, is the multiplication of one data value with several constants. Modern portable equipment such as cellular phones and MP3 players contains DSP circuits that involve a large number of multiplications of one variable with several constants (MCM), which leads to large area, delay and energy consumption in hardware. The multiplication operation can be realized using additions/subtractions and shifts, without general multipliers. Different number representations are used in MCM algorithms, and there are differing views on their relative merits. Some authors have termed the canonic signed digit (CSD) representation better for subexpression sharing. We have compared the results of the CSD and binary representations using our generalized MCM algorithm on random matrices and conclude that the binary representation is better than CSD when a system has multiple inputs and multiple outputs.
@inproceedings{diva2:397319,
author = {Imran, Muhammad and Khursheed, Khursheed and O'Nils, Mattias and Gustafsson, Oscar},
title = {{On the number representation in sub-expression sharing}},
booktitle = {International Conference on Signals and Electronic Systems (ICSES'10)},
year = {2010},
pages = {17--20},
}
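For reference, the CSD representation compared in the paper above can be computed with a textbook recoding routine. The Python sketch below is illustrative only and is not the paper's generalized MCM algorithm; it shows why CSD reduces adder count for a single constant (no two adjacent nonzero digits, hence a minimal number of nonzero digits).

```python
def to_csd(n):
    """Recode a positive integer into canonic signed digit (CSD)
    form: digits in {-1, 0, 1}, least-significant first, with no
    two adjacent nonzero digits."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            d = 2 - (n % 4)   # 1 if n % 4 == 1, -1 if n % 4 == 3
        digits.append(d)
        n = (n - d) // 2
    return digits

def nonzeros(digits):
    return sum(d != 0 for d in digits)

# 7 = 0111 in binary (three nonzero digits, i.e. two adders when
# realized with shifts and adds) but +8 - 1 in CSD (one adder):
print(to_csd(7))                                  # [-1, 0, 0, 1]
print(nonzeros(to_csd(7)), nonzeros([1, 1, 1]))   # 2 3
```

The paper's point is that this per-constant advantage does not automatically carry over to subexpression sharing across multiple inputs and outputs, where the plain binary representation fared better.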
Sub-expression sharing is a technique that can be applied to reduce the complexity of linear time-invariant non-recursive computations by identifying common patterns. It has recently been proposed that it is possible to improve the performance of single and multiple constant multiplication by identifying overlapping digit patterns. In this work we extend the concept of overlapping digit patterns to arbitrary shift dimensions, such as shift in time (FIR filters).
@inproceedings{diva2:397325,
author = {Gustafsson, Oscar and Khursheed, Khursheed and Imran, Muhammad and Wanhammar, Lars},
title = {{Generalized overlapping digit patterns for multi-dimensional sub-expression sharing}},
booktitle = {1st International Conference on Green Circuits and Systems, ICGCS 2010},
year = {2010},
pages = {65--68},
}
In this paper, a novel parallel DSP platform based on a master-multi-SIMD architecture is introduced. The platform is named ePUMA [1]. The essential technology is to use separate data access kernels and algorithm kernels to minimize the communication overhead of parallel processing by running the two types of kernels in parallel. The ePUMA platform is optimized for predictable computing. The memory subsystem design, which relies on regular and predictable memory accesses, can dramatically improve the performance according to benchmarking results. As a scalable parallel platform, the chip area is estimated for different numbers of co-processors. The aim of the ePUMA parallel platform is to achieve low-power, high-performance embedded parallel computing with low silicon cost for communications and similar signal processing applications.
@inproceedings{diva2:343760,
author = {Wang, Jian and Sohl, Joar and Kraigher, Olof and Liu, Dake},
title = {{ePUMA: a novel embedded parallel DSP platform for predictable computing}},
booktitle = {International Conference on Information and Electronics Engineering},
year = {2010},
publisher = {Institute of Electrical and Electronics Engineers, Inc.},
address = {Chengdu, China},
}
Flexible Application Specific Instruction set Processors (ASIPs) are starting to replace monolithic ASICs in a wide variety of fields. However, the construction of an ASIP is today associated with a substantial design effort. NoGap (Novel Generator of Micro Architecture and Processor) is a tool for ASIP design, utilizing hardware-multiplexed data paths. One of the main advantages of NoGap compared to other EDA tools for processor design is that NoGap imposes few limits on the architecture and thus on design freedom. NoGap does not assume a fixed processor template and is not a data flow synthesizer. To reach this flexibility, NoGap makes heavy use of the compositional design principle. This paper describes NoGapCL, a flexible common language for processor hardware description. A RISC processor described in NoGapCL has been constructed with NoGap in less than a working day and synthesized to an FPGA. With no FPGA-specific optimizations this processor met timing closure at 178 MHz in a Virtex-4 LX80, speed grade 12.
@inproceedings{diva2:337858,
author = {Zhou, Wenbiao and Karlström, Per and Liu, Dake},
title = {{NoGap$^{CL}$: A flexible common language for processor hardware description}},
booktitle = {The IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems},
year = {2010},
}
ASIP processors and programmable accelerators are replacing monolithic ASICs in more and more areas. However, the design and implementation of a new ASIP processor or programmable accelerator requires a substantial design effort. There are a number of existing tools that promise to ease this design effort, but using these tools usually means that the designer gets locked into the tools' a priori assumptions, and it is therefore hard to develop truly novel ASIPs or accelerators. NoGAP is a tool that delivers design support while not locking the designer into any predefined template architecture. An important aspect of NoGAP's design process is the ability to design the data path of each instruction individually. Therefore the sizes of input/output ports can sometimes not be known while designing the individual functional units. For this reason we have introduced the concept of dynamic port sizes, which is an extension of the parameter/generic concept in Verilog/VHDL. A problem arises if the data path graph contains loops, either due to intra- or inter-instruction dependencies. This paper presents the algorithm used to solve this looping problem.
@inproceedings{diva2:337857,
author = {Karlström, Per and Zhou, Wenbiao and Liu, Dake},
title = {{Automatic Port and Bus Sizing in NoGAP}},
booktitle = {International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation},
year = {2010},
pages = {258--264},
}
@inproceedings{diva2:337856,
author = {Karlström, Per and Zhou, Wenbiao and Liu, Dake},
title = {{Automatic Assembler Generator for NoGAP}},
booktitle = {Ph.D. Research in Microelectronics and Electronics},
year = {2010},
}
Flexible Application Specific Instruction set Processors (ASIPs) are starting to replace monolithic ASICs in a wide variety of fields. However, the construction of an ASIP is today associated with a substantial design effort. NoGAP (Novel Generator of Micro Architecture and Processor) is a tool for ASIP design utilizing hardware-multiplexed data paths. One of the main advantages of NoGAP compared to other ADL tools is that it does not impose limits on the architecture and thus on design freedom. NoGAP does not assume a fixed processor template and is not another data flow synthesizer. To reach this flexibility, NoGAP makes heavy use of the compositional design principle and is therefore divided into three parts: Mage, Mase, and Castle. This paper discusses the techniques used in NoGAP for control path synthesis. A RISC processor has been constructed with NoGAP in less than a working day and synthesized to an FPGA. With no FPGA-specific optimizations this processor met timing closure at 178 MHz in a Virtex-4 LX80, speed grade 12.
@inproceedings{diva2:337855,
author = {Karlström, Per and Zhou, Wenbiao and Liu, Dake},
title = {{Operation Classification for Control Path Synthetization with NoGAP}},
booktitle = {Seventh International Conference on Information Technology},
year = {2010},
pages = {1195--1200},
}
Conference proceedings
@proceedings{diva2:395781,
title = {{2010 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia)}},
year = {2010},
editor = {Ren, Junyan and Gustafsson, Oscar},
publisher = {IEEE},
}
Theses
The market for wireless portable devices has grown significantly over the recent years. Wireless devices with ever-increased functionality require high-rate data transmission and reduced costs. High data rates are achieved through communication standards such as LTE and WLAN, which generate signals with a high peak-to-average power ratio (PAPR), hence requiring a power amplifier (PA) that can handle a large dynamic range signal. To keep the costs low, modern CMOS processes allow the integration of the digital, analog and radio functions onto a single chip. However, the design of PAs with large dynamic range and high efficiency is challenging due to the low voltage headroom.
To prolong the battery life, the PAs have to be power-efficient as they consume a sizable percentage of the total power. For LTE and WLAN, traditional transmitters operate the PA at back-off power, below its peak efficiency, whereas pulse-width modulation (PWM) transmitters use the PA at its peak power, resulting in a higher efficiency. PWM transmitters can use both linear PAs and switch-mode PAs (SMPAs), where the latter are more power-efficient and easy to implement in nanometer CMOS. PWM transmitters have a higher efficiency but suffer from image and aliasing distortion, resulting in a lower dynamic range and reduced amplitude and phase resolution.
This thesis studies several new transmitter architectures to improve the dynamic range and the amplitude and phase resolution of PWM transmitters with relaxed filtering requirements. The architectures are suited for fully integrated CMOS solutions, in particular for portable applications.
The first transmitter (MAF-PWMT) eliminates aliasing and image distortions while allowing the use of SMPAs by combining RF-PWM and band-limited PWM. The transmitter can be implemented using all-digital techniques and exhibits improved linearity and spectral performance. The approach is validated using a Class-D PA based transmitter, where an improvement of 10.2 dB in the dynamic range compared to a PWM transmitter is achieved for a 1.4 MHz LTE signal.
The second transmitter (AC-PWMT) compensates for aliasing distortion by combining PWM and outphasing. It can be used with SMPAs or linear PAs at peak power. The proposed transmitter shows better linearity, improved spectral performance and increased dynamic range, as it does not suffer from the AM-AM distortion of the PAs or the aliasing distortion due to digital PWM. The idea is validated using push-pull PAs, and the proposed transmitter shows an improvement of 9 dB in the dynamic range as compared to a PWM transmitter using digital pulse-width modulation for a 1.4 MHz LTE signal.
The third transmitter (MD-PWMT) is an all-digital implementation of the second transmitter. The PWM is implemented using a Field Programmable Gate Array (FPGA) core, and outphasing is implemented as pulse-position modulation using FPGA transceivers, which drive two Class-D PAs. The digital implementation offers the flexibility to adapt the transmitter for multi-standard and multi-band signals. From the measurement results, an improvement of 5 dB in the dynamic range is observed as compared to an all-digital PWM transmitter for a 1.4 MHz LTE signal.
The fourth transmitter (EP-PWMT) improves the phase linearity of an all-digital PWM transmitter using PWM and asymmetric outphasing. The transmitter uses PWM to encode the amplitude, and outphasing for enhanced phase control, thus doubling the phase resolution. The measurement setup uses Class-D PAs to amplify a 1.4 MHz LTE up-link signal. An improvement of 2.8 dB in the adjacent channel leakage ratio is observed, whereas the EVM is reduced by 3.3 % as compared to an all-digital PWM transmitter.
The fifth transmitter (CRF-ML-PWMT) combines multilevel PWM and RF-PWM, whereas the sixth transmitter (CRF-MP-PWMT) combines multiphase PWM and RF-PWM. Both transmitters have a smaller chip area as compared to conventional multiphase and multilevel PWM transmitters, as a combiner is not required. The proposed transmitters also show better dynamic range and improved amplitude resolution as compared to conventional RF-PWM transmitters.
The solutions presented in this thesis aim to enhance the performance and simplify the digital implementation of PWM-based RF transmitters.
@phdthesis{diva2:1066473,
author = {Haque, Muhammad Fahim Ul},
title = {{Pulse-Width Modulated RF Transmitters}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1822}},
year = {2017},
address = {Sweden},
}
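The outphasing principle used by several of the transmitters above (AC-PWMT, EP-PWMT) can be summarized numerically: a varying-amplitude signal is split into two constant-envelope components that each can drive an SMPA at peak efficiency, and their sum restores the original amplitude. A minimal NumPy sketch assuming ideal components:

```python
import numpy as np

def outphase(amplitude, phase):
    """Decompose A*e^{j*phi} into two constant-envelope phasors:

        A*e^{j*phi} = 0.5*e^{j*(phi+theta)} + 0.5*e^{j*(phi-theta)}

    with theta = arccos(A), valid for 0 <= A <= 1. Each branch has
    constant magnitude 0.5, so it can be amplified by a switch-mode
    PA operating at its peak-efficiency point.
    """
    theta = np.arccos(amplitude)
    s1 = 0.5 * np.exp(1j * (phase + theta))
    s2 = 0.5 * np.exp(1j * (phase - theta))
    return s1, s2

A, phi = 0.3, 0.7
s1, s2 = outphase(A, phi)
combined = s1 + s2
# Both branches have constant envelope; the sum restores A and phi:
print(abs(s1), abs(combined), np.angle(combined))  # 0.5, ~0.3, ~0.7
```

This is a textbook identity, not the thesis's implementations; the thesis combines this decomposition with PWM and pulse-position modulation in various ways.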
In the last ten years, limited clock frequency scaling and increasing power density has shifted IC design focus towards parallelism, heterogeneity and energy efficiency. Improving energy efficiency is by no means simple and it calls for a reevaluation of old design choices in processor architecture, and perhaps more importantly, development of new programming methodologies that exploit the features of modern architectures.
This thesis discusses the design of energy-efficient digital signal processors with application-specific instruction sets, so-called ASIP-DSPs, and their programming tools. Target applications for such processors include, but are not limited to, communications, multimedia, image processing, intelligent vision and radar. These applications are often implemented by a limited set of kernel algorithms, whose performance and efficiency are critical to the application's success. At the same time, the extreme non-recurring engineering cost of system-on-chip designs means that product life-time must be kept as long as possible. Neither general-purpose processors nor non-programmable ASICs can meet both the flexibility and efficiency requirements, and ASIPs may instead be the best trade-off between all the conflicting goals.
Traditional superscalar- and VLIW processor design focus has been to improve the throughput of fine-grained instructions, which results in high flexibility, but also high energy consumption. SIMD architectures, on the other hand, are often restricted by inefficient data access. The result is architectures which spend more energy and/or time on supporting operations rather than actual computing.
This thesis defines the performance limit of an architecture with an N-way parallel datapath as consuming 2N elements of compute data per clock cycle. To approach this performance, this work proposes coarse-grained higher-order functional (HOF) instructions, which encode the most frequently executed compute-, data access- and control sequences into single many-cycle instructions, to reduce the overheads of instruction delivery, while at the same time maintaining orthogonality. The work further investigates opportunities for operation fusion to improve computing performance, and proposes a flexible memory subsystem for conflict-free parallel memory access with permutation and lookup-table-based addressing, to ensure that high computing throughput can be sustained even in the presence of irregular data access patterns. These concepts are extensively studied by implementing a large kernel algorithm library with typical DSP kernels, to prove their effectiveness and adequacy. Compared to contemporary VLIW DSP solutions, our solution can practically eliminate instruction fetching energy in many scenarios, significantly reduce control path switching, simplify the implementation of kernels and reduce code size, sometimes by as much as 30 times.
The techniques proposed in this thesis have been implemented in the DSP platform ePUMA (embedded Parallel DSP processor with Unique Memory Access), a configurable control-compute heterogeneous platform with distributed memory, optimized for low-power predictable DSP computing. Hardware evaluation has been done with FPGA prototypes. In addition, several VLSI layouts have been created for energy and area estimations. This includes smaller designs, as well as a large design with 73 cores, capable of 1280 integer GOPS or 256 GFLOPS at 500 MHz and which measures 45 mm2 in 28 nm FD-SOI technology.
In addition to the hardware design, this thesis also discusses parallel programming flow for distributed memory architectures and ePUMA application implementation. A DSP kernel programming language and its compiler is presented. This effectively demonstrates how kernels written in a high-level language can be translated into HOF instructions for very high processing efficiency.
@phdthesis{diva2:954326,
author = {Karlsson, Andr\'{e}as},
title = {{Design of Energy-Efficient High-Performance ASIP-DSP Platforms}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1772}},
year = {2016},
address = {Sweden},
}
FIR filters occupy a central place in many signal processing applications which either alter the shape, frequency or sampling frequency of the signal. FIR filters are used because of their stability and the possibility of linear phase, but require a high filter order to achieve the same magnitude specification as IIR filters. Depending on the size of the required transition bandwidth, the filter order can range from tens to hundreds or even thousands. Since the implementation of the filters in the digital domain requires multipliers and adders, high filter orders translate to a large number of these arithmetic units. Research towards reducing the complexity of FIR filters has been going on for decades and the techniques used can be roughly divided into two categories: reduction in the number of multipliers and simplification of the multiplier implementation.
One technique to reduce the number of multipliers is to use cascaded sub-filters with lower complexity to achieve the desired specification, known as frequency-response masking (FRM). One of the sub-filters is an upsampled model filter whose band edges are an integer multiple, termed the period L, of the target filter's band edges. Other sub-filters may include complement and masking filters which filter different parts of the spectrum to achieve the desired response. From an implementation point of view, time-multiplexing is beneficial because the maximum clock frequency supported by current state-of-the-art semiconductor technology generally does not correspond to the application-bound sample rate. A combination of these two techniques plays a significant role in the efficient implementation of FIR filters. Part of the work presented in this dissertation is architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic model filters.
These time-multiplexed FRM filters not only reduce the number of multipliers but also lower the memory usage. Although the FRM technique requires a higher number of delay elements, it results in fewer memories and more energy-efficient memory schemes when time-multiplexed. Different memory arrangements and memory access schemes are also discussed and compared in terms of their efficiency when using both single- and dual-port memories. An efficient pipelining scheme is proposed which reduces the number of pipelining registers while achieving similar clock frequencies. The single optimal point where the number of multiplications is minimum for non-time-multiplexed FRM filters is shown to become a function of both the period L and the time-multiplexing factor M. This means that the minimum number of multipliers does not always correspond to the minimum number of multiplications, which also increases the flexibility of implementation. These filters are shown to achieve power reductions between 23% and 68% for the considered examples.
To simplify the multiplier, alternative number systems like the logarithmic number system (LNS) have been used to implement FIR filters, reducing the multiplications to additions. FIR filters are realized by directly designing them in the LNS domain in the minimax sense, using integer linear programming (ILP) with finite word length constraints. The branch and bound algorithm, a typical algorithm for ILP problems, is implemented based on LNS integers, and several branching strategies are proposed and evaluated. The filter coefficients thus obtained are compared with traditional finite word length coefficients obtained in the linear domain. It is shown that LNS FIR filters provide a better approximation error than standard FIR filters for a given coefficient word length.
FIR filters also offer an opportunity for complexity reduction by implementing the multipliers using Booth or standard high-radix multiplication. Both of these multiplication schemes generate pre-computed multiples of the multiplicand which are then selected based on the encoded bits of the multiplier. In transposed direct form (TDF) FIR filters, one input data value is multiplied with a number of coefficients, and complexity can be reduced by sharing the pre-computation of the multiples of the input data among all multiplications. Part of this work is a systematic and unified approach to the design of such computation-sharing multipliers and a comparison of the two forms of multiplication. It also gives closed-form expressions for the cost of different parts of the multiplication and an overview of various ways to implement the select unit with respect to the design of multiplexers.
Particle filters are used to solve problems that require estimation of the state of a system. Improved resampling schemes for reducing the latency of the resampling stage are proposed, using a pre-fetch technique to reduce the latency by between 50% and 95% depending on the number of pre-fetches. Generalized division-free architectures and compact memory structures are also proposed that map to different resampling algorithms, help reduce the complexity of the multinomial resampling algorithm, and reduce the number of memories required by up to 50%.
@phdthesis{diva2:896498,
author = {Alam, Syed Asad},
title = {{Techniques for Efficient Implementation of FIR and Particle Filtering}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1716}},
year = {2016},
address = {Sweden},
}
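The resampling stage targeted in the final contribution of the thesis above can be illustrated by the standard systematic resampling algorithm. This Python sketch is a textbook version, not the thesis's pre-fetch or division-free architectures; its single-pass O(N) structure is what makes the stage attractive for hardware.

```python
import numpy as np

def systematic_resample(weights, u0):
    """Systematic resampling: one uniform draw u0 in [0, 1) places N
    evenly spaced pointers over the cumulative weight distribution.

    Returns the indices of the particles to replicate. Particles with
    large weights are duplicated; those with negligible weights are
    dropped.
    """
    n = len(weights)
    positions = (u0 + np.arange(n)) / n      # evenly spaced pointers
    cumulative = np.cumsum(weights)
    indices = np.zeros(n, dtype=int)
    i = j = 0
    while i < n:
        if positions[i] < cumulative[j]:
            indices[i] = j                   # pointer falls in bin j
            i += 1
        else:
            j += 1                           # advance to the next bin
    return indices

weights = np.array([0.1, 0.2, 0.3, 0.4])
print(systematic_resample(weights, u0=0.5))  # [1 2 3 3]
```

In a real particle filter u0 would be drawn from a uniform random source; it is fixed here so the output is reproducible.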
Modern signal processing systems require more and more processing capacity as time goes on. Previously, large increases in speed and power efficiency have come from process technology improvements. However, lately the gain from process improvements has been greatly reduced. Currently, the way forward for high-performance systems is to use specialized hardware and/or parallel designs.
Application Specific Integrated Circuits (ASICs) have long been used to accelerate the processing of tasks that are too computationally heavy for more general processors. The problem with ASICs is that they are costly to develop and verify, and the product life time can be limited with newer standards. Since they are very specific the applicable domain is very narrow.
More general processors are more flexible and can easily adapt to perform the functions of ASIC based designs. However, the generality comes with a performance cost that renders general designs unusable for some tasks. The question then becomes, how general can a processor be while still being power efficient and fast enough for some particular domain?
Application Specific Instruction set Processors (ASIPs) are processors that target a specific application domain, and can offer enough performance with power efficiency and silicon cost that is comparable to ASICs. The flexibility allows for the same hardware design to be used over several system designs, and also for multiple functions in the same system, if some functions are not used simultaneously.
One problem with ASIPs is that they are more difficult to program than a general purpose processor, given that we want efficient software. Utilizing all of the features that give an ASIP its performance advantage can be difficult at times, and new tools and methods for programming them are needed.
This thesis will present ePUMA (embedded Parallel DSP platform with Unique Memory Access), an ASIP architecture that targets algorithms with predictable data access. These kinds of algorithms are very common in e.g. baseband processing or multimedia applications. The primary focus will be on the specific features of ePUMA that are utilized to achieve high performance, and how it is possible to automatically utilize them using tools. The most significant features include data permutation for conflict-free data access, and utilization of address generation features for overhead-free code execution. This sometimes requires specific information; for example the exact sequences of addresses in memory that are accessed, or that some operations may be performed in parallel. This is not always available when writing code the traditional way with traditional languages, e.g. C, as extracting this information is still a very active research topic. In the near future at least, the way that software is written needs to change to exploit all hardware features, though in many cases the change is for the better. Often the problem with current methods is that code is overly specific, and more general abstractions are actually easier to generate code from.
@phdthesis{diva2:784329,
author = {Sohl, Joar},
title = {{Efficient Compilation for Application Specific Instruction set DSP Processors with Multi-bank Memories}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1641}},
year = {2015},
address = {Sweden},
}
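The conflict-free data access mentioned in the thesis above can be illustrated with textbook skewed bank addressing. ePUMA's permutation- and lookup-table-based scheme is more general; this Python sketch only shows the principle that a data permutation lets both rows and columns of a matrix be fetched in one parallel access.

```python
def bank_and_offset(row, col, n_banks):
    """Skewed storage: element (row, col) goes to bank
    (row + col) mod n_banks at offset row. With this permutation,
    a full row and a full column each touch every bank exactly
    once, so either can be read in a single conflict-free
    parallel access across n_banks memory banks.
    """
    return (row + col) % n_banks, row

N = 4  # number of parallel memory banks (and matrix dimension)
for r in range(N):
    row_banks = {bank_and_offset(r, c, N)[0] for c in range(N)}
    col_banks = {bank_and_offset(c, r, N)[0] for c in range(N)}
    # Every row and every column maps onto all N distinct banks:
    assert row_banks == col_banks == set(range(N))
print("rows and columns are conflict-free across", N, "banks")
```

With naive row-major storage, a column access would hit the same bank N times and serialize; the skew is exactly the kind of data permutation the compiler tooling described in the thesis is meant to generate automatically.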
The integrated circuit has, since it was invented in the late 1950's, undergone a tremendous development and is today found in virtually all electric equipment. The small feature size and low production cost have made it possible to implement electronics in everyday objects ranging from computers and mobile phones to smart price tags. Integrated circuits are typically used for data communication, signal processing and data storage. Data is usually stored in digital format but signal processing can be performed both in the digital and in the analog domain. For best performance, the right partition of signal processing between the analog and digital domain must be used. This is made possible by data converters converting data between the domains. A device converting an analog signal into a digital representation is called an analog-to-digital converter (ADC) and a device converting digital data into an analog representation is called a digital-to-analog converter (DAC). In this work we present research results on these data converters, compiled in three different categories. The first contribution is an error correction technique for DACs called dynamic element matching, the second contribution is a power-efficient time-to-digital converter architecture and the third is a design methodology for frequency synthesis using digital oscillators.
The accuracy of a data converter, i.e., how accurate data is converted, is often limited by manufacturing errors. One type of error is the so-called matching error and in this work we investigate an error correction technique for DACs called dynamic element matching (DEM). If distortion is limiting the performance of a DAC, the DEM technique increases the accuracy of the DAC by transforming the matching error from being signal dependent, which results in distortion, to become signal independent noise. This noise can then be spectrally shaped or filtered out and hereby increasing the overall resolution of the system. The DEM technique is investigated theoretically and the theory is supported by measurement results from an implemented 14-bit DAC using DEM. From the investigation it is concluded that DEM increases the performance of the DAC when matching errors are dominating but has less effect at conversion speeds when dynamic errors dominate.
The next contribution is a new time-to-digital converter (TDC) architecture. A TDC is effectively an ADC converting a time difference into a digital representation. The proposed architecture allows for smaller and more power efficient data conversion than previously reported and the implemented TDC prototype is smaller and more power efficient as compared to previously published TDCs in the same performance segment.
The third contribution is a design methodology for frequency synthesis using digital oscillators. Digital oscillators generate a sinusoidal output using recursive algorithms. We show that the performance of digital oscillators, in terms of amplitude and frequency stability, to a large extent depends on the start conditions of the oscillators. Further we show that by selecting the proper start condition an oscillator can be forced to repeat the same output sequence over and over again, hence we have a locked oscillator. If the oscillator is locked there is no drift in amplitude or frequency which are common problems for recursive oscillators not using this approach. To find the optimal start conditions a search algorithm has been developed which has been thoroughly tested in simulations. The digital oscillator output is used for test signal generation for a DAC or used to generate tones with high spectral purity using DACs.
@phdthesis{diva2:768594,
author = {Andersson, Niklas},
title = {{Design of Integrated Building Blocks for the Digital/Analog Interface}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1638}},
year = {2015},
address = {Sweden},
}
The market for low-cost portable electronics is rapidly growing. Physical activity monitors, portable music players, and smartwatches are fast becoming a part of daily life. As the market for wearable devices has grown, a primary concern for IC manufacturers is to provide low-cost, low-power and lightweight circuit solutions. To lower costs and extend battery life, there is increased interest in using low-cost, low-power CMOS processes. As a result, fully integrated systems-on-chip (SoCs) have been realized that efficiently perform the required functions. These SoCs house digital, analog and, in some cases, radio circuits on a single die to reduce cost and improve productivity.
Phase-locked loops (PLLs) are a key building block for all SoCs, where they are used to generate clock signals for synchronous systems. In monolithic implementations, the design cost of a circuit is measured in terms of silicon area rather than the number of devices in the circuit. With the advent of all-digital techniques, there is renewed interest in the design of compact PLLs, as the area occupied by traditional PLLs is very large due to the large passive components in the loop filter and the oscillator. As a result, various digital circuit design techniques are being explored to design compact all-digital PLLs (ADPLLs) while satisfying the performance requirements of the target applications.
The focus of this work is to explore new techniques for area-, power- and time-efficient design of ADPLL component blocks. The first part of this work focuses on the feasibility of using automatic place-and-route (P&R) tools to synthesize a time-to-digital converter (TDC). An area-efficient TDC is synthesized in a 65 nm CMOS process using automated P&R, which exhibits a time resolution of 6.5 ps with an input sampling rate of 100 MS/s while occupying an area of 0.002 mm2. A modified switching scheme is also presented which reduces the power consumption of the thermometer-to-binary encoder by up to 40%.
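A flash-style TDC of this kind can be modeled in a few lines: a delay line produces a thermometer code, and the number of ones encodes the time difference in units of the per-stage delay. The 6.5 ps stage delay matches the reported resolution; the stage count is an assumption for illustration.

```python
# Behavioral flash TDC model (illustrative parameters, not the synthesized design):
# each delay-line stage has "fired" if the measured time difference exceeds its
# cumulative delay; the thermometer-to-binary encoder then counts the ones.
def tdc(time_diff_ps, stage_delay_ps=6.5, stages=64):
    thermometer = [1 if time_diff_ps > i * stage_delay_ps else 0
                   for i in range(stages)]
    return sum(thermometer)  # thermometer-to-binary: count the ones

print(tdc(20.0))  # a 20 ps interval measured with 6.5 ps resolution
```

The thermometer-to-binary encoder is the block whose switching scheme the thesis modifies to save power.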
The second part of this thesis proposes a power supply filter for mitigating the effect of cyclostationary noise on the voltage-controlled ring oscillator. The key idea is to raise the impedance in the current supply during the sensitive periods of the oscillator operation and lower it during the insensitive periods. To demonstrate the feasibility of the proposed filter, a pseudo-differential ring oscillator is designed in a 65 nm CMOS process which exhibits an rms jitter of less than 14 ps at 2.4 GHz in the presence of a 500 mV noise tone in the power supply.
@phdthesis{diva2:780494,
author = {Touqir Pasha, Muhammad},
title = {{Circuit Design for All-Digital Frequency Synthesizers}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Thesis No. 1701}},
year = {2014},
address = {Sweden},
}
A number of state-of-the-art low-power digital delta-sigma (ΔΣ) modulator architectures for digital-to-analog converters (DACs) are presented in this thesis. In an oversampling ΔΣ DAC, the primary job of the modulator is to reduce the word length of the digital control signal to the DAC and to spectrally shape the resulting quantization noise. Among the ΔΣ topologies, error-feedback modulators (EFMs) are well suited for such digital-to-digital modulation.
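A first-order EFM illustrates the principle: the truncation error is stored and added back to the next input, so each output word is short while the long-run average preserves the full-precision input and the noise is shaped by (1 - z^-1). This integer sketch is illustrative, not one of the proposed architectures.

```python
# Minimal first-order error-feedback modulator (EFM): the quantization error is
# fed back and added to the next sample, shaping the noise by (1 - z^-1).
def efm1(samples, drop_bits):
    step = 1 << drop_bits
    e = 0
    out = []
    for x in samples:
        v = x + e
        y = (v // step) * step      # truncating quantizer: keep the top bits
        e = v - y                   # quantization error, fed back next sample
        out.append(y >> drop_bits)  # reduced-word-length output
    return out

codes = efm1([100] * 8, drop_bits=4)  # constant input 100 with 4 LSBs dropped
print(codes)
```

For the constant input 100 the output dithers between 6 and 7 so that its average, rescaled by 2^4, equals the input exactly.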
To meet these demands, various modifications to the conventional EFM architecture have been proposed. It is observed that if the internal and external digital signals of the EFM are not properly scaled, then not only the design itself but also the signal processing blocks placed after it may be over-designed. To avoid this waste of resources, a number of scaling criteria are derived. In this regard, the total number of signal levels of the EFM output is expressed in terms of the input scale, the order of modulation and the type of the loop filter.
Further on, it is described that the architectural properties of a unit element-based DAC allow us to move some of the digital processing of the EFM to the analog domain with no additional hardware cost. In order to exploit the architectural properties, digital circuitry of an arbitrary-ordered EFM is split into two parts: one producing the modulated output and another producing the filtered quantization noise. The part producing the modulated output is removed after representing the EFM output with a set of encoded signals. For both the conventional and the proposed EFM architectures, the DAC structure remains unchanged. Thus, savings are obtained since the bits to be converted are not accumulated in the digital domain but instead fed directly to the DAC.
A strategy to reduce the hardware of conventional EFMs has recently been devised that uses multiple cascaded EFM units. We apply a similar approach but use several cascaded modified EFM units. The compatibility issues among the units (since the output of each proposed EFM is represented by a set of encoded signals) are resolved by a number of architectural modifications. The digital processing is distributed among the units by splitting the primary input bus. It is shown that instead of cascading the EFM units, it is enough to cascade only their loop filters. This leads not only to area reduction but also to reduced power consumption and a shorter critical path.
All of the designs are subjected to rigorous analysis and are described mathematically. The estimates of area and power consumption are obtained after synthesizing the designs in a 65 nm standard cell library provided by the foundry.
@phdthesis{diva2:773538,
author = {Afzal, Nadeem},
title = {{Complexity and Power Reduction in Digital Delta-Sigma Modulators}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1640}},
year = {2014},
address = {Sweden},
}
In today’s system-on-chip (SoC) implementations, power consumption is a key performance specification. The proliferation of mobile communication devices and distributed wireless sensor networks has necessitated the development of power-efficient analog, radio-frequency (RF), and digital integrated circuits. The rapid scaling of CMOS technology nodes presents opportunities and challenges. Benefits accrue in terms of integration density and higher switching speeds for the digital logic. However, the concomitant reduction in supply voltage and reduced gain of transistors pose obstacles to the design of high-performance analog and mixed-signal circuits such as analog front-ends (AFEs) and data converters.
To achieve high DC gain, multistage amplifiers are becoming necessary in AFEs and analog-to-digital converters (ADCs) implemented in the latest CMOS process nodes. This thesis includes the design of multistage amplifiers in 40 nm and 65 nm CMOS processes. An AFE for capacitive body-coupled communication is presented with transistor schematic level results in 40 nm CMOS. The AFE consists of a cascade of amplifiers to boost the received signal followed by a Schmitt trigger which provides digital signal levels at the output. Low noise and reduced power consumption are the important performance criteria for the AFE. A two-stage, single-ended amplifier incorporating indirect compensation using split-length transistors has been designed. The compensation technique does not require the nulling resistor used in traditional Miller compensation. The AFE, consisting of a cascade of three amplifiers, achieves 57.6 dB DC gain with an input-referred noise power spectral density (PSD) of 4.4 nV/√Hz while consuming 6.8 mW.
Numerous compensation schemes have been proposed in the literature for multistage amplifiers. Most of these works investigate frequency compensation of amplifiers which drive large capacitive loads and require low unity-gain frequency. In this thesis, the frequency compensation schemes for high-speed, low-voltage multistage CMOS amplifiers driving small capacitive loads have been investigated. Existing compensation schemes such as the nested Miller compensation with nulling resistor (NMCNR) and reversed nested indirect compensation (RNIC) have been applied to four-stage and three-stage amplifiers designed in 40 nm and 65 nm CMOS, respectively. The performance metrics used for comparing the different frequency compensation schemes are the unity-gain frequency, phase margin (PM), and total amount of compensation capacitance used. From transistor schematic simulation results, it is concluded that RNIC is more efficient than NMCNR.
Successive approximation register (SAR) analog-to-digital converters (ADCs) are becoming increasingly popular in a wide range of applications due to their high power efficiency, design simplicity and scaling-friendly architecture. Single-channel SAR ADCs have reached high resolutions with sampling rates exceeding 50 MS/s, and time-interleaved SAR ADCs have pushed beyond 1 GS/s with medium resolution. The generation and buffering of reference voltages is often not the focus of published works. For high-speed SAR ADCs, due to the sequential nature of the successive approximation algorithm, a high-frequency clock for the SAR logic is needed. As the digital-to-analog converter (DAC) output voltage needs to settle to the desired accuracy within half a clock-cycle period of the system clock, a speed limitation occurs due to imprecise DAC settling. The situation is exacerbated by the parasitic inductance of bondwires and printed circuit board (PCB) traces, especially when the reference voltages are supplied off-chip. In this thesis, a power-efficient reference voltage buffer with small area has been implemented in 180 nm CMOS for a 10-bit 1 MS/s SAR ADC intended for use in a fingerprint sensor. Since the reference voltage buffer is part of an industrial SoC, critical performance specifications such as fast settling, high power supply rejection ratio (PSRR), and low noise have to be satisfied under mismatch conditions and over the entire range of process, supply voltage and temperature (PVT) corners. A single-ended, current-mirror amplifier with cascodes has been designed to buffer the reference voltage. The performance of the buffer has been verified by exhaustive simulations on the post-layout extracted netlist.
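The successive-approximation algorithm itself is a binary search. A behavioral sketch (idealized comparator and DAC, not the implemented circuit) makes the one-bit-per-cycle operation, and hence the need for a fast internal clock and fast DAC settling, explicit:

```python
# Behavioral SAR ADC model: one bit is resolved per cycle by comparing the
# input against the DAC output for a trial code (binary search over codes).
def sar_adc(vin, vref=1.0, bits=10):
    code = 0
    for b in range(bits - 1, -1, -1):
        trial = code | (1 << b)                # tentatively set the next bit
        if vin >= trial * vref / (1 << bits):  # comparator: input vs. DAC
            code = trial                       # keep the bit
    return code

print(sar_adc(0.5))  # mid-scale input
```

Each comparison assumes the DAC output has fully settled, which is exactly where the reference buffer's settling speed enters the picture.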
Finally, we describe the design of a 10-bit 50 MS/s SAR ADC in 65 nm CMOS with a high-speed, on-chip reference voltage buffer. In a SAR ADC, the capacitive array DAC is the most area-intensive block. Also, a binary-weighted capacitor array has a large spread of capacitor values for moderate and high resolutions, which leads to increased power consumption. In this work, a split binary-weighted capacitive array DAC has been used to reduce area and power consumption. The proposed ADC has bootstrapped sampling switches which meet 10-bit linearity over all PVT corners and a two-stage dynamic comparator. The important design parameters of the reference voltage buffer are derived in the context of the SAR ADC. The impact of the buffer on the ADC performance is illustrated by simulations using bondwire parasitics. In post-layout simulation, which includes the entire pad frame and associated parasitics, the ADC achieves an ENOB of 9.25 bits at a supply voltage of 1.2 V, typical process corner, and sampling frequency of 50 MS/s for near-Nyquist input. Excluding the reference voltage buffer, the ADC achieves an energy efficiency of 25 fJ/conversion-step while occupying a core area of 0.055 mm2.
@phdthesis{diva2:762365,
author = {Harikumar, Prakash},
title = {{Building Blocks for Low-Voltage Analog-to-Digital Interfaces}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Thesis No. 1666}},
year = {2014},
address = {Sweden},
}
Physical scaling following Moore's law has saturated while the demand for computing keeps growing. The gain from improving silicon technology is now only the shrinking of silicon area; speed-power scaling has almost stopped in the last two years. This calls for new parallel computing architectures and new parallel programming methods.
Traditional ASIC (Application Specific Integrated Circuit) hardware has been used to accelerate Digital Signal Processing (DSP) subsystems on a SoC (System-on-Chip). Embedded systems have become more complicated, and more functions, applications, and features must be integrated in one ASIC chip to keep up with market requirements. At the same time, the product lifetime of a SoC with ASICs has been much reduced because of the dynamic market. The design lifetime of a typical main chip in a mobile phone based on ASIC acceleration is about half a year, and its NRE (Non-Recurring Engineering) cost can be much more than 50 million US$.
The current situation calls for a different solution than ASICs. An ASIP (Application Specific Instruction set Processor) offers power consumption and silicon cost comparable to ASICs; its greatest advantage is functional flexibility within a predefined application domain. An ASIP-based SoC enables software upgrades without changing hardware, so the product lifetime can be 5-10 times longer than that of an ASIC-based SoC.
This dissertation will present an ASIP based SoC, a new unified parallel DSP subsystem named ePUMA (embedded Parallel DSP Platform with Unique Memory Access), to target embedded signal processing in communication and multimedia applications. The unified DSP subsystem can further reduce the hardware cost, especially the memory cost, of embedded SoC processors, and most importantly, provide full programmability for a wide range of DSP applications. The ePUMA processor is based on a master-slave heterogeneous multi-core architecture. One master core performs the central control, and multiple Single Instruction Multiple Data (SIMD) coprocessors work in parallel to offer a majority of the computing power.
The focus and the main contribution of this thesis are on the memory subsystem design of ePUMA. The multi-core system uses a distributed memory architecture based on scratchpad memories and software controlled data movement. It is suitable for the data access properties of streaming applications and the kernel based multi-core computing model. The essential techniques include the conflict free access parallel memory architecture, the multi-layer interconnection network, the non-address stream data transfer, the transitioned memory buffers, and the lookup table based parallel memory addressing. The goal of the design is to minimize the hardware cost, simplify the software protocol for inter-processor communication, and increase the arithmetic computing efficiency.
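One classic way to obtain conflict-free parallel access, shown here only to illustrate the idea (ePUMA itself uses lookup-table-based parallel memory addressing), is skewed bank assignment, where both the rows and the columns of a matrix map onto distinct memory banks:

```python
# Skewed (module-offset) addressing: with bank = (row + col) % N, both a full
# row and a full column of an N x N matrix hit all N banks exactly once, so
# either access pattern can be served by N banks in a single parallel cycle.
N = 4

def bank(row, col):
    return (row + col) % N

row_banks = {bank(2, c) for c in range(N)}   # one whole row
col_banks = {bank(r, 1) for r in range(N)}   # one whole column
print(sorted(row_banks), sorted(col_banks))  # each touches all 4 banks
```

A naive `bank = col % N` mapping would serve rows conflict-free but serialize column accesses; table-based schemes generalize this trade-off to the access patterns of a given kernel.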
We have so far shown, through applications, that most DSP algorithms, such as filters, vector/matrix operations, transforms, and arithmetic functions, can achieve computing efficiency over 70% on the ePUMA platform, and that the non-address stream network provides equivalent communication bandwidth at less than 30% of the implementation cost of a crossbar interconnection.
@phdthesis{diva2:711712,
author = {Wang, Jian},
title = {{Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1532}},
year = {2014},
address = {Sweden},
}
Complexity reduction is one of the major issues in today’s digital system design for many obvious reasons, e.g., reduction in area, reduced power consumption, and high throughput. Similarly, dynamically adaptable digital systems require flexibility considerations in the design, which imply reconfigurable systems, where the system is designed in such a way that it needs no hardware modifications for changing various system parameters. The thesis focuses on these aspects of design and can be divided into four parts.
The first part deals with complexity reduction for non-frequency-selective systems, like differentiators and integrators. The design of digital processing systems has its own challenges when various systems are translated from the analog to the digital domain. One such problem is the high computational complexity that arises when digital systems are designed for nearly full coverage of the Nyquist band, and thus have one or several narrow don’t-care bands. Such systems can be divided into three categories, namely left-band systems, right-band systems and mid-band systems. In this thesis, both single-rate and multi-rate approaches together with frequency-response masking techniques are used to handle the problem of complexity reduction in non-frequency-selective filters. Existing frequency-response masking techniques are limited in the sense that they target only frequency-selective filters, and therefore are not directly applicable to non-frequency-selective filters. The proposed approaches, however, make the frequency-response masking technique feasible for non-frequency-selective filters as well.
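The basic frequency-response masking construction can be sketched as follows: zero-stuffing a model filter's impulse response makes its frequency response periodic, and a cascaded masking filter keeps only the desired band. The coefficients below are placeholders, not a designed filter:

```python
# Frequency-response masking building blocks: F(z^M) is obtained by
# zero-stuffing the model filter's impulse response, and cascading with a
# masking filter G(z) corresponds to convolving the impulse responses.
def zero_stuff(h, m):
    out = [0.0] * ((len(h) - 1) * m + 1)
    for i, c in enumerate(h):
        out[i * m] = c            # replace z with z^M
    return out

def cascade(h, g):
    y = [0.0] * (len(h) + len(g) - 1)
    for i, a in enumerate(h):     # cascading filters = convolution
        for j, b in enumerate(g):
            y[i + j] += a * b
    return y

f = [0.25, 0.5, 0.25]   # placeholder model filter F(z)
g = [0.5, 1.0, 0.5]     # placeholder masking filter G(z)
overall = cascade(zero_stuff(f, 4), g)
```

The attraction is that F(z^M) gets sharp transition bands essentially for free (the stuffed zeros cost nothing), which the thesis extends beyond frequency-selective filters.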
The second part of the thesis addresses digital system design from the reconfigurability perspective, where providing flexibility at the algorithmic level is more beneficial than at any other level of abstraction. A linear programming (minimax) based technique for coefficient-decimation FIR (finite-length impulse response) filter design is proposed in this part of the thesis. The coefficient-decimation design method finds use in communication system design in the context of dynamic spectrum access and in channel adaptation for software-defined radio, where requirements can be more appropriately fulfilled by a reconfigurable channelizer filter. The proposed technique provides more design margin than the existing method, which can in turn be traded off for complexity reduction, optimal use of guard bands, more attenuation, etc.
The third part of the thesis is related to complexity reduction in frequency-selective filters. Here, conventional narrow-band and wide-band frequency-response masking filters are considered, and various optimization-based techniques are proposed for designs with a small number of non-zero filter coefficients. The use of mixed integer linear programming (MILP) shows interesting results for low-complexity solutions in terms of sparse and non-periodic subfilters.
Finally, the fourth part of the thesis deals with order estimation of digital differentiators. Integral-degree and fractional-degree digital differentiators are used in this thesis as representative non-frequency-selective systems. The thesis contains a minimax-criterion-based curve-fitting approach for order estimation of linear-phase FIR digital differentiators of integral degree up to four.
@phdthesis{diva2:495364,
author = {Sheikh, Zaka Ullah},
title = {{Efficient Realizations of Wide-Band and Reconfigurable FIR Systems}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1424}},
year = {2012},
address = {Sweden},
}
The aims of this thesis are to reduce the complexity and increase the accuracy of the rotations carried out in the fast Fourier transform (FFT), at both the algorithmic and the arithmetic level. In FFT algorithms, rotations, also referred to as twiddle factor multiplications, appear after every hardware stage.
At the algorithmic level, the focus is on the development and analysis of FFT algorithms. With this goal, a new approach based on binary tree decomposition is proposed. It uses the Cooley-Tukey algorithm to generate a large number of FFT algorithms. These FFT algorithms have identical butterfly operations and data flow but differ in the values of the rotations. Along with this, a technique for computing the indices of the twiddle factors based on the binary tree representation has been proposed. We have analyzed the algorithms in terms of switching activity, coefficient memory size, number of non-trivial multiplications and round-off noise. These parameters have an impact on the power consumption, area, and accuracy of the architecture. Furthermore, we have analyzed some specific cases in more detail for subsets of the generated algorithms.
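For the standard radix-2 DIT case, the twiddle-factor index bookkeeping that such analyses operate on looks as follows (a generic sketch, not the binary-tree technique itself):

```python
import cmath

# Radix-2 DIT twiddle-factor indices: every stage reuses the same table
# W_N^k = exp(-2j*pi*k/N), with a stage-dependent stride on the index k.
def stage_twiddle_indices(n):
    stages = []
    m = 2                    # butterfly span doubles each stage
    while m <= n:
        stride = n // m
        stages.append([j * stride for j in range(m // 2)])
        m *= 2
    return stages

N = 8
print(stage_twiddle_indices(N))          # which W_N^k each stage needs
w = lambda k: cmath.exp(-2j * cmath.pi * k / N)   # the twiddle value itself
```

Index sets like these determine coefficient memory size and switching activity; different Cooley-Tukey decompositions redistribute which indices (and hence which rotation values) appear at each stage.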
At the arithmetic level, the focus is on the hardware implementation of the rotations. These can be implemented using a complex multiplier, the CORDIC algorithm, or constant multiplications. Architectures based on CORDIC and constant multiplication use shift and add operations, whereas complex multiplication generally uses four real multiplications and two additions. The sine and cosine coefficients of the rotation angles for a complex multiplier are normally stored in a memory. The implementation of the coefficient memory is analyzed and the best approaches are identified. Furthermore, a number of twiddle factor multiplication architectures based on constant multiplications are investigated and proposed. In the first approach, the number of twiddle factor coefficients is reduced using trigonometric identities. By applying the addition-aware quantization method, the accuracy and adder count of the coefficients are improved. A second architecture, based on scaling the rotations such that they no longer have unity gain, is proposed. This results in twiddle factor multipliers with even lower complexity and/or higher accuracy compared to the first proposed architecture.
@phdthesis{diva2:490459,
author = {Qureshi, Fahad},
title = {{Optimization of Rotations in FFTs}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1423}},
year = {2012},
address = {Sweden},
}
The main focus of this thesis is on aspects related to the implementation of integer and non-integer sampling rate conversion (SRC). SRC is used in many communication and signal processing applications where two signals or systems with different sampling rates need to be interconnected. There are two basic approaches to this problem. The first is to convert the signal to analog and then re-sample it at the desired rate. In the second approach, digital signal processing techniques are used to compute the values of the new samples from the existing ones. The former approach is hardly used, since the latter introduces less noise and distortion. However, the implementation complexity of the second approach varies with the type of conversion factor. In this work, the second approach to SRC is considered and its implementation details are explored. The conversion factor can in general be an integer, a ratio of two integers, or an irrational number. SRC by an irrational factor is impractical and is generally stated only for completeness; such factors are usually approximated by a rational factor.
The performance of decimators and interpolators is mainly determined by the filters, which are there to suppress aliasing effects or remove unwanted images. There are many approaches to the implementation of decimation and interpolation filters, and cascaded integrator-comb (CIC) filters are one of them. CIC filters are most commonly used for integer sampling rate conversion and are often preferred for their simplicity, hardware efficiency, and relatively good anti-aliasing (anti-imaging) characteristics for the first (last) stage of a decimation (interpolation). Their multiplierless nature, which generally yields low power consumption, makes CIC filters well suited for performing conversion at higher rates. Since these filters operate at the maximum sampling frequency, they are critical with respect to power consumption. It is therefore necessary to have accurate and efficient ways to estimate the power consumption and the important factors contributing to it. Switching activity is one such factor. To obtain a high-level estimate of dynamic power consumption, switching activity equations for CIC filters are derived. The modeling of leakage power is also included, which is an important parameter to consider since the input sampling rate may differ by several orders of magnitude. These high-level power estimates can then be used as feedback while exploring multiple design alternatives.
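A behavioral sketch of an N-stage CIC decimator shows why it is multiplierless: only integrators, a downsampler, and differentiators (combs) are needed. The parameters below are illustrative:

```python
# N-stage CIC decimator (decimation factor R, differential delay 1):
# N cascaded integrators at the input rate, downsampling by R, then
# N cascaded combs y[k] = v[k] - v[k-1]. No multipliers anywhere.
def cic_decimate(x, r, n):
    ints = [0] * n
    acc = []
    for v in x:
        for i in range(n):          # integrator cascade
            ints[i] += v
            v = ints[i]
        acc.append(v)
    y = acc[::r]                    # downsample by R
    for _ in range(n):              # comb cascade at the low rate
        y = [b - a for a, b in zip([0] + y[:-1], y)]
    return y

out = cic_decimate([1] * 16, r=4, n=2)
print(out)  # settles to the DC gain R**N = 16
```

Because the integrators run at the full input rate, they dominate the switching activity, which is why the thesis derives its activity equations for exactly these stages.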
Sampling rate conversion is a typical example where it is required to determine values between existing samples. The computation of a value between existing samples can alternatively be regarded as delaying the underlying signal by a fraction of the sampling period. Fractional-delay filters are used in this context to provide a fractional delay adjustable to any desired value, and are therefore suitable for both integer and non-integer factors. The structure used for the efficient implementation of a fractional-delay filter is known as the Farrow structure, or one of its modifications. The main advantage of the Farrow structure lies in the fact that it consists of fixed finite-impulse response (FIR) filters and only one adjustable fractional-delay parameter, used to evaluate a polynomial with the filter outputs as coefficients. This characteristic makes the Farrow structure very attractive for implementation. For the considered fixed-point implementation of the Farrow structure, closed-form expressions for suitable word lengths are derived based on scaling and round-off noise. Since multipliers account for a major portion of the total power consumption, a matrix-vector multiple-constant-multiplication approach is proposed to improve the multiplierless implementation of the FIR sub-filters.
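The Farrow principle can be sketched with the simplest possible case, linear interpolation with two fixed FIR subfilters; higher-order designs (the thesis considers more elaborate ones) just add subfilters and extend the same Horner evaluation in the fractional delay d:

```python
# Farrow structure, minimal linear-interpolation case: fixed FIR subfilters
# produce polynomial coefficients v0 = x[n], v1 = x[n-1] - x[n], and the only
# tunable parameter is the fractional delay d in y = v0 + d*v1.
SUBFILTERS = [
    [1.0, 0.0],    # taps of subfilter 0 over [x[n], x[n-1]]
    [-1.0, 1.0],   # taps of subfilter 1
]

def farrow(xn, xn1, d):
    v = [c0 * xn + c1 * xn1 for c0, c1 in SUBFILTERS]
    y = 0.0
    for vk in reversed(v):   # Horner evaluation of the polynomial in d
        y = y * d + vk
    return y

print(farrow(2.0, 4.0, 0.25))  # a quarter-sample delay between 2.0 and 4.0
```

Note that only the Horner multiplications involve the variable d; every subfilter tap is a constant, which is what makes multiple-constant-multiplication techniques applicable.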
The implementation of the polynomial part of the Farrow structure is investigated by considering the computational complexity of different polynomial evaluation schemes. By considering the number of operations of different types, the critical path, pipelining complexity, and latency after pipelining, high-level comparisons are obtained and used to shortlist suitable candidates. Most of these evaluation schemes require the explicit computation of higher-order power terms. In the parallel evaluation of powers, redundancy in the computations is removed by exploiting any possible sharing, both at the word level and at the bit level. As part of this, since exponents are additive under multiplication, an ILP formulation of the minimum addition sequence problem is proposed.
@phdthesis{diva2:476337,
author = {Abbas, Muhammad},
title = {{On the Implementation of Integer and Non-Integer Sampling Rate Conversion}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1420}},
year = {2012},
address = {Sweden},
}
Since their rediscovery in 1995, low-density parity-check (LDPC) codes have received widespread attention as practical capacity-approaching code candidates. It has been shown that this class of codes can perform arbitrarily close to the channel capacity, and LDPC codes are also used or suggested for a number of important current and future communication standards. However, the problem of implementing an energy-efficient decoder has not yet been solved. Whereas the decoding algorithm is computationally simple, with uncomplicated arithmetic operations and low accuracy requirements, the random structure and irregularity of a theoretically well-defined code do not easily allow efficient VLSI implementations. Thus the LDPC decoding algorithm can be said to be communication-bound rather than computation-bound.
In this thesis, a modification to the sum-product decoding algorithm called early-decision decoding is suggested. The modification is based on the idea that the values of the bits in a block can be decided individually during decoding. As the sum-product decoding algorithm is a soft-decision decoder, a reliability can be defined for each bit. When the reliability of a bit is above a certain threshold, the bit can be removed from the rest of the decoding process, and thus the internal communication associated with the bit is removed in subsequent iterations. However, the early-decision modification comes with an increased error probability. Thus, bounds on the achievable performance as well as methods to detect graph inconsistencies resulting from erroneous decisions are presented. Also, a hybrid decoder achieving a negligible performance penalty compared to the sum-product decoder is presented. With the hybrid decoder, the internal communication is reduced by up to 40% for a rate-1/2 code with a length of 1152 bits, whereas increasing the rate allows significantly higher gains.
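The early-decision rule can be grafted onto any message-passing decoder, as sketched below with a min-sum decoder (a common hardware-friendly approximation of sum-product) on a toy (7,4) Hamming parity-check matrix rather than a real LDPC code: bits whose accumulated reliability (LLR magnitude) exceeds a threshold are frozen and excluded from further message passing.

```python
# Min-sum message passing with early decision: frozen bits drop out of the
# check-node update, removing their internal communication. Toy (7,4) Hamming
# parity-check matrix for illustration only.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def decode(llr, iters=10, threshold=8.0):
    checks = [[j for j, h in enumerate(row) if h] for row in H]
    v2c = {(i, j): llr[j] for i, row in enumerate(checks) for j in row}
    total = list(llr)
    frozen = [False] * len(llr)
    for _ in range(iters):
        c2v = {}
        for i, row in enumerate(checks):       # check-node update (min-sum)
            active = [j for j in row if not frozen[j]]
            for j in active:
                others = [v2c[i, k] for k in active if k != j]
                if others:
                    sign = 1.0
                    for m in others:
                        if m < 0:
                            sign = -sign
                    c2v[i, j] = sign * min(abs(m) for m in others)
                else:
                    c2v[i, j] = 0.0
        for i, row in enumerate(checks):       # variable-node update
            for j in row:
                if not frozen[j]:
                    v2c[i, j] = llr[j] + sum(c2v[k, j] for k in range(len(checks))
                                             if k != i and (k, j) in c2v)
        for j in range(len(llr)):              # totals and early decision
            if frozen[j]:
                continue
            total[j] = llr[j] + sum(c2v[i, j] for i in range(len(checks))
                                    if (i, j) in c2v)
            if abs(total[j]) > threshold:
                frozen[j] = True               # decided: stop updating this bit
    return [1 if t < 0 else 0 for t in total]

bits = decode([2.0] * 7)   # all-zero codeword received cleanly
```

The threshold is the knob the thesis analyzes: too low and erroneous decisions corrupt the graph, too high and little communication is saved.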
The algorithms have been implemented in a Xilinx Virtex 5 FPGA, and the resulting slice utilization and energy dissipation have been estimated. Due to the increased logic overhead of the early-decision decoder, slice utilization increases from 14.5% to 21.0%, and the reduction in logic energy dissipation from 499 pJ to 291 pJ per iteration and bit is partly offset by the clock distribution power, which increases from 141 pJ to 191 pJ per iteration and bit. Still, the early-decision decoder shows an estimated net decrease in energy dissipation of 16%.
@phdthesis{diva2:434603,
author = {Blad, Anton},
title = {{Low Complexity Techniques for Low Density Parity Check Code Decoders and Parallel Sigma-Delta ADC Structures}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1385}},
year = {2011},
address = {Sweden},
}
While general purpose processors reach both high performance and high application flexibility, this comes at a high cost in terms of silicon area and power consumption. In systems where high application flexibility is not required, it is possible to trade off flexibility for lower cost by tailoring the processor to the application to create an Application Specific Instruction set Processor (ASIP) with high performance yet low silicon cost.
This thesis demonstrates how ASIPs with application specific data types can provide efficient solutions with lower cost. Two examples are presented, an audio decoder ASIP for audio and music processing and a matrix manipulation ASIP for MIMO radio baseband signal processing.
The audio decoder ASIP uses a 16-bit floating-point data type to reduce the size of the data memory to about 60% of other solutions that use a 32-bit data type. Since the data memory occupies a major part of the silicon area, this has a significant impact on the total silicon area, and thereby also on the static and dynamic power consumption. The data width reduction can be done without any noticeable artifacts in the decoded audio due to the natural masking effect of the human ear.
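The memory saving comes purely from the narrower word. A round-trip through an IEEE 754 half-precision value (used here as a stand-in; the ASIP's 16-bit format is not necessarily binary16) shows both the storage halving and the precision cost:

```python
import struct

# Quantize a value to 16-bit (IEEE half) precision and back, to see the
# rounding error introduced by halving the stored word length.
def to_fp16(x):
    return struct.unpack('<e', struct.pack('<e', x))[0]

sample = 0.333333
print(struct.calcsize('e'), 'bytes per sample')
print(sample, '->', to_fp16(sample), 'error', abs(sample - to_fp16(sample)))
```

The error is on the order of 1e-4 for values near unity, roughly 10-11 bits of mantissa, which the thesis argues is perceptually masked in decoded audio.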
The matrix manipulation SIMD ASIP is designed to perform various matrix operations such as matrix inversion and QR decomposition of small complex-valued matrices. This type of processing is found in MIMO radio baseband signal processing and the matrices are typically not larger than 4x4. There have been solutions published that use arrays of fixed-function processing elements to perform these operations, but the proposed ASIP performs the computations in less time and with lower hardware cost.
The matrix manipulation ASIP data path uses a floating-point data type to avoid the data scaling issues associated with fixed-point computations, especially those related to division and reciprocal calculations. It also simplifies the program control flow, since no special cases for certain inputs are needed, which is especially important for SIMD architectures.
These two applications were chosen to show how ASIPs can be a suitable alternative and match the requirements for different types of applications, to provide enough flexibility and performance to support different standards and algorithms with low hardware cost.
@phdthesis{diva2:395174,
author = {Eilert, Johan},
title = {{ASIP for Wireless Communication and Media}},
school = {Linköping University},
type = {{Linköping Studies in Science and Technology. Dissertations No. 1298}},
year = {2010},
address = {Sweden},
}
Student theses
High performance computing is a topic that has risen to the top in the era of digitalization, AI and automation. Therefore, the search for more cost- and time-effective ways to implement HPC work is a subject of extensive research. One part of this is to have hardware that is capable of improving on these criteria. Different hardware usually requires different programming languages, though cross-platform solutions like Intel's oneAPI framework are gaining popularity. In this thesis, the capabilities of Intel's oneAPI framework to implement and execute HPC benchmarks on different hardware platforms are discussed. Using the hardware available through Intel's DevCloud services, Intel's Xeon Gold 6128, Intel's UHD Graphics P630 and the Arria 10 FPGA board were chosen for implementation. The benchmarks chosen were GEMM (General Matrix Multiplication) and BUDE (Bristol University Docking Engine). They were implemented using DPC++ (Data Parallel C++), Intel's own SYCL-based C++ extension. Attempts were also made to improve the benchmarks with HPC speed-up methods like loop unrolling and some hardware manipulation. The performance for the CPU and GPU was recorded and compared, as the FPGA implementation could not be performed because of technical difficulties. The results compare well to related work but did not improve much upon it, since the hardware used is quite weak compared to the industry standard. Further research on the topic would be interesting: comparing a working FPGA implementation, which probably has the biggest improvement potential, to the other results and to results from other studies, or testing other, more complex benchmarks.
@mastersthesis{diva2:1810974,
author = {Frick-Lundgren, Martin},
title = {{Evaluation of FPGA-based High Performance Computing Platforms}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/554--SE}},
year = {2023},
address = {Sweden},
}
Applying machine learning to various applications has gained significant momentum in recent years. However, the increasing complexity of networks introduces challenges such as a larger memory footprint and decreased throughput. This thesis aims to address these challenges by exploring the use of 8-bit floating-point numbers for machine learning. The numerical accuracy was evaluated empirically by implementing software models of the arithmetic and running experiments on a neural network provided by MediaTek. While the initial findings revealed poor accuracy when performing computations solely with 8-bit floating-point arithmetic, a significant improvement could be achieved by using a higher-precision accumulator register.
The hardware cost was evaluated using a synthesis tool by measuring the increase in silicon area and impact on clock frequency after four new vector instructions had been implemented. A large increase in area was measured for the functional blocks, but the hardware cost for interconnect and instruction decoding were negligible. A slight decrease in system clock frequency was observed, although marginally. Ideas that likely could improve the accuracy of inference calculations and decrease the hardware cost are proposed in the section for future work.
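The accuracy effect of the higher-precision accumulator can be reproduced with a toy model of 8-bit floats; the E4M3-style rounding below is a hypothetical stand-in for the formats the thesis evaluated:

```python
import math

def quantize_fp8(x: float) -> float:
    """Round to a hypothetical 8-bit float with 4 significant bits
    (1 implicit + 3 stored mantissa bits). Range limits are ignored for
    simplicity, so this models precision loss only."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 16) / 16, e)

def dot_fp8(a, b, fp8_accumulator: bool) -> float:
    """Dot product with 8-bit operands and products; the accumulator is
    either kept in higher precision or requantized to 8 bits per add."""
    acc = 0.0
    for x, y in zip(a, b):
        p = quantize_fp8(quantize_fp8(x) * quantize_fp8(y))
        acc += p
        if fp8_accumulator:
            acc = quantize_fp8(acc)       # small addends round away entirely
    return acc
```

With `a = [1.0] + [0.01] * 100` and `b` all ones, the 8-bit accumulator gets stuck at 1.0 because each 0.01-sized addend rounds away against the running sum, while the wide accumulator returns about 1.98 — the same qualitative behaviour the thesis reports.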
@mastersthesis{diva2:1800089,
author = {Lindberg, Theodor},
title = {{Investigation of 8-bit Floating-Point Formats for Machine Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5606--SE}},
year = {2023},
address = {Sweden},
}
This thesis describes the design of a digital-to-analog converter using a Delta-Sigma modulator. The converter’s purpose is to be used for Hi-Fi audio.
The thesis is limited to the digital domain of the converter, which is the digital interpolation filter and digital Delta-Sigma modulator. A design for implementing the digital filter and modulator is presented together with the cost of the implementation. The process for evaluating the performance is described.
The presented design is satisfactory for Hi-Fi audio, achieving an SNR of 146 dB, and an implementation of the design on an FPGA seems feasible.
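As an illustration of the modulator half of such a converter, here is a first-order error-feedback sketch — far simpler than a Hi-Fi design reaching 146 dB SNR, which needs a higher-order modulator and an interpolation filter:

```python
def delta_sigma_1st(samples):
    """First-order error-feedback delta-sigma modulator (illustrative
    sketch, not the thesis design): quantizes inputs in [-1, 1] to a
    1-bit stream, feeding the quantization error back so that the error
    is shaped toward high frequencies (noise shaping)."""
    out = []
    err = 0.0
    for x in samples:
        v = x + err                    # add accumulated quantization error
        y = 1.0 if v >= 0 else -1.0    # 1-bit quantizer
        err = v - y                    # error to be cancelled next sample
        out.append(y)
    return out
```

The key property is that the time-average of the 1-bit stream tracks the input's DC level, while the quantization noise is pushed out of the audio band where an analog filter removes it.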
@mastersthesis{diva2:1745070,
author = {Malmström, Gustav},
title = {{Designing and Evaluating a Delta-Sigma DAC for Hi-Fi Audio:
To be used in an FPGA implementation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5545--SE}},
year = {2023},
address = {Sweden},
}
Replacing a data acquisition system (DAQ) is a substantial and long-term investment. Siemens Energy’s R&D department is considering a new DAQ system. The equipment should be relevant for at least ten years. The system performance needs to reflect today’s requirements but also consider those of the future. Together with measurement engineers at the company, a list of specifications is written, highlighting the DAQ system requirements. Also, two parallel use cases are defined.
The general architecture of a data acquisition system is analyzed. Different implementation techniques are compared when navigating through the DAQ system design process. The comparison forms a relation between the architectural constraints and the requirements. Several tradeoffs in DAQ system design have been found and discussed. Furthermore, the review of market trends indicates the technical direction of modern DAQ systems, thus questioning the current instrument/measurement strategy.
@mastersthesis{diva2:1696623,
author = {Deurell, Jonas},
title = {{Architectural Constraints in Data Acquisition Design}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5445--SE}},
year = {2022},
address = {Sweden},
}
Manually designing hardware for FPGA implementations is time consuming. One possible way to accelerate the development of hardware is to use high-level synthesis (HLS) tools. Such tools synthesize a high-level model written in a language such as C++ into hardware. This thesis investigates HLS and the efficacy of using HLS in the hardware design flow.
A 3780-point fast Fourier transform optimized for area is used to compare Vitis HLS with a manual hardware implementation. Different ways of writing the high-level model used in HLS and their impact on the synthesized hardware, together with other optimizations, are investigated.
This thesis concludes that the results from the HLS implementation are not comparable with those of the manual implementation; they are significantly worse. Further, high-level code written from a non-hardware point of view needs to be rewritten from a hardware point of view to provide good results. High-level synthesis is therefore best seen not as a tool for designers from an algorithm or software background, but rather as another tool for hardware designers. High-level synthesis can be used as an initial design tool, allowing for quick exploration of different designs and architectures.
@mastersthesis{diva2:1704848,
author = {Hejdström, Christoffer},
title = {{Using HLS for Acceleration of FPGA Development: A 3780-Point FFT Case Study}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5438--SE}},
year = {2022},
address = {Sweden},
}
In this work, two MPPT algorithms have been implemented and evaluated in an existing solar charger from the company Solar Bora, with the aim of evaluating which algorithm performs best in terms of power production and stability in the system in question. Another goal was to extend the solar charger with support for several input channels, so that several solar panels can be connected, giving more redundancy to the system and further increasing power production. The challenge that arose regarding multiple channels was how to balance the channels among themselves in a stable way. A minor part of this work was also to evaluate whether a small hardware optimization was possible, again with the purpose of increasing power production. The results regarding how the MPPT algorithms perform are in line with the theory presented in the theory part.
@mastersthesis{diva2:1688980,
author = {Selld\'{e}n, Oscar and Andersson, Markus},
title = {{Implementering och jämförelse mellan två MPPT-algoritmer}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--22/0508--SE}},
year = {2022},
address = {Sweden},
}
Application Specific Integrated Circuits (ASICs) have been playing a vital role in the semiconductor industry due to advantages such as compact design, lower power consumption, and better performance compared to general-purpose integrated circuits (ICs). Unlike software applications, which can be tested post-deployment, thorough testing and verification are of utmost importance during the design of an ASIC, which is one of the reasons why a major portion of time is spent on ASIC verification. Having an optimized and reusable verification flow significantly reduces the time-to-market in a competitive industry like IC manufacturing. UVM-based verification is widely adopted for Intellectual Property (IP) verification due to its ability to reuse verification environments for multiple IPs. This thesis focuses on optimizing an existing verification flow for an ASIC IP by using a decimation filter as a case study. The filter is used as a reference model to test the Device Under Test (DUT) model by integrating it into a UVM testbench. The flow developed in the thesis is aimed at providing an efficient, reusable, and good time-to-market verification environment for ASICs.
@mastersthesis{diva2:1669188,
author = {Muniswamy, Harsha Vardhan and Abraham, Stefin},
title = {{Optimization of ASIC Verification Flow:
A Decimation Filter Case Study}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5449--SE}},
year = {2022},
address = {Sweden},
}
Processor performance has increased far faster than memories have been able to keep up with, forcing processor designers to use caches to bridge the speed difference. This can increase performance significantly for programs that utilize the caches efficiently, but results in significant performance penalties when data is not in cache. One way to mitigate this problem is to make sure that data is cached before it is needed, using memory prefetching.
This thesis focuses on different ways to perform prefetching in systems with strict area and energy requirements by evaluating a number of prefetch techniques based on performance in two programs as well as metrics such as coverage and accuracy. Both data and instruction prefetching are investigated. The studied techniques include a number of versions of next line prefetching, prefetching based on stride identification and history as well as post-increment based prefetching.
While the best increase in program performance is achieved using next-2-lines prefetching, it comes at a significant energy cost as well as drastically increased memory traffic, making it unsuitable for energy-constrained applications. RPT-based prefetching, on the other hand, gives a good balance between performance and cost, managing to improve performance by 4% and 7% for the two programs while keeping the impact on both area and energy minimal.
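A reference-prediction-table (RPT) stride prefetcher of the kind evaluated here can be sketched as follows (a minimal behavioural model, not the thesis's exact table format):

```python
class StridePrefetcher:
    """Minimal reference-prediction-table (RPT) stride prefetcher sketch.
    The table is indexed by instruction address (PC); an entry stores the
    last data address and last observed stride, and a prefetch is issued
    only once the same stride has been seen twice in a row."""

    def __init__(self):
        self.table = {}  # pc -> (last_addr, stride, confirmed)

    def access(self, pc, addr):
        """Record a memory access; return a prefetch address or None."""
        last, stride, confirmed = self.table.get(pc, (None, 0, False))
        if last is None:                       # first time we see this PC
            self.table[pc] = (addr, 0, False)
            return None
        new_stride = addr - last
        if new_stride == stride and stride != 0:
            self.table[pc] = (addr, stride, True)
            return addr + stride               # stride confirmed: prefetch
        self.table[pc] = (addr, new_stride, False)
        return None
```

The confirmation step is what keeps accuracy high: irregular access patterns never trigger prefetches, which is why this style of prefetcher costs so little energy and traffic compared with always fetching the next lines.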
@mastersthesis{diva2:1668689,
author = {Nyholm, Gustav},
title = {{Evaluation of Memory Prefetching Techniques for Modem Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5479--SE}},
year = {2022},
address = {Sweden},
}
In recent years, neural networks have seen increased interest from both the cognitive computing and computational neuroscience fields. Neuromorphic computing systems simulate neural networks efficiently, but have not yet reached the number of neurons that a mammal has. Increasing this quantity is an aspiration, but more neurons will also increase the traffic load of the system. The placement of the neurons onto the neuromorphic computing system has a significant effect on the network load. This thesis introduces algorithms for placing a large number of neurons in an efficient and agile way. First, an analysis of placement algorithms for very-large-scale integration design is done, showing that the computational complexity of these algorithms is high. By using the predefined underlying structure of the neural network, faster algorithms can be used. The results show that the population placement algorithm has high computing speed while providing exceptional results.
@mastersthesis{diva2:1640094,
author = {Pettersson, Fredrik},
title = {{Place and Route Algorithms for a Neuromorphic Communication Network Simulator}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5431--SE}},
year = {2021},
address = {Sweden},
}
This thesis explores the use of an ASIP for handling O-RAN control data. A model application was constructed, optimized and profiled on a simple RV32-IMC core. The compiled code was analyzed, and the instructions “byte swap”, “pack”, “bitwise extract/deposit” and “bit field place” were implemented. Synthesis of the core, and profiling of the model application, was done with and without each added instruction. Byte swap had the largest impact on performance (14% improvement per section, and 100% per section extension), followed by bitwise extract/deposit (10% improvement per section but no impact on section extensions). Pack and bit field place had no impact on performance. All instructions had negligible impact on core size, except for bitwise extract/deposit, which increased size by 16%. Further studies, with respect to both overall architecture and further evaluation of instructions to implement, would be necessary to design an ideal ASIP for the application.
@mastersthesis{diva2:1636414,
author = {Södergren, Oskar},
title = {{RISC-V Based Application-Specific Instruction Set Processor for Packet Processing in Mobile Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5439--SE}},
year = {2021},
address = {Sweden},
}
This thesis investigates the possibility of porting a neural network model trained and modeled in TensorFlow to a low-power AI inference accelerator for IoT edge computing. A slightly modified LeNet-5 neural network model is presented and implemented such that an input frequency of 10 frames per second is possible while consuming 4 mW of power. The system is simulated in software and synthesized using the FreePDK45 technology library. The simulation result shows no loss of accuracy, but the synthesis results do not show the same positive results for area and power. The default version of the accelerator uses the single-precision floating-point format, float32, while a modified accelerator using the bfloat16 number representation shows significant improvements in area and power with almost no additional loss of accuracy.
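Part of bfloat16's hardware appeal is that conversion from float32 is a simple truncation of the low 16 bits, since both formats share the same 8 exponent bits; a quick sketch:

```python
import struct

def f32_to_bf16(x: float) -> float:
    """Convert to bfloat16 by truncating the low 16 bits of the binary32
    encoding (round toward zero), then decode back to a float. bfloat16
    keeps float32's 8 exponent bits, so the full range survives while the
    significand drops to 8 bits (about 2-3 decimal digits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]
```

This no-logic conversion (in hardware: just dropping wires) is why the bfloat16 variant shrinks datapath area and power while inference accuracy barely moves.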
@mastersthesis{diva2:1617643,
author = {Hansson, Olle},
title = {{A Low Power AI Inference Accelerator for IoT Edge Computing}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5397--SE}},
year = {2021},
address = {Sweden},
}
Parallelized implementations of FIR-filters are often used to meet throughput and power requirements. The most common methods to optimize coefficient multiplication in FIR-filters are developed for single-rate filters, so the added redundancy of parallel implementations cannot be utilized in the optimization. In this work, optimization methods utilizing the redundancy of parallel filter implementations are evaluated for a set of low-pass and interpolation filters. Results show that the proposed methods offer parallelization with less than linear increases in hardware for several evaluated filters, with up to 47% reduction in adder count compared to conventional methods. Furthermore, an optimization algorithm for retiming of algorithmic delays is evaluated both with and without pipelining. Synthesis results show that the retiming algorithm can reduce the power consumption by up to 48% without added latency for high throughput applications.
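The redundancy being exploited comes from polyphase decomposition: an L-parallel FIR produces L output samples per cycle from subfilters built on the phases of h, and subexpressions shared across those subfilters can be merged. A two-parallel behavioural reference model (sketch only; hardware would share the adders):

```python
def fir_2parallel(h, x):
    """Two-parallel FIR via polyphase decomposition: two output samples
    per iteration, computed from the even/odd phases of the coefficients
    h and the input x. y[2n]   = h0*x0[n] + h1*x1[n-1]
                     y[2n+1] = h0*x1[n] + h1*x0[n]"""
    def conv_at(taps, sig, n):
        # one output sample of taps (*) sig at index n, zero-padded edges
        return sum(t * sig[n - k] for k, t in enumerate(taps)
                   if 0 <= n - k < len(sig))

    h0, h1 = h[0::2], h[1::2]          # polyphase components of h
    x0, x1 = x[0::2], x[1::2]          # even/odd input streams
    y = []
    for n in range(len(x) // 2):
        y.append(conv_at(h0, x0, n) + conv_at(h1, x1, n - 1))  # y[2n]
        y.append(conv_at(h0, x1, n) + conv_at(h1, x0, n))      # y[2n+1]
    return y
```

Because h0 and h1 each appear in two subfilter products, a multiplier-block optimizer can share adders between them — the "less than linear" hardware growth reported above.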
@mastersthesis{diva2:1612501,
author = {Månsson, Jens},
title = {{Adder Minimization and Retiming in Parallel FIR-Filters:
Targeting Power Consumption in ASICs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5411--SE}},
year = {2021},
address = {Sweden},
}
@mastersthesis{diva2:1604624,
author = {Bangalore Kumara Swamy, Vishal},
title = {{FPGA-Implementation of NNLS-Based mMTC User Detector for Pilot-Hopping Sequences}},
school = {Linköping University},
type = {{}},
year = {2021},
address = {Sweden},
}
This thesis investigates building a network-on-chip for a multi-core chip computing convolutional neural networks (CNNs) using Imsys processors in a tree architecture. The division of work on a multi-core chip is investigated. Key patterns of communication are identified and three designs allowing for increasingly more advanced communication patterns are implemented in VHDL. Each design is evaluated on throughput, latency and design size by running tests on the communication patterns in simulation. A relation between design size and throughput is shown, though the throughput decreases for different communication patterns when resorting to networks with lower design size. Depending on what layers are present in a CNN of interest, a network can be chosen with as small design size as possible while still achieving desired results. Aspects such as implementation and usage difficulties and energy consumption are discussed in the thesis as well; however, only on a theoretical level.
@mastersthesis{diva2:1599480,
author = {Evaldsson, Mattias},
title = {{NoC for Versatile Micro-Code Programmable Multi-Core Processor Targeting Convolutional Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5347--SE}},
year = {2021},
address = {Sweden},
}
Powerful System-on-Chip (SoC) devices produced today have increasing complexity, featuring more processors and integrated specialized hardware. This is the case with the Ericsson Many-Core Architecture (EMCA) that runs the complex radio modulation standards within 3G, 4G and 5G. Such complicated systems require trace data to debug and verify their behavior. Massive amounts of hardware and software traces can be produced in a short time. Data compression is a technique to reduce the amount of memory space required by reducing redundancy in the information. Compression of trace data leads to increased throughput out of the SoC and less space required to store the data. However, it doesn't come for free, since the algorithms used for compression are computationally demanding. This results in trade-offs between compression factor, consumed clock cycles and occupied memory space.
This master thesis investigates the possibility to compress the trace data produced in real time by the EMCA with a software implementation. The EMCA real-time trace architecture and its memory layers limit the possible software solutions. Through a thorough investigation of suitable compression algorithms and MATLAB experiments, the LZSS algorithm was chosen for the EMCA. Three different variants of the LZSS algorithm were implemented, resulting in a trade-off curve between compression factor and clock cycles. The average compression factor for software and hardware trace data ranged from 1.7 to 2.4, which is good for a lightweight SoC solution. However, the pure software compression was quite slow, as the algorithm consumed 34 to 371 clock cycles per encoded byte to achieve the respective compression factors. The results showed a highly diminishing return in compression factor when investing more clock cycles.
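The LZSS idea — replacing repeated byte runs with (distance, length) references into a sliding window — can be sketched in a few lines (illustrative only; the thesis's variants tune the match search for speed, which is where the clock cycles go):

```python
def lzss_encode(data: bytes, window=255, min_match=3, max_match=18):
    """Tiny greedy LZSS sketch: finds the longest match in the sliding
    window and emits ('L', byte) literals or ('M', distance, length)
    back-references. Real encoders replace this O(n*window) scan with
    hashing; that search dominates the cycle cost."""
    out, i, n = [], 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while k < max_match and i + k < n and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_match:
            out.append(('M', best_dist, best_len))
            i += best_len
        else:
            out.append(('L', data[i]))
            i += 1
    return out

def lzss_decode(tokens):
    """Reverse the tokens; overlapping matches work because each byte is
    appended before the next one is read."""
    buf = bytearray()
    for t in tokens:
        if t[0] == 'L':
            buf.append(t[1])
        else:
            _, dist, length = t
            for _ in range(length):
                buf.append(buf[-dist])
    return bytes(buf)
```

Decoding is nearly free, which is why LZSS suits on-chip trace compression: the expensive half runs once per trace byte on the SoC, and the trade-off curve comes from how hard that match search tries.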
@mastersthesis{diva2:1599019,
author = {Höglund, Simon},
title = {{Lightweight Real-Time Lossless Software Compression of Trace Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5435--SE}},
year = {2021},
address = {Sweden},
}
This work investigates the possibility to accelerate a procedure in 4G/LTE systems known as control channel analysis. The aim is to perform the procedure in real time on cheap and accessible hardware. An LTE decoder implemented in software is modified to perform the procedure. The modified software is analyzed and profiled. The most time-consuming decoding steps are identified and implemented in a hardware description language. The results show an acceleration of the most time-consuming steps of almost 50 times compared to the implementation in software only. Furthermore, the resource utilization of the hardware design scales linearly with respect to faster decode time; if necessary, the acceleration can be increased. However, the results from the profiling and time measurements of the software show that the time requirement is violated by other decoding steps. The thesis concludes that a hardware acceleration of the most time-consuming steps is possible. However, to satisfy the time requirement, further decode steps need to be accelerated and/or a faster processor must be used.
@mastersthesis{diva2:1593353,
author = {Thelin, William},
title = {{FPGA-Based Acceleration of LTE Protocol Decoding}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5416--SE}},
year = {2021},
address = {Sweden},
}
By adapting Mitchell's algorithm for floating-point numbers, one can efficiently perform arithmetic floating-point operations in an approximate logarithmic domain in order to perform approximate computations of functions such as multiplication, division, square root and others. This work examines how this algorithm can be improved in terms of accuracy and hardware complexity by applying a set of various methods that are parametrized and offer a large design space. Optimal coefficients for a large portion of this space are determined and used to synthesize circuits for both ASIC and FPGA targets using the bfloat16 format. Optimal configurations are then extracted to create an optimal curve from which one can select an acceptable error range and obtain a circuit with minimal hardware cost.
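Mitchell's core trick is approximating log2(1+f) by f, so a log-domain multiply becomes a single addition; a minimal floating-point version (positive inputs only, without any of the correction methods the thesis explores):

```python
import math

def mitchell_mul(a: float, b: float) -> float:
    """Mitchell's approximate multiplication: with x = 2**k * (1 + f),
    log2(x) is approximated by k + f, so multiply = add in the approximate
    log domain. The basic uncorrected scheme has a worst-case relative
    error of about 11%; it is exact when both operands are powers of two."""
    def approx_log2(x):
        m, e = math.frexp(x)           # x = m * 2**e with 0.5 <= m < 1
        return (e - 1) + (2 * m - 1)   # k + f  (Mitchell's linear approx)

    def approx_pow2(y):
        e = math.floor(y)
        f = y - e
        return 2.0 ** e * (1.0 + f)    # inverse approximation

    return approx_pow2(approx_log2(a) + approx_log2(b))
```

In hardware this removes the multiplier array entirely (the "logs" are just exponent/mantissa rewirings plus an adder); the thesis's parametrized correction methods then trade gates back in to shrink that ~11% error.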
@mastersthesis{diva2:1590166,
author = {Hellman, Noah},
title = {{Mitchell-Based Approximate Operations on Floating-Point Numbers}},
school = {Linköping University},
type = {{LiTH-ISY-EX-21/5413-SE}},
year = {2021},
address = {Sweden},
}
The Swedish Defence Materiel Administration provides tests and evaluations of military aircraft and their systems, as well as services in connection with military exercises. Testing aircraft against a radar antenna and training crews with this radar is part of that offering. The radar is deployed in a container rig and controlled by a computer running Windows 2000. The current option to control this computer is a mouse and keyboard. In this thesis, a system is designed that improves the ease of use of this rig while minimizing any need to modify the radar rig's already established hardware and software. The resulting system uses a commercially available joystick and off-the-shelf single-board micro-controllers in combination with a graphical user interface to supply the radar rig with converted input from the joystick in the form of mouse and keyboard commands, simplifying the end-user experience.
@mastersthesis{diva2:1547887,
author = {Gustafsson, Christopher},
title = {{Joystick Radar Control:
Implementing joystick control of a radar rig using single board micro-controllers by emulating generic mouse and keyboard commands}},
school = {Linköping University},
type = {{}},
year = {2021},
address = {Sweden},
}
Standards exist to unify requirements and to make it possible to ensure that equipment is tested in the same way, even when several different test labs perform the test. But as new technology comes to market, and old technology evolves, so must the standards. The International Organization for Standardization is continuously developing new standards and updating existing ones, and sometimes the specified tests change, rendering old test equipment obsolete.
In this thesis, we will look at the differences between the old and the current versions of the ISO 7637 standards, as well as how we can verify whether older test equipment lives up to the new requirements. A verification method will be designed, partly implemented and evaluated. Several aspects of automating the verification will be considered. The results will show that older equipment most likely will be usable with the newer version of the standard, and will also point out some of the difficulties of verifying that this is the case.
@mastersthesis{diva2:1544467,
author = {Gezelius, Jonatan},
title = {{Reuse and Verification of Test Equipment for ISO 7637}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--21/0501--SE}},
year = {2021},
address = {Sweden},
}
This thesis investigates the use of field-programmable gate arrays (FPGAs) to implement a time-to-digital converter (TDC) with on-chip calibration and temperature correction. Using carry-chains on the Xilinx Kintex UltraScale architecture to create a tapped delay line (TDL) has previously been proven to give good time resolution. This project improves the resolution further by using a bit-counter to handle bubbles in the TDL without removing any taps. The bit counter also adds the possibility of using a wave-union approach previously dismissed as unusable on this architecture. The final implementation achieves an RMS resolution of 1.8 ps.
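The bubble problem, and why a ones-counter solves it without discarding taps, can be shown on a toy thermometer code (a behavioural sketch; the real design counts carry-chain taps in hardware):

```python
def tdl_ones_counter(taps):
    """Ones-counter encoding of a tapped-delay-line snapshot: the time bin
    is simply the number of taps the signal edge has passed. A 'bubbled'
    code (a 0 appearing before the last 1 due to routing skew) still has
    the same population count, so no taps need to be pruned."""
    return sum(taps)

def priority_encode(taps):
    """First-zero position, as a naive thermometer decoder would compute.
    Included only to show how bubbles corrupt this scheme."""
    for i, t in enumerate(taps):
        if t == 0:
            return i
    return len(taps)
```

Counting ones is also cheap and constant-time in an FPGA (an adder tree over the taps), which is what makes the approach attractive at picosecond resolutions.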
@mastersthesis{diva2:1498797,
author = {Engström, Sven},
title = {{A 1.8 ps Time-to-Digital Converter (TDC) Implemented in a 20 nm Field-Programmable Gate Array (FPGA) Using a Ones-Counter Encoding Scheme with Embedded Bin-Width Calibrations and Temperature Correction}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5343--SE}},
year = {2020},
address = {Sweden},
}
In this work, a coordinate-wise descent algorithm is implemented which serves the purpose of estimating active users in a base station/client wireless communication setup. The implemented algorithm utilizes the sporadic nature of users, which is believed to be the norm with 5G Massive MIMO and the Internet of Things, meaning that only a subset of all users are active simultaneously at any given time. This work attempts to estimate the viability of a direct algorithm implementation, to test whether the performance requirements can be satisfied or whether a more sophisticated implementation, such as a parallelized version, needs to be created. The result is an isomorphic ASIC implementation made in a 28 nm FD-SOI process, with proper internal word lengths extracted through simulation. Some techniques to lessen the burden on hardware without losing performance are presented, which help reduce area and increase the speed of the implementation. Finally, a parallelized version of the algorithm is proposed, should one desire to explore an implementation with higher system throughput, at almost no further expense of user estimation error.
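The flavour of coordinate-wise descent involved can be illustrated on a simpler non-negative least-squares problem — a sketch of the general technique (one exact coordinate update at a time, others held fixed), not the thesis's maximum-likelihood cost function:

```python
def coordinate_descent_nnls(A, y, iters=100):
    """Coordinate-wise descent for min ||y - A x||^2 subject to x >= 0.
    A is a list of m rows of n columns. Each sweep updates one coordinate
    to its exact constrained minimizer while the others stay fixed,
    maintaining the residual r = y - A x incrementally."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(y)                               # residual r = y - A x
    for _ in range(iters):
        for j in range(n):
            col = [A[i][j] for i in range(m)]
            nrm = sum(c * c for c in col)
            if nrm == 0.0:
                continue
            g = sum(col[i] * r[i] for i in range(m))   # A_j^T r
            new_xj = max(0.0, x[j] + g / nrm)          # exact step, clipped
            d = new_xj - x[j]
            if d != 0.0:
                for i in range(m):                     # keep r consistent
                    r[i] -= d * col[i]
                x[j] = new_xj
    return x
```

The non-negativity clip is what produces sparse solutions — inactive "users" stay pinned at zero — and the per-coordinate update is exactly the kind of short dependent sequence that makes a direct hardware mapping attractive but parallelization non-trivial.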
@mastersthesis{diva2:1486517,
author = {Henriksson, Mikael},
title = {{Implementation of a Hardware Coordinate Wise Descend Algorithm with Maximum Likelihood Estimator for Use in mMTC Activity Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5326--SE}},
year = {2020},
address = {Sweden},
}
@mastersthesis{diva2:1463209,
author = {Ekudd, Anton},
title = {{Elmätare i mjukvara}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--20/0497--SE}},
year = {2020},
address = {Sweden},
}
High speed laser-scanning cameras such as the Ranger3 from SICK send 3D images with high resolution and dynamic range. Typically, the bandwidth of the transmission link sets the limit for the operational frequency of the system. This thesis shows how a lossless image compression system can in most cases be used to reduce bandwidth requirements and allow for higher operational frequencies. A hardware encoder is implemented in programmable logic on the ZC-706 development board featuring a ZYNQ Z7045 SoC. In addition, a software decoder is implemented in C++. The encoder is based on the FELICS and JPEG-LS lossless compression algorithms and the implementation operates at 214.3 MHz with a maximum throughput of 3.43 Gbit/s. The compression ratio is compared to that of competing implementations from Teledyne DALSA Inc. and Pleora Technologies on a set of typical 3D range data images. The proposed algorithm achieves a higher compression ratio while maintaining a small hardware footprint.
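JPEG-LS-style coders predict each pixel from its causal neighbours and encode only the residual; the standard median (MED/LOCO-I) predictor at the heart of JPEG-LS is small enough to quote in full:

```python
def med_predict(a, b, c):
    """JPEG-LS median edge detector (MED) predictor for lossless coding:
    a = left neighbour, b = above, c = above-left. Near an edge it picks
    min or max of (a, b); in smooth regions it uses the planar estimate
    a + b - c. The pixel is then coded as pixel - med_predict(a, b, c)."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c
```

Good prediction concentrates residuals around zero, which is what lets the entropy-coding stage shrink the stream — and it needs only two comparisons and one add/subtract per pixel, hence the small hardware footprint.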
@mastersthesis{diva2:1426363,
author = {Hinnerson, Martin},
title = {{A Resource Efficient, High-Speed FPGA Implementation of Lossless Image Compression for 3D Vision}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--19/5205--SE}},
year = {2019},
address = {Sweden},
}
A convolutional neural network (CNN) is a deep learning framework that is widely used in computer vision. A CNN extracts important features of input images by performing convolution and reduces the parameters in the network by applying pooling operations. CNNs are usually implemented with programming languages and run on central processing units (CPUs) and graphics processing units (GPUs). However, in recent years, research has been conducted to implement CNNs on field-programmable gate arrays (FPGAs).
The objective of this thesis is to implement a CNN on an FPGA with few hardware resources and low power consumption. The CNN we implement is for digit recognition: the input is an image of a single digit, and the CNN infers which digit the image shows. The performance and power consumption of the FPGA are compared with those of a CPU and a GPU.
The results show that our FPGA implementation has better performance than the CPU and the GPU, with respect to runtime, power consumption, and power efficiency.
@mastersthesis{diva2:1367984,
author = {Wang, Zhenyu},
title = {{A Digits-Recognition Convolutional Neural Network on FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5264--SE}},
year = {2019},
address = {Sweden},
}
The FFT support in an Ericsson proprietary DSP is to be improved in order to achieve high performance without disrupting the current DSP architecture too much. The FFTs and inverse FFTs in question should support sizes ranging from 12 to 2048, where the size is a product of the prime factors 2, 3 and 5. In particular, memory access conflicts could cause low performance in terms of speed compared with the existing hardware accelerator. The problem addressed in this thesis is how to minimise these memory access conflicts. The studied FFT is a mixed-radix DIT FFT where the butterfly results are written back to addresses in a certain order. Furthermore, different buffer structures and sizes are studied, as well as different orders in which to perform the operations within each FFT butterfly stage, and different orders in which to shuffle the samples in the initial stage.
The study shows that for both studied buffer structures there are buffer sizes giving good performance for the majority of the FFT sizes, without largely changing the current architecture. By using certain orders for performing the operations and shuffling within the FFT stages for remaining FFT sizes, it is possible to reach good performance also for these cases.
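The supported sizes are exactly those that factor into radices 2, 3 and 5, which a mixed-radix DIT FFT decomposes into a corresponding chain of butterfly stages; a quick check:

```python
def smooth_factors(n):
    """Factor an FFT size into radix-5/3/2 stages (the only primes the
    studied DSP's mixed-radix FFT supports); returns None if n contains
    any other prime factor. The stage list is what the DIT decomposition
    turns into successive butterfly passes."""
    stages = []
    for p in (5, 3, 2):
        while n % p == 0:
            stages.append(p)
            n //= p
    return stages if n == 1 else None
```

Each stage reads and writes groups of 2, 3 or 5 samples per butterfly, and it is those grouped accesses hitting the same memory bank that create the conflicts the thesis minimises through buffer sizing and access ordering.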
@mastersthesis{diva2:1358405,
author = {Jonsson, Sofia},
title = {{Minimising Memory Access Conflicts for FFT on a DSP}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5263--SE}},
year = {2019},
address = {Sweden},
}
The main goal of this thesis is the design and implementation of a 2048-point FFT on an FPGA through the use of VHDL code. The FFT uses a radix-2 butterfly architecture, with focus on comparing the parameters of the system for different wordlengths, coefficient wordlengths and symbol error rates, as well as different modulation types, comparing 64-QAM and 256-QAM for the 5G system. This implementation replaces an FFT function block in a Matlab-based open-source 5G NR simulator based on the 3GPP Release 15 standard and simulates spectrum, MSE payload, and SER performance.
@mastersthesis{diva2:1357116,
author = {Vasilica, Vlad Valentin},
title = {{FFT Implementation on FPGA for 5G Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5259--SE}},
year = {2019},
address = {Sweden},
}
The propagation paths of sound in water can be somewhat complicated, because the sound speed in water varies with properties such as water temperature and pressure, which has the effect of curving the propagation paths. This thesis shows how sound propagation in water can be simulated using a ray-tracing based approach on a GPU using Nvidia’s OptiX ray-tracing engine. In particular, it investigates how much speed-up can be achieved compared to CPU-based implementations and whether the RT cores introduced in Nvidia’s Turing architecture, which provide hardware-accelerated ray-tracing, can be used to speed up the computations. The presented GPU implementation is shown to be up to 310 times faster than the CPU-based Fortran implementation Bellhop. Although the speed-up is significant, it is hard to say how much of it is gained by utilizing the RT cores, since there is nothing equivalent to compare their performance against.
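The ray bending described above follows Snell's law: cos(θ)/c(z) stays constant along a ray as the sound speed c varies with depth z. A toy 2D ray marcher (an illustrative Python sketch; the thesis implementation uses Nvidia's OptiX on the GPU) could look like:

```python
import math

# March a ray launched at grazing angle theta0 (radians) from depth z0,
# stepping downward in depth. c is a callable giving sound speed at depth z.
def trace_ray(theta0, z0, c, dz=1.0, steps=100):
    a = math.cos(theta0) / c(z0)       # Snell invariant along the ray
    x, z, path = 0.0, z0, [(0.0, z0)]
    for _ in range(steps):
        cos_t = a * c(z)
        if cos_t >= 1.0:               # turning point: ray refracts back
            break
        sin_t = math.sqrt(1.0 - cos_t ** 2)
        x += dz * cos_t / sin_t        # horizontal advance per depth step
        z += dz
        path.append((x, z))
    return path
```

With a constant sound-speed profile the ray is a straight line; a depth-dependent c(z) curves it, which is what makes GPU ray tracing attractive for this problem.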
@mastersthesis{diva2:1352170,
author = {Ulmstedt, Mattias and Stålberg, Joacim},
title = {{GPU Accelerated Ray-tracing for Simulating Sound Propagation in Water}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5247--SE}},
year = {2019},
address = {Sweden},
}
The fuel usage of a hybrid electric vehicle can be reduced by strategically combining the usage of the combustion engine with the electric motor. One method to determine an optimal split between the two is to use dynamic programming. However, the number of computations grows exponentially with the number of states, which makes its usage difficult on sequential hardware. This thesis project explores the usage of FPGAs for speeding up the required computations, to possibly allow the optimisation to run in real time in the vehicle. A tool to convert a vehicle model to a hardware description language was developed and evaluated. The current version does not run fast enough for real time, but some optimisations which would allow that are proposed.
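The backward recursion underlying such a dynamic program can be outlined as follows (a generic Python sketch; the names and the cost model `fuel_cost` are illustrative assumptions, not the interface of the thesis tool):

```python
# Backward dynamic programming over quantized battery states of charge (SoC):
# at each time step and each state, pick the engine/motor split decision u
# minimizing stage cost plus the optimal cost-to-go. Work per step is
# O(states * decisions), which motivates the FPGA parallelization.
def optimal_split(soc_levels, steps, fuel_cost, transitions):
    INF = float("inf")
    cost = {s: 0.0 for s in soc_levels}          # terminal cost-to-go
    policy = []
    for t in reversed(range(steps)):
        new_cost, choice = {}, {}
        for s in soc_levels:
            best, best_u = INF, None
            for u, s_next in transitions(t, s):  # u = torque split decision
                if s_next in cost:
                    c = fuel_cost(t, s, u) + cost[s_next]
                    if c < best:
                        best, best_u = c, u
            new_cost[s], choice[s] = best, best_u
        cost = new_cost
        policy.append(choice)
    policy.reverse()
    return cost, policy
```

Because all states at one time step can be evaluated independently, this inner loop maps naturally onto parallel hardware.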
@mastersthesis{diva2:1327672,
author = {Skarman, Frans},
title = {{High Level Synthesis for Optimising Hybrid Electric Vehicle Fuel Consumption Using FPGAs and Dynamic Programming}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5253--SE}},
year = {2019},
address = {Sweden},
}
Geographically distributed networks of acoustic sensors can be used to identify and localize the origin of acoustic phenomena. One area of use is localization of snipers by detecting the bullet's shock wave and the muzzle blast. At FOI Linköping, this system is planned to be adapted from a wired sensor network into a wireless sensor network (WSN). When changing from wired communication to wireless, the issue of synchronization arises. Synchronization can be achieved in multiple ways, with different benefits depending on the method of choice. This thesis studies the synchronization method of using the highly accurate clock in Global Navigation Satellite System (GNSS) modules. This synchronization method is developed into an independent time stamping device that can be connected to each sensor in the WSN, ensuring that all sensors are synchronized to Coordinated Universal Time (UTC). The thesis starts with a pre-study where different solutions are investigated and evaluated. After the pre-study, a development stage follows in which the best solution is developed into a model that can easily be implemented in the future. The result is a model consisting of a microcontroller, a timing module and an ADC with built-in filtering and amplification.
@mastersthesis{diva2:1328076,
author = {Johansson, Malin},
title = {{Synchronization of Acoustic Sensors in a Wireless Network}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--19/0485--SE}},
year = {2019},
address = {Sweden},
}
Applications run on embedded processors are constantly evolving. They are for the most part growing more complex, and the processors have to increase their performance to keep up. In this thesis, an embedded DSP SIMT processor with decoupled execution units is investigated. A SIMT processor exploits the parallelism gained from issuing instructions to functional units or to decoupled execution units. In its basic form, only a single instruction is issued per cycle. If the control of the decoupled execution units becomes too fine-grained, or if the control burden of the master core becomes sufficiently high, the fetching and decoding of instructions can become a bottleneck of the system.
This thesis investigates how to parallelize the instruction fetch, decode and issue process. Traditional parallel fetch and decode methods in superscalar and VLIW architectures are investigated, and the benefits and drawbacks of the two are presented and discussed. One superscalar design and one VLIW design are implemented in RTL, and their cost and performance are compared using a benchmark program and synthesis. It is found that both the superscalar and the VLIW designs outperform a baseline scalar processor as expected, with the VLIW design performing slightly better than the superscalar design. The VLIW design is also found to achieve a higher clock frequency, with an area comparable to that of the superscalar design.
This thesis also investigates how instructions can be encoded to lower the decode complexity and increase the speed of issue to decoupled execution units. A number of possible encodings are proposed and discussed. Simulations show that the encodings can considerably lower the time spent issuing to decoupled execution units.
@mastersthesis{diva2:1327324,
author = {Pettersson, Andreas},
title = {{Parallel Instruction Decoding for DSP Controllers with Decoupled Execution Units}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5218--SE}},
year = {2019},
address = {Sweden},
}
In this thesis, timestamps were extracted from a forensic perspective from a home automation system with the Homey controller from Athom. First, a fictitious event scenario was constructed concerning a break-in into an apartment fitted with a home automation system. The home automation system consisted of several peripheral devices using different wireless network protocols, and the devices were triggered during the event scenario. Thereafter, different methods for extracting data in the form of timestamps were tested. The methods tested were the REST API, UART and chip-off on the flash memory, while JTAG could not be attempted due to lack of time. The method that gave the best results was the REST API, which made it possible to extract all timestamps as well as information about all devices. All timestamps were found in the flash memory, but it was not possible to link these timestamps to a specific device without using information from the REST API. Although the REST API gave the best results, it was also the method that required the most prerequisites, such as login credentials or a rooted mobile phone. With the help of the extracted timestamps, the break-in scenario was then reconstructed.
@mastersthesis{diva2:1325892,
author = {Baghyari, Roza and Nykvist, Carolina},
title = {{Händelsekonstruktion genom säkrande och analys av data från ett hemautomationssystem}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--19/0486--SE}},
year = {2019},
address = {Sweden},
}
This master’s thesis covers the design and implementation of a monopulse direction of arrival (DOA) estimation algorithm on an FPGA. The goal is to implement a complete system that is capable of estimating the bearing of an incident signal. In order to determine the estimate quality, both a theoretical and a practical noise analysis of the signal chain is performed.
Special focus is placed on the statistical properties of the transformation from I/Q-demodulated signals with correlated noise to a polar representation. The pros and cons of three different methods of calculating received signal phasors are also covered. The system is limited to two receiving channels, which constrains this report to a 2D analysis. In addition, the hardware used is limited to C-band signals. We show that an FPGA implementation of monopulse techniques is definitely viable and that an SNR higher than 10 dB allows for a Gaussian approximation of the polar representation of an I/Q signal.
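The I/Q-to-polar transformation under discussion is itself straightforward (a minimal Python sketch for orientation; the statistical analysis of how correlated noise maps through it is the thesis's actual contribution):

```python
import math

# Map a complex sample I + jQ to its polar representation.
# At high SNR (above roughly 10 dB) the phase noise of this representation
# is approximately Gaussian, which is the approximation examined above.
def iq_to_polar(i: float, q: float):
    amplitude = math.hypot(i, q)   # sqrt(I^2 + Q^2)
    phase = math.atan2(q, i)       # principal value in (-pi, pi]
    return amplitude, phase
```

In a two-channel monopulse system, the bearing estimate is then derived from the phase difference between the two channels' polar representations.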
@mastersthesis{diva2:1319006,
author = {Patriksson, Alfred},
title = {{Radio signal DOA estimation:
Implementing radar signal direction estimation on an FPGA.}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5199--SE}},
year = {2019},
address = {Sweden},
}
A digital filter is a system or device that modifies a signal, an essential function in digital communication. Using optical fibers for communication has various advantages over copper wires, such as higher bandwidth and longer reach. However, at high transmission rates, chromatic dispersion becomes a problem that must be mitigated in an optical communication system, so a filter that compensates for chromatic dispersion is necessary. In this thesis, we introduce the implementation of a new filter architecture and compare it with a previously proposed architecture.
@mastersthesis{diva2:1271001,
author = {Bae, Cheolyong and Gokhale, Madhur},
title = {{Implementation of High-Speed 512-Tap FIR Filters for Chromatic Dispersion Compensation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5179--SE}},
year = {2018},
address = {Sweden},
}
Finite-length impulse response (FIR) filters are among the most commonly used digital signal processing algorithms today, and FPGAs are often the devices used to implement them. The continued development of FPGAs, through the insertion of dedicated blocks, raises the need to study the advantages offered by different FPGA families. The work presented in this thesis studies the special features offered by FPGAs for FIR filters and introduces a cost model of resource utilization. The method consists of several stages, including reading, classification of features and generating coefficients. The results show that FPGAs have common features but also specific differences, both in features and in resource utilization. It is also shown that there are misconceptions when implementing FIR filters on FPGAs as compared with ASICs.
@mastersthesis{diva2:1256720,
author = {Akif, Ahmed},
title = {{FIR Filter Features on FPGA}},
school = {Linköping University},
type = {{}},
year = {2018},
address = {Sweden},
}
Every year, lowering the energy consumption of our devices becomes more important. Wireless devices get smaller, which means they need smaller batteries than earlier versions, while customers still have high requirements on battery life. New technologies are therefore needed to meet customer requirements by lowering the energy consumption of the devices, so that the same battery life as before can be maintained. Today it is very common that these wireless devices use the Bluetooth protocol to communicate with other devices, for example with a mobile application. Bluetooth is in many cases more energy consuming than necessary. In this report the Bluetooth Low Energy protocol is tested and evaluated to see whether the energy consumption of a battery-driven ground station for weather measurements can be reduced.
@mastersthesis{diva2:1250910,
author = {Gustafsson, Viktor and Waller, Calle},
title = {{Usage of Bluetooth Low Energy for Weather Measurements}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--18/0472--SE}},
year = {2018},
address = {Sweden},
}
The primary focus of this thesis has been to design a network packet scheduler for the 5G (fifth generation) network at Ericsson in Linköping, Sweden. A network packet scheduler determines the order in which the packets in a network will be transmitted and queues them accordingly; depending on the requirements of the system, different packet schedulers work in different ways. The scheduler designed in this thesis has a timing wheel at its core: packets are placed in the timing wheel according to their final transmission time and are output accordingly. The algorithm is implemented on an FPGA (field-programmable gate array) located in a cloud environment. The platform hosting the FPGA is called Amazon EC2 F1, which can be rented with a Linux instance that comes with everything necessary to develop a synthesized file for the FPGA. Part of the thesis discusses the design of the algorithm and how it was adapted for a hardware implementation, and part describes using the instance environment for development.
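The timing-wheel idea can be sketched in a few lines (an illustrative Python model, not Ericsson's FPGA design; a real scheduler would also handle departure times beyond one wheel rotation):

```python
# A timing wheel: packets are hashed into slots by departure time, and
# advancing the wheel one tick releases the slot whose deadline is reached.
class TimingWheel:
    def __init__(self, slots: int, tick: float):
        self.slots = [[] for _ in range(slots)]
        self.tick = tick              # time span covered by one slot
        self.now = 0                  # current slot index

    def schedule(self, packet, departure_time: float):
        offset = int(departure_time / self.tick) % len(self.slots)
        self.slots[offset].append(packet)

    def advance(self):
        """Move one tick forward and return the packets due for transmission."""
        due = self.slots[self.now]
        self.slots[self.now] = []
        self.now = (self.now + 1) % len(self.slots)
        return due
```

Insertion and extraction are both O(1) per packet, which is what makes the structure attractive for a hardware implementation.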
@mastersthesis{diva2:1247373,
author = {Jonsson, Simon},
title = {{Designing a Scheduler for Cloud-Based FPGAs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5162--SE}},
year = {2018},
address = {Sweden},
}
Time synchronization between systems having no external reference can be an issue in small wireless node-based systems. In this thesis a transceiver is designed and implemented in two separate systems. The timing algorithm Two-Way Time Transfer is then chosen to correct the timing error between the two free-running clocks of the systems. In conclusion, the results are compared against having both systems derive their timing from GPS.
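Two-Way Time Transfer estimates the clock offset from four timestamps under the assumption of a symmetric link delay (the standard formula, shown here as a Python sketch rather than the thesis implementation):

```python
# Node A sends at t1 (A's clock), B receives at t2 and replies at t3
# (B's clock), A receives at t4 (A's clock). With equal delays in both
# directions, B's clock offset relative to A is:
#   offset = ((t2 - t1) - (t4 - t3)) / 2
def twtt_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    return ((t2 - t1) - (t4 - t3)) / 2.0
```

For example, with a true offset of 5 time units and a one-way delay of 2, the timestamps (0, 7, 10, 7) recover the offset exactly; asymmetric delays are what limit the achievable accuracy in practice.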
@mastersthesis{diva2:1239516,
author = {Carlsson, Erik},
title = {{Synchronization of Distributed Units without Access to GPS}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5153--SE}},
year = {2018},
address = {Sweden},
}
This thesis presents Monza, a system for accelerating the simulation of models of physical systems described by ordinary differential equations, using a general-purpose computer with a PCIe FPGA expansion card. The system allows both automatic generation of an FPGA implementation from a model described in the Modelica programming language, and simulation of said system. Monza accomplishes this by using a customizable hardware architecture for the FPGA, consisting of a variable number of simple processing elements. A custom compiler, also developed in this thesis, tailors and programs the architecture to run a specific model of a physical system. Testing was done on two test models, a water tank system and a Weibel lung, with up to several thousand state variables. The resulting system is several times faster for smaller models and somewhat slower for larger models compared to a CPU. The conclusion is that the developed hardware architecture and software toolchain is a feasible way of accelerating model execution, but more work is needed to ensure faster execution at all times.
@mastersthesis{diva2:1191000,
author = {Lundkvist, Herman and Yngve, Alexander},
title = {{Accelerated Simulation of Modelica Models Using an FPGA-Based Approach}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5106--SE}},
year = {2018},
address = {Sweden},
}
The goal of this thesis has been to implement a hardware architecture for FPGA that calculates the fast Fourier transform (FFT) of a signal using one million samples. The FFT has been designed using a single-delay feedback architecture with rotators and butterflies, including a three-stage rotator with one million rotation angles. The design has been implemented on a single FPGA and has a throughput of 233 Msamples/s. The calculated FFT has high accuracy, with a signal-to-quantization-noise ratio (SQNR) of 95.6 dB.
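A figure such as 95.6 dB SQNR is typically obtained by comparing the fixed-point output against a high-precision reference transform (a generic Python sketch of the metric, not the thesis test bench):

```python
import math

# SQNR in dB: ratio of reference signal power to the power of the error
# between the reference and the quantized implementation's output.
def sqnr_db(reference, measured):
    signal_power = sum(abs(r) ** 2 for r in reference)
    noise_power = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    return 10.0 * math.log10(signal_power / noise_power)
```

In an FFT context, `reference` would be a double-precision FFT of the test signal and `measured` the output of the hardware (or its bit-accurate model).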
@mastersthesis{diva2:1184623,
author = {Mellqvist, Tobias and Kanders, Hans},
title = {{One Million-Point FFT}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5111--SE}},
year = {2018},
address = {Sweden},
}
As autonomous driving is rapidly becoming the next major challenge in the automotive industry, the problem of Simultaneous Localization And Mapping (SLAM) has never been more relevant than it is today. This thesis presents the idea of examining SLAM algorithms by implementing such an algorithm on a radio-controlled car which has been fitted with sensors and microcontrollers. The software architecture of this small-scale vehicle is based on the Robot Operating System (ROS), an open-source framework designed to be used in robotic applications.
This thesis covers Extended Kalman Filter (EKF)-based SLAM, FastSLAM, and GraphSLAM, examining these algorithms in theoretical investigations, simulations, and real-world experiments. The method used in this thesis is model-based development, meaning that a model of the vehicle is first implemented in order to be able to perform simulations using each algorithm. A decision on which algorithm to implement on the physical vehicle is then made, backed up by these simulation results as well as a theoretical investigation of each algorithm.
This thesis has resulted in a dynamic model of a small-scale vehicle which can be used for simulation of any ROS-compliant SLAM algorithm, and this model has been simulated extensively in order to provide empirical evidence of which SLAM algorithm is most suitable for this application. Out of the algorithms examined, FastSLAM proved to be the best candidate and was, in the final stage, successfully implemented on the small-scale vehicle through usage of the ROS package gMapping.
@mastersthesis{diva2:1218791,
author = {Alexandersson, Johan and Nordin, Olle},
title = {{Implementation of SLAM Algorithms in a Small-Scale Vehicle Using Model-Based Development}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5101--SE}},
year = {2017},
address = {Sweden},
}
High Level Synthesis (HLS) is a new method for developing applications for use on FPGAs. Instead of the classic approach using a Hardware Description Language (HDL), a high level programming language can be used. HLS has many perks, including high level debugging and simulation of the system being developed. This shortens the development time, which in turn lowers the development cost. In this thesis an evaluation is made regarding the feasibility of using SDAccel as the HLS tool in the OpenCL environment. Two image processing algorithms are implemented using OpenCL C and then synthesized to run on a Kintex Ultrascale FPGA. The implementation focuses on both low latency and throughput, as the target environment is a video distribution network used in vehicles; the network provides the driver with video feeds from cameras mounted on the vehicle. Finally, the test results of the algorithm runs are presented, showing how well the HLS tool has performed in terms of system performance and FPGA resource utilization.
@mastersthesis{diva2:1159832,
author = {Isaksson, Johan},
title = {{FPGA-Accelerated Image Processing Using High Level Synthesis with OpenCL}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5091--SE}},
year = {2017},
address = {Sweden},
}
Digital filters play a key role in many DSP applications, and FIR filters are usually selected over IIR filters because of their simplicity and stability. In this thesis, eight architectures for multi-stream FIR filtering are studied. Primarily, three kinds of architectures are implemented and evaluated: one-to-one mapping, time-multiplexed, and pipeline interleaving. During implementation, practical considerations such as implementation approach and number representation are taken into account. Of interest is the performance comparison of the different architectures, including area and power; the trade-off between area and power is a central topic of this work. Furthermore, the impact of the filter order and of pipeline interleaving is studied. The results show that the performance of the different architectures differs considerably even at the same sample rate per stream, and that the architectures are affected differently by the filter order. Pipeline interleaving improves area utilization at the cost of a rapid increase in power, and it has a negative impact on the maximum operating frequency. All FIR filter architectures are synthesized in a 65 nm technology.
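All three architectures compute the same direct-form FIR convolution, y[n] = Σ_k h[k]·x[n−k]; a plain software reference (illustrative Python, unrelated to the RTL being compared) is:

```python
# Direct-form FIR filter: each output sample is the inner product of the
# coefficient vector h with the most recent input samples.
def fir(x, h):
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]   # y[n] = sum_k h[k] * x[n-k]
        y.append(acc)
    return y
```

One-to-one mapping dedicates a multiplier to each tap of this sum, time-multiplexing reuses fewer multipliers over several clock cycles, and pipeline interleaving shares one datapath among several input streams.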
@mastersthesis{diva2:1143816,
author = {Jiang, Yang},
title = {{Implementation and Evaluation of Architectures for Multi-Stream FIR Filtering}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5068--SE}},
year = {2017},
address = {Sweden},
}
The aim of this thesis is to understand the current implementation and how different hardware and output frequencies affect the hydraulic actuators in the current platform, and then to present an improved controller. This controller needs to be faster than the current one and should not use more CPU resources than necessary. With an understanding of the current controller, three new regulators were implemented and tested. One uses a PI regulator and the other two use an adaptive algorithm to generate the control signal. All were faster than the current one, and the PI implementation uses the lowest amount of CPU resources; on the other hand, it needs to be calibrated for the different hardware and output frequencies. The two adaptive controllers require more CPU resources but need less calibration to work.
@mastersthesis{diva2:1102614,
author = {Kleback, Oskar},
title = {{Study on Low Voltage Power Electronics Used for Actuator Control}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5030--SE}},
year = {2017},
address = {Sweden},
}
The aim of this thesis is to explore the possibility of integrating an AC3 audio decoding module into the company’s current product. Due to the limited resources left on the FPGA chip in the company’s current product, the focus of this thesis is on resource efficiency. In this thesis, a system for AC3 audio decoding is designed and implemented. In order to use less FPGA logic, a PicoBlaze soft processor is used to control the whole processing flow. The system is designed and synthesized for a Spartan-6 FPGA and can be easily ported to the company’s current platform.
@mastersthesis{diva2:1088539,
author = {Han, Dapeng},
title = {{FPGA Implementation of an AC3 Decoder}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5028--SE}},
year = {2017},
address = {Sweden},
}
A project has been carried out in Python to read and analyze netlists from the eCAD program Altium. The project is a prototype of a software tool which, once fully developed, can be used to automate connectivity tests on printed circuit boards using JTAG Boundary Scan. The project examines what proportion of the traces on a number of arbitrarily chosen printed circuit boards are accessible for Boundary Scan testing and finds that on average 39% of the traces are observable.
@mastersthesis{diva2:1083741,
author = {Berggren, Erik},
title = {{Testverktyg för JTAG Boundary Scan}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--17/0463--SE}},
year = {2017},
address = {Sweden},
}
Massive MIMO is an emerging technology for future wireless systems that has recently received much attention from both academia and industry. The most prominent feature of Massive MIMO is that the base station is equipped with a large number of antennas. It is therefore important to create scalable architectures to enable simple deployment in different configurations.
In this thesis, a distributed architecture for performing the baseband processing in a massive OFDM MU-MIMO system is proposed and analyzed. The proposed architecture is based on connecting several identical nodes in a K-ary tree. It is shown that, depending on the chosen algorithms, all or most computations can be performed in a distributed manner. Also, the computational load of each node does not depend on the number of nodes in the tree (except for some timing issues), which implies simple scalability of the system.
It is shown that it should be enough for each node to contain one or two complex multipliers and a few complex adders running at a couple of hundred MHz to support specifications similar to LTE. Additionally, the nodes must communicate with each other over links with data rates on the order of a few Gbps.
Finally, a VHDL implementation of the system is proposed. The implementation is parameterized such that a system can be generated from a given specification.
@mastersthesis{diva2:1066262,
author = {Bertilsson, Erik},
title = {{A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5019--SE}},
year = {2017},
address = {Sweden},
}
Smartphone usage is growing rapidly and is no longer limited to calling and texting. People use smartphones for online shopping, searching for information on the web, bank transactions, games, and many other applications; anything is possible with just a smartphone and the internet. Greater use of the smartphone also means that more of the user's secret information is kept on the phone. As the popularity increases, so do the ways to steal or hack phones. Many areas in the field of smartphone security and authentication require further investigation.
This thesis work evaluates the scope of different built-in smartphone sensors for mobile authentication techniques. The Android operating system was used in the implementation phase; Android has many open-source libraries and services, which were used for sensor identification on the Java Android platform.
Two applications using the accelerometer sensor and one using the magnetometer sensor were developed. The two foremost objectives of this thesis work were: 1) to explore the possibilities of sensor-based authentication techniques, and 2) to gauge end users' perceptions of the applications.
Usability testing was conducted to gather the users' assessments of the applications. The two methods used for usability testing are named Magical Move and Tapping. Most users showed interest in and inclination towards the tapping application, although some users also expressed inhibitions about using both sensor-based methods.
@mastersthesis{diva2:1065933,
author = {Bhide, Priyanka},
title = {{Design and Evaluation of Accelerometer Based Mobile Authentication Techniques}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5020--SE}},
year = {2017},
address = {Sweden},
}
The upcoming 5G mobile communications system promises to enable use cases requiring ultra-reliable and low latency communications. Researchers therefore require more detailed information about aspects such as channel coding performance at very low block error rates. The simulations needed to obtain such results are very time consuming, and this poses a challenge to studying the problem. This thesis investigates the use of hardware acceleration for performing fast simulations of turbo code performance. Special interest is taken in investigating different methods for generating normally distributed noise based on pseudorandom number generator algorithms executed on DSPs. A comparison is also made of how well different simulator program structures utilize the hardware. Results show that even a simple program for utilizing parallel DSPs can achieve good usage of hardware accelerators and enable fast simulations. It is also shown that for the studied process the bottleneck is the conversion of hard bits to soft bits with addition of normally distributed noise. It is indicated that methods for noise generation which do not adhere to a true normal distribution can further speed up this process and still yield simulation quality comparable to methods adhering to a true Gaussian distribution. Overall, it is shown that the proposed use of hardware acceleration in combination with the DSP software simulator program can, in a reasonable time frame, generate results for turbo code performance at block error rates as low as 10^-9.
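One standard way to turn uniform pseudorandom numbers into normally distributed noise is the Box-Muller transform (a textbook method shown in Python for illustration; the thesis compares exact methods of this kind against cheaper approximations on the DSPs):

```python
import math
import random

# Box-Muller: two independent uniform samples in (0, 1) are mapped to two
# independent standard-normal samples.
def box_muller(rng=random):
    u1 = rng.random() or 1e-12         # guard against log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)
```

The `sqrt`, `log` and trigonometric evaluations are what make exact generation expensive on a DSP, which motivates the approximate methods discussed above.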
@mastersthesis{diva2:1098448,
author = {Nordmark, Oskar},
title = {{Turbo Code Performance Analysis Using Hardware Acceleration}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5010--SE}},
year = {2016},
address = {Sweden},
}
The energy consumption of off-chip memory writes and reads is a known problem. In the image processing field of structure from motion, simpler compression techniques could be used to save energy. The balance between the detected features, such as corners and edges, and the degree of compression then becomes an important question to investigate. In this thesis a deeper study of this balance is performed. A number of more advanced compression algorithms for still images, such as JPEG, are used for comparison with a selected number of simpler compression algorithms. The simpler algorithms can be divided into two categories: individual block-wise compression of each image, and compression with respect to all pixels in each image. In this study the image sequences are in grayscale and taken from an earlier study on rolling shutters. Synthetic data sets from a further study on optical flow are also included to assess how reliable the other data sets are.
@mastersthesis{diva2:1071408,
author = {Ferdeen, Mats},
title = {{Reducing Energy Consumption Through Image Compression}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4995--SE}},
year = {2016},
address = {Sweden},
}
In digital video broadcasting, many sources are sometimes used. When handling such a broadcast, a problem is a limited interface that has a fixed number of input channels but overcapacity in data transfer rate. To connect more inputs to the interface, a protocol is needed that lets the user send more than one channel over a single connection. The important requirement for the protocol is that it keeps the output equal to the input, both in timing and in what data is sent. This is done by encapsulating the data and using a header containing the information needed to recreate the input. To meet the timing constraint, dynamic buffers are used that delay all data equally. To validate the functionality of the protocol, a test design is implemented in VHDL and simulated.
@mastersthesis{diva2:1057270,
author = {Werin, Atle},
title = {{Use of a Multiplexer to get Multiple Streams Through a Limited Interface:
Encapsulation of digital video broadcasting streams}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--16/0460--SE}},
year = {2016},
address = {Sweden},
}
This thesis proposes and implements a flexible platform for Hardware-In-the-Loop (HIL) co-simulation using a Field-Programmable Gate Array (FPGA). The HIL simulations are performed with SystemModeler working as a software simulator and the FPGA as the co-simulation platform for the digital hardware design. The work presented in this thesis consists of the creation of a communication library on the host computer, a system in the FPGA that allows implementation of different digital designs with varying architectures, and an interface between the host computer and the FPGA to transmit the data. The efficiency of the proposed system is studied with the implementation of two common digital hardware designs: a PID controller and a filter. The results of the HIL simulations of these two hardware designs are used to verify the platform and measure the timing and area performance of the proposed HIL platform.
@mastersthesis{diva2:1059834,
author = {Acevedo, Miguel},
title = {{FPGA-Based Hardware-In-the-Loop Co-Simulator Platform for SystemModeler}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5013--SE}},
year = {2016},
address = {Sweden},
}
The purpose of this thesis is to extend an existing autopilot with automatic takeoff and landing algorithms for small fixed-wing unmanned aircraft. The work has been done from a systems engineering perspective, and as for solution candidates this thesis has a bias towards solutions utilizing fuzzy logic. The coveted promise of fuzzy logic was primarily the idea of a design that is easily tunable with very little knowledge beyond flight experience with a particular aircraft. The systems engineering perspective provided a way to structure and reason about the project, where the problem has been decoupled from different solutions and the work has been divided in a way that allows multiple aspects of the project to be pursued simultaneously. Though the fuzzy logic controllers delivered functional solutions, the promises related to ease of tuning were not fulfilled in the landing context. This might have been a consequence of the designs attempted, but in the end a simpler solution outperformed the implemented fuzzy logic controllers. Takeoff did not present the same tuning issues but did require special care to handle the initial low airspeeds in a hand launch.
@mastersthesis{diva2:1055556,
author = {Magnus, Vestergren},
title = {{Automatic Takeoff and Landing of Unmanned Fixed Wing Aircrafts:
A Systems Engineering Approach}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5009--SE}},
year = {2016},
address = {Sweden},
}
FPGAs are attractive devices as they enable the designer to make changes to the system during its lifetime. This is important in the early stages of development, when all the details of the final system might not yet be known. In a research environment like CERN there are many FPGAs used for this very reason, and also because they enable high-speed communication and processing. The biggest problem at CERN is that the systems might have to operate in a radioactive environment, which is very harsh on electronics. ASICs can be designed to withstand high levels of radiation and are used in many places, but they are expensive in terms of cost and time and they are not very flexible. There is therefore a need to understand whether it is possible to use FPGAs in these places, or what needs to be done to make it possible.
Mitigation techniques can be used to prevent a fault caused by radiation from disrupting the system. How this can be done, and the importance of understanding the underlying architecture of the FPGA, is discussed in this thesis. A simulation tool for injecting faults into the design is proposed in order to verify that the techniques used are working as expected, which might not always be the case. The methods that provided the best protection against faults during simulation are added to a system design implemented on a flash-based FPGA mounted on a board. This board was installed in the CERN Proton Synchrotron for 99 days, during which the system was continuously monitored. During this time 11 faults were detected and the system was still functional at the end of the test. The results from the simulation and hardware tests show that, with reasonable effort, it is possible to use commercially available FPGAs in a radioactive environment.
@mastersthesis{diva2:1052529,
author = {Sandberg, Hampus},
title = {{Radiation Hardened System Design with Mitigation and Detection in FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5008--SE}},
year = {2016},
address = {Sweden},
}
This thesis implements a system for calculating the displacement between two consecutive video frames. The displacement is calculated using a polynomial expansion-based algorithm. A unit-tested bottom-up approach is successfully used to design and implement the system. The designed and implemented system is thoroughly elaborated upon. The chosen algorithm and its computational details are presented to provide context for the implemented system. Some of the major issues and their impact on the system are discussed.
@mastersthesis{diva2:1048981,
author = {Ehrenstråhle, Carl},
title = {{Polynomial Expansion-Based Displacement Calculation on FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4981--SE}},
year = {2016},
address = {Sweden},
}
This master's thesis investigates how implantable devices can operate without internal batteries. The idea is to be able to drive a circuit inside human tissue, e.g. to monitor blood flow in patients. Methods such as harvesting energy from the environment to power the devices, and wireless energy transfer such as electromagnetic induction, have been investigated. Implantable devices like this communicate wirelessly, which means that data is transferred through the air. Sending data streams through the air has security vulnerabilities; how these can be prevented is discussed. Measurements of the electromagnetic induction have been made with tissue-like material, to see how tissue affects the received signal strength indication levels. Printed inductors have been optimized by examining the parameters that affect their efficiency, to get the most out of the inductor while still keeping it small, since smaller size is better for implantable devices.
@mastersthesis{diva2:1046830,
author = {Chizarie, Anders},
title = {{Driving Implantable Circuits Without Internal Batteries}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4999--SE}},
year = {2016},
address = {Sweden},
}
This thesis consists of designing and implementing a bus system for a specific computer system for MediaTek Sweden AB. The focus of the report is to show the considerations and choices made in the design of a suitable bus system. Implementation details describe how the system is constructed. The results show that it is possible to maintain a high bandwidth in many parts of the system if an appropriate topology is chosen. If all units in a bus system are synchronous, it is difficult to reach low latency in the communication.
@mastersthesis{diva2:940128,
author = {Svensk, Gustav},
title = {{Bus System for Coresonic SIMT DSP}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4944--SE}},
year = {2016},
address = {Sweden},
}
With increasing demands on mobile communication transfer rates the circuits in mobile phones must be designed for higher performance while maintaining low power consumption for increased battery life. One possible way to improve an existing architecture is to implement instruction prefetching. By predicting which instructions will be executed ahead of time the instructions can be prefetched from memory to increase performance and some instructions which will be executed again shortly can be stored temporarily to avoid fetching them from the memory multiple times.
By creating a trace driven simulator the existing hardware can be simulated while running a realistic scenario. Different methods of instruction prefetch can be implemented into this simulator to measure how they perform. It is shown that the execution time can be reduced by up to five percent and the amount of memory accesses can be reduced by up to 25 percent with a simple loop buffer and return stack. The execution time can be reduced even further with the more complex methods such as branch target prediction and branch condition prediction.
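The simple loop buffer evaluated above can be sketched as a trace-driven simulation (a minimal illustrative sketch; the thesis's actual simulator, trace format and buffer parameters are not given in the abstract):

```python
# Minimal trace-driven sketch of a loop buffer: a small LRU store of
# recently fetched instruction addresses that lets tight loops replay
# instructions without touching memory.

from collections import OrderedDict

def memory_fetches(trace, buffer_size=8):
    """Count memory fetches for an address trace when a loop buffer is used."""
    buf = OrderedDict()   # address -> None, ordered by recency
    fetches = 0
    for addr in trace:
        if addr in buf:
            buf.move_to_end(addr)        # hit: replay from the loop buffer
        else:
            fetches += 1                 # miss: fetch from memory
            buf[addr] = None
            if len(buf) > buffer_size:
                buf.popitem(last=False)  # evict least recently used address
    return fetches

# A 4-instruction loop executed 10 times needs memory only on the
# first iteration: 4 fetches instead of 40.
assert memory_fetches([0, 4, 8, 12] * 10) == 4
```

Straight-line code with no reuse gains nothing from the buffer, which is why the abstract's larger reductions require the more complex prediction methods.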
@mastersthesis{diva2:935885,
author = {Lind, Tobias},
title = {{Evaluation of Instruction Prefetch Methods for Coresonic DSP Processor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4959--SE}},
year = {2016},
address = {Sweden},
}
This project is designed to create a system that is simple and highly functional for the purpose of maintaining the well-being of plant life through use of the Internet of Things (IoT). The project focuses on the idea of a self-sustaining system using a microcontroller board with access via Wi-Fi communications and the ability to use photolytic sensors to recharge the system's power supply. The project is focused on small-scale home gardening.
@mastersthesis{diva2:934190,
author = {de Maris, Jay},
title = {{Multi-Function Automatic Wireless Irrigation System (MAWIS)}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--16/0456--SE}},
year = {2016},
address = {Sweden},
}
A partially parallel reconfigurable channel switch is constructed for use in DFBR. Its permutation can be changed while running, without any interruption in the streams of data. Three approaches are tried: one based on a sorting network, one based on memories and multiplexers, and one based on a Clos network. Variants with the pattern stored in memories and in shift registers are tried. They are implemented in automatically generated Verilog and synthesized for an FPGA. Their costs in terms of area use, memory use and maximum clock frequency are compared, and the results show that the Clos-based approach is superior in all aspects and that pattern data should not be stored in shift registers. The work is open source and available for download at https://github.com/channelswitch/channelswitch.
@mastersthesis{diva2:931953,
author = {Stenholm, Roland},
title = {{Time-Multiplexed Channel Switches for Dynamic Frequency Band Reallocation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4949--SE}},
year = {2016},
address = {Sweden},
}
Since their introduction in the 1980s, field-programmable gate arrays have seen growing use over the years. Nowadays FPGAs are found in everything from planetary rovers and base transceiver stations to bitcoin miners. With the technological advancements and the growth of the market, there has been a steady flow of new models with increasing capacity. To make it possible to use this capacity in an efficient way, the software tools have also been improved.
The applications in research have grown, and so has the will to compare both the speed and size of different implementations that try to solve the same or a similar problem. However, how to make a good comparison is not well defined. Since few research papers have source code available, such comparisons are hard to make and there is a high risk of comparing apples to oranges.
In this thesis, we will study the impact of different software settings and design constraints on the FPGA design flows to better understand how to report research results. This will be done by running selected designs through different EDA tools, using various settings and finally analyse the data the tools provide. At the end we will begin to define guidelines for how to report and compare implementation data, to give a good account of their performance compared to other designs.
@mastersthesis{diva2:931269,
author = {Persson, Stefan},
title = {{FPGA Design Tools:
the Challenges of Reporting Performance Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4935--SE}},
year = {2016},
address = {Sweden},
}
Image convolution is a common algorithm that can be found in most graphics editors. It is used to filter images by multiplying and adding pixel values with coefficients in a filter kernel. Previous research has implemented this algorithm on different platforms, such as FPGAs, CUDA, C etc., and the performance of these implementations has then been compared. When the algorithm has been implemented on an FPGA, it has almost always been with a single convolution. The goal of this thesis was to investigate, and in the end present, one possible way to implement the algorithm with 16 parallel convolutions on a Xilinx Spartan 6 LX9 FPGA and then compare the performance with results from previous work. The final system performs better than multi-threaded implementations on both a GPU and a CPU.
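The multiply-and-add filtering described above can be written as a plain-Python reference model (illustrative only; the 16-way parallel FPGA pipeline itself is not modelled here, and for symmetric kernels the unflipped correlation below equals true convolution):

```python
# 'Valid'-region image convolution: slide the kernel over the image and
# accumulate coefficient * pixel products at each position.

def convolve2d(image, kernel):
    """Return the valid-region convolution of image with kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out

# 3x3 box filter over a 4x4 ramp image.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
assert convolve2d(image, box) == [[54, 63], [90, 99]]
```

An FPGA implementation replaces the inner multiply-accumulate loops with parallel DSP blocks, which is what makes 16 simultaneous convolutions feasible.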
@mastersthesis{diva2:930724,
author = {Ström, Henrik},
title = {{A Parallel FPGA Implementation of Image Convolution}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4931--SE}},
year = {2016},
address = {Sweden},
}
The Division of Computer Engineering at Linköping University is currently developing an innovative parallel DSP processor architecture called ePUMA. One possible future purpose of ePUMA is to use it in base stations for mobile communication. In order to investigate the performance and potential of ePUMA as a processing unit in base stations, a model of the LTE physical layer uplink receiving chain has been simulated in Matlab and then partially mapped onto the ePUMA processor. The project work included research and understanding of the LTE standard and simulating the uplink processing chain in Matlab for a transmission bandwidth of 5 MHz. Major tasks of the DSP implementation included the development of a 300-point FFT algorithm and a channel equalization algorithm for the SIMD units of the ePUMA platform. This thesis provides the reader with an introduction to the LTE standard as well as an introduction to the ePUMA processor. Furthermore, it can serve as guidance for developing mixed-radix FFTs in general, or the 300-point FFT in particular, and can help with a basic understanding of channel equalization. The work included the whole development chain, from understanding the algorithms, through simplifying and mapping them onto a DSP platform, to testing and verification of the results.
@mastersthesis{diva2:926644,
author = {Keller, Markus},
title = {{Implementation of LTE Baseband Algorithms for a Highly Parallel DSP Platform}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4941--SE}},
year = {2016},
address = {Sweden},
}
The computer vision problem of object tracking is introduced and explained. An approach to interest point based feature detection and tracking using FAST and BRIEF is presented and the selection of algorithms suitable for implementation on a Xilinx Zynq7000 with an XC7Z020 field-programmable gate array (FPGA) is detailed. A modification to the smoothing strategy of BRIEF which significantly reduces memory utilization on the FPGA is presented and benchmarked against a reference strategy. Measures of performance and resource efficiency are presented and utilized in an iterative development process. A system for interest point based object tracking that uses FAST for feature detection and BRIEF for feature description with the proposed smoothing modification is implemented on the FPGA. The design is described and important design choices are discussed.
@mastersthesis{diva2:898361,
author = {Mollberg, Alexander},
title = {{A Resource-Efficient and High-Performance Implementation of Object Tracking on a Programmable System-on-Chip}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4914--SE}},
year = {2016},
address = {Sweden},
}
A time-to-digital converter (TDC) is a digital unit that measures the time interval between two events. This is useful for determining the characteristics and patterns of a signal or an event. In this thesis a hybrid TDC is presented, consisting of a tapped delay line combined with a clock counter.
The TDC is used to measure the time between received data in a QKD application. If the measured time does not exceed a certain value, the data has been sent without any interception. It is also possible to use TDCs in other fields such as laser ranging and time-of-flight applications.
The TDC consists of two carry chains, an encoder, a FIFO and a counter for each channel, an AXI module, and a control unit that generates command signals to all implemented channels. The time is measured by sampling the signal that has propagated through the carry chain and encoding the propagation length from this sample.
In this thesis a TDC is implemented that has a 10 ns dead time and a resolution below 28 ps in four-channel mode. The propagation variation is approximately two percent of the total value during testing. For the implementation, an FPGA board with a Zynq XC7Z020 SoC is used, with SystemVerilog as the hardware description language (HDL).
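The hybrid measurement principle, a coarse clock counter refined by a tapped delay line, can be sketched as follows (the clock period, tap delay and sign convention are illustrative assumptions, not the thesis's calibration values):

```python
# Sketch of hybrid time-to-digital conversion: the clock counter gives
# the number of whole clock periods, and the tapped delay line (carry
# chain) resolves the fraction of a period as a tap count.
# All numbers below are illustrative assumptions, not measured values.

CLK_PERIOD_PS = 8000.0   # assumed 125 MHz system clock
TAP_DELAY_PS = 28.0      # assumed per-tap delay of the carry chain

def timestamp_ps(coarse_count, fine_taps):
    """Combine coarse and fine measurements into one timestamp (ps)."""
    return coarse_count * CLK_PERIOD_PS + fine_taps * TAP_DELAY_PS

def interval_ps(start, stop):
    """Time between two events, each given as (coarse_count, fine_taps)."""
    return timestamp_ps(*stop) - timestamp_ps(*start)

# Event B occurs 3 clock periods and 10 taps after event A.
assert interval_ps((100, 5), (103, 15)) == 3 * 8000.0 + 10 * 28.0
```

The delay line alone would overflow for long intervals, while the counter alone is limited to one clock period of resolution; combining them gives both range and picosecond-level resolution.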
@mastersthesis{diva2:1074964,
author = {Andersson Holmström, Simon},
title = {{Adaptive TDC:
Implementation and Evaluation of an FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET-15/0428-SE}},
year = {2015},
address = {Sweden},
}
This report describes work on detecting presence in a room using the simplest possible sensors, connected to an Arduino. At the same time, the system uses the same sensors to report the climate in the room. The reader is given an insight into the difficulties of detecting people and into how the chosen sensors work. In addition, the energy consumption of the system is studied. The report concludes with a percentage probability of presence being presented over an internet connection, based on extensive testing of the sensors' behaviour.
@mastersthesis{diva2:874843,
author = {Hjelmberg, Eric and Rowell, Henrik},
title = {{Persondetektering i inomhusmiljö med enkla sensorer}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET-15/0449-SE}},
year = {2015},
address = {Sweden},
}
The goal of the thesis is to investigate and propose a new design for a contactor platform, both in terms of hardware and embedded software, which incorporates support to implement new state-of-the-art functions. The platform must support a wide range of contactors from basic ones with only core functions to advanced contactors using modern microcontrollers to provide efficient, quick and reliable operation.
Further, a significant focus of the thesis is on the interaction between electrical engineering and computer engineering. The electronics need to interact seamlessly with a microcontroller running versatile software to provide industry-leading performance. To achieve this, the software and hardware are evaluated with a focus on developing an optimal platform.
The proposed embedded software uses development techniques rarely used in embedded applications, such as UML code generation, compile-time initiation of objects and an object-oriented design, while maintaining the performance of traditional embedded programming. The thesis also provides suggestions for hardware changes to further improve the contactor's operation.
@mastersthesis{diva2:858910,
author = {Sandvik, Fredrik and Tingstam, Olle},
title = {{Design and Prototyping of a Scalable Contactor Platform Adapted to State-of-the-Art Functions}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4891--SE}},
year = {2015},
address = {Sweden},
}
In today's modern society, handling crops in an accountable way without losses has become more and more important. By letting a gardener evaluate the progress of his plants from relevant data, one can reduce these losses and increase the effectiveness of the whole plantation. This work is about the construction of such a system, composed from a developer's perspective of three different platforms, from the start of data sampling within the context of gardening to an end user easily able to understand the translated data. The first platform is created from scratch with both hardware and software, the next is assembled from already finished hardware components and built with simpler software. The last is essentially only a software solution in an already finished hardware environment.
@mastersthesis{diva2:848005,
author = {von Hacht, Karl-Johan},
title = {{Garden Monitoring with Embedded Systems}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--15/0435--SE}},
year = {2015},
address = {Sweden},
}
This thesis presents a way of performing multi-dimensional FFTs in a continuous-flow environment by calculating the FFT of each dimension separately in a pipeline. The result is a three-dimensional pipelined FFT implemented on a Stratix III FPGA. It can calculate the three-dimensional FFT of a data set containing 256^3 samples with a word size of 32 bits. The biggest challenge and the main part of the work are the data permutations between the one-dimensional FFT modules; this part of the design makes use of an external DDR2 SDRAM as well as on-chip BRAM to store and permute data between the modules. The evaluations show that the design is hardware efficient and the latency is relatively low, determined to be 84.2 ms.
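The per-dimension decomposition used by the pipeline can be checked numerically on a small cube (a NumPy sketch; the actual design streams 256^3 samples through hardware 1-D FFT modules with data permutations in between):

```python
# NumPy check, on a small cube, that three passes of 1-D FFTs
# (one per dimension) equal the direct 3-D FFT.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 8)) + 1j * rng.standard_normal((8, 8, 8))

y = np.fft.fft(x, axis=0)   # first dimension
y = np.fft.fft(y, axis=1)   # second dimension
y = np.fft.fft(y, axis=2)   # third dimension

assert np.allclose(y, np.fft.fftn(x))   # matches the direct 3-D transform
```

This separability is what allows the hardware to reuse identical 1-D FFT modules; the hard part, as the abstract notes, is reordering the data stream between the passes.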
@mastersthesis{diva2:842420,
author = {Öhlin, Andreas},
title = {{Real-Time Multi-Dimensional Fast Fourier Transforms on FPGAs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4854--SE}},
year = {2015},
address = {Sweden},
}
When considering processor architectures (either existing ones or when developing new ones), native code for functional testing and performance evaluation will generally be required. In theory, the work load involved in developing such code can be alleviated by compiling existing test cases written in a higher level language.
This thesis focuses on evaluating the feasibility of this approach by developing a basic C compiler using the LLVM framework and porting it to a number of architectures, finishing by comparing the performance of the compiled code with existing results obtained using the CoreMark benchmark. The resulting comparison can serve as a guideline when deciding which approach to choose when taking on a new architecture. The developed compiler and its back end ports can also serve as reference implementations.
While not conclusive, the final results indicate that the approach is highly feasible for certain applications on certain architectures.
@mastersthesis{diva2:825604,
author = {Nielsen, Emil},
title = {{Performance Evaluation of an easily retargeted C compiler using the LLVM framework}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4781--SE}},
year = {2015},
address = {Sweden},
}
One solution to reduce exhaust emissions from heavy commercial vehicles is to propel the vehicles completely or partially electrically. This means that the vehicle must contain a significant electric energy source. The large capacity of the energy source causes the vehicle to either sacrifice a large part of its uptime to charge the source, or apply a higher charge power at the cost of power losses and reduced lifetime of the energy source. This thesis contains a pre-study of high-power DC charging of hybrid batteries from existing infrastructure suited to electric hybrid cars. The following parts are included in the thesis: modeling of a battery pack and a DC-DC converter, formulation of an MPC controller for the battery pack, and analysis of charging strategies and battery restrictions through simulations. The results show that a longer charging time increases the energy efficiency and reduces the degradation of the battery. They also show that a charging strategy similar to constant-current-constant-voltage charging should be used for a full charge of an empty battery.
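The constant-current-constant-voltage strategy mentioned in the conclusion can be sketched on a crude internal-resistance battery model (every parameter and the linear OCV curve below are illustrative assumptions, not the thesis's pack model):

```python
# Sketch of constant-current-constant-voltage (CC-CV) charging on a
# simple open-circuit-voltage + internal-resistance battery model.
# All parameters are illustrative assumptions.

def cc_cv_charge(soc=0.0, capacity_ah=10.0, i_cc=20.0, v_max=4.1,
                 i_cutoff=0.5, dt_h=1e-3, r_int=0.01,
                 ocv=lambda s: 3.0 + 1.2 * s):
    """Charge until the CV-phase current tapers below i_cutoff.

    Returns (final state of charge, elapsed time in hours)."""
    t, i = 0.0, i_cc
    while i > i_cutoff:
        v = ocv(soc) + r_int * i            # terminal voltage at this current
        if v >= v_max:
            # CV phase: hold the terminal voltage, let the current taper
            i = max((v_max - ocv(soc)) / r_int, 0.0)
        else:
            i = i_cc                        # CC phase: hold the current
        soc = min(1.0, soc + i * dt_h / capacity_ah)
        t += dt_h
    return soc, t

final_soc, hours = cc_cv_charge()
```

The CC phase delivers the bulk of the charge quickly, while the tapering CV phase tops up the battery without exceeding the voltage limit, which is what limits losses and degradation near full charge.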
@mastersthesis{diva2:825073,
author = {Hällman, Oscar},
title = {{DC Charging of Heavy Commercial Plug-in Hybrid Electric Vehicles}},
school = {Linköping University},
type = {{LITH-ISY-EX--15/4878--SE}},
year = {2015},
address = {Sweden},
}
In the automotive industry today, embedded systems have reached a level of complexity which is not maintainable with the traditional approach to designing automotive embedded systems. For this reason, many of the world's leading automotive manufacturers have formed an alliance to address this problem. This has resulted in AUTOSAR, an open standardized architecture for automotive embedded systems, which strives for increased flexibility and safety. This thesis explores the possibilities of implementing a CAN communication stack using the AUTOSAR architecture and its corresponding methodology. As a result of this thesis, a complete AUTOSAR CAN communication stack has been implemented, as well as a simulator application for testing its functionality.
@mastersthesis{diva2:822343,
author = {Alexandersson, Johan and Nordin, Olle},
title = {{Implementation of CAN Communication Stack in AUTOSAR}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--15/0440--SE}},
year = {2015},
address = {Sweden},
}
This thesis work was carried out at Husqvarna AB, at the department Concept & Features electric products (EN-NEP). The task was to evaluate an alternative collision-detection method for their robotic lawn mowers. The evaluated method detects collisions using sampled data from an accelerometer together with sampled current levels from the motors of the two drive wheels.
The method used to detect collisions on current robots works well, but it requires the robot's body and chassis to move relative to each other for a collision to be detected. To reduce the number of components and the price of the robot, alternative methods are of interest to the client.
An algorithm was designed in a simulation environment and then tested in practice through implementation on a Raspberry Pi communicating with the robotic lawn mower. If the implemented algorithm detects a collision in the sampled data, a message is sent to the robot to execute its built-in collision pattern.
The result was a working system with great potential. With further work, the method could become a future replacement for, or a complement to, the current method.
@mastersthesis{diva2:821881,
author = {Ståhl, Johan},
title = {{Kollisionsdetekteringssystem för autonom robot}},
school = {Linköping University},
type = {{LITH-ISY-EX-ET--15/0436--SE}},
year = {2015},
address = {Sweden},
}
In high-volume production of consumer electronics, test time, measurement accuracy and factory floor space are synonymous with costs. For this reason, the Test Engineering section at Ericsson Mobile Communications' factory in Linköping developed a test concept, called the Pelle concept, in which computing power and measurement equipment are moved into small, space-efficient test fixtures suited to both robotic and manual production lines. In the spring of 1998 the new test concept lacked a general measurement and stimulus board for mobile phones; such a board was specified and designed during the summer and autumn of 1998 by the author and Stefan Lantz. The report describes the work of specifying the board's function blocks and the detailed design of the function blocks the author was responsible for. It also gives insight into the design work, and its difficulties, for a measurement and stimulus board with many different function blocks on a very limited area, as well as the problems that arise in many projects as a result of changed requirements, misunderstandings, communication failures and missing documentation.
@mastersthesis{diva2:821906,
author = {Wikström, Rolf},
title = {{Mysak Konstruktion av ett mät- och stimulikort för mobiltelefoner}},
school = {Linköping University},
type = {{LITH-ISY-EX-ET--15/0146--SE}},
year = {2015},
address = {Sweden},
}
Future development in high-performance cameras and machine vision applications results in a need for faster vision communication standards. This thesis compares four high-speed vision communication standards on the machine vision market. The standards considered are CoaXPress, GigE Vision over 10 Gigabit Ethernet, Camera Link HS and USB3 Vision, all of which are capable of higher speeds than their forerunners. The standards are first compared in general, based on the available theory and with the help of the voting systems Borda count and the Kemeny-Young method. From the result of the general comparison, two of the standards, CoaXPress and 10 GigE Vision, are chosen for an in-depth comparison. The vision communication standards are tested on a Xilinx ZC706 development board for the Zynq-7000 SoC, where resource allocation and power consumption are measured. The thesis gives an overview of the performance of the standards and, with no obvious winner, the voting systems give an unbiased comparison with interesting results.
@mastersthesis{diva2:820561,
author = {Löfström, Daniel},
title = {{Comparison of High Speed Vision Communication Standards}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4847--SE}},
year = {2015},
address = {Sweden},
}
This master's thesis report covers an investigation of how FPGA-based hardware can be used to create customizable measurement instruments for testing electrical equipment in JAS 39 Gripen. The investigation was done at Saab Support and Services in Arboga.
Electrical equipment is gradually replacing functions previously provided by other systems in safety-critical environments. Since the functions are safety critical, they require regular testing in order to verify proper operation. The aircraft JAS 39 Gripen, which is manufactured and developed by Saab, is an example of such a system. Proper operation of its avionics is essential in order to maintain flying safety.
Systems already exist today that can verify the functionality of electronics in JAS 39 Gripen. However, there are a number of scenarios where those test systems are somewhat inflexible, and more flexible test systems are often desired. This flexibility can be obtained by using configurable hardware, for example FPGAs. This approach is investigated in this master's thesis.
@mastersthesis{diva2:818674,
author = {Stavström, Marcus},
title = {{Evaluation of FPGA based Test Systems}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4866--SE}},
year = {2015},
address = {Sweden},
}
This report presents a thesis project that investigated the possibility of constructing a sensor that measures cadence using an accelerometer. The ANT+ cadence profile was implemented to enable synchronization between a sports watch and the sensor. Cadence is how fast the cyclist turns the pedals, measured in revolutions per minute (RPM). How fast a cyclist pedals affects the body in many different ways, and the cyclist often wants to know the current cadence in order to optimize performance. The investigated principle of using an accelerometer to measure cadence aims at a prototype suitable for indoor cycling, also known as spinning. On a traditional bicycle there are usually two hardware parts for measuring cadence, one mounted on the crank arm and the other on the bicycle frame. The frame of a spinning bike differs enough from a regular bicycle that the frame-mounted part cannot be installed with the same ease. With an accelerometer, only one hardware part is needed, which can easily be mounted on the crank arm. Software development was done on an Arduino Uno, which is built around an ATmega328 microcontroller from Atmel. The sensor unit that measures cadence consists of the Arduino Uno, the LSM303DLHC accelerometer from STMicroelectronics and the nRF24AP2 ANT chip from Nordic Semiconductor. The main unit was a personal computer acting as receiver, running the program ANT+ Simulator. The program developed for the microcontroller detects when a pedal revolution occurs and sends the total revolution time, together with the total number of pedal revolutions, via the nRF24AP2 to the main unit. The cadence profile is what calculates the current cadence. Finally, a minimum hardware requirement and a suggestion for a low-power microcontroller for a possible prototype are presented.
@mastersthesis{diva2:818417,
author = {Westerholm, Glenn},
title = {{Kadenssensor med en accelerometer och ANT+}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--15/0434--SE}},
year = {2015},
address = {Sweden},
}
This thesis project was carried out at Linköping University's Institute of Technology within the Bachelor of Science in Electronics Engineering programme. The client, Flextronics, is a company that develops general test equipment for electronics production. The existing test equipment needs updating, and the thesis project consists of building a new power supply module for it. The main difference from the previous system is that the new power supply module must handle higher output power. Since the new test equipment is already under development, some requirements must be taken into account; one of them is that the power supply module must contain a microcontroller. The microcontroller can provide useful functions such as built-in DACs and ADCs, and the design was revised so that these can be used and even handle the regulation. After studying datasheets and running simulations, a solution was developed with two regulators controlled by the microcontroller. This solution was also built and evaluated.
@mastersthesis{diva2:814041,
author = {Vidlid, Marija},
title = {{Konstruktion av strömförsörjningsmodul till testsystem}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--15/0432--SE}},
year = {2015},
address = {Sweden},
}
To control the equipment in various industrial vehicles, different types of switches mounted on an operator panel are used. From these, individual signal cables often run to the units to be controlled. One alternative, to avoid the large amount of cabling this can lead to, is instead to connect the switches to a microcontroller, which then forwards the signals via a CAN bus to an ECU that controls all units from a central position in the vehicle.
During this thesis project, carried out at Syncore Technologies AB, an operator-panel platform, both hardware and software, was therefore developed to achieve a minimal NRE cost for each new customer configuration, covering different arrangements of switches and their behaviours.
@mastersthesis{diva2:811682,
author = {Isberg Martinsson, Linus},
title = {{Modulariserbar operatörspanel baserad på ett CAN-buss gränssnitt}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--15/0430--SE}},
year = {2015},
address = {Sweden},
}
The focus of this thesis is primarily in electronic construction and describes the design process for a microcontroller circuit board from concept development to prototyping. The client develops test fixtures for automated testing of products within the electronics industry and needs a new controller circuit for the test fixtures that can handle controls and basic testing. An investigation into the needs of such a system is conducted and a prototype printed circuit board assembly is manufactured.
The prototype is developed with a focus on protection against electrostatic discharges and overvoltage. Among the functions included are voltage measurements, communication interfaces and control of input and output currents. Firmware for the prototype is developed and configured to communicate with a PC through a USB interface for control and collection of measurements.
@mastersthesis{diva2:796360,
author = {Hedin, Adam},
title = {{Konstruktion av styrelektronik till testfixtur}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--15/0426--SE}},
year = {2015},
address = {Sweden},
}
Before 1971, all electronics were based on three basic circuit elements, until a professor from UC Berkeley reasoned that a fourth basic circuit element exists, which he called the memristor, characterized by the relationship between the charge and the flux linkage. A memristor is essentially a resistor with memory. The resistance of a memristor (its memristance) depends on the amount of charge that has passed through the device. In 2008, a research group at HP Labs succeeded in building an actual physical memristor. HP's memristor was a nanometre-scale titanium dioxide thin film, composed of doped and undoped regions, sandwiched between two platinum contacts. After this breakthrough, a large amount of research started with the aim of better realization of the device and the discovery of further possible applications of the memristor. This report attempts to cover the history, introduction, implementation, modeling and applications of the device, but the main focus of the study is on memristor modeling. Four papers on modeling of the memristor were considered, and since no Cadence models were available in the literature at the time, it was decided to develop such models. Cadence models based on the mentioned papers were therefore designed and simulated, and Verilog-A models were written from the same modeling papers as well. Unfortunately, due to some limitations of the design tool, some of the models failed to provide the expected results, but the functioning models still show satisfactory results that can be used in circuit simulations of memristors.
@mastersthesis{diva2:774476,
author = {Keshmiri, Vahid},
title = {{A Study of the Memristor Models and Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX-10/4455--SE}},
year = {2014},
address = {Sweden},
}
This thesis explores the study and design of an all-digital VCO-based ADC in a 65 nm CMOS technology. As the CMOS process enters the deep submicron region, voltage-domain ADCs begin to struggle to improve their performance due to the use of complex analog components. A promising solution is to employ as many digital components as possible in a time-domain ADC, which uses the time resolution of an analog signal rather than its voltage resolution. As the CMOS process scales down, the achievable time resolution has been found to be superior to the achievable voltage resolution. In recent years, such time-domain ADCs have therefore attracted immense interest owing to their inherent features and design advantages.
In this thesis work, the VCO-based ADC design falls under the category of time-based ADCs and consists of a VCO and appropriate digital processing circuitry. The VCO is used to convert a voltage signal into a time signal, and it thereby also acts as a time-based quantizer. The resulting quantized-time signal is then converted into a digital signal by an appropriate digital technique. After exploring different architectures, a conventional VCO-based ADC architecture was implemented as a high-level model to understand the characteristic behaviour of this time-based ADC, and a comprehensive schematic-level design was then created with reference to the behavioural model in the Cadence design environment. The performance has been verified using mixed transistor- and behavioural-level simulations, owing to the long simulation time of the full implemented design.
The ADC's dynamic performance has been evaluated in various experiments and simulations. Overall, the simulations showed that the design reaches an ENOB of 4.9 bits at a sample rate of 572 MS/s when a 120 MHz analog signal is applied. The achieved peak performance of the design was an SNR of 40 dB, an SFDR of 34 dB and an SNDR of 31 dB over a 120 MHz bandwidth at a 1 V supply voltage. Without any complex building blocks, this all-digital VCO-based ADC design provides the key feature of inherent noise shaping and is well suited to the deep submicron region.
@mastersthesis{diva2:774426,
author = {Thangamani, Manivannan and Prabaharan, Allen Arun},
title = {{The design of an all-digital VCO-based ADC in a 65nm CMOS technology}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4682--SE}},
year = {2014},
address = {Sweden},
}
Particle filters are a class of sequential Monte Carlo methods commonly used for estimating the unknowns of time-varying signals presented in real time, especially when dealing with nonlinearity and non-Gaussianity in BOT applications. This thesis work performs one such estimation: tracking a person using the road information available from an IR surveillance video. A parallel custom hardware implementation of an SIRF-type particle filter was realized in an Altera Cyclone IV E FPGA device. The implementation accounts for how the algorithmic aspects of this sampling-based filter relate to the possibilities and constraints of a hardware implementation. Using a 100 MHz clock frequency, the synthesised hardware design can process almost 50 Mparticles/s. The implementation thus tracks the target, which is defined by a 5-dimensional state variable, using the noisy measurements available from the sensor.
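For readers unfamiliar with the SIR/SIRF recursion that the hardware above parallelizes, here is a minimal software sketch (a toy 1-D model with assumed noise parameters, not the thesis's 5-dimensional tracker): propagate particles through a motion model, reweight by the measurement likelihood, and resample to counteract weight degeneracy.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights, z, q=0.1, r=0.5):
    """One SIR (sampling importance resampling) iteration for a toy
    1-D random-walk state observed in Gaussian noise."""
    # 1. Propagate each particle through the motion model x_k = x_{k-1} + w.
    particles = particles + rng.normal(0.0, q, size=particles.shape)
    # 2. Reweight by the measurement likelihood p(z | x) ~ N(z; x, r^2).
    weights = weights * np.exp(-0.5 * ((z - particles) / r) ** 2)
    weights /= weights.sum()
    # 3. Systematic resampling: duplicate heavy particles, drop light ones.
    cum = np.cumsum(weights)
    cum[-1] = 1.0                       # guard against rounding
    u = (rng.random() + np.arange(len(particles))) / len(particles)
    particles = particles[np.searchsorted(cum, u)]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

n = 1000
particles = rng.normal(0.0, 1.0, n)     # prior
weights = np.full(n, 1.0 / n)
for z in [0.2, 0.4, 0.6, 0.8]:          # a short synthetic measurement sequence
    particles, weights = sir_step(particles, weights, z)
estimate = float(np.sum(particles * weights))   # posterior mean
```

The three steps are independent across particles, which is exactly what makes the algorithm attractive for the parallel FPGA datapath described above; only the weight normalization and resampling require global communication.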
@mastersthesis{diva2:774383,
author = {Kota Rajasekhar, Rakesh},
title = {{Parallel Hardware for Sampling Based Nonlinear Filters in FPGAs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4821--SE}},
year = {2014},
address = {Sweden},
}
Communication using the human body as a transmission medium, with capacitive coupling between skin and sensor, has been an active research area for PANs (Personal Area Networks) since Thomas Guthrie Zimmerman introduced the technique in 1995. The motivation is to investigate the advantages and applications of a communication method that does not emit RF signals, thereby reducing the risk of unauthorised eavesdropping.
This report describes a thesis project that investigates the possibility of eliminating the USB-to-UART converter on the Microchip BodyCom by using a software USB stack, combining it with the body-coupled communication functionality in a single microcontroller. It is also studied whether the program code in the body-coupled communication transmitter can be modified to extend its functionality.
It was given as a precondition that a microcontroller from Microchip should be used; furthermore, low cost and low power consumption were important, especially for the transmitter. The method for achieving this has been to use the Microchip BodyCom development kit together with the Microchip USB low pin count development kit and the Microchip USB firmware framework.
The result was that the USB-to-UART converter could be integrated with the Microchip BodyCom by using a software USB stack and modified BodyCom program code in a single microcontroller.
Only imagination limits what body-coupled communication can be used for. It would, for example, be possible to exchange electronic business cards through a handshake, or to open a locked door simply by touching the handle.
@mastersthesis{diva2:773132,
author = {Andersson, Isak and Karlsson, Melki},
title = {{Body Coupled Communication: Ändring av prototypkort}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--14/0418--SE}},
year = {2014},
address = {Sweden},
}
The scaling down of technologies presents new challenges in reliability, one of them being electromigration. Electromigration was not a cause of concern until interconnect widths shrank to the micrometer scale; at that point, research effort was directed at electromigration analysis of interconnects. International conferences on reliability have recognized electromigration as one of the biggest reliability problems.
This thesis focuses on electromigration analysis of signal nets and was carried out in the Design Methodology department at a company in Eindhoven. The purpose of the work was to set up a flow for electromigration analysis using existing tools at the company. Comparison of tools and a theoretical study of electromigration also form a large part of the internship.
A summary of theoretical studies of the electromigration phenomenon and their implications for design parameters is presented in this report. The approach to setting up the tools, the evaluation strategy and the results of the evaluation are also documented. Lastly, a conclusion in the form of an effective design methodology and a comparison of the tools are presented.
The report also covers challenges encountered while setting up the tools and the motivation for enabling different options for electromigration analysis. Trade-offs between simulation run time, parasitic extraction, chip area and reliability concerns are discussed as well.
@mastersthesis{diva2:773542,
author = {Nadgouda, Rahul},
title = {{Electromigration Analysis of Signal Nets}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4811--SE}},
year = {2014},
address = {Sweden},
}
The H.264/AVC (Advanced Video Coding) standard, developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC JTC1 Moving Picture Experts Group (MPEG), is one of the most powerful and commonly used formats for video compression. It is widely used for internet streaming, i.e. from media servers to end users.
This master thesis aims at designing a CODEC targeting the Baseline profile on an FPGA. Uncompressed raw data is fed into the encoder in units of macroblocks of 16×16 pixels. At the decoder side, the compressed bit stream is taken in and the original frame is restored. Emphasis is put on implementing the CODEC at RTL level and investigating the effect of parameters such as the Quantisation Parameter (QP) on the overall compression of the frame, rather than investigating multiple solutions for any single block of the CODEC.
@mastersthesis{diva2:770563,
author = {ASLAM, UMAIR},
title = {{H.264 CODEC Blocks Implementation on FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4815--SE}},
year = {2014},
address = {Sweden},
}
The purpose of this thesis is to linearize a power amplifier using digital predistortion. A power amplifier is a nonlinear system: when fed with a pure input signal, the output will be distorted. The idea behind digital predistortion is to distort the signal before feeding it to the power amplifier, so that the combined distortions of the predistorter and the power amplifier ideally cancel each other. In this thesis, two different approaches are investigated and implemented on an FPGA. The first approach uses a nonlinear model that tries to cancel out the nonlinearities of the power amplifier. The second approach is model-free and instead uses a look-up table that maps the input to a distorted output. Both approaches are made adaptive, so that the parameters are continuously updated by adaptive algorithms. First, the two approaches are simulated and tested thoroughly with different parameters and with a power amplifier model extracted from the real amplifier. The simulation results are satisfactory, giving good linearization for both the model-based and the model-free technique. The two techniques are then implemented on an FPGA and tested on the power amplifier. Even though the results are not as good as in the simulations, the system becomes more linear with both approaches. The results vary widely with circumstances such as input frequency and power; typically, the distortion can be attenuated by around 10 dB. Comparing the two techniques, the model-free method shows slightly better results.
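The model-free, adaptive look-up-table approach mentioned in the abstract can be sketched as follows. Everything here is an assumption for illustration: a toy memoryless PA model, a real-valued gain table indexed by input magnitude, and an LMS-style update; the thesis's FPGA implementation and extracted PA model are not reproduced.

```python
import numpy as np

def pa_model(x):
    """Toy memoryless PA with third-order compression (an assumption,
    not the model extracted in the thesis)."""
    return x - 0.1 * x ** 3

def lut_predistort(x, lut, n_bins=32):
    """Model-free predistorter: scale the input by a gain looked up
    from a table indexed by the input magnitude."""
    i = min(int(abs(x) * n_bins), n_bins - 1)
    return x * lut[i]

n_bins = 32
lut = np.ones(n_bins)        # start with unit gain in every bin
mu = 0.5                     # adaptation step size

# Adapt: drive the cascade predistorter + PA with random samples and
# nudge the active LUT bin so the cascade approaches unit gain.
for s in np.random.default_rng(1).uniform(-1.0, 1.0, 20_000):
    y = pa_model(lut_predistort(s, lut, n_bins))
    err = s - y                              # deviation from the ideal output
    i = min(int(abs(s) * n_bins), n_bins - 1)
    lut[i] += mu * err * np.sign(s)          # LMS-style table update

# Residual error near full scale after adaptation (much smaller than
# the uncorrected compression error at the same amplitude).
residual = abs(pa_model(lut_predistort(0.89, lut, n_bins)) - 0.89)
```

Because each update touches only one table entry, this structure maps naturally onto a block RAM plus a small arithmetic unit on an FPGA, which is part of why the LUT approach is attractive in hardware.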
@mastersthesis{diva2:764912,
author = {Andersson, Erik and Olsson, Christian},
title = {{Linearization of Power Amplifier using Digital Predistortion, Implementation on FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4803--SE}},
year = {2014},
address = {Sweden},
}
The Cholesky factorisation is an efficient tool that, when used correctly, can significantly reduce the computational complexity in many applications. This thesis contains an in-depth study of the factorisation, some of its applications and an implementation on the Coresonic SIMT DSP architecture.
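For reference, the textbook algorithm that such a DSP implementation is built around looks like this (a plain sketch, not the Coresonic code):

```python
import numpy as np

def cholesky_lower(A):
    """Textbook Cholesky factorisation: returns lower-triangular L with
    A = L @ L.T, for a symmetric positive-definite A (no pivoting)."""
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        # Diagonal entry: what remains of A[j, j] after the earlier columns.
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
L = cholesky_lower(A)        # [[2, 0], [1, sqrt(2)]]
```

The complexity saving the abstract alludes to comes from exploiting symmetry: factoring costs about n³/3 flops, roughly half of an LU factorisation, and a system A x = b is then solved with two cheap triangular solves.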
@mastersthesis{diva2:753391,
author = {Winqvist, Arvid},
title = {{DSP implementation of the Cholesky factorisation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4796--SE}},
year = {2014},
address = {Sweden},
}
Advances in communication technologies continue to increase information sharing among people. Short-range wireless networking technologies such as Bluetooth and ZigBee, which are mainly used for data transfer over short distances, will however suffer from network congestion, high power consumption and security issues in the future.
Body-coupled communication (BCC), a futuristic short-range wireless technology, uses the human body as a transmission medium. In a BCC channel, a small electric field is induced onto the human body, which enables propagation of a signal between communication devices that are in proximity to, or in direct contact with, the human body. The direct baseband transmission and simple architecture make BCC an attractive candidate for future short-range wireless communication, particularly in applications such as body area networks.
The main focus of this thesis is the design and implementation of a digital baseband transmitter and receiver for body-coupled communication. The physical layer (PHY) implementation of the digital baseband transmitter and receiver is inspired by the IEEE 802.3 Ethernet transmission protocol. The digital design is implemented at RTL level using a hardware description language (VHDL). The functionality of the digital baseband transmitter and receiver is demonstrated by developing data transfer application layers.
@mastersthesis{diva2:746635,
author = {Ali, Rahman},
title = {{Design of Building Blocks in Digital Baseband Transceivers for Body-Coupled Communication}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4648--SE}},
year = {2014},
address = {Sweden},
}
Power dissipation has become one of the major limiting factors in the design of digital ASICs. Low power dissipation increases the mobility of the ASIC by reducing system cost, size and weight. DSP blocks are a major source of power dissipation in modern ASICs. The residue number system (RNS) has long been proposed as an alternative to the regular two's complement number system (TCS) in DSP applications to reduce power dissipation. The basic concept of RNS is to first encode the input data into several smaller, independent residues. The computational operations are then performed in parallel, and the results are eventually decoded back to the original number system. Due to the inherent parallelism of residue arithmetic, a hardware implementation consists of multiple smaller design units; an RNS design can therefore use low-leakage cells and exhibits lower switching activity.
The residue number system has been analyzed by first investigating different implementations of RNS adders and multipliers (the basic arithmetic functions in a DSP system) and then deriving an optimal combination of these. The optimal combinations have been used to implement an FIR filter in RNS, which has been compared with a TCS FIR filter.
By providing different input data and coefficients to both the RNS and the TCS FIR filter, their respective performance in terms of area, power and operating frequency has been evaluated. The result is promising for uniformly distributed random input data, with approximately 15 % reduction of the average power with RNS compared to TCS. For a realistic DSP application with normally distributed input data, the power reduction is negligible for practical purposes.
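The encode/compute-in-parallel/decode flow described above can be demonstrated in a few lines. The moduli set {7, 8, 9} is an arbitrary illustrative choice (pairwise coprime, dynamic range M = 504); decoding uses the Chinese Remainder Theorem.

```python
from math import prod

def to_rns(x, moduli):
    """Encode x as residues modulo each (pairwise coprime) modulus."""
    return [x % m for m in moduli]

def from_rns(residues, moduli):
    """Decode back to an integer via the Chinese Remainder Theorem."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        x += r * Mi * pow(Mi, -1, m)
    return x % M

moduli = [7, 8, 9]              # dynamic range M = 7 * 8 * 9 = 504
a, b = 23, 17
ra, rb = to_rns(a, moduli), to_rns(b, moduli)
# The key point: each residue lane multiplies independently on small words.
rprod = [(x * y) % m for x, y, m in zip(ra, rb, moduli)]
result = from_rns(rprod, moduli)    # 391 == 23 * 17
```

In hardware, each lane becomes a small, independent adder/multiplier modulo mᵢ, which is the source of the shorter critical paths and lower switching activity the thesis exploits; the CRT decode is the price paid at the output.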
@mastersthesis{diva2:743281,
author = {Classon, Viktor},
title = {{Low Power Design Using RNS}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4792--SE}},
year = {2014},
address = {Sweden},
}
This thesis work consists of constructing and validating a module designed to facilitate automatic testing of digitizers at SP Devices. The focus of the report is the use of transmission line simulations to maximise signal integrity. Signal integrity is discussed mainly from an electromagnetic point of view, and the parameters affecting it are presented and discussed. It is shown that simulations using 2D field solvers work well in the cases where 2D models are applicable, while 3D field solvers should be used in other cases. The importance of simulating all transmission line features is seen in the resulting measurements, as the characteristic impedance misses the mark in the cases that were not sufficiently simulated. The design of a trigger generation circuit is presented, and the resulting mismatch between Spice simulations and measurements is discussed and analysed.
@mastersthesis{diva2:740624,
author = {Berneland, Johan},
title = {{Design and Construction of Relay-Based RF-Signal Switching Module for High Signal Integrity}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4765--SE}},
year = {2014},
address = {Sweden},
}
As more services move onto the web and more people use the cloud for storage of important information, it is important that providers of such services can guarantee that the information is kept safe. The most common way of protecting that data is to make it impossible to access without being authenticated as the user owning it. The most common way for a user to authenticate, and thereby become authorized to access the data or service, is by means of a password. The one trying to safeguard that password must make sure that it is not easy to come by for someone attacking the system. The most common way to store a password is to first run it through a one-way function, known as a hash function, that obfuscates it into something that does not at all look related to the password itself. Whenever a user tries to authenticate, the typed password goes through the same function and the results are compared. While this model ensures that the password is not stored in plain text, it contains no way of taking action in case the database of hashed passwords leaks. Knowing that it is nearly impossible to be fully protected from malevolent users, those safeguarding information must always try to make it difficult to extract information about users' passwords. Since the 70s, password storage has to a large extent looked the same. What is researched and implemented in this thesis is a different way of handling passwords, with the main focus on ensuring that countermeasures exist in case the database leaks. The model described and implemented consists of software that makes use of current best practices, with the addition of encrypting the passwords with a symmetric cipher. This is all done in a distributed way, moving towards a paradigm where a service provider does not need to rely on a single point of security.
The end result of this work is a working proof-of-concept: software that runs in a distributed manner to derive users' passwords into an obfuscated form. The system is at least as secure as current best practice for storing user passwords, but introduces the notion of countermeasures once information has found its way into an adversary's hands.
@mastersthesis{diva2:724532,
author = {Odelberg, David and Holm, Carl Rasmus},
title = {{Distributed cipher chaining for increased security in password storage}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4764--SE}},
year = {2014},
address = {Sweden},
}
The design of analog and complex mixed-signal circuits in a deep submicron CMOS process technology is a big challenge. This makes it desirable to shift data converter design towards the digital domain. The advantage of a fully digital ADC design over a traditional analog ADC design is that the circuit is defined by an HDL description and automatically synthesized by tools. It offers low power consumption, low silicon area and a fully optimized gate-level circuit, which reduces the design costs. The operation of an all-digital ADC is based on time-domain signal processing, which provides the high time resolution obtainable in a nanometer CMOS process. The all-digital ADC is implemented as a combination of a digital voltage-controlled oscillator (VCO) and a time-to-digital converter (TDC). The VCO converts the amplitude-domain analog signal to a phase-domain time-based signal, and thereby also works as a time-based quantizer. The time-based signal at the VCO output is then processed by the TDC quantizer to generate the digital code sequences. The fully digital VCO-based ADC has the advantage of superior time resolution, and it offers first-order noise shaping of its quantization noise.
This thesis presents the implementation of a VCO-based ADC in STM 65 nm CMOS process technology using digital tools such as ModelSim simulator, Synopsys Design Compiler and Cadence SOC Encounter. The circuit level simulations have been done in Cadence Virtuoso ADE. A multi-phase VCO and multi-bit quantization architecture has been chosen for this 8-bit ADC. The power consumption of the ADC is approximately 630 μW at 1.0 V power supply and the figure of merit is around 410 fJ per conversion step.
@mastersthesis{diva2:731090,
author = {Pathapati, Srinivasa Rao},
title = {{All-Digital ADC Design in 65 nm CMOS Technology}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4758--SE}},
year = {2014},
address = {Sweden},
}
The complexity of embedded systems has increased dramatically in recent years, while the capacity of the hardware has grown to astonishing levels. These factors have contributed to software taking a leading, and time-consuming, role in embedded system development. Compared with regular software development, embedded development is often more constrained by factors such as hardware performance and testing capability. One proposed solution to some of these problems is the concept of virtual platforms. By emulating the hardware in a software environment, it is possible to avoid some of the problems associated with embedded software development: for example, a system can be executed faster than in reality, and a more controllable testing environment can be provided. This thesis presents a case study of an application-specific virtual platform. The platform is based on an already existing embedded system located in an industrial control system. The virtual platform is able to execute unmodified application code at twice the speed of the real system without causing any software faults, and it can be simulated at even higher speed if some loss of accuracy is regarded as acceptable. The thesis presents tools and methods that can be used to model hardware at a functional level in a software environment, and investigates the accuracy of the virtual platform by comparing it with measurements from the physical system. In this case, the measurements mainly focus on data transactions on a controller area network (CAN) bus.
@mastersthesis{diva2:724049,
author = {Sandstedt, Adam},
title = {{Implementation and analysis of a virtual platform based on an embedded system}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4774--SE}},
year = {2014},
address = {Sweden},
}
På bullriga arbetsplatser använder personal ofta hörselskydd med inbyggda högtalare för att lyssna på exempelvis musik i underhållningssyfte. Om användaren lyssnar på höga ljudnivåer under långa perioder kan bullerskador uppstå i dennes öron. Enligt lagstiftning måste nivån därför begränsas i förebyggande syfte.
Bullernivån är ett genomsnitt av de ljudnivåer användaren exponerats för under en arbetsdag. Användaren måste vila öronen om gränsvärdet för bullernivån nås.Om man utnyttjar att det är ett genomsnitt kan användaren tillåtas lyssna på en hög ljudnivå under en begränsad tid för att sedan sänka den. Det går att bevara både säkerheten och lyssningsupplevelsen om en sänkning införs långsamt.
Detta arbete beskriver hur en algoritm till en digital signalprocessor kan konstrueras för att reglera ljudnivån.Målsättningen var att algoritmen skulle skydda användarens hörsel utan att försämra lyssningsupplevelsen, och utan att förbruka mer energi än nödvändigt.
I algoritmen ingick en prediktor som predikterar mängden buller användaren riskerar att utsättas för, om denne fortsätter lyssna på samma nivå.Långsamma sänkningar av ljudnivån kan då utföras i tid innan gränsvärdet nås.
Det visade sig att algoritmen endast behövde ett fåtal samplingar per sekund för att skatta och reglera ljudnivån tillräckligt precist, vilket reducerade energiförbrukningen.Resultatet visar möjligheten att kombinera målen för säkerhet, lyssningsupplevelse och energieffektivitet i hörselskydd.
Algoritmen implementerades inte på ett skarpt system.Den hade enbart tillgång till ljudsignalen användaren ämnade lyssna på i underhållningssyfte.
@mastersthesis{diva2:716807,
author = {Axelsson, Anders},
title = {{Automatisk bullerdosreglering i hörselskydd}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4754--SE}},
year = {2014},
address = {Sweden},
}
When designing an ADC it is desirable to test its performance at two different points in the development process. The first is characterization and verification testing when a chip containing the ADC has been taped-out for the first time, and the second is production testing when the chip is manufactured in large scale. It is important to have a good correlation between the results of characterization and the results of production testing.
This thesis project investigates the feasibility of using a built-in self-test to evaluate the performance of embedded ADCs in FPGAs, by using the FPGA fabric to run necessary test algorithms. The idea is to have a common base of C code for both characterization and production testing. The code can be compiled and run on a computer for a characterization test setup, but it can also be synthesized using a high-level synthesis (HLS) tool, and written to FPGA fabric as part of a built-in self-test for production testing. By using the same code base, it is easier to get a good correlation between the results, since any difference due to algorithm implementation can be ruled out. The algorithms include a static test where differential nonlinearity (DNL), integral nonlinearity (INL), offset and gain error are calculated using a sine-wave based histogram approach. A dynamic test with an FFT algorithm, that for example calculates signal-to-noise ratio (SNR) and total harmonic distortion (THD), is also included. All algorithms are based on the IEEE Standard for Terminology and Test Methods for Analog-to-Digital Converters (IEEE Std 1241). To generate a sine-wave test signal it is attempted to use a delta-sigma DAC implemented in the FPGA fabric.
Synthesizing the C code algorithms and running them on the FPGA proved successful. For the static test there was a perfect match of the results to 10 decimal places between the algorithms running on a computer and on the FPGA, and for the dynamic test there was a match to two decimal places. Using a delta-sigma DAC to generate a test sine-wave did not prove feasible in this case. Assuming a brick-wall bandpass filter, the performance of the delta-sigma DAC is estimated at an SNR of 53 dB, and this signal is not pure enough to test the test-case ADC, which has a specified SNR of 60 dB.
@mastersthesis{diva2:716974,
author = {Nilsson, Petter},
title = {{Built-in self-test of analog-to-digital converters in FPGAs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4747--SE}},
year = {2014},
address = {Sweden},
}
The task of this master thesis is to develop a system for underwater communication using acoustic waves, built on simple hardware to keep cost low and time to market short. Simple hardware means doing most of the work in the digital domain instead of the analog domain; modern DSPs, FPGAs and microprocessors offer plenty of processing power. The communication range should be 100 meters underwater, and the system should be able to transmit the wanted data at least once every couple of seconds.
- 100 meters range
- Raw data rate of one to two kbit/s
- Use as little analog circuitry as possible
- Use an off-the-shelf transducer
Using little analog circuitry and an off-the-shelf transducer lowers the cost of the hardware, and the development also becomes easier and more flexible.
@mastersthesis{diva2:715869,
author = {Karlsson, Erik},
title = {{Software Acoustic Modem}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4740--SE}},
year = {2014},
address = {Sweden},
}
This thesis analyzes the configuration and security requirements of an automated assignment testing system. The requirements for a flexible yet powerful configuration format are discussed in depth, and an appropriate configuration format is chosen. Additionally, the overall security requirements of the system are discussed, analyzing the different alternatives available to fulfill the requirements.
@mastersthesis{diva2:709562,
author = {Lindgren, Jonas},
title = {{Analysis of requirements for an automated testing and grading assistance system}},
school = {Linköping University},
type = {{LIU-IDA/LITH-EX-A--13/048--SE}},
year = {2014},
address = {Sweden},
}
The constant drive to improve digital video capture speeds, together with increasing power efficiency, has led to tremendous research activity in the image sensor readout field during the past decade. Improvements in lithography and solid-state technologies make it possible to manufacture higher-resolution image sensors. A doubling of resolution leads to a quadrupled readout speed requirement if the same capture frame rate is to be maintained. The speed requirements of conventional serial readout techniques follow the same curve and are becoming more challenging to meet, so employing parallelism in the readout schemes appears inevitable for relaxing the analog readout circuits while keeping the same capture speeds. This transition, however, imposes additional demands on parallel ADC designs, mainly related to achievable accuracy, area and power.
In this work a 12-bit Cyclic ADC (CADC) aimed at column-parallel readout in CMOS image sensors is presented. The conducted study covers multiple CADC sub-component architectures and analyzes them to a moderate depth. Several Multiplying DAC (MDAC) structures have been re-examined, and a preliminary redundant signed-digit CADC design based on a 1.5-bit modified flip-over MDAC has been carried out. Three comparator architectures have been explored and a dynamic interpolative Sub-ADC is presented. Finally, some weak spots degrading the performance of the carried-out design have been analyzed. As an architectural improvement possibility, two MDAC capacitor mismatch error reduction techniques are presented.
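The redundant signed-digit, 1.5-bit-per-stage principle described above can be sketched with an idealized Python model (thresholds, bit count and the times-two MDAC gain are the textbook values; the thesis design adds the circuit-level details this model omits):

```python
def cyclic_adc_rsd(vin, vref=1.0, bits=12):
    """Idealized 1.5-bit/stage redundant signed-digit cyclic conversion.

    Each cycle compares the residue against +/- Vref/4, emits a digit
    in {-1, 0, +1}, and forms the next residue as 2*v - d*Vref (the
    times-two models the flip-over MDAC gain).  The digit redundancy
    makes the result tolerant to comparator offsets up to Vref/4.
    """
    v = vin
    digits = []
    for _ in range(bits):
        if v > vref / 4:
            d = 1
        elif v < -vref / 4:
            d = -1
        else:
            d = 0
        digits.append(d)
        v = 2 * v - d * vref  # residue passed to the next cycle
    # Combine the redundant digits into one signed fractional value.
    return sum(d * 2.0 ** (-i - 1) for i, d in enumerate(digits)) * vref
```

After 12 cycles the reconstructed value matches the input to within 2^-12 of full scale, independent of moderate comparator offsets.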
@mastersthesis{diva2:687644,
author = {Levski Dimitrov, Deyan},
title = {{A Cyclic Analog to Digital Converter for CMOS image sensors}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4674--SE}},
year = {2014},
address = {Sweden},
}
The aim of this work is to investigate the possibility of implementing a configurable NPU (Network Processing Unit) in the next generation of Ericsson’s EMCAs (Ericsson Multi Core Architecture). The NPU is constructed so that it can be configured for either Ethernet or xIO-s, as either a transmitter or a receiver. The motive for the work is that many protocols have similar functions, and there could be advantages to having a configurable protocol choice in future hardware.
A model of an NPU will be created in SystemC using the TLM 2.0 interface. The model will be analyzed to evaluate its complexity regarding a possible modification to also make it configurable for CPRI.
The result that is presented is that it would be possible to implement a configurable NPU in the future EMCAs. The result is based on the conclusion that the protocols use many similar functions and most of the blocks could be made configurable for use with different protocols. Configurable blocks would benefit a configurable NPU as it would require fewer resources than separate blocks for each protocol.
@mastersthesis{diva2:690145,
author = {Karlsson, Sara},
title = {{Micro NPU for Baseband Interconnect}},
school = {Linköping University},
type = {{LITH-ISY-EX--13/4737--SE}},
year = {2014},
address = {Sweden},
}
The objective of this Master's thesis was to design and implement a low-power Analog to Digital Converter (ADC) used for sensor measurements. In the complete measurement unit, of which the ADC is a part, different sensors will be measured. One set of these sensors are three strain gauges with weak output signals, which are to be pre-amplified before being converted. The focus of the application for the ADC has been these sensors, as they were considered a limiting factor.
The report describes theory for the algorithmic and incremental converter as well as a hybrid converter utilizing both of the two converter structures. All converters are based on one operational amplifier and they operate in repetitive fashions to obtain power efficient designs on a small chip area although at low conversion rates.
Two converters have been designed and implemented to different degrees of completeness. One is a 13 bit algorithmic (or cyclic) converter which uses a switching scheme to reduce the problem of capacitor mismatch. This converter was implemented at transistor level and evaluated separately and to some extent also with sub-components. The second converter is a hybrid converter using both the operation of the algorithmic and incremental converter to obtain 16 bits of resolution while still having a fairly high sample rate.
@mastersthesis{diva2:688132,
author = {Lindeberg, Johan},
title = {{Design and Implementation of a Low-Power SAR-ADC with Flexible Sample-Rate and Internal Calibration}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4725--SE}},
year = {2014},
address = {Sweden},
}
In today’s world of high-speed communication, data converters play a vital role. The purpose of this project is to analyze the aliasing image problem that occurs in quadrature I/Q modulators utilizing radio frequency digital-to-analog converters (RF-DACs). The RF-DAC is a high-speed DAC that operates in the GHz region. These high-performance DACs are becoming an essential part of upcoming communication devices such as next-generation radars and telecommunication systems. Several I/Q modulators are implemented in this thesis, with the aim of identifying the unwanted signal that distorts the desired output.
In this thesis, the work is divided into two main parts: verification of the aliasing image and implementation of the I/Q modulators. The work begins with an assessment of the aliasing image by sketching the spectrum using Matlab; the calculation is also derived mathematically to support this analysis. In the second part, four different architectures are implemented, focusing on the image rejection ratio (IRR); the maximum achievable rejection ratio is 119 dB using the RF-DAC. Lastly, the effect of a discrete local oscillator (LO) is shown. A comparison plot is drawn of the effect of a discrete LO at different bit levels versus the IRR variation, giving a clear picture of how the IRR depends on perfect matching rather than on signal shaping.
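The dependence of IRR on I/Q matching can be illustrated with the standard quadrature-mismatch formula (a textbook model, not the exact derivation used in the thesis):

```python
import math

def irr_db(gain_error, phase_error_deg):
    """Image rejection ratio of a quadrature modulator with I/Q mismatch.

    gain_error is the relative gain imbalance (0.01 = 1 %) and
    phase_error_deg the phase imbalance in degrees.  The wanted and
    image sidebands have powers proportional to 1 +/- 2*g*cos(phi) + g^2
    with g = 1 + gain_error; perfect matching makes the image vanish.
    """
    g = 1.0 + gain_error
    phi = math.radians(phase_error_deg)
    wanted = 1.0 + 2.0 * g * math.cos(phi) + g * g
    image = 1.0 - 2.0 * g * math.cos(phi) + g * g
    return 10.0 * math.log10(wanted / image)
```

A 1 % gain error with 1° of phase error already limits the IRR to roughly 40 dB, which illustrates how tight the matching must be to approach the 119 dB figure reported above.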
@mastersthesis{diva2:781574,
author = {Khan, Muhammad Awais},
title = {{A Study on the Aliasing-image Problem in I/Q Modulators Employing RF-DACs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4673--SE}},
year = {2013},
address = {Sweden},
}
In modern day communication systems, there is a constant demand for increase in transmission rates. This is however limited by the bandwidth limitation of the channel. Inter symbol interference (ISI) imposes a great threat to increasing data rates by degrading the signal quality. Equalizers are used at the receiver to compensate for the losses in the channel and thereby greatly mitigate ISI. Further, an adaptive equalizer is desired which can be used over a channel whose response is unknown or is time-varying.
A low-power equalizing solution for a moderately attenuated channel is an analog peaking filter which boosts the signal's high-frequency components. Such conventional continuous-time linear equalizers (CTLE) provide a single degree of controllability over the high-frequency boost. A more complex CTLE has been designed which has two degrees of freedom, controlling the high-frequency boost as well as the range of frequencies over which the boost is applied. This extra degree of controllability over the equalizer response is desired to better adapt to the varying channel response, resulting in an equalized signal with a wider eye opening.
A robust adaptation technique is necessary to tune the equalizer characteristics. Some of the commonly used techniques for adaptation of CTLEs are based on an energy-comparison criterion in the frequency domain, but the adaptation achieved using these techniques might not be optimal, especially for an equalizer with two degrees of controllability. In such cases an eye opening monitor (EOM) can be used which evaluates the actual signal quality in the time domain. The EOM gives an estimate of the signal quality by measuring the eye opening of the equalized signal in the horizontal and vertical directions. In this thesis work a CTLE with two degrees of freedom with an EOM-based adaptation system has been implemented.
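The two degrees of freedom described above can be illustrated with a simple one-zero/two-pole peaking-filter model (a generic CTLE approximation; the pole and zero values in the comments are illustrative, not the thesis circuit):

```python
import math

def ctle_gain_db(freq_hz, fz, fp1, fp2):
    """Magnitude response (dB) of a one-zero/two-pole peaking filter.

    fz sets how much high-frequency boost is applied (first degree of
    freedom), while fp1 and fp2 set the range of frequencies over
    which the boost acts (second degree of freedom).
    """
    s = 2j * math.pi * freq_hz
    wz, wp1, wp2 = (2.0 * math.pi * f for f in (fz, fp1, fp2))
    h = (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2))
    return 20.0 * math.log10(abs(h))
```

Moving fz changes the amount of boost, while moving fp1/fp2 changes where the boost rolls off; for example fz = 100 MHz with poles at 3 GHz and 10 GHz gives roughly 20 dB of peaking around 1 GHz with 0 dB at DC.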
@mastersthesis{diva2:719446,
author = {Narayanan, Anand},
title = {{Eye opening monitor for optimized self-adaptation of low-power equalizers in multi-gigabit serial links}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4732--SE}},
year = {2013},
address = {Sweden},
}
An all-digital phase locked loop for WiGig systems was implemented. The developed all-digital phase locked loop has a targeted frequency range of 2.1 GHz to 2.5 GHz. The all-digital phase locked loop replaces the traditional charge-pump-based analog phase locked loop, and its digital nature makes it superior to the analog counterpart. There are four main parts which constitute the all-digital phase locked loop; the time-to-digital converter is one of the important blocks.
Several time-to-digital converter architectures were studied and simulated. The Vernier-delay-based architecture and the inverter-delay-based architecture were designed and evaluated. These architectures had certain shortcomings, so the pseudo-differential time-to-digital converter architecture was chosen because it occupies less area. Since there is a relationship between the size of the delay cells and the time resolution, the pseudo-differential time-to-digital converter served its purpose.
The whole time-to-digital converter system was tested with a 1 V power supply, a reference frequency of 54 MHz (the reference clock Fref), and a feedback frequency Fckv of 2.1 GHz. The power consumption was found to be around 2.78 mW without dynamic clock gating. When clock gating or bypassing is applied, the power consumption is expected to be reduced considerably. The measured time-to-digital converter resolution is around 7 ps to 9 ps with a load variation of 15 fF. The inherent delay was found to be 5 ps, and the total output noise power was found to be -128 dBm.
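The Vernier principle evaluated above can be modelled in a few lines: two delay chains whose per-stage delay difference sets the resolution (the 12 ps and 5 ps stage delays are illustrative values chosen to give a 7 ps resolution similar to the measured one, not figures from the design):

```python
def vernier_tdc_code(interval_ps, tau_slow=12.0, tau_fast=5.0, max_stages=64):
    """Ideal Vernier TDC: the start edge travels through buffers of
    delay tau_slow while the later stop edge travels through faster
    buffers of delay tau_fast (both in ps).  The stage at which the
    stop edge catches the start edge is the output code, so the
    resolution equals tau_slow - tau_fast (7 ps here).
    """
    for n in range(1, max_stages + 1):
        start_arrival = n * tau_slow                # slow chain
        stop_arrival = interval_ps + n * tau_fast   # fast chain
        if stop_arrival <= start_arrival:           # stop caught up
            return n
    return max_stages  # interval outside the measurable range
```

A 20 ps interval yields code 3 (ceil(20/7)), showing how a sub-gate-delay resolution is obtained from two ordinary buffer chains.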
@mastersthesis{diva2:718413,
author = {Wali, Naveen and Radhakrishnan, Balamurali},
title = {{Design of a Time-to-Digital Converter for an All-Digital Phase Locked Loop for the 2-GHz Band}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4684--SE}},
year = {2013},
address = {Sweden},
}
A low-noise, variable-gain amplifier chain was constructed for interfacing a sensor to an ADC. During the course of the work, two different methods for dealing with 1/f noise were investigated: switched-capacitor circuits and chopping circuits. The resulting circuit did not quite meet the performance required by the specification; some possible improvements are suggested.
@mastersthesis{diva2:714179,
author = {Tallhage, Jonas},
title = {{Construction of a Low-Noise Amplifier Chain With Programmable Gain and Offset}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4700--SE}},
year = {2013},
address = {Sweden},
}
This thesis work was carried out at Calmon Stegmotorteknik AB (CST) to develop the FPGA part of their development platform. This report presents the ideas and theory behind the development methodology of this work. CST's development platform is intended for video processing, motor control, and measurement and instrumentation applications. This work, however, covers only the functions needed to use the development board for motor control, which includes implementations of PWM, microstepping control, and motor control using full steps and half steps.
@mastersthesis{diva2:681435,
author = {Håkansson, Svante},
title = {{Utvecklingsmetodik för styrning av stegmotorer med en FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--13/0413--SE}},
year = {2013},
address = {Sweden},
}
In this thesis, a system has been developed that samples and digitizes a high-frequency analog signal and stores the sampled values in a memory. The system generates an analog output signal either by directly converting the sampled values or by using samples stored in the memory as the source for the conversion. The system can thus be used both as a passive link and as a signal source.
A trigger function has been implemented to efficiently fill the memory with the parts of a signal that are of interest to the user. The work also investigates whether an FPGA board of the type Stratix II DSP Development Kit is a suitable development board for producing a prototype of the system. The board has been examined with respect to various limitations of the developed system, for example the frequencies at which an input signal can be sampled.
Another use of the system is the possibility of having all samples stored on the board presented in a text file on a connected PC. This makes it possible to analyze or modify the stored signal and then copy the file contents back to the FPGA board. A modified or custom signal can thereby be used as the source for the output signal, completely replacing the system's input signal.
@mastersthesis{diva2:681167,
author = {Kihlgren, Alexander},
title = {{System för avlyssning, modifiering och överföring av analoga signaler}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--13/0411--SE}},
year = {2013},
address = {Sweden},
}
Todays modern warfare puts high demands on military equipment. Where soldiers are concerned, types of communication equipment such as radios, displays and headsets play a central role. A modern soldier is often required to maintain communication links with other military units. These units can, for example, consist of platoon commanders, headquarters and other soldiers. If the soldier needs to make a report to several units, the message needs to be sent to several radio networks that are connected to these separate units. This multiplicity in turn requires several items of radio equipment connected to the radio network frequencies. Considering all the communication equipment that is used by a modern soldier, the parallel data flow and all the weight a soldier needs to carry, can get quite extensive.
At Saab AB it has been proven that a combination of powerful embedded hardware platforms and cross-platform software fulfills the communication needs. However, the weight issue still remains, as these embedded platforms are quite bulky and hard to carry. In order to increase portability, a tailored Android application for a smaller low-power embedded hardware platform has been developed at Saab AB. Saab AB has also developed a portable analogue interconnection unit for connecting three radios and a headset, the SKE (Sammankopplingsenhet).
Saab AB intends to develop a new product for soldiers, the RPCS (Rugged Portable Communication System), capable of running the Android application and combining it with the audio processing functionality of the SKE. This thesis focuses on developing a hardware platform prototype for the RPCS using Beagleboard. The SKE audio processing functionality is developed as a software application running on the Beagleboard.
@mastersthesis{diva2:668208,
author = {Kamula, Juha and Hansson, Rikard},
title = {{Rugged Portable Communication System}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4729--SE}},
year = {2013},
address = {Sweden},
}
To make a camera more user friendly, or to let it operate without a user, the camera objective needs to be able to put the camera lens in focus. This functionality requires a motor of some sort; due to its many benefits, the ultrasonic motor is a preferred choice. The motor requires a driving circuit to produce the appropriate signals, and that circuit is the subject of this thesis. The main difficulty that needs to be considered is the fact that the ultrasonic motor is highly non-linear. This report gives a brief walkthrough of how the ultrasonic motor works, its pros and cons, and how to control it, as well as how the driving circuit is designed and what role the various components fill. The regulator is implemented in C code and runs on a microprocessor, while the actual signal generation is done on a CPLD. The report ends with a few suggestions for how to improve the system, should the presented solution not perform at a satisfactory level.
@mastersthesis{diva2:664796,
author = {Ocklind, Henrik},
title = {{Driver Circuit for an Ultrasonic Motor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4659--SE}},
year = {2013},
address = {Sweden},
}
The ePUMA architecture is a novel master-multi-SIMD DSP platform aimed at low-power computing, such as in embedded or hand-held devices. It is both a configurable and scalable platform, designed for multimedia and communications.
Numbers with both integer and fractional parts are often used in computers because many important algorithms, such as those in signal and image processing, make use of them. A good way of representing these numbers is a floating-point representation. The ePUMA platform currently supports a fixed-point representation, so the goal of this thesis is to implement twelve basic floating-point arithmetic operations and two conversion operations on an already existing datapath, conforming as closely as possible to the IEEE 754-2008 standard for floating-point representation. The implementation should come at a low cost in hardware and power consumption, with a target frequency of 500 MHz. The implementation is compared with dedicated DesignWare components and with floating-point done in software on ePUMA.
This thesis presents a solution that increases the VPE datapath hardware cost and power consumption by 15% on average. The highest clock frequency achieved with the solution is 473 MHz. The target clock frequency of 500 MHz is thus not reached, but considering the lack of register retiming in the synthesis step, 500 MHz can most likely be reached with this design.
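As a sketch of the kind of conversion operation mentioned above, the following function (illustrative, not the ePUMA datapath) converts a signed Q1.15 fixed-point value to IEEE 754 binary32 bits by normalizing the magnitude; a fully IEEE 754-2008 compliant unit would also implement rounding, which this truncating sketch omits:

```python
import struct

def q15_to_float_bits(x):
    """Convert a signed Q1.15 integer to IEEE 754 binary32 bits.

    Normalizes the magnitude to find the exponent, the way a hardware
    int-to-float unit would: locate the leading one, derive the
    exponent from its position, left-justify the fraction bits.
    """
    if x == 0:
        return 0
    sign = 1 if x < 0 else 0
    mag = -x if x < 0 else x
    msb = mag.bit_length() - 1            # position of the leading one
    exp = msb - 15                        # value = mag * 2**-15
    # Left-justify into 23 fraction bits, dropping the implicit one.
    frac = (mag << (23 - msb)) & ((1 << 23) - 1)
    return (sign << 31) | ((exp + 127) << 23) | frac
```

For example, the Q1.15 code 16384 (0.5) packs to 0x3F000000, which `struct.unpack` confirms decodes back to 0.5.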
@mastersthesis{diva2:666579,
author = {Kolumban, Gaspar},
title = {{Low Cost Floating-Point Extensions to a Fixed-Point SIMD Datapath}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4733--SE}},
year = {2013},
address = {Sweden},
}
Connected Me is a Human Body Communication (HBC) system used for transferring data through the human body. The working principle is based on a theory called Body Coupled Communication (BCC), which uses electrostatic coupling to transfer data between a device and the human body. The capacitance between body and electrode acts as the electrical interface between devices. BCC has become a prominent research area in the field of Personal Area Networks (PAN) since it was introduced by Zimmerman in 1995. Until now there has been a significant number of papers published on human body models and Analog Front Ends (AFE), but only a few reports are available on digital baseband processing.
The proposed Human Body Communication (HBC) system consists of a digital baseband and an AFE. The digital baseband is used for transferring data packets, and the AFE is designed to reconstruct the signal shape after the degradation caused by the human body. This thesis implements a high-speed serial digital communication system for a human body channel. Available modulation schemes and characteristics of the Physical layer (PHY) with respect to the human body channel are analyzed before implementing the system. The outcome of this thesis is an FPGA demonstrator that shows the possibility of communication through the human body.
@mastersthesis{diva2:660503,
author = {Vajravelu, Dilip Kumar},
title = {{Connected Me - Proof of Concept}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4504--SE}},
year = {2013},
address = {Sweden},
}
To efficiently capture signal events when performing analog measurements, a competent toolbox is required. In this master thesis, a system for frequency domain triggering is designed and implemented. The implemented system provides advanced frequency domain trigger conditions, in order to ease the capture of a desired signal event. A real-time 1024-point pipelined feedforward FFT core is implemented to transform the signal from the time domain to the frequency domain. The system is designed and synthesized for a Virtex-6 FPGA (XC6VLX240T) and is integrated into SP Devices’ digitizer ADQ1600. The implemented system is able to handle a continuous stream of 1.6 GS/s at 16 bits. A small software API is developed that provides runtime configuration of the triggering conditions.
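The core idea, transforming a block to the frequency domain and firing on a spectral condition, can be sketched in pure Python with a recursive software FFT and a single-bin threshold (the real system uses a 1024-point pipelined hardware FFT and richer trigger conditions):

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle * odd
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def frequency_trigger(samples, bin_index, threshold):
    """Fire when the magnitude of one frequency bin exceeds a threshold,
    the simplest possible frequency-domain trigger condition."""
    return abs(fft(samples)[bin_index]) > threshold
```

A pure tone at bin k of an N-point block produces a magnitude of N/2 in that bin, so a threshold just below N/2 triggers only when that frequency is present.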
@mastersthesis{diva2:656743,
author = {Eriksson, Mattias},
title = {{Design and Implementation of a Real-Time FFT-core for Frequency Domain Triggering}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4716--SE}},
year = {2013},
address = {Sweden},
}
SAR-ADCs are very popular and suitable for conversions up to a few tens of MHz with 8 to 12 bits of resolution. A very popular type is the Charge Redistribution SAR-ADC, which is based on a capacitive array. Higher speeds can be achieved by using the interleaving technique, where a number of SAR-ADCs work in parallel. These speeds, however, can only be achieved if the reference voltage can cope with the switching of the capacitive array.
In this thesis the design of a programmable voltage reference generator for a Charge Redistribution SAR-ADC was studied. A number of architectures were studied, and one based on a Current Steering DAC was chosen because of the settling time it could offer to the Charge Redistribution SAR-ADC switching operation. This architecture was further investigated in order to spot the weak points of the design and try to minimize the settling time.
In the end, the final design was evaluated and possible trimming techniques were proposed that could further speed up the design.
@mastersthesis{diva2:654979,
author = {Mylonas, Georgios},
title = {{Programmable voltage reference generator for a SAR-ADC}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4717--SE}},
year = {2013},
address = {Sweden},
}
This thesis focused on testing whether persistent encryption of Fibre Channel is doable and what kind of security it provides. It has been shown that intercepting, analysing and modifying Fibre Channel traffic is possible without any noticeable performance loss as long as latency is kept within certain boundaries. If latency falls outside those boundaries, severe performance loss is to be expected. This latency demand puts further restrictions on the cryptography to be used.
Two platforms were simulated, implemented and explained: one for intercepting and modifying Fibre Channel traffic, and one for analysing Fibre Channel traffic using Linux and Wireshark.
@mastersthesis{diva2:645316,
author = {Svensson, Christian},
title = {{High-Speed Storage Encryption over Fibre Channel}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4713--SE}},
year = {2013},
address = {Sweden},
}
The aim of this master’s thesis is to implement an ADC (Analog-to-Digital Converter) for audio applications using external components together with an FPGA (Field-Programmable Gate Array). The focus is on making the ADC low-cost, and it is desirable to achieve 16-bit resolution at 48 kS/s. Since large FPGAs have numerous I/O pins, there are usually some unused pins and logic available in the FPGA that can be used for other purposes. This is taken advantage of to make the ADC as low-cost as possible. This thesis presents two solutions: (1) a sigma-delta converter with a first-order passive loop filter and (2) a sigma-delta converter with a second-order active loop filter. The solutions have been designed on a PCB (Printed Circuit Board) with a Xilinx Spartan-6 FPGA. Both solutions take advantage of the LVDS (Low-Voltage Differential Signaling) input buffers in the FPGA. Solution (1) achieves a peak SNDR (signal-to-noise-and-distortion ratio) of 62.3 dB (ENOB (effective number of bits) of 10.06 bits) and solution (2) achieves a peak SNDR of 80.3 dB (ENOB 13.04 bits). Solution (1) is very low-cost ($0.06) but is not suitable for high-precision audio applications. Solution (2) costs $0.53 for mono audio and $0.71 for stereo audio and is comparable with the solution used today: an external ADC (PCM1807).
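The first-order sigma-delta principle behind solution (1) can be modelled behaviourally: an integrator accumulates the error between the input and the fed-back 1-bit output (an idealized discrete-time model; the actual loop filter is the passive RC network on the PCB):

```python
def sigma_delta_1st(samples):
    """First-order sigma-delta modulator with a 1-bit quantizer.

    The integrator accumulates the difference between the input and
    the fed-back quantizer output; the density of +1 bits in the
    output stream tracks the input level.  Inputs in [-1, 1].
    """
    integ, fb, bits = 0.0, 0.0, []
    for x in samples:
        integ += x - fb                   # loop-filter integration
        fb = 1.0 if integ >= 0 else -1.0  # 1-bit quantizer / DAC feedback
        bits.append(fb)
    return bits
```

Averaging (decimating) the bitstream recovers the input level, which is what the FPGA-side digital filter does after the LVDS input buffer has quantized the analog comparison.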
@mastersthesis{diva2:650302,
author = {Hellman, Johan},
title = {{Implementation of a Low-Cost Analog-to-Digital Converter for Audio Applications Using an FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4711--SE}},
year = {2013},
address = {Sweden},
}
In this master’s thesis a model of a digitally compensated N-bit C-xC SAR ADC was developed. The architecture uses charge redistribution in a C-xC capacitor network to perform the conversion. The focus of the thesis was on understanding how the charge is redistributed in the network during the conversion and calibration phases. Redundancy and parasitic capacitors are present in the system and raise the need for extra conversion steps as well as a calibration algorithm. The calibration algorithm, Bit Weight Estimation, calculates a weight corresponding to each bit, which is used in the last conversion step to perform a digital weighting. The result of extensive calculations on different C-xC capacitor networks was a Python model of an N-bit C-xC SAR ADC. That model was used to create a model of an eight-bit C-xC SAR ADC and to find suitable parameters for it through calculations and simulations; the parameters giving the best INL were chosen. With the best parameters, the static and dynamic performance of the C-xC SAR ADC was tested, showing an INL of less than 1 LSB, an SNR of 47.8 dB and an ENOB of 7.6 bits.
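The digital-weighting step can be illustrated with a generic signed SAR model, matching the thesis' choice of Python as modelling language (binary weights are used here for simplicity; the C-xC network's actual, possibly non-binary weights would come from the Bit Weight Estimation calibration):

```python
def sar_convert(vin, weights, vref=1.0):
    """Signed successive-approximation conversion with arbitrary bit
    weights (fractions of Vref): each cycle adds or subtracts one
    weight depending on the residue sign, as the capacitor network
    does during charge redistribution.
    """
    residue, bits = vin, []
    for w in weights:
        bit = 1 if residue >= 0 else 0
        bits.append(bit)
        residue += -w * vref if bit else w * vref
    return bits

def digital_weighting(bits, est_weights, vref=1.0):
    """Rebuild the value from the raw bits using estimated bit weights,
    the final digital-weighting step of the calibration."""
    return sum((1 if b else -1) * w * vref for b, w in zip(bits, est_weights))
```

With accurate weight estimates the reconstruction error is bounded by the smallest weight, which is why estimating the true (parasitic-affected) weights restores linearity.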
@mastersthesis{diva2:647634,
author = {Hallström, Claes},
title = {{Design and Implementation of a Digitally Compensated N-Bit C-xC SAR ADC Model:
Optimization of an Eight-Bit C-xC SAR ADC}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4679--SE}},
year = {2013},
address = {Sweden},
}
When implementing real-time applications, the hardware must behave in a deterministic way. A real-time application places strict demands on execution times and on how the application is scheduled, so it is of utmost importance to verify that these demands are met. In this thesis, three systems for real-time applications have been compared, with an analysis primarily of their computational capabilities and of how deterministically they behave with respect to execution times. Other aspects such as software development environments, accessories and power consumption have also been compared.
@mastersthesis{diva2:646439,
author = {Engström, Hampus and Ring, Christoffer},
title = {{Jämförelse av off-the-shelf-hårdvara för realtidsapplikationer}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--13/0409--SE}},
year = {2013},
address = {Sweden},
}
This report describes an analysis of WebGL together with JavaScript with the aim to examine its limitations, strengths and weaknesses. This analysis was performed by building a 2D game engine containing some dynamic elements such as water, smoke, fire, light, and more. Different algorithms have been tested and analyzed to provide a clearer picture of how these work together. The report will go through the most basic functions of the game engine and describe briefly how these work.
The result shows that JavaScript with WebGL can be considered a potent toolset, despite the difficulties caused by JavaScript.
In summary, similar projects can be recommended, as JavaScript and WebGL proved both fun and incredibly rewarding to work with.
@mastersthesis{diva2:635845,
author = {Wahlin, Yngve and Feldt, Hannes},
title = {{Implementation \& utvärdering av spelmotor i WebGL}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--13/0406--SE}},
year = {2013},
address = {Sweden},
}
The aim of this thesis is twofold: to investigate and present information on how the EtherCAT fieldbus protocol performs theoretically in a smaller network, and to present an implementation of the protocol on an FPGA-based device, used as a base to test and confirm that the theoretical numbers are correct in practice.
The focus is put toward a small network of up to 16 nodes which continuously produce data that must be moved to a single master node. Focus is not solely put on the network transactions but also includes the transactions performed on the producing devices to make the data available to the EtherCAT network. These devices use a licensed IP core which provides the media access.
Through calculations based on available information on how the involved parts work, the theoretical study shows that with each node producing 32 bytes worth of data, the achievable delay when starting the transaction from the master until all data is received back is below 80 μs. The throughput of useful data is up toward 90% of the 100 Mbit/s line in many of the considered cases. The network delay added in nodes is in the order of 1.5 μs. In terms of intra-node delay, it is shown that the available interfaces, which move data into the EtherCAT part of the device, are capable of handling the necessary speeds to not reduce performance overall.
An implementation of a device is presented; it is written in VHDL and implemented on a Xilinx FPGA. It is verified through simulation to perform within the expected bounds calculated in the theoretical study. An analysis of the resource usage is also presented.
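The theoretical numbers above can be approximated with a simple timing model (the 60-byte per-frame overhead and the use of one 1.5 µs forwarding delay per node are assumptions of this sketch, not figures taken from the thesis' derivation):

```python
def ethercat_round_trip_us(nodes=16, payload_bytes=32,
                           line_rate_bps=100e6, node_delay_us=1.5,
                           overhead_bytes=60):
    """Rough round-trip time for one EtherCAT frame collecting
    payload_bytes from each of `nodes` slaves: wire time for the whole
    frame plus the processing delay added in each node on the ring.
    """
    frame_bits = (nodes * payload_bytes + overhead_bytes) * 8
    wire_us = frame_bits / line_rate_bps * 1e6  # serialization time
    return wire_us + nodes * node_delay_us      # plus per-node delays
```

With the defaults (16 nodes, 32 bytes each, 100 Mbit/s) this lands below the 80 µs bound stated above, consistent with the theoretical study.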
@mastersthesis{diva2:632036,
author = {Svartengren, Joakim},
title = {{EtherCAT Communication on FPGA Based Sensor System}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4690--SE}},
year = {2013},
address = {Sweden},
}
FPGA boards are a good tool for companies that want to quickly produce a prototype for new projects, since they are reprogrammable so that the same hardware can be used to prototype a multitude of different systems. A common language for programming FPGAs is VHDL, a hardware description language. As a complement to VHDL, it is very useful to be able to run a more general programming language such as C. This can be achieved by using a NIOS2 core in the FPGA and transferring compiled C code to it from a PC.
This report describes how different solutions for using external interfaces with a NIOS2 core can be implemented on an Altera DE2 FPGA board, that is, how the hardware programmed in VHDL can be used from the software programs written in C. The focus is on comparing different solutions for displaying text on an external screen via the VGA interface. One solution is created in SOPC Builder, where all components are written in VHDL, and one solution is created in QSYS using Altera University Program's ready-made IP blocks. A PS/2 solution for the NIOS2 core is also explained.
@mastersthesis{diva2:632010,
author = {Hansson, Felix},
title = {{Jämförelse av VGA-lösningar till NIOS2-system i SOPC Builder och QSYS med Altera University Program IP-Cores}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--13/0408--SE}},
year = {2013},
address = {Sweden},
}
This thesis work was carried out at Saab Dynamics AB (SBD) in Karlskoga with the aim of studying old designs of set-back generators. A set-back generator (SBG) is intended to deliver instantaneous energy when a projectile is fired, by means of a magnet moving through a coil. SBGs have existed for a long time but have been judged to deliver too little energy to power electronics. New types of magnets, however, have the potential to increase the energy yield substantially.
SBD has a number of older SBGs, and the work has been to investigate, based on these, whether more energy can be extracted by replacing the magnet. Other parameters important to an SBG's function have also been studied and tested in different configurations in the hope of further improvements. Expected results have then been analyzed and compared with measurements. On this basis, recommendations for a new SBG design have been delivered.
@mastersthesis{diva2:629333,
author = {Eriksson, Johan and Nilsson, Oscar},
title = {{Batterilös strömförsörjning av strömsnål granatelektronik}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--13/0405--SE}},
year = {2013},
address = {Sweden},
}
A Control Data Flow Graph (CDFG) is a Directed Acyclic Graph (DAG) in which a node can be either an operation node or a control node. The target of this kind of graph is to capture all the control and data flow information of the original hardware description while preserving the various dependencies.
This kind of graph is generated by Novel Generator of Accelerators and Processors (NoGap), a design automation tool for Application Specific Instruction-set Processor (ASIP) and accelerator design developed by Per Karlström from the Department of Electrical Engineering of Linköping University.
The aim of this project is to validate the graph, check if it fulfills the requirements of its definition. If it does not, it is considered an error and the running process will be aborted. Moreover, useful information will be extracted from the graph for futute work.
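As an illustration of the kind of validity check the abstract describes, here is a minimal Python sketch. It assumes a simple adjacency-list representation (node kinds and edge pairs), which is an assumption for illustration only, not NoGap's actual data structures:

```python
from collections import deque

def validate_cdfg(nodes, edges):
    """Check that a CDFG is a valid DAG whose nodes are either
    operation or control nodes. `nodes` maps node id -> kind,
    `edges` is a list of (src, dst) pairs. Returns True if valid,
    raises ValueError otherwise (the "abort" case in the abstract)."""
    for nid, kind in nodes.items():
        if kind not in ("operation", "control"):
            raise ValueError(f"node {nid} has unknown kind {kind!r}")
    # Kahn's algorithm: if a topological order covers every node,
    # the graph is acyclic.
    indeg = {nid: 0 for nid in nodes}
    succ = {nid: [] for nid in nodes}
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if visited != len(nodes):
        raise ValueError("CDFG contains a cycle")
    return True
```

A graph with a cycle fails the topological sort, which is exactly the "does not fulfill its definition" condition.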
@mastersthesis{diva2:627943,
author = {Sánchez Yagüe, Mónica},
title = {{Information extraction and validation of CDFG in NoGap}},
school = {Linköping University},
type = {{LiTH-ISY/ERASMUS-A--13/003--SE}},
year = {2013},
address = {Sweden},
}
A DMA controller can offload a processor tremendously. A memory copy operation can be initiated by the processor, and while the processor executes other tasks the memory copy is carried out by the DMA controller.
An implementation of a DMA controller for use in LEON3 SoCs has been made during this master thesis. Problems that occurred while designing a controller of this type concerned AMBA buses, data transfers, alignment and interrupt handling.
The DMA Controller supports AMBA and is attached to an AHB master and APB slave. The DMA Controller supports burst transfers to maximize data bandwidth. The source and destination address can be arbitrarily aligned. It supports multiple channels and it has interrupt generation on transfer completion along with interrupt masking.
The implemented functionality works as intended.
@mastersthesis{diva2:626214,
author = {Nilsson, Emelie},
title = {{DMA Controller for LEON3 SoC:s Using AMBA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4663--SE}},
year = {2013},
address = {Sweden},
}
The main concept of Body-Coupled Communication (BCC) is to transmit electrical information through the human body, used as a communication medium, by means of capacitive coupling. Since the technique was first introduced by Zimmerman in 1995, research on wireless body area networks has kept expanding with new ideas and topologies aiming at lower power and area, and better security, reliability and sensitivity. In contrast to existing wireless technologies such as WiFi, Bluetooth and ZigBee, BCC opens up new applications and avoids the frequency-allocation problems of cell-based communication systems. In addition, this promising technology has been standardized by the IEEE 802.15.6 task group, which addresses reliable and feasible systems for low-power in-body and on-body nodes serving a variety of medical and non-medical applications.
The entire BAN project is divided into three major parts: the application layer, the digital baseband and the analog front end (AFE) transceiver. In this thesis work a strong driver circuit for BCC is implemented as an analog front-end transmitter (Tx). The primary purpose is to transmit a strong signal, since the signal is attenuated by around 60 dB in the body. The driver circuit is a cascade of two single-stage inverters and an identical inverter with a drain resistor. The entire driver circuit is designed in ST 65 nm CMOS technology with a 1.2 V supply, operates at 10 MHz, and has a driving capability of 6 mA, which is the basic requirement. The performance of the transmitter is compared with other architectures through corner analysis, noise analysis and eye diagrams. The cycle-to-cycle jitter is 0.87%, well below the maximum allowed, and the power supply rejection ratio (PSRR) of 65 dB indicates good rejection of supply noise. In addition, the transmitter does not require a filter to suppress noise, because the body acts like a low-pass filter.
In conclusion, the findings of the thesis work compare favorably with previous work. Finally, there remain some points to improve in the driver circuit with respect to power consumption, propagation delay and leakage power.
@mastersthesis{diva2:624756,
author = {Korishe, Abdulah},
title = {{A Driver Circuit for Body-Coupled Communication}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4635--SE}},
year = {2013},
address = {Sweden},
}
An effective usage of the chip area plays an essential role in System-on-Chip (SoC) designs. Nowadays on-chip memories take up more than 50% of the total die area and are responsible for more than 40% of the total energy consumption. Cache memory alone occupies 30% of the on-chip area in the latest microprocessors.
This thesis project, “System Level Exploration of RRAM for SRAM Replacement”, describes a Resistive Random Access Memory (RRAM) based memory organization for Coarse Grained Reconfigurable Array (CGRA) processors. Compared to a conventional Static Random Access Memory (SRAM) based memory organization, the RRAM-based organization is more beneficial in terms of energy and area requirements.
Due to the ever-growing problems faced by conventional memories with Dynamic Voltage Scaling (DVS), emerging memory technologies have gained importance. RRAM is typically seen as a possible candidate to replace non-volatile memory (NVM) as Flash approaches its scaling limits. Replacing SRAM in the lowest layers of the memory hierarchies of embedded systems with RRAM is a very attractive research topic; RRAM technology offers reduced energy and area requirements, but it has limitations with regard to endurance and write latency.
Because of the technological limitations and restrictions in solving RRAM write-related issues, it becomes beneficial to explore memory access schemes that tolerate the longer write times. Since the RRAM write time cannot realistically be reduced, we have to derive instruction memory and data memory access schemes that tolerate it. We present an instruction memory access scheme that copes with these problems.
In addition to the modified instruction memory architecture, we investigate the effect of the longer write times on the data memory. Experimental results show that the proposed architectural modifications can reduce read energy consumption significantly without any performance penalty.
@mastersthesis{diva2:623005,
author = {Dogan, Rabia},
title = {{System Level Exploration of RRAM for SRAM Replacement}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4628--SE}},
year = {2013},
address = {Sweden},
}
The revolutionary progress made recently in the micro-electro-mechanical systems (MEMS) field and in complementary metal-oxide-semiconductor (CMOS) integrated circuits has made it possible to produce low-cost, low-power and small-size processing circuits. Wireless communication allows these circuits to send their data over a network. Such a wireless sensor network is known as "Smart Dust".
Each wireless sensor node in the network is called a "mote". It consists of several components: sensors, micro-processors, radio transceivers and a power management unit. The power management unit can be divided into several parts, including battery, power control and regulator. The purpose of the regulator is to supply a constant, reliable voltage to the other parts of the mote, as most of the devices have voltage limits that need to be respected to guarantee a robust, long-life mote.
This thesis investigates the design of a low-power regulator. The goal is to design a regulator that can handle the high voltage acquired from an energy harvesting unit using only 65 nm core transistors. This allows an easier production process that results in a low-cost, fully integrated chip. The regulator architecture used is a simple linear regulator.
The report highlights the theoretical background and the challenges of the analog design, and presents the results of the simulations that were run using the Cadence design system software at the schematic level.
@mastersthesis{diva2:620715,
author = {Lababidi, Mohamed},
title = {{Designing a Low Power Regulator for Smart Dust}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4643--SE}},
year = {2013},
address = {Sweden},
}
Drought is the most severe disaster in human civilization, and its impacts are serious: it can cause hunger, thirst, food shortages and loss of livestock, which directly affect human life. The main objective of this project is to develop an early warning system (EWS) [3] for drought indices using wireless sensor networks (WSNs), which are the only way forward for on-site monitoring and validation of locally defined drought indices [3].
The designed wireless sensor network (WSN) consists of a sensor unit, a master unit and a sensor power management unit (PMU). The sensor unit measures the moisture of the soil and transmits the measured data through a ZigBee module to the master unit. A real-time clock (RTC) in the sensor unit records the second, minute, hour, day, month and year at which each measurement was taken. The master unit contains an SD card and a Bluetooth module. The SD card stores the measured data from the sensor units, and the readings can be retrieved from the master unit by accessing the SD card via Bluetooth from a PC or a smartphone.
To manage the power in the sensor unit and keep the sensor alive for several years, the power management unit (PMU) manages the power level between two energy storage buffers (a supercapacitor and a Li-ion battery) for each sensor node.
@mastersthesis{diva2:613976,
author = {Ahmed, Zubair},
title = {{Design of Autonomous Low Power Sensor for Soil Moisture Measurement.}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4639--SE}},
year = {2013},
address = {Sweden},
}
Body-channel communication (BCC) is based on the principle of electric-field data transmission through the human body by capacitive coupling. It is gaining importance in human-centric communication because it offers a truly natural means of interaction with the human body. Traditionally, near field communication (NFC) is regarded as magnetic-field coupling based on radio frequency identification (RFID) technology; the RFID technology limits the definition of NFC and thus reduces the scope of possible applications. In recent years BCC, after its origin in 1995, has regained importance with valuable applications in biomedical systems. Primarily, the KAIST and Philips research groups have demonstrated BCC in the context of remote patient health monitoring.
A BCC transceiver mainly consists of two parts: a digital baseband and an analog front end (AFE). In this thesis, an analog front-end receiver supporting the overall BCC system is presented. The receiver (Rx) architecture consists of a cascaded preamplifier and a Schmitt trigger. Signals coming from the human body are attenuated by around 60 dB, resulting in weak signals in the mV range. A high-gain preamplifier stage is needed to amplify these weak signals into strong ones, and the single preamplifier stage must be cascaded to meet the gain requirement. The single-stage preamplifier, designed in ST 65 nm technology, has an open-loop gain of 24.01 dB and a closed-loop gain of 19.43 dB. A flipped voltage follower (FVF) topology is used for the preamplifier to support the low supply voltage of 1 V, since the topology offers low voltage, low noise and low power consumption. The input-referred noise is 8.69 nV/sqrt(Hz) and the SNR at the input is 73.26 dB.
The Schmitt trigger (a comparator with hysteresis) is a bistable positive-feedback circuit. It is built around a two-stage OTA with lead frequency compensation. The DC gain of this OTA is 26.94 dB with a 1 V supply voltage. Corner analyses and an eye diagram are also included in this thesis work as a performance matrix for the overall receiver.
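The hysteresis behavior the abstract describes can be sketched behaviorally: the output switches high only above an upper threshold and low only below a lower one, so noise between the two thresholds cannot cause spurious toggles. The threshold values below are illustrative assumptions, not the thesis circuit's values:

```python
def schmitt_trigger(samples, v_low=0.4, v_high=0.6):
    """Behavioral model of a comparator with hysteresis.
    `samples` is a list of input voltages; output is a list of 0/1
    logic levels. Thresholds are illustrative placeholders."""
    out, state = [], 0
    for v in samples:
        if state == 0 and v > v_high:
            state = 1            # rising input crossed upper threshold
        elif state == 1 and v < v_low:
            state = 0            # falling input crossed lower threshold
        out.append(state)
    return out
```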
@mastersthesis{diva2:610256,
author = {Maruf, Md Hasan},
title = {{An Input Amplifier for Body-Channel Communication}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4634--SE}},
year = {2013},
address = {Sweden},
}
In this work the use of a WLC (wafer-level camera) for ensuring road safety is presented. A prototype of a WLC along with the Aptina MT9M114 stereo board has been used for this project. The basic idea is to observe the movements of the driver; by doing so, an understanding of whether the driver is concentrating on the road can be achieved.
The required scene is captured with a wafer-level camera pair. Using the image pairs, stereo processing is performed to obtain the real depth of the objects in the scene. Image recognition is used to separate the object from the background, which ultimately allows concentrating on just the object, which in the present context is the driver.
@mastersthesis{diva2:610016,
author = {Pakalapati, Himani Raj},
title = {{Programming of Microcontroller and/or FPGA for Wafer-Level Applications - Display Control, Simple Stereo Processing, Simple Image Recognition}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4656--SE}},
year = {2013},
address = {Sweden},
}
This Bachelor thesis examines the possibility of replacing an outdated analog video recording system with a digital counterpart. It is key that the video and audio signals remain synchronized, generator-locked and time-stamped. Up to nine different video sources and a number of audio sources are to be recorded and treated in a manner that enables synchronized playback. The video sources do not always follow a universal standard, and differ in format as well as resolution. This thesis compares a number of state-of-the-art commercial off-the-shelf solutions with proprietary hardware. Great emphasis is placed on giving a functional view of the system features and on evaluating different compression methods. The report also discusses different transmission, storage and playback options. It culminates in a series of proposed solutions to sub-problems, which are solved and treated separately, leading to a final proposal from the author that weighs how well the system meets the pre-set requirements against price.
@mastersthesis{diva2:609840,
author = {Eliasson, Viktor},
title = {{Digital videoregistrering}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--12/0402}},
year = {2013},
address = {Sweden},
}
The aim of this thesis is to implement flexible interpolators and decimators on a Field Programmable Gate Array (FPGA). Interpolators and decimators of different wordlengths (WL) are implemented in VHDL. The Farrow structure is used for the realization of the polyphase components of the interpolation/decimation filters. A fixed set of subfilters and adjustable fractional-delay multiplier values in the Farrow structure give different linear-phase finite-length impulse response (FIR) lowpass filters. An FIR filter is designed in such a way that it can be implemented for different wordlengths (8-bit, 12-bit, 16-bit). Fixed-point representation is used for the fractional-delay multiplier values in the Farrow structure; to perform the fixed-point operations in VHDL, a fixed-point package [1] is used.
8-bit, 12-bit and 16-bit interpolators are implemented and their performance is verified. The designs are compiled in the Quartus II CAD tool for timing analysis and logic register usage, and are synthesized for the Cyclone IV GX family and the EP4X30CF23C6 device. The wordlength issues arising while implementing the interpolators and decimators are discussed; truncation of bits is required in order to reduce the output wordlength of the interpolator and decimator.
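The Farrow structure evaluation the abstract refers to can be sketched as follows: each subfilter output becomes one coefficient of a polynomial in the fractional delay mu, combined with Horner's rule. The two-tap subfilter coefficients in the test below realize plain linear interpolation and are illustrative placeholders, not a designed lowpass filter:

```python
def farrow_filter(x, subfilters, mu):
    """Evaluate a Farrow-structure fractional-delay filter.
    `subfilters` is a list of FIR coefficient lists C0..CM (equal
    length); for each output sample the subfilter outputs v0..vM are
    combined as a polynomial in the fractional delay mu:
        y = (((vM*mu + v_{M-1})*mu + ...)*mu + v0)
    The fixed subfilters stay constant while mu is adjustable, which
    is what makes the structure "flexible"."""
    taps = len(subfilters[0])
    y = []
    for n in range(taps - 1, len(x)):
        window = x[n - taps + 1:n + 1][::-1]   # newest sample first
        v = [sum(c * s for c, s in zip(cs, window)) for cs in subfilters]
        acc = 0.0
        for vk in reversed(v):                 # Horner's rule in mu
            acc = acc * mu + vk
        y.append(acc)
    return y
```

With C0 = [0, 1] and C1 = [1, -1] the output is x[n-1] + mu*(x[n] - x[n-1]), i.e., linear interpolation between adjacent samples.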
@mastersthesis{diva2:609369,
author = {VenkataVikram, Dabbugottu},
title = {{FPGA Implementation of Flexible Interpolators and Decimators}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4654--SE}},
year = {2013},
address = {Sweden},
}
This thesis project was carried out at the Institute of Technology at Linköping University, in cooperation with Acreo Swedish ICT AB. Acreo is a research institute working on, among other things, optics, electronics and information technology. The project consisted of designing and building electronics able to read a passive moisture sensor label made with printed electronics. These labels are meant to be placed inside walls on construction sites, making it easy to check the moisture level and thereby prevent costly moisture damage. The work has comprised component tests, circuit schematic design, PCB design and microcontroller programming, all of which has finally led to a simple prototype that can read the moisture sensor labels.
@mastersthesis{diva2:607648,
author = {Landelius, Jacob and Nyberg, Andreas},
title = {{Konstruktion av utläsningselektronik och mjukvara för trådlös avläsning av tryckta fuktsensoretiketter}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--13/0403--SE}},
year = {2013},
address = {Sweden},
}
Implementation of the polyphase-decomposed FIR filter structure involves two steps: the generation of the partial products and the efficient reduction of the generated partial products. The partial products are generated by constant multiplication of the filter coefficients with the input data, and the reduction of the partial products is done by building a pipelined adder tree using full adders (FAs) and half adders (HAs). To improve the speed and reduce the complexity of the reduction tree, a 4:2 counter is introduced into it. The reduction tree is designed using a bit-level optimized ILP problem whose objective function minimizes the overall cost of the hardware used. For this purpose the layout for a 4:2 counter has been developed, and the cost function has been derived by comparing the complexity of the design against a standard FA design.
The layout of the 4:2 counter is implemented in a 65 nm process using the static CMOS logic style and the DPL style. The average power consumption drawn from a 1 V power supply was found to be 16.8 μW for the static CMOS design and 12.51 μW for the DPL style. The worst-case rise or fall time was 350 ps for the DPL logic and 260 ps for the static CMOS design.
The use of the 4:2 counter in the reduction tree introduced errors into the filter response, but it helped to reduce the number of pipeline stages and to improve the speed of the partial product reduction.
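For reference, the bit-level behavior of an exact 4:2 counter (compressor) can be sketched from the standard structure of two cascaded full adders, which is the FA-based baseline the custom cell is compared against:

```python
def full_adder(a, b, c):
    """Standard full adder on bits: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)   # majority function
    return s, carry

def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 counter built from two cascaded full adders.
    Invariant: x1 + x2 + x3 + x4 + cin == sum + 2*(carry + cout),
    i.e., five input bits are compressed into a sum bit and two
    carry bits of weight 2."""
    s1, cout = full_adder(x1, x2, x3)
    total_sum, carry = full_adder(s1, x4, cin)
    return total_sum, carry, cout
```

The approximate cell studied in the thesis trades exactness of this invariant for area and speed; the model above is the exact reference behavior.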
@mastersthesis{diva2:607793,
author = {Satheesh Varma, Nikhil},
title = {{Design and implementation of an approximate full adder and its use in FIR filters}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4565--SE}},
year = {2013},
address = {Sweden},
}
This work concerns a digitally controlled oscillator (DCO) for an all-digital PLL in a 65 nm process. Phase-locked loops are used in most applications for clock generation and recovery. As technology advances, the analog PLLs used earlier are gradually being converted to digital circuits. The blocks of an all-digital PLL do the same work as those of an analog PLL, but the circuits and the control circuitry are completely digital, because digital circuits have many advantages over their analog counterparts: they can be scaled down or up even after the circuits have been designed, they can be designed for low supply voltages, and they are easy to construct in a 65 nm process.
In most applications PLLs are used for clock and data recovery, and from that perspective jitter is a major problem for designers. The main aim of this thesis was to design a standalone DCO with as little jitter as possible; the DCO would later be placed in an all-digital PLL. To understand the concepts and problems around jitter at an early stage of the project, an analog PLL was designed at block level and tested for different types of jitter, after which the design of the DCO was started.
This document describes the design of a digitally controlled oscillator operating at a center frequency of 2.145 GHz. In the first stage of the project an LC tank with an NMOS structure was built and tested. In a later stage the LC tank was optimized using a PMOS structure as negative resistance, eventually ending up with an NMOS and PMOS cross-coupled structure. The tuning banks are one of the main designs in this project and play a key role in locking the system once the DCO is placed in an all-digital PLL; three types of tuning banks were introduced to make the system lock more precisely. The control circuits and the varactors are all digital, hence the name digitally controlled oscillator. Digital control circuits and other sub-blocks, such as differential-to-single-ended converters and simple buffers, were also designed to condition the signal. The DCO and the tuning banks were tested using different types of simulations and analyzed for different jitter qualities; the simulation results are shown in the final chapter.
@mastersthesis{diva2:607147,
author = {Balasubramanian, Manikandan and Vijayanathan, Saravana Prabhu},
title = {{Design of a DCO for an All Digital PLL for the 60 GHz Band}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4563--SE}},
year = {2013},
address = {Sweden},
}
In this thesis, we explain the different types of DAC (digital-to-analog converter) architectures and their advantages and disadvantages. We mainly focus on a current-steering digital-to-analog design for achieving high speed and high performance. The current-steering DAC is designed using a binary-weighted architecture, whose benefits are that it occupies less area, consumes less power and requires very few control signals.
The requirements for a high-speed, high-performance DAC are discussed in detail. The circuit is implemented in a state-of-the-art 65 nm process, with a supply voltage of 1.2 V and a sampling speed of 2 GHz. The resolution of the DAC is 8 bits. The design converts the 8 most significant bits (MSBs) into their binary-weighted equivalent, which controls 256 unit current sources.
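The binary-weighted decode described above can be sketched numerically: bit k of the input code steers a group of 2^k unit current sources, so the summed output current is simply the code times the unit current. The unit current value here is an illustrative assumption:

```python
def binary_weighted_dac(code, i_unit=1.0, bits=8):
    """Model of a binary-weighted current-steering DAC: bit k of the
    input code steers a group of 2**k unit current sources, so the
    summed output current equals code * i_unit. `i_unit` is an
    illustrative placeholder, not a value from the thesis."""
    assert 0 <= code < 2 ** bits
    i_out = 0.0
    for k in range(bits):
        if (code >> k) & 1:
            i_out += (2 ** k) * i_unit   # group of 2**k unit sources
    return i_out
```

This also shows why the architecture needs so few control signals: one switch line per bit instead of one per unit source, at the cost of large current steps at major code transitions.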
The performance of the DAC is measured using static and dynamic parameters. In communication applications the static performance measures such as INL and DNL are not of utmost importance; in this work, we mainly concentrate on dynamic performance characteristics such as SNR (signal-to-noise ratio) and SFDR (spurious-free dynamic range). For measuring the dynamic parameters, frequency-domain analysis is the better choice.
We also discuss how pole-zero analysis can be used to arrive at the dynamic performance metrics of a unit element of the DAC at higher frequencies. Different methods are discussed to show the effects of poles and zeros on the output impedance of a unit element at higher frequencies: hand calculation, Mathematica and Cadence.
After extensive literature studies, we implemented a technique in Cadence to increase the output impedance at higher frequencies, called the “complementary current solution technique”. This technique improves the output impedance and SFDR compared with the normal unit-element design.
Our design contains mostly analog building blocks, such as current mirrors, a biasing scheme and a switching scheme, and a few digital blocks such as D flip-flops (D-FFs). The whole system is simulated and verified in MATLAB, where the dynamic performance of the DAC, such as SNR and SFDR, is obtained.
@mastersthesis{diva2:589289,
author = {Sadda, AlajaKumari and Madavaneri, Niraja},
title = {{A Study of Output Impedance Effects in Current-Steering Digital-to-Analog Converters}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4576--SE}},
year = {2013},
address = {Sweden},
}
A phase-locked loop, commonly known as a PLL, is widely used in communication systems: in radio, telecommunications, modulation and demodulation. It can be used for clock generation, clock recovery from data signals, clock distribution and frequency synthesis.
Most electronic circuits encounter the problem of clock skew. The clock skew of a synchronous circuit is defined as the difference in clock arrival time between two sequentially adjacent registers: the registers and flip-flops do not receive the clock at the same time. The clock signal is normally generated by an oscillator, whose errors distort the expected time interval, and PLLs are used to address this problem: a phase-locked loop ensures that the time intervals seen at the clocks of the various registers and flip-flops match the time intervals generated by the oscillator. PLLs are an essential part of microprocessors. Traditional PLLs are designed as analog building blocks, but these are difficult to integrate on a digital chip and are sensitive to noise and process variations. Digital PLLs allow faster lock times and are used for clock generation in high-performance microprocessors. A digital PLL has several advantages over an analog PLL: digital PLLs are more flexible in terms of calibration, programmability and stability, they are more immune to noise, and they cost less than their analog counterparts.
Digital PLLs are analogous to analog PLLs, but the components used to implement them are digital: a digitally controlled oscillator (DCO) is used instead of a voltage-controlled oscillator, a time-to-digital converter (TDC) replaces the phase frequency detector, and the analog filter is replaced with a digital low-pass filter. The phase-locked loop is a rich research topic in electronics, covering many areas of electrical systems such as communication theory, control systems and noise characterization.
This project work describes the design and simulation of miscellaneous blocks of an all-digital PLL for the 60 GHz band. The reference frequency is 54 MHz and the DCO output frequency is 2 GHz to 3 GHz in a state-of-the-art 65 nm process with a 1 V supply voltage. The all-digital PLL is composed of digital components such as a low-pass filter, a sigma-delta modulator and a fractional N/N+1 divider for low-voltage, high-speed operation. The all-digital PLL is first implemented in MATLAB, and the filter, the sigma-delta modulator and the fractional N/N+1 divider are then implemented in MATLAB and Verilog-A. The sub-blocks, i.e., a full adder, a D flip-flop, a digital-to-digital converter, a main counter, a prescaler and a swallow counter, are implemented at transistor level in 65 nm CMOS technology, and the functionality of each block is verified.
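The interplay between the sigma-delta modulator and the fractional N/N+1 divider mentioned above can be sketched with a first-order (accumulator-based) model: each reference cycle the accumulator adds the fractional numerator and, on overflow, the divider uses N+1 instead of N, so the average division ratio approaches N plus the fraction. This is a generic textbook model, not the thesis implementation:

```python
def fractional_n_divider(n_int, frac_num, frac_den, cycles):
    """First-order sigma-delta control of an N/N+1 divider.
    Returns the per-cycle division ratios; their average tends to
    n_int + frac_num/frac_den over many cycles."""
    acc = 0
    ratios = []
    for _ in range(cycles):
        acc += frac_num
        if acc >= frac_den:
            acc -= frac_den
            ratios.append(n_int + 1)   # overflow: divide by N+1
        else:
            ratios.append(n_int)       # no overflow: divide by N
    return ratios
```

Dithering between N and N+1 pushes the quantization error to high frequencies, where the PLL's low-pass filter removes it.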
@mastersthesis{diva2:586012,
author = {Butt, Hadiyah and Padala, Manjularani},
title = {{Design and Simulation of Miscellaneous Blocks of an All-Digital PLL for the 60 GHz Band}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4578--SE}},
year = {2013},
address = {Sweden},
}
Nowadays, hardware designers want a powerful and friendly tool to speed up the design flow and improve design quality. The development suite NoGap is proposed to meet those requirements. NoGap is a design automation tool for ASIPs; it helps users focus on the design stage, freeing them from module connection, signal assignment and integration. Unlike typical ADL tools, which limit users' design ideas to template frameworks, NoGap allows designers to implement what they want with the NoGap Common Language (NoGapCL). NoGap is still not perfect and some important functionality is lacking, but thanks to its flexible generator-component structure, NoGap and NoGapCL can easily be extended.
This thesis first investigates the structure of the Novel Generator of Accelerators and Processors (NoGap) from a software perspective, and then presents a new NoGap generator, the OpCode Assignment Generator (OpAssignGen), which allows users to assign operation code values, exclude operation codes and customize the operation code size or instruction size.
A simple example based on the Microprocessor without Interlocked Pipeline Stages (MIPS) instruction set gives users a brief view of how to use OpAssignGen. After that, the implementation of the new generator is explained in detail. Furthermore, some of NoGap's flaws are exposed, and suggestions and improvements for NoGap are given. Finally, a successful synthesis result based on the simple MIPS hardware implementation shows that the new generator is well implemented; more results and the final conclusion are given at the end of the thesis.
@mastersthesis{diva2:788082,
author = {Yaochuan, Chen},
title = {{Binary Instruction Format Specification for NoGap}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4609--SE}},
year = {2012},
address = {Sweden},
}
In this thesis the new Speedster HP FPGA from Achronix is analyzed. It makes use of a new type of interconnection technology called picoPIPE™. By using this new technology, Achronix claims that the FPGA can run at clock frequencies up to 1.5 GHz. Furthermore, they claim that circuits designed for other FPGAs should work on the Speedster HP after some adjustments. The purpose of this thesis is to study this new FPGA and test the claims that Achronix make about it. This analysis is carried out in four steps. First an analysis of how the new interconnection technology works is given. Based on this analysis, a number of small test circuits are designed with the purpose of testing specific aspects of the new FPGA. To analyze circuit reusability an image filter designed by Synective Labs AB for a different FPGA architecture is adapted and evaluated on the Speedster HP. Lastly, an encryption circuit is designed from scratch. This is done in order to test what can be achieved on the Speedster HP when the designer is given full freedom.
@mastersthesis{diva2:619073,
author = {Peters, Christoffer},
title = {{Evaluation of the Achronix picoPIPE\textsuperscript{\texttrademark} Architecture in High Performance Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4645--SE}},
year = {2012},
address = {Sweden},
}
The fast Fourier transform (FFT) plays an important role in digital signal processing (DSP) applications, and its implementation involves a large number of computations. Many DSP designers have been working on implementations of the FFT algorithms on different devices, such as the central processing unit (CPU), the field-programmable gate array (FPGA) and the graphics processing unit (GPU), in order to accelerate performance.
We selected the GPU for the implementations of the FFT algorithm because GPU hardware is designed with a highly parallel structure, consisting of many hundreds of small parallel processing units. Such a parallel device can be programmed with the parallel programming language CUDA (Compute Unified Device Architecture).
In this thesis, we propose different implementations of the FFT algorithm on NVIDIA GPUs using the CUDA programming language. We study and analyze the different approaches, and use different techniques to accelerate the computation of the FFT. We also discuss the results and compare the different approaches and techniques. Finally, we compare our best results with the CUFFT library, a dedicated library for computing the FFT on NVIDIA GPUs.
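The algorithm underlying such implementations is the radix-2 decimation-in-time FFT; a serial Python sketch (rather than CUDA) shows the arithmetic that GPU threads would execute in parallel per butterfly stage:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT for power-of-two
    input lengths. Each stage splits the input into even and odd
    samples and combines the half-size transforms with twiddle
    factors; on a GPU the butterflies of one stage map naturally
    onto parallel threads."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw            # butterfly, upper half
        out[k + n // 2] = even[k] - tw   # butterfly, lower half
    return out
```

Iterative, in-place variants of the same butterfly structure are what CUDA kernels (and cuFFT internally) typically use.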
@mastersthesis{diva2:617254,
author = {Sreehari, Ambuluri},
title = {{Implementations of the FFT algorithm on GPU}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4649--SE}},
year = {2012},
address = {Sweden},
}
Nowadays, the transmission of digital TV signals is moving towards less traditional media, such as TCP/IP networks.
This thesis focuses on the problems involved in receiving MPEG transport streams of variable bitrate over a TCP/IP connection, such as jitter and clock synchronization. A suggestion for recovering the transport stream is presented, along with an implementation for a Xilinx FPGA targeted at a head-end device. The implementation was written in a mix of VHDL and Verilog.
@mastersthesis{diva2:613627,
author = {Liss, Jonathan},
title = {{Implementation of a VBR MPEG-stream receiver in an FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4625--SE}},
year = {2012},
address = {Sweden},
}
Floating-point numbers are used in many applications that would be well suited to a higher degree of parallelism than that offered by a CPU. In these cases, an FPGA, with its ability to handle multiple calculations simultaneously, could be the solution. Unfortunately, floating-point operations implemented in an FPGA are often resource-intensive, which means that many developers avoid floating-point solutions in FPGAs, or avoid using FPGAs for floating-point applications.
Here, the potential to get less expensive floating-point operations by using a higher radix for the floating-point numbers and by expanding the existing DSP block in the FPGA is investigated. One of the goals is that the FPGA should be usable both for users who have floating point in their designs and for those who do not. In order to motivate hard floating-point blocks in the FPGA, these must not consume too much of the limited resources.
This work shows that floating-point addition becomes smaller with the use of the higher radix, while multiplication becomes smaller by using the hardware of the DSP block. When both operations are examined at the same time, it turns out that it is possible to get a reduced area, compared to separate floating-point units, by utilizing both the DSP block and the higher radix for the floating-point numbers.
@mastersthesis{diva2:579034,
author = {Englund, Madeleine},
title = {{Hybrid Floating-point Units in FPGAs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4642--SE}},
year = {2012},
address = {Sweden},
}
The goal of this thesis was to implement a DVB-T receiver on Coresonic’s DSP processor and to evaluate how to design a receiver that is robust against very strong echoes with long delays. Long-delayed echoes are very common in Single Frequency Networks (SFN), which is why the focus was put on finding algorithms that work well in SFN. The thesis involved analyzing the different algorithms needed for a DVB-T receiver, with the focus on finding a good channel estimation algorithm. It also included programming the DSP processor and making some smaller modifications to the hardware in order to integrate the error correction hardware. After finding relevant articles with promising algorithms, a small transmitter, channel and receiver were modeled in Matlab in order to try the different algorithms. After testing them, some of the simpler ones were implemented first, to quickly get a working receiver. The implementation was, however, time consuming, and not all of the most appropriate algorithms for averting the effects of long and strong echoes were implemented; some algorithms were therefore only analyzed and discussed. The receiver performance is tested and simulated in Coresonic’s DSP simulator. The receiver does not fully meet the requirements set by NorDig when it comes to handling long-delay-spread echoes with a magnitude of 0 dB when tested in the DSP processor simulator. The receiver is, however, able to handle the Ricean channel at an SNR of 19 dB and the Rayleigh channel at an SNR of 24 dB. This report is the result of the final thesis of a Master of Science in Computer Engineering at Linköpings Tekniska Högskola. The thesis was performed at Coresonic AB in Mjärdevi, Linköping.
@mastersthesis{diva2:575252,
author = {Hägglund, Erik},
title = {{Design of a DVB-T Receiver: For SFN on a DSP-Processor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4640--SE}},
year = {2012},
address = {Sweden},
}
Reconfigurable devices are mainstream in today’s system-on-chip solutions. They have the advantages of reduced cost over an equivalent custom design, quick time to market, and the ability to reconfigure the design at will and with ease. One such reconfigurable device is the FPGA. In this industrial thesis, the design and implementation of a control process interface using an ECP2M FPGA and PCIe communication is accomplished. The control process interface is designed and implemented for a 3-D plotter system called LSC11. The FPGA unit drives the plotter device based on specific timing requirements charted by the customer, and is interfaced through PCIe to a host CPU for controlling the LSC11 system using custom software. All the peripherals required for the LSC11 system, such as the ADC, DAC, quadrature decoder and PWM unit, are also implemented as part of this thesis, along with an efficient method for sending all the inputs of the LSC11 system to the host CPU without the need for cyclic read commands on the host CPU. The RTL design is synthesised for the FPGA and the system is verified for correctness and accuracy. The LSC11 system design consumed 79% of the total FPGA resources, and the maximum clock frequency achieved was 130 MHz. This thesis has been carried out at Abaxor Engineering GmbH, Germany. It demonstrates how an FPGA aids in the quick design and implementation of system-on-chip solutions with PCIe communication.
@mastersthesis{diva2:572608,
author = {Murali Baskar Rao, Parthasarathy},
title = {{Implementation of an industrial process control interface for the LSC11 system using Lattice ECP2M FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4637--SE}},
year = {2012},
address = {Sweden},
}
In a conventional charge-pump based PLL design, loop parameters such as the bandwidth, jitter performance, charge-pump current and pull-in range, among others, govern the architecture and implementation details of the PLL. The loop parameter specifications change with a change in the reference frequency and, in most cases, this requires careful redesign of some of the PLL blocks. This thesis describes the implementation of a semi-digital PLL for high-bandwidth applications, which is self-compensated, low-power and exhibits bandwidth tracking for all reference frequencies between 40 MHz and 1.6 GHz in 65 nm CMOS technology. The design can be used for a wide range of reference frequencies without redesigning any block. The bandwidth can be fixed to some fraction of the reference frequency at design time; in this thesis, the PLL is designed to make the bandwidth track 5% of the reference frequency. Since the PLL is self-compensated, its performance and bandwidth remain the same over PVT corners.
@mastersthesis{diva2:565415,
author = {Yogesh, Mitesh},
title = {{A Self-compensated, Bandwidth Tracking Semi-digital PLL Design in 65nm CMOS Technology}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4597--SE}},
year = {2012},
address = {Sweden},
}
The resistor ladder (R/2R) digital-to-analogue converter (DAC) architecture is often used in high-performance audio solutions due to its low-noise performance. Even high-end R/2R DACs suffer from static nonlinearity distortions. It was suspected that compensating for these nonlinearities would be possible, and that this could improve audio quality in audio systems using R/2R DACs for digital-to-analogue (D/A) conversion.
Through the use of models of the resistor ladder architecture, a way of characterizing and measuring the faults in the R/2R DAC was created. A compensation algorithm was developed to compensate for the nonlinearities. The performance of the algorithm was simulated, and an implementation of it was evaluated using an audio evaluation instrument.
The results presented show that it is possible to increase linearity in R/2R DACs by compensating for static nonlinearity distortions. The increase in linearity can be quite significant and audible for the trained ear.
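The idea of static-nonlinearity compensation can be sketched in a toy model (this is an illustrative construction, not the authors' algorithm; the mismatch values and lookup-table approach are assumptions): model an 8-bit R/2R DAC whose MSB weight is slightly off, then pre-distort by mapping each ideal level to the code whose modeled output lands closest to it.

```python
# Toy illustration of static-nonlinearity compensation for an R/2R DAC
# (hypothetical model, not the thesis implementation).
N = 8  # resolution in bits

# Ideal weight of bit i is 2**i; here only the MSB is assumed to be 2% low.
mismatch = [0.0] * (N - 1) + [-0.02]
weights = [(2 ** i) * (1 + mismatch[i]) for i in range(N)]

def dac_out(code):
    """Modeled analog output of the mismatched ladder for a digital code."""
    return sum(weights[i] for i in range(N) if (code >> i) & 1)

# Compensation lookup table: for each ideal level, the code that minimizes
# the modeled output error (simple digital pre-distortion).
comp = [min(range(2 ** N), key=lambda c: abs(dac_out(c) - ideal))
        for ideal in range(2 ** N)]

def level_error(ideal, compensated):
    """Deviation from the ideal level, with or without compensation."""
    code = comp[ideal] if compensated else ideal
    return abs(dac_out(code) - ideal)

mean_raw = sum(level_error(i, False) for i in range(2 ** N)) / 2 ** N
mean_comp = sum(level_error(i, True) for i in range(2 ** N)) / 2 ** N
```

In this model the mean level error drops substantially after compensation, mirroring the linearity increase reported above; levels near full scale remain unreachable, since pre-distortion can only choose among the outputs the ladder can actually produce.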
@mastersthesis{diva2:559515,
author = {Kulig, Gabriel and Wallin, Gustav},
title = {{R/2R DAC Nonlinearity Compensation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4616--SE}},
year = {2012},
address = {Sweden},
}
At Saab Dynamics AB there are a number of projects where cameras are an important part of a sensor system. Examples of such projects are monitoring for civil security and 3D mapping, where several cameras are used. The cameras can for example be located in airplanes, helicopters or cars and therefore it is important to have a robust function for recording data. One way to achieve a quick recording with sufficient storage size is to use SATA flash disks. To reduce the size and power consumption of the recording equipment and to enable project-specific adaptations it is desirable to use an FPGA as an interface to SATA devices. This thesis concerns the development of such an interface implemented on an FPGA. The theory behind the SATA interconnect standard is described along with the design work and its challenges.
@mastersthesis{diva2:549840,
author = {Gonzalez, Maya},
title = {{Design and Implementation of a SATA Host Controller on a Spartan-6 FPGA}},
school = {Linköping University},
type = {{LITH-ISY-EX--12/4615--SE}},
year = {2012},
address = {Sweden},
}
This thesis presents methods for making reference measurements of electromagnetic compatibility (EMC) in connection with Ethernet signalling, together with an evaluation of these methods. The report covers which standards apply to measurements of emission and immunity in an EMC lab, and how such measurements are carried out. The theory behind differential signalling and Ethernet is briefly described.
The report introduces the reader to Motorola Mobility, their set-top box VIP1853 and the problems relevant to this box. Investigations of the VIP1853 are presented, together with a discussion of the measurement difficulties encountered during these investigations. The measurement methods that were tested, and their advantages and disadvantages, are described. Practical experiments are carried out with the aim of improving the performance of the VIP1853; the assumptions that led to these specific tests are reviewed and the test results are presented.
Finally, an analysis is given of the difference between formal EMC measurements in a dedicated EMC lab and measurements in Motorola's own lab environment.
@mastersthesis{diva2:549544,
author = {Wennberg, David},
title = {{En studie i EMC-aspekter vid ethernetsignalering}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--12/0391--SE}},
year = {2012},
address = {Sweden},
}
With the advances in wireless communication technology over the last two decades, the use of fractional-N frequency synthesizers has increased widely in modern wireless communication applications due to their high frequency resolution and fast settling time.
The performance of a fractional-N frequency synthesizer is degraded by the presence of unwanted spurious tones (spurs) in the output spectrum. The digital delta-sigma modulator (DDSM) can be directly responsible for the generation of spurs because of its inherent nonlinearity and periodicity. Many deterministic and stochastic techniques associated with the architecture of the DDSM have been developed to remove the principal causes of spurs. Nonlinearities in the frequency synthesizer itself are another source of spurs. In this thesis we predict that specific nonlinearities in a fractional-N frequency synthesizer produce spurs at well-defined frequencies even if the output of the DDSM is spur-free. Different spur-free DDSM architectures have been investigated for the analysis of spurious tones in the output spectrum of fractional-N frequency synthesizers.
The thesis presents simulation and experimental investigation of mechanisms for spur generation in a fractional-N frequency synthesizer. Simulations are carried out using the CppSim system simulator, MATLAB and Simulink while the experiments are performed on an Analog Devices ADF7021, a high performance narrow-band transceiver IC.
@mastersthesis{diva2:548943,
author = {Imran Saeed, Sohail},
title = {{Investigation of Mechanisms for Spur Generation in Fractional-N Frequency Synthesizers}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4613--SE}},
year = {2012},
address = {Sweden},
}
This thesis presents a model of a reconfigurable sigma-delta modulator. These modulators are intended for high-speed digital-to-analog converters: they are meant to reduce the complexity of current-steering DACs and are also considered as a front end of data converters. The quantization noise present in the digital signal is pushed to higher frequencies by the sigma-delta modulator; this high-band noise can then be removed by a low-pass filter.
A test methodology involving generation of a baseband signal, interpolation and digitization is adopted. The topologies tested in MATLAB® include signal-feedback and error-feedback models of first-order and second-order sigma-delta modulators. The performance of the first-order error-feedback and signal-feedback modulators is quite similar: the SNR of a first-order error-feedback model is 52.3 dB and 55.9 dB for 1 and 2 quantization bits, respectively. Among the second-order SDMs, signal feedback provides the best performance with 80 dB SNR.
The other part of the thesis focuses on the implementation of the sigma-delta modulator (SDM) using a fast time-to-market approach; SoC Encounter, a tool from Cadence, is the easiest way to do this job. The modulators are implemented in 65 nm technology. The reconfigurable sigma-delta modulator is designed in the Verilog HDL. Switches are introduced to control the reconfigurable SDM for different input word lengths, which can vary from 0 to 4 bits. The modulator is designed to work at frequencies of 2 GHz. To netlist the design, Design Compiler, a tool from Synopsys®, is used.
The area of the chip reported by Design Compiler is 563.68 µm². When the design is implemented in SoC Encounter the chip area increases, because the core utilization during design is only 60%, corresponding to 556.8 µm². The remaining 40% of the area is used by buffers, inverters and filler cells during clock tree synthesis; the buffers and inverters are added to remove the clock phase delay between different registers. The power consumption of the chip is 319 mW. The internal power of the modulators is 219.1 mW, and the switching power of the output capacitances is 99.9 mW, which is 31% of the total power consumed. The main concern for power loss is leakage; to reduce the leakage power and achieve a high-speed design, CORE65GPHVT libraries are used. The leakage power of the design is 2.825 µW, which is 0.00088% of the total power.
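The first-order error-feedback loop evaluated above is simple enough to sketch in a few lines (a toy behavioral model, not the thesis design): the quantization error of each sample is stored and subtracted from the next input, which shapes the error spectrum by a factor (1 − z⁻¹); for a DC input, the running average of the 1-bit output converges to the input value.

```python
def error_feedback_sdm(samples):
    """First-order error-feedback sigma-delta modulator with a 1-bit quantizer.
    Output levels are +1/-1; inputs should lie in [-1, 1]."""
    out = []
    e = 0.0                           # stored quantization error
    for x in samples:
        v = x - e                     # subtract the previous error from the input
        y = 1.0 if v >= 0 else -1.0   # 1-bit quantizer
        e = y - v                     # new quantization error, fed back next sample
        out.append(y)
    return out
```

Since y = x − e·z⁻¹ + e in the loop above, the output is Y(z) = X(z) + (1 − z⁻¹)E(z): the signal passes unchanged while the quantization noise is first-order high-pass shaped, exactly the behavior the low-pass filter downstream relies on.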
@mastersthesis{diva2:548758,
author = {Ali Shah, Syed Asmat and Qazi, Sohaib Ayaz},
title = {{Design of an all-digital, reconfigurable sigma-delta modulator}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4557--SE}},
year = {2012},
address = {Sweden},
}
Analog-to-digital converters are inevitable in modern communication systems, and there is always a need for the design of low-power converters. There are different A/D architectures for achieving medium resolution at medium speeds, and among them the cyclic/algorithmic structure stands out due to its low hardware complexity and low die-area cost. This thesis discusses the ongoing trends in cyclic/algorithmic ADCs and their functionality. Some design techniques for implementing low-power, high-resolution A/D converters are studied, and the non-ideal effects of the SC implementation of cyclic A/D converters are explored. Two kinds of cyclic A/D architectures are compared: the conventional cyclic ADC with the RSD technique, and a cyclic ADC with the Correlated Level Shift (CLS) technique. This ADC is part of an IMST Design + Systems International GmbH project and was designed and simulated at IMST GmbH.
This thesis presents the design of a 12-bit, 1 Msps cyclic/algorithmic analog-to-digital converter (ADC) using the “Redundant Signed Digit (RSD)” algorithm, or 1.5-bit/stage architecture, with a switched-capacitor (SC) implementation. The design was carried out in a 130 nm CMOS process with a 1.5 V power supply. The ADC dissipates 1.6 mW when run at full speed and works over the full-scale input dynamic range. The op-amp used in the cyclic ADC is a two-stage folded-cascode structure with a class-A output stage. In the typical corner, this op-amp dissipates 631 µW at a 1.5 V power supply and achieves a gain of 77 dB with a phase margin of 64° and a GBW of 54 MHz at a 2 pF load.
@mastersthesis{diva2:545838,
author = {Puppala, Ajith kumar},
title = {{Design of a Low Power Cyclic/Algorithmic Analog-to-Digital Converter in a 130nm CMOS Process}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4456--SE}},
year = {2012},
address = {Sweden},
}
The majority of signals that need to be processed are analog: continuous signals that can take an infinite number of values at any time instant. The precision of analog signals is limited due to the influence of distortion, which leads to the use of digital signals for better performance and cost. An analog-to-digital converter (ADC) converts a continuous-time signal into a discrete-time signal. Most A/D converters are classified into two categories according to their sampling technique: Nyquist-rate ADCs and oversampled ADCs. A Nyquist-rate ADC operates at a sample frequency equal to twice the base-band frequency, whereas an oversampled ADC operates at a sample frequency greater than the Nyquist frequency.
The sigma-delta ADC, which uses the oversampling technique, provides high resolution, low to medium speed, relaxed anti-aliasing requirements and various options for reconfiguration; conversely, its resolution can be traded for high-speed operation. The data sampling technique plays a vital role in the sigma-delta modulator and can be classified into discrete-time and continuous-time sampling. Furthermore, the discrete-time sampling technique can be implemented using switched-capacitor (SC) or switched-current (SI) integrator circuits. The SC integrator provides high accuracy but occupies a larger area. Unlike the SC integrator, the SI integrator offers low input impedance and low parasitic capacitance, which makes it suitable for low-supply-voltage and high-frequency applications.
From a detailed literature study of the multi-bit sigma-delta modulator, it is found that it needs a highly linear digital-to-analog converter (DAC) in its feedback path. Sigma-delta modulators are very sensitive to the linearity of this DAC, whose errors can degrade the performance without any attenuation. For this purpose, T.C. Leslie and B. Singh proposed a hybrid architecture using a multi-bit quantizer with a single-bit DAC: the most significant bit is fed back to the DAC while the least significant bits are omitted. This omission requires complex digital calibration to complete the analog-to-digital conversion process, which is a small price to pay compared to the linearity requirements of the DAC.
This project work describes the design of a high-speed hybrid current-mode sigma-delta modulator with a single-bit feedback DAC running at 2.56 GHz in a state-of-the-art 65 nm CMOS process. It comprises both analog and digital processing blocks, using the T.C. Leslie and B. Singh architecture with the switched-current integrator data sampling technique for low-voltage, high-speed operation. The whole system is verified mathematically in MATLAB and implemented using signal flow graphs and Verilog-A code. The analog blocks, such as the switched-current integrator, flash ADC and DAC, are implemented at transistor level in a 65 nm CMOS technology, and the functionality of each block is verified. Dynamic performance parameters such as SNR, SNDR and SFDR for the different levels of abstraction match the performance characteristics of the mathematical model.
@mastersthesis{diva2:545275,
author = {Baskaran, Balakumaar and Elumalai, Hari Shankar},
title = {{High-Speed Hybrid Current mode Sigma-Delta Modulator}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4558--SE}},
year = {2012},
address = {Sweden},
}
Wireless sensor networks have found diverse applications from health to agriculture and industry. They have the potential to bring about profound social change; however, there are also some challenges that have to be addressed. One of the problems is the limited power source available to energize a sensor node: the longevity of a node is tied to its low-power design. One of the areas where great power savings could be made is in nodal communication. Different schemes have been proposed targeting low-power communication and short network latency; one of them is the introduction of an ultra-low power wake-up receiver for monitoring the channel. Although this is a recent proposal, many works have already been published. The focus of this thesis work is the study and comparison of architectures for a wake-up receiver. As part of this study, an envelope-detector based wake-up receiver is designed in a 130 nm CMOS technology and implemented at the schematic and layout levels. It operates in the 2.4 GHz ISM band and draws 69 µA from a 1.2 V supply. A sensitivity of −52 dBm is simulated while receiving 100 kb/s OOK-modulated wake-up signals.
@mastersthesis{diva2:542394,
author = {Gebreyohannes, Fikre Tsigabu},
title = {{Design of Ultra-Low Power Wake-Up Receiver in 130nm CMOS Technology}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4564--SE}},
year = {2012},
address = {Sweden},
}
This report describes the design and implementation of a fixed audio equalizer based on a scheme where parts of the signal spectrum are downsampled and treated differently, for the purpose of reducing the computational complexity and memory requirements. The primary focus has been on finding a way of taking an equalizer based on a simple minimum-phase FIR filter and transforming it into the new type of equalizer. To achieve this, a number of undesirable effects, such as aliasing distortion and upsampling imaging, had to be considered and dealt with. In order to achieve a good amplitude response of the system, optimization procedures were used.
As part of the thesis, a cost-effective implementation of the filter has been made for an FPGA, in order to verify that the scheme is indeed usable for equalizing an audio signal.
@mastersthesis{diva2:539356,
author = {Lindblom, Ludvig},
title = {{Design of a Digital Octave Band Filter}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4581--SE}},
year = {2012},
address = {Sweden},
}
This master thesis is based upon a new type of linear-phase Nyquist finite impulse response (FIR) interpolator and decimator implemented using a tree structure. The tree structure decreases the complexity considerably compared to the ordinary single-stage interpolator structure. The computational complexity is comparable to that of a multi-stage Nyquist interpolator structure, but the proposed tree structure has a slightly higher delay. The tree structure should still be considered, since it can interpolate by an arbitrary factor and all subfilters operate at the base rate, which is not the case for multi-stage Nyquist interpolators.
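For background on the building block these structures reuse (a generic sketch of interpolation by 2, not the proposed tree structure): zero-stuffing doubles the sample rate and a low-pass FIR filter suppresses the resulting image; multi-stage and tree structures cascade such stages.

```python
def upsample2(x, h):
    """Interpolate by 2: insert a zero after each sample, then apply an FIR
    low-pass filter h by direct-form convolution (output truncated to 2*len(x))."""
    z = []
    for s in x:
        z.extend([s, 0.0])            # zero-stuffing: rate is now doubled
    return [sum(h[k] * z[n - k] for k in range(len(h)) if 0 <= n - k < len(z))
            for n in range(len(z))]
```

With the trivial linear-interpolation kernel h = [0.5, 1, 0.5], the inserted samples become averages of their neighbors; a Nyquist (half-band) filter generalizes this while leaving the original samples untouched, which is the property the tree structure exploits.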
@mastersthesis{diva2:538007,
author = {Lahti, Jimmie},
title = {{Tree-Structured Linear-Phase Nyquist FIR Filter Interpolators and Decimators}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4590--SE}},
year = {2012},
address = {Sweden},
}
This report presents a thesis project whose aim was to build a prototype of a wireless sensor network that measures soil moisture and stores the values on a memory card, with the hope of providing early warning of an approaching drought. The prototype consists of a main unit and a sensor unit; several sensor units can be connected to the system, and ultimately each main unit will serve multiple sensor units. Bluetooth is used for communication with the outside world, which assumes that a person can travel to each station and collect the stored data. A breadboard and an STK-500 were used during development, which limited the choice of microcontroller to one from Atmel; the microcontroller used is an ATMega328. Communication between the units is done via ZigBee. The wireless sensor network measures soil moisture once a day and logs the readings on a memory card. When the user chooses to download the readings through the graphical interface on the computer, they are written to a file on the PC and erased from the memory card in the main unit. The prototype will be developed further and, if the final result works as expected, installed in the Limpopo river basin in northern South Africa, which the project targets.
@mastersthesis{diva2:534544,
author = {Lehtojärvi, Jonas},
title = {{En lågeffektsmodul för markfuktsmätning med fokus på ZigBee}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--12/0393--SE}},
year = {2012},
address = {Sweden},
}
The purpose of this thesis is to demonstrate the effects of the mismatch errors that occur in time-interleaved analog-to-digital converters (TI-ADCs) and how these are compensated for by proprietary methods from Signal Processing Devices Sweden AB. This is demonstrated by two different implementations, both based on the combined digitizer/generator SDR14, in a way that is easy to grasp for people with limited knowledge of signal processing.
The first implementation is an analog video demo, where an analog video signal is sampled by such a TI-ADC in the SDR14, then converted back to analog and displayed with the help of a TV tuner. The mismatch compensation can be turned on and off, and the difference in the resulting video image is clearly visible.
The second implementation is a digital communication demo based on W-CDMA, implemented on the FPGA of the SDR14. Four parallel W-CDMA signals of 5 MHz are sent and received by the SDR14. QPSK, 16-QAM, and 64-QAM modulated signals were successfully sent and the mismatch effects were clearly visible in the constellation diagrams. Techniques used are, for example: root-raised cosine pulse shaping, RF modulation, carrier recovery, and timing recovery.
@mastersthesis{diva2:535413,
author = {Nilsson, Johan and Rothin, Mikael},
title = {{Live Demonstration of Mismatch Compensation for Time-Interleaved ADCs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4570--SE}},
year = {2012},
address = {Sweden},
}
A team of developers from Epsilon AB has developed a lightweight remote controlled quadcopter named Crazyflie. The team wants to allow a pilot to navigate the quadcopter using video from an on-board camera as the only guidance. The master thesis evaluates the feasibility of mounting a camera module on the quadcopter and streaming images from the camera to a computer, using the existing quadcopter radio link. Using theoretical calculations and measurements, a set of requirements that must be fulfilled for such a system are identified. Using the requirements as a basis, various camera products are investigated and the findings presented. A design to fulfill the requirements, using the found products, is proposed. The proposed design is then implemented and evaluated.
It is found that the Crazyflie system has the resources necessary to transfer an image stream with the quality required for navigation. Furthermore, the implementation is found to provide the required functionality. From the evaluation, several key factors of the design that can be changed to further improve the performance of an implementation are identified. Ideas for future work and improvements are proposed and possible alternative approaches are presented.
@mastersthesis{diva2:534744,
author = {Tosteberg, Joakim and Axelsson, Thomas},
title = {{Development of a Wireless Video Transfer System for Remote Control of a Lightweight UAV}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4560--SE}},
year = {2012},
address = {Sweden},
}
More computing power will be required in Scania’s future engine control units. Calculations therefore need to be performed on new hardware, such as an FPGA. One problem that arises is synchronization of the flywheel position. This master thesis examines the opportunities existing Scania hardware has to perform synchronization of the flywheel position. Different concepts for synchronization have been developed and compared with each other. One of the concepts has been implemented, made possible with a PCB adapter. The results show that synchronization is possible within the given real-time requirements. Finally, an analysis regarding series production has been made, showing the challenges an FPGA will face when integrated into a future engine control unit.
@mastersthesis{diva2:534037,
author = {Pettersson, Tobias},
title = {{Synchronization of flywheel position between autonomous devices}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4602--SE}},
year = {2012},
address = {Sweden},
}
In this project, a low-power measurement system has been developed. The system can measure soil moisture over long periods without maintenance. Different sensors can be connected to a main unit; sensor data is logged on a memory card and can then be read from a PC.
The PC software was developed during the project. It can read the time of the real-time clock to verify that it agrees with the PC's clock, empty the memory card over Bluetooth, and download all the data stored on the card. The PC can also synchronize the main unit's real-time clock.
The system is powered entirely by solar energy through solar cells, so the units need no battery to carry out measurements. Supercapacitors, charged during the day, keep the real-time clock running throughout the night so that the clock never stops. The memory card is large enough that the unit does not need to be read out for several years, and when a readout does occur the Bluetooth unit is fast, so emptying the memory card takes little time.
@mastersthesis{diva2:532450,
author = {Nordh, Nordh},
title = {{En lågeffektsmodul för markfuktsmätning med fokus på Bluetooth}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--12/0392--SE}},
year = {2012},
address = {Sweden},
}
Capacitive communication using the human body as an electrical channel has attracted much attention in the area of personal area networks (PANs) since its introduction by Zimmerman in 1995. The reason is that personal information and communication appliances are becoming an integral part of our daily lives, and advances in technology are helping a great deal in making them interesting, useful and affordable. If we interconnect these body-based devices using the capacitive communication approach, in a manner appropriate to their power, size, cost and functionality, it lessens the burden of supporting a communication channel with existing wired and wireless technologies. More than that, using the body as the physical communication channel for a PAN device seems to have many inherent advantages over traditional radio transmission, in terms of power and security among others, but many feasibility and reliability issues still have to be addressed before it is ready for prime time. This promising technology has recently been sub-classified into body area networks (BANs) and is currently under discussion in the IEEE 802.15.6 Task Group, which addresses the technical requirements needed to unleash its full potential for BANs. It could play a part in Ericsson's vision of 50 billion connections by 2020. This thesis work is part of a larger project to investigate the models and interfaces, derive requirements on the analog front-end (AFE) required for the system, and suggest a first-order model of the AFE that suits this communication system. In this thesis work the human body is modeled along with the interfaces and transceiver to reflect the true operating conditions of the system. Various requirements, such as sensitivity, dynamic range, noise figure and signal-to-noise ratio (SNR), are derived based on the system model. An AFE model based on discrete components is simulated, which was later used for proof of concept.
A first-order AFE model is also developed based on the derived requirements and is simulated under the assumed interference and noise conditions. The first-order requirements for the submodules of the AFE are derived as well, and future work and challenges are discussed.
@mastersthesis{diva2:531437,
author = {Kariyannavar, Kiran},
title = {{Connecting the human body - Models, Connections and Competition}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4505--SE}},
year = {2012},
address = {Sweden},
}
This thesis describes the specification, design and implementation of a software-defined radio system on a two-channel 14-bit digitizer/generator. The multi-stage interpolations and decimations required to operate two analog-to-digital converters at 800 megasamples per second (MSps) and two digital-to-analog converters at 1600 MSps from a 25 MSps software-side interface were designed and implemented. Quadrature processing was used throughout the system, and a combination of fine-tunable low-rate mixers and coarse high-rate mixers was implemented to allow frequency translation across the entire first Nyquist band of the converters. Various reconstruction filter designs for the transmitter side were investigated, and an inexpensive implementation was achieved through the use of programmable base-band filters and polynomial approximation.
@mastersthesis{diva2:531725,
author = {Björklund, Daniel},
title = {{Implementation of a Software-Defined Radio Transceiver on High-Speed Digitizer/Generator SDR14}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4583--SE}},
year = {2012},
address = {Sweden},
}
This thesis was written at Linköping University for the company Instrument Control Sweden AB (ICS).
ICS is a small company located in Linköping that develops software and hardware for unmanned aerial vehicles (UAVs). At present, ICS has a fully functional autopilot called EasyPilot, but they want to reduce the autopilot's size to make it more attractive.
The purpose of this thesis was to investigate whether it was possible to reduce the size of the autopilot and, in that case, how it would be done. It was also necessary to examine whether the old processors should be replaced by new ones and how hard it would be to port the old software to these new processors.
To achieve these goals, many of the old components had to be exchanged for new, smaller ones, and some less necessary parts were removed completely. The results showed that the size could be reduced considerably; exactly how much is hard to say, since no PCB layout was made.
Programming tests on the new components showed that some parts of the old code, mainly algorithms and other calculations, could be reused in the new design. However, a lot of new code still had to be written in order to successfully port the old software to the new hardware.
@mastersthesis{diva2:530519,
author = {Andersson, Erik},
title = {{Omkonstruktion och arkitekturbyte av autopilot för obemannade farkoster}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--12/0395--SE}},
year = {2012},
address = {Sweden},
}
Modern communication systems require higher data rates, which has increased the demand for high-speed transceivers. For a system to work efficiently, all blocks of that system should be fast, and the analog interfaces are the main bottleneck in the whole system in terms of speed and power. This fact has led researchers to develop high-speed analog-to-digital converters (ADCs) with low power consumption. Among all ADC types, the flash ADC is the best choice for fast data conversion because of its parallel structure. This thesis work describes the design of such a high-speed, low-power flash ADC for the analog front-end (AFE) of a transceiver. A high-speed, highly linear track-and-hold (TnH) circuit is needed in front of the ADC to provide a stable signal at the ADC input for accurate conversion. Two different track-and-hold architectures are implemented: a bootstrap TnH and a switched source-follower TnH. Simulations show that high speed with high linearity can be achieved with the bootstrap TnH circuit, which is therefore selected for the ADC design. An averaging technique is employed in the preamplifier array of the ADC to reduce the static offsets of the preamplifiers. The averaging technique can be made more efficient by using a smaller number of amplifiers, which is achieved with an interpolation technique that reduces the number of amplifiers at the ADC input. The reduced number of amplifiers is also advantageous for achieving higher bandwidth, since the input capacitance at the first stage of the preamplifier array is reduced. The flash ADC is designed and implemented in 150 nm CMOS technology for a sampling rate of 1.6 GSamples/s. The bootstrap TnH consumes 27.95 mW from a 1.8 V supply and achieves a signal-to-noise-and-distortion ratio (SNDR) of 37.38 dB for an input signal frequency of 195.3 MHz. The ADC with ideal TnH and comparator consumes 78.2 mW and achieves 4.8 effective number of bits (ENOB).
@mastersthesis{diva2:525399,
author = {Younis, Choudhry Jabbar},
title = {{Design and Implementation of a high-efficiency low-power analog-to-digital converter for high-speed transceivers}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4542--SE}},
year = {2012},
address = {Sweden},
}
Large sensor networks of very small motes, each having sensing, computation, communication and power units, are becoming an active research topic. The major problem in implementing such networks is threefold. Firstly, power consumption and area, which are limited by the technology (limitations on the minimum size and power consumption of transistors). Secondly, locating the area of an event, which is difficult because the motes have no identities. Thirdly, the cost factor, as the number of motes required would be high.
This thesis work was done in two parts. The first part comprised modeling a power- and area-efficient smart dust network, using a novel algorithm to detect the location of an event without giving identities to the motes, and developing an interface for monitoring patients' health in hospitals through such a network. The second part consists of designing an analog front-end that generates an event in case of abnormalities in signals from the human body. The designed front-end can also be used for intra-body communication systems (body area networks) with operating frequencies on the order of 10-20 MHz.
A body area network (BAN) is a type of personal area network (PAN), introduced by Zimmerman, that uses the human body as a communication channel between body-based devices through capacitive coupling. The major advantage of such communication lies in reducing the burden on the RF spectrum: if only one of the body-based devices communicates with the outside world using the RF spectrum, while all body-based devices communicate with each other through the BAN, then indirectly all devices can communicate with the outside world with only one of them using the RF spectrum. Power and security are also inherent advantages of using the body as a channel.
@mastersthesis{diva2:524255,
author = {Sharma, Prateek},
title = {{A study on a smart dust network and an analog interface}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4554--SE}},
year = {2012},
address = {Sweden},
}
The main task of this project was to develop hardware and software that could stream audio data via USB 2.0. The project was based on the XMOS USB 2.0 design. We have brought an idea to reality in the form of a finished product, with verification help from engineers at Syncore Technologies. During the development process, the functionality surrounding component databases provided by Altium Designer was evaluated; Altium Designer was the software used to develop the PCB in this project. After many hours of development, we finally got the hardware and software to behave the way they were supposed to, that is, to stream audio data from a high-resolution source (PC/Mac/unit with S/PDIF out, maximum resolution 24-bit 192 kHz) to both S/PDIF and analog stereo out via RCA connectors. The sound quality is, from a subjective point of view, very good, and we are happy with the result. We think the functionality surrounding component databases is convenient in many applications: not only can you easily generate up-to-date pricing of all components used in a project, you can also shorten the development process, because the developer does not have to recreate schematic symbols and footprints that have already been created, which of course was the fundamental idea behind the database functionality. These are just a few examples of its advantages. It should be considered, however, that the administration surrounding the component databases can be very time consuming. To take full advantage of Altium Designer's functionality, we think a dedicated administrator is needed to maintain the database repository.
@mastersthesis{diva2:515410,
author = {Österberg, Johan and Ekblom, Carl-David},
title = {{USB 2.0 Audio device}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--12/0389--SE}},
year = {2012},
address = {Sweden},
}
The remotely operated underwater vehicles (ROVs) that the client develops need various kinds of data channels. In order to minimize the need for physical cables between the control unit and an ROV, a multiplexing protocol has been developed. The protocol has been designed with the aim of using the bandwidth of the transfer link as efficiently as possible.
The data channels used during this thesis project are RS232, RS485 and CAN. ROM and FIFO memories have been used to manage the different data channels effectively. All reading and sending on these channels has been implemented in FPGA technology, and the code is written generically so that more channels can easily be added to the system in the future.
The multiplexing protocol is a modified version of the STDM method and is proprietary. Calculations have been made in MATLAB to ensure that the protocol does not exceed the maximum available bandwidth. The protocol uses the CRC technique for error detection.
A PCB has been developed during this thesis project; it connects the different data channels to the FPGA circuit.
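The error-detection step can be illustrated with a bitwise CRC computation. The thesis does not state which CRC polynomial is used, so the sketch below uses CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF) purely as an example:

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021. The polynomial and the
    0xFFFF initial value are illustrative choices, not the thesis's."""
    crc = init
    for byte in data:
        crc ^= byte << 8                      # bring the next byte into the high bits
        for _ in range(8):                    # shift out one bit at a time
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

The sender would append the CRC to each multiplexed frame; the receiver recomputes it and discards the frame on a mismatch.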
@mastersthesis{diva2:514474,
author = {Janson, Robert and Mottaghi, Amir},
title = {{FPGA-design av en STDM-baserad multiplexer för seriell multiprotokollskommunikation}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--12/0388--SE}},
year = {2012},
address = {Sweden},
}
The variable gain amplifier (VGA) is utilized in various remote sensing and communication applications, including radar, ultrasound, wireless communication and even speech analysis, where it is used to enhance dynamic performance.
The purpose of this thesis work is to implement a high-linearity, low-noise variable gain amplifier in 150 nm CMOS technology for the analog front-end of a transceiver. Two different amplifier architectures are designed and compared: an amplifier with a diode-connected load and a source-degenerated amplifier. The amplifier with the diode-connected load performs worse than the source-degenerated amplifier in terms of gain, power, linearity, noise and bandwidth, so the source-degenerated amplifier is selected for implementation. A three-stage variable gain differential amplifier is implemented with the selected architecture.
The implemented three-stage variable gain differential amplifier has a gain range of -541.5 mdB to 22.46 dB with a step size of approximately 0.3 dB and 78 gain steps in total. The -3 dB bandwidth achieved is 953.3 MHz. The third harmonic distortion (HD3) is -45 dBc at 250 mV, and the power consumption is 35 mW from a 1.8 V supply.
@mastersthesis{diva2:488651,
author = {Azmat, Rehan},
title = {{Design and implementation of a low-noise high-linearity variable gain amplifier for high speed transceivers}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4543--SE}},
year = {2012},
address = {Sweden},
}
As one of the most powerful error-correcting codes, low-density parity-check (LDPC) codes are widely used in digital communications. Because LDPC codes can approach the Shannon limit extraordinarily closely, they were adopted in the Digital Video Broadcast - Satellite - Second Generation (DVB-S2) standard in 2003, the first time LDPC codes were included in a broadcast standard.
In this thesis, a restructured parity-check matrix, which can be divided into sub-matrices, is provided for the LDPC code in DVB-S2. Corresponding to this restructured parity-check matrix, a reconstructed decoding table is devised. The encoding table of the DVB-S2 standard can only obtain the unknown check nodes from known variable nodes, while the decoding table provided in this thesis obtains the unknown variable nodes from known check nodes, which is exactly what the layered message-passing algorithm needs. The layered message-passing algorithm, also known as "turbo-decoding message passing", is used to reduce the number of decoding iterations and the memory storage for messages. The thesis also investigates the BP algorithm, the lambda-min algorithm, the min-sum algorithm and the SISO-s algorithm; simulation results for these algorithms and schedules are also presented.
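As a hedged illustration (not the thesis code), the min-sum check-node update mentioned above can be sketched as follows: each outgoing check-to-variable message combines the product of the signs and the minimum of the magnitudes of all the *other* incoming variable-to-check LLRs:

```python
def min_sum_check_update(llrs, alpha=1.0):
    """Min-sum check-node update. For edge i, the outgoing message uses
    the sign product and the minimum magnitude over all other edges;
    alpha is an optional normalization factor (alpha = 1.0 gives plain min-sum)."""
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]      # exclude the message on edge i itself
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        out.append(alpha * sign * min(abs(v) for v in others))
    return out
```

Layered scheduling applies this update one block row of the parity-check matrix at a time, immediately reusing the refreshed variable-node LLRs, which is why it converges in fewer iterations than flooding.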
@mastersthesis{diva2:504435,
author = {Ge, Hanxiao},
title = {{Investigation of LDPC code in DVB-S2}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4321--SE}},
year = {2012},
address = {Sweden},
}
In this thesis, a fully logic-compatible gain-cell (GC) based dynamic random-access memory (DRAM) with a storage capacity of 2048 bits is designed in UMC 180 nm technology. The GC used is a two-transistor PMOS (2PMOS) cell. This thesis aims at building the foundation for further research on the effects of supply voltage scaling on retention time, leakage and power consumption. Different techniques are used to reduce leakage current for longer retention time and, ultimately, low power, and different types of decoders are analyzed for low power. First, general concepts of memories are presented. Furthermore, the topic of leakage and its effect on retention time and power consumption is introduced. Two memories are designed: the first is a single-port memory with improved retention time; the second is a two-port memory with all peripherals, consisting of the GC array, decoder, drivers, registers and pulse generators. All simulations for voltage scaling and retention time are shown.
@mastersthesis{diva2:479528,
author = {Iqbal, Rashid},
title = {{Low Power Gain Cell Arrays: Voltage Scaling and Leakage Reduction}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4507--SE}},
year = {2012},
address = {Sweden},
}
Recently, smart dust, or wireless sensor networks, has been gaining more attention. These autonomous, ultra-low-power sensor-based electronic devices sense and process burst-type environmental variations and pass the data from one node (mote) to another in an ad-hoc network. Subsystems for smart dust are typically the analog interface (AI), analog-to-digital converter (ADC), digital signal processor (DSP), digital-to-analog converter (DAC), power management, and a transceiver for communication.
This thesis project describes an event-driven (ED) digital signal processing system (ADC, DSP and DAC) operating in continuous-time (CT) with smart dust as the target application. The benefits of the CT system compared to its conventional counterpart are lower in-band quantization noise and no requirement of a clock generator and anti-aliasing filter, which makes it suitable for processing burst-type data signals.
A clockless EDADC system based on a CT delta modulation (DM) technique is presented. The ADC output is digital data, continuous in time, known as “data token”. The ADC employs an unbuffered, area efficient, segmented resistor-string (R-string) feedback DAC. A study of different segmented R-string DAC architectures is presented. A comparison in component reduction with prior art shows nearly 87.5% reduction of resistors and switches in the DAC and the D flip-flops in the bidirectional shift registers for an 8-bit ADC, utilizing the proposed segmented DAC architecture. The obtained SNDR for the 3-bit, 4-bit and 8-bit ADC system is 22.696 dB, 30.435 dB and 55.73 dB, respectively, with the band of interest as 220.5 kHz.
The CTDSP operates asynchronously and processes the data tokens obtained from the EDADC. A clockless transversal direct-form finite impulse response (FIR) low-pass filter (LPF) is designed.
A systematic top-down, test-driven methodology is employed throughout the project. Initially, MATLAB models are used to compare the CT systems with their sampled counterparts. The complete CTDSP system is implemented in the Cadence design environment.
The thesis has resulted in two conference contributions. One for the 20th European Conference on Circuit Theory and Design, ECCTD’11 and the other for the 19th IFIP/IEEE International Conference on Very Large Scale Integration, VLSI-SoC’11. We obtained the second-best student paper award at the ECCTD.
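The delta-modulation principle behind the EDADC can be sketched in a simplified, discrete-time form (the actual design is continuous-time and clockless; this is only an illustration of the 1-bit up/down tracking idea):

```python
def delta_modulate(samples, step):
    """1-bit delta modulation: each output bit says whether the tracked
    estimate should step up (1) or down (0) to follow the input."""
    est, bits = 0.0, []
    for s in samples:
        b = 1 if s > est else 0
        bits.append(b)
        est += step if b else -step
    return bits

def delta_demodulate(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    est, out = 0.0, []
    for b in bits:
        est += step if b else -step
        out.append(est)
    return out
```

As long as the input slope stays below one step per sample (no slope overload), the staircase tracks the signal within a small granular error, which is why delta modulation suits slowly varying, burst-type sensor data.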
@mastersthesis{diva2:547144,
author = {Chhetri, Dhurv and Manyam, Venkata Narasimha},
title = {{A Continuous-Time ADC and DSP for Smart Dust}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4436--SE}},
year = {2011},
address = {Sweden},
}
Actiwave AB delivers audio solutions for active speakers. One of the features is that audio can be streamed to the speakers over a local network connection. The module that provides this functionality is expensive. This thesis investigates whether the same functionality can instead be achieved by using part of the Spartan-6 FPGA on their platform as a MicroBlaze soft processor, on which a rendering device can be implemented. The thesis discusses design decisions such as the selection and integration of the operating system, UPnP framework and media decoder. A fully functional prototype application for a desktop computer was implemented, with the intention of porting it to the FPGA platform. There turned out to be too many compatibility issues, however, so a simpler renderer was implemented on the FPGA instead. MP3 music files were successfully streamed to and decoded on the soft processor, but without fulfilling real-time constraints. The conclusion is that it is reasonable to implement a UPnP Media Renderer on the FPGA. Decoding in real time can be an issue due to insufficient performance of the soft processor, but several possible solutions exist.
@mastersthesis{diva2:507683,
author = {Ländell, Karl-Rikard and Wiksten Färnström, Axel},
title = {{FPGA Implementation of a UPnP Media Renderer}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4368--S}},
year = {2011},
address = {Sweden},
}
Digital designs are often very large and complex, which makes locating and fixing a bug very hard and time consuming; often more than half of the development time is spent on verification. Assertion-based verification is a method that uses assertions to help reduce the verification time. Simulating with assertions provides more information that can be used to locate and correct a bug. In this master thesis, assertions are discussed and implemented in the Senior DSP processor.
@mastersthesis{diva2:483658,
author = {Lepenica, Nermin},
title = {{Assertion Based Verification on Senior DSP}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4538--SE}},
year = {2011},
address = {Sweden},
}
Partially reconfigurable FPGAs provide the ability to reconfigure the FPGA during run-time, but the reconfigurable partition (RP) is disabled while reconfiguration is being performed. In order to maintain the functionality of the system, the data stream to the RP must be held during that time. Because of this, the reconfiguration time is critical to the designed system. This thesis therefore aims to build a functional partially reconfigurable system and determine how much time the reconfiguration takes.
A Xilinx ML605 evaluation board is used to implement the system, which has one static part (SP) and two partially reconfigurable modules, ICMP and HTTP. A web client sends different packets to the system requesting different services. The packet type information is analyzed and the requests are handled by a MicroBlaze core, which also triggers the system's self-reconfiguration. The reconfiguration swaps the system between the ICMP and HTTP modules to handle the requests; the reconfiguration time is therefore defined as the time between detection of the packet type and completion of the reconfiguration. A counter is built in the SP to measure the reconfiguration time.
Verification shows that the system works correctly. Analysis of the test results indicates that reconfiguration takes 231 ms and consumes 9274 KB of storage, which saves 93% of the time and 50% of the storage compared with a static FPGA configuration.
@mastersthesis{diva2:487700,
author = {Zhou, Ruoxing},
title = {{Dynamic Partial Reconfigurable FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4457--SE}},
year = {2011},
address = {Sweden},
}
The war between malware and anti-malware software started two decades ago and has adopted modern techniques along with technological developments in the field of information technology. This thesis analyzes the performance of freeware antivirus programs available on the market. Several tests were performed to analyze performance with respect to the core responsibilities of this software: to scan for and detect viruses, and to prevent and eradicate them. Although perhaps irrelevant for common users, but very important for technical professionals, many tests were also performed to analyze the quality of this software with respect to its effects on the system itself, such as utilization of precious resources, processing times and system slowdown caused by the monitoring techniques. The results derived from these tests show not only the performance and quality of this software but also highlight some areas for further analysis.
@mastersthesis{diva2:484494,
author = {Rasool, Muhammad Ahsan and Jamal, Abdul},
title = {{Quality of freeware antivirus software}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4541--SE}},
year = {2011},
address = {Sweden},
}
Choosing the right processor for an embedded application, or designing a new processor, requires knowing how it stacks up against the competition, and selling a processor requires credible communication about its performance to customers, which means benchmarking a processor is very important. Benchmarks are recognized worldwide by processor vendors and customers alike as the fact-based way to evaluate and communicate embedded processor performance. In this thesis, the benchmarking of the ePUMA multiprocessor, developed by the Division of Computer Engineering, ISY, Linköping University, Sweden, is described in detail. A number of typical digital signal processing algorithms are chosen as benchmarks. These benchmarks have been implemented in assembly code, with their performance measured in terms of clock cycles and root-mean-square error compared with results computed using double precision. The ePUMA multiprocessor platform, which comprises the Sleipnir DSP processor and the Senior DSP processor, was used to implement the DSP algorithms. MATLAB built-in models were used as references to derive the root-mean-square error values of the assembly implementations of the different algorithms. The execution time for the different DSP algorithms ranged from 51 to 6148 clock cycles, and the root-mean-square error values varied between 0.0003 and 0.11.
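The error metric used for the benchmarks can be illustrated with a short sketch; this assumes the standard root-mean-square error formula, as the thesis's exact scoring code is not shown:

```python
import math

def rmse(reference, measured):
    """Root-mean-square error between a double-precision reference
    (e.g. a MATLAB model) and a fixed-point/assembly result."""
    if len(reference) != len(measured):
        raise ValueError("length mismatch")
    n = len(reference)
    return math.sqrt(sum((r - m) ** 2 for r, m in zip(reference, measured)) / n)
```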
@mastersthesis{diva2:480093,
author = {Murugesan, Somasekar},
title = {{Benchmarking of Sleipnir DSP Processor, ePUMA Platform}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4536--SE}},
year = {2011},
address = {Sweden},
}
Embedded memories dominate the area, power and cost of modern very-large-scale integration systems-on-chip (VLSI SoCs). Furthermore, due to process variations, it becomes challenging to design reliable, energy-efficient systems. Fault-tolerant designs can therefore be area efficient, cost effective and have low power consumption. The idea of this project is to design embedded memories where reliability is intentionally compromised to increase storage density.
Gain cell memories are smaller than SRAM and, unlike DRAM, they are logic compatible. In multilevel DRAM, storage density is increased by storing two bits per cell without reducing the feature size. This thesis targets multilevel read and write schemes that provide short access time, small area overhead and high reliability. First, timing analysis of the reference design is performed for read and write operations. An analytical model of the write bit line (WBL) is developed to estimate the write delay. A replica technique is designed to generate the delay and track variations of the storage array; the design of the replica technique comprises the replica column and the read and write control circuits. A memory controller is designed to control the read and write operations in the multilevel DRAM. A multilevel DRAM with a storage capacity of eight kilobits is designed in UMC 90 nm technology. Simulations are performed for testing, and results are reported for energy and access time. Monte Carlo analysis is done to assess the variation tolerance of the replica technique. Finally, the multilevel DRAM with the replica technique is compared with the reference design to check the improvement in access times.
@mastersthesis{diva2:478155,
author = {Khalid, Muhammad Umer},
title = {{Multilevel Gain Cell Arrays for Fault-Tolerant VLSI Systems}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4508--SE}},
year = {2011},
address = {Sweden},
}
Configurable devices have become more and more popular nowadays, because they can improve system performance in many ways. This thesis work studies how the introduction of coarse-grain configurability can improve the ePUMA low-power, high-speed DSP platform in terms of performance and power consumption. The study takes two DSP algorithms, the fast Fourier transform (FFT) and FIR filtering, as benchmarks to study the effect of this new feature. Architectures are presented for the calculation of FFTs and FIR filters, and it is shown how they can contribute to system performance. Finally, coarse-grain configurability is suggested as an option for improving the system.
@mastersthesis{diva2:477438,
author = {Pishgah, Sepehr},
title = {{Adaptation of The ePUMA DSP Platform for Coarse Grain Configurability}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4540--SE}},
year = {2011},
address = {Sweden},
}
Video decoding technologies are widely used in our daily lives, and higher resolutions and more advanced coding technologies place growing demands on video decoding capability. A new multi-core digital signal processor, ePUMA, which stands for embedded Parallel DSP platform with Unique Memory Access, is chosen to investigate how it supports video decoding.
This thesis aims to benchmark the algorithms of video decoding and evaluate the performance of ePUMA for the MPEG-2 standard, a common standard for compressing video signals. Based on a slice-parallelism methodology across the eight co-processors of ePUMA, the implementation of the algorithms consists of variable-length decoding, inverse scan, inverse quantization, two-dimensional inverse discrete cosine transform, motion vector decoding, form prediction and motion compensation. The performance of the kernels is benchmarked with the ePUMA system simulator. The results show that to decode real-time Full HD (1920*1080 pixels, 30 frames per second) video, ePUMA must run at 280 MHz for I frames and at 320 MHz for P frames.
@mastersthesis{diva2:468558,
author = {Xiaoyi, Peng},
title = {{Benchmark of MPEG-2 Video Decoding on ePUMA Multi-core DSP Processor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4459--SE}},
year = {2011},
address = {Sweden},
}
There have been proposals of many parity-inducing techniques, like forward error correction (FEC), which try to cope with channel-induced errors to a large extent, if not eradicate them completely. Convolutional codes are widely recognized as very efficient among the known channel coding techniques, but the process of decoding a convolutionally encoded data stream at the receiving node can be quite complex, time consuming and memory inefficient. This thesis outlines the implementation of a multistandard soft-decision Viterbi decoder and the word length effects on it. The classic Viterbi algorithm and its variant, the soft-decision Viterbi algorithm, as well as zero-tail termination and tail-biting termination of the trellis, are discussed. For the final implementation in the C language, the zero-tail termination approach with soft-decision Viterbi decoding is adopted. This memory-efficient implementation is flexible for any code rate and any constraint length. The results obtained are compared with a MATLAB reference decoder. Simulation results show the performance of the decoder and reveal the interesting trade-off between finite word length and system performance. Such an investigation can be very beneficial for the hardware design of communication systems. This is of high interest for the Viterbi algorithm, as convolutional codes have been selected in several well-known standards like WiMAX, EDGE, IEEE 802.11a, GPRS, WCDMA, GSM, CDMA2000 and 3GPP-LTE.
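The zero-tail termination idea can be sketched for a toy code (not the thesis's C implementation, which is rate- and constraint-length-flexible and uses soft decisions): a rate-1/2, constraint-length-3 convolutional encoder with the common generators (7, 5) in octal, decoded with a hard-decision Viterbi search that forces the trellis to end in the all-zero state.

```python
K = 3                       # constraint length (illustrative toy code)
G = [0b111, 0b101]          # generator polynomials, (7, 5) octal
N_STATES = 1 << (K - 1)

def conv_encode(bits):
    """Encode with zero-tail termination: append K-1 flush zeros
    so the encoder (and trellis) end in state 0."""
    state, out = 0, []
    for b in bits + [0] * (K - 1):
        reg = (b << (K - 1)) | state          # shift register, newest bit high
        for g in G:
            out.append(bin(reg & g).count("1") & 1)
        state = reg >> 1
    return out

def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi decoding over the zero-tail trellis."""
    INF = float("inf")
    metric = [0] + [INF] * (N_STATES - 1)     # start in state 0
    paths = [[] for _ in range(N_STATES)]
    for t in range(n_bits + K - 1):
        r = received[2 * t:2 * t + 2]
        new_metric = [INF] * N_STATES
        new_paths = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == INF:
                continue
            for b in (0, 1):                  # try both input bits
                reg = (b << (K - 1)) | s
                nxt = reg >> 1
                expected = [bin(reg & g).count("1") & 1 for g in G]
                m = metric[s] + sum(x != y for x, y in zip(expected, r))
                if m < new_metric[nxt]:       # keep the survivor path
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[0][:n_bits]                  # zero tail forces final state 0
```

Replacing the Hamming branch metric with a correlation against quantized channel values gives the soft-decision variant, whose finite word length is what the thesis studies.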
@mastersthesis{diva2:469272,
author = {Salim, Ahmed},
title = {{Evaluation of Word Length Effects on Multistandard Soft Decision Viterbi Decoding}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4416--SE}},
year = {2011},
address = {Sweden},
}
The smart dust concept is a fairly recent phenomenon in engineering. It assumes monitoring of a real natural environment in which motes, or smart dust machines, swarm collectively and coordinate information among themselves and/or with a back-end control platform. In the analog mixed-signal field, work on such devices is gaining momentum, such that it is conceived to be one of the emerging fields in technology; this work only became possible once fabrication technology reached the nanoscale regime. A smart dust network involves remote devices connected in a hive, sensing burst-type data signals from the environment and relaying information amongst themselves in an energy-efficient manner to coordinate an appropriate response to a detected stimulus. The project assumed an RF-based communication strategy for coordination among the devices through a wireless medium, which is less susceptible to stringent line-of-sight (LOS) requirements, and a baseband processing system comprising an environment sensor, an AFE module, an ADC, a DSP and a DAC. Essentially, a 10-bit, 2 MHz pipelined ADC is implemented in an STM 65 nm technology. The ADC benefits the smart dust device by allowing it to process data in an energy-efficient way, with reduced complexity as its design feature. It differs from the other ADC of the system by operating at a higher frequency and assuming a different design philosophy: a coherent system sensitive to a clock. The thesis work assumes that the energy harvesting, regulation and power management features present in the smart dust mote would enable the system to contain such a diverse ADC, and that the ADC's digital output data would be compatible with the rest of the design modules, consisting mainly of DSP sections. The ADC's novelty is that it removes the need for a high-power-consuming op-amp, whose design parameters become more complex as technology scales into the nanoscale era and beyond.
A systematic, bottom-up, test-driven approach to design is utilized, and various behaviours of the system are captured in the Cadence design environment with Verilog-to-layout models as well as MATLAB and Simulink models.
@mastersthesis{diva2:465643,
author = {Khan, Shehryar and Awan, Muhammad Asfandyar},
title = {{Study on Zero-Crossing-Based ADCs for Smart Dust Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4491--SE}},
year = {2011},
address = {Sweden},
}
This paper presents a reconfigurable FFT architecture for the variable-length and multi-streaming WiMax wireless standard. The architecture processes 1 stream of 2048-point FFT, up to 2 streams of 1024-point FFT, or up to 4 streams of 512-point FFT. The architecture consists of 11 SDF pipelined stages, and a radix-2 butterfly is calculated in each stage. The sampling frequency of the system varies in accordance with the FFT length. The wordlength and buffer length in each stage are configurable depending on the FFT length. A latch-free clock-gating technique is used to reduce power consumption.
The architecture is synthesized for a Virtex-6 XCVLX760 FPGA. Experimental results show that the architecture achieves the throughput required by the WiMax standard, and the design has additional features compared to previous approaches. The design used 1% of the total available FPGA resources, and a maximum clock frequency of 313.67 MHz was achieved.
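The radix-2 butterfly computed in each SDF stage, and the transform it builds up to, can be sketched as a software reference model. This is a generic illustration of the algorithm, not the thesis's pipelined RTL:

```python
import cmath

def butterfly(a, b):
    """Radix-2 butterfly: the add/subtract pair computed in each SDF stage."""
    return a + b, a - b

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT for power-of-two lengths
    (a behavioural reference model, not the pipelined hardware)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle-scaled odd branch
        out[k], out[k + n // 2] = butterfly(even[k], t)
    return out
```

A hardware SDF pipeline evaluates the same butterflies serially, one stage per FFT radix level, which is why an 11-stage radix-2 pipeline covers a 2048-point transform.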
@mastersthesis{diva2:464719,
author = {Padma Prasad, Boopal},
title = {{A Reconfigurable FFT Architecture for Variable Length and Multi-Streaming WiMax Wireless OFDM Standards}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4513--SE}},
year = {2011},
address = {Sweden},
}
Communication technology has become indispensable in modern society, and its importance is growing day by day. One of the main reasons behind this growth is the advancement in analog and mixed-signal circuit design. The analog-to-digital converter (ADC) is an essential part of a modern receiver system. Its development is driven by the progress of CMOS technologies, with the aim of reducing area and power consumption. In the area of RF integrated circuits for wireless applications, low operating voltage and low current consumption are central aspects of the design. The aim of this master thesis is the development and design of a low-power analog-to-digital converter for RF applications. The basic specifications are:
· High speed, low current (1.5 V supply voltage)
· Maximum input frequency 3.5 MHz
· 8-bit resolution
· Sampling rate < 100 MHz
Thus, this work comprises a theoretical concept phase in which different ADC topologies are investigated, based on which an appropriate ADC architecture is fixed. The chosen design is then implemented in an industrial 130 nm CMOS process.
@mastersthesis{diva2:461622,
author = {Radhakrishnan, Venkataraman},
title = {{Design of a low power analog to digital converter in a 130 nm CMOS technology}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4532--SE}},
year = {2011},
address = {Sweden},
}
Our task was to create a virtual test bench for verifying memory addresses in our commissioning body's models. The purpose was to design the testbench in such a way that it would be easy to change the device under test without any major changes to the testbench.
To enable the testbench to verify different devices, we had to create a general environment describing how the testbench should be composed. By analyzing which components are usually included in a testbench and which components were necessary in our project, we arrived at a general environment for the testbench. Our result was a testbench with the following basic functions:
* Read from a file that contains read and write operations for the Device Under Test (DUT)
* Apply the stimulus to the device
* Read the results from the device
* Compare the results with the expected values
* Generate a log file containing information about the simulation result
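The flow of such a testbench can be sketched in software as follows; the stimulus format and the dictionary-based DUT model here are hypothetical, chosen only to illustrate the five functions listed above:

```python
def run_testbench(stimulus_lines, dut, log):
    """File-driven testbench sketch: apply operations to a DUT model,
    compare read results with expected values, and record a log.
    Each line is 'write addr data' or 'read addr expected' (hypothetical format)."""
    passed = True
    for line in stimulus_lines:
        op, addr, value = line.split()
        if op == "write":
            dut[int(addr)] = int(value)           # apply stimulus to the device
        elif op == "read":
            result = dut.get(int(addr), 0)        # read the result from the device
            ok = result == int(value)             # compare with the expected value
            passed = passed and ok
            log.append(f"read {addr}: got {result}, expected {value}, "
                       f"{'OK' if ok else 'FAIL'}")
    return passed
```

Keeping the stimulus file format and the comparison logic generic is what lets the DUT be swapped without rewriting the testbench, which is the goal the thesis describes.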
@mastersthesis{diva2:451409,
author = {Risberg, Christoffer and Lynghed, Hampus},
title = {{Verifieringsplattform i SystemVerilog}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--11/0386--SE}},
year = {2011},
address = {Sweden},
}
This work aims at improving an existing soft microprocessor core optimized for Xilinx Virtex®-4 FPGA. Instruction and data caches will be designed and implemented. Interrupt support will be added as well, preparing the microprocessor core to host operating systems. Thorough verification of the added modules is also emphasized in this work. Maintaining core clock frequency at its maximum has been the main concern through all the design and implementation steps.
@mastersthesis{diva2:445728,
author = {Davari, Mahdad},
title = {{Improving an FPGA Optimized Processor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4520--SE}},
year = {2011},
address = {Sweden},
}
A modular and area-efficient 3D graphics accelerator for tile based rendering in FPGA systems has been designed and implemented. The accelerator supports a subset of OpenGL, with features such as mipmapping, multitexturing and blending. The accelerator consists of a software component for projection and clipping of triangles, as well as a hardware component for rasterization, coloring and video output. Trade-offs made between area, performance and functionality have been described and justified. In order to evaluate the functionality and performance of the accelerator, it has been tested with two different applications.
@mastersthesis{diva2:445413,
author = {Fries, Jakob and Johansson, Simon},
title = {{A Modular 3D Graphics Accelerator for FPGA}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4479--SE}},
year = {2011},
address = {Sweden},
}
The first version of the Senior processor was created as part of a thesis project in 2007. This processor was completed and used for educational purposes at Linköping University. In 2008, several parts of the processor were optimized and the processor was expanded with additional functionality as part of another thesis project. In 2009, an EU-funded project called MULTI-BASE started, in which the Computer Division at the Department of Electrical Engineering participated. For its part of the MULTI-BASE project, the Senior processor was selected. After continuous revision and development, the processor was sent for manufacturing.
The assignment of this thesis project was to test and verify the different functions implemented in the Senior processor. To do this, a PCB was developed for testing the Senior processor together with a Virtex-4 FPGA. Extensive testing was done on the most important functions of the Senior processor. These tests showed that the manufactured Senior processor works as designed and that it can on its own perform large calculations and use external hardware accelerators with the help of its various interfaces.
@mastersthesis{diva2:444421,
author = {Hedin, Alexander},
title = {{Testing and evaluation of the integratability of the Senior processor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4510--SE}},
year = {2011},
address = {Sweden},
}
The main scope of this thesis is to implement a new architecture of a high-bandwidth phase-locked loop (PLL) with a large operating frequency range, from 100 MHz to 1 GHz, in a 150 nm CMOS process. As the PLL is a time-discrete system, the new architecture is mathematically modelled in the z-domain. The charge pump provides a proportionally damped signal, unlike the resistive or capacitive damping used in a conventional charge pump. The new damping results in less update jitter, less peaking when reaching the lock frequency, and a fast locking time for the PLL. The new semi-digital PLL architecture uses N storage cells, which store the oscillator tuning information digitally while also enabling analogue tuning of the voltage-controlled oscillator (VCO). The storage cell outputs are also used for process-voltage-temperature compensation. The phase-frequency detector (PFD) and VCO are implemented as in a conventional PLL. The achieved bandwidth is one quarter of the PFD update frequency over the entire operating range from 100 MHz to 1 GHz. The simulation results are also verified against the mathematical model. The new architecture also consumes less power and area than a conventional PLL.
@mastersthesis{diva2:444101,
author = {Elangovan, Vivek},
title = {{Low Power and Area Efficient Semi-Digital PLL Architecture for High Brandwidth Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4439--SE}},
year = {2011},
address = {Sweden},
}
The analog-to-digital converter (ADC) plays an important role in mixed-signal processing systems, serving as an interface between analog and digital signal processing. In the last two decades, circuits implemented with current-mode techniques have drawn a lot of interest for sensory systems and integrated circuits. Current-mode circuits have a few vital advantages, such as low-voltage operation, high speed and wide dynamic range, and they have wide applications in low-voltage, high-speed mixed-signal processing systems. In this thesis work, a 9-bit pipelined ADC with the switched-current (SI) technique is designed and implemented in 65 nm CMOS technology. The main focus of the thesis work is to implement the pipelined ADC with the SI technique and to optimize it for low power. The ADC has a stage resolution of 3 bits. The proposed architecture combines a differential sample-and-hold amplifier, a current comparator, a binary-to-thermometer decoder, a differential current-steering digital-to-analog converter, delay logic and a digital error-correction block. The circuits are implemented at transistor level in 65 nm CMOS technology, and the static and dynamic performance metrics of the pipelined ADC are evaluated. The simulations are carried out with Cadence Virtuoso Spectre Circuit Simulator 5.10, and Matlab is used to determine the performance metrics of the ADC.
@mastersthesis{diva2:440789,
author = {Rajendran, Dinesh Babu},
title = {{Design of Pipelined Analog-to-Digital Converter with SI Technique in 65 nm CMOS Technology}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4489--SE}},
year = {2011},
address = {Sweden},
}
Routines written in Verilog have been developed for demodulating a frequency-modulated signal, given an Analog Devices AD9874 chip. Different methods for I/Q demodulation have been evaluated, and of these, CORDIC was chosen and implemented in Verilog.
The code has to some extent been tested on an IGLOO nano FPGA, but has above all been simulated and verified in ModelSim.
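In vectoring mode, CORDIC recovers the phase of an I/Q sample pair using only shifts, adds and a small angle table, which is what makes it attractive for low-power FPGA work; the FM signal is then demodulated by differentiating successive phase values. A minimal floating-point sketch of the vectoring iteration (illustrative, not the thesis's Verilog):

```python
import math

# Precomputed elementary rotation angles atan(2^-k)
ANGLES = [math.atan(2.0 ** -k) for k in range(16)]

def cordic_phase(i, q, iterations=16):
    """Vectoring-mode CORDIC: rotate (i, q) onto the positive x-axis,
    accumulating the rotation angle, which converges to atan2(q, i) for i > 0.
    Each step uses only add/subtract and a power-of-two scaling (a shift in HW)."""
    angle = 0.0
    for k in range(iterations):
        if q > 0:   # rotate clockwise
            i, q = i + q * 2.0 ** -k, q - i * 2.0 ** -k
            angle += ANGLES[k]
        else:       # rotate counter-clockwise
            i, q = i - q * 2.0 ** -k, q + i * 2.0 ** -k
            angle -= ANGLES[k]
    return angle
```

In fixed-point hardware, the `* 2.0 ** -k` terms become arithmetic right shifts, so each iteration costs two adders and an angle-table lookup.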
@mastersthesis{diva2:437524,
author = {Lindström, Gustaf},
title = {{Strömsnål FM-demodulering med FPGA}},
school = {Linköping University},
type = {{LITH-ISY-EX-ET--11/0369--SE}},
year = {2011},
address = {Sweden},
}
The increase in speed and density of programmable logic devices such as Field Programmable Gate Arrays (FPGAs) enables ever more complex designs to be constructed within a short time frame. The flexibility of a programmable device eases the integration of a design with a wide variety of components on a single chip. Since Frequency Modulation (FM) is an analog modulation scheme, performing it in the digital domain introduces new challenges; the details of these challenges and how to deal with them are also explained. This thesis presents the design of a digital stereo FM modulator, including the necessary signal processing such as filtering, waveform generation and stereo multiplexing. The solution comprises code written in Very high speed integrated circuit Hardware Description Language (VHDL) and a selection of free Intellectual Property (IP) blocks, and is intended for implementation on a Xilinx FPGA. The focus of the thesis lies on area efficiency, and a number of suggestions are given to maximize the number of channels that can be modulated using a single FPGA chip. An estimate of how many channels can be modulated using the provided FPGA, a Xilinx XC6SLX100T, is also presented.
@mastersthesis{diva2:437259,
author = {Boström, Henrik},
title = {{An FPGA implementation of a digital FM modulator.}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4481--SE}},
year = {2011},
address = {Sweden},
}
This master thesis investigates the possibilities of implementing a digital filter for wire guidance in a truck. The analog circuits currently in the truck are analyzed to understand their signal processing. The component MAX261 is of special interest, and it is analyzed in a dedicated section to make sure that all details needed to develop a digital filter are available. When all theoretical calculations were finished, all the circuits were simulated to make sure that the calculations were correct.
The digital filter is based on an analog filter which is expensive and not easy to purchase. A requirement specification was developed by analyzing the properties of the analog filter and how it is currently used. The analog filter is part of a chain of analog signal processing, most of which can instead be performed digitally.
The special type of the analog filter makes the requirements on the digital filter very tough, and an extensive analysis of digital filter structures was performed in order to find a suitable filter. The digital filter is of WDF (Wave Digital Filter) type, and it is quite special because it has two variable coefficients, one for the steepness and one for the center frequency. The digital filter consists of a number of first-order filters, because a higher-order filter with the desired properties has large coefficient values, which worsens the stability properties.
The best type of implementation for this filter and the signal processing is also analyzed. Finally, a prototype was developed on a development board whose main component is a DSP (Digital Signal Processor). The program for the prototype is written in C, and the performance of the system was verified by different tests and measurements.
@mastersthesis{diva2:430962,
author = {Tunströmer, Anders},
title = {{Analysis and Implementation of a Digital Filter for Wire Guidance}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4478--SE}},
year = {2011},
address = {Sweden},
}
This master thesis investigates the uplink transmission from User Equipment (UE) to base station in LTE (Long Term Evolution), and channel estimation using pilot symbols with parameters defined in the 3GPP (3rd Generation Partnership Project) specifications. The purpose of the thesis was to implement a simulator that can generate the uplink signal as it is generated by the UE. LTE is the name given by 3GPP to the evolution of the Third Generation (3G) mobile system. This thesis focuses on the LTE uplink, where single-carrier frequency-division multiple access (SC-FDMA) is utilized as the multiple-access technique. Its advantage over orthogonal frequency-division multiple access (OFDMA), which is used in the downlink, is better peak-power characteristics; in the uplink, better peak-power characteristics are necessary for better power efficiency in mobile terminals. To assess the performance of the uplink transmission, realistic channel models for wireless communication systems are essential. The channel models used are those proposed by the International Telecommunication Union (ITU), and correct knowledge of these models is important for testing, optimization and performance improvement of signal processing algorithms. The channel estimation techniques used are Least Squares (LS) and Linear Minimum Mean Square Error (LMMSE) for different channel models. The performance of these algorithms has been measured in terms of Bit Error Rate (BER) versus Signal-to-Noise Ratio (SNR).
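At the pilot positions, the LS estimate simply divides each received symbol by the known transmitted pilot; a minimal per-subcarrier sketch (an illustrative scalar model, not the thesis's simulator):

```python
def ls_channel_estimate(received, pilots):
    """Least-squares channel estimate at pilot subcarriers:
    H_hat[k] = Y[k] / X[k], where Y is received and X is the known pilot."""
    return [y / x for y, x in zip(received, pilots)]
```

LMMSE refines this raw estimate by additionally weighting it with the channel correlation and noise statistics, trading higher complexity for lower estimation error at low SNR.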
@mastersthesis{diva2:429089,
author = {Ahmed, Mohsin Niaz},
title = {{LTE Uplink Modeling and Channel Estimation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4476--SE}},
year = {2011},
address = {Sweden},
}
The filter complexity in the multi-stage decimation system of a Δ-Σ ADC increases progressively as one moves to higher stages of decimation, because the input wordlength of the higher stages also increases progressively. The main motivation for this thesis comes from the idea of investigating a way to reduce the input wordlength in the later filter stages of the decimation system, which could reduce the filter complexity. To achieve this, we use a noise-shaping loop between the first and later stages so that the input wordlength for the later stages remains smaller than in the case without the noise-shaping loop, while the performance (SNR/noise level) remains the same in both cases. This thesis aims at analyzing the implications of using a noise-shaping loop between the decimation stages of a Δ-Σ ADC and at finding appropriate decimation filter types for such a decimation system. It also compares the complexity introduced by the noise-shaping loop with the wordlength reduction achieved in the later decimation stages. The filters required in the system are also optimized using a minimax optimization technique.
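The wordlength reduction between decimation stages can be illustrated with a first-order error-feedback quantizer: the truncation error is fed back to the next sample, so the error is spectrally shaped rather than flat and can be removed by the later decimation filters. A simplified integer sketch (an assumed generic structure, not the thesis's exact loop):

```python
def noise_shaped_truncate(samples, drop_bits):
    """First-order error-feedback requantizer on integer samples.
    Drops `drop_bits` LSBs while shaping the truncation error so that
    its average vanishes (high-pass shaped error spectrum)."""
    step = 1 << drop_bits
    err = 0
    out = []
    for s in samples:
        v = s + err                 # add back the previous truncation error
        q = (v // step) * step      # truncate to the shorter wordlength
        err = v - q                 # error fed to the next sample
        out.append(q >> drop_bits)  # output uses drop_bits fewer bits
    return out
```

Because the error feedback preserves the signal's average, a DC input of 5 requantized with 2 dropped bits yields the repeating pattern 1, 1, 1, 2, whose rescaled mean is exactly 5.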
@mastersthesis{diva2:426143,
author = {Gundala, JayaKrishna},
title = {{A study on the decimation stage of a $\Delta$-$\Sigma$ ADC with noise-shaping loop between the stages.}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4486--SE}},
year = {2011},
address = {Sweden},
}
This thesis work primarily focuses on the applicability of sub-threshold source coupled logic (STSCL) for building digital circuits and systems that run at very low voltage and promise to provide desirable performance with excellent energy savings. Sectors like bio-engineering and smart sensors require the energy consumption to be effectively very low for long battery life. Alongside meeting the ultra-low power specification, the system must also be reliable, robust, and perform well under harsh conditions. In this thesis work, logic gates are designed and analyzed, using STSCL. These gates are further used for implementation of digital subsystems in small-sized smart dust sensors which would operate at very low supply voltages and consume extremely low power.
For understanding the performance of STSCL with respect to ultra-low power and energy, a seven-stage ring oscillator, a 4-by-4 array multiplier, a fifth-order FIR filter and finally a fifty-fifth-order FIR filter were designed. The subcircuits and systems have been simulated for different supply voltages, scaling down to 0.2 V, at different temperatures (-20 °C and 70 °C), in both 45 nm and 65 nm process technologies. The chosen architectures for the FIR filters and array multiplier were conventional and essentially taken from traditional CMOS-based designs.
The simulated results are studied, analyzed and compared with the same CMOS-based digital circuits. The results show the advantage of STSCL-based digital systems over CMOS. Simulation results give an energy consumption of 1.1388 nJ for the fifty-fifth-order FIR filter at low temperature (-20 °C) using STSCL logic, which is less than for the corresponding CMOS implementation.
@mastersthesis{diva2:427041,
author = {Roy, Sajib and Nipun, Md. Murad Kabir},
title = {{Understanding Sub-threshold source coupled logic for ultra-low power application}},
school = {Linköping University},
type = {{LITH-ISY-EX--11/4465--SE}},
year = {2011},
address = {Sweden},
}
A high-level model of the HSI PHY mode of the IEEE 802.15.3c standard has been constructed in Matlab to optimize the wordlength needed to achieve a specific bit error rate (BER) depending on the application, and an FFT has then been implemented for different wordlengths depending on the application. The hardware cost and power are proportional to the wordlength. The main objective of this thesis has been to implement a low-power, low-area FFT for this standard. To that end, the whole system has been modeled in Matlab, and the signal-to-noise ratio (SNR) and wordlength of the system have been studied to achieve an acceptable BER. An FFT has then been implemented in a 65 nm ASIC technology for wordlengths of 8, 12 and 16 bits. For the implementation, a radix-8 algorithm with eight parallel samples has been adopted, which reduces the area and the power consumption significantly compared to other algorithms and architectures. Moreover, a simple control has been used for this implementation, and voltage scaling has been applied to reduce the power. The EDA synthesis results show that, for a 16-bit wordlength, the FFT has a throughput of 2.64 GS/s, occupies 1.439 mm² of chip area and consumes 61.51 mW.
@mastersthesis{diva2:419729,
author = {Ahmed, Tanvir},
title = {{High Level Model of IEEE 802.15.3c Standard and Implementation of a Suitable FFT on ASIC}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4462--SE}},
year = {2011},
address = {Sweden},
}
Analog-to-Digital Converters (ADCs) can be classified into two categories namely Nyquist-rate ADCs and Oversampled ADCs. Nyquist-rate ADCs can process very high bandwidths while Oversampling ADCs provide high resolution using coarse quantizers and support lower input signal bandwidths. This work describes a Reconfigurable ADC (R-ADC) architecture which models 14 different ADCs utilizing four four-bit flash ADCs and four Reconfigurable Blocks (RBs). Both Nyquist-rate and Oversampled ADCs are included in the reconfiguration scheme. The R-ADC supports first- and second-order Sigma-Delta (ΣΔ) ADCs. Cascaded ΣΔ ADCs which provide high resolution while avoiding the stability issues related to higher order ΣΔ loops are also included. Among the Nyquist-rate ADCs, pipelined and time interleaved ADCs are modeled. A four-bit flash ADC with calibration is used as the basic building block for all ADC configurations. The R-ADC needs to support very high sampling rates (1 GHz to 2 GHz). Hence switched-capacitor (SC) based circuits are used for realizing the loop filters in the ΣΔ ADCs. The pipelined ADCs also utilize an SC based block called Multiplying Digital-to-Analog Converter (MDAC). By analyzing the similarities in structure and function of the loop filter and MDAC, a RB has been designed which can accomplish the function of either block based on the selected configuration. Utilizing the same block for various configurations reduces power and area requirements for the R-ADC.
In SC based circuits, the minimum sampling capacitance is limited by the thermal noise that can be tolerated in order to achieve a specific ENOB. The thermal noise in a ΣΔ ADC is subjected to noise shaping. This results in reduced thermal noise levels at the inputs of successive loop filters in cascaded or multi-order ΣΔ ADCs. This property can be used to reduce the sampling capacitance of successive stages in cascaded and multi-order ΣΔ ADCs. In pipelined ADCs, the thermal noise in successive stages is reduced due to the inter-stage gain of the MDAC in each stage. Hence scaling of sampling capacitors can be applied along the pipeline stages. The RB utilizes the scaling of capacitor values afforded by the noise shaping property of ΣΔ loops and the inter-stage gain of stages in pipelined ADCs to reduce the total capacitance requirement for the specified Effective Number Of Bits (ENOB). The critical component of the RB is the operational amplifier (opamp). The speed of operation and ENOB for different configurations are determined by the 3 dB frequency and DC gain of the opamp. In order to find the specifications of the opamp, the errors introduced in ΣΔ and pipelined ADCs by the finite gain and bandwidth of the opamp were modeled in Matlab. The gain and bandwidth requirements for the opamp were derived from the simulation results.
Unlike Nyquist-rate ADCs, the ΣΔ ADCs suffer from stability issues when the input exceeds a certain level. The maximum usable input level is determined by the resolution of the quantizer and the order of the loop filter in the ΣΔ ADC. Using Matlab models, the maximum value of input for different oversampling ADC configurations in the R-ADC were found. The results obtained from simulation are comparable to the theoretical values. The cascaded ADCs require digital filter functions which enable the cancellation of quantization noise from certain stages. These functions were implemented in Matlab. For the R-ADC, these filter functions need to run at very high sampling rates. The ΣΔ loop filter transfer functions were chosen such that their coefficients are powers of two, which would allow them to be implemented as shift and add operations instead of multiplications.
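The benefit of power-of-two coefficients can be illustrated as follows: a coefficient expressed as a sum of powers of two turns each filter multiplication into shifts and adds. This is a generic integer sketch of the idea; in fixed point, fractional coefficients map to right shifts in the same way:

```python
def shift_add_mul(x, shifts):
    """Multiply integer x by the coefficient sum(2**s for s in shifts)
    using only left shifts and additions, as a hardware datapath would."""
    return sum(x << s for s in shifts)
```

For example, multiplying by 10 = 2³ + 2¹ costs one 3-bit shift, one 1-bit shift and one adder, which is why such coefficients sustain very high sampling rates.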
The R-ADC configurations were simulated in Matlab. A schematic for the R-ADC was developed in Cadence using ideal switches and a finite gain, single-pole operational transconductance amplifier model. The ADC configuration was selected by four external bits. Performance parameters such as SNR, SNDR and SFDR obtained from simulations in Cadence agree with those from Matlab for all ADC configurations.
@mastersthesis{diva2:414043,
author = {Harikumar, Prakash and Muralidharan Pillai, Anu Kalidas},
title = {{A Study on the Design of Reconfigurable ADCs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4319--SE}},
year = {2011},
address = {Sweden},
}
Ericsson in Linköping houses one of the largest test laboratories within the whole Ericsson company. The laboratories mainly contain equipment for GSM, WCDMA and LTE. To test these systems, quite a large number of Radio Base Stations (RBSs) are needed, housed in a proportionately small area. Instead of sending signals through the air, cables are used to transfer the RF signals; in this way, the pieces of equipment communicating with each other are well specified. However, this may not be the case if leakage occurs.
This thesis work is about developing a system for monitoring the radio environment and detecting leakage in the test site. There is a need to define what a leakage really is, and measurements need to be performed in order to accomplish this. This report describes how the work proceeded towards the final implemented solution.
@mastersthesis{diva2:405656,
author = {Johansson, Emil and Myhrman, Kim},
title = {{GSM/WCDMA Leakage Detection System}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4428--SE}},
year = {2011},
address = {Sweden},
}
The goal of this bachelor thesis work was to establish a cable connection between an analogue interface board, containing a 16 bit analogue to digital converter, and a DE2 board in order to allow for digital data transmission between the two boards.
The DE2 board includes an FPGA which was configured to contain a Nios II softcore microprocessor for handling the tasks of reading and saving the 16 bit digital words transmitted over the cable as well as controlling the analogue to digital converter on the interface board.
During the project work, various tasks had to be fulfilled, including soldering the cable for parallel transmission of the 16-bit digital data words and the control signals between the boards, as well as configuring the analogue interface board with the correct voltage supplies and jumper settings. Furthermore, the hardware circuit inside the FPGA had to be configured, and the program running on the Nios II processor had to be written in C.
@mastersthesis{diva2:400488,
author = {Keller, Markus},
title = {{Connecting a DE2 board with a 5-6k interface board containing an ADC for digital data transmission}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--11/0382--SE}},
year = {2011},
address = {Sweden},
}
In this bachelor thesis, several software packages capable of calculating and simulating complex problems concerning the power losses in inductors and transformers with the finite element method have been evaluated and used to solve test cases provided by the commissioner. The software packages have been evaluated with respect to several requirements stated by the commissioner. The aim is to be able to simulate power losses and inductance levels in complex designs of inductors and transformers.
By reading the manuals of the software, an overview of the methods and equations the different packages use for their calculations has been established. The enclosed tutorials have provided the knowledge needed to operate the different packages. By designing the test models provided by the commissioner, a deeper understanding of the field has been reached. The test results provide answers for the test models: the behaviour of the magnetic field has been analysed, and the calculated power losses seem to correspond to the behaviour of the prototypes.
The evaluation of the software has been done with regard to the commissioner's requirements. The recommendation is to use either FEMM 4.2 or QuickField 5.7; both have a short learning curve and an interface that is easy to maintain. For problems requiring a transient analysis the recommendation is QuickField, but the material library is easier to maintain in FEMM 4.2. Regarding COMSOL Multiphysics 3.5 and Ansys Ansoft Maxwell Student Version 9, both are highly qualified for the complex calculations needed for these kinds of problems, but their learning curves are much longer than for the other two packages, so it would not be efficient for the commissioner to learn to fully use all their possibilities.
@mastersthesis{diva2:398896,
author = {Larsson, Jenny and Håkansson, David},
title = {{Evaluation of software using the finite element method by simulating transformers and inductors}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--11/0381--SE}},
year = {2011},
address = {Sweden},
}
Smart cameras are part of many automated systems for detection and correction. They can be used for detecting unwanted particles inside a fuel tank, or be placed on an airplane engine to provide protection from approaching obstacles. They can also be used to detect a fingerprint or other biometric identification on a document, or in some video domain. These systems are usually highly time-sensitive, needing to process images quickly in order to act on or extract information. Image compression algorithms are applied to the captured images to enable fast communication between the different nodes, i.e. the cameras and the processing units, and such algorithms are nowadays very popular in sensor-based smart camera networks. The challenge associated with these networks is the fast communication of images between nodes or to a centralized system, and this thesis therefore studies an algorithm used for that purpose. An in-depth study and a Matlab model of CCITT Group 4 TIFF compression are the target of this thesis. The work describes the CCITT TIFF algorithms in detail and provides a Matlab model of the compression algorithm for monochrome images.
The compressed images reach a compression ratio of about 1:15, which helps in fast communication and computation. A developed set of 8 test images with different characteristics in size and dimension is compressed by the Matlab model implementing CCITT Group 4 TIFF, and these compressed images are then compared with the same set of images compressed by other algorithms to compare the compression ratios.
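CCITT Group 4 codes each scan line relative to the previous one; as a simplified stand-in, a plain run-length coder on a bilevel row shows how long uniform runs are what yields high compression ratios on monochrome images (this is only an illustration, not the actual G4/MMR algorithm):

```python
def run_lengths(row):
    """Encode a bilevel row (0/1 pixels) as alternating run lengths,
    starting with a white (0) run, following fax coding convention."""
    runs = []
    current, count = 0, 0
    for px in row:
        if px == current:
            count += 1
        else:
            runs.append(count)
            current, count = px, 1
    runs.append(count)
    return runs
```

A mostly blank 64-pixel row collapses to a single run length, while Group 4 goes further by coding only the differences between consecutive rows.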
@mastersthesis{diva2:398511,
author = {Khan, Azam},
title = {{Algorithm study and Matlab model for CCITT Group4 TIFF Image Compression}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4451--SE}},
year = {2011},
address = {Sweden},
}
In modern home entertainment video systems the digital interconnection between the different components is becoming increasingly common. However, analog signal sources are still in widespread use and must be supported by new devices. In order to keep costs down, the digital and the analog receiver chains are implemented on a single die to form a system-on-chip (SoC). For such integrated circuits, it is beneficial to reduce the number of power supply domains to a minimum and preferably use the core voltage to power the analog circuits.
An eight-to-one input multiplexer, targeted for video digitizer applications, is presented. Together with the multiplexer, a simple current-mode DC restoration circuit is provided. The goal has been to design the circuits for a standard, single-well, 65 nm CMOS process, entirely using low-voltage core transistors and a single 1.1 V supply domain, while allowing the input signal voltages to extend beyond the supply rails.
To fulfill the requirements, a bootstrap technique has been proposed for the implementation of the multiplexer switches. Bootstrapping a CMOS switch allows high linearity, as well as wide bandwidth and dynamic range, to be achieved with a very low supply voltage. The simulated performance is: 3 dB bandwidth of 536 MHz with a 1.5 pF load at the output of the multiplexer and a SFDR of 65 dBc at 20 MHz and 1 Vp-p input signal. It has been verified that no transistor is stressed by high voltages, therefore, the circuit reliability is guaranteed. The DC restoration circuit utilizes the main video ADC, for measuring the DC level, and is capable of setting it with an accuracy of 60 μV within the range of 100 mV to 500 mV.
@mastersthesis{diva2:396428,
author = {Angelov, Pavel},
title = {{Design of an Input Multiplexer for Video Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4411--SE}},
year = {2011},
address = {Sweden},
}
Hearing aid devices are used to help people with hearing impairment. The number of people requiring hearing aid devices is probably constant over the years, but the number of people that now have access to them is increasing rapidly. Hearing aid devices must be small, consume very little power and be fairly accurate, even though it is often more important to the user that the device looks good (is discreet). Once a hearing aid device has been prescribed, the user needs to train and adjust it to compensate for the individual impairment.
Within the framework of this project we are researching hearing aid devices that can be trained by the hearing-impaired person her-/himself. This thesis is about finding a suitable noise cancellation algorithm for the hearing aid device. We consider several types of algorithms, such as microphone array signal processing, Independent Component Analysis (ICA) based on two microphones, called Blind Source Separation (BSS), and the DRNPE algorithm.
We ran these current, sophisticated and robust algorithms in various noise backgrounds, such as cocktail-party, street, public place, train and babble situations, to test their efficiency. The BSS algorithm performed well in some situations and gave average results in others, whereas the single-microphone approach gave steady results in all situations. The output is good enough to listen to the targeted audio.
The functionality and performance of the proposed algorithm were evaluated with different non-stationary noise backgrounds. From the performance results it can be concluded that the proposed algorithm reduces the noise to a certain level. SNR, system delay, minimum error and audio perception are the key parameters used to evaluate the algorithms, and based on these parameters an algorithm is suggested for the hearing aid.
@mastersthesis{diva2:443599,
author = {Ardam, Nagaraju},
title = {{Study of ASA Algorithms}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4334--SE}},
year = {2010},
address = {Sweden},
}
The Analog to Digital Converter (ADC) is an inevitable part of the video Analog Front Ends (AFE) found in electronic displays today. The need to integrate more functionality on a single chip (thereby shrinking area) poses great design challenges in terms of achieving low power and the desired accuracy.
The thesis initially focuses on the selection of a suitable ADC architecture for a high-definition video analog front end. The Successive Approximation Register (SAR) ADC is the selected architecture, as it scales down with technology, has a very small analog part and has minimal power consumption. In the second phase a mathematical model of a Time-Interleaved Successive Approximation Register (TI-SAR) ADC is developed, which emulates the behavior of a SAR ADC in Matlab; the errors characteristic of the time-interleaved structure are also modeled. In the third phase a behavioral model of a TI-SAR ADC with 16 channels and 12-bit resolution is built using a top-down methodology in the Cadence simulation tool. All modules are modeled at the behavioral level in Verilog-A. The functionality of the model is verified by simulation using a 30 MHz signal and a 300 MHz clock frequency, with a supply voltage of 1.2 V. The desired SNDR (Signal to Noise and Distortion Ratio) of 74 dB is achieved. In the final phase two comparator architectures are implemented at schematic level in 65 nm technology. Simulation results show that an SNDR of 71 dB is achievable with a minimal power consumption of 169.6 μW per comparator running at 300 MHz.
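The successive-approximation principle behind the models described above can be sketched in a few lines: the converter runs a binary search, testing one bit per clock cycle against the input. This is a minimal behavioral sketch, not the thesis model; the resolution and reference voltage below are illustrative.

```python
def sar_convert(vin, vref=1.2, bits=12):
    """One SAR ADC conversion: binary search, MSB first."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)               # tentatively set this bit
        if (trial / (1 << bits)) * vref <= vin:
            code = trial                        # keep the bit if DAC <= input
    return code

# A mid-scale input (vref/2) lands on the mid-scale code:
print(sar_convert(0.6))   # -> 2048
```

A time-interleaved converter runs several such channels on staggered clocks; the channel-to-channel offset, gain and timing mismatches that then appear are exactly the errors the thesis models on top of this ideal behavior.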
@mastersthesis{diva2:383331,
author = {Qazi, Sara},
title = {{Study of Time-Interleaved SAR ADC and Implementation of Comparator for High Definition Video ADC in 65nm CMOS Process}},
school = {Linköping University},
type = {{LiTH-ISY-EX--2010/4344--SE}},
year = {2010},
address = {Sweden},
}
This thesis work describes the implementation of an integrated high-efficiency DC-DC converter in 65 nm CMOS. The implemented system employs the Buck converter topology to down-convert the input battery voltage, and offers its use as a power management unit in portable battery-operated devices.
The thesis includes a description of a basic Buck converter along with the key equations that describe the Buck operation and are used to deduce the requirements for the various internal building blocks of the system. A detailed description of the operation and design of each building block is included.
The implemented system can convert the input battery voltage in the range of 2.3 V to 3.6 V into an output supply voltage of 1.6 V. The system uses dual-mode feedback control to maintain the output voltage at 1.6 V. For the low load currents the PFM feedback control is used and for the higher load currents the PWM feedback control is used. This converter can supply load currents from 0 to 300 mA with efficiency above 85%. The static line regulation of the system is < 0.1% and the load regulation of the system is < 0.3%. A digital soft-start circuit is implemented in this system. The system also includes the capability to trim the output voltage in ~14 mV steps depending on the 4-bit input digital code.
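The voltage ranges above can be related through the ideal steady-state buck relation V_out = D * V_in (continuous conduction mode, lossless), which shows what duty-cycle range the PWM controller must cover over the quoted battery range. This is only a sketch of the textbook relation; the thesis design additionally handles losses and dual-mode PFM/PWM control.

```python
def duty_cycle(v_out, v_in):
    """Ideal CCM duty cycle of a buck converter: D = Vout / Vin."""
    return v_out / v_in

# Duty-cycle range needed for a fixed 1.6 V output over the battery range:
for v_batt in (2.3, 3.0, 3.6):
    print(f"Vin = {v_batt} V -> D = {duty_cycle(1.6, v_batt):.2f}")
```

As the battery discharges from 3.6 V toward 2.3 V, the duty cycle rises from roughly 0.44 toward 0.70, which the feedback loop tracks to keep the output regulated.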
@mastersthesis{diva2:361085,
author = {Manh, Vir Varinder},
title = {{An Integrated High Efficiency DC-DC Converter in 65 nm CMOS}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4408--SE}},
year = {2010},
address = {Sweden},
}
In some applications polynomials must be evaluated, e.g., in polynomial approximation of elementary functions and in Farrow filters for arbitrary re-sampling. For polynomial evaluation, Horner's scheme uses the minimum amount of hardware resources, but it is sequential. Many algorithms have been developed to introduce parallelism into polynomial evaluation. This parallelism is achieved at the cost of hardware, but ensures evaluation in less time.
This work examines the trade-off between hardware cost and critical path for different levels of parallelism in polynomial evaluation. The trade-offs in generating powers using different building blocks (squarers and multipliers) are also discussed, as are the wordlength requirements of the evaluation and the effect of the power generating schemes on the timing of operations. The area requirements are calculated using Design Analyzer from Synopsys (a logic synthesis tool), and GLPK (the GNU Linear Programming Kit) is used to calculate the bit requirements.
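The sequential/parallel trade-off can be seen by putting Horner's scheme next to Estrin's scheme, one of the standard parallel evaluation methods: Estrin spends an extra squarer so that the coefficient pairs can be evaluated concurrently, shortening the critical path. The degree-3 case below is a sketch of the idea, not the thesis implementation.

```python
def horner(coeffs, x):
    """Horner's scheme: minimal hardware, but n multiplies in series."""
    result = 0
    for a in reversed(coeffs):       # coeffs[k] is the coefficient of x**k
        result = result * x + a
    return result

def estrin_deg3(a, x):
    """Degree-3 Estrin: the two pairs can run in parallel, joined by x^2."""
    x2 = x * x                       # one squarer generates the power
    return (a[0] + a[1] * x) + (a[2] + a[3] * x) * x2

coeffs = [1, 2, 3, 4]                # p(x) = 1 + 2x + 3x^2 + 4x^3
print(horner(coeffs, 2), estrin_deg3(coeffs, 2))   # -> 49 49
```

Horner needs 3 dependent multiply-adds here, while Estrin's longest dependency chain is 2 multiplies plus an add, at the cost of one extra multiplier; this is exactly the area-versus-critical-path trade-off the thesis quantifies.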
@mastersthesis{diva2:354917,
author = {Nawaz Khan, Shahid},
title = {{Parallel Evaluation Of Fixed-Point Polynomials}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4406--SE}},
year = {2010},
address = {Sweden},
}
Development and construction of an electronic breakout box is the main work of this thesis. The box is part of a test system for the Fuel Flow Transmitter component and should convert signals to be suitable for a frequency counter. A previously constructed breakout box for this purpose had grown old and needed to be replaced, so SAAB Aerotech, Aircraft Services, the company behind the thesis work, wanted a new, more sustainable breakout box adapted to more modern technology. The signals to the box come from the transmitter and must be converted into signals suitable for a frequency counter, so that it can show pulses and the time difference between the signals. Both a digital and an analog approach have been examined in this work. The analog solution worked better, because the conversion could be performed with operational amplifiers instead of algorithms in a microprocessor. Many problems that were not anticipated at the outset occurred during the work, so the most important skill proved to be the ability to solve these problems. The breakout box finally met the requirements of the specification and will in the future replace the old breakout box as a component in the test system for the Fuel Flow Transmitter.
@mastersthesis{diva2:353807,
author = {Hjärtström, Markus},
title = {{Utveckling av Breakoutbox för Fuel Flow Transmitter}},
school = {Linköping University},
type = {{LITH-ISY-EX-ET--10/0359--SE}},
year = {2010},
address = {Sweden},
}
This thesis presents work done to develop the hardware of a flight control system (FCS) for an unmanned aerial vehicle (UAV). While as important as the mechanical construction and the control algorithms, the electronics hardware has received far less attention in published works.
In this work we first provide an overview of existing academic and commercial UAV projects, and based on this overview three different design approaches have been developed: a network of independent microcontrollers, a central powerful CPU with helper logic, and a field-programmable gate array (FPGA) based approach. After evaluation, the powerful-CPU alternative with an ARM9 CPU is found to be most suitable.
As a final step this design approach is developed into a full design for the FCS, which is evaluated and finally implemented. Initially a system incorporating an OMAP-L138 CPU, 256 MByte DRAM, sensors and GPS was developed; however, due to supply issues and cost limitations the final design instead incorporates a SOM module with an OMAP35x processor and 128 MByte DRAM, as well as a sensor module and GPS. This design has been built and tested in the lab but not yet integrated into the UAV.
@mastersthesis{diva2:351926,
author = {Svanfeldt, Mårten},
title = {{Design of the hardware platform for the flight control system in an unmanned aerial vehicle}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4366--SE}},
year = {2010},
address = {Sweden},
}
This bachelor thesis project was carried out at London South Bank University in the UK, in cooperation with staff from Linköping University in Sweden.
This report guides the reader through the techniques used to achieve a successful cooler/fan controller project on a minimal budget with good energy-saving methods.
The steps for setting up the software and components used are supported with figures and diagrams. The report gives a full explanation of the components and mathematics used, in addition to a complete working code.
@mastersthesis{diva2:347417,
author = {Jones, Omar},
title = {{DESIGN AND DEVELOPMENT OF AN EMBEDDED DC MOTOR CONTROLLER USING A PID ALGORITHM}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4417--SE}},
year = {2010},
address = {Sweden},
}
Reconfigurable computing is an old concept that during the past couple of decades has become increasingly popular. The concept combines the flexibility of software with the performance of hardware. One important contributing factor to the rise in popularity is the presence of FPGAs (field-programmable gate arrays), which realize the concept by allowing the hardware to be reconfigured dynamically. The current state of reconfigurable computing is discussed further in the thesis.
Debugging is a vital part in the development of a hardware design. It can be done in several ways depending on the situation. The most common way is to perform simulations but in some cases the fault-finding has to be done when the design is implemented in hardware.
In this thesis a framework concept is designed that utilizes and evaluates some of the reconfigurable computing ideas. The framework provides debugging possibilities for FPGA designs in a novel way, with a modular system where each module provide means to aid finding a specific fault. The framework is added to an existing design, and offers the user a glimpse into the design behavior and the hardware it runs on.
One of the debug modules will be released separately under a free license. It allows the developer to see the contents of the memories in a design without requiring special debugging equipment.
@mastersthesis{diva2:327626,
author = {Siverskog, Jacob},
title = {{Evaluation of partial reconfiguration for FPGA debugging}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4390--SE}},
year = {2010},
address = {Sweden},
}
The IceCube telescope is built into the ice at the geographic South Pole, in the middle of the Antarctic continent. Its purpose is to detect muon neutrinos, elementary particles with minuscule mass coming from space.
The detector consists of some 5000 DOMs (digital optical modules) registering photon hits (light). A muon neutrino traveling through the detector may give rise to a track of photons making up a straight line, and by analyzing the hit output of the DOMs, looking for tracks, neutrinos and their directions can be detected.
When processing the output, triggers are used. Triggers are calculation- efficient algorithms used to tell if the hits seem to make up a track - if that is the case, all hits are processed more carefully to find the direction and other properties of the track.
The Track Engine is an additional trigger, specialized to trigger on low- energy events (few track hits), which are particularly difficult to detect. Low-energy events are of special interest in the search for Dark Matter.
An algorithm for triggering on low-energy events has been suggested. Its main idea is to divide time in overlapping time windows, find all possible pairs of hits in each time window, calculate the spherical coordinates θ and ϕ of the position vectors of the hits of the pairs, histogram the angles, and look for peaks in the resulting 2d-histogram. Such peaks would indicate a straight line of hits, and, hence, a track.
It is not believed that a software implementation of the algorithm would be fast enough. The Master's Thesis project has had the aim of developing an FPGA implementation of the algorithm.
Such an FPGA implementation has been developed. Extensive tests of the design have yielded positive results showing that it is fully functional. The design can be synthesized to about 180 MHz, making it possible to handle an incoming hit rate of about 6 MHz, a margin of more than a factor of two over the expected average hit rate of 2.6 MHz.
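The trigger algorithm described above can be sketched in software: within a time window, take all pairs of hits, compute the spherical angles (θ, ϕ) of each pair's direction vector, and histogram them; hits lying on a straight line pile up in one (θ, ϕ) bin. The bin count and peak criterion below are illustrative assumptions, not the thesis parameters.

```python
import math
from itertools import combinations
from collections import Counter

def pair_angle_histogram(hits, bins=18):
    """hits: list of (x, y, z) DOM positions within one time window."""
    hist = Counter()
    for (x1, y1, z1), (x2, y2, z2) in combinations(hits, 2):
        dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        theta = math.acos(dz / r)                 # polar angle of the pair
        phi = math.atan2(dy, dx)                  # azimuthal angle of the pair
        tb = min(int(theta / math.pi * bins), bins - 1)
        pb = min(int((phi + math.pi) / (2 * math.pi) * bins), bins - 1)
        hist[(tb, pb)] += 1                       # 2-D histogram of directions
    return hist

# Five collinear hits: all 10 pairs share one direction, one clear peak.
track = [(t, t, t) for t in range(5)]
peak_bin, peak_count = pair_angle_histogram(track).most_common(1)[0]
print(peak_count)   # -> 10
```

The FPGA version does the same pairing and binning in a pipeline, which is why it reaches the hit rates a software implementation is not believed to sustain.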
@mastersthesis{diva2:328163,
author = {Wernhoff, Carl},
title = {{An FPGA implementation of neutrino track detection for the IceCube telescope}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4174--SE}},
year = {2010},
address = {Sweden},
}
SAAB Support and Services, which is a service centre for aircraft components, today performs the main part of its measurements manually, measurements that can sometimes take up to four days. To increase accuracy and efficiency, the company purchased an automatic test concept from MK Test Systems in 2007.
In this thesis we first examined the purchased equipment.
We then developed calibration routines for the equipment that meet SAAB's requirements. After that we produced requirement specifications and instructions for how the equipment should be used. During the work we collected information in order to evaluate how suitable the equipment is for testing aircraft components.
The work mainly resulted in three manuals covering three different areas: calibration, cable harness testing, and a standard test for e.g. panels and control boxes.
@mastersthesis{diva2:326955,
author = {Croner, Len},
title = {{Utvärdering av MK F1500 testutrustning}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--10/0371--SE}},
year = {2010},
address = {Sweden},
}
As the complexity of Very Large Scale Integration (VLSI) circuits increases dramatically with improvements in technology, there is great interest in shifting various applications from the analog to the digital domain. While there are many platforms available for this shift, Field Programmable Gate Arrays (FPGAs) hold an attractive position because of their performance, power consumption and configurability. Compared with Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs), the FPGA stands in the middle: it is easier to implement a function on an FPGA than on an ASIC, which performs a fixed operation, and although a DSP can implement versatile functions, its computational power is not high enough to support the high data rates an FPGA can handle.
This report is the outcome of a master thesis at Linköping University, Sweden. It tries to cover both the theoretical and the hardware aspects of implementing a Farrow structure for sample rate conversion on an FPGA.
The intention of this work was to contribute to what is nowadays a main focus of communication engineers: designing flexible radio systems. Flexible radio systems are interactive and dynamic by definition, which is why a low-cost, flexible multimode terminal is crucially important to support different telecommunication standards and scenarios. In this thesis, an FPGA implementation of the complete Farrow system is presented. Matlab/Simulink and VHDL are used as the primary software tools in this thesis work.
@mastersthesis{diva2:326941,
author = {Azizi, Kaveh},
title = {{FPGA Implementation of a Multimode Transmultiplexer}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4422--SE}},
year = {2010},
address = {Sweden},
}
In this bachelor thesis a complete prototype of an industrial vibration measurement platform has been developed. By measuring a number of variables, such as acceleration, temperature and speed, conclusions can be drawn about machinery health. The aim is to evaluate hardware and software solutions for a possible future product. Based on a requirement specification a suitable hardware design has been developed. The hardware consists of a four-layer PCB with an ARM Cortex-M3 microcontroller and about 250 other components. The PCB was designed, assembled, tested and finally housed in a box. Measures have been taken to protect the prototype against external disturbances, such as inappropriate supply voltages and transients on the input stages.
Software has been written for the microcontroller to perform the various measurements required by the prototype, including RMS, integration and filtering. Special attention was paid to the latter by implementing filters based on lattice wave digital structures, which result in a very efficient implementation. Care has been taken to be able to generate arbitrary filters independent of the characteristics and design method. To save time, the microcontroller implements all algorithms without any floating-point numbers.
Furthermore, both hardware and software are adapted for future industrial use. The finished prototype supports a number of communication interfaces, among which Modbus (RS-485) and current-loop communication can be mentioned. The final result is a well-performing platform with strong future potential.
The work was commissioned by the consulting firm Syncore Technologies AB at their office in Mjärdevi, Linköping. The project took 10 weeks in total and was carried out during spring 2010.
@mastersthesis{diva2:325931,
author = {Tegelid, Simon and Åström, Jonas},
title = {{Konstruktion av Industriellt Vibrationsmätningssystem med signalbehandling baserad på Digitala Vågfilter av Lattice-struktur}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--10/0377--SE}},
year = {2010},
address = {Sweden},
}
Our task is to investigate whether we can improve the output signal quality of a digital TV receiver card by oversampling the A/D converter. We program our code in an FPGA in which only 3 multipliers are free. The output of our filter must have the same rate as before the oversampling, so we build an FIR filter that decimates the signal. We chose this filter type to be able to exploit symmetry and minimize the number of multipliers. We used a filter of order 44, which gives 45 coefficients. These coefficients were computed in Matlab with the function "firls", which minimizes the energy in the stopband.
We measured the SNR and the group delay. The measurements showed that the SNR improved by only 0.7 dB and that the group delay was not noticeably affected. To improve the SNR and find the error source limiting the signal, we took the following steps:
- Verified that the instruments really could measure such high decibel values.
- Verified that there are no limitations on the output.
- Verified that there is no interference on the input.
- Changed coefficients in the filter code to vary the filter characteristics.
These measures did not improve the SNR of the design. Due to lack of time we could not continue our investigations. What we should have done from the start is to create a testbench for our filter code to be able to verify that it worked; we cannot establish with certainty that the filter really meets the initial requirements. We would also try to improve the clock to the FPGA, since it may introduce clock jitter, and create more measurement points in the chain, to be able to measure the signal after the A/D converter and directly after our filter.
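The multiplier-saving trick behind the filter choice can be sketched as follows: a linear-phase FIR filter has symmetric coefficients h[k] == h[N-1-k], so the two input samples sharing a coefficient can be added before the multiply, roughly halving the multiplier count (23 multiplies for 45 taps). The toy 5-tap coefficients below are placeholders, not the firls design from the thesis.

```python
def symmetric_fir(h, x):
    """Direct-form symmetric FIR: pre-add mirrored taps, then multiply."""
    n = len(h)
    assert all(h[k] == h[n - 1 - k] for k in range(n)), "needs symmetric taps"
    half = n // 2
    padded = [0] * (n - 1) + list(x)             # zero initial state
    out = []
    for i in range(len(x)):
        win = padded[i:i + n][::-1]              # newest sample first
        acc = sum(h[k] * (win[k] + win[n - 1 - k]) for k in range(half))
        if n % 2:                                # centre tap multiplied alone
            acc += h[half] * win[half]
        out.append(acc)
    return out

def decimate2(y):
    """Keep every second output so the output rate matches the original."""
    return y[::2]

h = [0.1, 0.2, 0.4, 0.2, 0.1]                    # toy symmetric 5-tap filter
print(decimate2(symmetric_fir(h, [1, 0, 0, 0, 0, 0])))
```

In a decimating implementation the discarded outputs need never be computed at all, which further reduces the multiplier load and is what makes the 3 free FPGA multipliers workable.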
@mastersthesis{diva2:325823,
author = {Bergstrand, Johan},
title = {{Förbättra SNR i en digital TV-box genom översampling av A/D-omvandlare}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--10/0373--SE}},
year = {2010},
address = {Sweden},
}
CMMB (China Multimedia Mobile Broadcasting) is a wireless broadcast channel standard for low-bandwidth, low-cost hand-held digital TV, adopted by all mainland Chinese government TV broadcasters and some Hong Kong private TV broadcasters. The business potential is high, yet the future is hard to predict because it might be replaced by GB20600 (DTMB). The digital modulation is based on OFDM, with pilots supporting channel estimation and equalization and a cyclic prefix (CP) mitigating multi-path induced ISI.
This thesis investigates the implementation of a CMMB system on an SDR platform. A simulation chain with full data precision, including the CMMB transmitter and receiver, was implemented in MATLAB. The transmitter behavior model includes the RS encoder, LDPC encoder, OFDM modulation, etc.; the receiver behavior model includes OFDM demodulation, channel estimation, channel equalization, the LDPC decoder, the RS decoder, etc. Different channel models emulating path loss, white noise, multi-path and glitches were developed. Based on the simulation chain and channel models, time-domain and frequency-domain channel estimators and equalizers were implemented and optimized, and optimized TD-FD models for different mobility scenarios were proposed. The focus of the thesis is on 2D (FD-TD) channel estimation and equalization.
@mastersthesis{diva2:324868,
author = {Gu, Haohao and Zhang, He},
title = {{Implementation of CMMB System using Software Defined Radio (SDR) Platform}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4305--SE}},
year = {2010},
address = {Sweden},
}
In this thesis, a single-carrier ATSC DTV baseband transmitter and part of the receiver (including the channel estimator and channel equalizer) were modeled. Since multi-path induced ISI (inter-symbol interference) has the most significant impact on the performance of single-carrier DTV reception, modeling and implementation of the single-carrier channel estimator and channel equalizer have been the focus of the thesis. We started with an investigation of channel estimation methods. Afterwards, several channel estimators and equalizers were modeled and the performance of each channel equalization method was evaluated in different scenarios. Our results show that the frequency-domain equalizer can achieve low computing cost and handle long-delay paths. Another important issue to be considered in block equalization is Inter-Block Interference (IBI), whose impact was investigated via behavior modeling. In the last part of the thesis, two methods for IBI cancellation are compared and a proposal for hardware implementation is given.
@mastersthesis{diva2:306082,
author = {Jian, Wang and Yan, Xie},
title = {{Behavior Modeling of a Digital Video Broadcasting System and the Evaluation of its Equalization Methods}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4309--SE}},
year = {2010},
address = {Sweden},
}
Random numbers are required for cryptographic applications such as IT security products, smart cards, etc., and hardware-based random number generators are widely employed. Cryptographic algorithms are implemented on Field Programmable Gate Arrays (FPGAs). In this work a True Random Number Generator (TRNG) for space applications was designed, investigated and evaluated. Several cryptographic requirements have to be satisfied by the random numbers. Two different noise sources were designed and implemented on the FPGA. The first design was based on ring oscillators as the noise source. The second design was based on astable oscillators built on a separate hardware board and interfaced with the FPGA as another noise source. The main aim of the project was to analyse the important requirement of an independent noise source on a physical level. Jitter from the oscillators, being the source of the randomness, was analysed for both noise sources. The generated random sequences were finally subjected to statistical tests.
@mastersthesis{diva2:305133,
author = {Shanmuga Sundaram, Prassanna},
title = {{Development of a FPGA-based True Random Number Generator for Space Applications}},
school = {Linköping University},
type = {{LITH-ISY-EX--10/4398--SE}},
year = {2010},
address = {Sweden},
}
This report is part of a master thesis project done at Ericsson Linköping in cooperation with Linköpings Tekniska Högskola (LiTH). The project is divided into two parts. The first part is to create a measurement node that collects and processes data from network time protocol (NTP) servers. It is used to determine the quality of the IP network at the node and to detect potential defects in the timeservers or nodes on the network. The second assignment is to analyze the collected data and further improve the existing synchronization algorithm. IP communication is not designed to be time critical, and therefore the NTP protocol needs to be complemented with additional signal processing to achieve the required accuracy. Real-time requirements limit the computational complexity of the signal processing algorithm.
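The raw samples such a measurement node collects come from the standard NTP on-wire calculation: from the four timestamps of one request/response exchange, the clock offset and round-trip delay are estimated. The thesis then applies further signal processing to series of these samples; the sketch below shows only the basic per-exchange formulas, with illustrative timestamp values.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """t0: client send, t1: server receive, t2: server send, t3: client receive.

    Returns (offset, delay): the estimated client clock error relative to
    the server, and the network round-trip time, both in seconds.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

# Example: client clock 5 ms behind the server, 20 ms symmetric round trip.
print(ntp_offset_delay(0.000, 0.015, 0.016, 0.021))
```

The offset estimate is only exact when the path is symmetric; asymmetric delay shows up directly as offset error, which is one reason a plain NTP exchange is not accurate enough and additional filtering is needed.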
@mastersthesis{diva2:303708,
author = {Gustafsson, Andreas and Hir, Danijel},
title = {{High precision frequency synchronization via IP networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4394--SE}},
year = {2010},
address = {Sweden},
}
This report describes both the evaluation and an implementation of the upcoming image compression standard JPEG XR. The intention is to determine whether JPEG XR is an appropriate standard for IP-based video surveillance. Video surveillance, especially IP-based video surveillance, currently has a growing role in the security market. To suit surveillance, the video stream generated by the camera must have a low bit-rate and low network latency while maintaining a high dynamic range. The thesis starts with an in-depth study of the JPEG XR encoding standard. Since the standard allows different settings, optimized settings are applied to the JPEG XR encoder to fit the requirements of network video surveillance. A comparative evaluation of JPEG XR versus JPEG is then presented, in both objective and subjective terms. Later, part of the JPEG XR encoder is implemented in hardware as an accelerator for further evaluation, with SystemVerilog as the coding language; a TSMC 40 nm process library and the Synopsys ASIC tool chain are used for synthesis. The throughput, area and power of the encoder are given and analyzed. Finally, the integration of the JPEG XR hardware encoder into the Axis ARTPEC-X SoC platform is discussed.
@mastersthesis{diva2:302650,
author = {Yu, Lang},
title = {{Evaluating and Implementing JPEG XR Optimized for Video Surveillance}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4300--SE}},
year = {2010},
address = {Sweden},
}
Last updated: 2010-08-26