# Asymmetric Stencil Approach for Latency Reduction of Real-Time Peak Detection Using AMPD Algorithm and FPGA Technology

Alperen Mustafa COLAK\*<sup>‡</sup>, Taito MANABE\*, Yuichiro SHIBATA\*, Fujio KUROKAWA \*\*

\* School of Engineering, Graduate School of Engineering Nagasaki University 1-14 Bunkyo, Nagasaki, Japan 852-8521.
 \*\*Nagasaki Institute of Applied Science, Faculty of Engineering, 536 Aba Machi, Nagasaki, Japan, 851-0193.
 (colak@pca.cis.nagasaki-u.ac.jp, manabe@pca.cis.nagasaki-u.ac.jp, shibata@cis.nagasaki-u.ac.jp, kurokawa\_fujio@nias.ac.jp)

<sup>‡</sup>Corresponding Author; Alperen Mustafa COLAK, <u>colak@pca.cis.nagasaki-u.ac.jp</u>

Received: 10.02.2020 Accepted: 26.03.2021

Abstract- In many signal processing applications, peak detection is an essential stage. However, a high rate of false positive peak identification is a serious problem because of the complexity of the signals and multiple noise sources. For this reason, the authors previously implemented a modified Automatic Multiscale Peak Detection (AMPD) algorithm for arbitrary time series data on a Field-Programmable Gate Array (FPGA). In addition, an approximation with an asymmetric stencil is proposed to reduce the pipeline latency. This paper focuses on evaluating the trade-off between the latency reduction and the accuracy of peak point detection for the real-time peak detection method developed in the previous study using the AMPD algorithm and FPGA technology.

Keywords- automatic multiscale-based peak detection, latency reduction, FPGA.

# 1. Introduction

Detection of peaks in time-varying measured signals is a fundamental step for various signal processing and control algorithms utilized in most power electronics and renewable energy systems. Although peak detection appears to be a rather straightforward task at first glance, it requires sophisticated calculation techniques since measured signals often suffer from noise and distortion. Therefore, a number of methodologies based on manual/automatic and supervised/unsupervised techniques have been proposed for the peak detection task in the literature.

Li et al. replaced the Gaussian smoothing in the continuous wavelet transform with peak-preserving diffusion filtering, and the false discovery rate of the proposed algorithm was improved for four simulated proteomics datasets [1]. Zheng et al. combined the crazy climber algorithm and the continuous wavelet transform for peak detection in mass spectrometry, and the combined approach was found to be good at identifying low-amplitude and overlapping peaks [2]. Sachin Kumar et al. applied a total variation denoising approach to detect the R-peaks in electrocardiogram signals, and low false-negative beats and high false-positive beats were obtained [3]. Rahul et al. used a template waveform and adaptive thresholding to detect P, QRS, and T peaks in electrocardiogram signals, and the method showed low computational complexity in comparison to heuristic approaches [4]. Vadrevu et al. integrated center of gravity and variational mode decomposition methods for identifying systolic peaks in the photoplethysmography signal, and the designed strategy achieved good sensing performance under noisy conditions [5].

Scholkmann et al. presented an automatic peak detection algorithm based on the local maxima scalogram, and the efficiency of peak detection was increased for quasi-periodic and noisy periodic signals [6]. Schmidt et al. employed a convolutional neural network for peak detection and localization in a noisy signal, and it outperformed the continuous wavelet transform in terms of signal processing performance [7]. Liu et al. introduced a Hilbert transform-based multi-peak detection algorithm for sensing with optical fiber Bragg gratings, and the demodulation accuracy and speed were enhanced [8]. Bodendorfer et al. compared interpolated maximum search, linear phase operator, parabolic fit and Gaussian fit subpixel algorithms for fiber Bragg grating interrogators, and the absolute values of peak wavelengths were evaluated [9]. Tolt et al. proposed a least squares minimization-based impulse response function in order to detect the peaks in time-correlated single-photon counting lidar data, and it provided a low false alarm rate [10].

Guo et al. designed a sinusoidal voltage peak detector including an ADC, a second-order RC filter and a full-wave rectifier circuit, and Fourier analysis exhibited good performance from 20 Hz to 500 kHz [11]. Wu et al. utilized a differential structure in designing an mV-level real-time peak voltage detector, and the detection error was decreased for an amplitude of 10 mV and a signal frequency of 20 kHz [12]. Manitha et al. developed a fundamental voltage peak detection controller for series active filters, and it performed better than instantaneous reactive power theory and synchronous reference frame-based controllers [13]. Ahmad et al. utilized a golden band search algorithm in order to select the global peak in the photovoltaic characteristics curve, and the efficiency of the photovoltaic system was improved under partial shading conditions [14]. Lee et al. sensed the voltage sags in the grid voltage by a single-phase digital phase lock loop based on a d-q transformation, and the latency in the peak detection process was decreased [15].

#### INTERNATIONAL JOURNAL of RENEWABLE ENERGY RESEARCH A.M. COLAK et al., Vol.11, No.1, March, 2021

In addition to these works, Weibull Pareto sine-cosine optimization method [16], Gaussian fitting-based Levenberg-Marquardt method [17], multilayer perceptron [18], lifting wavelet transform [19], delta square operation [20], Kalman filtering [21], etc. were also employed for different peak detection tasks in the literature.

Most of these algorithms introduce frequency domain processing to separate signals of interest from noise and are highly effective for off-line data analysis. However, since information in the time domain is not directly handled, this approach is not suitable for real-time applications such as control of power electronics systems, where low latency processing is essential. In order to achieve a peak detection method that is both robust to noise and capable of real-time performance, a Field-Programmable Gate Array (FPGA) based algorithm was proposed by the authors [22], [23]. This algorithm is based on the Automatic Multiscale-based Peak Detection (AMPD) algorithm [6], which can robustly detect peak points from noisy periodic signals. However, the original AMPD is an off-line algorithm, which must store all the data in memory before starting the analysis and is not directly applicable to real-time application domains. To cope with this problem, the algorithm was modified into a non-storing process and was implemented on an FPGA as pipelined hardware.

Although this pipelined architecture is effective for increasing the throughput, an essential latency due to data comparison still remains. This paper proposes a latency reduction approach using an asymmetric stencil of data comparison, aiming at achieving a better balance between latency and accuracy. The use of an asymmetric stencil shortens the wait for comparison data at the cost of introducing some approximation. Through the evaluation experiments, trade-offs between the latency improvement and detection accuracy are revealed.

### 2. AMPD Algorithm Design

The main idea of the AMPD peak detection algorithm is the use of the local maxima scalogram (LMS), which is a matrix consisting of local maxima information for the given input signal. Let $y_i$, where $i \in \{0, 1, 2, 3, ..., n\}$, be the time series input signal values. The LMS for $y_i$ is defined as:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & x_{2,3} & \cdots & x_{2,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{W,1} & x_{W,2} & x_{W,3} & \cdots & x_{W,n} \end{bmatrix} = (x_{a,b}) \tag{1}$$

where W is given by:

$$W = \left\lceil\frac{n}{2}\right\rceil - 1 \tag{2}$$

and each element is calculated as:

$$x_{a,b} = \begin{cases} 0, & \text{if } (y_{b-1} > y_{b-a-1}) \land (y_{b-1} > y_{b+a-1}) \\ r, & \text{otherwise} \end{cases} \tag{3}$$

where *r* is a random number such that 1 < r < 2. As Equation (3) shows, an element of the LMS matrix whose value is 0 indicates a local maximum of the input signal. The larger the row number (*a*), the lower the frequency to which the local maximum corresponds. The number of rows of the LMS matrix (*W*) is called the scale and corresponds to the longest distance between the data pairs that are compared. While the value of the scale is automatically tuned in the original AMPD algorithm, a fixed scale approach was taken in the authors' FPGA implementation to enable an efficient pipeline structure [22]. As the final step of the algorithm, the column-wise dispersion of the elements of the LMS matrix is calculated as:

$$\sigma_b = \frac{1}{W-1} \sum_{a=1}^{W} \left[ \left( x_{a,b} - \frac{1}{W} \sum_{k=1}^{W} x_{k,b} \right)^2 \right]^{\frac{1}{2}} \tag{4}$$

where $b \in \{1, 2, 3, ..., n\}$. If the value of $\sigma_b$ is close enough to 0, the corresponding input data $y_b$ can be considered a local maximum with respect to most frequencies and is detected as a peak point.
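The steps in Equations (1) to (4) can be sketched in software as follows. This is a minimal off-line illustration with a fixed scale, not the authors' FPGA design: the function names, the 0-based index `i = b - 1`, the treatment of out-of-range comparisons as "otherwise", and the tolerance `eps` are all assumptions made for the example.

```python
import random


def lms_fixed_scale(y, W, rng=None):
    """Build the W x n local maxima scalogram of Eq. (3) for a fixed scale W."""
    rng = rng or random.Random(0)  # seeded only to make the sketch repeatable
    n = len(y)
    X = [[0.0] * n for _ in range(W)]
    for a in range(1, W + 1):          # row index a = comparison distance
        for i in range(n):             # 0-based column, i = b - 1
            in_range = i - a >= 0 and i + a < n
            if in_range and y[i] > y[i - a] and y[i] > y[i + a]:
                X[a - 1][i] = 0.0
            else:
                # r drawn from [1, 2); boundary columns are assumed "otherwise"
                X[a - 1][i] = 1.0 + rng.random()
    return X


def column_dispersion(X):
    """Column-wise dispersion of Eq. (4); [(.)^2]^(1/2) is an absolute value."""
    W, n = len(X), len(X[0])
    sigma = []
    for b in range(n):
        col = [X[a][b] for a in range(W)]
        mean = sum(col) / W
        sigma.append(sum(abs(v - mean) for v in col) / (W - 1))
    return sigma


def detect_peaks(y, W, eps=1e-12):
    """Return the indices whose dispersion is (numerically) zero."""
    sigma = column_dispersion(lms_fixed_scale(y, W))
    return [i for i, s in enumerate(sigma) if s <= eps]


if __name__ == "__main__":
    triangle = [0, 1, 2, 3, 4, 3, 2, 1] * 4
    print(detect_peaks(triangle, 3))  # [4, 12, 20, 28]
```

For the noise-free triangle wave in the usage example, only the four apexes have a column of all zeros, so only they are reported. Note that `W` must be at least 2 because of the `W - 1` divisor in Equation (4).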

In the authors' FPGA implementation, row-wise parallelism is spatially extracted in the LMS calculation [22]. Concretely, the comparison operations in Equation (3) are performed simultaneously for each $a \in \{1, 2, 3, ..., W\}$. In addition, a pipeline structure is introduced throughout to exploit temporal parallelism. Since the time-series input data are given one by one, synchronized with a clock signal, a total of W comparison operations are executed in each clock cycle. This means that, after an initial pipeline latency, the pipelined hardware outputs a peak detection result for a single input data item $(y_i)$ at every clock cycle. Thanks to the pipelined architecture, this FPGA implementation can achieve high throughput, while the essential pipeline latency imposed by the scale remains a problem for real-time applications such as feedback control systems.
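The non-storing, one-result-per-sample behavior described above can be modeled in software with a sliding window. The sketch below is a behavioral illustration only, not the authors' hardware description: the name `stream_peaks` is hypothetical, and it assumes that a sample is reported as a peak only when all W comparisons of Equation (3) hold, which is the case in which the column dispersion is exactly zero.

```python
from collections import deque


def stream_peaks(samples, W):
    """Yield (index, is_peak) for each sample once W newer samples have arrived."""
    window = deque(maxlen=2 * W + 1)   # plays the role of a shift register
    for t, s in enumerate(samples):
        window.append(s)               # one new sample per "clock cycle"
        if len(window) < 2 * W + 1:
            continue                   # initial pipeline fill
        center = window[W]             # candidate received W samples ago
        is_peak = all(
            center > window[W - a] and center > window[W + a]
            for a in range(1, W + 1)   # the W comparisons of Eq. (3)
        )
        yield t - W, is_peak


if __name__ == "__main__":
    triangle = [0, 1, 2, 3, 4, 3, 2, 1] * 4
    print([i for i, p in stream_peaks(triangle, 3) if p])  # [4, 12, 20, 28]
```

In this model, the decision for a candidate is produced only after the W newer samples it is compared with have arrived, which corresponds to the scale-dependent latency discussed in the text.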

#### 3. Proposed Method

In the authors' previous work [22], it was shown that the best detection results were obtained when the value of the scale is 33 or 65. This means that the pipeline latency cannot be smaller than 65 clock cycles, which corresponds to 650 ns when the system is clocked at 100 MHz. The source of this essential latency is the comparison operations in Equation (3). According to Equation (3), among the data newer than $y_{b-1}$, $y_{b+W-1}$ has the largest distance to $y_{b-1}$ as a comparison target. Since $y_{b+W-1}$ is newer than $y_{b-1}$, when the system receives $y_{b-1}$, $y_{b+W-1}$ is not yet available and the comparison cannot be performed. Thus, the system must wait for W clock cycles until $y_{b+W-1}$ becomes available, and this wait period imposes the essential pipeline latency.

As long as the calculation structure expressed in Equation (3), which is called an operation stencil in computer science, is kept, it is impossible to reduce the pipeline latency to less than W. Therefore, in this study, the pipeline structure designed earlier is modified. Concretely, the operation stencil in Equation (3) is modified so that the operation can be performed without waiting W clock cycles. The key idea is the introduction of an asymmetric stencil of data comparison parameterized by a window approach (WA). Instead of Equation (3), the proposed approach uses the following comparison stencil:

$$x_{a,b} = \begin{cases} 0, & \text{if } (y_{b-1} > y_{b-a-1}) \land (y_{b-1} > y_{\min(b+a-1,\; b+W-1-WA)}) \\ r, & \text{otherwise} \end{cases} \tag{5}$$

where *r* is a random number such that 1 < r < 2. Since the newest data required to perform the comparisons in Equation (5) is changed to $y_{b+W-1-WA}$, the number of clock cycles to wait before execution is reduced from *W* to W - WA, reducing the pipeline latency by *WA* clock cycles. Note that the number of comparison operations is the same as that of Equation (3).
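The wait time stated above can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration that models only the comparison wait of W - WA clock cycles described in the text; it is not a model of the measured latency of the complete design.

```python
def stencil_wait_ns(W, WA, clock_mhz):
    """Comparison wait of Eq. (5) in nanoseconds: (W - WA) cycles at clock_mhz."""
    if not 0 <= WA < W:
        raise ValueError("WA is assumed to satisfy 0 <= WA < W")
    cycles = W - WA
    return cycles * 1000.0 / clock_mhz   # one cycle lasts 1000 / f[MHz] ns


if __name__ == "__main__":
    print(stencil_wait_ns(65, 0, 100))   # 650.0, the symmetric case in the text
    print(stencil_wait_ns(65, 8, 100))   # 570.0
```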

However, when a > W - WA, $y_{b-a-1}$ and $y_{b+W-1-WA}$ are compared with $y_{b-1}$, and thus the two comparison targets have different distances to $y_{b-1}$, which is called an asymmetric stencil. This asymmetry introduces an approximation as the cost of latency reduction, since the comparison target is changed. However, by tuning the value of WA, the balance between the pipeline latency and peak detection accuracy can be adjusted. This trade-off relationship is discussed later in Section 4, together with the evaluation results.
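A behavioral sketch of the asymmetric stencil of Equation (5) is given below. It is an illustration under stated assumptions rather than the authors' pipeline: the name `stream_peaks_asym` is hypothetical, a peak is assumed to require all W comparisons to hold, and WA is assumed to satisfy 0 <= WA < W so that at least one newer sample is compared.

```python
from collections import deque


def stream_peaks_asym(samples, W, WA):
    """Yield (index, is_peak) using W older samples but only W - WA newer ones."""
    if not 0 <= WA < W:
        raise ValueError("WA is assumed to satisfy 0 <= WA < W")
    newer = W - WA                         # samples to wait for after the candidate
    size = W + 1 + newer
    window = deque(maxlen=size)
    for t, s in enumerate(samples):
        window.append(s)
        if len(window) < size:
            continue
        center = window[W]
        is_peak = all(
            center > window[W - a]                   # older side: distance a
            and center > window[W + min(a, newer)]   # newer side: clamped as in Eq. (5)
            for a in range(1, W + 1)
        )
        yield t - newer, is_peak


if __name__ == "__main__":
    triangle = [0, 1, 2, 3, 4, 3, 2, 1] * 4
    for wa in (0, 2):
        print(wa, [i for i, p in stream_peaks_asym(triangle, 3, wa) if p])
```

With WA = 0 the window is symmetric and the function reduces to the stencil of Equation (3). For the noise-free triangle wave both settings report the same apexes, while the WA = 2 variant produces each decision after one newer sample instead of three. On noisy data the two settings are not expected to agree in general, which is the approximation discussed in the text.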



Fig. 1. Pipeline Structure.

Figure 1 depicts the concept of the proposed pipeline structure corresponding to Equation (5), where *D* represents each data item. When WA = 0, the comparison pipeline is the same as the original structure. For WA = 1, the comparison stencil becomes asymmetric. The distance between the newest input and the peak candidate is shortened by 1 compared to the longest distance between the peak candidate and the oldest comparison target, so that the pipeline latency is reduced by 1 clock cycle.

#### 4. Evaluation

#### 4.1. Assessment with an FPGA Implementation

Since the comparison stencil of the proposed method is modified from the original Equation (3), the proposed method introduces an approximation in the peak detection process. To reveal the trade-offs between latency reduction and detection accuracy, FPGA implementation experiments were carried out. Figure 2 depicts the experimental setup. The 12-bit DC919AF with a maximum system frequency of 100 MHz was used as the analog-to-digital converter (ADC) together with the Xilinx Kintex-7 XC7K325T FPGA board.

The accuracy of detection was evaluated with simulations in terms of precision and recall, defined as follows.

$$\text{Precision} = \frac{TP}{FP+TP}, \qquad \text{Recall} = \frac{TP}{FN+TP}$$

where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.
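As an illustration of these two metrics, the following sketch computes precision and recall from a list of detected peak indices and a list of reference peak indices. The function name and the exact-index matching rule are assumptions made for the example; the matching criterion used in the evaluation is not stated in the text.

```python
def precision_recall(detected, reference):
    """Precision and recall of detected peak indices against reference indices."""
    detected, reference = set(detected), set(reference)
    tp = len(detected & reference)    # detected and truly a peak
    fp = len(detected - reference)    # detected but not a peak
    fn = len(reference - detected)    # peak that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Two of three detections are correct; two of four reference peaks are found.
    print(precision_recall([4, 12, 21], [4, 12, 20, 28]))
```

In the usage example, TP = 2, FP = 1 and FN = 2, giving a precision of 2/3 and a recall of 1/2.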

As a benchmark, phase-to-phase voltage signal data was used [23]. The corresponding dataset contains 4470 data points.



Fig. 2. FPGA design and experimental setup

In Table 1, the evaluation results for the design with a scale of 33 are shown. The latency is reduced from 230 ns to 110 ns by increasing WA to 12. On the other hand, the recall is degraded from 56% to 11%. The precision was not affected by the asymmetric comparison stencil. The 100 MHz clock frequency was also maintained regardless of the WA value. In terms of the trade-off balance, the best WA value seems to be around 6. However, the recalls were rather small even for the original algorithm, suggesting that the selected scale value was not appropriate for the input data.

Table 1. Evaluation results for scale 33.

| WA    | Latency<br>(ns) | System Frequency<br>(MHz) | Precision | Recall |
|-------|-----------------|---------------------------|-----------|--------|
| WA=0  | 230             | 100                       | 82%       | 56%    |
| WA=2  | 210             | 100                       | 82%       | 50%    |
| WA=4  | 190             | 100                       | 82%       | 42%    |
| WA=6  | 170             | 100                       | 82%       | 37%    |
| WA=8  | 150             | 100                       | 82%       | 34%    |
| WA=10 | 130             | 100                       | 82%       | 31%    |
| WA=12 | 110             | 100                       | 82%       | 11%    |



Fig. 3. Study of accuracy and recall for scale 33.

Table 2. Evaluation results for scale 65.

| WA    | Latency<br>(ns) | System Frequency<br>(MHz) | Precision | Recall |
|-------|-----------------|---------------------------|-----------|--------|
| WA=0  | 400             | 100                       | 88%       | 81%    |
| WA=2  | 380             | 100                       | 88%       | 79%    |
| WA=4  | 360             | 100                       | 88%       | 78%    |
| WA=8  | 320             | 100                       | 88%       | 77%    |
| WA=16 | 240             | 100                       | 88%       | 60%    |
| WA=31 | 90              | 100                       | 88%       | 36%    |



Fig. 4. Study of accuracy and recall for scale 65.



Fig. 5. Peak detected by the AMPD module for L1-L3 line voltage.



Fig. 6. Peak detected by the AMPD module for L1-L3 line voltage.



Fig. 7. Latency with scale 33 and scale 65

Figure 3 shows the effect of *WA* for the scale 33 module. In this design, 33 data items are analyzed in each clock cycle to determine the peak points. A precision of 82% is obtained for scale 33.

In Table 2, the evaluation results for the design with a scale of 65 are shown. Compared to the results in Table 1, this scale value achieves better recall. The latency is reduced from 400 ns to 90 ns at the cost of recall degradation from 81% to 36%. Again, the precision of 88% was maintained regardless of *WA*. In this case, the design with a *WA* value of 8 reduced the latency by 20% with a reasonable recall compromise.
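The trade-off figures quoted above can be reproduced directly from the table entries. The sketch below uses the latency and recall values of Table 2; the function name and the choice of reporting the latency reduction in percent and the recall loss in percentage points are assumptions made for the example.

```python
# (WA, latency in ns, recall in %) taken from Table 2 (scale 65)
TABLE2 = [(0, 400, 81), (2, 380, 79), (4, 360, 78),
          (8, 320, 77), (16, 240, 60), (31, 90, 36)]


def tradeoff(rows):
    """Latency reduction (%) and recall loss (points) relative to the first row."""
    _, base_latency, base_recall = rows[0]
    return [
        (wa, 100 * (base_latency - latency) / base_latency, base_recall - recall)
        for wa, latency, recall in rows
    ]


if __name__ == "__main__":
    for wa, reduction, loss in tradeoff(TABLE2):
        print(f"WA={wa:2d}: latency -{reduction:.1f}%, recall -{loss} points")
```

For WA = 8 this gives a latency reduction of 20% with a recall loss of 4 percentage points, matching the figure quoted in the text.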

Figure 4 shows the effect of *WA* for the scale 65 module. In this design, 65 data items are analyzed in each clock cycle, and a precision of 88% is obtained.

Figure 5 and Figure 6 illustrate the detection results of the designs with different scales. The correct peak points were detected, and the latency was reduced by the developed pipeline structure.

#### 4.2. Latency Comparison

In this study, the computational time is reduced by introducing the WA module. The WA pipeline structure was built to demonstrate the high speed of the FPGA by lowering the latency. Two designs, scale 33 and scale 65, were examined. When the AMPD method is executed on a CPU (Intel Xeon E3-1225 at 3.3 GHz) under normal conditions, it takes approximately 5.8 µs. With the designed pipeline structure, this time was reduced to approximately 320 ns. This result demonstrates the high speed of the FPGA for real-time processing.
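The ratio between the two timings quoted above can be worked out as follows. This is simple arithmetic on the approximate values given in the text; the function name is hypothetical.

```python
def speedup(cpu_time_ns, fpga_latency_ns):
    """Ratio of the CPU processing time to the FPGA pipeline latency."""
    return cpu_time_ns / fpga_latency_ns


if __name__ == "__main__":
    # Approximately 5.8 us on the CPU versus approximately 320 ns on the FPGA
    print(speedup(5800, 320))  # 18.125
```

Since both inputs are approximate, the result should be read as a reduction of roughly 18 times rather than as an exact figure.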

### 5. Conclusion

In this study, a latency reduction technique for an FPGA-based real-time peak detection method for time series data was proposed by introducing a pipelined asymmetric stencil for the comparison operations in the AMPD algorithm. The evaluation results revealed that the latency can be reduced from 230 ns to 110 ns for the scale of 33 and from 400 ns to 90 ns for the scale of 65. One drawback of this approach is that the modified pipeline structure introduces an approximation into the original AMPD algorithm, causing some false detection of peak points around the real peak points. Although the experimental results showed that the balance between latency and accuracy can be tuned by adjusting a parameter of the asymmetric stencil structure, an interesting direction for future work is to address how to mitigate the detection of unwanted peak points while reducing the latency.

# References

- [1] J. Li, Y. Li, W. Zhao and M. Jiang, "Diffusion enhancement model and its application in peak detection", Chemometrics and Intelligent Laboratory Systems, vol. 189, pp. 130-137, 2019.
- [2] Y. Zheng, R. Fan, C. Qiua, Z. Liu and D. Tian, "An improved algorithm for peak detection in mass spectra based on continuous wavelet transform", International Journal of Mass Spectrometry, vol. 409, pp. 53-58, 2016.
- [3] S.S. Kumar, N. Mohan, P. Prabaharan and K.P. Soman, "Total variation denoising based approach for R-peak detection in ECG signals", Procedia Computer Science, vol. 93, pp. 697-705, 2016.
- [4] J. Rahul, M. Sora and L.D. Sharma, "A novel and lightweight P, QRS, and T peaks detector using adaptive thresholding and template waveform", Computers in Biology and Medicine, 2021.
- [5] S. Vadrevu and M.S. Manikandan, "Effective systolic peak detection algorithm using variational mode decomposition and center of gravity", IEEE Region 10 Conference, 22-25 November 2016, Singapore.
- [6] F. Scholkmann, J. Boss and M. Wolf, "An Efficient Algorithm for Automatic Peak Detection in Noisy periodic and Quasi-Periodic Signals," Algorithms, vol.5, pp.588-603, 2012.
- [7] M.N. Schmidt, T.S. Alstrøm, M. Svendstorp and J. Larsen, "Peak detection and baseline correction using a convolutional neural network", International Conference on Acoustics, Speech and Signal Processing, 12-17 May 2019, Brighton, UK.
- [8] F. Liu, X. Tong, C. Zhang, C. Deng, Q. Xiong, Z. Zheng, P. Wang, "Multi-peak detection algorithm based on the Hilbert transform for optical FBG sensing", Optical Fiber Technology, vol. 45, pp. 47-52, 2018.
- [9] T. Bodendorfer, M.S. Muller, F. Hirth and A.W. Koch, "Comparison of different peak detection algorithms with regards to spectrometic fiber Bragg grating interrogation systems", International Symposium on Optomechatronic Technologies, 21-23 September 2009, Istanbul, Turkey.
- [10] G. Tolt, C. Grönwall and M. Henriksson, "Peak detection approaches for time-correlated single-photon counting three-dimensional lidar systems", Optical Engineering, vol. 57, no. 3, 031306, 2018.
- [11] H. Guo, S. Cui and X. Xu, "Design and implementation of voltage peak detection based on Fourier analysis", Advances in Computer Science Research, vol. 94, pp. 99-102, 2019.
- [12] Q. Wu, S. Wang, C. Liao, Z. Tang, H. Luo, S. Huang, and L. Deng, "A mV-level real-time peak-voltage detection circuit based on differential structure", Review of Scientific Instruments, vol. 92, 034713, 2021.
- [13] P.V. Manitha, M.G. Nair and T. Thakur, "Fundamental voltage peak detection controller for series active filters", Electric Power Systems Research, vol. 184, 106315, 2020.
- [14] A. Ahmad, A. Khandelwal and P. Samuel, "Golden band search for rapid global peak detection under partial shading condition in photovoltaic system", Solar Energy, vol. 157, pp. 979-987, 2017.
- [15] W.C. Lee and T.K. Lee, "Peak detection method using two-delta operation for single voltage sag", International Power Electronics Conference, 18-21 May 2014, Hiroshima, Japan.
- [16] N. Kumar, I. Hussain, B. Singh and B.K. Panigrahi, "Peak power detection of PS solar PV panel by using WPSCO", IET Renewable Power Generation, vol. 11, no. 4, pp. 480-489, May 2017.
- [17] Z. Shi and H. Liu, "STM32F4 based real-time peak detection of FBG", 15th International Conference on Optical Communications and Networks, pp. 1-3, 24-27 September 2016, Hangzhou, China.
- [18] M. Schirmer, F. Stradolini, S. Carrara and E. Chicca, "FPGA-based approach for automatic peak detection in cyclic voltammetry", IEEE International Conference on Electronics, Circuits and Systems, pp. 65-68, 11-14 December 2016, Monte Carlo, Monaco.
- [19] R. Ghozzi, S. Lahouar, C. Souani and K. Besbes, "Peak detection of GPR data with lifting wavelet transform (LWT)", International Conference on Advanced Systems and Electric Technologies, pp. 34-37, 14-17 January 2017, Hammamet, Tunisia.
- [20] W.C. Lee, K.N. Sung and T.K. Lee, "Fast detection algorithm for voltage sags and swells based on delta square operation for a single-phase inverter system", Journal of Electrical Engineering and Technology, vol. 11, no. 1, pp. 157-166, January 2016.
- [21] A.T. Tzallas, V.P. Oikonomou and D.I. Fotiadis, "Epileptic spike detection using a Kalman filter based approach", International Conference of the IEEE Engineering in Medicine and Biology, pp. 501-504, 30 August-3 September 2006, New York, USA.
- [22] A.M. Colak, T. Manabe, Y. Shibata, and F. Kurokawa, "Peak Detection Implementation for Real-Time Signal Analysis Based on FPGA," Journal of Circuits and Systems, vol. 9, no. 10, pp. 148-167, 31 October 2018.
- [23] A.M. Colak, Y. Shibata and F. Kurokawa, "Peak Point Detection of Phase-to-Phase Effective Voltages for Smart Grids: A Comparative Study," IEEE 6th International Conference on Renewable Energy Research and Applications (ICRERA), San Diego, CA, 2017, pp. 1149-1153.