### Testing and verifying the tools of hardware design

#### John Wickerson Imperial College London

joint work with Yann Herklotz, Michalis Pardalos, Quentin Corradi, George Constantinides, Alastair Donaldson, Emiliano Morini, and Laura Pozzi

> Verification Futures Conference 01 July 2025



more rigorous engineering, they can be made more suitable for safety- and security-critical settings.

### Some EDA tools



All these tools are too buggy. With more rigorous engineering, they can be made more suitable for safety- and security-critical settings.



# High-level synthesis

```
int main() {
    int x[2] = {3, 6};
    int i = 1;
    return x[i];
  }
```

```
module main(reset, clk, finish, return_val);
1
 2
        input [0:0] reset, clk;
 3
        output reg [0:0] finish = 0;
        output reg [31:0] return_val = 0;
 4
        reg [31:0] reg_3 = 0, addr = 0, d_in = 0, reg_5 = 0, wr_en = 0;
 5
        reg [0:0] en = 0, u_en = 0;
 6
        reg [31:0] state = 0, reg_2 = 0, reg_4 = 0, d_out = 0, reg_1 = 0;
 7
        reg [31:0] stack [1:0];
 8
        // RAM interface
 9
10
        always @(negedge clk)
11
          if ({u_en != en}) begin
12
            if (wr_en) stack[addr] <= d_in;</pre>
13
            else d_out <= stack[addr];</pre>
14
            en <= u_en;
15
          end
16
        // Data-path
17
        always @(posedge clk)
18
          case (state)
19
            32'd11: reg_2 <= d_out;</pre>
20
            32'd8: reg_5 <= 32'd3;
21
            32'd7: begin u_en <= ( ~ u_en); wr_en <= 32'd1;
22
                         d_in <= reg_5; addr <= 32'd0; end
23
            32'd6: reg_4 <= 32'd6;
24
            32'd5: begin u_en <= ( ~ u_en); wr_en <= 32'd1;
25
                         d_in <= reg_4; addr <= 32'd1; end
26
            32'd4: reg_1 <= 32'd1;
27
            32'd3: reg_3 <= 32'd0;
            32'd2: begin u_en <= ( ~ u_en); wr_en <= 32'd0;</pre>
28
29
                         addr <= {{{reg_3 + 32'd0} + {reg_1 * 32'd4}} / 32'd4}; end
30
            32'd1: begin finish = 32'd1; return_val = reg_2; end
31
            default: :
32
          endcase
        // Control logic
33
34
        always @(posedge clk)
35
          if ({reset == 32'd1}) state <= 32'd8;
36
          else case (state)
37
                 32'd11: state <= 32'd1;
                                                 32'd4: state <= 32'd3;
38
                 32'd8: state <= 32'd7;
                                                 32'd3: state <= 32'd2;
                                                 32'd2: state <= 32'd11;
39
                 32'd7: state <= 32'd6;
40
                 32'd6: state <= 32'd5;
                                                 32'd1 ;
                 32'd5: state <= 32'd4;
                                                 default: ;
41
42
               endcase
43
      endmodule
```

# **Testing HLS tools**

• We generated 6700 random C programs and gave them to four HLS tools (Intel i++, Xilinx Vivado HLS, LegUp, and Bambu).



# Some bugs in HLS tools

```
int a[2][2][1] = {{{0},{1}},{{0},{0}};
    int main() {
    a[0][1][0] = 1;
    }
```

#### This code crashes LegUp 4.0

```
1 volatile unsigned int g = 0;
_{2} int a[256] = {0};
_3 int c = 0;
5 void d(char b) {
   c = (c \& 4095) \land a[(c \land b) \& 15];
7 }
void e(long f) {
    d(f); d(f >> 8); d(f >> 16); d(f >> 24);
    d(f >> 32); d(f >> 40); d(f >> 48);
11
12 }
13
14 int main() {
   for (int i = 0; i < 56; i++)</pre>
15
      a[i] = i;
  e(g);
  e(-2L);
18
    return c;
20 }
```

Hardware generated by Vivado HLS from this code outputs the wrong value

# Aside: bugs over time



# Can we have more reliable HLS tools?

Vericert



Vericert



#### Correctness

• We prove the following theorem:

 $\forall C, V, B, \quad \mathsf{HLS}(C) = \mathsf{OK}(V) \land Safe(C) \implies (V \Downarrow B \implies C \Downarrow B).$ 

- Roughly:
  - 1.5 person-years of effort,
  - 3k lines of implementation,
  - 8k lines of Coq proof.



#### Correctness

Beware of bugs in the above code; I have only proved it correct, not tried it.

#### • From executing 155267 randomly generated C programs:



 After fixing the bug in Vericert's pretty-printer, we found 0 runtime errors.

#### Performance

- Measured performance using the PolyBench/C benchmark.
- 27 of the 30 programs in the benchmark are applicable (the others use floats).
- We synthesised the designs for a Xilinx FPGA and measured their area and running time.
- We compared against an open-source HLS tool called LegUp.

#### Performance (1st attempt)



#### Performance



# Further reading

• Vericert is fully open-source and available on Github:



https://github.com/ymherklotz/vericert

# **Further reading**

#### 00PSLA '21

#### Formal Verification of High-Level Synthesis

YANN HERKLOTZ, Imperial College London, UK JAMES D. POLLARD, Imperial College London, UK COLOR RAMANATHAN, Imperial College London, UK · College London, UK

f software into hardware, is rapidly TTT C promises

#### PLDI '24



Hyperblock Scheduling for Verified High-Level Synthesis YANN HERKLOTZ, Imperial College London, United Kingdom JOHN WICKERSON, Imperial College London, United Kingdom High-level synthesis (HLS) is the automatic compilation of software programs into custom hardware designs. With programmable hardware devices (such as FPGAs) now widespread LUS is increasingly existing HLS tools are too unreliable for sofety and

#### Where next?

- Operation scheduling
- Resource sharing
- Constant propagation, loop pipelining, ...
- Support more of the C language as input



#### 20

### Logic synthesis

• We generated about 100,000 random Verilog designs.

```
1 module top #(parameter param0 = 5'h9e23848124)
2 (y, clk, wire0, wire1, wire2, wire3);
    // *** Declarations ***
    output wire [(5'h31):(1'h0)] y;
    input wire [(1'h0):(1'h0)] clk;
    input wire [(3'h6):(1'h0)] wire0;
     input wire [(4'ha):(1'h0)] wire1;
    input wire signed [(4'ha):(1'h0)] wire2;
    input wire [(4'hb):(1'h0)] wire3;
    reg [(3'h2):(1'h0)] reg20 = (1'h0);
10
    reg [(3'h5):(1'h0)] reg19 = (1'h0);
11
    reg [(3'h4):(1'h0)] reg18 = (1'h0);
12
    reg [(2'h2):(1'h0)] reg17 = (1'h0);
    reg [(4'ha):(1'h0)] reg16 = (1'h0);
14
    reg signed [(4'h9):(1'h0)] reg15 = (1'h0);
15
    wire [(3'h6):(1'h0)] wire5;
16
    wire [(2'h3):(1'h0)] wire4;
17
    // *** Assign output ***
18
    assign y =
19
      {reg20, reg19, reg18, reg17, reg16, reg15, wire5, wire4};
20
    // *** Random module items ***
21
    assign wire4 = (((~wire1) ? ((((15'h9ecc51592fdeb04)
22
      ? reg17[(5'h2):(2'h2)] : (reg18 ? wire2 : wire0))
23
24
      ? $unsigned(((-2'ha73a956341f45c0) << reg18)) :</pre>
      wire1[(4'ha):(3'h7)]) - reg18) :
25
      reg15[(4'h9):(3'h7)]) >>> $unsigned($signed((
26
      reg16[(4'ha):(3'h7)] ? ((wire1 && reg16) &&
27
      {reg15, reg15, wire3}) : (reg18 ? (~&wire3) :
28
      (-39'ha7a1419cd4ea34a))))));
29
    assign wire5 = $signed(((wire2 ? (
30
      (-8'h5e411249da4f335) ? (4'hb2fa97daeae9ff) :
31
      wire1) : (wire4 ? wire2 : wire1)) ?
32
      $signed(wire3) : ({(7'hbac46141008d14)} >>>
33
      (&wire0))));
34
    always @(posedge clk) begin
35
      for (reg15 = (1'h0); (reg15 < (2'h2)); reg15 =</pre>
36
           (reg15 + (1'h1)) begin
37
        if (((wire3 == (~(reg16 + wire1))) >=
38
             {$signed(wire0[(2'h2):(1'h0)])})
           reg16 <= ($unsigned($unsigned(wire1)) <</pre>
             wire3[(1'h1):(1'h1)]);
41
        else reg16 <= $unsigned(reg17[(2'h2):(2'h0)]);</pre>
42
        reg17 <= wire3[(1'h0):(1'h0)];</pre>
43
      end
44
      reg18 <= $signed(({wire0} ~^ wire3));</pre>
45
46
    end
    always @(posedge clk) begin
47
      if (wire3[(4'h9):(3'h6)])
48
        reg19 = $signed($unsigned(wire1)) <<</pre>
49
           $unsigned({wire1});
50
      reg20 <= ({({(~|wire3), $unsigned(reg19)} ?</pre>
51
        reg16 : reg15[(2'h2):(1'h1)]),
52
        (~&((wire0 ? wire3 : reg17) ~^ reg18))}
53
        || ((~&(wire3[(4'hb):(4'h9)] ? wire4 : (+wire5)))));
54
    end
55
```

endmodule

56

# Logic synthesis - results

| Tool                                  | Total test cases | Failing test cases | Distinct failing test cases | Bug reports |
|---------------------------------------|------------------|--------------------|-----------------------------|-------------|
| Yosys 0.8                             | 26400            | 7164 (27.1%)       | $\geq 1$                    | 0           |
| Yosys 3333e00                         | 51000            | 7224 (14.2%)       | $\geq 4$                    | 3           |
| Yosys 70d0f38 (crash)                 | 11               | 1 (9.09%)          | $\geq 1$                    | 1           |
| Yosys 0.9                             | 26400            | 611 (2.31%)        | $\geq 1$                    | 1           |
| Vivado 18.2                           | 47992            | 1134 (2.36%)       | ≥ 5                         | 3           |
| Vivado 18.2 (crash)                   | 47992            | 566 (1.18%)        | 5                           | 2           |
| XST 14.7                              | 47992            | 539 (1.12%)        | $\geq 2$                    | 0           |
| Quartus Prime 19.2                    | 80300            | 0 (0%)             | 0                           | 0           |
| Quartus Prime Lite 19.1               | 43               | 17 (39.5%)         | 1                           | 0           |
| Quartus Prime Lite 19.1 (No \$signed) | 137              | 0 (0%)             | 0                           | 0           |
|                                       |                  |                    |                             |             |

### Logic synthesis - example

```
1 module top (y, clk, w1);
     output y;
2
     input clk;
3
     input signed [1:0] w1;
4
   reg r1 = 1'b0;
5
   assign y = r1;
6
     always @(posedge clk)
7
       if ({-1'b1 == w1}) r1 <= 1'b1;</pre>
8
 endmodule
9
```

Vivado incorrectly expands 1 'b1 to 2 'b11 (should be 2 'b01)

### **Further reading**

FPGA '20

### Finding and Understanding Bugs in FPGA Synthesis Tools

Yann Herklotz yann.herklotz15@imperial.ac.uk Imperial College London London, UK

#### ABSTRACT

All software ultimately relies on hardware functioning correctly. Hardware correctness is becoming increasingly important due to the growing use of custom accelerators using FPGAs to speed up applications on servers. Furthermore, the increasing complexity of hardware also leads to ever more reliance on automation, meaning that the correctness of synthesis tools is vital for the reliability of

This paper aims to improve the quality of FPGA synthesis tools the hardware.

by introducing a method to test them automatically using randomly generated, correct Verilog, and checking that the synthesised netlist is always equivalent to the original design. The main contributions of this work are twofold: firstly a method for generating random behavioural Verilog free of undefined values, and secondly a Verilog test case reducer used to locate the cause of the bug that was found. These are implemented in a tool called Verismith. This paper also provides a qualitative and quantitative analysis of the bugs found in Yosys, Vivado, XST and Quartus Prime. Every synthesis tool except Quartus Prime was found to introduce discrepancies between the j.wickerson@imperial.ac.uk Imperial College London London, UK

```
1 module top (y, clk, w1);
```

```
output y;
```

```
input clk;
input signed [1:0] w1;
reg r1 = 1'b0;
```

```
assign y = r1;
always @(posedge clk)
```

```
if ({-1'b1 == w1}) r1 <= 1'b1;</pre>
```

#### endmodule

Figure 1: Vivado bug found automatically by Verismith. Vivado incorrectly expands -1'b1 to -2'b11 instead of -2'b01. The bug was reported and confirmed by Xilinx.<sup>1</sup>

#### 1 INTRODUCTION

Almost all digital computation performed in the world today relies, in one way or another, on a logic synthesis tool. Computation specified in RTL passes through a logic synthesis tool before being implemented on an FPGA or an ASIC. Even designs that are expressed in higher-level languages eventually get synthesised down tation that is executed in software is carried out on







- 95 wire foo;
- 96 some;
- 97 other;
- 98 stuff;
- 99 here;
- 100 wire baz;



- 95 wire foo;
- 96 some;
- 97 other; stuff;
- 98 here;
- 99 wire baz;

wire w95\_foo; wire w99 baz;

### Can we have more reliable logic synthesis tools?

# Verified logic synthesis

• Andreas Lööw at Imperial is building a verified logic synthesis tool in HOL4.

#### Lutsig: A Verified Verilog Compiler for Verified Circuit Development

Andreas Lööw Chalmers University of Technology Gothenburg, Sweden

#### Abstract

We report on a new verified Verilog compiler called Lutsig. Lutsig currently targets (a class of) FPGAs and is capable of producing technology mapped netlists for FPGAs. We have connected Lutsig to existing Verilog development tools, and in this paper we show how Lutsig, as a consequence of this connection, fits into a hardware development methodology for verified circuits in the HOL4 theorem prover. One important step in the methodology is transporting properties proved at the behavioral Verilog level down to technology mapped netlists, and Lutsig is the component in the methodology that enables such transportation.

CCS Concepts: • Hardware  $\rightarrow$  Hardware description languages and compilation; Logic synthesis. The

ecosystem for software development, we find tools for the following development methodology: (i) prove a correctness theorem about your program at the source level, (ii) use a verified compiler to transform your program to machine code, and, lastly, (iii) transport the source-level program correctness theorem down to the generated machine code by composing the source-level program correctness theorem with the compiler correctness theorem. When carried out inside an ITP, the development methodology is capable of producing artifacts with remarkably small trusted computing bases (TCBs) [20]. For example, the verified CakeML compiler [35] and its accompanying formal methods tools hosts such a development methodology inside the ITP HOL4 [32]

CPP '21









}



# **Further reading**

#### DVCon '24

#### Who checks the checkers? Automatically finding bugs in C-to-RTL formal equivalence checkers Emiliano Morini

Alastair F. Donaldson

Imperial College London

Michalis Pardalos Imperial College London michail.pardalos17@imperial.ac.uk

> Laura Pozzi Università della Svizzera italiana (USI) Lugano laura.pozzi@usi.ch

Intel emiliano.morini@intel.com alastair.donaldson@imperial.ac.uk

John Wickerson Imperial College London j.wickerson@imperial.ac.uk

Abstract-C-to-RTL (register-transfer level) formal equivalence checkers (ECs) allow hardware implementations to be compared against software specifications. Thanks to their complete state-space coverage, ECs are trusted to authorise design sign-off. Therefore, ridding ECs of bugs is a top priority. In pursuit of this goal, we have developed Equifuzz, a technique and tool for randomized testing (fuzzing) of SystemC-to-RTL ECs. Equifuzz uses knowledge of SystemC semantics to generate rich designs that are known to be equivalent to trivial RTL designs. It has uncovered 7 unsoundness bugs in major commercial ECs (where the EC claimed equivalence incorrectly), and 5 incompleteness bugs (where the EC failed to prove equivalence between equivalent designs), all of which have been confirmed by the tool vendors. The fact that Equifuzz has been able to find serious bugs in extensively tested, major commercial ECs demonstrates that fuzzing is a valuable complement to the handcrafted tests that EC developers use as standard.

I. INTRODUCTION (ECa) such as Synopsys

found great success at uncovering bugs in many different tools, such as state-of-the-art C compilers [25, 15], graphics shader compilers [3] and OpenCL compilers [17]. Although fuzzing has not, to our knowledge, been applied to ECs before, it has been effective at finding bugs in other tools from the EDA realm, such as FPGA synthesis tools [8] and high-level synthesis tools [6]. Fuzzing has also been used to find bugs in verification tools other than ECs, such as SMT

solvers [24] and software model checkers [26]. We have developed Equifuzz: a technique and tool for randomized

testing of ECs that compare RTL implementations against SystemC specifications. We focus on SystemC because it is accepted by the three major commercial ECs, and often used by their industrial users. Equifuzz works by generating random SystemC programs. These are then compared (using the EC-under-test) against trivial RTL designs that are known to be equivalent. We record a potential bug if the EC

regult other than "equivalent". 1 uncovering 16

#### Can we have more reliable equivalence checkers?

Vera

- An equivalence checker built and proven-correct in Coq.
- Reduces the equivalence problem to an SMT query and uses SMTCoq to solve it.
- Preliminary results on the EPFL benchmarks are encouraging (44 out of 60 verify within a few minutes).
- A subtlety: undefined behaviour.





All these tools are too buggy. With more rigorous engineering, they can be made more suitable for safety- and security-critical settings.

