## arm

## Safety and Security Challenges in Hardware IP Development

Verification Futures Conference 2023 US

Vivek Vedula 14-Sep-2023

© 2023 Arm

#### Semiconductor Ecosystem: Arm's Role



Source: Semiwiki.org

### Key Pillars of Dependable Computing

#### -- Functional correctness

- "Does the system behave correctly under normal operating conditions?"
- "For every possible input, does the output satisfy the specification?"
- Pre-silicon: Simulation, Emulation, Formal verification
- Post-silicon: DFT, At-speed functional testing, Board-level testing

#### - Safety and Security

- "Does the functionally-correct system behave predictably under adverse operating conditions?"
- Adverse conditions due to Murphy's law: Safety
  - + Anything that can go wrong will go wrong, and at the worst possible time.
- Adverse conditions due to Devil's law: Security

 $\,+\,$  Anything that can go wrong shall be made to go wrong, and at every possible time.

arm

Safety



© 2023 Arm

### **Example: Electrical power steering**

- An example of a control system which must demonstrate functional safety

- Must continue to function or at least behave predictably in event of a fault
- By predictable behaviour we mean it must gracefully go into a known "safe" state

 Functionally safe systems aim at preventing hazardous behaviour in event of a fault



### Sources of Failures in E/E Systems

Systematic Failures: Design Faults, wrong specs, software errors
Requires safety analysis on the design to identify causes

- Random Failures: "Acts of God"

- Permanent faults/Hard Errors
  - + Ex: Physical failure of silicon over time

- Transient/Soft Errors
  - + Ex: Atomic particle causing a bit-flip





#### Fault Tolerant Time Interval -FTTI

FTTI: Time-span in which a fault or faults can be present in a system before the hazardous event Fault Reaction Time: Time span from detection of fault to reach the safe state



### Safety Product Development Lifecycle at System-level

- Identify and classify hazardous events caused by malfunctioning behavior
- Derive safety goals for each hazardous event
- -- Generate FSRs to meet each safety goals
- -- Generate TSRs to meet each FSR
- -- Identify requirements for HW and SW



### Hardware Safety Requirements' Mitigation



### What makes it challenging for IP-Level design?

- Little or no context of how the IP is going to be used
- -- Unknown operating conditions
  - False identification of failures (random and systematic)
- HARA is highly subjective and unreliable
  - Resultant safety goals and requirements are questionable
- IPs are expected to be highly configurable for usage in multiple products
  - Risk of over-engineering the safety mitigations (such as dual lock-step)
- -- Long duration of products in the field (15+ years in Automotive)
  - Higher chance of random failures over the life of the product

### Challenge #1: Assumptions of Use

- -- Identification of potential use-cases
- + Operating conditions where the system will be deployed
- End-of-life for each use-case
- Process technologies and their failure rates
- + Safety functions that will be enabled
- Probability of failure of safety functions

### Challenge #2: Implementation of Safety Requirements

- + Safety often impacted by interacting entities
- Ensuring reasonable design trade-offs with safety functions
- Gate-level fault analysis requires functionality to be implemented first
- -- Detection and control need to happen well within FTTI of the product

### Challenge #3: Verification of the Safety Measures

- + Traditional fault models are unreliable for modeling **all** safety-related failures
- + Safety verification at the IP-level does not directly translate to system-level
- Formal-based tools struggle with gate-level analysis
- Undetected faults require tedious analysis to ensure their impact on Safety

**arm** Security

© 2023 Arm



#### **Basic Security Framework**

- + Security is the state of being "protected" against unauthorized/illegal access to the "assets" through malicious "attacks"
- + Asset Something of value
  - Examples: data, code, encryption keys.
- + Attack (*aka* Exploit) A deliberate action performed to compromise an asset
  - Examples: WannaCry ransomware attack



#### **Classification of Attacks: CIA Traid**



- Confidentiality is the restriction of access to the asset for approved users only
- Integrity is the maintenance of consistency, accuracy and trustworthiness of the asset
- Availability is providing access to the asset when required
- Any action that could compromise any of these properties of an asset is referred to as a *Threat*

arn

### **Vulnerabilities Vs Weaknesses**

+ Vulnerability: A flaw or an error in the system that can be used to carry out a threat

- Ex #1: Bypass of normal privilege checks needed to access data belonging to the OS (Meltdown)
- Ex #2: Leakage of data through observable properties of speculative execution and branch prediction

+ Weakness: A design flaw that could potentially contribute to the introduction of a vulnerability

- Not all weaknesses lead to vulnerabilities
- Ex #1: Processor optimization that removes/modifies security mechanisms
- Ex #2: Information Exposure through Microarchitectural State after Transient Execution

### **Chip-level Vulnerabilities: Examples**

#### Test and Debug Infrastructure **Root-of-Trust** Nailgun Attack: Privilege escalation via Arm's 111111 FPGA-based root of trust bitstream Product CISCO multi-core debug infrastructure requiring no can by modified by attacker Home / Cisco Security / Security physical access Cisco Security Advisory Cisco Secure Boot Hardware Tampering Vulnerability Memory-mapped Interface cisco-sa-20190513-secureboot CVE-2019-1649 Advisory ID: CWE-284 Download PDF 2019 May 13 17:30 GMT First Published: 🛥 Email Last Updated: 2019 August 2 13:57 GMT High Version 1.13: Final Debug Target Debug Host No workarounds available Workarounds: (TARGET) (HOST) Clsco Bug IDs: CSCvn77141 CSCvn77142 CSCvn77143

#### Micro-arch Efficiency



Meltdown breaks the most fundamental isolation between user applications and the operating system. This attack allows a program to access the memory, and thus also the secrets, of other programs and the operating system.



Spectre breaks the isolation between different applications. It allows an attacker to trick error-free programs, which follow best practices, into leaking their secrets. In fact, the safety checks of said best practices actually increase the attack surface and may make applications more susceptible to Spectre



arm

18 © 2023 Arm

#### Why Hardware?

#### MITRE records an increase in hardware vulnerabilities in the last 10 years

- Relied upon as the root-of-trust (Crypto, TEE etc)
  - Hardware attack ROI is high; breaks large investments in cybersecurity
- Aggressive Perf/Power optimizations
- Higher configurability (Ex: chicken bits) for performance, maintenance
  - Integration challenges
- Increased shift towards hardware-based security architectures



Source: NIST/MITRE 12/2020

19 © 2023 Arm

#### Challenge #1: IP-level Scope

- -- IP-level security is foundational
- -- Out-of-context security requirements
- -- Specification cannot capture all possible malicious intent
- -- Risk analysis is highly subjective
- Unknown ROI for mitigations



arm

What is attackable ?

### "Top 10" Hardware Security Weaknesses

| CWE #           | Description                                                           | Impact at IP-level                                                                          |
|-----------------|-----------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| <u>CWE-1189</u> | Improper Isolation of Shared Resources on System-on-a-Chip (SoC)      | Disable resource sharing between trusted/untrusted agents                                   |
| <u>CWE-1191</u> | On-Chip Debug and Test Interface With Improper Access Control         | Enable authentication of debug interface                                                    |
| <u>CWE-1231</u> | Improper Prevention of Lock Bit Modification                          | Ensure appropriate lock bit protection mechanism                                            |
| <u>CWE-1233</u> | Security-Sensitive Hardware Controls with Missing Lock Bit Protection | Ensure appropriate lock bit protection mechanism                                            |
| <u>CWE-1240</u> | Use of a Cryptographic Primitive with a Risky Implementation          | Disallow usage of non-standard crypto primitives                                            |
| <u>CWE-1244</u> | Internal Asset Exposed to Unsafe Debug Access Level or State          | Allow only trusted agents to access security-sensitive assets over debug interface          |
| <u>CWE-1256</u> | Improper Restriction of Software Interfaces to Hardware Features      | Ensure access control for SW-controllable features such as frequency and voltage (Ex: DVFS) |
| <u>CWE-1260</u> | Improper Handling of Overlap Between Protected Memory Ranges          | Ensure priority scheme for programmable memory protection regions                           |
| <u>CWE-1272</u> | Sensitive Information Uncleared Before Debug/Power State Transition   | During state transitions, ensure clearing of data that is not required in the next state.   |
| <u>CWE-1274</u> | Improper Access Control for Volatile Memory Containing Boot Code      | Ensure that the volatile memory region is prevented from being modified by untrusted agents |

Source: cwe.mitre.org

### Why is security verification so difficult in HW?

- + Attacking is fundamentally easier than protection
- Attackers need just **one** vulnerability
- -- Verification needs to ensure 100% of the vulnerabilities are covered
  - Impossible to know the list of all vulnerabilities
- + Proving the non-existence of unknown/0-day vulnerabilities is impossible
- Attack-oriented bug hunting needs "out-of-the-box", "malicious" mind-set

orn

### Challenge #2: Verification Tools and Methodology

#### **Functional Requirements**

- FSM must never transition to "Secure" state after reaching "Test" state
- CPU must not be interrupted if running secure code
- Within scope of SVA-based specification
- Lends well to traditional solutions (Sim/Formal/Emulation)

#### **Data-Protection Requirements**

- Data in secure location must not be visible to the CPU, if it is not in secure mode (Confidentiality)
- Secure registers must never be written by a nonsecure agent (Integrity)
- Difficult/impossible to specify using SVA
  - Do not scale effectively due to difficulty of identifying all prop paths in large designs
- Legacy EDA solutions are inadequate

#### How do you measure coverage?

### Safety vs. Security



# arm

## Security meets Safety

© 2023 Arm

### Safety and Security: Examples

#### Safety application



Patient-controlled drug delivery

#### Safety application



Braking system



Avionics



### Security and Safety: Challenges with co-existence of reqs

+ Safety Hazards can potentially cause violation of Security goals and vice versa

- A security attack may result in a new safety failure mode previously thought to be benign
  - Ex: a fault made observable through a malicious attack
- A safety failure mode may expose a security vulnerability that was previously considered secure
  - Ex: a fault on register access control may expose the register contents to an external interface
- Long safety-product deployment lifecycle makes threat-modeling difficult



### **Potential Conflicts in Requirements**

- -- Trade-off between performance overhead to process security functions vs hard realtime safety function to respond to hazards
  - Conflict between "fail-safe" state and "fail-secure" state

+ Countermeasures for a security threat could exacerbate a safety hazard

- Security threat of malware trying to read out the private keys from the HSM or forcefully overwrite its firmware
  - When detected, the countermeasure (HSM) would shut down which is a violation to safety



### **Examples of Safety and Security Failures**

|                                                    |                                                       | Imp    | act      |
|----------------------------------------------------|-------------------------------------------------------|--------|----------|
| Function                                           | Potential Risk                                        | Safety | Security |
| Authentication or check integrity of flash content | Authentication is ok while flash is corrupted         | High   | High     |
| Authentication or check integrity of flash content | Authentication is not ok while flash is not corrupted | Low    | low      |
| Authorization of transmission to COM buses         | Authorization ok to untrusted destination             | Low    | High     |
| Decryption of secure data                          | Decrypted data is erroneous                           | Low    | High     |
| Reset                                              | Reset asserted when not expected                      | High   | Low      |
| Detection of side-channel attack                   | Do not signal failure while attack                    | High   | High     |
| Detection of side-channel attack                   | Assert signal when there is no attack                 | High*  | High     |

\*Depending on how frequent this happen, it might cause denial of service.



### Summing it all up..

- + Safety and Security are key pillars of dependable electronic/electronic systems
- + IP from multiple providers used in these systems
- -- Assumptions-of-use could have a significant impact
- -- Customized solutions needed to address the challenges
- -- Both requirements within the same IP magnifies these challenges multi-fold

orn

Questions/Comments?

### E-MAIL: VIVEK.VEDULA@ARM.COM

31 © 2023 Arm

|            | + |  |  |  |   |                   |                    |  |
|------------|---|--|--|--|---|-------------------|--------------------|--|
|            | + |  |  |  |   |                   | inke               |  |
|            |   |  |  |  |   | + Gr              | cias<br>azie<br>討谢 |  |
|            |   |  |  |  |   | ありが<br>Asa        | とう<br>ante         |  |
|            |   |  |  |  | ÷ |                   | lerci              |  |
|            |   |  |  |  | + | <sub>+</sub> धन्य | वाद<br>iitos       |  |
|            |   |  |  |  |   |                   | شکرًا<br>זקוא      |  |
| © 2023 Arm |   |  |  |  |   | + î               | תודו               |  |

#### **Functional Safety Standards**



#### ISO International Organization of Standards

ISO 26262 is for Automotive Developed with OEM

33 © 2023 Arm

#### **Common Criteria**

- -- International standard/guidelines for computer-security certification
- Enables objective evaluation of a product's compliance to defined set of requirements
- -- Two key components: Protection profiles and EALs

#### **Protection Profiles**

- Standard set of security requirements for a specific type of product
  - Derived through threat modeling
  - Identification of vulnerabilities
  - Risk-assessment

#### **Evaluation Assurance Level**

- Rating of the rigor/depth of evaluation of the mitigations (SARs)
  - Defines how thoroughly the product is evaluated/tested
  - Higher EAL =/=> Higher security

#### **Security Standards**



ISO International Organization of StandardsSAE Society of Automotive EngineersIEC International Electrotechnical CommissionUL Underwriters Laboratories