# Ferroelectrics for future 3D NAND storage technology (Invited)

Prasanna Venkatesan<sup>1</sup> and Asif Khan<sup>1,2</sup>

<sup>1</sup>School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA <sup>2</sup>School of Material Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

## Abstract

Band-engineered ferroelectric field-effect transistors (FEFETs) with dielectric inserts have emerged as a potential solution to continued z-scaling in 3D NAND devices. Here, we present a comprehensive optimization of these ferroelectric gate stacks in NAND devices for in-storage compute applications. This involves (1) exploring the design space to optimize the memory window (MW), (2) band engineering for robust retention, and (3) implementing a novel disturb mitigation scheme to reduce pass disturb. The optimized device is then utilized to demonstrate a high-density in-storage compute solution for protein identification using open modification search.

## Introduction

3D NAND based in-storage compute solutions have emerged as a viable alternative for implementing large AI models. However, reliability challenges linked with low z-pitch and high write voltages hinder further z-scaling, thereby limiting data densities. Replacing the charge trap layer in conventional 3D NAND with a ferroelectric layer has been proposed to mitigate these reliability concerns and enable 3D NAND with over 1000 layers [1-4]. Recently, QLC-compatible operation (MW > 7.5 V) at low write voltage (< 15 V) in FEFETs has been achieved by either laminating a dielectric layer in the middle of the ferroelectric gate stack (tunnel dielectric layer, TDL) or placing it next to the gate to act as a gate blocking layer (GBL). In this work, we explore different dielectrics and geometries to optimize for large MW enhancement, study retention of FEFETs with TDL and GBL gate stacks, and characterize disturb in the FEFET with the best retention. In the last section, we benchmark these FE-NAND devices against the incumbent solutions for open modification search for protein identification [5].

## Results and Discussions

In order to study the role of different dielectrics and the effect of their position in the ferroelectric gate stack, two-terminal FE-MOSCAPs were fabricated using the process outlined in [6-7]. The MWs of these gate stacks are measured from the C-V curves and summarized in Fig. 2 [8]. It is identified that Al<sub>2</sub>O<sub>3</sub> as a TDL and SiO<sub>2</sub> as a GBL exhibit the largest MWs. A hybrid gate stack with an Al<sub>2</sub>O<sub>3</sub> TDL and a SiO<sub>2</sub> GBL exhibits further MW enhancement, enabling a MW as high as 11 V with the same gate stack thickness, a 4.5x improvement over the reference 18 nm HZO gate stack. The origin of the MW enhancement caused by these dielectric inserts in the ferroelectric gate stacks has been described in detail in [3]. FEFETs with reference (19 nm HZO), TDL (8/3(Al<sub>2</sub>O<sub>3</sub>)/8) and GBL (14/4(SiO<sub>2</sub>)) gate stacks are fabricated following the process flow shown in [9-10] for the retention and disturb characterization.

Retention was characterized in the TDL and GBL FEFETs using the pulse scheme shown in Fig. 3. The evolution of the threshold voltages over time shows less than 1% retention loss for the TDL FEFETs. However, the GBL FEFETs show significant retention loss resulting from detrapping of the MW-enhancing charges trapped at the FE-GBL interface through the GBL [11]. While MW enhancement is achieved irrespective of the position of the dielectric, robust retention is achieved only when the dielectric is laminated in the middle of the ferroelectric gate stack. The retention loss mechanisms have been explored in detail in [17].

The evolution of the threshold voltage of the TDL FEFETs with pass disturb cycles was measured as shown in Fig. 4. A significant shift in $V_T$ (28%) is observed with pass disturb pulses of $V_{pass} = V_T + 2$ V. It is hypothesized that the positive $V_T$ shift is due to electron trapping. In order to mitigate electron trapping, we proposed a mitigation scheme where a periodic refresh is applied every M pass disturb pulses to detrap the electrons and reduce the $V_T$ shift. The efficacy of the mitigation scheme in reducing pass disturb from 28% down to 16% and 4% with M = 1000 and M = 10, respectively, indicates that the disturb was caused predominantly by electron trapping rather than polarization reversal [12].

System-level benchmarking is performed for the implementation of open modification search for protein identification to quantify the advantages of in-storage computing in high-density FE-NAND over other alternatives (Fig. 5). FE-NAND is found to be more energy efficient and faster than conventional 3D NAND and other solutions while offering increased data density.

## Conclusion

Ferroelectric NAND devices can be optimized to enable ultra-high-density in-storage compute at previously unprecedented scales, potentially allowing for petabyte-scale memories. Such parallelization and in-storage operation, coupled with the low write energies and latencies compared to CTF NAND devices, make FE-NAND devices a suitable candidate for large dataset processing.

## Acknowledgements

This work was supported by Samsung Electronics and SUPREME, one of the seven SRC-DARPA JUMP 2.0 centers. Fabrication was done at the IEN, supported by the NSF-NNCI program (ECCS-1542174).

## References

[1] Han et al., IEDM (2023).
[2] Lim et al., IEDM (2023).
[3] Das et al., IEDM (2023).
[4] Kim et al., VLSI (2024).
[5] Kang et al., PACT (2022).
[6] Fernandes et al., EDL (2024).
[7] Fernandes et al., TED (2024).
[8] Das et al., EDTM (2024).
[9] Tasneem et al., IEDM (2021).
[10] Park et al., JEDS (2024).
[11] Venkatesan et al., EDL (2025).
[12] Venkatesan et al., EDL (2024).
[13] Kang et al., Bioinformatics (2023).
[14] Kang et al., TCAD (2024).
[15] Fan et al., DAC (2024).
[16] Hsu et al., MEMSYS (2023).
[17] Venkatesan et al., IRPS (2025).







Fig. 3: (a) Pulse scheme for retention characterization. (b) Retention at room temperature (RT) in the TDL and GBL FEFETs. The TDL FEFETs demonstrate robust retention, while the GBL FEFETs show a 29% retention loss after $10^4$ s, in line with other GBL FEFETs, which exhibit between 25% and 50% retention loss.
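The retention figures quoted here (less than 1% for TDL, 29% for GBL) are percentages, but the text does not spell out the exact metric. The sketch below assumes one common reading: retention loss is the fraction of the initial memory window that has closed by a later read. The function names and the example voltages are hypothetical and are not measured data from this work.

```python
def memory_window(vt_pgm, vt_ers):
    """Memory window in volts: separation between the programmed and erased V_T."""
    return abs(vt_pgm - vt_ers)


def retention_loss_percent(mw_initial, mw_later):
    """Percentage of the initial memory window lost by the later read.

    Assumption: loss is defined relative to the initial window. The source
    does not state its exact definition.
    """
    if mw_initial <= 0:
        raise ValueError("initial memory window must be positive")
    return 100.0 * (mw_initial - mw_later) / mw_initial


if __name__ == "__main__":
    # Illustrative voltages only, chosen to land near the 29% GBL figure.
    mw_0 = memory_window(vt_pgm=4.0, vt_ers=-3.0)
    mw_t = memory_window(vt_pgm=2.9, vt_ers=-2.07)
    print(f"retention loss: {retention_loss_percent(mw_0, mw_t):.1f}%")
```

Defining the loss on the window rather than on a single state keeps the metric independent of which polarity is labelled PGM or ERS.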

Fig. 5: As a benchmarking standard, we assume a repository with one billion reference hypervectors (HVs) and 15k query HVs. The QLC FE-NAND devices retain the advantages arising from the parallelism and read energy efficiency of CTF NAND while achieving significantly higher data densities.
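To make the benchmark workload concrete, the following is a minimal sketch of a brute-force nearest-match search over binary hypervectors, assuming Hamming distance as the similarity measure. The source does not state the encoding, dimensionality, or metric used, so the 1024-bit dimension, the function names, and the noise level below are all illustrative assumptions. At the stated scale, 15k queries against one billion references would amount to 1.5e13 pairwise comparisons if done exhaustively, which is the kind of parallel read workload in-storage compute targets.

```python
import random


def random_hv(dim, rng):
    """Random binary hypervector, stored as a Python int with `dim` bits."""
    return rng.getrandbits(dim)


def hamming(a, b):
    """Number of bit positions in which two hypervectors differ."""
    return bin(a ^ b).count("1")


def flip_bits(hv, dim, n_flips, rng):
    """Return a noisy copy of `hv` with `n_flips` distinct bits inverted."""
    for pos in rng.sample(range(dim), n_flips):
        hv ^= 1 << pos
    return hv


def best_match(query, references):
    """Index of the reference hypervector closest to the query (brute force)."""
    return min(range(len(references)), key=lambda i: hamming(query, references[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 1024  # hypothetical dimensionality
    refs = [random_hv(dim, rng) for _ in range(50)]
    query = flip_bits(refs[7], dim, 100, rng)  # noisy version of reference 7
    print("best match index:", best_match(query, refs))
```

Unrelated random hypervectors differ in about half their bits, so a query that differs from its source in only about 10% of positions is still matched reliably.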

[Fig. 2(b) plot labels: 18 nm HZO (reference), 8/2(Si)/8, 8/2(Al)/8, 7/4(Si)/7, 7/4(Al)/7, 14/4(Al), 14/4(Si), 6/2(Al)/6/4(Si) (hybrid); group labels: TDL, GBL, Hybrid.]

Fig. 4: (a) Disturb mitigation scheme for reducing pass disturb to acceptable levels. (b) Evolution of I<sub>D</sub>-$V_G$ of the PGM state in the 8/3(Al)/8 FEFET after an increasing number of pass disturb pulses ($V_{pass} = V_T + 2$ V) with no mitigation and with mitigation pulses applied every 1000 and 10 cycles. (c, d) $V_T$ and $\Delta V_T$ show that applying the mitigation pulse reduces pass disturb from 28% down to 4%.
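The periodic-refresh idea in Fig. 4(a) can be sketched as a simple pulse schedule: one mitigation pulse is issued after every M pass disturb pulses. The sketch below only models the ordering and count of pulses, plus the relative reduction implied by the reported 28%, 16%, and 4% figures. It does not model trapping physics, and all function names are hypothetical.

```python
def pulse_schedule(n_disturb, m=None):
    """Yield 'disturb' and 'mitigation' events in order.

    A mitigation pulse follows every m-th disturb pulse.
    m=None means no mitigation (the M = infinity case).
    """
    for i in range(1, n_disturb + 1):
        yield "disturb"
        if m is not None and i % m == 0:
            yield "mitigation"


def mitigation_count(n_disturb, m=None):
    """Number of mitigation pulses issued over n_disturb pass disturb pulses."""
    return 0 if m is None else n_disturb // m


def relative_reduction(baseline_pct, mitigated_pct):
    """Fraction of the unmitigated disturb that the scheme removes."""
    return (baseline_pct - mitigated_pct) / baseline_pct


if __name__ == "__main__":
    print(list(pulse_schedule(4, m=2)))
    for m, disturb_pct in ((1000, 16), (10, 4)):
        print(m, mitigation_count(10**6, m), round(relative_reduction(28, disturb_pct), 3))
```

The trade-off is visible in the counts: a smaller M removes more of the disturb (about 86% at M = 10 versus about 43% at M = 1000, from the reported figures) at the cost of 100x more mitigation pulses.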

| Architecture                     | GPU [13]           | DRAM [14]                                      | MLC ReRAM<br>[15]            | 3D NAND<br>[16]                     | FeNAND<br>(this work)                   |
|----------------------------------|--------------------|------------------------------------------------|------------------------------|-------------------------------------|-----------------------------------------|
| Compute<br>type                  | -                  | Near-memory                                    | Near-memory                  | In-memory                           | In-memory                               |
| Algorithm                        | HOMS-TC            | HyperOMS-<br>PIM-DRAM                          | HyperOMS-<br>PIM-ReRAM       | HyperOMS-<br>3DNAND                 | HyperOMS-<br>FeNAND                     |
| Technology<br>Node               | RTX 4090<br>(5 nm) | 22 nm DRAM<br>(DDR4)<br>28 nm node-<br>compute | 130 nm RRAM<br>with 3M cells | NAND: 14 nm<br>ASIC: 7 nm<br>FinFET | FeNAND: 14 nm<br>ASIC: 7 nm<br>FinFET   |
| Speed                            | 1x (23 min)        | 2.43x                                          | 1.71x                        | 423x                                | 737x                                    |
| Energy<br>Efficiency             | 1x (454 kJ)        | 101x                                           | 516x                         | 7230x                               | 22146x                                  |
| Capacity<br>Limit (per<br>module) | 24 GB              | 128 GB                                         | 3M cells                     | 16 TB                               | >100 TB                                 |
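The speed and energy-efficiency rows are given relative to the GPU baseline (23 min, 454 kJ). As a rough worked example, the sketch below converts them to absolute runtime and energy, assuming each factor simply divides the GPU figure. Under that assumption FeNAND comes out at roughly 1.9 s and 20.5 J, and is about 1.7x faster and 3.1x more energy efficient than 3D NAND. The dictionary and function names are illustrative.

```python
GPU_RUNTIME_S = 23 * 60   # GPU baseline runtime from the table: 23 min
GPU_ENERGY_J = 454e3      # GPU baseline energy from the table: 454 kJ

# (speedup, energy-efficiency gain) relative to the GPU, copied from the table.
FACTORS = {
    "DRAM": (2.43, 101),
    "MLC ReRAM": (1.71, 516),
    "3D NAND": (423, 7230),
    "FeNAND": (737, 22146),
}


def absolute_cost(name):
    """Return (runtime in s, energy in J), assuming baseline / factor scaling."""
    speedup, efficiency = FACTORS[name]
    return GPU_RUNTIME_S / speedup, GPU_ENERGY_J / efficiency


def relative_gain(a, b):
    """Return (speed ratio, energy-efficiency ratio) of architecture a over b."""
    return FACTORS[a][0] / FACTORS[b][0], FACTORS[a][1] / FACTORS[b][1]


if __name__ == "__main__":
    for name in FACTORS:
        runtime_s, energy_j = absolute_cost(name)
        print(f"{name:10s} {runtime_s:9.2f} s {energy_j:10.1f} J")
    print(relative_gain("FeNAND", "3D NAND"))
```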

Fig. 2: (a) The MW of the FE gate stacks is optimized on FE-MOSCAPs by extracting the MW from the C-V curves. (b) Al<sub>2</sub>O<sub>3</sub> acts as the best TDL while SiO<sub>2</sub> is better as a GBL. The hybrid gate stack with an Al<sub>2</sub>O<sub>3</sub> TDL and a SiO<sub>2</sub> GBL is shown to achieve a MW as high as 11 V.
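To put the 11 V window in context, the sketch below estimates the threshold-voltage spacing available per level for a multi-bit cell, assuming evenly spaced levels and ignoring the width of each $V_T$ distribution. Under that simplification, a QLC cell (4 bits, 16 levels) with a 7.5 V window has 0.5 V between adjacent levels, and an 11 V window gives about 0.73 V. The function names are hypothetical, and the even-spacing model is an assumption rather than the criterion used in this work.

```python
def level_count(bits_per_cell):
    """Number of distinct V_T levels needed to store the given bits per cell."""
    return 2 ** bits_per_cell


def level_spacing(memory_window_v, bits_per_cell):
    """Spacing in volts between adjacent levels, assuming evenly spaced levels."""
    return memory_window_v / (level_count(bits_per_cell) - 1)


def implied_reference_mw(enhanced_mw_v, improvement_factor):
    """Reference memory window implied by a stated improvement factor."""
    return enhanced_mw_v / improvement_factor


if __name__ == "__main__":
    for mw in (7.5, 11.0):
        print(f"MW = {mw} V -> {level_spacing(mw, 4):.3f} V per QLC level")
    # 11 V quoted as a 4.5x improvement implies a reference MW of about 2.4 V.
    print(f"implied reference MW: {implied_reference_mw(11.0, 4.5):.2f} V")
```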
