## Datapath Architecture for Aperture Array Based Multibeam Mask Writer Systems

N. Chaudhary, S. A. Savari<sup>1</sup>

Department of Electrical & Computer Engineering, Texas A&M University

narendra5@tamu.edu

Multibeam mask writers are the next generation of mask writing tools because variable-shaped beam (VSB) mask writers cannot meet the throughput requirements of sub-10 nm technology nodes. IMS Nanofabrication and NuFlare have introduced multibeam mask writers for the 7 nm technology node<sup>2,3</sup>. These "aperture array based" multibeam mask writers use an aperture plate system to convert a broad beam into multiple small beams and control each beam individually with a deflection plate. Their throughput is known to be independent of the pattern complexity of the mask, whereas the throughput of a VSB writer depends on pattern complexity through the beam shot count. The pattern-independent throughput of multibeam mask writers suggests that the data communication systems in the existing datapath architecture could be improved by suitable data compression schemes<sup>4,5</sup>. We extend our earlier discussion of parallel data decompression<sup>4</sup> by proposing a new data decompression architecture that can be attached to NuFlare's deflection plate (or blanking aperture array) architecture and that handles the synchronization constraints created by multiple decoders. We incorporate ideas from the integrated circuit testing literature on data decompression architectures<sup>6</sup> for parallel decompression of system-on-a-chip designs with multiple cores.

The existing datapath architectures of these multibeam mask writers adapt the older architectures for VSB systems. Their throughput bottleneck is not data communication but the choice to perform the complex computations of rasterization, proximity effect correction and other corrections online at write time using application-specific hardware (see Figure 1(a)). We propose an alternative architecture, shown in Figure 1(b), which is partly motivated by multibeam direct write lithography systems, where rasterization and the various corrections are computed offline to reduce the throughput bottleneck. Since data communication then becomes more critical, we propose that parallel data decompression be performed online at write time to increase the throughput. Figure 2 illustrates the proposed data decompression architecture. Here, the deflection plate of a multibeam system consisting of $k \times n$ beams arranged in k rows and n columns receives the grayscale shot data from k decoders running in parallel. We use a simple compression scheme that represents each zero-valued 10-bit grayscale pixel by a single bit and each nonzero pixel value by an 11-bit symbol. We also develop a scanning strategy, motivated by NuFlare's multibeam system, that avoids the need to communicate the beam array deflection data. Table 1 shows the compression ratio results for randomly generated data and for a motif pattern of Inverse Lithography Technology (ILT) features of contact holes with a minimum element of 80 nm. The throughput gain reported there is the multiplicative speedup of parallel decompression relative to uncompressed data transfer.
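The compression scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the text states only that a zero pixel costs one bit and a nonzero pixel costs 11 bits, so the exact bit layout used here (a `0` flag for a zero pixel, a `1` flag followed by the 10-bit value otherwise) and the function names `encode`, `decode` and `compression_ratio` are assumptions made for the example.

```python
from typing import List

PIXEL_BITS = 10  # grayscale depth stated in the text


def encode(pixels: List[int]) -> str:
    """Encode pixels as a bit string.

    Assumed layout: zero pixel -> '0' (1 bit);
    nonzero pixel -> '1' + 10-bit value (11 bits).
    """
    out = []
    for p in pixels:
        if not 0 <= p < (1 << PIXEL_BITS):
            raise ValueError(f"pixel {p} does not fit in {PIXEL_BITS} bits")
        out.append("0" if p == 0 else "1" + format(p, f"0{PIXEL_BITS}b"))
    return "".join(out)


def decode(bits: str) -> List[int]:
    """Invert encode(): read one flag bit, then 10 value bits if the flag is 1."""
    pixels = []
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            pixels.append(0)
            i += 1
        else:
            word = bits[i + 1 : i + 1 + PIXEL_BITS]
            if len(word) != PIXEL_BITS:
                raise ValueError("truncated 11-bit symbol")
            pixels.append(int(word, 2))
            i += 1 + PIXEL_BITS
    return pixels


def compression_ratio(pixels: List[int]) -> float:
    """Uncompressed size (10 bits per pixel) divided by compressed size."""
    return PIXEL_BITS * len(pixels) / len(encode(pixels))
```

Because the flag bit makes the code prefix-free, a decoder can parse its stream serially without any separator between symbols, which is what allows each row to be handled by an independent decoder.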

<sup>1.</sup> This work was supported in part by NSF grant ECCS-1201994 and by GenISys, Inc. The authors also thank N. Hayashi of Dai Nippon Printing for providing an ILT test layout image.

<sup>2.</sup> C. Klein and E. Platzgummer, Proc. SPIE 9985, 998505 (2016).

<sup>3.</sup> H. Matsumoto et al., Proc. SPIE 9984, 998405 (2016).

<sup>4.</sup> N. Chaudhary, Y. Luo and S. A. Savari, J. Vac. Sci. Technol. B 34, 06KF01 (2016).

<sup>5.</sup> N. Chaudhary, Y. Luo and S. A. Savari, J. Vac. Sci. Technol. B 33, 06FD01 (2015).

<sup>6.</sup> A. Chandra and K. Chakrabarty, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems 20 (3), 355-368 (2001).



Figure 1: (a) Current datapath architecture. (b) Proposed datapath architecture.

Figure 2: Proposed decompression architecture assuming a $k \times n$ beam array. The k parallel decoders convert the k compressed strings $C_1, \ldots, C_k$ to the corresponding uncompressed 10-bit symbols $U_1, \ldots, U_k$ and communicate them to the control circuits on the deflection plate.
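The following Python sketch illustrates one way a throughput gain with k parallel decoders can differ from the overall compression ratio. It is a toy model under assumptions that are not taken from the source: each decoder is assumed to read its compressed string at a fixed bit rate, a zero pixel is assumed to cost 1 bit and a nonzero pixel 11 bits, and the write is assumed to finish only when the decoder with the longest compressed string finishes. The names `compressed_bits` and `ratio_and_speedup` are hypothetical.

```python
from typing import List, Tuple

PIXEL_BITS = 10  # grayscale depth stated in the text


def compressed_bits(row: List[int]) -> int:
    """Compressed length of one decoder's stream: 1 bit per zero, 11 per nonzero."""
    return sum(1 if p == 0 else 1 + PIXEL_BITS for p in row)


def ratio_and_speedup(rows: List[List[int]]) -> Tuple[float, float]:
    """Return (compression ratio, speedup) for k rows of equal length.

    Assumption: all k decoders run concurrently at the same bit rate, so the
    elapsed time is set by the longest compressed stream, while the
    compression ratio is computed over the total data volume.
    """
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all rows must hold the same number of pixels")
    sizes = [compressed_bits(r) for r in rows]
    ratio = PIXEL_BITS * n * len(rows) / sum(sizes)
    speedup = PIXEL_BITS * n / max(sizes)
    return ratio, speedup
```

Under this model the speedup can never exceed the compression ratio, because the longest stream is at least as long as the average stream; the two coincide only when every row compresses to the same length.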



TABLE 1: Data compression ratio and throughput gain with parallel decompression architecture compared to uncompressed data transfer.

| Data type                      | Fraction of zero symbols | Uncompressed (MB) | Compressed (MB) | Compression ratio | Speedup (throughput gain) |
|--------------------------------|--------------------------|-------------------|-----------------|-------------------|---------------------------|
| Random                         | 50%                      | 15.6              | 9.4             | 1.7               | 1.7                       |
| Random                         | 90%                      | 15.6              | 3.1             | 5.0               | 5.0                       |
| Random                         | 99%                      | 15.6              | 1.7             | 9.1               | 8.9                       |
| ILT layer 1 (32x31 beam array) | 99.6%                    | 302.5             | 31.5            | 9.6               | 9.6                       |
| ILT layer 2 (32x31 beam array) | 92.3%                    | 302.5             | 53.5            | 5.7               | 5.4                       |
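
As a consistency check on the compression ratios in Table 1, the expected ratio follows directly from the stated code lengths. The symbol $p$ for the fraction of zero-valued pixels is introduced here for illustration. Each pixel occupies 10 bits uncompressed, 1 bit compressed if it is zero, and 11 bits compressed otherwise, so

$$
\mathrm{CR}(p) = \frac{10}{p \cdot 1 + (1 - p) \cdot 11} = \frac{10}{11 - 10p}.
$$

This gives $\mathrm{CR}(0.5) \approx 1.7$, $\mathrm{CR}(0.9) = 5.0$, $\mathrm{CR}(0.99) \approx 9.1$, $\mathrm{CR}(0.996) \approx 9.6$ and $\mathrm{CR}(0.923) \approx 5.7$, which agrees with the compression ratio column. The scheme reduces the data volume only when $p > 0.1$, and the ratio approaches 10 as $p$ approaches 1.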