Digital II
Previously, we developed a 3-Nb-layer superconducting integrated circuit fabrication process with Nb/Al-AlOx/Al junctions of 6 kA/cm^2 critical current density, named SIMIT-Nb03, and a 4-Nb-layer process, named SIMIT-Nb03P. Existing cells in SIMIT-Nb03 were updated with minimal design modifications for the SIMIT-Nb03P process. We have kept developing new circuits with a maximum scale of 1E4 JJs on these two processes, including an SNN circuit (SUSHI) and a 32-bit String-Matching Processor. We then continued the process upgrade by introducing CMP to further increase the critical current density and the number of Nb wiring layers; the resulting 7-Nb-layer, 15 kA/cm^2 process is named SIMIT-Nb04. Both the schematics and the layouts of the cell library have been optimized and redesigned to improve yield and design flexibility compared with those for SIMIT-Nb03. We designed new cells to allow more flexibility in LSI circuit design and new interface cells to allow multi-chip module development. With this new process, we have successfully developed high-frequency SFQ circuits with a scale of 1E3 JJs. A higher working frequency was also achieved on this process, e.g., a PTL ring oscillator worked up to 160 GHz. Meanwhile, we have kept developing EDA tools customized for the above processes and cell libraries, supporting key procedures of design automation and verification, including RSFQ logic and physical synthesis, superconducting power grid current analysis, and timing analysis, which have been fully verified on SIMIT-Nb03, SIMIT-Nb03P, and SIMIT-Nb04. With the EDA tool chain we developed, an RTL-to-GDSII automation workflow with supply current distribution analysis and static and Monte-Carlo timing analysis can be achieved, improving the efficiency and reliability of our circuit design.
Superconducting Single-Flux-Quantum (SFQ) logic circuits have been studied for years. Their high-speed, ultra-low-power logic operation is attractive, but circuit design requires distinctive ideas to make full use of the nature of the device. We have designed several SFQ logic circuits and developed design methods for them. In this presentation, we summarize the logical aspects of SFQ circuit design and the key ideas in the design methods that differ from conventional CMOS design methods.
SFQ logic circuits work based on pulse logic, which is inherently different from the two-level logic employed in CMOS circuits. SFQ logic gates are usually clocked, i.e., their operation is triggered by clock pulses, and the circuits can be designed in a similar way to two-level logic. However, the amount of clock wiring is not negligible, and the timing constraints at each logic gate require precise physical design. Such wiring has become possible with progress in multi-layer and thin-wire routing, and placement and routing methods are expected to make further progress. To eliminate the clock wiring, alternative designs with asynchronous logic are possible for specific applications. Recently, design with clockless gates, which synchronize on the timing of the data pulses themselves, has become attractive.
As a natural result of using pulse logic, the output of a gate cannot drive a bus or multiple gates. Fanout can be implemented with active splitters, but they increase circuit area and delay. Therefore, the logic optimization objective should be adjusted according to this cost, and suitable circuit algorithms should be selected. Development of large-scale memory elements for use under this condition is also expected.
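The fanout cost mentioned above can be made concrete with a small sketch. It assumes each active splitter is a 1-to-2 cell arranged in a balanced binary tree, which is not stated in the abstract; the function name `splitter_tree_cost` is hypothetical.

```python
def splitter_tree_cost(fanout: int) -> tuple[int, int]:
    """Return (splitter count, tree depth) needed to drive `fanout` gate inputs.

    Assumption: every splitter is a 1-to-2 cell and the tree is balanced.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    # Each 1-to-2 splitter adds exactly one extra copy of the pulse.
    splitters = fanout - 1
    # Depth of a balanced binary tree with `fanout` leaves: ceil(log2(fanout)).
    depth = (fanout - 1).bit_length()
    return splitters, depth


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 16):
        count, depth = splitter_tree_cost(n)
        print(f"fanout {n:2d}: {count:2d} splitters, {depth} splitter delays on the path")
```

Under these assumptions the area cost grows linearly with fanout and the added delay grows logarithmically, which is why a logic optimizer for SFQ circuits may weigh fanout differently from a CMOS one.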
Let us turn to the issue of clock distribution and operation timing. Because SFQ gates switch very fast, wiring delay is relatively large. In this situation, synchronous design, in which all clocked gates operate at the same time, is not always the best solution.
General synchronous circuit design, in which each clocked gate operates once at a certain point in a clock period but not necessarily simultaneously, can be an effective solution, especially with respect to the clock frequency. Flow-clocking schemes, which have long been used in SFQ design, fall into this category. In the concurrent-flow clocking scheme, the trigger timing of a gate and that of its successor are skewed. In other words, clocked gates in a circuit are grouped to form pipeline stages, and the stages are triggered at different times. Here, a precise timing design method that considers placement and routing constraints is required.
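A minimal sketch of why skewed stage clocks can raise the clock frequency, under a simplified timing model that is an assumption of this example and not the authors' method: data launched by clock pulse k at one stage must reach the next stage at least `hold` after that stage's pulse k and at least `setup` before its pulse k+1. All delay values and the name `min_clock_period` are hypothetical.

```python
def min_clock_period(data_delays, clock_skews, setup=0.0, hold=0.0):
    """Minimum clock period (same unit as the inputs) for a linear pipeline.

    data_delays[i]: gate plus wire delay from stage i to stage i+1.
    clock_skews[i]: how much later stage i+1 is clocked than stage i.
    Simplified model (assumption): hold <= delay - skew <= period - setup.
    """
    if len(data_delays) != len(clock_skews):
        raise ValueError("one skew value is needed per stage-to-stage delay")
    period = 0.0
    for delay, skew in zip(data_delays, clock_skews):
        arrival_after_clock = delay - skew  # data arrival relative to the receiving clock
        if arrival_after_clock < hold:
            raise ValueError("hold violation: data arrives too soon after the clock")
        period = max(period, arrival_after_clock + setup)
    return period


if __name__ == "__main__":
    delays = [12.0, 15.0, 10.0]  # ps, hypothetical
    print("zero skew      :", min_clock_period(delays, [0.0, 0.0, 0.0], setup=2.0, hold=1.0), "ps")
    print("concurrent flow:", min_clock_period(delays, [8.0, 9.0, 7.0], setup=2.0, hold=1.0), "ps")
```

In this model the clock skew cancels part of the data delay, so the period is set by the difference between data and clock delays. The hold check shows the other side of the trade-off: too much skew makes the data arrive too close to the receiving clock pulse, which is why placement and routing must be considered in the timing design.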
Flow-clocking is advantageous if the computation proceeds from the input to the output in one direction, as in stream processing. However, if the results of a stage must be fed back to previous stages, which is often the case, synchronization is difficult because the clock signals of the stages are skewed. Inserting data buffers could be a solution, but we should be careful not to spoil the merit of high clock frequency. A hierarchical clocking scheme would also be effective. For example, we can design a gate-level flow-clocked micro clock together with a system-level synchronous clock whose cycle is composed of several micro clock cycles.
In logic timing design, we have been focusing on the order of pulse arrival at each gate. Provided that the order is kept, which is a local property, the functionality of the circuit is unchanged, which is a global property. Based on this insight, we have been developing schemes for circuit specification description, formal verification, and static timing analysis.
In summary, the SFQ logic family and its derivatives are promising for high-speed, low-power computation. To make use of this potential, design ideas and methodologies customized for the device are needed. Although we focused on logic design, device and physical design issues are also important and cannot be considered separately.
Stochastic computation represents a number by the probability of a “1” occurring in a train of binary digits. This approach enables multiplication and addition to be performed with a minimal number of logic gates and has garnered attention in applications where approximate computation is effective. Stochastic computation requires processing long number trains, so superconducting single-flux-quantum (SFQ) circuits, which are capable of high-speed operation, are a promising solution. Another advantage of SFQ circuits is the availability of an ultra-fast superconductor random number generator (SRNG) [1].
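The representation described above can be illustrated in software. The sketch below assumes the common unipolar encoding, in which an AND gate multiplies two independent streams and a multiplexer with a probability-0.5 select stream computes a scaled sum. A seeded software generator stands in for the SRNG, and all function names are hypothetical.

```python
import random


def bitstream(p, n_bits, rng):
    """Stochastic number: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def estimate(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams encodes the product a * b."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_scaled_add(a_bits, b_bits, select_bits):
    """A 2-to-1 multiplexer with a p = 0.5 select stream encodes (a + b) / 2."""
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]


if __name__ == "__main__":
    rng = random.Random(0)  # software stand-in for the SRNG (assumption)
    n = 100_000
    a, b, sel = bitstream(0.6, n, rng), bitstream(0.5, n, rng), bitstream(0.5, n, rng)
    print("product estimate   :", estimate(sc_multiply(a, b)), "(ideal 0.30)")
    print("scaled sum estimate:", estimate(sc_scaled_add(a, b, sel)), "(ideal 0.55)")
```

The estimation error shrinks roughly as the inverse square root of the stream length, which is why long number trains, and therefore fast circuits, are needed for useful precision.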
In this study, we designed and evaluated an SFQ stochastic matrix multiplier, a crucial circuit component in the field of wireless signal processing. Wireless signal processing is well suited to stochastic computing because it involves numerical values that include quantization errors from analog-to-digital conversion. The matrix multiplier can be implemented using adders and multipliers. We developed an SFQ multiplier intended for operation at 50 GHz. A random number train generated by the SRNG is used as the control signal for the multiplexer that performs addition. We designed a 2x2 matrix multiplier composed of eight AND gates functioning as multipliers and four multiplexers functioning as adders, all based on SFQ stochastic circuits using the AIST 10 kA/cm^2 Nb high-speed process [2]. The test circuit of the 2x2 matrix multiplier contains approximately 1500 Josephson junctions.
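As a software illustration of the 2x2 structure, the sketch below assumes unipolar encoding with AND gates acting as multipliers and multiplexers acting as scaled adders, so each output element is half of the true matrix product entry. The sharing of input streams between output elements and the seeded software generator standing in for the SRNG are assumptions of this example, not details taken from the fabricated circuit.

```python
import random


def stochastic_matmul_2x2(A, B, n_bits=100_000, seed=0):
    """Estimate 0.5 * (A @ B) for 2x2 matrices with entries in [0, 1]."""
    rng = random.Random(seed)  # software stand-in for the SRNG (assumption)

    def stream(p):
        # Unipolar stochastic number: each bit is 1 with probability p.
        return [rng.random() < p for _ in range(n_bits)]

    a = [[stream(A[i][k]) for k in range(2)] for i in range(2)]
    b = [[stream(B[k][j]) for j in range(2)] for k in range(2)]

    C = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            # Two AND gates per output element: eight multipliers in total.
            prod0 = [x and y for x, y in zip(a[i][0], b[0][j])]
            prod1 = [x and y for x, y in zip(a[i][1], b[1][j])]
            # One multiplexer per output element: four scaled adders in total.
            select = stream(0.5)
            out = [p0 if s else p1 for p0, p1, s in zip(prod0, prod1, select)]
            C[i][j] = sum(out) / n_bits
    return C


if __name__ == "__main__":
    A = [[0.2, 0.8], [0.5, 0.4]]
    B = [[0.6, 0.1], [0.3, 0.9]]
    for row in stochastic_matmul_2x2(A, B):
        print([round(v, 3) for v in row])
```

For these hypothetical inputs the ideal scaled result is [[0.18, 0.37], [0.21, 0.205]]; the estimates deviate from it by a statistical error that depends on the stream length.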
The test circuit was measured at low speed at 4.2 K, the temperature of liquid helium. SRNGs were used to generate the input stochastic number sequences. We experimentally verified the correct operation of one output. The relationship between the output and the input as well as the multiplexer control signals was measured. The output exhibited errors consistent with numerical analysis.
We believe this is the first demonstration of a stochastic arithmetic circuit implemented with SFQ circuits. The designed matrix multiplication circuit is expected to be valuable not only in wireless signal processing but also in various other applications, including image processing and neural networks.
[1] Y. Yamanashi and N. Yoshikawa, “Superconductive Random Number Generator Using Thermal Noises in SFQ Circuits,” IEEE Trans. Appl. Supercond., vol. 19, no. 3, pp. 630–633, Jun. 2009.
[2] M. Hidaka and S. Nagasawa, “Fabrication process for superconducting digital circuits,” IEICE Trans. Electron., vol. E104–C, pp. 405–410, Sep. 2021.
This work was supported by KAKENHI 22H01542 and 24H00311. The authors thank Naoki Ishikawa for fruitful discussion. The circuits were fabricated in the clean room for analog-digital superconductivity (CRAVITY) of the National Institute of Advanced Industrial Science and Technology (AIST) with the high-speed standard process (HSTP).
Figure 1. Micrograph of the SFQ stochastic 2x2 multiplier.
Keywords: SFQ circuit, stochastic computing
RSFQ circuits have attracted attention for application to qubit control circuits because of their high-speed operation, low power consumption, and cryogenic operation [1]. Controlling qubits requires a clock generator whose oscillation frequency is stable over the long term. An all-digital phase locked loop (ADPLL) is one circuit solution for achieving this. It includes a digitally controlled oscillator (DCO), which is a variable-frequency oscillator, and synchronizes itself with an external clock by negative feedback control. The SFQ-based ADPLL in the previous study [2] detected which of the reference and internal signals arrived earlier and then increased or decreased the frequency of the internal signal by one predefined step per reference period; the acceptable initial phase difference was not provided. In this study, a time-to-digital converter (TDC) was introduced into the ADPLL to determine the time difference between the reference and internal signals, which enabled several improvements in circuit performance.
A block diagram of the designed ADPLL is shown in Fig.1(a). The TDC is a circuit that generates a 2-bit digital output (-4, -3, -2, -1, +1, +2, +3, or +4) corresponding to the time difference between the external clock (reference) and the internal clock (internal signal). It determines the time difference by using the outputs of the last 3 TFFs in the 1/2^M divider. By eliminating the “0” output from the TDC, its time resolution is enhanced to the detection limit of the SFQ circuit cells. The output of the TDC corresponds to proportional (P) control. A proportional-derivative (PD) controller is added to reduce the simple harmonic motion.
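The control path described above can be sketched behaviorally. This is a generic illustration, not the designed circuit: the uniform bin width `lsb`, the sign convention, the treatment of a zero time difference, and the gains `kp` and `kd` are all assumptions of this example, and the names `tdc_code` and `PDController` are hypothetical.

```python
def tdc_code(dt, lsb):
    """Quantize a time difference into -4..-1 or +1..+4, with no 0 code.

    dt:  reference arrival time minus internal clock arrival time (assumed sign).
    lsb: width of one quantization bin (assumed uniform).
    """
    magnitude = min(4, int(abs(dt) // lsb) + 1)  # saturate at 4
    # A zero difference is mapped to +1 here, since the TDC has no 0 output.
    return magnitude if dt >= 0 else -magnitude


class PDController:
    """Discrete proportional-derivative controller acting on TDC codes."""

    def __init__(self, kp, kd):
        self.kp = kp
        self.kd = kd
        self.prev_error = 0

    def update(self, error):
        # The derivative term responds to the change since the previous sample,
        # which damps the oscillation that pure proportional control can cause.
        output = self.kp * error + self.kd * (error - self.prev_error)
        self.prev_error = error
        return output


if __name__ == "__main__":
    pd = PDController(kp=1.0, kd=0.5)
    for dt in (3.6, 2.2, 0.4, -0.3):  # hypothetical time differences in units of lsb
        code = tdc_code(dt, lsb=1.0)
        print(f"dt = {dt:+.1f} -> TDC code {code:+d} -> control {pd.update(code):+.2f}")
```

Because the quantizer never returns 0, the loop always receives a direction, even for time differences smaller than one bin.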
The DCO is a ring oscillator consisting of multiple tunable-delay JTLs (TDJTLs) [2] with the same parameters. To control the oscillation frequency, it requires signals that switch each cell in the TDJTLs between slow and fast mode. The DCO controller converts the serial output of the PD controller into a frequency control word (FCW) for the DCO. To process the continuous Up'/Down' signal input, the throughput of the DCO controller has been improved by leaving unspecified which cell in the TDJTLs is switched to fast mode.
For the circuit design, we used the RSFQ digital cell library "CONNECT" updated for the Nb/AlOx/Nb 10 kA/cm^2 process (AIST-HSTP) [3]. In addition, we designed a layout of the TDJTL cell with its parameters optimized to expand the operating margins.
Numerical simulation was performed to verify the ADPLL with the target frequency set to 20 GHz. With the number of TDJTLs in the DCO set to 4 and the bias voltage set to 109.6 % of the nominal bias voltage (2.5 mV × 1.096 = 2.74 mV), the oscillation frequency of the DCO ranged from 19.8 GHz to 20.2 GHz. This DCO was incorporated into an ADPLL with a 1/2^12 divider, and an external clock of 20 GHz / 2^12 = 4.88 MHz was input to the TDC. The initial phase difference between the external clock and the internal clock was set to 180°, and the DCO thermal noise at 4.2 K was approximated by a Gaussian distribution. The time variations of the period difference and phase difference between the clocks are shown in Fig.1(b). Phase locking occurred in 18 μs (88 clock cycles), confirming that phase locking is achieved even with an initial phase difference as large as 180°. The time variation of the output period of the DCO is shown in Fig.1(c). Because of the influence of thermal noise and the limitations of the frequency control performance of the ADPLL, the period fluctuated with a standard deviation of σ = 0.33 ps.
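The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch reads the divider ratio as 2^12, which reproduces the quoted 4.88 MHz reference; the derived period values are simple conversions of the quoted frequencies, not additional simulation results, and the variable names are hypothetical.

```python
target_hz = 20e9       # target DCO frequency
divider_exponent = 12  # divider ratio read as 1/2^12

# Reference clock fed to the TDC: 20 GHz / 4096, about 4.88 MHz.
reference_hz = target_hz / 2**divider_exponent

# Lock time expressed in reference clock cycles: about 88.
lock_time_s = 18e-6
lock_cycles = lock_time_s * reference_hz

# Bias voltage at 109.6 % of the nominal 2.5 mV: 2.74 mV.
bias_v = 2.5e-3 * 1.096

# DCO tuning range converted from frequency to period.
period_slow_ps = 1e12 / 19.8e9
period_fast_ps = 1e12 / 20.2e9
tuning_range_ps = period_slow_ps - period_fast_ps

# Quoted period jitter relative to the nominal 50 ps period.
relative_jitter = 0.33 / (1e12 / target_hz)

print(f"reference clock : {reference_hz / 1e6:.2f} MHz")
print(f"lock time       : {lock_cycles:.1f} reference cycles")
print(f"bias voltage    : {bias_v * 1e3:.2f} mV")
print(f"period range    : {period_fast_ps:.2f} ps to {period_slow_ps:.2f} ps "
      f"(span {tuning_range_ps:.2f} ps)")
print(f"relative jitter : {relative_jitter * 100:.2f} % of the nominal period")
```

The check shows that the quoted lock time, reference frequency, and cycle count are mutually consistent, and that the jitter of 0.33 ps is of the same order as the roughly 1 ps period span available to the DCO.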
[1] R. McDermott, et al., Phys. Rev. Appl. 2 (2014).
[2] H. Cong, et al., IEEE Trans. Appl. Supercond. 32 (2022).
[3] N. Takeuchi, et al., Supercond. Sci. Technol. 30 (2017).
This work was partly supported by JSPS KAKENHI Grant Number JP20H02201. It was also supported through the activities of VDEC, The University of Tokyo, in collaboration with Cadence Design Systems.
Keywords: all-digital phase locked loop (ADPLL), time-to-digital converter (TDC), digitally controlled oscillator (DCO), RSFQ