The development of quantum materials is a source of groundbreaking phenomena and innovative functions, and it is becoming increasingly important as a foundation for next-generation industries. Although various methods based on existing theories have been established to predict the functions of quantum materials, materials designed theoretically often cannot be synthesized experimentally. Because the phase formation of new materials is difficult to predict, materials development has in practice relied on empirical knowledge, physical intuition, and exhaustive searches, which has become a bottleneck. One reason for this difficulty is the lack of the experimental data needed to predict phase formation accurately. Although trial-and-error development has accumulated a large amount of experimental data on whether phase formation succeeds or fails, failed cases are rarely disclosed. Furthermore, the chemical compositions and crystal structures in open databases are limited to materials that have been theoretically calculated or successfully synthesized.
For simple ternary ABX₃ compounds, the Goldschmidt tolerance factor, defined in terms of the ionic radii of the constituent ions, is known as an indicator for the formation of a cubic perovskite phase. However, no such indicator of phase formation is known for multinary compounds. In this study, we focus on layered perovskite compounds [Figure 1 (left)] as a typical example of multinary systems and seek indicators of phase formation (phase formation determinants) that can be used to predict it. Using hundreds of experimental records of phase formation, we developed a machine learning model that predicts phase formation. Specifically, to predict the formation of new layered perovskite compounds, we developed Python code that computes the phase formation determinants for this system using SISSO (Sure Independence Screening and Sparsifying Operator) [1], a symbolic regression method expected to have high extrapolation performance. Applied to experimental data on phase formability in an arsenic system [Figure 1 (right)], the model achieved a classification accuracy of ~90%. We also constructed a prediction model from ~300 experimental records on phase formability, including non-arsenic systems, and evaluated the generalization performance of the phase formation determinants.
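To illustrate what a phase formation determinant built from ionic radii can look like, the sketch below first evaluates the standard Goldschmidt tolerance factor, t = (r_A + r_X) / (√2 (r_B + r_X)). It then runs a toy descriptor search loosely in the spirit of SISSO: candidate descriptors are formed by combining primary features with arithmetic operators, and each is ranked by how well a single threshold separates "formed" from "not formed" samples. This is a minimal sketch under stated assumptions, not the authors' code and not SISSO itself; real SISSO builds far larger feature spaces and uses its own screening and sparsification criteria. The function names (goldschmidt_t, best_threshold_accuracy, screen_descriptors), the feature names (rA, rB, rX), and the sample data are hypothetical. The SrTiO3 radii are approximate Shannon values quoted only for illustration.

```python
import itertools
import math


def goldschmidt_t(r_a: float, r_b: float, r_x: float) -> float:
    """Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


# Hypothetical minimal operator set used to combine primary features.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def best_threshold_accuracy(values, labels):
    """Best accuracy of a single cut on one descriptor, trying both polarities."""
    n = len(values)
    best = 0.0
    for c in sorted(set(values)):
        # Predict "phase forms" when the descriptor is >= c ...
        acc = sum((v >= c) == y for v, y in zip(values, labels)) / n
        # ... or, with flipped polarity, when it is < c (accuracy 1 - acc).
        best = max(best, acc, 1.0 - acc)
    return best


def screen_descriptors(samples, labels, features):
    """Rank primary features and pairwise combinations by threshold accuracy.

    A toy stand-in for descriptor screening, not the SISSO algorithm.
    """
    candidates = {name: [s[name] for s in samples] for name in features}
    for a, b in itertools.permutations(features, 2):
        for sym, op in OPS.items():
            try:
                candidates[f"({a}{sym}{b})"] = [op(s[a], s[b]) for s in samples]
            except ZeroDivisionError:
                continue  # skip descriptors undefined for some sample
    scored = [(best_threshold_accuracy(v, labels), name)
              for name, v in candidates.items()]
    # Stable sort: ties keep insertion order (primary features first).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Approximate Shannon radii (angstrom) for SrTiO3, for illustration only.
    print(f"t(SrTiO3) = {goldschmidt_t(1.44, 0.605, 1.40):.3f}")

    # Hypothetical samples: True = target phase formed, False = not formed.
    samples = [
        {"rA": 1.44, "rB": 0.605, "rX": 1.40},
        {"rA": 1.00, "rB": 0.90, "rX": 1.40},
        {"rA": 1.30, "rB": 0.70, "rX": 1.40},
        {"rA": 0.90, "rB": 0.80, "rX": 1.40},
    ]
    labels = [True, False, True, False]
    for score, name in screen_descriptors(samples, labels, ["rA", "rB", "rX"])[:5]:
        print(f"{score:.2f}  {name}")
```

With only four hypothetical samples, many candidates separate the classes perfectly, so the ranking itself carries little meaning; with real data, a descriptor would need to be judged on many more samples and on held-out systems, much as the generalization evaluation above does with non-arsenic compounds.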