Project Description
Project C13 builds a “virtual map” of nitrogen heteropolycycles. We compile a comprehensive library of reported N-heteropolycycles by automatically extracting data from full-text articles, tables, and structure images, and we complement it with high-throughput calculations of key electronic and materials descriptors. On this heterogeneous dataset we train multimodal, transformer-based models that embed molecular structure, properties, synthesis information, and applications into a shared latent space. Distances in this space reflect functional as well as structural similarity, which enables property-aware search and inverse design. Two proof-of-principle studies target (i) charge-transport materials and (ii) photophysics. In both, the map will propose synthetically plausible candidates for further simulation, synthesis, and characterization. The results will be released as open-source code and through a web-based search platform with an accompanying API.
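The following minimal sketch illustrates what "distance in a shared latent space" can mean for search. It assumes cosine similarity as the similarity measure, which the description does not specify. The embeddings, the band-gap values, and the names `latent_search`, `EMBEDDINGS`, `GAP_EV`, and `keep` are all hypothetical toy stand-ins. They are not the project's models, data, or API. The optional `keep` filter shows one simple way to combine latent-space retrieval with a property constraint.

```python
import math
from typing import Callable, Optional

# Hypothetical toy embeddings; in practice these would come from a trained encoder.
EMBEDDINGS: dict[str, list[float]] = {
    "mol_A": [1.0, 0.0, 0.0],
    "mol_B": [0.9, 0.1, 0.0],
    "mol_C": [0.0, 1.0, 0.0],
    "mol_D": [0.7, 0.0, 0.7],
}

# Hypothetical toy property values (e.g. a computed gap in eV), for filtering only.
GAP_EV: dict[str, float] = {"mol_A": 3.1, "mol_B": 2.4, "mol_C": 3.0, "mol_D": 2.9}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def latent_search(
    query: str,
    embeddings: dict[str, list[float]],
    k: int = 5,
    keep: Optional[Callable[[str], bool]] = None,
) -> list[tuple[str, float]]:
    """Return the k entries most similar to `query`, optionally filtered by `keep`."""
    q = embeddings[query]
    scored = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query and (keep is None or keep(name))
    ]
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Plain nearest-neighbour search in the latent space.
    print([name for name, _ in latent_search("mol_A", EMBEDDINGS, k=2)])
    # Same search, restricted to entries whose toy gap exceeds 2.5 eV.
    print([name for name, _ in latent_search("mol_A", EMBEDDINGS, k=2,
                                             keep=lambda n: GAP_EV[n] > 2.5)])
```

With these toy vectors the unfiltered query returns `mol_B` and `mol_D`. The property filter drops `mol_B` and returns `mol_D` and `mol_C`. A production system would likely use an approximate nearest-neighbour index rather than this exhaustive scan.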

