MixSong: Diverse and Strictly Formatted Chinese Poetry Generation
Chinese poetry, renowned for its elegance and simplicity, is a hallmark of Chinese culture. While neural networks have made significant advances in poetry generation, balancing diversity with adherence to rigid structural formats remains a challenge. Research indicates that factors such as theme, emotion (e.g., happiness, sadness), and sentiment (e.g., positive, negative) play a crucial role in poetic creation, influencing both the diversity and the quality of the generated content. In this paper, we propose MixSong, an autoregressive language model based on the Transformer architecture that incorporates a wide range of such conditional factors. MixSong integrates these factors through adversarial training, enabling the model to implicitly learn their distributional information in the latent space. In addition, we introduce several customized symbol sets, including paragraph identifiers, position identifiers, rhyme identifiers, tune identifiers, and conditional distinctive identifiers; these symbols help MixSong capture and enforce the constraints required for well-formed poetry. Extensive experiments show that MixSong significantly outperforms existing models on both automatic metrics and human evaluations, yielding notable improvements in the diversity and quality of the generated poetry.
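To make the identifier-symbol idea concrete, the sketch below serializes a poem into a single token sequence that interleaves tune, rhyme, paragraph, and position identifiers with the characters. The tag spellings, function name, and serialization order are our own illustrative assumptions, not the paper's actual input scheme:

```python
def build_input(lines, rhyme, tune):
    """Serialize poem lines into one token sequence with identifier symbols.

    Hypothetical format: global <tune:..> and <rhyme:..> tags first, then for
    each line a <para:i> tag, and a <pos:j> tag before every character.
    """
    seq = [f"<tune:{tune}>", f"<rhyme:{rhyme}>"]
    for p, line in enumerate(lines):
        seq.append(f"<para:{p}>")          # paragraph identifier
        for j, ch in enumerate(line):
            seq.append(f"<pos:{j}>")       # in-line position identifier
            seq.append(ch)                 # the character token itself
    return seq

tokens = build_input(["床前明月光", "疑是地上霜"], rhyme="ang", tune="wujue")
```

At training time, an autoregressive Transformer conditioned on such a sequence can learn to associate the identifier tokens with the structural constraints (line length, rhyme position, tune pattern) they encode.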