Industrial Management and Data Systems

A dual adversarial structure of generative adversarial network for natural language generation

Kuen-Liang Sue
Yi-Cheng Chen
Publication type: Journal Article
Publication date: 2025-03-06
scimago Q1
wos Q1
SJR: 1.207
CiteScore: 9.6
Impact factor: 4.2
ISSN: 0263-5577, 1758-5783
Abstract
Purpose

Recently, owing to its practicality in several domains, the generative adversarial network (GAN) has been successfully adopted in the field of natural language generation (NLG). This paper focuses on improving the quality of generated text and producing sequences similar to human writing for several real-world applications.

Design/methodology/approach

A novel model, GAN2, is developed based on a GAN with a dual adversarial architecture. We train the generator with an internal discriminator and a beam search technique to improve the quality of generated sequences. We then enhance the generator with an external discriminator to optimize and strengthen the learning process of sequence generation.

Findings

The proposed GAN2 model can be utilized in widespread applications, such as chatbots, machine translation and image description. Through the proposed dual adversarial structure, we significantly improve the quality of the generated text. Average and top-1 scores on metrics such as NLL, BLEU and ROUGE are used to compare the sentences generated by the GAN2 model against all baselines. Several experiments demonstrate the performance and superiority of the proposed model over state-of-the-art methods on numerous evaluation metrics.

Originality/value

Generally, reward sparsity and mode collapse are the two main challenges when adopting GANs in real NLG applications. In this study, GAN2 exploits a dual adversarial architecture that facilitates the learning process in the early training stage to address reward sparsity. The occurrence of mode collapse can also be reduced in the later training stage with the introduced comparative discriminator, which avoids assigning high rewards for training in a specific mode. Furthermore, the proposed model is applied to several synthetic and real datasets to show its practicability, and it exhibits strong generalization on all discussed metrics.
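The staged interplay between the two discriminators can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the linear schedule and the averaging of per-token rewards are all illustrative assumptions; the point is only that a dense internal signal dominates early (easing reward sparsity) while the sequence-level external signal dominates later (discouraging collapse onto one mode).

```python
def blended_reward(step_rewards, seq_reward, epoch, total_epochs):
    """Blend dense per-token rewards (e.g. from an internal discriminator
    scoring beam-search prefixes) with a sparse sequence-level reward
    (from an external discriminator), shifting weight over training."""
    alpha = epoch / total_epochs          # weight moves toward the external signal
    dense = sum(step_rewards) / len(step_rewards)
    return (1 - alpha) * dense + alpha * seq_reward
```

At epoch 0 the generator is guided entirely by the dense internal rewards; by the final epoch only the external sequence-level judgment matters.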

Chakraborty T., Reddy K S U., Naik S.M., Panja M., Manvitha B.
2024-01-29 citations by CoLab: 45 Abstract
Abstract Generative adversarial networks (GANs) have rapidly emerged as powerful tools for generating realistic and diverse data across various domains, including computer vision and other applied areas, since their inception in 2014. Consisting of a discriminative network and a generative network engaged in a minimax game, GANs have revolutionized the field of generative modeling. In February 2018, GAN secured the leading spot on the ‘Top Ten Global Breakthrough Technologies List’ issued by the Massachusetts Science and Technology Review. Over the years, numerous advancements have been proposed, leading to a rich array of GAN variants, such as conditional GAN, Wasserstein GAN, cycle-consistent GAN, and StyleGAN, among many others. This survey aims to provide a general overview of GANs, summarizing the latent architecture, validation metrics, and application areas of the most widely recognized variants. We also delve into recent theoretical developments, exploring the profound connection between the adversarial principle underlying GAN and Jensen–Shannon divergence while discussing the optimality characteristics of the GAN framework. The efficiency of GAN variants and their model architectures will be evaluated along with training obstacles as well as training solutions. In addition, a detailed discussion will be provided, examining the integration of GANs with newly developed deep learning frameworks such as transformers, physics-informed neural networks, large language models, and diffusion models. Finally, we reveal several issues as well as future research outlines in this field.
Garneau N., Lamontagne L.
2023-11-26 citations by CoLab: 1
Nishikino K., Kobayashi K.
2023-09-16 citations by CoLab: 1 Abstract  
Supervised fine-tuning of large language models (LMs) does not always provide good text-generation performance in terms of quality and diversity. This is because such models maximize the likelihood of correct subsequent words based on previous contexts encountered in the training phase, instead of evaluating the entire structure of the generated texts. In this context, fine-tuning methods for LMs using adversarial imitation learning (AIL) have been proposed to improve the trade-off relationship between quality and diversity. This method leverages the evaluations of the discriminators without requiring manually designed metrics. Previously proposed AIL methods cannot control the shapes of the reward functions and constrain updates of LMs using fixed ranges, independent of the quality, e.g., proximal policy optimization. This study proposes a combination of an AIL method and an approximation of mixture distributions (AMDAIL), synergizing with LMs for text generation. AMDAIL exhibits two features: (1) controlling the distribution of the bounded reward values by varying the shape of the bounded reward function, and (2) a variable constraint to promote updates using the confidence of the discriminator as the quality of the texts. The proposed method exhibits stable behavior in the training phases and improves the trade-off relationship between the quality and diversity in the inference phases.
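The two AMDAIL features described above can be sketched in a few lines of Python. This is a hedged illustration, not the paper's formulation: the tanh reward shape and the linear widening of the update range are stand-ins chosen only to show what "controllable bounded reward" and "variable constraint" mean.

```python
import math

def bounded_reward(d_conf, shape=2.0):
    """Map discriminator confidence in (0, 1) to a bounded reward in (-1, 1);
    `shape` controls the curvature of the reward function."""
    return math.tanh(shape * (d_conf - 0.5))

def update_range(d_conf, base=0.1):
    """Variable constraint: permit larger policy updates when the discriminator
    judges the text realistic, unlike PPO's fixed clipping range."""
    return base * (1.0 + d_conf)
```

A larger `shape` sharpens the reward around the decision boundary, while `update_range` grows with discriminator confidence instead of staying fixed.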
Wu C., Wang C., Xu J., Liu Z., Zheng K., Wang X., Song Y., Gai K.
2023-08-04 citations by CoLab: 17
Ji Z., Lee N., Frieske R., Yu T., Su D., Xu Y., Ishii E., Bang Y., Madotto A., Fung P.
ACM Computing Surveys scimago Q1 wos Q1
2023-03-03 citations by CoLab: 960 Abstract  
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions, and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
Huang R., Cui C., Chen F., Ren Y., Liu J., Zhao Z., Huai B., Wang Z.
2022-10-10 citations by CoLab: 22 Abstract  
Deep generative models have achieved significant progress in speech synthesis to date, while high-fidelity singing voice synthesis is still an open problem for its long continuous pronunciation, rich high-frequency parts, and strong expressiveness. Existing neural vocoders designed for text-to-speech cannot directly be applied to singing voice synthesis because they result in glitches and poor high-frequency reconstruction. In this work, we propose SingGAN, a generative adversarial network designed for high-fidelity singing voice synthesis. Specifically, 1) to alleviate the glitch problem in the generated samples, we propose source excitation with the adaptive feature learning filters to expand the receptive field patterns and stabilize long continuous signal generation; 2) SingGAN introduces global and local discriminators at different scales to enrich low-frequency details and promote high-frequency reconstruction; and 3) to improve the training efficiency, SingGAN includes auxiliary spectrogram losses and sub-band feature matching penalty loss. To the best of our knowledge, SingGAN is the first work designed toward high-fidelity singing voice vocoding. Our evaluation of SingGAN demonstrates state-of-the-art results with higher-quality (MOS 4.05) samples. Also, SingGAN enables a sample speed of 50x faster than real-time on a single NVIDIA 2080Ti GPU. We further show that SingGAN generalizes well to the mel-spectrogram inversion of unseen singers, and the end-to-end singing voice synthesis system SingGAN-SVS enjoys a two-stage pipeline to transform the music scores into expressive singing voices. Audio samples are available at https://SingGAN.github.io/
Li X., Metsis V., Wang H., Ngu A.H.
2022-07-08 citations by CoLab: 80 Abstract  
Signal measurements appearing in the form of time series are one of the most common types of data used in medical machine learning applications. However, such datasets are often small, making the training of deep neural network architectures ineffective. For time-series, the suite of data augmentation tricks we can use to expand the size of the dataset is limited by the need to maintain the basic properties of the signal. Data generated by a Generative Adversarial Network (GAN) can be utilized as another data augmentation tool. RNN-based GANs suffer from the fact that they cannot effectively model long sequences of data points with irregular temporal relations. To tackle these problems, we introduce TTS-GAN, a transformer-based GAN which can successfully generate realistic synthetic time-series data sequences of arbitrary length, similar to the real ones. Both the generator and discriminator networks of the GAN model are built using a pure transformer encoder architecture. We use visualizations and dimensionality reduction techniques to demonstrate the similarity of real and generated time-series data. We also compare the quality of our generated data with the best existing alternative, which is an RNN-based time-series GAN. TTS-GAN source code: github.com/imics-lab/tts-gan
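The key property that lets a pure transformer-encoder GAN handle sequences of arbitrary length is that self-attention is defined for any sequence length T. A minimal NumPy sketch of single-head self-attention (with identity projections, i.e. untrained; not the TTS-GAN code) makes this concrete:

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention over a sequence x of shape (T, d).
    Query/key/value projections are identity here (untrained sketch); the
    same operation works for any length T, which is what allows a
    transformer-based GAN to generate and score time series of arbitrary
    length, unlike fixed-window architectures."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # (T, T) pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                            # (T, d) attention output
```

The same function accepts a 5-step or a 500-step series without any architectural change.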
Xia Y., Zheng W., Wang Y., Yu H., Dong J., Wang F.
2022-03-01 citations by CoLab: 41 Abstract  
Facial expression synthesis has gained increasing attention with the development of Generative Adversarial Networks (GANs). However, it is still very challenging to generate high-quality facial expressions since the overlapping and blur commonly appear in the generated facial images especially in the regions with rich facial features such as eye and mouth. Generally, existing methods mainly consider the face as a whole in facial expression synthesis without paying specific attention to the characteristics of facial expressions. In fact, according to the physiological and psychological research, the differences of facial expressions often appear in crucial regions such as eye and mouth. Motivated by this observation, a novel end-to-end facial expression synthesis method called Local and Global Perception Generative Adversarial Network (LGP-GAN) with a two-stage cascaded structure is proposed in this paper which is designed to extract and synthesize the details of the crucial facial regions. LGP-GAN can combine the generated results from the global network and local network into the corresponding facial expressions. In Stage I, LGP-GAN utilizes local networks to capture the local texture details of the crucial facial regions and generate local facial regions, which fully explores crucial facial region domain information in facial expressions. LGP-GAN then uses a global network to learn the whole facial information in Stage II to generate the final facial expressions building upon the local generated results from Stage I. We conduct qualitative and quantitative experiments on the commonly used public database to verify the effectiveness of the proposed method. Experimental results show the superiority of the proposed method over the state-of-the-art methods.
Alsmadi I., Aljaafari N., Nazzal M., Alhamed S., Sawalmeh A.H., Vizcarra C.P., Khreishah A., Anan M., Algosaibi A., Al-Naeem M.A., Aldalbahi A., Al-Humam A.
IEEE Access scimago Q1 wos Q2 Open Access
2022-01-27 citations by CoLab: 15 Abstract  
Machine learning algorithms represent the intelligence that controls many information systems and applications around us. As such, they are targeted by attackers to impact their decisions. Text created by machine learning algorithms has many types of applications, some of which can be considered malicious especially if there is an intention to present machine-generated text as human-generated. In this paper, we surveyed major subjects in adversarial machine learning for text processing applications. Unlike adversarial machine learning in images, text problems and applications are heterogeneous. Thus, each problem can have its own challenges. We focused on some of the evolving research areas such as: malicious versus genuine text generation metrics, defense against adversarial attacks, and text generation models and algorithms. Our study showed that as applications of text generation continue to grow in the near future, the type and nature of attacks on those applications and their machine learning algorithms will continue to grow as well. The literature survey indicated an increasing trend in using pre-trained models in machine learning. Word/sentence embedding models and transformers are examples of those pre-trained models. Adversarial models may utilize the same or similar pre-trained models as well. In another trend related to text generation models, the literature showed effort to develop universal text perturbations to be used in both black- and white-box attack settings. The literature also showed the use of conditional GANs to create latent representations for writing types. This usage will allow for a seamless lexical and grammatical transition between various writing styles. In text generation metrics, research trends showed the development of successful automated or semi-automated assessment metrics that may include human judgement.
The literature also showed research trends of designing and developing new memory models that increase performance and memory utilization efficiency without validating real-time constraints. Many research efforts evaluate different defense model approaches and algorithms. Researchers evaluated different types of targeted attacks, and methods to distinguish human versus machine generated text.
He T., Zhang J., Zhou Z., Glass J.
2021-12-17 citations by CoLab: 4
Montahaei E., Alihosseini D., Soleymani Baghshah M.
Neurocomputing scimago Q1 wos Q1
2021-08-01 citations by CoLab: 9 Abstract  
Although GAN-based methods have received many achievements in the last few years, they have not been entirely successful in generating discrete data. The most crucial challenge of these methods is the difficulty of passing the gradient from the discriminator to the generator when the generator outputs are discrete. Despite the fact that several attempts have been made to alleviate this problem, none of the existing GAN-based methods have improved the performance of text generation compared with the maximum likelihood approach in terms of both the quality and the diversity. In this paper, we proposed a new framework for generating discrete data by an adversarial approach in which there is no need to pass the gradient to the generator. The proposed method has an iterative manner in which each new generator is defined based on the last discriminator. It leverages the discreteness of data and the last discriminator to model the real data distribution implicitly. Moreover, the method is supported with theoretical guarantees, and experimental results generally show the superiority of the proposed DGSAN method compared to the other popular or recent methods in generating discrete sequential data.
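The core DGSAN idea, defining the next generator in closed form from the last discriminator so no gradient has to pass through discrete samples, can be sketched for a toy vocabulary. This is an illustration of the principle under simplifying assumptions (explicit token probabilities, an odds-ratio reweighting); the paper's exact update rule differs.

```python
def next_generator(gen_probs, disc):
    """Define the next generator distribution in closed form from the current
    generator and the last discriminator, avoiding any gradient flow through
    discrete samples. Tokens the discriminator rates as more "real"
    (D close to 1) are upweighted via the odds ratio D / (1 - D).
    gen_probs: dict token -> probability; disc: dict token -> D(x) in (0, 1)."""
    unnorm = {t: p * disc[t] / (1.0 - disc[t]) for t, p in gen_probs.items()}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}
```

Iterating this update moves probability mass toward regions the discriminator cannot distinguish from real data.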
Wu Q., Li L., Yu Z.
Generative Adversarial Networks (GANs) for text generation have recently received many criticisms, as they perform worse than their MLE counterparts. We suspect previous text GANs' inferior performance is due to the lack of a reliable guiding signal in their discriminators. To address this problem, we propose a generative adversarial imitation learning framework for text generation that uses large pre-trained language models to provide more reliable reward guidance. As previous text GANs suffer from high variance of gradients, we apply contrastive discriminator, and proximal policy optimization (PPO) to stabilize and improve text generation performance. For evaluation, we conduct experiments on a diverse set of unconditional and conditional text generation tasks. Experimental results show that TextGAIL achieves better performance in terms of both quality and diversity than the MLE baseline. We also validate our intuition that TextGAIL's discriminator demonstrates the capability of providing reasonable rewards with an additional task.
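The contrastive-discriminator idea of scoring generated text relative to a paired real reference can be sketched in one function. The sigmoid-of-margin form below is an assumption for illustration, not TextGAIL's exact reward; the point is that relative comparisons yield a smoother, lower-variance signal than absolute real/fake scores.

```python
import math

def contrastive_reward(score_gen, score_real):
    """Score a generated text relative to a paired real reference: the
    discriminator-score margin is squashed into (0, 1) to serve as a reward.
    Equal scores yield a neutral reward of 0.5."""
    return 1.0 / (1.0 + math.exp(-(score_gen - score_real)))
```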
Rizzo G., Van T.H.
2020-11-01 citations by CoLab: 11 Abstract  
Text generation is a challenging task for intelligent agents. Numerous research attempts have investigated the use of adversarial networks with word sequence-based generators. However, these approaches suffer from an unbalance between generator and discriminator causing overfitting due to the strength that the discriminator acquires by getting too precise in distinguishing what the generator is producing and what instead comes from the real dataset. In this paper, we investigate how to balance both generator and discriminator of a sequence-based text adversarial network exploiting: i) the contribution of global knowledge in the input of the adversarial network encoded by global word embeddings that are adapted to the context of the datasets in which they are utilized, and ii) the use of a self-attentive discriminator that slowly minimizes its loss function and thus enables the generator to get valuable feedback during the training process. Through an extensive evaluation on three datasets of short-, medium- and long-length text documents, the results computed using word-overlapping metrics show that our model outperforms four baselines. We also discuss the results of our model using readability metrics and the human perceived quality of the generated documents.
Chen J., Wu Y., Jia C., Zheng H., Huang G.
Neurocomputing scimago Q1 wos Q1
2020-11-01 citations by CoLab: 25 Abstract  
Automatically generating meaningful and coherent text has many applications, such as machine translation, dialogue systems, BOT application, etc. Text generation technology has attracted more attention over the past decades. A bunch of excellent methods are proposed; however, there are still challenges to generate text rivals the real one by human, such as most machines output fixed length text, or can only generate text quite the same with the input training text. In this paper, we put forward a novel text generation system, called customizable conditional text generative adversarial network, which is capable of generating diverse text content of variable length with customizable emotion label. It is more convenient for generating actual original text with specific sensitive orientation. We propose a conditional text generative adversarial network (CTGAN), in which emotion label is adopted as an input channel to specify the output text, and variable length text generation strategy is put forward. After generating initial texts by CTGAN, to make the generated text data match the real scene, we design an automated word-level replacement strategy, which extracts the keywords (e.g. nouns) from the training texts and replaces the specific keywords in the generated texts. Finally, we design a comprehensive evaluation metric based on various text evaluations, called mixed evaluation metric. Comprehensive experiments on real-world datasets testify that our proposed CTGAN behaves better than other text generation methods, i.e., generated text are more real compared with the real text than other generation methods, achieving state-of-the-art generation performance.
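The word-level replacement step can be sketched as a simple slot-filling pass over the generated sentence. The `<NOUN>` placeholder convention is hypothetical; in CTGAN the keywords (e.g. nouns) are extracted from the training texts upstream, which this sketch assumes has already happened.

```python
def replace_keywords(generated, keywords, placeholder="<NOUN>"):
    """Swap placeholder slots in a generated sentence for keywords extracted
    from the real training text, in order. Placeholders beyond the supplied
    keyword list are left untouched."""
    out, i = [], 0
    for word in generated.split():
        if word == placeholder and i < len(keywords):
            out.append(keywords[i])
            i += 1
        else:
            out.append(word)
    return " ".join(out)
```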
Yang Y., Dan X., Qiu X., Gao Z.
IEEE Access scimago Q1 wos Q2 Open Access
2020-05-11 citations by CoLab: 19 Abstract  
Text generation is a basic work of natural language processing, which plays an important role in dialogue system and intelligent translation. As a kind of deep learning framework, Generative Adversarial Networks (GAN) has been widely used in text generation. In combination with reinforcement learning, GAN uses the output of discriminator as reward signal of reinforcement learning to guide generator training, but the reward signal is a scalar and the guidance is weak. This paper proposes a text generation model named Feature-Guiding Generative Adversarial Networks (FGGAN). To solve the problem of insufficient feedback guidance from the discriminator network, FGGAN uses a feature guidance module to extract text features from the discriminator network, convert them into feature guidance vectors and feed them into the generator network for guidance. In addition, sampling is required to complete the sequence before feeding it into the discriminator to get feedback signal in text generation. However, the randomness and insufficiency of the sampling method lead to poor quality of generated text. This paper formulates text semantic rules to restrict the token of the next time step in the sequence generation process and remove semantically unreasonable tokens to improve the quality of generated text. Finally, text generation experiments are performed on different datasets and the results verify the effectiveness and superiority of FGGAN.
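The rule-based restriction of the next token can be sketched as masking and renormalizing the generator's next-token distribution. The function and greedy selection below are illustrative assumptions; FGGAN's actual semantic rules are task-specific and supplied by the rule set, represented here as an `allowed` set.

```python
import math

def constrained_next_token(logits, vocab, allowed):
    """Restrict the next-token distribution to tokens permitted by the
    semantic rules, renormalize, and pick greedily. Removing semantically
    unreasonable tokens before sampling improves generated-text quality."""
    probs = {t: math.exp(l) for t, l in zip(vocab, logits) if t in allowed}
    z = sum(probs.values())
    probs = {t: p / z for t, p in probs.items()}
    return max(probs, key=probs.get), probs
```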
