
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 11 Issue: 05 | May 2024 www.irjet.net p-ISSN: 2395-0072
Anushka Naik1, Amogh Sanzgiri2
1 Student, Dept. of Information Technology, Goa College of Engineering, Goa, India
2 Professor, Dept. of Information Technology, Goa College of Engineering, Goa, India
Abstract - This paper uses Generative Adversarial Networks (GANs) to render ordinary JPEG photographs in an antique Indian painting style, emulating the manner of the well-known Indian artist Raja Ravi Varma. In order to bridge the gap between artificial intelligence and creative expression, the goal is to investigate the potential of GANs to produce aesthetically pleasing and stylistically rich artworks. The approach trains a GAN architecture on a carefully chosen dataset, enabling the model to pick up the complex compositions, textures, and patterns specific to his painting genre. Through the adversarial interplay between a discriminator and a generator, the GAN aims to generate artworks that combine elements of computational creativity with conventional creative approaches. The project's output has potential applications in content development, digital art creation, and the democratization of artistic tools. This work, which uses GANs, adds to the field of generative art by offering a forum for the fusion of artificial intelligence and human creativity.
Key Words: Generative Adversarial Network (GAN), artistic style, Raja Ravi Varma, AI painting, CycleGAN, Style Transfer
Artistic expression has long been a reflection of cultural identity, creativity, and the evolving narrative of human civilization. It connects our past with our present, highlighting our evolution, importance, and dominance over time; the Ajanta and Ellora caves are one such significant piece of evidence. But with time, traditional art and its artists are disappearing, not only because of advances in technology but also because of the technique itself: traditional painting is time consuming, requires the physical presence of the artist, and depends on raw materials that are often unavailable. Indeed, high-resolution cameras, small storage devices, and fast printing machines have shifted people's interest away from traditional art forms. Even so, the appeal of traditional art remains and keeps growing over time. There is therefore a strong need to preserve our traditional art using modern techniques such as Generative Adversarial Networks (GANs).
The advent of Generative Adversarial Networks (GANs) [8] has revolutionized the field of artificial intelligence, particularly in the domain of creativity. GANs, with their ability to generate realistic and novel content, present a unique opportunity to explore traditional artistic styles with the computational power of modern technology [5]. A GAN has two components and works like a game-based model: a generator (G) produces a sample, and a discriminator (D) tries to distinguish between G's generated sample and an original painting. If D succeeds in identifying the fake, a penalty is imposed on G; if D fails to identify the fake, a penalty is imposed on D. Through this penalty, the generator and discriminator both learn and improve their performance.
Though GANs have evolved over time, their research has remained limited to particular domains. Most GAN work has focused on Western art while ignoring the importance of Asian, and especially Indian, art styles. One of the many reasons behind this is the limited availability and diversity of datasets across different painters: the culture depicted in the art changes from place to place even when the art form remains the same.
Generative adversarial networks (GANs), introduced by Goodfellow [8], are an emerging technology for both unsupervised and semi-supervised learning. They are implicit density generative models, and they are characterized by two main components: a generator G and a discriminator D. The basic idea of GANs is to set up a game between the generator and discriminator. The former tries to generate samples that appear to come from the real data distribution, while the latter examines real and generated samples in order to distinguish between real and fake data. A common analogy is to think of the generator as an art forger and the discriminator as an art expert. The forger tries to create forgeries that are increasingly similar to real paintings in order to deceive the art expert, while the expert learns more and more sophisticated ways to discriminate between real and false artworks. One of the most crucial points of GANs is that the generator has no direct access to the real data: the only way for it to learn is through interaction with the discriminator. By contrast, the discriminator has access to both real and generated data. This behavior can be expressed via a min-max game, where the generator tries

to minimize the gain of the discriminator, while the discriminator tries to do the opposite.
The adversarial modeling framework is most straightforward to apply when the models are both multilayer perceptrons [7]. To learn the generator's distribution p_g over data x, we define a prior on input noise variables p_z(z), then represent a mapping to data space as G(z; θ_g), where G is a differentiable function represented by a multilayer perceptron with parameters θ_g [6, 7]. We also define a second multilayer perceptron D(x; θ_d) that outputs a single scalar. D(x) represents the probability that x came from the data rather than from p_g.
We train D to maximize the probability of assigning the correct label to both training examples and samples from G, and we simultaneously train G to minimize \log(1 - D(G(z))). In other words, D and G play the following two-player minimax game with value function V(D, G):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]   (1)
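As a minimal illustration of this training game, the sketch below alternates a discriminator update and a generator update in PyTorch. The tiny fully-connected networks, latent size of 100, and Adam settings are illustrative assumptions rather than the architecture used later in this paper; real_x is assumed to be a batch of flattened real images.

    import torch
    import torch.nn as nn

    # Minimal illustrative networks; practical GANs for images are convolutional.
    G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
    D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

    bce = nn.BCELoss()
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

    def train_step(real_x):
        n = real_x.size(0)
        ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

        # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
        z = torch.randn(n, 100)
        fake_x = G(z).detach()                  # do not backpropagate into G here
        loss_D = bce(D(real_x), ones) + bce(D(fake_x), zeros)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # Generator step: fool D (non-saturating form of minimizing log(1 - D(G(z))))
        z = torch.randn(n, 100)
        loss_G = bce(D(G(z)), ones)
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()
        return loss_D.item(), loss_G.item()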
Advantages of GANs [8]:
• Image synthesis is the ability to produce images starting from another type of information. This information can be random noise, a text describing the image, or a feature of the image.
• Image-to-image translation is translating one possible representation of a scene into another, such as mapping grayscale images to RGB, or generating an image from only its edges.
• The generator, in GANs, learns a mapping between an arbitrary latent space and the data space in a completely unsupervised manner. The generator associates the feature code values with the actual semantic attributes of the output.
• In the previous applications, the goal of the model is to train the generator with the help of the discriminator, which acts like a teacher. Usually, after the learning phase, the discriminator is discarded and only the generator is used. In semi-supervised learning, this paradigm is shifted, since the objective is to train the discriminator with the help of the generator.
Arbitrary style transfer refers to the process of applying the artistic style of one image (the style reference) to the content of another image (the content reference) in a way that preserves the key features of the content image while adopting the artistic characteristics of the style image. This technique is a subset of neural style transfer, which utilizes deep neural networks to achieve the transfer of artistic styles.
Neural style transfer is an optimization technique that takes two images, a content image and an artistic style image, and blends them together so that the output looks like the content image but "painted" in the style of the style reference image. This is implemented by optimizing the output image to match the content statistics of the content image and the style statistics of the style reference image. These statistics are extracted from the images using a convolutional network.
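To make the notion of "content statistics" and "style statistics" concrete, the sketch below shows the content and Gram-matrix style losses typically used in this kind of transfer [5]. The feature maps are assumed to come from layers of a pretrained convolutional network (e.g., VGG); the weighting scheme is illustrative.

    import torch

    def gram_matrix(feat):
        # feat: (channels, H, W) feature map taken from one CNN layer
        c, h, w = feat.shape
        f = feat.reshape(c, h * w)
        return f @ f.t() / (c * h * w)      # channel-by-channel correlations

    def content_loss(gen_feat, content_feat):
        # match the raw activations of the content image
        return torch.mean((gen_feat - content_feat) ** 2)

    def style_loss(gen_feat, style_feat):
        # match Gram matrices (texture statistics) of the style image
        return torch.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2)

    # total objective (weights alpha and beta are illustrative):
    # loss = alpha * content_loss(...) + beta * sum of style_loss(...) over several layers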
Image-to-image translation is a generative artificial intelligence (AI) technique that translates a source image into a target image while preserving certain visual properties of the original image. This technology uses machine learning and deep learning techniques such as generative adversarial networks (GANs), conditional adversarial networks (cGANs), and convolutional neural networks (CNNs) to learn complex mapping functions between input and output images. Image-to-image translation allows images to be converted from one form to another while retaining essential features. The goal is to learn a mapping between the two domains and then generate realistic images in whatever style a designer chooses. This approach enables tasks such as style transfer, colorization, and super-resolution, a technique that improves the resolution of an image. Image-to-image technology encompasses a diverse set of applications in art, image enhancement, data augmentation, and computer vision, also known as machine vision. For instance, image-to-image translation allows photographers to change a daytime photo to a nighttime one, convert a satellite image into a map, and enhance medical images to enable more accurate diagnoses.
For image-to-image translation, StarGAN represents a major breakthrough in generative adversarial networks (GANs), especially when it comes to multi-domain and attribute-manipulation scenarios. StarGAN is a flexible option for a range of image translation problems since it provides an integrated framework that can manage several domains in a single model. One of StarGAN's primary innovations is its one-to-many translation capability, which enables the translation of a single input image into several target domains. This feature is essential for applications that need to manipulate style or control attributes, such as altering hair color, a person's expression, or other visual elements in photos.
Through a shared generator and discriminator architecture, which allows the generator to learn to map

input images to different output domains and the discriminator to distinguish between actual and created images across all domains, StarGAN is able to operate across numerous domains. This configuration reduces computational complexity and resource usage by enabling effective training and inference across a variety of domains without the requirement for separate models for each domain. To further enhance translation quality, StarGAN also includes a domain classification loss to enforce the generator's capacity to generate realistic images in each target domain. Researchers and practitioners working on image synthesis, style transfer, and domain adaptation tasks across many visual domains have grown to favor StarGAN because of its versatility, attribute control, and one-to-many translation capabilities.
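A rough sketch of the core StarGAN idea, a single generator conditioned on a target-domain label, is given below; the layer sizes, number of domains, and the ConditionedGenerator name are illustrative assumptions rather than the published StarGAN architecture.

    import torch
    import torch.nn as nn

    class ConditionedGenerator(nn.Module):
        """Single generator shared across domains: the target-domain label is
        broadcast spatially and concatenated to the input image channels."""
        def __init__(self, img_channels=3, num_domains=5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(img_channels + num_domains, 64, 4, 2, 1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, img_channels, 4, 2, 1),
                nn.Tanh(),
            )

        def forward(self, x, domain_label):
            # domain_label: (N, num_domains) one-hot vector for the target domain
            n, _, h, w = x.shape
            label_map = domain_label.view(n, -1, 1, 1).expand(n, domain_label.size(1), h, w)
            return self.net(torch.cat([x, label_map], dim=1))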
In problems where obtaining paired data is difficult or impracticable, CycleGAN has emerged as a key framework for unpaired image-to-image translation. CycleGAN is an extension of conventional GANs designed especially for situations involving unpaired data. Using cycle-consistency constraints to enforce meaningful translations, CycleGAN can learn mappings between two domains without the need for directly paired instances, which is one of its main advantages. This approach makes the process of acquiring data easier while simultaneously improving the model's ability to generalize to new data and other image distributions.
Two generators and two discriminators, each tasked with translating images across the two domains and distinguishing real from created images, make up CycleGAN's architecture. A key component of training is the cycle-consistency loss, which ensures that when an image is translated from one domain to another and back again, it closely resembles the original, thereby transferring style or qualities without sacrificing content. Because of this mechanism, CycleGAN generates images with higher realism and coherence, which makes it suitable for a variety of image translation tasks, including artistic rendering, object transfiguration, and style transfer. With its ease of setup, efficiency when processing unpaired data, and capacity to generate translations of excellent quality, CycleGAN has solidified its place as a leading framework for image synthesis and domain adaptation.
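A minimal sketch of the cycle-consistency loss described above, using the L1 distance as in [1], might look as follows; G_AB and G_BA stand for the two generators and are assumed to be defined elsewhere.

    import torch.nn.functional as F

    def cycle_consistency_loss(real_a, real_b, G_AB, G_BA):
        # A -> B -> A should reconstruct the original image from domain A
        recon_a = G_BA(G_AB(real_a))
        # B -> A -> B should reconstruct the original image from domain B
        recon_b = G_AB(G_BA(real_b))
        return F.l1_loss(recon_a, real_a) + F.l1_loss(recon_b, real_b)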
2.6 Pix2Pix
A pioneer in the field of image-to-image translation, Pix2Pix is well known for producing outputs of excellent quality when given paired training data. Pix2Pix is a conditional generative adversarial network (cGAN) variant that concentrates on problems for which training input-output pairs are available. Pix2Pix's pixel-level mapping capability is one of its main advantages; it enables accurate translation between domains, such as the
conversion of grayscale images to color, the creation of realistic photographs from sketches, or the conversion of satellite images to maps. Because it allows for precise control over picture changes, Pix2Pix is an adaptable tool for a variety of computer vision and image synthesis applications.
Pix2Pix's architecture consists of a conditional GAN configuration with a discriminator that separates generated pairs from real pairs and a generator that learns to map input photos to output images in pairs. By conditioning the generator on the input photos, Pix2Pix can generate outputs that are both visually convincing and contextually meaningful, capturing complex details and structures in the translated images. Furthermore, Pix2Pix uses a mix of adversarial loss and pixel-wise loss to guarantee the generated images' local fidelity and global coherence. Because these training objectives produce realistic and sharp outputs, Pix2Pix is well-suited for tasks requiring precise image transformations, such as image colorization, image inpainting, and semantic segmentation to image synthesis.
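The combination of adversarial and pixel-wise losses mentioned above can be sketched roughly as follows; the discriminator is assumed to output raw logits on concatenated (input, output) pairs, and the weight lam is an illustrative hyperparameter.

    import torch
    import torch.nn.functional as F

    def pix2pix_generator_loss(G, D, input_img, target_img, lam=100.0):
        fake = G(input_img)
        # adversarial term: the conditional discriminator sees (input, output) pairs
        pred_fake = D(torch.cat([input_img, fake], dim=1))
        adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
        # pixel-wise L1 term keeps the output close to the paired ground truth
        l1 = F.l1_loss(fake, target_img)
        return adv + lam * l1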
Table 1: Comparison of StarGAN, CycleGAN, and Pix2Pix

StarGAN | CycleGAN | Pix2Pix
Uses paired dataset | Uses unpaired dataset | Uses paired dataset
Can generate into multiple target domains | Maintains content integrity | Used for pixel-level mapping tasks
Control over attributes (hair, facial expression) | Style changes while content remains the same | Conditioned on input images or labels
For this paper, CycleGAN is used over StarGAN and Pix2Pix because of:
• Flexibility: In real-world situations where acquiring paired data can be difficult, CycleGAN's capacity to operate with unpaired data offers greater flexibility.
• Generalization: The cycle-consistency loss encourages generalization, which enhances resilience and allows for greater adaptability to new data.
• Domain Adaptation: CycleGAN's method of learning cross-domain mappings while maintaining content consistency makes it a viable contender for tasks centered on domain adaptation or style transfer without explicit matched examples.

• Easy to Use: CycleGAN is more approachable and useful for many image-to-image translation jobs due to its ease of setup and training, as opposed to Pix2Pix's dependence on paired data and StarGAN's complexity when handling numerous domains.
3.1 Proposed Methodology
This project proposes a multifaceted approach that combines image processing techniques, rule-based transformations, and Generative Adversarial Networks (GANs) to achieve the transformation of ordinary JPEG images into artworks inspired by the timeless style of Raja Ravi Varma.
Data Collection: A dataset of Raja Ravi Varma paintings is collected, ensuring representation of human portraits. This dataset forms the foundation for the subsequent stages of the project. Along with that, the CelebA dataset is collected.
Preprocessing: Preprocessing techniques will be applied to the images, enhancing relevant features and preparing the dataset for training.
Splitting Data: The painting dataset is used for training the model, and the CelebA dataset is used for testing the trained model.
Model Selection: Selecting an appropriate model such as a GAN or a CNN; this project uses a GAN.
Model Architecture: The proposed model consists of the following components, which together define the GAN architecture.
Generator (G): The generator accepts random noise or a latent-space vector and generates a target sample from it. After the discriminator's evaluation, it improves and generates a better sample. Once the discriminator fails to identify a fake, the most recently generated sample is used as the output.
Discriminator (D): The discriminator takes the input from the generator G and compares it with the existing painting images present in the database. It produces a binary output, either 0 or 1: 1 indicates it has succeeded in identifying the image as fake, and 0 indicates it has failed, i.e., it has accepted the generated image as real. This is an important step, as it decides which network should be penalized.
Loss Function: A penalty applied based on the discriminator's decision; it is computed after the discriminator model.
Adversarial loss: [2] The aim of the adversarial loss is to train the generators to produce images in the target domain that are indistinguishable from genuine images according to the matching discriminator.
Cycle-Consistency Loss: [1] This loss imposes the requirement that an image translated from one domain to another and back again must resemble the original image. It is computed as the L1 or L2 distance between the original and reconstructed images.
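Putting the adversarial and cycle-consistency terms together, the total generator objective might be combined roughly as shown below; the least-squares adversarial form and the lambda_cyc weight are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def generator_objective(real_a, real_b, G_AB, G_BA, D_A, D_B, lambda_cyc=10.0):
        fake_b, fake_a = G_AB(real_a), G_BA(real_b)

        # adversarial terms: the generators try to make the discriminators predict "real"
        pred_b, pred_a = D_B(fake_b), D_A(fake_a)
        adv = F.mse_loss(pred_b, torch.ones_like(pred_b)) + \
              F.mse_loss(pred_a, torch.ones_like(pred_a))

        # cycle-consistency terms: A->B->A and B->A->B reconstructions (L1 distance)
        cyc = F.l1_loss(G_BA(fake_b), real_a) + F.l1_loss(G_AB(fake_a), real_b)

        return adv + lambda_cyc * cyc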
Optimizer: The optimizer adjusts the learning rate and decreases the training time of the GAN model; it also adapts well to varying and noisy data.
Stochastic Gradient Descent: It updates the model parameters along the negative gradient and also tunes the learning rate.
Hyperparameters: They determine the size of the input vector, the number of layers, and the number of hidden units in each layer.
Normalization and activation function: Normalization helps in scaling all the features to a similar range. The activation function is applied on the output layer, where it captures patterns, models the artistic style, transforms the image, etc.
1. Batch Normalization: [2] Helps improve the training of models and reduce covariate shift, which occurs when the distribution of the input data changes during training. It is applied in the Conv2D layers of both the discriminator and the generator.
2. ReLU Activation function: [1] The Rectified Linear Unit (ReLU) is used to speed up training. It is applied after Conv2D layers and is used in the generator.
3. Leaky ReLU Activation function: [1] It is used to introduce non-linearity. A sketch of the kind of convolutional block these choices describe is given after this list.
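A minimal sketch of such a convolutional block (Conv2D, batch normalization, then a ReLU or Leaky ReLU activation) is given below; the kernel size, stride, and channel counts are illustrative.

    import torch.nn as nn

    def conv_block(in_ch, out_ch, down=True, leaky=False):
        """Conv2D + BatchNorm + (Leaky)ReLU, the basic unit reused in G and D.
        down=True halves the spatial resolution; otherwise the block upsamples."""
        conv = (nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
                if down else
                nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1))
        act = nn.LeakyReLU(0.2, inplace=True) if leaky else nn.ReLU(inplace=True)
        return nn.Sequential(conv, nn.BatchNorm2d(out_ch), act)

    # e.g. a discriminator typically stacks "leaky" downsampling blocks,
    # while a generator mixes downsampling (ReLU) and upsampling blocks.
    encoder_block = conv_block(3, 64, down=True, leaky=True)
    decoder_block = conv_block(64, 3, down=False, leaky=False)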
Training: The heart of the project lies in the utilization of a Generative Adversarial Network (GAN) to learn and reproduce the intricate patterns, textures, and stylistic details inherent in Raja Ravi Varma's paintings, which are learned adversarially by G and D.
Evaluation: The obtained generator samples are compared by the discriminator, which determines whether they are fake or real. Based on this, the loss is assigned to either G or D.
Hyperparameter tuning: Hyperparameters such as the activation function, padding, strides, filters, and kernel size will be adjusted based on the obtained output and the requirements.
Result analysis: The loss curves of the generator and discriminator are computed, which helps in analyzing the obtained results.
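A short sketch of how such loss curves could be recorded and plotted is shown below; the listed loss values are placeholders, and in practice the two lists would be appended to once per epoch during training.

    import matplotlib.pyplot as plt

    # g_losses and d_losses would be filled once per epoch during training
    g_losses = [1.9, 1.4, 1.1, 0.9, 0.8]   # placeholder values for illustration
    d_losses = [0.7, 0.6, 0.6, 0.5, 0.5]

    plt.plot(g_losses, label="Generator loss")
    plt.plot(d_losses, label="Discriminator loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    plt.title("Generator vs. discriminator loss")
    plt.savefig("loss_curve.png")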

1. Compile a dataset of pictures from two different categories, such as snapshots and paintings.
2. The photos should be preprocessed (resized, normalized, and augmented), then split into sets for training and validation.
3. Make two generator networks, G1 and G2, where G1 translates pictures from domain A to domain B and vice versa for G2.
4. For both generators, use a comparable architecture, which is frequently based on a U-Net structure for detail capture.
5. Construct two discriminator networks, D1 and D2, to differentiate between created and actual images in domains A and B, respectively.
6. To improve training stability, use PatchGAN discriminators to assess local image patches rather than the full image.
7. To trick the corresponding discriminators, define the adversarial loss (GAN loss) for each of the two generator networks.
8. Include cycle-consistency loss to guarantee that the translated image is almost identical to the original. The L1 or L2 distance between the original and reconstructed pictures is used to calculate this loss.
9. To ensure that an image translated from domain A to B looks comparable to other images in domain B, you can optionally include identity loss (and vice versa).
10. Initialize the generator and discriminator networks with random weights.
11. Alternately train the generator and discriminator networks using mini-batches:
12. Update the discriminator networks:
Generate fake images for domains A and B using their respective generators.
Train D1 to distinguish between real images from domain A and fake images generated by G2.
Train D2 similarly for domain B.
13. Update the generator networks:
Generate translated images from domain A to B and back (and vice versa).
Compute the adversarial loss to fool the discriminators and the cycle-consistency loss to preserve image structure.
Update G1 and G2 using these losses.
14. Repeat this training loop for multiple epochs, monitoring the generator and discriminator losses to ensure convergence (a condensed sketch of this loop is given after these steps).
15. To test the quality of translated images, use a separate validation set and consider aspects such as visual fidelity, image-to-image consistency, and domain-transfer correctness.
16. For quantitative assessment, use evaluation criteria such as the Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), or perceptual similarity measurements (e.g., using pre-trained CNNs).
17. Deploy the trained generator networks for inference after the training process is over and the desired outcomes are obtained.
18. Using the learned mappings G1 and G2, translate images between domains A and B.
19. If required, apply post-processing methods (such as color correction and denoising) to improve the translated photos' visual quality.
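The condensed, self-contained sketch below follows steps 10-14; the one-layer placeholder networks, random tensors standing in for batches from domains A and B, and all hyperparameters are assumptions made only to keep the sketch runnable, not the configuration used in this paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Placeholder generators/discriminators (real models would be U-Net / PatchGAN).
    def make_g(): return nn.Sequential(nn.Conv2d(3, 3, 3, 1, 1), nn.Tanh())
    def make_d(): return nn.Sequential(nn.Conv2d(3, 1, 4, 2, 1))   # patch-level scores

    G1, G2, D1, D2 = make_g(), make_g(), make_d(), make_d()   # G1: A->B, G2: B->A
    opt_g = torch.optim.Adam(list(G1.parameters()) + list(G2.parameters()), lr=2e-4)
    opt_d = torch.optim.Adam(list(D1.parameters()) + list(D2.parameters()), lr=2e-4)

    def adv(pred, is_real):   # least-squares adversarial loss on patch outputs
        target = torch.ones_like(pred) if is_real else torch.zeros_like(pred)
        return F.mse_loss(pred, target)

    for epoch in range(2):                      # illustrative epoch count
        real_a = torch.rand(4, 3, 64, 64)       # stands in for a batch of photos
        real_b = torch.rand(4, 3, 64, 64)       # stands in for a batch of paintings

        # ---- update discriminators (step 12) ----
        fake_b, fake_a = G1(real_a).detach(), G2(real_b).detach()
        loss_d = adv(D1(real_a), True) + adv(D1(fake_a), False) \
               + adv(D2(real_b), True) + adv(D2(fake_b), False)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # ---- update generators (step 13): adversarial + cycle-consistency ----
        fake_b, fake_a = G1(real_a), G2(real_b)
        loss_g = adv(D2(fake_b), True) + adv(D1(fake_a), True) \
               + 10.0 * (F.l1_loss(G2(fake_b), real_a) + F.l1_loss(G1(fake_a), real_b))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

        print(f"epoch {epoch}: D loss {loss_d.item():.3f}, G loss {loss_g.item():.3f}")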
The experiment uses Raja Ravi Varma's paintings, collected from various sources, and the CelebA dataset from Kaggle, which consist of 120 and 2,00,000 samples respectively.
The dataset is divided into training (80%) and testing (20%) sets using a stratified random split. We applied standard preprocessing techniques, including resizing, normalization, and data augmentation.
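A minimal sketch of this preprocessing and 80/20 split, using torchvision-style transforms, might look as follows; the folder name, image size, and augmentation choice are illustrative assumptions, and random_split performs a plain (not stratified) random split.

    import torch
    from torchvision import datasets, transforms

    preprocess = transforms.Compose([
        transforms.Resize((256, 256)),                 # resizing
        transforms.RandomHorizontalFlip(),             # simple data augmentation
        transforms.ToTensor(),
        transforms.Normalize([0.5] * 3, [0.5] * 3),    # scale pixel values to [-1, 1]
    ])

    # "paintings/" is an assumed folder of painting images with one class subfolder inside
    dataset = datasets.ImageFolder("paintings/", transform=preprocess)
    n_train = int(0.8 * len(dataset))
    train_set, test_set = torch.utils.data.random_split(
        dataset, [n_train, len(dataset) - n_train])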
This paper proposes a model that transforms digital images of human portraits into Raja Ravi Varma style paintings; Varma was well known in the late 19th century for his unique style of blending European oil painting with traditional Indian art. The GAN uses three generators and two discriminators, where the first two generators and a discriminator are used to produce colours, and these colours are later used in the paintings to generate the target output.

[1] J.-Y. Zhu, T. Park, P. Isola and A. A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 2242-2251, doi: 10.1109/ICCV.2017.244.
[2] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in Proceedings of the 32nd International Conference on Machine Learning (ICML'15), JMLR.org, 2015, pp. 448-456.
[3] R. Abdal, Y. Qin and P. Wonka, "Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp. 4431-4440, doi: 10.1109/ICCV.2019.00453.
[4] J. Johnson, A. Alahi and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in Computer Vision - ECCV 2016, Lecture Notes in Computer Science, vol. 9906, Springer, Cham, 2016. https://doi.org/10.1007/978-3-319-46475-6_43
[5] L. A. Gatys, A. S. Ecker and M. Bethge, "Image Style Transfer Using Convolutional Neural Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2414-2423.
[6] Y. Choi et al., "StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018, pp. 8789-8797, doi: 10.1109/CVPR.2018.00916.
[7] P. Isola, J.-Y. Zhu, T. Zhou and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 5967-5976, doi: 10.1109/CVPR.2017.632.
[8] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, "Generative Adversarial Nets," in Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS'14), MIT Press, Cambridge, MA, USA, 2014, pp. 2672-2680.