TEXT TO IMAGE GENERATION USING GAN



1 Assistant Professor, Department of Computer Science and Engineering

2,3,4,5 Undergraduate Student, Department of Computer Science and Engineering, Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad, Telangana, India

Abstract - Producing realistic images from text descriptions is a challenging problem in computer vision with many practical applications. To address it, we propose Stacked Generative Adversarial Networks (StackGAN) to synthesize 256×256 photorealistic images from text descriptions. The method decomposes the problem into two stages: Stage-I sketches a low-resolution image capturing the basic shapes and colours described in the text, while Stage-II refines the Stage-I result into a high-resolution image with photorealistic details. A Conditioning Augmentation technique is used to improve the diversity of the synthesized images and to stabilize GAN training. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art approaches and confirm its effectiveness in creating realistic images from descriptions. In summary, StackGAN provides a multi-stage approach with additional optimization that markedly improves the quality of images synthesized from text.

Key Words: StackGAN, Resolution, Conditioning Augmentation, Image generation

1. INTRODUCTION

Creating realistic images from text descriptions is an important and difficult task with applications in many fields, such as photo editing and computer-aided design. Recently, generative adversarial networks (GANs) have shown promise in creating realistic images. Conditional GANs designed specifically to generate images from descriptive text have been shown to produce images that are relevant to the text.

However, training GANs to create realistic high-resolution images from text remains challenging. Simply adding more upsampling layers to current GAN models to reach higher resolutions often makes training unstable and ineffective. The problem arises because the distribution of natural images and the implied model distribution may not overlap well in high-dimensional pixel space, and it becomes more severe as the image resolution increases. Previous approaches were limited to generating plausible 64×64 images whose objects lacked details such as a bird's beak and eyes, and they required additional annotations to produce higher-resolution images such as 128×128.

To address these problems, we present Stacked Generative Adversarial Networks (StackGAN), which decomposes text-to-photorealistic-image synthesis into two more manageable sub-problems. We first generate a low-resolution sketch using a Stage-I GAN. Next, we stack a Stage-II GAN on top of the Stage-I GAN to create a high-resolution image (for example, 256×256) conditioned on the Stage-I result and the text description. Leveraging both the Stage-I output and the text, the Stage-II GAN learns to capture information the Stage-I GAN missed and to add finer details to the output.
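The two-stage data flow described above can be sketched in code. The following is an illustrative Python sketch, not the paper's implementation: the generator internals are replaced by toy deterministic mappings, and only the interface the text describes is modelled. Stage-I maps a noise vector plus a text conditioning vector to a rough 64×64 image, and Stage-II takes that image together with the same conditioning and produces a refined 256×256 image.

```python
import numpy as np

def stage1_generator(noise, cond):
    """Stand-in Stage-I generator: maps noise + text conditioning to a
    rough 64x64x3 image (a toy deterministic mapping, not a real CNN)."""
    base = np.tanh(np.outer(noise[:64], cond[:64]))   # (64, 64) in [-1, 1]
    return np.stack([base] * 3, axis=-1)              # (64, 64, 3)

def stage2_generator(low_res, cond):
    """Stand-in Stage-II generator: takes the Stage-I image plus the same
    text conditioning and outputs a 256x256x3 refinement. Here the
    'refinement' is 4x nearest-neighbour upsampling plus a small
    text-dependent residual correction."""
    hi = low_res.repeat(4, axis=0).repeat(4, axis=1)  # 64 -> 256 per axis
    correction = 0.01 * np.tanh(cond[:3])             # toy per-channel residual
    return np.clip(hi + correction, -1.0, 1.0)

rng = np.random.default_rng(0)
z = rng.standard_normal(100)          # noise vector
c = rng.standard_normal(128)          # text conditioning vector
img_lo = stage1_generator(z, c)       # rough shape/colour sketch, 64x64
img_hi = stage2_generator(img_lo, c)  # refined high-resolution image, 256x256
```

The point of the sketch is the decomposition itself: each stage only has to model a conditional distribution that is much easier than mapping text directly to 256×256 pixels.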

This approach improves the rendering of high-resolution images by making the image distribution at each stage easier to model.

We also propose a new Conditioning Augmentation technique to handle the scarcity of variation in the text conditioning caused by the limited number of training text-image pairs. The technique encourages smoothness in the latent conditioning manifold by allowing small random perturbations of the text embedding. It improves the diversity of the synthesized images and stabilizes GAN training. Our contributions can be summarized as follows:
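The perturbation idea can be made concrete with a short sketch. In Conditioning Augmentation, instead of feeding the text embedding directly to the generator, a conditioning vector is sampled from a Gaussian whose mean and standard deviation are derived from the embedding, with a KL term keeping that Gaussian close to the standard normal. In this illustrative version the learned projections producing the mean and log-standard-deviation are replaced by fixed random matrices, which is an assumption for demonstration only; in the real model they are learned fully connected layers.

```python
import numpy as np

def conditioning_augmentation(text_embedding, latent_dim, rng):
    """Sample a conditioning vector c_hat ~ N(mu(e), sigma(e)) from the
    text embedding e, plus the KL(N(mu, sigma) || N(0, I)) regularizer.
    The small random perturbation smooths the conditioning manifold and
    yields more varied images for the same description."""
    d = text_embedding.shape[0]
    # Stand-ins for learned layers: fixed random projections (illustration only).
    W_mu = rng.standard_normal((latent_dim, d)) / np.sqrt(d)
    W_ls = rng.standard_normal((latent_dim, d)) / np.sqrt(d)
    mu = W_mu @ text_embedding
    log_sigma = 0.1 * (W_ls @ text_embedding)
    eps = rng.standard_normal(latent_dim)        # reparameterization trick
    c_hat = mu + np.exp(log_sigma) * eps         # sampled conditioning vector
    # KL divergence to the standard normal, added to the generator loss.
    kl = 0.5 * np.sum(np.exp(2 * log_sigma) + mu ** 2 - 1.0 - 2 * log_sigma)
    return c_hat, kl

rng = np.random.default_rng(1)
e = rng.standard_normal(1024)                    # pretrained text embedding
c_hat, kl = conditioning_augmentation(e, 128, rng)
```

Because `c_hat` is stochastic, the same sentence yields slightly different conditioning vectors on each draw, which is exactly the source of the extra diversity the technique provides.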

(1) We propose a novel stacked architecture that synthesizes photorealistic (256×256 resolution) images from text by decomposing the difficult problem into more manageable sub-problems.

(2) We introduce a Conditioning Augmentation technique that increases output diversity and stabilizes GAN training.

(3) Through extensive and thorough experiments, we demonstrate the effectiveness of the overall design and the contribution of its individual components. Such insights could guide future GAN designs.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page1479

2. RELATED WORK

Researchers have proposed many ways to improve the GAN training process and the resulting image quality. Some methods focus on energy-based GANs, while others condition on side information such as attributes or class labels. There are also methods that apply conditional GANs to image-to-image tasks such as photo editing, domain transfer, and super-resolution. However, such super-resolution techniques have limited ability to add important details or correct defects in low-resolution images.

Recently, several attempts have been made to generate images from natural-language descriptions.

These include methods such as AlignDRAW, conditional PixelCNN, and approximate Langevin sampling. While these methods have been shown to be effective, they often involve slow iterative sampling or have limited capability.

Conditional GANs have also been used to generate images from text. For example, Reed et al. successfully generated plausible 64×64 bird and flower images using a conditional GAN formulation.

In follow-up work, they extended the method to 128×128 images with the help of additional annotations.

There are also methods that use a series of GANs to create images, rather than a single GAN. Wang et al. proposed S2-GAN, which separates indoor scene generation into structure generation and style generation. In contrast, the second stage of StackGAN aims to complete the content of the output and to fix its defects according to the text description.

In summary, work on text-to-image synthesis builds on advances in generative modeling, including VAEs, autoregressive models, and GANs. Various methods have been proposed to stabilize GAN training and to condition generation on text. While previous methods have shown good results, StackGAN stands out by tackling the challenge of creating high-resolution images with realistic details. Its two-stage architecture and Conditioning Augmentation technique help synthesize diverse and visually appealing images.

3. METHODOLOGY

Our proposed StackGAN approach addresses the challenge of creating high-quality images from text descriptions. To achieve this goal, we propose a two-stage approach that breaks the problem down into more manageable sub-problems and allows realistic images to be created from the descriptions. In the first stage of StackGAN, we use a generative adversarial network

Fig -1: System Architecture

(GAN) to generate a low-resolution image from a given description. This first stage, called Stage-I, focuses on capturing the shape and colour of the objects described in the text. By creating low-resolution images, we lay the foundation for refinement at the next stage. The Stage-I GAN produces a preliminary low-resolution image that serves as input for the next stage.
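The adversarial objectives driving this stage can be written as small loss functions. The sketch below is a generic conditional-GAN formulation under the common matching-aware setup, where the discriminator must score a real image with matching text highly, and both fake images and real images paired with mismatched text lowly; the 0.5 weighting of the negative terms is an assumption for illustration, not a value taken from this paper.

```python
import numpy as np

def d_loss(d_real, d_fake, d_mismatch):
    """Conditional-GAN discriminator loss sketch: d_* are the
    discriminator's probability outputs for (real image, matching text),
    (generated image, text) and (real image, mismatched text)."""
    eps = 1e-8  # numerical guard against log(0)
    return -(np.log(d_real + eps)
             + 0.5 * (np.log(1 - d_fake + eps) + np.log(1 - d_mismatch + eps)))

def g_loss(d_fake, kl, kl_weight=1.0):
    """Generator loss sketch: fool the discriminator on generated images,
    plus the Conditioning Augmentation KL regularizer."""
    eps = 1e-8
    return -np.log(d_fake + eps) + kl_weight * kl

# A discriminator that separates real from fake well incurs a low loss.
good = d_loss(0.95, 0.05, 0.05)
bad = d_loss(0.5, 0.5, 0.5)
```

The Stage-II GAN uses the same adversarial structure, with the Stage-I image concatenated into the generator's input.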


During evaluation, we run many experiments on test data and compare the results with the latest methods. These experiments are used to validate the effectiveness of StackGAN in generating realistic images from descriptive text. Our analysis considers both qualitative and quantitative aspects.

For qualitative evaluation, we analyze the visual quality of the generated images and compare them with real ground-truth images. We assess factors such as image clarity, fidelity to the given description, and the presence of plausible content and fine details.
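For the quantitative side, a widely used metric for generated images is the Inception Score; the paper does not name its exact metric, so this is an assumed illustration. The score feeds generated images through a pretrained classifier and rewards samples that are individually confident yet collectively diverse.

```python
import numpy as np

def inception_score(probs):
    """Inception Score sketch. probs is an (N, C) array of class
    posteriors p(y|x) from a pretrained classifier, one row per
    generated image. The score is exp of the mean KL divergence between
    each p(y|x) and the marginal p(y): higher means samples are both
    confident (sharp p(y|x)) and diverse (broad p(y))."""
    p_y = probs.mean(axis=0)  # marginal class distribution over samples
    kl = np.sum(probs * (np.log(probs + 1e-12) - np.log(p_y + 1e-12)), axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse predictions score higher than uninformative ones.
uniform = np.full((8, 4), 0.25)          # classifier is never sure
diverse = np.tile(np.eye(4), (2, 1))     # confident and evenly spread
```

With four classes, the uniform case scores 1.0 (the minimum) while the confident-and-diverse case scores 4.0 (the maximum), so the metric cleanly separates the two regimes.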

Fig -2: Comparison of the StackGAN images with other models

4. RESULTS

This project's major objective was to generate an image from given text. The images generated during the training phase are as follows:

Fig -3: Training Phase Image  Fig -4: Training Phase Image  Fig -5: Training Phase Image

The output images produced when text input is provided are shown below:

INPUT TO MODEL: Boy riding a horse

OUTPUT OF THE MODEL:

INPUT TO MODEL: A dog with wings flying in air.

OUTPUT OF THE MODEL:

Another final output image generated using the GAN:

Fig -6: Output Image  Fig -7: Output Image  Fig -8: Output Image

5. CONCLUSIONS AND FUTURE SCOPE

In this work, we introduced StackGAN, a new method that synthesizes photorealistic images from text descriptions using stacked GANs and Conditioning Augmentation. Our method tackles the text-to-image synthesis problem with a sketch-then-refine strategy. The Stage-I GAN captures the basic colours and shapes of objects based on the text. The Stage-II GAN then improves the Stage-I results by correcting defects and adding further details, resulting in higher resolution and better image quality.

Various qualitative and quantitative tests were conducted to evaluate the effectiveness of our approach. The results demonstrate the effectiveness of StackGAN in generating higher-resolution images (e.g., 256×256) with accurate and varied content. Our method outperforms existing text-to-image generation models in terms of output quality and relevance to the given description. The success of StackGAN opens up new possibilities for many practical applications, such as photo editing and computer-aided design. By accurately translating text into high-quality images, our method helps improve computer vision performance and expands the capabilities of image synthesis.

REFERENCES

[1] "Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network" by Rintaro Yanagi, Ren Togo, Takahiro Ogawa, and Miki Haseyama.

[2] "Text to Image Synthesis for Improved Image Captioning" by Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga, and Mohammed Bennamoun.

[3] "Semantic Object Accuracy for Generative Text-to-Image Synthesis" by Tobias Hinz, Stefan Heinrich, and Stefan Wermter.

[4] "Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis" by Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang.

[5] "Real-world Image Construction from Provided Text through Perceptual Understanding" by Kanish Garg, Ajeet Kumar Singh, Dorien Herremans, and Brejesh Lall.

[6] "Research on Text to Image Based on Generative Adversarial Network" by Li Xiaolin and Gao Yuwei.

[7] "Super-Resolution based Recognition with GAN for Low-Resolved Text Images" by Ming-Chao Xu, Fei Yin, and Cheng-Lin Liu.

[8] "A Novel Visual Representation on Text Using Diverse Conditional GAN for Visual Recognition" by Tao Hu, Chengjiang Long, and Chunxia Xiao.

[9] "A Simple and Effective Baseline for Text-to-Image Synthesis" by Ming Tao, Hao Tang, Fei Wu, Xiaoyuan Jing, Bing-Kun Bao, and Changsheng Xu.

[10] "Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis" by Hongchen Tan, Xiuping Liu, Meng Liu, Baocai Yin, and Xin Li.

[11] Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., and Bengio, Y. (2012). Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.

[12] Bengio, Y. (2009). Learning deep architectures for AI. Now Publishers.

[13] Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. (2013). Better mixing via deep representations. In ICML'13.

[14] Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2014a). Deep generative stochastic networks trainable by backprop. In ICML'14.

[15] Bengio, Y., Thibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014b). Deep generative stochastic networks trainable by backprop. In Proceedings of the 30th International Conference on Machine Learning (ICML'14).

[16] Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral presentation.
