Game Data Science: An Introduction
You may have heard of the term game analytics or game data science. In fact, you may have even picked up this book because of the term's use in industry or academic circles. Game data science has become a cornerstone of game development in a very short period of time. Back in the 1990s, no one would have thought that game data would become a field of study and innovation in game research and industry; at that time, we were still working on developing better graphics, development tools, and design practices. Fast forward to today, and game data science is emerging as a very important field of study due to the emergence of social games embedded in online social networks. The ubiquity of social games gives access to new data sources and has an impact on important business decisions, given the introduction of freemium¹ business models.
Game data science is a broad domain covering all aspects of collecting, storing, and analyzing data and communicating insights. It can support any aspect of design and development, and it is not only about player behavior, although that is certainly an important part of the process. With a mature data science framework in place, companies have the instruments to gain objective knowledge about workflows and competitors, understand their communities and players, improve development processes, increase retention and revenue, and build capacity to offer games for free to customers, as we shall discuss further below.
¹ Freemium is a monetization strategy where the bare-bones service is provided for free but customers are expected to pay for additional elements such as vanity items, in-game currency, and faster cooldowns.
Game Data Science. Magy Seif El-Nasr, Truong Huy Nguyen Dinh, Alessandro Canossa, and Anders Drachen, Oxford University Press. © Magy Seif El-Nasr, Truong Huy Nguyen Dinh, Alessandro Canossa, and Anders Drachen (2021). DOI: 10.1093/oso/9780192897879.003.0001
Game data science fundamentally aims to add data-driven evidence to support decision-making across operational, tactical, and strategic levels of game development, and this is why it is so valuable. It allows researchers and the industry to move away from guesswork and make decisions based on carefully collected, curated, and analyzed data.
Game data science is the subject of this book. After reading this book, you should have a clear understanding of the current standard methods and tools used to analyze data collected from games. As the knowledge and practices in game data science are expanding rapidly, the ideas, methods, and tools presented in this book will also likely expand as new solutions become available. This book provides an introduction to the foundational approaches and theories that will help you understand current and future approaches to game data science.
With this introductory chapter, you begin your journey in the field of game data science. In particular, this chapter will provide a high-level panoramic introduction to the processes used to analyze and make sense of game data and suggest actionable information with the scientific method as a base process. Unlike other chapters in this book, this opening chapter does not contain practical labs. The material discussed is conceptual, providing you with the basics as you embark on the journey of understanding and practicing game data science.
1.1 What is game data science?
Fundamentally, game data science is the process of discovering and communicating patterns in data with the purpose of informing decision-making in different domains, such as business or design, in the context of games. As such, game data science includes many types of analyses, such as summarizing the number of active players within a certain time unit, predicting when players will stop playing a game, or evaluating the performance of servers.
In our previous book, Game Analytics (Seif El-Nasr, Drachen, and Canossa, 2013), we used the term game analytics rather than game data science to denote the process of analyzing and applying data collected throughout the development process. Here, we adopt game data science rather than game analytics for several reasons, most importantly because analytics in many communities relates to business intelligence or making decisions about business aspects using data. Therefore, there is sometimes confusion about whether game analytics refers only to the application of data science to inform decision-making for traditional business purposes or whether it also covers the application of data science to inform design processes. Because the application of data science to inform design is a large part of this book, we will use the broader and more inclusive term game data science. The way we use this term denotes the breadth of the field of knowledge discovery using data collected through the game design, development, and post-launch production processes.
Game data science, thus, overlaps substantially with other data-informed processes in game development, including Games User Research (GUR)², business intelligence as it is applied in the games industry, and marketing and brand research. While there is much ongoing discussion in the community about what exactly game data science is and is not, in this book, we will adopt an inclusive viewpoint, rather than trying to set limits around the term.
To summarize, game data science is the term we use collectively for the process of providing data-driven evidence for decisions made at various parts of the game design, development, and production processes. You can apply the tools and techniques of game data science across virtually any aspect of the game design and development processes.
1.2 What is game data?
A great variety of data can be collected, stored, analyzed, and leveraged to gather intelligence throughout the lifetime of a game title or game company. Typical sources of data include behavioral data from games, information from advertising partners and other third parties (e.g., social media platforms), and data collected from infrastructure (such as servers), the development process itself, marketing, and user research.
These varied sources of data can be used in many parts of the production process to inform game design and development, including understanding or optimizing developers’ workflow during production, optimizing server performance after release, and testing to identify bugs or assess player engagement. While evaluation of technical infrastructure and platform compatibility can provide substantial datasets that are important to the operation of a game, in this book, we will not focus on this topic as the intersection between software engineering and data science deserves its own book.
² Games User Research (GUR) is a field of study that focuses on understanding user behaviors, needs, and motivations by analyzing how the design of a certain application or game impacts its audience. As you will see in the history section within this chapter, researchers working in this area are also tightly coupled with game analysts as some of the processes used by games user researchers also use game data.
In this book, we will focus on player data. The data examples and practical exercises you will find throughout this book will use player data. This is because player data is, by far, the most commonly used and available source of data in game data science. There are different forms of player data, including behavioral data collected in real time as players play the game, and player preferences or statistics, such as how many games they played and their ranks or scores. The behavioral data collected in real time is often called behavioral telemetry.
Behavioral telemetry, in a more general sense, is data that we constantly leave as trails through all the actions we perform in our daily life: borrowing books from a library, visiting websites, purchasing a house, working as a middle manager, or vacationing in Southeast Asia. Whether we drive a car, a motorcycle, or a rickshaw, almost any action we take in the public space can represent a syllable of a longer sentence that contributes to composing the narrative of our lives. The digital trails we leave behind are even easier to collect. The way we use our phones creates a constantly evolving representation of who we are.
Telemetry basically means data collected from afar. In the context of games, as people play a game, we can collect data about what they do in the game, down to the press of a button or movement of a mouse, if so desired. This type of user data is commonly collected across the IT sector. The process of collecting and storing telemetry data is easier than ever due to cheap and large storage solutions, pervasive device connectivity, and instrumentation of software and hardware.
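To make the idea of behavioral telemetry more concrete, the sketch below shows one way an in-game action might be captured as a timestamped event record and serialized for storage. It is only an illustration: the TelemetryEvent class, its field names, and the example event types are assumptions introduced here, not a format prescribed by this book or by any particular analytics platform.

```python
from __future__ import annotations

import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TelemetryEvent:
    """One player action recorded by the game client (hypothetical schema)."""
    player_id: str
    event_type: str                                        # e.g. "button_press"
    payload: dict = field(default_factory=dict)            # event-specific details
    timestamp: float = field(default_factory=time.time)    # seconds since epoch


def serialize(event: TelemetryEvent) -> str:
    """Turn an event into one JSON line, ready to send or append to a log."""
    return json.dumps(asdict(event), sort_keys=True)


def log_events(events: list[TelemetryEvent]) -> list[str]:
    """Serialize a batch of events in the order in which they occurred."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [serialize(e) for e in ordered]


if __name__ == "__main__":
    batch = [
        TelemetryEvent("p42", "mouse_move", {"x": 310, "y": 128}, timestamp=2.0),
        TelemetryEvent("p42", "button_press", {"button": "A"}, timestamp=1.0),
    ]
    for line in log_events(batch):
        print(line)
```

Keeping each event as a flat, self-describing record is a common design choice because it lets later analyses filter by player, event type, or time without knowing how the game produced the event.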
Within digital games, the trails can be so detailed and complex that they reveal aspects of player personalities, motivations, and experiences through the actions and decisions taken when playing, declared or inferred preferences, movement patterns, and the relationships players build. Behavioral telemetry is, within the scope of behavioral data, the most common source of information we have and certainly the most voluminous. Behavioral data allows us to move beyond finding patterns in data to begin drawing inferences about the meaning behind digital actions. Understanding why players do particular things or behave the way they do is valuable. It can be readily applied to evaluating and informing design, user experience, and monetization.
The games industry has invested considerable effort, especially in recent years, to establish expertise, implement tools, and build processes that can leverage the knowledge extracted from analyzing the trails of data that players leave behind. The methods (the toolset of a game data scientist) in many ways leverage the knowledge and methods that already exist, pioneered in the rise of big data, data science, and Artificial Intelligence (AI). However, it is important to realize that game data science often ends up drawing upon knowledge in fields such as design, psychology, sociology, information systems, user experience, or user research when it comes to informing what analysis to run on player data, how to interpret the results of such analyses, and, perhaps more crucially, how to translate the results into action.
Though knowledge and analytical approaches have grown rapidly, at the time of writing this book, game data science is in many ways still in its infancy. There are no set standards or definitions of metrics, and much of the available knowledge is locked away due to the inherent (proprietary) value in data. On the positive side, this means that now is an exciting time to work in game data science. It also means that there is an ongoing challenge in developing tools and methods that can leverage expert knowledge to analyze and make sense of such vast amounts of data and ensure that new knowledge informs decision-making that translates into action.
1.3 Advantages of game data science
The benefits and advantages of integrating game data science in game development are many and far-reaching. With a mature data science framework in place, companies have the instruments to gain objective knowledge about workflow and server workload as well as gain knowledge about their players, gather insights into which elements of a certain game are most popular, and figure out at what point players stop playing. In addition to insights into design, the games industry utilizes knowledge gained from data to increase revenue and improve player experience. Together, these two issues drive business and development decisions since the vectors for monetization and player experience are aligned. A better user experience turns into higher sales and higher player retention.
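As a small illustration of figuring out at what point players stop playing, the sketch below tallies the furthest level each player reached. The function name, the (player_id, level) input format, and the idea of using the furthest level as a drop-off point are assumptions made for this example; it also assumes the data covers only players who have already stopped playing.

```python
from collections import Counter


def drop_off_by_level(progress):
    """Count how many players' furthest level was each level.

    `progress` is an iterable of (player_id, level_reached) pairs;
    a player may appear many times, in any order.
    """
    furthest = {}
    for player_id, level in progress:
        # Keep only the highest level seen so far for this player.
        furthest[player_id] = max(level, furthest.get(player_id, level))
    return Counter(furthest.values())


if __name__ == "__main__":
    progress = [
        ("ann", 1), ("ann", 2),
        ("bob", 1),
        ("cat", 3), ("cat", 2),
        ("dan", 2),
    ]
    for level, players in sorted(drop_off_by_level(progress).items()):
        print(f"level {level}: {players} player(s) stopped here")
```

A level where an unusually large share of players stop is a candidate for closer design review, although the count alone does not explain why they left.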
In the realm of academic game research and serious games³, the application of game data science has gained substantial momentum, as it allows companies and researchers to analyze the relationship between player or user behavior and the outcomes of such behavior, e.g., increased awareness of a topic, health benefits, or learning. The discoveries being made using data-driven techniques, such as in the field of learning analytics, have major implications for education and health. Citizen science and crowdsourcing games also rely on such methods to increase awareness, retention, and motivation.
³ A term used to describe games developed for purposes other than entertainment, such as training, promoting health, citizen science, or psychological experiments.
1.4 The historical context for game data science
Game data science is in many ways a relatively young domain, especially when viewed through the lens of academic research. However, the application of data science methods to data from games or from game companies has expanded so fast and evolved so rapidly that it is easy to overlook the fact that, a decade ago, using machine learning algorithms on game data was largely unheard of. The history of game data science can thus be thought of as being shallow but broad.
In general, there are several challenges to mapping the history of game data science. First, the substantial amounts of knowledge generated are not recorded anywhere that is publicly available. Companies invest resources in business intelligence, and the results are often treated as confidential due to their business value. Similarly, early academic research in the area is published across a dozen or more domains and thus is extremely fragmented. Second, there has been a substantial parallel growth in different sectors and countries, and thus it is hard to say when a specific technology was developed or how it influenced the development of the field. Third, any account of the historical perspective will naturally be biased by the specific area of focus or community that the author comes from.
To highlight the challenges in developing a historical overview of game data science or aspects of it, we have included an exercise specifically on this topic (see exercises below). In this section, we will focus on discussing some of the factors that we think have affected the growth of the field as we see it, acknowledging that we have our own biases.
There are several waves of innovation within the field of technology and games that have facilitated the development of game data science. The obvious technology innovations include the development of personal computers, the Internet, the development of platforms, such as Facebook and Steam, the growth of server and database technologies, computing capacity, and machine learning as well as the recent developments in deep learning. Below, we discuss some of what we think are important landmarks that led to the development of game data science as it stands today.
1.4.1 The rise of the MMOG
There are accounts, from very early game titles, of player data being gathered. However, prior to the introduction of, first, Multi-User Dungeons (MUDs) and then Massively Multiplayer Online Games (MMOGs), the application of such data as an external process, toward informing design, systems, virtual economies, and other aspects of the game world, was fragmented at best. With MMOGs, such as Ultima Online, there emerged a need for monitoring a persistent game world, its user base, and how that user base might even engage in out-of-game trading (e.g., selling Ultima Online characters). MMOG economies were designed and tested, and accounts, such as the one by Simpson (2000), show that game data informed part of such development, albeit at a simpler level than the kinds of economic analyses that are often run on contemporary MMOGs.
On the academic side, early analytical work on MMOGs was developed in parallel with such work in the industry. MMOG economies and their analysis were given substantial visibility by Castronova, who published work about Everquest in 2001 documenting how synthetic worlds and their economies operate, concluding that the Gross National Product (GNP) of these early game worlds could rival some real-world countries (Castronova, 2001). From this and other contemporary works, e.g., by Williams et al. (2011), Ducheneaut et al. (2006), Yee (2006), and Dibbell (2006), as well as the release of Second Life and other virtual worlds, broad public attention emerged on the use of game worlds and the opportunities they provide as tools to analyze player behavior. This was turbo-charged with the release of World of Warcraft and the impressive subscription numbers it reached, bringing Massively Multiplayer Online Role-Playing Games (MMORPGs) into public consciousness, at least in the Western world; Lineage and Guild Wars, in Asia and beyond, also deserve credit.
With the emergence of early analytical work on games during the years 2003–2006, suddenly, many researchers realized that virtual worlds provided fertile spaces for research across economics, behavioral science, psychology, network latency, and more. Around these years, some early works surfaced across industry and academia that showcased how in-game player behavior could be analyzed for various purposes. There were also many examples of how players themselves mined the games for data, e.g., to build online guides and sites about quests or resource harvesting. In general, there existed a degree of data access in MMOGs and in other games that was not often seen in other data-heavy IT sectors. It should be noted that the analytical methods at the time were still largely confined to statistics and (simple) economic modeling.
1.4.2 Social network games
Another angle on how game data science emerged is the rise of online social network platforms, such as Facebook. The proliferation of social networks led to the emergence of a new type of game, the Social Network Game (SNG) (Alsén et al., 2016). SNGs could tap into social network data and use free-to-play strategies to drive monetization, breaking with the traditional retail sales models. Due to the abundance of data available from the social network platforms, and the requirement to monitor in-game behavior due to the monetization strategy adopted, SNGs had a built-in imperative for analyzing player data. This urgency brought analytics (Seif El-Nasr et al., 2013) to the forefront of the games industry by around 2007–2010. Terms such as monetization, funnel analysis, onboarding research, First-Time User Experience (FTUE), and others started becoming commonplace. In 2011, one of the first books addressing this market was published, which included a list of important monetization metrics, such as Daily Active User (DAU) and Average Revenue Per User (ARPU) (Fields and Cotton, 2011).
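The two metrics named above can be sketched in a few lines. The example assumes that DAU means the number of distinct players seen on a calendar day and that ARPU means total revenue in a period divided by the number of distinct active players in that period; exact definitions vary between companies, and the function names and input formats here are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_active_users(sessions):
    """Count distinct players seen on each calendar day.

    `sessions` is an iterable of (player_id, day) pairs.
    """
    players_by_day = defaultdict(set)
    for player_id, day in sessions:
        players_by_day[day].add(player_id)  # a set ignores repeat sessions
    return {day: len(players) for day, players in players_by_day.items()}


def average_revenue_per_user(sessions, purchases):
    """Total revenue in the period divided by distinct active players.

    `purchases` is an iterable of (player_id, amount) pairs.
    """
    active = {player_id for player_id, _ in sessions}
    if not active:
        return 0.0  # avoid dividing by zero when nobody played
    revenue = sum(amount for _, amount in purchases)
    return revenue / len(active)


if __name__ == "__main__":
    sessions = [
        ("ann", date(2021, 3, 1)),
        ("bob", date(2021, 3, 1)),
        ("ann", date(2021, 3, 1)),  # second session on the same day
        ("ann", date(2021, 3, 2)),
        ("cat", date(2021, 3, 2)),
        ("dan", date(2021, 3, 2)),
    ]
    purchases = [("ann", 4.99), ("cat", 1.99)]
    print(daily_active_users(sessions))
    print(average_revenue_per_user(sessions, purchases))
```

Note that ARPU divides by all active players, not only those who paid, which is why it is typically far lower than the average purchase amount in a free-to-play game.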
1.4.3 Democratizing data collection
A factor arising from around 2010 onward was the democratization of metrics collection. This was made possible by technological innovations outside the games industry and by the emergence of numerous start-up companies that provided Software as a Service (SaaS) analytics platforms, such as DeltaDNA, GameAnalytics, Swrve, Ninja Metrics, and, later on, Yokozuna Data and others. Such platforms provided tools and case studies showing the analytics process, which unpacked the process of collection and analysis of behavioral telemetry. Several articles published in tech magazines discuss how companies, such as Wooga, Zynga, Microsoft, and Ubisoft, use telemetry data, giving us more examples and case studies on how an analytics process is implemented.
1.4.4 Games User Research (GUR)
Around the same time that SNGs made their first appearance, GUR started to become a main part of the game development process. The history of GUR is documented in Drachen, Mirza-Babaei, and Nacke (2018) and Isbister and Schaffer (2008). It is interesting to note that the application of behavioral telemetry to inform game design within the context of AAA⁴ games was driven, to an extent, by user research. In the mid-2000s, Microsoft’s User Research division took game user testing seriously, adapting techniques from the domain of Human–Computer Interaction (HCI) and developing new ones to specifically work in the user experience-focused game environments. To the new field’s benefit, Microsoft, Bungie, and other companies discussed their work and methods, e.g., in Thompson’s famous Wired article in 2007 (Thompson, 2007). Amaya et al. (2008) notably detailed the work of Microsoft User Research that integrated user research with automated recording of user behavior. Importantly, this research and the ideas it propagated enhanced the role of behavioral telemetry as a useful source of knowledge. Around the same time, leading up to 2010, several key blog posts, white papers, and presentations at the Game Developers Conference showcased how the industry at large was exploring game data science and building new technologies, methods, and ideas.
An important milestone in GUR and its influence on game data science is the start of the International Game Developers Association’s Game Research and User Experience Special Interest Group (GRUX SIG) in 2012. This special interest group started organizing and connecting games user researchers across industry and academia, building summits, and facilitating knowledge exchange. This incredibly welcoming community had a significant effect on GUR and, by extension, game data science. The group today counts more than 2,100 members worldwide and hosts multiple annual summits (see GRUX SIG, 2015).
1.4.5 Games as a Service
More games developed in the past few years have focused on being online and persistent, with downloadable content, patches, and updates extending the lifetime of these games. Having a Live Operations (LiveOps) team for
⁴ AAA titles are games that typically have a higher marketing and development budget.