Lexical Variation and Change

A Distributional Semantic Approach

DIRK GEERAERTS
DIRK SPEELMAN
KRIS HEYLEN
MARIANA MONTES
STEFANO DE PASCALE
KARLIEN FRANCO
MICHAEL LANG

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© Dirk Geeraerts, Dirk Speelman, Kris Heylen, Mariana Montes, Stefano De Pascale, Karlien Franco, and Michael Lang 2024

The moral rights of the authors have been asserted. Some rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, for commercial purposes, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization.

This is an open access publication, available online and distributed under the terms of a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0), a copy of which is available at http://creativecommons.org/licenses/by-nc-nd/4.0/.

Enquiries concerning reproduction outside the scope of this licence should be sent to the Rights Department, Oxford University Press, at the address above.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data: Data available

Library of Congress Control Number: 2023937657

ISBN 9780198890676

DOI: 10.1093/oso/9780198890676.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

PART I. THEORETICAL PRELIMINARIES

PART II. DISTRIBUTIONAL METHODOLOGY

3. Parameters and procedures for token-based distributional semantics

5. Making sense of distributional semantics

6. The interplay of semasiology and onomasiology

PART IV. LECTOMETRIC METHODOLOGY

7. Quantifying lectal structure and change

PART V. LECTOMETRIC EXPLORATIONS

List of figures

1.1 Research perspectives within the lexeme-lection-lect triangle
2.1 Semasiological structure of vest
2.2 Graphical representation of the steps in the distributional workflow
3.1 2D representation of Dutch hachelijk ‘dangerous/critical’
3.2 Syntactic tree of example (3.1)
3.3 Syntactic tree of example (3.2)
4.1 Two 2D representations of the same model of Dutch hachelijk ‘dangerous/critical’
4.2 Portal of https://qlvl.github.io/NephoVis/ as of August 2022
4.3 Level 1 for heffen ‘to levy/to lift’
4.4 Level 2 for the medoids of heffen ‘to levy/to lift’
4.5 Level 1 for heffen ‘to levy/to lift’. The plot is colour-coded with first-order part-of-speech settings; NA stands for missing data, in this case the dependency-based models
4.6 Level 1 for heffen ‘to levy/to lift’ with medoids highlighted
4.7 Level 2 for the medoids of heffen ‘to levy/to lift’, colour-coded with categories from manual annotation. Hovering over a token shows its concordance line
4.8 Heatmap of distances between medoids of heffen ‘to levy/to lift’ against the backdrop of Level 2
4.9 Heatmap of distances between medoids of haten ‘to hate’ against the backdrop of Level 2
4.10 Level 2 for the medoids of heffen ‘to levy/to lift’, colour-coded with categories from manual annotation. Brushing over an area in a plot selects the tokens in that area and their positions in other models
4.11 Frequency table of context words of selected tokens against the backdrop of Level 2 (medoids of heffen ‘to levy/to lift’)
4.12 Level 3 for the third medoid of heffen ‘to levy/to lift’, with parameters 10-10.ALL.BOUND.WEIGHT.SOCALL.FOC
4.13 Level 3 for the second medoid of heffen ‘to levy/to lift’, with parameters 10-10.ALL.BOUND.WEIGHT.SOCALL.FOC
4.14 Starting view of the ShinyApp dashboard, extension of Level 3
4.15 Top boxes of the ‘t-SNE’ tab of the ShinyApp dashboard, with active tooltips
4.16 Token-level plot and bottom first-order context words plot of the ‘t-SNE’ tab of the ShinyApp dashboard, with one context word selected
4.17 Heatmap of type-level distances between relevant context words in the ShinyApp dashboard

5.1 Variable importance predicting distances between all models
5.2 Variable importance predicting accuracy of models
5.3 Conditional tree predicting the accuracy of herinneren ‘to remember/to remind’ models as kNN
5.4 Conditional tree predicting the accuracy of huldigen ‘to believe/to honour’ models as kNN
5.5 Models of heet ‘hot’ and stof ‘substance, dust…’ with parameters 5-5.LEX.BOUND.SELECTION.SOCALL.FOC
5.6 Models of dof ‘dull’ and huldigen ‘to believe/to honour’ with parameters 5-5.LEX.BOUND.SELECTION.SOCALL.FOC
5.7 Models of haten ‘to hate’ and hoop ‘hope, heap’ with parameters 5-5.LEX.BOUND.SELECTION.SOCALL.FOC
5.8 Model of heilzaam with parameters 10-10.ALL.BOUND.WEIGHT.SOCALL.FOC. Circles are ‘healthy, healing’, triangles are ‘beneficial’ in general
5.9 Model of herstructureren with parameters 3-3.ALL.BOUND.SELECTION.SOCALL.FOC
5.10 Model of grijs with parameters 5-5.ALL.BOUND.ASSOCNO.SOCALL.FOC
5.11 Model of herroepen with parameters 3-3.ALL.BOUND.SELECTION.SOCALL.FOC
5.12 Model of blik with parameters 5-5.ALL.BOUND.WEIGHT.SOCNAV.5000
5.13 Model of schaal with parameters 5-5.ALL.NOBOUND.WEIGHT.SOCALL.FOC
5.14 Model of herhalen with parameters REL1.SELECTION.SOCALL.FOC
5.15 Model of haken with parameters 10-10.LEX.BOUND.SELECTION.SOCNAV.FOC
5.16 Model of huldigen with parameters 3-3.LEX.NOBOUND.SELECTION.SOCALL.FOC
5.17 Network of context words of the huldigen ‘to honour’ cluster
5.18 Model of heffen with parameters 10-10.ALL.BOUND.WEIGHT.SOCNAV.FOC
5.19 Model of hachelijk with parameters 5-5.ALL.BOUND.WEIGHT.SOCALL.FOC
5.20 Model of herinneren with parameters 10-10.ALL.BOUND.WEIGHT.SOCNAV.5000
5.21 Model of heet with parameters 5-5.ALL.BOUND.ASSOCNO.SOCALL.FOC
5.22 Model of stof with parameters 5-5.LEX.BOUND.SELECTION.SOCALL.FOC
5.23 Model of horde with parameters 5-5.ALL.BOUND.SELECTION.SOCALL.FOC
5.24 Model of geldig with parameters 10-10.LEX.BOUND.SELECTION.SOCALL.FOC

6.1 Models for woedend and laaiend
6.2 Models for briljant and geniaal
6.3 Scatterplot of t-SNE visualization of one model of vernielen and vernietigen, coloured by four clusters and shape-coded by variant
6.4 Scatterplot with colours for manual coding of agent type and shapes for variants
6.5 Scatterplot with colours for manual coding of patient type and shapes for variants
6.6 Cluster analyses of vernielen and vernietigen in the four diachronic subcorpora
7.1 Hierarchical destandardization as increasing distance between strata
7.2 Informalization as top-down decreasing distance between strata
7.3 Dehomogenization as increasing variation within one stratum
8.1 Visualization of a model for SECONDARY
8.2 Example workflow for calculating distributional token stability
9.1 Hierarchical (de)standardization scores in Belgian Dutch (caterpillar plot with ‘models’ on y-axis)
9.2 Hierarchical (de)standardization in Netherlandic Dutch (caterpillar plot with ‘models’ on y-axis)
9.3 Hierarchical (de)standardization scores in Belgian Dutch (caterpillar plot with ‘concepts’ on y-axis)
9.4 Hierarchical (de)standardization scores in Netherlandic Dutch (caterpillar plot with ‘concepts’ on y-axis)
9.5 Hierarchical (de)standardization scores across semantic fields
9.6 (In)formalization scores for Belgian Dutch (caterpillar plot with ‘models’ on y-axis)
9.7 (In)formalization scores for Netherlandic Dutch (caterpillar plot with ‘models’ on y-axis)
9.8 (In)formalization scores for Belgian Dutch (caterpillar plot with ‘concepts’ on y-axis)
9.9 (In)formalization scores for Netherlandic Dutch (caterpillar plot with ‘concepts’ on y-axis)
9.10 (In)formalization scores across semantic fields
9.11 (De)homogenization scores for Belgian Dutch (caterpillar plot with ‘models’ on y-axis)
9.12 (De)homogenization scores for Netherlandic Dutch (caterpillar plots with ‘models’ on y-axis)
9.13 (De)homogenization scores for Belgian Dutch (caterpillar plots with ‘concepts’ on y-axis)
9.14 (De)homogenization scores for Netherlandic Dutch (caterpillar plot with ‘concepts’ on y-axis)
9.15 (De)homogenization scores across semantic fields

Colour versions of figures can be consulted via the free PDF download at https://global.oup.com/academic/product/lexical-variation-and-change-9780198890676 or via OUP’s online platform at https://doi.org/10.1093/oso/9780198890676.001.0001.

List of tables

1.1 Terminological distinctions in denotationally expanded lexicology: phenomena
1.2 Terminological distinctions in denotationally expanded lexicology: subfields
1.3 Differences in conceptual onomasiological salience among co-hyponyms
1.4 Structural and usage-oriented perspectives in lexical research
1.5 Onomasiological profiles for NONSENSE in the fictitious Tzara and Ball dialects
1.6 Onomasiological profiles for NONSENSE in the fictitious Arp and Picabia dialects
2.1 Partial matrix underlying the analysis of vest in Figure 2.1
3.1 Example of type-level vectors
3.2 Small example of token-level vectors of three instances of
5.1 Examples of syntagmatic (columns) and paradigmatic (rows) perspectives on the linguistic interpretation of clouds
6.1 Coding schema for agent and patient expression
6.2 Clusters in the model for vernielen and vernietigen in contemporary data
6.3 Inflected forms and spelling variants occurring for vernielen and vernietigen in the diachronic corpus
6.4 Frequency of vernielen and vernietigen, and total number of tokens per century
6.5 Parameter settings in the contemporary study compared to parameter settings in the diachronic study
6.6 Clusters in the model for vernielen and vernietigen in the 16th and 17th centuries
6.7 Clusters in the model for vernielen and vernietigen in the 18th century
6.8 Clusters in the model for vernielen and vernietigen in the 19th century
6.9 Clusters in the model for vernielen and vernietigen in the 20th century
6.10 Summary of the cluster analyses of vernielen and vernietigen in the subcorpora
9.1 Corpus composition for the Dutch standardization study
9.3 Overview of parameters for the Dutch standardization study
9.4 Downsampling scheme for concept sizes
9.5 Overview of destandardization scores in Belgian Dutch and Netherlandic Dutch
9.6 Summary of standard language change scores across semantic fields
10.1 Sizes of the six lects in the Web/Dialects corpus in the Corpus del Español
10.3 Overview of parameters for the Spanish pluricentricity study
10.4 Concepts for which no models were retained
10.5 Completely uniform concepts resulting from the application of a significance test to the lectal comparisons
10.6 Overview of the concepts by model retention
10.7 U-values for all three AnyMod token sets from both a pan-Hispanic and pan-American perspective
10.8 U-values for all three 19+Mod token sets from both a pan-Hispanic and pan-American perspective
10.9 U-values for all three <18Mod token sets from both a pan-Hispanic and pan-American perspective
10.10 Characteristics of the three groups based on model retention

Introduction

In corpus linguistics, distributional semantics embodies the idea that the context in which a word occurs reveals the meaning of that word. By way of illustration, consider the words underground and subway, both referring to subterranean railway systems. The synonymy relationship that exists between the words may be recognized distributionally because they both co-occur frequently with words like line, station, terminal, urban, crosstown, northbound, passenger, transit, train, run, operate. That is to say, the similar distribution of the words underground and subway over contexts featuring items like line, station, terminal, and so on tells us something about the meaning of the two words. Importantly, there are computational techniques that allow us to identify the similarity in the distributional patterning of underground and subway. Those techniques can recognize that underground and subway are semantically closer than, say, subway and sunshine. But underground also has the meaning ‘a secret organization fighting the established government or occupation forces’, which co-occurs with words like clandestine, resistance, insurrection, attack, army, hidden, and which thus blurs the synonymy relationship with subway. A more fine-grained distributional approach then tries to model, not the overall similarity between underground and subway, but the similarity between the occurrences of underground in the sense ‘subterranean railway’ and those in the sense ‘resistance movement’. Such a more detailed type of distributional semantics is called a token-based approach, where a token is any of the specific occurrences of the words, in contrast with a type-based approach that only looks at the level of the words as a whole. Computationally, token-based approaches group occurrences together based on their semantic (read: distributional) similarity, just like a type-based approach groups words as such together. So in the case of underground, you expect to come across a group of tokens for the ‘subterranean railway’ sense and another for the ‘resistance movement’ sense, and when you add the occurrences of subway to the model, you expect to find them intermingled with the group of underground tokens that represents the ‘subterranean railway’ sense. If we refer to such clusters of grouped-together tokens as clouds (token clouds), then the distributional approach consists of analysing configurations of token clouds to see what light they shed on the meanings of the expressions.
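The contrast between a type-based and a token-based view can be made concrete with a small sketch. The Python fragment below is only an illustration under strong assumptions: the co-occurrence counts and the token contexts are invented, the list of context words is a hypothetical six-word vocabulary, and cosine similarity over raw counts stands in for the more elaborate workflow that the book itself describes in later chapters and that is not reproduced here.

```python
import math

# Toy data, invented for illustration only (not taken from any corpus).
# Each position in a vector corresponds to one hypothetical context word.
CONTEXT_WORDS = ["line", "station", "train", "resistance", "clandestine", "warm"]

# Type level: one co-occurrence count vector per word.
TYPE_VECTORS = {
    "subway":      [12, 15, 20, 0, 0, 1],
    "underground": [10, 14, 9, 6, 5, 0],
    "sunshine":    [0, 0, 1, 0, 0, 18],
}

# Token level: one set of observed context words per individual occurrence.
TOKEN_CONTEXTS = {
    "underground_1": {"line", "station"},            # 'subterranean railway'
    "underground_2": {"resistance", "clandestine"},  # 'resistance movement'
    "subway_1":      {"line", "station", "train"},
}


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def token_vector(context):
    """Binary vector marking which context words occur around one token."""
    return [1 if word in context else 0 for word in CONTEXT_WORDS]


# Type-based comparison: words as a whole.
sim_synonyms = cosine(TYPE_VECTORS["underground"], TYPE_VECTORS["subway"])
sim_unrelated = cosine(TYPE_VECTORS["subway"], TYPE_VECTORS["sunshine"])

# Token-based comparison: individual occurrences.
sim_railway_token = cosine(token_vector(TOKEN_CONTEXTS["underground_1"]),
                           token_vector(TOKEN_CONTEXTS["subway_1"]))
sim_resistance_token = cosine(token_vector(TOKEN_CONTEXTS["underground_2"]),
                              token_vector(TOKEN_CONTEXTS["subway_1"]))

print("underground ~ subway (types):", round(sim_synonyms, 3))
print("subway ~ sunshine (types):", round(sim_unrelated, 3))
print("underground_1 ~ subway_1 (tokens):", round(sim_railway_token, 3))
print("underground_2 ~ subway_1 (tokens):", round(sim_resistance_token, 3))
```

With these invented numbers, the type-level vector of underground mixes both senses, whereas the token-level vectors keep the ‘subterranean railway’ occurrence close to subway and the ‘resistance movement’ occurrence apart from it, which is the intuition behind token clouds.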

One major goal of the present monograph, then, is to explore the ins and outs of a distributional, token-cloud-based approach to word meaning. What does it involve, in what flavours does it come, how efficiently can it be implemented, and what exactly is its semantic import? The stakes for corpus semantics are high: if distributional modelling at the level of individual tokens of words works well, the automated or semi-automated analysis of meaning in large text corpora can be brought to a next level of detail and precision. There is also a very practical side to the methodological objectives of the book. The tools and algorithms that we will use are made available for public use, and so the book can also be seen as a portfolio of sample studies that might inspire other researchers. At the same time, we will point out the restrictions on the kind of distributional modelling that we have implemented and argue for some caution regarding its introduction in linguistic semantics. It turns out that the semantic information picked up by distributional models does not correspond in a stable and straightforward way with the information a linguist may be looking for, and this recognition calls for specific measures as to how distributional models may be incorporated into a linguistic workflow.

But apart from this methodological purpose, the book has an equally important theoretical goal. Our exploration of distributional semantics continues a lexicological line of research that was developed over the past quarter century in the Quantitative Lexicology and Variational Linguistics (QLVL) research group at the University of Leuven. Situated within the broad context of cognitive linguistics, this research line translates the cognitive linguistic interest in categorization phenomena and semantic variability into a research programme that takes the interplay of semasiological, onomasiological, and lectal variation as its core question. To briefly and simplistically unpack this terminological triad (details follow in a separate chapter): semasiological variation looks from a word to its meanings; it studies polysemy, like the various senses of underground. Onomasiology reverses the perspective and describes how a given meaning can be expressed by various words, like the synonymy of underground and subway in the ‘subterranean railway’ sense. Lectal variation involves the way in which diversity along sociolinguistic, stylistic, geographical, and other dimensions influences semasiological and onomasiological phenomena, like the observation that underground is typically British English and subway typically American English. This lectal perspective includes a so-called lectometric one: measuring the frequencies of underground and subway as expressions for ‘subterranean railway’ in British and American texts allows us to calculate how close lexical usage in the two varieties is with regard to each other, and to address the question whether they are growing together or apart. The present volume will detail this framework and examine how token-based distributional techniques might be used to scale up the research to the level of large-scale corpora. Although we will not exhaustively cover all the dimensions of the programme, the various studies showcasing the distributional method will treat crucial components of the theoretical frame of reference: the detection of polysemy, the interplay of semasiological and onomasiological variation, the treatment of lexical variation as a sociolinguistic variable, and the use of those variables to measure convergence or divergence between language varieties.
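The lectometric idea of comparing the frequencies of underground and subway across varieties can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the counts are invented, the function names are hypothetical, and the overlap of relative frequencies is just one simple way of quantifying closeness. The formulae the book actually uses are introduced in Chapter 7 and are not reproduced in this passage.

```python
def relative_profile(counts):
    """Turn raw counts of competing variants into relative frequencies."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def profile_overlap(counts_a, counts_b):
    """Sum, over all variants, the smaller of the two relative frequencies.

    The result is 1.0 when both varieties use the variants in identical
    proportions and 0.0 when they share no variant at all.
    """
    prof_a = relative_profile(counts_a)
    prof_b = relative_profile(counts_b)
    variants = set(prof_a) | set(prof_b)
    return sum(min(prof_a.get(v, 0.0), prof_b.get(v, 0.0)) for v in variants)


# Invented counts of the two expressions for 'subterranean railway'.
british = {"underground": 80, "subway": 20}
american = {"underground": 10, "subway": 90}

overlap = profile_overlap(british, american)
print("overlap between the two toy varieties:", round(overlap, 3))
```

Computing such a value for the same pair of varieties at two points in time, and over many concepts rather than one, would be a first step towards asking whether the varieties are growing together or apart.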

The book is structured in five parts of two chapters each. The first set of two chapters, Theoretical preliminaries, introduces the framework. Chapter 1 describes the various perspectives that may be taken in lexical variation research, and how these have so far been covered in existing research. Chapter 2 lays out the conceptual foundations of a token-based distributional method. The remaining eight chapters fall into two groups. A first set of two times two chapters deals with semasiology and onomasiology, that is, with the relationship between lexical expressions and their meanings, and how this may differ over chronological periods and language varieties. A second group of two times two chapters reverses the perspective. In Chapters 3 to 6, we are interested in how lectal variation may influence lexical variation. In Chapters 7 to 10, we are interested in what lexical variation has to say about lectal variation. In each set of two times two chapters, the first pair of chapters is devoted to methodological issues while the second pair illustrates the methodology with case studies. Accordingly, the Distributional methodology part introduces, in Chapter 3, the technical specifics of the distributional semantic workflow we will use, and in Chapter 4 the visualization tool that we have developed to explore its outcome. The chapters in the Semasiological and onomasiological explorations part put this exploration into practice. Using Dutch materials, Chapter 5 examines how far a distributional approach can take us on the path of semantic analysis, and Chapter 6 applies the distributional method to the interplay of semasiology and onomasiology in lexical semantic change. The final four chapters are similarly split up between two methodological and two descriptive chapters. The Lectometric methodology part introduces the various steps in a lectometric workflow. While Chapter 7 introduces the formulae that use lexical variation to quantify the relationship between language varieties, Chapter 8 specifies how a token-based distributional method identifies the sets of synonymous expressions that provide the basis for that quantification. The chapters in the final part, Lectometric explorations, illustrate the lectometric workflow. Chapter 9 looks diachronically at the evolution of Dutch. Chapter 10 presents a synchronic view of international varieties of Spanish. The book closes with a conclusion detailing in what ways the research programme can be further developed, and readers beware: there are plenty of them.

In light of this overview, we believe the book offers the following unique and innovative features. First, it presents a comprehensive view of lexical variation, based on the distinction between semasiology and onomasiology, and the addition of a lectal dimension. By describing how these distinctions define different perspectives for lexical research, and how the different phenomena interact, the book draws a more adequate picture of the richness and complexity of lexical phenomena than can be found in the existing literature. In particular, by treating lexical variation as a sociolinguistic variable in the sense of variationist sociolinguistics, the relationship between language varieties can be quantified at an aggregate level based on such variables. The monograph shows how such a lexical lectometry can be developed, and how it can profit from distributional methods.

Second, by comparing the semantic classifications produced by count-based distributional models with manually annotated disambiguated data, we offer a critical insight into the machinery of distributional modelling. Whereas a computational perspective on distributional methods is primarily concerned with their success in modelling linguistic phenomena, we aim for a deeper understanding of the mechanisms behind those results: how technical choices with regard to the distributional process influence which textual information is picked up by the models, and how that relates to a human interpretation of the data. Crucially, our analysis demonstrates, first, that there is no one-to-one relationship between the token clusters that fall out of a distributional modelling and what would traditionally be considered different senses, and second, that there is no single choice of model-building parameters that is optimal across the board, that is, that yields the best possible solution (the one closest to a human perspective) for any lexical item.

Third, the book is accompanied by a set of digital tools supporting the analytic workflows demonstrated in the case studies. On the one hand, some of these tools involve Python 3 and R packages used to extract information from corpora, create distributional models, and apply clustering and other statistical, viz. lectometric, analyses. On the other, visualization tools have been developed within the context of the semasiological workflow for the qualitative examination of token-level models. The availability of these tools greatly enhances the relevance of the book as a source of further research.

These assets suggest for which groups of readers the monograph may be of interest. Semanticists and lexicologists will be interested in the formulation of a comprehensive view of lexical variation, in the exploration of the possibilities and limits of token-based distributional semantics, and in the tools we offer for the incorporation of token-based distributional modelling in lexical and semantic research. Computational linguists will be interested in the distributional workflows we offer, with their accompanying tools, and our exploration of the possibilities and limits of a token-based distributional approach. Sociolinguists and historical linguists will be interested in our treatment of lexical variation as a sociolinguistic variable, and the synchronic and diachronic lexical lectometry based on it.

Because we intend to reach a diverse audience of linguists, the text is written with minimal assumptions regarding background knowledge. Specifically, the first two chapters are meant to bridge the gap between descriptively oriented linguists, who may need an introduction to the modus operandi of distributional semantics, and more technically minded researchers, who may be unfamiliar with the variety of perspectives in descriptive lexical and semantic research. In addition, because the trajectory we will describe is one with many optional turns and sideways, we will end each chapter with a summary that will help the reader to track the progress of the argument.

The project from which this monograph emanates was funded by the Research Council of the University of Leuven (project C16/15/023, with Dirk Geeraerts as principal investigator). Apart from the authors of the present volume, participants in the project included Benedikt Szmrecsanyi, Stefania Marzo, Weiwei Zhang, Tao Chen, Christian Andersen, and Kristina Geeraert. Although the present text is a collective product, resulting from several years of joint research efforts, the authors have contributed in different degrees to the various chapters. Dirk Geeraerts was lead author for Chapters 1, 2, and 7, Mariana Montes for Chapters 4 and 5, and for Chapter 3 together with Kris Heylen. Karlien Franco took the lead for Chapter 6, Stefano De Pascale for Chapter 9, and Michael Lang for Chapter 10. Stefano De Pascale and Karlien Franco were jointly responsible for Chapter 8.

PART I

THEORETICAL PRELIMINARIES

Two interwoven strands of research determine the organization of our monograph: a descriptive one, focusing on lexical variation, and a methodological one, focusing on distributional corpus semantics. In this first part of the book, two chapters present the basics and the background of both strands, with Chapter 1 introducing the descriptive framework, and Chapter 2 informally explaining the essentials of distributional vector semantics. Both chapters not only lay out the conceptual groundwork for these topics, but also situate them in a wider context of existing linguistic research.

1

Lexical variation and the lexeme-lection-lect triangle

As our investigation is situated at the crossroads of lexical variation research and distributional semantics, we have a double background to describe. In this chapter, we introduce the first of these two backdrops: what model of lexical variation do we start from, where do we situate our own research within that field, and how do we relate to previous research? The first section of the chapter charts various conceptual perspectives that may be taken in lexical variation studies; specifies the focus of our research in light of those alternatives; and indicates how our choice of perspective translates into the structure of the monograph. The second and third sections then detail our choice of focus. The third section in particular introduces the lectometric perspective that plays a central role in later chapters, from Chapter 7 onward. The final two sections sketch the research background: on the one hand, lexical studies in the broader context of linguistic variation research, on the other, our local research context. The present study continues a long-term research line within the Quantitative Lexicology and Variational Linguistics research group at the University of Leuven, and accordingly, we need to provide some detail about previous work and how the present approach builds on earlier achievements.

1.1 Choices of lexicological perspective

Imagine a pair of trousers ending just below the knee, tightened round the leg so that the bottom end is slightly baggy. What would they be called? Several terms exist: knickerbockers, knickers, and breeches. At the same time, they could simply be referred to as trousers, but then the item in question would be categorized differently. It would then not be identified as a member of the specific category BREECHES ‘pair of trousers ending just below the knee, tightened round the leg (etc.)’ that receives a unique, category-specific name with knickerbockers or knickers or breeches, but it would be identified as a member of the broader category TROUSERS ‘garment extending from the waist down to the knee or the ankle, covering each leg separately’. (Typographically, we will be using small caps for concepts or categories, specifically when they are represented by various synonymous expressions. Italics are used for lexical forms, and definitions, glosses, or explanations will appear within quotes.) But how unique are terms like knickerbockers and knickers? At least for knickers, there is a polysemy to be considered, because it may also signify ‘underpants’, and the synonymy between knickers and knickerbockers does not extend to this second sense of knickers. A similar situation actually holds with regard to trousers: it is synonymous with pants, but in a polysemous sense, pants is synonymous with the ‘underwear’ reading of knickers. In addition, there is lectal variation in the distribution of the terms. Without being too detailed about it, we may note that trousers is typically British English whereas its synonym pants (like knickerbockers in comparison to breeches) is American English, and accordingly, the ‘underwear’ sense of pants is not common in American English (like that of knickers). Terms like typically are important here: the lexical choices are seldom of a black-and-white nature, but more often involve preferential patterns.

Lexical Variation and Change. Dirk Geeraerts, Dirk Speelman, Kris Heylen, Mariana Montes, Stefano De Pascale, Karlien Franco, and Michael Lang, Oxford University Press. © Dirk Geeraerts, Dirk Speelman, Kris Heylen, Mariana Montes, Stefano De Pascale, Karlien Franco, and Michael Lang (2024). DOI: 10.1093/oso/9780198890676.003.0001

This brief example, to which we will come back in Section 1.2, is structured along two basic dimensions. The first one links linguistic forms to readings, whereas the second one brings in different language varieties and describes how the association between form and semantics differs according to the dialect (in the broadest possible sense of the term) under consideration. Crucially, both dimensions can be traversed in two directions. If you start from a lexical item and describe the semantics of how it is used, you take a semasiological perspective and your interest basically lies with polysemy. But if you focus on synonymy, you look from the semantic level to the level of forms, describing how a meaning can be expressed by various lexical items; this is an onomasiological perspective. The variational dimension can similarly be subjected to a perspectival switch. On the one hand (and this is the most common view), you can take the association of forms and meanings as a response variable and investigate how that association changes when you compare different language varieties. On the other hand, the relationship between those varieties can be your response variable: if you aggregate over a larger part of the vocabulary and its semasiological/onomasiological characteristics, what does that tell you about the language varieties in which that vocabulary appears? How close are they, and if you look over time, are they growing apart or growing together? The first of these perspectives, looking from varieties to variable word-meaning pairs, may be called variationist, because its outlook corresponds with that of variationist linguistics as the major branch of sociolinguistics initiated by Labov’s work from the 1960s. The second perspective is a lectometric one, because it focuses on measuring distances among lects.

Lect in this definition is a cover term for all kinds of language varieties. In the terminology of Coseriu (1981), this variety of varieties may be structured along four cross-classifying dimensions: a diatopic one, involving the dialects, regiolects, chronolects, national varieties, and so on, used in different parts and locations of a linguistic area; a diastratic one, involving sociolects belonging to different social groups; a diaphasic one, involving the differences of style and register that show up in different speech situations and communicative contexts; and a diachronic one, involving the chronological development and the historical stages of a language. Lectometry has so far primarily been an enterprise with a diatopic perspective, but in accordance with a generic conception of lect, we think of it as a generalization of that dialectometric tradition. (On dialectometry, see Goebl 2011, Wieling and Nerbonne 2015, and the discussion in Section 1.3.)

Given these two dimensions and the associated perspectival switches (semasiological-onomasiological, variationist-lectometric), the scope of our study can be described in terms of what we will call the lexeme-lection-lect triangle. Terminologically, lexemes are the lexical items under investigation, and a lection is the specific reading with which such a word appears in a text (like whether, to come back to the example, knickers is used in an ‘underwear’ reading or a ‘breeches’ reading). In the sense intended here, lection is a rather outdated philological term, and we are admittedly selecting it largely for its alliterating qualities. But the definition it receives in The New Shorter Oxford English Dictionary as ‘a particular way of reading or interpreting a passage; a reading found in a particular copy or edition of a text’, adequately captures what is of concern to us here, viz. the meaning-in-context of a word, the particular interpretation with which it is used in a given text passage. Lect, as indicated, is a general term for all kinds of language varieties.

Lexemes, lections, and lects interact, and talking about a lexeme-lection-lect triangle provides us with a handy image to schematically represent the various aspects of that interaction, or perhaps more precisely, the combinations of the two perspectival dimensions that we introduced above: see Figure 1.1. At the base of the triangle, the difference between a semasiological and an onomasiological perspective is expressed by the direction of the arrow linking lexeme and lection.

Figure 1.1 Research perspectives within the lexeme-lection-lect triangle

The panels on the left-hand side embody a semasiological perspective: looking from lexemes to their readings. The panels on the right embody the converse, onomasiological perspective: looking from readings to the forms through which they are expressed. Orthogonal to the semasiological/onomasiological dimension, the perpendicular line represents the other basic perspective. In the top panels, lectal variation is an explanatory variable: if you look at either semasiological or onomasiological variation, to what extent is it influenced by lectal diversity? In the bottom panels, the perspective is reversed, and lectal variation becomes a response variable: if you aggregate over either semasiological or onomasiological variation, which lectal structure emerges?

The various parts of the present monograph take their starting point in these perspectives. Part III, Semasiological and onomasiological explorations, focuses on the top-left and the top-right approaches. Part V, Lectometric explorations, deals with the bottom-right approach. The bottom-left perspective, semasiological lectometry, will not feature separately in the volume (but see Speelman and Heylen 2017 for an example). There are two reasons for the omission. First, if you study a sample of the vocabulary that is large enough, the lectal structure that emerges will be the same, regardless of whether you sum over semasiological differences or whether you sum over onomasiological differences: every semasiological difference between lect A and lect B will also show up if you start from the onomasiological side, and vice versa. Of course, this is only an argument in principle, because studying the entire vocabulary is not feasible. Second, however, there is a tradition in contemporary variationist linguistics to study lectal differences from a formal point of view, that is, to assume that linguistic differences between dialects, sociolects, and what have you are best seen in alternative lectal preferences for functionally equivalent forms of expression. This idea is captured by the notion of sociolinguistic variable. Put simply, a sociolinguistic variable in the sense of contemporary sociolinguistics (see Labov 1966) is a set of alternative ways of expressing the same linguistic function or realizing the same linguistic element, where each of the alternatives has social significance: ‘Social and stylistic variation presuppose the option of saying “the same thing” in several different ways: that is, the variants are identical in reference or truth value, but opposed in their social and/or stylistic significance’ (Labov 1972: 271). As such, a sociolinguistic variable is a linguistic element that is sensitive to a number of extralinguistic independent variables like social class, age, sex, geographical location, ethnic group, or contextual style and register. Classical cases of sociolinguistic variables involve pronunciation. Pronouncing the t in butter as a glottal stop is indicative of a Cockney accent, just like a full pronunciation of the n in chemin is typical of southern French in contrast with standard French. Examples like these had been studied for a long time in traditional dialectology, but modern sociolinguistics as it emerged in the 1960s enlarged the scope of investigation beyond the traditional diatopic dialects to other lects. If you apply the concept of a sociolinguistic variable to the lexicon, you inevitably reach an onomasiological perspective, because onomasiology (and more specifically, formal onomasiology) precisely involves alternative lexical expressions for the same sense.

Two more things need to be said about the way we will cover the terrain outlined above. In the first place, the subsequent parts of the text build on each other. Part I, Theoretical preliminaries, lays the groundwork. Parts II and III then focus on the semasiological and onomasiological perspectives that belong to the upper layer of Figure 1.1, whereas Parts IV and V take a lectometric point of view as in the lower layer of the figure. In each of these two sets, the first part is devoted to methodological issues while the second illustrates the methodology with case studies. Thus Part II, Distributional methodology, introduces the particulars of the distributional semantic workflow, together with the visualization tool that we will use to explore its outcome. Part III, Semasiological and onomasiological explorations, puts this exploration into practice. It examines how far a distributional approach can take us on the path of semantic analysis (as we shall see, there are a number of restrictions on distributional information that will make us adopt a certain amount of caution for the further steps) and applies the distributional method to the interplay of semasiology and onomasiology in lexical semantic change. Part IV, Lectometric methodology, introduces the various steps in a lectometric workflow: how to determine the relevant sets of alternating expressions and the contexts in which they alternate as equivalents (what sociolinguistics refers to as the envelope of variation), and how to feed the distribution of the competing expressions within the envelopes into a calculation of lectometric distances. Part V, Lectometric explorations, illustrates this workflow. Overall then, the structure of the text embodies a gradual build-up. It is not just that the chapters in Part II smooth the way for those in Part III, and those in Part IV for Part V, but (to the extent that identifying lexical sociolinguistic variables requires a semantic analysis) Parts II and III together also prepare the ground for Parts IV and V.
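To make the last step of such a workflow concrete, the following minimal sketch computes a distance between two lects for a single concept. It is an illustration under stated assumptions, not the procedure developed in Part IV: the choice of a halved city-block distance over relative-frequency profiles is an assumption made here, the function names `profile` and `city_block_distance` are hypothetical, and the token counts are invented.

```python
from collections import Counter


def profile(tokens):
    """Relative frequency of each variant among the tokens that
    express one concept in one lect (its onomasiological profile)."""
    if not tokens:
        raise ValueError("a profile needs at least one token")
    counts = Counter(tokens)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def city_block_distance(p, q):
    """Halved city-block distance between two profiles.

    0.0 means the two lects choose among the variants in identical
    proportions; 1.0 means they share no variant at all.
    (Assumed measure, chosen for illustration only.)
    """
    variants = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in variants)


# Invented counts: tokens naming the same concept in two hypothetical lects.
lect_a = ["trousers"] * 8 + ["pants"] * 2
lect_b = ["trousers"] * 1 + ["pants"] * 9

distance = city_block_distance(profile(lect_a), profile(lect_b))
print(round(distance, 3))
```

With these invented counts the two lects differ by roughly 0.7 on the concept. Aggregating such per-concept distances over many concepts would yield a lect-by-lect distance matrix; how the concepts are selected and weighted is exactly what the envelope of variation has to settle.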

In the second place, our coverage of the perspectivally defined domains schematically represented in Figure 1.1 will by no means be complete, even apart from the absence of a semasiological lectometric approach. Our purpose is to define, illustrate, and explore a research programme, not to treat it exhaustively, if that were possible at all. Throughout the chapters, we will explicitly point to open issues and possibilities for further investigation.

In the following two sections of the present chapter, we will look more deeply into the two dimensions and the associated questions that shape the structure of the book and that are graphically summarized in Figure 1.1. Along the semasiology/onomasiology dimension, Section 1.2 will consider the status of a vector space approach from the point of view of semantic and conceptual analysis. Along the variationist/lectometric dimension, Section 1.3 details what it implies to treat lexical variation as a sociolinguistic variable in the Labovian sense and to use that variation as the basis for lexical lectometry.
