International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 12 | Dec 2025 www.irjet.net p-ISSN: 2395-0072

A Credibility Scoring Model for News Authenticity using SBERT/Logistic Regression

Dr. C. P. Divate¹, Mr. S. M. Patil², Shreyash Potdar³, Kedar Potdar⁴, Tanmay Lad⁵, Nilesh Lokhande⁶, Raj Patil⁷

¹Dean, Dept. of Computer Engineering, Shri Ambabai Talim Sanstha’s Sanjay Bhokare Group of Institute, Miraj (Poly), Maharashtra, India

²Lecturer, Dept. of Computer Engineering, Shri Ambabai Talim Sanstha’s Sanjay Bhokare Group of Institute, Miraj (Poly), Maharashtra, India

³,⁴,⁵,⁶,⁷Students, Dept. of Computer Engineering, Shri Ambabai Talim Sanstha’s Sanjay Bhokare Group of Institute, Miraj (Poly), Maharashtra, India

Abstract - The rapid spread of fake news on digital platforms has made it difficult to trust online information, and manual verification of news is slow and unreliable. This project presents a credibility scoring model for news authenticity that uses SBERT and Logistic Regression to detect fake news automatically. SBERT captures the semantic meaning of news text, and Logistic Regression classifies the content as real or fake. The system also provides a credibility score that reflects the confidence of each prediction. The model delivers accurate results and is integrated into an Android application for easy, real-time news verification.

Key Words: Fake News Detection, News Authenticity, Credibility Scoring, Sentence-BERT (SBERT), Logistic Regression, Natural Language Processing, Machine Learning, Text Classification

1. INTRODUCTION

News today reaches people rapidly through websites, social media, and mobile applications. While this makes information easily accessible, it also increases the risk of fake and misleading news reaching a large audience. False information can create confusion, mislead public opinion, and sometimes lead to serious social and economic consequences. As a result, verifying the authenticity of news has become an important challenge.

Traditional methods of fact-checking rely heavily on human effort, which is slow and cannot handle the massive amount of online content generated every day. To overcome this problem, automated systems based on machine learning and natural language processing are being widely explored. These systems can analyze news content and determine its credibility more efficiently.

This project, A Credibility Scoring Model for News Authenticity using SBERT and Logistic Regression, focuses on automatically identifying whether a news article is real or fake.

2. PROBLEM STATEMENT

The rapid expansion of digital news platforms and social media has made information easily accessible to users. However, this growth has also led to the widespread circulation of fake and misleading news. Such content can influence public opinion, create panic, and spread misinformation at a large scale. Manual verification of news authenticity is time-consuming and impractical due to the massive volume of online content.

Existing fake news detection systems often rely on keyword-based or shallow text analysis techniques. These approaches fail to understand the contextual meaning of news articles, resulting in inaccurate predictions. Additionally, many systems only classify news as real or fake without providing any confidence level, making it difficult for users to judge the reliability of the result. Therefore, there is a need for an intelligent, accurate, and user-friendly system that can evaluate news credibility effectively.

3. PROPOSED SOLUTION

To overcome the identified challenges, this project proposes a credibility scoring model for news authenticity using Sentence-BERT (SBERT) and Logistic Regression. SBERT is used to convert news text into meaningful semantic embeddings that capture contextual information. These embeddings are then classified using Logistic Regression to determine whether the news is real or fake.

In addition to binary classification, the system generates a credibility score based on prediction probability. This score helps users understand how confident the system is about the authenticity of the news. The trained model is integrated into an Android application, enabling real-time news verification through a simple and intuitive interface. The proposed solution is computationally efficient, scalable, and suitable for practical deployment, making it an effective tool for combating misinformation in digital media.

4. LITERATURE REVIEW

[1] Shu et al. (2017): Studied the problem of fake news detection and highlighted how misinformation spreads rapidly through online platforms. Their work emphasized the need for automated systems that verify news credibility using machine learning techniques instead of manual fact-checking.

[2] Castillo et al. (2011): Analyzed the credibility of news-related content on Twitter using traditional machine learning models such as Decision Trees and Support Vector Machines. Although these models showed reasonable performance, they relied mainly on surface-level features and lacked contextual understanding of text.

[3] Devlin et al. (2019): Introduced BERT (Bidirectional Encoder Representations from Transformers), which significantly improved natural language understanding by capturing deep contextual relationships between words. This model demonstrated better performance than earlier word embedding techniques.

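[4] Reimers and Gurevych (2019): Proposed Sentence-BERT (SBERT), an extension of BERT designed to generate meaningful sentence embeddings efficiently. SBERT greatly reduced the computational cost of comparing sentences (the authors report that finding the most similar pair in a collection of 10,000 sentences drops from about 65 hours with BERT to about 5 seconds) and proved effective for text classification and similarity tasks.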

[5] Hosmer et al. (2013): Discussed the effectiveness of Logistic Regression for binary classification problems, highlighting its simplicity, interpretability, and consistent performance. Due to these advantages, Logistic Regression remains widely used in real-world applications.

Research Objectives:

1. To study the impact of fake news on digital media platforms and understand the challenges in identifying news authenticity.

2. To design and develop a machine learning–based model for detecting fake and real news automatically.

3. To use Sentence-BERT (SBERT) for extracting meaningful semantic features from news text.

4. To implement Logistic Regression for efficient and accurate classification of news content.

5. To generate a credibility score that reflects the confidence level of the news authenticity prediction.

6. To evaluate the performance of the proposed model using standard accuracy and classification metrics.

5. METHODOLOGY

“A Credibility Scoring Model for News Authenticity using SBERT and Logistic Regression” is organized into six major modules. Each module focuses on a specific function, from data collection to final deployment, providing a structured and efficient approach to detecting fake news and generating credibility scores.

MODULE-1. Data Collection and Preprocessing:

The first module is critical because the quality of the input data directly affects the performance of the entire system.

Core Functions:

• Collect a comprehensive dataset of real and fake news articles from multiple sources, including news websites and verified social media feeds.

• Ensure the dataset represents various categories and types of news, such as politics, sports, health, and technology, to make the model robust.

Key Functionalities:

• Data Cleaning: Removal of unwanted characters, symbols, HTML tags, and punctuation that may interfere with analysis.

• Normalization: Convert all text to lowercase to maintain uniformity across the dataset.

• Noise Removal: Remove stopwords, numbers, redundant spaces, and irrelevant words to retain only meaningful content.

• Tokenization: Break sentences into individual words or tokens to prepare for embedding generation.

Major Components:

• Raw data files (CSV, JSON, or TXT formats) containing labeled real and fake news articles.

• Preprocessing scripts developed in Python using libraries such as NLTK, spaCy, or pandas.

• A clean, structured dataset ready for feature extraction, which supports higher model accuracy.
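
The preprocessing steps listed above can be sketched with Python's standard `re` module. This is a minimal illustration, not the project's actual script: the small stopword set and the regular expressions are assumptions chosen for readability, and a production pipeline would more likely draw on NLTK or spaCy as noted above.

```python
import re

# Illustrative stopword list (an assumption); NLTK and spaCy ship fuller English lists.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be", "of", "to", "in", "on",
    "and", "or", "for", "with", "that", "this", "it", "as", "at", "by",
}

HTML_TAG = re.compile(r"<[^>]+>")     # e.g. <p>, </div>, <br/>
NON_LETTER = re.compile(r"[^a-z\s]")  # punctuation, symbols and digits after lowercasing


def preprocess(text: str) -> list[str]:
    """Clean, normalize, denoise and tokenize one news article."""
    text = HTML_TAG.sub(" ", text)    # data cleaning: strip HTML tags
    text = text.lower()               # normalization: lowercase everything
    text = NON_LETTER.sub(" ", text)  # noise removal: punctuation, symbols, numbers
    tokens = text.split()             # tokenization; split() also drops redundant spaces
    return [t for t in tokens if t not in STOPWORDS]  # noise removal: stopwords


print(preprocess("<p>BREAKING: The vaccine was tested on 500 people!</p>"))
```

Because SBERT applies its own subword tokenizer, the cleaned tokens would be joined back into a string (for example with `" ".join(tokens)`) before encoding. Transformer encoders are trained on natural sentences, so whether stopword removal helps or hurts SBERT features is worth checking on validation data rather than assuming.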

MODULE-2. Feature Extraction using Sentence-BERT (SBERT):

This module converts the preprocessed text into numerical vectors that a machine learning model can interpret.

Core Functions:

• Transform textual news data into meaningful numerical representations that retain the semantic meaning.

• Capture contextual information from each sentence, allowing the model to pick up nuances that may indicate fake news.

Key Functionalities:

• Sentence Embedding Generation: SBERT converts each news article or sentence into a fixed-length dense vector.

• Semantic Context Understanding: Embeddings capture the relationships between words and the overall meaning of sentences.

• Dimensionality Reduction: Compared with sparse, vocabulary-sized bag-of-words vectors, SBERT embeddings provide a compact and efficient representation of text while retaining important semantic information.

• Similarity Measurement: Embeddings enable comparison between news articles to detect patterns indicative of fake content.
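
To make the fixed-length property concrete: SBERT turns the transformer's per-token output vectors into one sentence vector by pooling, and mean pooling is the default strategy in Reimers and Gurevych's SBERT. The sketch below shows only that pooling step on toy 2-dimensional vectors (BERT-base models actually output 768 dimensions per token); the function name and inputs are illustrative. In practice, `SentenceTransformer(...).encode(texts)` from the sentence-transformers library performs tokenization, encoding, and pooling in a single call.

```python
def mean_pool(token_vectors, attention_mask):
    """SBERT-style mean pooling: average the output vectors of the real tokens.

    token_vectors: one vector per token position, as produced by the transformer.
    attention_mask: 1 for a real token, 0 for a padding position.
    """
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:  # padding positions are excluded from the average
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in total]


# Two real tokens plus one padding position still yield a single 2-dimensional vector.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
```

Whatever the article's length, the output has the model's hidden size, which is what lets every article become one row of the feature matrix.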

Major Components:

• Pretrained SBERT model loaded through Python libraries such as sentence-transformers.

• Embedding generation scripts for converting each article into a vector.

• Feature matrix storing embeddings for all news articles, used as input for the classifier.
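
The similarity measurement described above is typically done with cosine similarity between embeddings. The following sketch, assuming a small in-memory feature matrix of previously labeled articles (the toy vectors and labels are hypothetical), finds the stored article most similar to a new one. The sentence-transformers library also provides `util.cos_sim` for the same computation on tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, feature_matrix, labels):
    """Return (label, similarity) for the stored article closest to `query`."""
    scores = [cosine_similarity(query, row) for row in feature_matrix]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Hypothetical 3-dimensional embeddings of three labeled articles.
matrix = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.1, 0.9, 0.1]]
print(most_similar([0.8, 0.2, 0.1], matrix, ["fake", "real", "real"]))
```

A nearest-neighbour match like this is only a supporting signal, for example to show users a similar known fake story; the real/fake decision itself comes from the Logistic Regression classifier.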

MODULE-3. Model Training and Optimization using Logistic Regression:

This module focuses on training the machine learning classifier to distinguish between real and fake news.

Core Functions:

• Learn patterns and relationships from the SBERT embeddings to classify news accurately.

• Optimize the classifier to maximize accuracy and reliability while keeping it interpretable.

Key Functionalities:

• Training the Model: Logistic Regression is trained on the feature vectors derived from SBERT embeddings.

• Hyperparameter Tuning: Adjust parameters such as regularization strength and solver type to improve model performance.

• Validation: Test the model on unseen data to evaluate generalization and detect overfitting.

• Prediction Generation: Provide binary outputs, labeling each news article as real or fake.
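
The training step can be illustrated without external dependencies. The sketch below is a stand-in for scikit-learn's `LogisticRegression`, which the project uses: it fits an L2-regularized logistic regression by batch gradient descent on feature vectors such as SBERT embeddings. The learning rate, epoch count, regularization value `lam`, and the 1 = real / 0 = fake label encoding are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lam=0.01, lr=0.5, epochs=500):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X: list of feature vectors (e.g. SBERT embeddings); y: labels, 1 = real, 0 = fake.
    Returns the weight vector w and the (unregularized) bias b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])  # L2 penalty shrinks weights
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that the article with feature vector x is real."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Here `lam` plays the role of regularization strength; scikit-learn instead exposes its inverse, `C`, and in scikit-learn's documented L2 formulation this objective corresponds to `lam = 1 / (C * n)` for `n` training samples. With the library, training and prediction reduce to `LogisticRegression(C=..., solver=...).fit(X, y)` followed by `predict_proba(X)`.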

Major Components:

• Logistic Regression classifier implemented using Python’s scikit-learn library.

• Training and testing scripts for processing feature matrices and evaluating results.

• Saved trained model for deployment in real-time applications.
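
For the saved model listed above, scikit-learn's model persistence documentation describes `pickle` and `joblib`, with the caveat that loading a pickle can execute arbitrary code, so such files should only come from trusted sources. Because a fitted logistic regression is fully described by its weights, bias, and decision threshold, a plain JSON file is one alternative; the sketch below assumes that format, and its field names are hypothetical. It also records which SBERT encoder produced the training features, since the classifier's weights are only meaningful for embeddings from that same encoder.

```python
import json


def save_model(path, weights, bias, encoder_name, threshold=0.5):
    """Write the classifier parameters, and the encoder they depend on, to JSON."""
    params = {
        "encoder": encoder_name,   # SBERT model used to produce the training features
        "weights": list(weights),  # one weight per embedding dimension
        "bias": float(bias),
        "threshold": float(threshold),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    """Read the parameters back; returns (weights, bias, encoder_name, threshold)."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    return params["weights"], params["bias"], params["encoder"], params["threshold"]
```

With a fitted scikit-learn binary classifier `clf`, the values to save would be `clf.coef_[0].tolist()` and `float(clf.intercept_[0])`.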

MODULE-4. Credibility Scoring and Prediction:

This module adds transparency to the system by providing a confidence measure along with classification results.

Core Functions:

• Generate a credibility score to indicate the likelihood that the news is real.

• Provide users with insights into the confidence of each prediction, enhancing trust in the system.

Key Functionalities:

• Probability Calculation: Use the output probability from Logistic Regression to derive the credibility score.

• Interpretation of Scores: Higher values indicate higher confidence in the authenticity of the news, whereas lower scores indicate potential misinformation.

• Integration with Prediction: Both the classification and the credibility score are returned together to the user.

• Threshold Management: Set thresholds for real vs. fake classification based on probability scores to reduce misclassification.
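
The threshold management step could be implemented by scanning candidate thresholds on a labeled validation set and keeping the one with the fewest misclassifications. This is a minimal sketch under assumptions: the candidate grid, the plain error count as the objective, and the 1 = real / 0 = fake encoding are illustrative choices, not the project's documented procedure.

```python
def choose_threshold(probs, labels, candidates=None):
    """Pick the decision threshold with the fewest validation misclassifications.

    probs: predicted probabilities that each article is real.
    labels: true labels, 1 = real, 0 = fake.
    An article is predicted real when its probability is >= the threshold.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96, 5)]  # 0.05, 0.10, ..., 0.95

    def errors(t):
        return sum((p >= t) != bool(y) for p, y in zip(probs, labels))

    return min(candidates, key=errors)  # ties go to the earliest candidate in the list


probs = [0.9, 0.8, 0.65, 0.6, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(choose_threshold(probs, labels))
```

If labeling a fake story as real is judged costlier than the reverse, weighting the two error types differently inside `errors` is a natural variation.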

Major Components:

• Prediction module that combines Logistic Regression output with score calculation logic.

• Scripts for mapping probability outputs to a human-readable credibility score (e.g., 0–100%).

• Data structures to store predicted labels and associated scores.
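
The mapping from a Logistic Regression probability to a 0–100% credibility score, together with a record that keeps the label and score side by side, might look like this. It assumes the score is simply the predicted probability that the article is real, expressed as a percentage and rounded to one decimal place; the class and function names are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class CredibilityResult:
    """Predicted label plus credibility score for one article."""
    label: str          # "real" or "fake"
    credibility: float  # probability of "real" as a percentage, 0-100


def score_article(p_real: float, threshold: float = 0.5) -> CredibilityResult:
    """Turn the classifier's probability that an article is real into a result record."""
    if not 0.0 <= p_real <= 1.0:
        raise ValueError("p_real must be a probability between 0 and 1")
    label = "real" if p_real >= threshold else "fake"
    return CredibilityResult(label=label, credibility=round(100 * p_real, 1))


print(asdict(score_article(0.873)))
```

`asdict` converts the record into a plain dictionary, which is convenient when the result is returned as JSON.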

MODULE-5. System Deployment and Android Application Integration:

This module ensures that the trained model is accessible to users in real time through a practical interface.

Core Functions:

• Deploy the trained model using a REST API, allowing external applications to access prediction and credibility scoring.

• Integrate the model into an Android application for easy user interaction.

Key Functionalities:

• API Development: Develop endpoints that accept news text as input and return prediction results and credibility scores.

• Android App Interface: Provide a simple interface for users to enter news content and view results.

• Real-Time Prediction: Ensure low-latency responses so users get immediate feedback.

• Error Handling: Manage invalid inputs and system errors gracefully.

Major Components:

• REST API built using Python frameworks such as FastAPI or Flask.

• Android application coded in Java/Kotlin with a front-end interface for user input.

• Integration scripts to connect the app with backend API endpoints.
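
The API behaviour described above (accept news text, return a label and credibility score, reject invalid input) can be sketched with Python's standard library. The paper names FastAPI or Flask; `http.server` is used here only to keep the example dependency-free, and the `/check` path, the `text` field, the length limit, the 0.5 threshold, and `placeholder_predict` are all hypothetical. The validation logic lives in `handle_check`, so it can be tested without starting a server and would carry over to a FastAPI or Flask route.

```python
import json
from http.server import BaseHTTPRequestHandler

MAX_TEXT_CHARS = 20_000  # hypothetical limit on submitted text length
THRESHOLD = 0.5          # hypothetical real/fake decision threshold


def placeholder_predict(text: str) -> float:
    """Hypothetical stand-in for SBERT encoding plus the trained classifier (P(real))."""
    return 0.5


def handle_check(body: bytes, predict=placeholder_predict) -> tuple[int, dict]:
    """Validate a body like {"text": "..."}; return (HTTP status, JSON-ready dict)."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, {"error": "request body must be UTF-8 encoded JSON"}
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "field 'text' must be a non-empty string"}
    if len(text) > MAX_TEXT_CHARS:
        return 413, {"error": "text is too long"}
    p_real = predict(text)
    return 200, {
        "label": "real" if p_real >= THRESHOLD else "fake",
        "credibility": round(100 * p_real, 1),  # 0-100 scale
    }


class CheckHandler(BaseHTTPRequestHandler):
    """Serves POST /check by delegating to handle_check."""

    def do_POST(self):
        if self.path != "/check":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_check(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

Running `HTTPServer(("0.0.0.0", 8000), CheckHandler).serve_forever()`, with `HTTPServer` imported from the same `http.server` module, would serve the endpoint locally for the Android app to call. Python's documentation warns that `http.server` is not recommended for production, which is one reason to prefer FastAPI or Flask behind a proper server in deployment.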

MODULE-6. Testing, Evaluation, and Maintenance:

The final module ensures that the system performs accurately, reliably, and consistently over time.

Core Functions:

• Test the system on unseen data to validate performance.

• Monitor accuracy and reliability over time and adapt the system to new news trends.

Key Functionalities:

• Accuracy Evaluation: Measure performance using metrics such as accuracy, precision, recall, and F1-score.

• Usability Testing: Ensure the Android app is intuitive and provides clear results.

• Maintenance: Regularly update the model and dataset to handle evolving fake news patterns.

• Error Analysis: Identify cases where the model misclassifies and refine preprocessing, embeddings, or classifier parameters.
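
The evaluation metrics listed above follow directly from counts of true positives, false positives, and false negatives. This sketch computes them in plain Python, assuming labels 1 = real and 0 = fake and treating fake news as the positive class by default; scikit-learn's `precision_score`, `recall_score`, and `f1_score` accept a `pos_label` argument for the same purpose.

```python
def classification_metrics(y_true, y_pred, positive=0):
    """Accuracy, precision, recall and F1 with `positive` as the class of interest.

    Labels: 1 = real, 0 = fake, so the default scores how well fake news is detected.
    """
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged items that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of true positives that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(classification_metrics([0, 0, 0, 1, 1, 1, 1, 0], [0, 0, 1, 1, 1, 0, 0, 0]))
```

Reporting these per class, rather than accuracy alone, matters when the dataset has many more real than fake articles, since a model that rarely predicts "fake" can still score high accuracy.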

Major Components:

• Evaluation scripts for performance metrics.

• User feedback loop integrated into the app for reporting errors.

• Maintenance plan for updating datasets and retraining the model periodically.
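
The monitoring and feedback-loop components could be combined into a simple rolling check on user-verified predictions. The sketch below is a hypothetical policy, not the project's documented maintenance plan: it keeps the last `window` outcomes and flags retraining when rolling accuracy falls more than `tolerance` below a baseline, such as the test-set accuracy measured at release. All parameter values are assumptions.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy over recent user-verified predictions (hypothetical policy)."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 200):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True = prediction confirmed correct

    def record(self, predicted: int, actual: int) -> None:
        """Store whether a prediction matched the verified label."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        """Accuracy over the stored window, or None before any feedback arrives."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self, min_samples: int = 50) -> bool:
        """True once enough feedback exists and accuracy has dropped past tolerance."""
        if len(self.outcomes) < min_samples:
            return False  # too little feedback to judge
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

One caution: feedback submitted through an app tends to over-represent mistakes, since users mostly report results they disagree with, so a drop in this rolling figure is a prompt to re-evaluate on fresh labeled data rather than a direct accuracy estimate.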

6. RESULTS AND DISCUSSION

The proposed credibility scoring model was evaluated using a labeled dataset of real and fake news articles. The dataset was divided into training and testing sets to measure the model’s performance on unseen data. The results demonstrate that the integration of Sentence-BERT (SBERT) with Logistic Regression provides reliable and meaningful news authenticity classification.
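
The paper does not state the split ratio, so the following is a sketch under assumptions: an 80/20 split that is stratified, meaning the real-to-fake ratio is preserved in both sets, with a fixed random seed so the split is reproducible. In scikit-learn the equivalent call is `train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)`.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_fraction=0.2, seed=42):
    """Return (train, test) lists of (text, label) pairs with the label ratio preserved."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)  # per-class test count
        test.extend((text, label) for text in items[:n_test])
        train.extend((text, label) for text in items[n_test:])
    return train, test


texts = [f"real-{i}" for i in range(10)] + [f"fake-{i}" for i in range(5)]
labels = [1] * 10 + [0] * 5
train, test = stratified_split(texts, labels)
print(len(train), len(test))
```

Stratifying keeps the test set representative even when one class is much smaller, which makes the precision and recall figures reported below easier to interpret.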

Key Results:

• The model achieved good classification accuracy, showing its ability to effectively distinguish between real and fake news articles.

• Precision and recall values indicate that the system correctly identifies fake news while minimizing false predictions.

• The use of SBERT significantly improved contextual understanding compared to traditional keyword-based approaches.

• Logistic Regression ensured fast and stable predictions, making the system suitable for real-time use.

Credibility Score Analysis:

• Along with classification, the system generates a credibility score based on prediction probability.

• News articles classified as real generally produced higher credibility scores, indicating strong confidence.

• Fake news articles resulted in lower credibility scores, helping users identify unreliable content.

• This scoring mechanism improves transparency and enhances user trust in the system.

Discussion:

• SBERT enables deep semantic understanding of news content, allowing the model to detect subtle differences in meaning.

• Logistic Regression provides an interpretable and efficient classification framework.

• The combination of these techniques offers a balanced solution between performance and computational efficiency.

• Integration with an Android application demonstrates the practical applicability of the system in real-world scenarios.

7. CONCLUSION

This project presented a credibility scoring model for news authenticity using Sentence-BERT (SBERT) and Logistic Regression. The proposed system effectively addresses the challenge of fake news detection by understanding the semantic meaning of news content rather than relying only on keyword-based analysis. SBERT enables accurate feature extraction by capturing contextual information, while Logistic Regression provides efficient and interpretable classification results.

The system not only classifies news as real or fake but also generates a credibility score that indicates the confidence level of the prediction. This improves transparency and helps users make informed decisions while consuming online news. The integration of the trained model with an Android application makes the solution practical and accessible for real-time use.

8. REFERENCES

[1] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, “Fake News Detection on Social Media: A Data Mining Perspective,” ACM SIGKDD Explorations Newsletter, vol. 19, no. 1, pp. 22–36, 2017.

[2] C. Castillo, M. Mendoza, and B. Poblete, “Information Credibility on Twitter,” in Proceedings of the 20th International World Wide Web Conference (WWW), 2011, pp. 675–684.

[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of NAACL-HLT, 2019, pp. 4171–4186.

[4] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.

[5] D. W. Hosmer, S. Lemeshow, and R. X. Sturdivant, Applied Logistic Regression, 3rd ed., Wiley, New York, 2013.

[6] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” in Proceedings of the International Conference on Learning Representations (ICLR), 2013.

[7] A. Thakur, “Fake News Detection using Machine Learning,” International Journal of Engineering Research & Technology (IJERT), vol. 9, no. 6, pp. 45–49, 2020.
