
International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056
Volume: 12 Issue: 05 | May 2025 www.irjet.net p-ISSN:2395-0072

Automatic Radiology Report Generator Using Transformer with Contrast-Based Image Enhancement
R. Lenitha 1, L. Kebila Anns Subi 2, D. Deiva Thaya Sweetlin 3
1 PG Student, M.E. Communication and Networking, Ponjesly College of Engineering, Kanyakumari, Tamil Nadu, India.
2 Professor, M.E. Communication and Networking, Ponjesly College of Engineering, Kanyakumari, Tamil Nadu, India.
3 Assistant Professor, M.E. Communication and Networking, Ponjesly College of Engineering, Kanyakumari, Tamil Nadu, India.
Abstract - Writing radiology reports from radiographic images is time-consuming and requires the expertise of experienced radiologists, so technology that can generate reports automatically would be beneficial. The primary difficulty in automatic report generation is producing coherent predictive text, which calls for techniques that improve the relevance of the features used to generate it. This study combines image enhancement technology with the transformer approach to build a model for generating medical reports. To take advantage of both visual and semantic elements, it investigates a methodology for reducing the noise-proneness of medical images together with a transformer pipeline that produces a radiology report from chest X-ray images. Four contrast-based image enhancement approaches were used to examine the impact of image improvement on the report generator. The encoder-decoder model employs the pre-trained CheXNet model with a Multi-Head Attention (MHA) mechanism for visual feature extraction and Bidirectional Encoder Representations from Transformers (BERT) for text feature embedding. With a score of 0.412, the MHA model outperforms the baseline by 15% and performs better than other earlier studies. It can be concluded that BERT and the transformer MHA encoder layer work well for combining textual and visual information, and that incorporating an image enhancement technique further improves the model's performance.
Key Words: Bidirectional Encoder Representations from Transformers (BERT), Multi-Head Attention (MHA)
1. INTRODUCTION
As medical imaging technology advances, medical image diagnostics become increasingly complex, increasing the demands placed on medical professionals. According to data published in the Medical Journal of Radiology in 2015, radiologists' workload increased by 26% over the preceding ten years [1]. Owing to medical advancements, radiologists now have to compare far more information, spanning many factors, in order to make comprehensive and reliable diagnoses from medical images. The goal of this project is to create a system that employs cutting-edge natural language processing (NLP) models, specifically transformers, in conjunction with image enhancement techniques to automatically generate radiology reports from medical images such as X-rays, CT scans, or MRIs [2]. Its aim is to make the process of creating radiology reports from medical images as efficient as possible. The method improves the visibility of important anatomical features and abnormalities in medical imaging, including X-rays, MRIs, and CT scans, by employing sophisticated contrast enhancement techniques. Following image processing, a transformer-based natural language model evaluates the enhanced images' attributes and produces precise, thorough radiology reports [3]. These reports usually include sections such as patient information, an evaluation of image quality, comprehensive findings, and a conclusion with suggestions. Structured, context-aware medical reports in natural language can be produced by integrating a transformer model such as GPT or MedBERT. By automating routine report generation, this method not only saves radiologists time but also guarantees complete reports, lowering the possibility of oversight. Additionally, in both developed and underprivileged locations, the system could improve access to timely medical interpretations, help healthcare providers, and increase the efficiency of diagnostics [4]. A further aim is to enhance medical image quality so that pertinent information (such as tumors, fractures, or other anomalies) is easier to see and identify.

To improve image features and highlight important structures, the system uses contrast enhancement techniques such as adaptive contrast enhancement and histogram equalization. This is particularly helpful when images are taken in less-than-ideal conditions or when certain details are difficult to see [5]. To provide more individualized and precise results, the generated reports are made context-aware, meaning they consider not just the findings but also the clinical context supplied with the image, such as the location of the abnormality and the patient's age and gender [6]. Using medical image data, the system automatically produces thorough and contextually accurate radiology reports by means of transformer-based natural language processing (NLP) models. The prepared report includes recommendations for additional imaging, testing, or follow-up appointments, along with actionable insights and next steps for radiologists [7]. The system is designed to be flexible enough to accommodate various radiology imaging modalities (e.g., brain MRIs, abdominal CT scans, or chest X-rays) as well as the various report formats and terminology used in different medical specialties, and to increase the availability of high-quality medical reports and interpretations in underserved or isolated areas where radiologists may be hard to find. It also aims to improve the interpretability of AI-generated reports and to ensure that medical experts can examine and validate them before using them to make therapeutic decisions [8]. Medical image quality is essential for an appropriate diagnosis: important details may be obscured by poor image quality, which can cause misinterpretation and delay treatment. By implementing contrast-based image enhancement algorithms, this project ensures that even difficult images (such as low-contrast scans or scans with artifacts) are enhanced for greater visibility of anomalies including cancers, fractures, and lesions, which leads directly to more accurate diagnostic results. Although radiology reports are typically written by experienced human specialists, human error and variability in interpretation can still occur [9].
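For illustration, a minimal sketch of the two contrast-based techniques named above, using OpenCV (an assumption; the paper does not state which library was used), might look like this:

```python
# Illustrative sketch (not the authors' code): global histogram equalization and
# adaptive, contrast-limited histogram equalization (CLAHE) on a grayscale X-ray.
import cv2

def enhance_xray(path: str) -> dict:
    """Return the original image plus two contrast-enhanced variants."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # X-rays are single-channel

    # Global histogram equalization: spreads intensities over the full range.
    hist_eq = cv2.equalizeHist(img)

    # Adaptive (tile-based) equalization with contrast limiting: enhances
    # local contrast while limiting noise amplification.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    adaptive_eq = clahe.apply(img)

    return {"original": img, "hist_eq": hist_eq, "adaptive_eq": adaptive_eq}
```

The clip limit and tile size are typical default-style values, not parameters reported in the paper.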

The Automatic Radiology Report Generator using a transformer with contrast-based image enhancement produces radiology reports from medical images through a structured procedure. The system integrates image enhancement algorithms to increase image quality and then creates written reports from the enhanced images using transformer-based natural language processing. A medical image is first fed into the system; this image may be in one of the regularly used radiology diagnostic formats, such as X-ray, CT scan, or MRI. Preprocessing applies contrast-based image enhancement algorithms to the input image, using techniques such as Histogram Equalization or Adaptive Histogram Equalization (AHE). The aim is to enhance the visibility of critical features in the image, such as lesions, tumors, or abnormalities, that may be difficult to spot in low-contrast pictures; the model finds it easier to extract pertinent elements from enhanced images. A CNN then identifies the important visual features that will be used to explain the findings in the radiology report; these characteristics are essential for making the diagnosis. The image's features are subsequently mapped to the appropriate textual descriptions by a trained transformer model, which uses the retrieved features and their mapped descriptions to create a textual radiology report.
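A schematic sketch of this end-to-end flow is given below; the three stage functions are illustrative placeholders for the enhancement step, the CNN encoder, and the transformer decoder, not the authors' implementation:

```python
# Schematic sketch of the pipeline: enhance -> extract visual features -> decode
# to text. Each stage is a stub so the skeleton runs on its own.
import numpy as np

def enhance_image(image: np.ndarray) -> np.ndarray:
    """Stage 1: contrast-based enhancement (e.g., histogram equalization or AHE)."""
    return image  # placeholder: real code would apply HE/AHE here

def extract_visual_features(image: np.ndarray) -> np.ndarray:
    """Stage 2: CNN encoder producing visual feature vectors."""
    return np.zeros((49, 1024))  # placeholder: e.g., a 7x7 grid of 1024-d features

def generate_report(features: np.ndarray) -> str:
    """Stage 3: transformer decoder mapping visual features to report text."""
    return "no acute cardiopulmonary abnormality"  # placeholder output

def radiology_report_pipeline(image: np.ndarray) -> str:
    enhanced = enhance_image(image)
    features = extract_visual_features(enhanced)
    return generate_report(features)

print(radiology_report_pipeline(np.zeros((224, 224), dtype=np.uint8)))
```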
The suggested approach makes use of the ChestX-ray dataset. When creating automatic radiology report generation systems, particularly those that use sophisticated models such as transformers with contrast-based image enhancement, the ChestX-ray dataset is essential.

Models are trained on ChestX-ray datasets, which provide images paired with the radiologist reports that accompany them. These datasets aid in the development of models that can link X-ray visual patterns to diagnostic and medical terminology. Duplicate data for comparable diagnoses is minimized among normal diagnoses, and minor diagnoses are grouped into normal and abnormal classes to lessen imbalance. Before being embedded, the diagnoses, which are medical reports, undergo pre-processing consisting of word decomposition, character and number removal, and conversion of letters to lowercase. The reports are then passed through BERT to extract features from the text data and obtain text embeddings. Additionally, the medical reports are filtered according to their frequency of occurrence in the data. The data was divided into training, validation, and test sets according to the percentage of occurrences: the highest-frequency group is undersampled and the lowest-frequency group is oversampled, based on the number of recorded occurrences used to segment the data, and the proportion of each minor and major class is then used to split the data into training, validation, and test sets.
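The report pre-processing described above (lower-casing, removal of characters and numbers, word decomposition) could be sketched as follows; the regular expression and the sample report are illustrative assumptions rather than the authors' exact rules:

```python
# Minimal sketch of report text pre-processing before embedding.
import re

def preprocess_report(text: str) -> list[str]:
    text = text.lower()                    # convert letters to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)  # strip special characters and numbers
    return text.split()                    # simple word decomposition

print(preprocess_report("No acute cardiopulmonary abnormality. Heart size: 12 cm."))
# ['no', 'acute', 'cardiopulmonary', 'abnormality', 'heart', 'size', 'cm']
```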
To increase the communication range and dependability, a relay-based network is put in place. Relay Node Placement and Selection: based on signal strength, distance, or energy, this algorithm establishes the best relay node placement and selection criteria. Relay Handover Mechanism: maintains the strongest communication link by controlling relay switching. An essential part of improving the communication range and dependability of underwater wireless optical communication (UWOC) systems is the relay network module. This module ensures strong and long-lasting underwater communication links by adding relay nodes to the communication network, thereby reducing signal attenuation and scattering effects. Algorithms are used to find the best locations for relay nodes, guaranteeing maximum coverage and minimal signal loss. To guarantee an effective network topology, variables including water turbidity, node distance, and environmental factors are taken into account. The module uses variables such as distance measurements, energy efficiency, and signal intensity to dynamically choose the best relay node, and real-time link-quality monitoring helps the network adapt to shifting conditions. It also optimizes the number and placement of relay nodes to reduce energy consumption, which is particularly important for battery-powered underwater devices. Continuous data transmission is ensured by seamless switching, which controls relay node changeover without interrupting the communication channel.
Handover to the path with the best link quality is carried out using a channel-aware algorithm that assesses the quality of alternative relay paths. Load balancing: to avoid overloading and guarantee steady network performance, data traffic is divided across several relay nodes. By linking several relay nodes, multi-hop communication greatly increases the communication range. The network is adaptable for larger underwater deployments thanks to its scalable design, which can accommodate the addition of further relay nodes. Cluster-Based Relay Networks: network efficiency and structure are enhanced by implementing clustering techniques in which nodes in a particular region communicate with one another via a designated cluster-head relay. Latency Reduction: minimizes the number of hops needed for data transmission and optimizes relay placement to reduce communication delay.
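As a hedged sketch of the channel-aware relay selection idea described above, each candidate relay could be scored from signal strength, residual energy, and distance; the scoring weights and the Relay fields are illustrative assumptions, not values given in the text:

```python
# Illustrative relay selection: pick the candidate with the best combined score.
from dataclasses import dataclass

@dataclass
class Relay:
    node_id: str
    signal_strength: float   # normalized link quality, 0..1
    distance: float          # metres to the destination
    residual_energy: float   # normalized battery level, 0..1

def select_relay(candidates: list[Relay], max_range: float = 100.0) -> Relay:
    """Return the relay with the highest weighted score (weights are assumptions)."""
    def score(r: Relay) -> float:
        return (0.5 * r.signal_strength
                + 0.3 * r.residual_energy
                + 0.2 * (1.0 - min(r.distance / max_range, 1.0)))
    return max(candidates, key=score)

best = select_relay([Relay("r1", 0.8, 40.0, 0.6), Relay("r2", 0.7, 20.0, 0.9)])
print(best.node_id)
```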
CheXNet, a 121-layer dense convolutional network (DenseNet), was trained on the ChestX-ray14 dataset. DenseNet can be used to optimize existing networks by enhancing gradient and information flow across the network. The DenseNet model was further trained on additional X-ray pictures on top of the pre-trained CheXNet. CheXNet is used as an encoder in this study to extract picture attributes as convolutional features from the data. Through the model's last unfrozen layer, the pre-trained CheXNet passes its weights to the image. We remove the top layer of CheXNet in order to obtain a 1024-dimensional initial image feature value. A batch size of 100, a dropout rate of 0.2, a learning rate of 10^-2, and sigmoid activation were the parameters used in this investigation. The pooled initial weights are then forwarded through a layer with an output vector size of 512 and ReLU activation. In order to obtain context information and text features in the multi-head attention, this dimension is aligned with the BERT embedding.
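A minimal Keras sketch of this visual encoder is given below, using the stated 1024-dimensional headless DenseNet-121 features, a dropout rate of 0.2, and a 512-dimensional ReLU projection; loading the actual CheXNet weights is assumed to happen separately, with ImageNet weights standing in as a placeholder here, and the training hyperparameters (batch size 100, learning rate 10^-2) are not shown:

```python
# Illustrative visual encoder: DenseNet-121 backbone without its top layer,
# pooled to 1024-d, dropout 0.2, then a 512-d ReLU projection.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

def build_visual_encoder(img_size: int = 224) -> Model:
    backbone = DenseNet121(
        include_top=False,      # drop the classification head -> 1024-channel feature maps
        weights="imagenet",     # placeholder; CheXNet weights would be loaded here instead
        input_shape=(img_size, img_size, 3),
    )
    backbone.trainable = False  # freeze the pre-trained backbone; the head below stays trainable

    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = backbone(inputs)                          # (H', W', 1024) convolutional features
    x = layers.GlobalAveragePooling2D()(x)        # pool to a 1024-d vector
    x = layers.Dropout(0.2)(x)                    # dropout rate used in the paper
    x = layers.Dense(512, activation="relu")(x)   # 512-d projection fed to the attention block
    return Model(inputs, x, name="chexnet_encoder")

print(build_visual_encoder().output_shape)  # (None, 512)
```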
In order to align the data with the semantic features as a query in a parallel process, the encoder's initial weights are sent as input to the multi-head attention as the key and value. Technically speaking, attention is the process of mapping a query to keys from data that has a value as an output; it is made up of three parts: value, key, and query. Multiple single attention heads that operate repeatedly and concurrently are referred to as multi-head attention. The multi-head attention function can be represented as

MultiHead(q, k, v) = Concat(head_1, ..., head_h) W^O, where each head_i = Attention(q W_i^Q, k W_i^K, v W_i^V).

In this case, k, q, and v stand for the report embedding and the image features. Attention maps a query against key-value pairs to obtain a new value, where the values, keys, and query are all vectors. A compatibility function between the query and the keys determines the weight given to each value, and the final output is computed as the sum of the weighted values. This method is particularly effective at correlating textual and picture information to provide context. The resulting attention output forms the context vector, which is then supplied as input to the decoder.
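As a hedged illustration (not the authors' exact configuration), this cross-attention step can be wired with Keras as follows, with the BERT report embeddings as the query and the 512-d visual features as key and value; the sequence lengths and head count are assumptions:

```python
# Illustrative cross-attention: text embeddings attend over visual features.
import tensorflow as tf
from tensorflow.keras import layers

mha = layers.MultiHeadAttention(num_heads=8, key_dim=64)

# query: text embeddings (batch, report_len, 512); key/value: visual features (batch, regions, 512)
text_emb = tf.random.normal((2, 60, 512))
visual_feat = tf.random.normal((2, 49, 512))

context, attn_weights = mha(
    query=text_emb, key=visual_feat, value=visual_feat,
    return_attention_scores=True,
)
print(context.shape)       # (2, 60, 512) -> context vectors passed to the decoder
print(attn_weights.shape)  # (2, 8, 60, 49) -> per-head attention weights
```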
The pre-trained BERT embedding is used to extract the semantic characteristics. Bidirectional Encoder Representations from Transformers (BERT) is a neural-network-based training method released by Google. BERT can use the whole set of words from a sentence or query to train a language model in both directions, which enables language models to examine a word's context by considering the words that surround it rather than only the ones that come before or after it. BERT examines the contextual relationships between words in the medical report by using an attention mechanism during the data-encoding phase; to understand a word's context, the encoder reads the entire sentence.
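A minimal sketch of extracting contextual BERT embeddings with the HuggingFace Transformers library (an assumption; the paper does not name its BERT implementation or checkpoint) might look like this:

```python
# Illustrative extraction of contextual token embeddings from a pre-processed report.
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")

report = "the heart size is normal and the lungs are clear"
inputs = tokenizer(report, return_tensors="tf")
outputs = bert(inputs)

token_embeddings = outputs.last_hidden_state  # (1, seq_len, 768) contextual embeddings
print(token_embeddings.shape)
```

A projection layer would map these 768-dimensional embeddings to the dimension used by the multi-head attention block.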
The model's ability to describe the image is evaluated using the Bilingual Evaluation Understudy (BLEU) metric. The BLEU methodology counts word occurrences in the model-generated text using the real data as a reference. The n-gram calculation determines how well the words generated by the model match the real data: the n-gram values of the sentence are computed in order to compare the quality of the predicted text against the reference data.
Together, the precision p_n of the predicted text, the weight w_n assigned to each n-gram order, and the chosen gram value N determine the BLEU score:

BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n )

p_n: precision of the predicted text for n-grams.
w_n: weight assigned to the n-gram order.
N: gram value used (typically 4 for candidate sentences).
BP: brevity penalty, which penalizes errors in predicted text length.

When comparing candidate sentences to reference sentences, a gram value of 4 is commonly used, meaning that word sequences of up to four words in the prediction are analyzed to determine how closely they resemble the reference sentence. The gram value N is crucial in establishing the scope of the comparison, while the brevity penalty BP penalizes predicted texts whose length deviates from the reference. Understanding the relationship between precision, the brevity penalty, and the gram value gives a thorough picture of how the accuracy and concision of predicted text are evaluated in natural language processing tasks.
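For reference, a minimal BLEU-4 computation using NLTK (an assumption; the authors do not state which toolkit they used) is sketched below with hypothetical tokenized reports:

```python
# Illustrative BLEU-4 score between a generated report and a reference report.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["no", "acute", "cardiopulmonary", "abnormality"]]        # ground-truth report (tokenized)
candidate = ["no", "acute", "cardiopulmonary", "abnormality", "seen"]  # generated report (tokenized)

# Uniform weights w_n = 0.25 for n = 1..4; smoothing avoids zero scores when a
# higher-order n-gram has no match. The brevity penalty is applied internally.
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.3f}")
```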



This work presented an automated radiology report generator that makes use of the transformative power of a transformer architecture. Our main goal was to clarify the complex relationships among model design, evaluation measures, and image improvement procedures in the context of medical image captioning. The suggested approach uses BERT embedding to extract complex text features and integrates a Multi-Head Attention (MHA) mechanism. Different enhancement procedures and their effects on radiological images were then investigated in order to improve the model's overall effectiveness. The Gamma Correction technique proved to be the most successful of the four enhancement techniques used, all of which showed performance gains over the original images, giving a more refined understanding of image improvement methods and how they enhance model performance. Contrast noise is a problem that frequently affects radiography images and has a large impact on the evaluation results of generated captions. Our results demonstrate how important it is to improve visual contrast so that the model can extract richer and more contextually relevant information. This finding is in line with current discussions about how crucial pre-processing procedures are to maximizing model performance in medical image analysis.
[1] L. I. T. Lee, S. Kanthasamy, R. S. Ayyalaraju and R. Ganatra, "The Current State of Artificial Intelligence in Medical Imaging and Nuclear Medicine," British Institute of Radiology Journal, vol. 1, no. 1, 2019.
[2] J. Ker, L. Wang, J. Rao and T. Lim, "Deep Learning Applications in Medical Image Analysis," IEEE Special Section on Soft Computing Techniques for Image Analysis in the Medical Industry: Current Trends, Challenges and Solutions, pp. 9375-9389, 2018.
[3] T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S. B. A. Kashem, M. T. Islam, S. A. Maadeed, S. M. Zughaier, M. S. Khan and M. E. Chowdhury, "Exploring the effect of image enhancement techniques on COVID-19," Computers in Biology and Medicine, vol. 132, 2021.
[4] H. Tsaniya, C. Fatichah and N. Suciati, "Exposure Fusion Framework in Deep Learning-Based Radiology Report Generator," IPTEK Journal of Science and Technology, vol. 33, 2022.
[5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin, "Attention Is All You Need," Neural Information Processing Systems, 2017.
[6] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, M. P. Lungren and A. Y. Ng, "CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning," Stanford ML Group Project: CheXNet, 2017.

[7] X. Wang, Y. Peng, L. Lu and Z. Lu, "ChestX-ray8: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017.
[8] M. Ivašić-Kos and I. Hrga, "Deep Image Captioning: An Overview," in International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2019.
[9] Adriyendi, "A Rapid Review of Image Captioning," Journal of Information Technology and Computer Science, vol. 6, pp. 158-169, 2021.
[10] Z. Khong, Y. Cui, Z. Xia and H. Lv, "Convolution and Long Short-Term Memory Hybrid Deep Neural Networks for Remaining Useful Life Prognostics," Applied Science, vol. 9 (Machine Fault Diagnostics and Prognostics), 2019.