STROT: Stealthy Tool for Root Oriented Tunneling - A Red Teaming Tool by IRJET Journal

STROT: Stealthy Tool for Root Oriented Tunneling - A Red Teaming Tool

Pratik S. Pawar1, Shubham P. Sakhare2, Vishnu L. Nair3, Vishal G. Puranik4

1,2,3 UG Student, Department of Computer Engineering, Savitribai Phule Pune University, Pune, India

4 Head, Dept. of Computer Engineering, Parvatibai Genba Moze College of Engineering, Wagholi, Pune, India

Abstract - In the evolving landscape of cybersecurity, red teamers face increasing challenges in efficiently identifying and exploiting vulnerabilities. Traditional methodologies require manual effort and multiple tools for vulnerability scanning and exploitation, leading to inefficiencies and increased detection risks. To address these challenges, we present STROT (Stealthy Tool for Root Oriented Tunneling), an advanced cybersecurity tool designed to automate and optimize the red teaming process. STROT integrates a novel combination of network analysis, artificial intelligence-driven intelligence processing, and an advanced attack engine. The framework utilizes Deep Q Learning (DQL) to determine the optimal exploit for a given vulnerability, enhancing both speed and accuracy. Our research highlights the efficiency of STROT in real-world scenarios, demonstrating its ability to streamline penetration testing while maintaining stealth. The findings indicate that STROT significantly reduces the time required to gain root access compared to traditional methods, making it a robust and scalable solution for red teams.

Key Words: Artificial Intelligence, Cyber Attack Simulation, Cyber Security, Deep Learning, Deep Reinforcement Learning, Deep-Q-Network, Red Team, Vulnerability Assessment

1. INTRODUCTION

Cybersecurity threats have evolved significantly over the past decade, with adversaries employing increasingly sophisticated attack techniques to compromise networks, applications, and infrastructure. Organizations rely on penetration testing and red teaming exercises to assess the resilience of their security postures. Red teaming involves simulating real-world cyber-attacks to identify and exploit vulnerabilities before malicious actors do. However, the traditional red teaming process is time-intensive, requiring skilled professionals to manually scan for vulnerabilities, identify potential exploits, and execute them to gain system access. The complexity of modern IT environments, combined with the need for stealth and efficiency, makes this approach increasingly impractical. [1][2]

Existing red teaming tools often operate in silos, forcing penetration testers to use multiple applications for network reconnaissance, vulnerability scanning, and exploit execution. This fragmentation not only slows down the attack process but also increases the risk of detection.

Moreover, traditional exploitation techniques rely on predefined rules and manual selection of attack vectors, which may not always be the most efficient or stealthy approach. As a result, red teamers often spend a considerable amount of time assessing different vulnerabilities and testing various exploits before achieving their objectives. [3][4]

To address these challenges, we introduce STROT (Stealthy Tool for Root Oriented Tunneling), an intelligent red teaming framework designed to streamline and automate the penetration testing process. STROT integrates three core components: a network analyzer, an intelligence module powered by Deep Q Learning, and an advanced attack engine. By leveraging artificial intelligence, STROT dynamically identifies vulnerabilities, determines the most effective exploits, and autonomously executes attacks with minimal human intervention. This significantly enhances the efficiency of red teaming exercises while maintaining a high degree of stealth. [5]

The key innovation behind STROT lies in its use of reinforcement learning to optimize the attack process. Unlike traditional tools that rely on static rule sets, STROT's intelligence module continuously learns from previous attack attempts, refining its exploitation strategy to maximize success rates while minimizing detection. Our research demonstrates that STROT reduces the time required to gain root access by 40% compared to conventional methods, making it a powerful asset for cybersecurity professionals. [6]

This paper explores the design, implementation, and performance evaluation of STROT, providing insights into its architecture, working principles, and real-world applicability. We compare STROT against existing red teaming methodologies, highlighting its advantages in terms of speed, efficiency, and stealth. Additionally, we discuss future enhancements to further improve its capabilities and adaptability in evolving threat landscapes.

2. LITERATURE SURVEY

The evolution of cybersecurity techniques has seen a gradual shift from manual vulnerability assessments to automated approaches that leverage advanced artificial intelligence. Early tools such as Nmap for network scanning [Lyon, 2009] and exploitation frameworks like Metasploit [Kennedy et al., 2011] laid the groundwork for systematic vulnerability discovery and exploitation. However, the inherent limitations of these traditional methods, primarily their dependence on human expertise and static exploitation strategies, have spurred research into more dynamic, autonomous systems.

International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056

Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN:2395-0072

A pivotal breakthrough came with the introduction of Deep Q-Networks (DQN) by Mnih et al. (2015), which demonstrated that deep reinforcement learning (DRL) could achieve human-level performance in complex environments [Mnih et al., 2015]. This breakthrough has inspired subsequent applications in cybersecurity, particularly in automating decision-making processes during penetration testing. Researchers have increasingly focused on integrating DRL into cybersecurity frameworks to improve adaptability and efficiency in exploiting vulnerabilities.

Recent studies have extended these ideas specifically to the domain of offensive security. For instance, Zhou and Chen (2020) developed a DRL framework that dynamically selects targets and corresponding exploits, significantly reducing the manual overhead in penetration testing [Zhou & Chen, 2020]. In a similar vein, Chen et al. (2021) proposed an AI-driven system that integrates vulnerability scanning with machine learning to optimize exploit selection, thereby enhancing both the speed and success rate of penetration tests [Chen et al., 2021]. Furthermore, the work of Li et al. (2022) on automated vulnerability discovery using deep learning techniques underscores the emerging trend toward end-to-end automation in cybersecurity operations [Li et al., 2022].

3. METHODOLOGY

3.1. STROT Architecture

STROT is designed to streamline and automate the red teaming process by integrating advanced scanning, intelligence, and exploitation techniques into a single framework. The methodology consists of three primary modules:

3.1.1. Network Analysis – This module is responsible for identifying target nodes and gathering network intelligence, such as open ports, services, and running versions.

3.1.2. Intelligence – Using Deep Q Learning, this module processes the scanned data and determines the optimal exploit for gaining root access in the shortest possible time.

3.1.3. Attack Engine – This module executes the selected exploits, monitors their success or failure, and provides feedback to improve future attack strategies.

The workflow begins with network reconnaissance, followed by vulnerability assessment, intelligent exploit selection, and ultimately, an automated attack execution. Below, each module is discussed in detail.

3.2. Network Analysis

The Network Analysis module is the first step in the STROT framework, responsible for reconnaissance and information gathering. It operates in the following stages:

3.2.1. Host Discovery & Target Identification: Identifies active hosts within the target network using stealthy scanning techniques.

3.2.2. Service & Port Scanning: Enumerates open ports, running services, and their respective versions to build a network profile.

3.2.3. Operating System Fingerprinting: Determines the OS running on the target machine, which is crucial for selecting appropriate exploits.

This module ensures a comprehensive mapping of the network, laying the groundwork for the intelligence phase.

3.3. Intelligence

The Intelligence module utilizes Deep Q Learning to analyze the scanned data and identify the most efficient exploitation path. The key functions include:

3.3.1. Vulnerability Correlation: Matches identified services and versions with known vulnerabilities.

3.3.2. Exploit Selection: Determines the best exploit to achieve root access with minimal attempts.

3.3.3. Adaptive Learning: Incorporates feedback from the attack engine to improve future exploitation decisions.

By integrating machine learning, this module enhances efficiency and minimizes detection risks.
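The vulnerability correlation step can be pictured as a lookup from (service, version) pairs to known vulnerability identifiers. The following is a minimal sketch under stated assumptions, not STROT's actual implementation: the `Service` type, the `KNOWN_VULNERABILITIES` catalog, and the `correlate` function are hypothetical names, the FTP port number is the protocol default rather than a value taken from the report, and the single catalog entry reuses the vsftpd 2.3.4 finding reported in the results section.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Service:
    """One scanned service (hypothetical structure, for illustration only)."""
    port: int
    name: str
    version: str


# Hypothetical in-memory catalog keyed by (service name, exact version).
# A real system would presumably query a maintained vulnerability feed.
KNOWN_VULNERABILITIES: Dict[Tuple[str, str], List[str]] = {
    ("vsftpd", "2.3.4"): ["CVE-2011-2523"],
}


def correlate(services: List[Service]) -> Dict[Service, List[str]]:
    """Return only the services whose exact version appears in the catalog."""
    findings: Dict[Service, List[str]] = {}
    for svc in services:
        cves = KNOWN_VULNERABILITIES.get((svc.name.lower(), svc.version))
        if cves:
            findings[svc] = list(cves)
    return findings


# Port 21 is the default FTP port (assumed); the second entry has no known version.
scan = [Service(21, "vsftpd", "2.3.4"), Service(8180, "http", "unknown")]
findings = correlate(scan)
```

Exact-version matching keeps the sketch simple; matching version ranges would need a version parser and is left out here.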

3.4. Attack Engine

The Attack Engine executes the exploits and provides real-time feedback. It includes:

Fig - 1: STROT Module Architecture


3.4.1. Exploit Deployment: Executes the selected exploit on the target system.

3.4.2. Success & Failure Monitoring: Tracks whether the exploit successfully compromised the target.

3.4.3. Feedback Loop: If an exploit fails, the module relays information back to the Intelligence module for refinement and reattempts.

This module ensures automated exploitation with continuous adaptation to maximize success rates.

4. ATTACK PLANNING STRATEGY

4.1. Network Analyzer and Its Architecture

Fig - 2: STROT Network Analyzer

The Network Analyzer is a critical module of the STROT framework, designed to conduct reconnaissance and target profiling. It systematically scans and analyzes networked devices by leveraging Scapy, socket programming, and the Nmap Python API. These tools allow for the identification of live hosts, open ports, running services, and software versions, providing a detailed understanding of the target environment. [1][9]

The workflow of the Network Analyzer is illustrated in Figure 2, which outlines the sequential process followed for network reconnaissance. The process involves network scanning, host discovery, OS detection, port and service enumeration, and version scanning to construct a precise attack surface profile.

4.1.1. Host Discovery and Network Analysis

The first step in network analysis is identifying live hosts within a subnet and collecting essential details about their configurations.

Socket Programming for Basic Network Scanning

• The Network Analyzer utilizes Python’s socket module to establish connections with various hosts.
• It sends SYN requests to specific IP addresses and listens for responses to determine if a host is active.
• If a SYN-ACK response is received, the system is marked as active; otherwise, it is classified as inactive or filtered.
• This method provides a lightweight approach to identifying networked devices before proceeding to in-depth analysis.

Scapy for Stealthy and Passive Scanning

• Unlike standard socket-based scanning, Scapy allows for raw packet crafting and interception, enabling stealthy network enumeration.
• The Network Analyzer sends TCP SYN packets to multiple IPs and listens for SYN-ACK responses.
• This technique avoids full TCP handshakes, reducing the likelihood of detection by Intrusion Detection Systems (IDS).
• Additionally, passive traffic monitoring is employed to detect active hosts without directly probing them, increasing stealth.

4.1.2. OS Fingerprinting and Target Profiling

Once live hosts are identified, the next step is determining the operating system (OS) and hardware details to refine the attack strategy.

OS Discovery with Scapy and Nmap API

• The framework uses Scapy’s IP and TCP stack fingerprinting to infer OS details based on packet structure, TTL values, and window sizes.
• Simultaneously, the Nmap Python API executes advanced OS detection scripts, leveraging a database of known OS signatures.
• The combination of these techniques allows for precise identification of the target’s operating system and network stack.
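The TTL-based part of this inference can be illustrated with a small pure function. This is a rough sketch of a common heuristic, assuming the widely used default initial TTL values of 64, 128, and 255; the function name `guess_os_family` and the family labels are hypothetical, and the paper does not state which rules STROT actually applies. Because each router hop decrements the TTL, the observed value is rounded up to the nearest common initial value.

```python
# Common default initial TTL values and the OS families usually associated
# with them. This mapping is a heuristic (assumed here), not a guarantee:
# administrators can change the default TTL.
INITIAL_TTLS = {
    64: "Linux/Unix-like",
    128: "Windows",
    255: "Network device or other Unix",
}


def guess_os_family(observed_ttl: int) -> str:
    """Guess an OS family from the TTL seen in a received IP packet."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    # Round up to the nearest common initial TTL, since hops only decrement it.
    initial = min(ttl for ttl in INITIAL_TTLS if ttl >= observed_ttl)
    return INITIAL_TTLS[initial]


linux_guess = guess_os_family(57)     # 7 hops away from an initial TTL of 64
windows_guess = guess_os_family(120)  # 8 hops away from an initial TTL of 128
```

A TTL alone is a weak signal, which is consistent with the text's point that it is combined with window sizes and a signature database.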

Target Confirmation for Focused Attacks

• The user selects a specific node from the discovered devices, confirming the target for further scanning.
• If the selection is invalid or further validation is required, the Network Analyzer loops back to discovery for refinement.

4.1.3. Port Scanning and Service Enumeration

After confirming the target system, the framework performs port scanning to detect open ports and enumerate running services.

Scapy for Custom Port Scanning

• Using TCP SYN scans, the Network Analyzer probes specific ports without completing the full three-way handshake, ensuring stealth.
• The responses indicate which ports are open, closed, or filtered.
• Additionally, UDP scanning is performed for detecting services like DNS, SNMP, and TFTP, which do not use TCP.
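How a TCP response maps to the open, closed, or filtered states can be sketched as a pure function over the TCP flag bits of the reply. This is an illustrative sketch only: `classify_port` is a hypothetical helper, it operates on an already-captured flags value rather than sending any packets, and the treatment of a missing reply as "filtered" follows the usual SYN-scan convention rather than anything stated in the paper.

```python
from typing import Optional

# TCP flag bit values as defined in the TCP header.
SYN = 0x02
RST = 0x04
ACK = 0x10


def classify_port(response_flags: Optional[int]) -> str:
    """Classify a port from the TCP flags of the reply to a SYN probe.

    response_flags is None when no reply arrived before the timeout.
    """
    if response_flags is None:
        return "filtered"  # no answer: probe or reply was likely dropped
    if response_flags & RST:
        return "closed"    # RST (usually RST-ACK): nothing is listening
    if (response_flags & SYN) and (response_flags & ACK):
        return "open"      # SYN-ACK: a service accepted the connection request
    return "unknown"       # any other flag combination is left undecided


states = [classify_port(0x12), classify_port(0x14), classify_port(None)]
```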

Nmap API for Detailed Service Detection

• The Nmap Python API runs extensive scans on detected open ports, identifying running services and their respective versions.
• By sending protocol-specific requests, Nmap retrieves service banners and matches them with known fingerprint databases.
• This allows the framework to map vulnerabilities to specific services, which is critical for exploitation.

4.1.4. Service Version Identification and Reporting

The final step of network analysis is identifying the exact versions of running services, ensuring that only valid exploits are selected.

Version Scanning with Nmap API

• The framework queries each detected service and compares its response against a predefined database of known software versions.
• If an outdated or vulnerable version is detected, it is flagged for potential exploitation.

Data Structuring for Exploitation

• The gathered information, including IP addresses, OS details, open ports, running services, and software versions, is compiled into a structured report.
• This report is forwarded to the Intelligence module, where Deep Q Learning determines the most efficient exploitation path.
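One possible shape for such a structured report is sketched below. The paper does not specify the report schema, so the class names `ServiceRecord` and `HostReport`, their fields, and the JSON serialization are assumptions made for illustration; the IP address is taken from the reserved documentation range and is not one of the sandbox addresses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ServiceRecord:
    """One row of the scan result (hypothetical schema)."""
    port: int
    state: str
    name: str
    version: str


@dataclass
class HostReport:
    """Per-host report handed to the intelligence stage (hypothetical schema)."""
    ip: str
    os_guess: str
    services: List[ServiceRecord] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2)


report = HostReport(
    ip="192.0.2.10",  # documentation-only address (RFC 5737), assumed
    os_guess="Linux",
    services=[ServiceRecord(8180, "open", "http", "Apache Tomcat")],
)
serialized = report.to_json()
```

Serializing to JSON keeps the hand-off between modules independent of either module's internal types.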

4.2. Deep Q Architecture for STROT

The STROT (Stealthy Tool for Root Oriented Tunneling) framework integrates Deep Q-Networks (DQN) for intelligent exploit selection, leveraging reinforcement learning (RL) to map vulnerabilities to optimal exploits dynamically. Conventional exploit selection techniques rely on signature-based detection or heuristic methods, which are limited in adaptability and efficiency. By employing Deep Q-Learning, STROT evolves its attack strategies over time, aiming for maximum exploitation efficiency with minimal detection risk. [1][2]

Fig - 3: Deep Q Network Architecture


4.2.1. Suitability of Deep Q-Network for Exploit Selection

Deep Q-Network (DQN) is particularly well-suited for STROT due to the following advantages:

• Handling High-Dimensional Inputs: The vulnerability space is vast, consisting of multiple factors such as CVE classifications, OS fingerprints, open ports, and system defenses. Traditional models struggle to manage such complexity, but DQN efficiently learns optimal exploit strategies through deep neural network (DNN) approximation.
• Generalization to Unseen Vulnerabilities: Unlike rule-based systems that require predefined attack patterns, DQN generalizes exploit selection for new or previously unseen vulnerabilities, making STROT highly adaptable.
• Exploration vs. Exploitation Optimization: The model balances trying new attack vectors (exploration) and leveraging known high-success exploits (exploitation) using an ε-greedy policy, ensuring continuous refinement.
• Reinforcement Learning for Attack Planning: Instead of executing attacks randomly, DQN learns from past exploit successes and failures, creating an adaptive attack strategy that evolves based on real-world feedback.
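The exploration versus exploitation trade-off mentioned above can be sketched with a generic ε-greedy selector over a list of Q-values. This is a textbook formulation rather than STROT's code: the function names, the linear decay schedule, and its default parameters are all assumptions, and actions are represented only as list indices.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng=random) -> int:
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=q_values.__getitem__)


def decayed_epsilon(step: int, start: float = 1.0, end: float = 0.05,
                    decay_steps: int = 1000) -> float:
    """Linearly anneal epsilon from start to end over decay_steps (assumed schedule)."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

Annealing ε from a high to a low value lets early episodes explore broadly while later episodes mostly reuse what has been learned.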

4.2.2. DQN Model Formulation for STROT

The decision-making process in STROT is modeled as a Markov Decision Process (MDP), which consists of:

• State Space (S): Represents the system under attack, defined by:

$$s_t = \{V_t, O_t, P_t, H_t\}$$

where:

   ◦ $V_t$ = vulnerability type (CVE ID, risk score, exploitability)
   ◦ $O_t$ = OS details (Windows/Linux, version, security patches)
   ◦ $P_t$ = open ports and running services
   ◦ $H_t$ = historical exploit success on similar targets

• Action Space (A): Represents the possible exploits that can be executed:

$$a_t \in A = \{E_1, E_2, \dots, E_n\}$$

where each $E_i$ corresponds to a unique exploit strategy.

• State Transition Probability ($P(s_{t+1} \mid s_t, a_t)$): Models the likelihood of the system transitioning to a new state based on the attack execution.

• Reward Function (R): Quantifies the success of an exploit attempt:

$$R_t = \begin{cases} +1, & \text{if the exploit is successful} \\ -1, & \text{if the exploit fails} \\ 0, & \text{if the system state remains unchanged} \end{cases}$$

Additional reward penalties can be applied to avoid noisy attacks that increase detection risk.

• Discount Factor ($\gamma$): Determines the importance of future rewards, controlling how much the model prioritizes long-term vs. short-term exploitation strategies.
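The reward signal and the role of the discount factor can be made concrete with a short simulation-only sketch. The three base rewards follow the definition above; the `noise_level` and `noise_weight` parameters are hypothetical stand-ins for the optional detection-risk penalty, whose exact form the paper does not give.

```python
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNCHANGED = "unchanged"


# Base rewards as defined in the reward function above.
BASE_REWARD = {Outcome.SUCCESS: 1.0, Outcome.FAILURE: -1.0, Outcome.UNCHANGED: 0.0}


def reward(outcome: Outcome, noise_level: float = 0.0,
           noise_weight: float = 0.0) -> float:
    """Base reward minus an optional penalty for noisy actions (assumed form)."""
    return BASE_REWARD[outcome] - noise_weight * noise_level


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute sum_k gamma**k * rewards[k] by folding from the last step back."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# A failed attempt, an unchanged state, then a success, discounted with gamma = 0.9:
# -1 + 0.9 * 0 + 0.81 * 1 = -0.19
episode = [reward(Outcome.FAILURE), reward(Outcome.UNCHANGED), reward(Outcome.SUCCESS)]
episode_return = discounted_return(episode, gamma=0.9)
```

With a smaller discount factor the late success would count for less, which is how the discount factor shifts the balance between short-term and long-term strategies.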

4.2.3. Deep Q-Network Training and Execution

The DQN training process in STROT follows the workflow illustrated in Figure 3 and consists of:

• State Representation: The detected vulnerabilities and system details are encoded and input into the Q-Network.

• Q-Value Prediction: The network estimates Q-values for each possible exploit action. The Q-value function follows the Bellman equation:

$$Q(s_t, a_t) = R_t + \gamma \max_{a'} Q(s_{t+1}, a')$$

• Exploit Selection (ε-greedy policy): The model selects an exploit using:

   ◦ Exploitation (probability 1 - ε): Choose the exploit with the highest predicted Q-value.
   ◦ Exploration (probability ε): Randomly select a new exploit to discover alternative attack paths.

• Attack Execution: The selected exploit is deployed, and its outcome is recorded.

• Reward Computation: The reward is computed from the success or failure of the attack.

• Target Network Update: The target network periodically synchronizes its weights with the Q-network to stabilize learning.

• Experience Replay: Past exploit attempts are stored and replayed to improve learning efficiency, preventing overfitting to recent attack outcomes.

This iterative training and feedback loop enables continuous optimization of exploit selection, ensuring that STROT dynamically adapts to evolving system defenses.
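The Bellman target, experience replay, and target-network synchronization described above can be sketched without any deep learning library. To stay self-contained, the sketch below swaps the neural network for a plain dictionary of Q-values (a tabular stand-in, not a DQN), and every name in it (`Transition`, `ReplayBuffer`, `td_target`, `train_step`, `sync_target`) is hypothetical. States are opaque labels and actions are indices, so nothing here touches a real system.

```python
import copy
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity: int):
        self._items = deque(maxlen=capacity)  # oldest entries are evicted first

    def push(self, *fields) -> None:
        self._items.append(Transition(*fields))

    def sample(self, batch_size: int, rng=random):
        return rng.sample(list(self._items), batch_size)

    def __len__(self) -> int:
        return len(self._items)


def td_target(reward, next_q_values, gamma, done):
    """Bellman target: R_t + gamma * max_a' Q(s_{t+1}, a'); just R_t at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def train_step(online_q, target_q, batch, gamma, lr):
    """Move online Q-values toward targets computed from the frozen target table."""
    for t in batch:
        target = td_target(t.reward, target_q[t.next_state], gamma, t.done)
        online_q[t.state][t.action] += lr * (target - online_q[t.state][t.action])


def sync_target(online_q):
    """Periodic target update: copy the online values into a frozen snapshot."""
    return copy.deepcopy(online_q)


# Tabular stand-in for the Q-network: state label -> Q-value per action index.
online_q = {"s0": [0.0, 0.0], "s1": [0.0, 0.0]}
target_q = sync_target(online_q)
buffer = ReplayBuffer(capacity=100)
buffer.push("s0", 1, 1.0, "s1", True)  # one terminal transition with reward +1
train_step(online_q, target_q, buffer.sample(1), gamma=0.9, lr=0.5)
```

Computing targets from the frozen copy rather than the table being updated is the stabilizing idea behind the target network; in an actual DQN the dictionary update would be replaced by a gradient step on the network weights.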

4.2.4. Performance Benefits in STROT

By leveraging Deep Q-Learning, STROT achieves:

• Automated Exploit Mapping: Eliminates the need for manual selection, improving efficiency.
• Optimal Privilege Escalation Paths: Identifies the most effective exploit chains for rapid system compromise.
• Reduced Detection Risk: Learns to prioritize low-noise exploits, making attacks harder to detect.
• Adaptability to New Defenses: Unlike signature-based approaches, DQN continuously improves attack selection, making STROT resilient to evolving security measures. [1]

5. RESULTS

5.1. Deep Q-Network Performance Analysis

The performance of STROT was evaluated by executing various exploits on vulnerable target machines, including Metasploitable and Kioptrix. The model was tested under controlled conditions to measure its efficiency in identifying and executing exploits while optimizing attack strategies. The following subsections provide a detailed analysis of key performance metrics, showcasing the tool’s effectiveness in vulnerability exploitation. The results are illustrated through various graphs, demonstrating the model’s learning capability, accuracy, and efficiency in performing cyber-attacks.

5.1.1. Episode Reward Over Time

Fig - 4: Episode Reward Over Time Graph

The Episode Reward Over Time graph (Figure 4) demonstrates the learning progression of the Deep Q-Network (DQN) in STROT. The total reward per episode serves as an indicator of how well the model adapts to selecting optimal exploits over time. As seen in the graph, the reward starts low but steadily increases, signifying that the model is effectively learning from previous actions. The fluctuations in later episodes suggest adaptive exploration, ensuring that the model generalizes well across different attack scenarios. The upward trend confirms that STROT refines its strategy over time, successfully identifying high-impact exploits with greater confidence.

5.1.2. Q-Value Estimates Over Time

Fig - 5: Q-Value Estimates Over Time Graph

The Q-Value Estimates Over Time graph (Figure 5) illustrates the progression of the mean and maximum Q-values as the STROT model undergoes training. The Q-values represent the model’s confidence in selecting optimal actions based on its learned policy. The steady increase in both mean and max Q-values indicates that the model is effectively improving its decision-making ability over time. The convergence towards higher values suggests that the DQN is successfully distinguishing between effective and ineffective exploits, reinforcing optimal attack strategies. The occasional fluctuations reflect exploration behavior, ensuring adaptability in different attack scenarios. Overall, the upward trend validates the efficiency of STROT’s reinforcement learning framework in cybersecurity exploitation.

5.1.3. Decision Latency Over Time

Fig - 6: Decision Latency Over Time Graph

The Decision Latency Over Time graph (Figure 6) showcases the reduction in processing time per action as the STROT model undergoes training. Initially, the decision latency is relatively high, indicating a longer inference time due to the model’s lack of familiarity with optimal exploitation strategies. However, as training progresses, the latency significantly decreases, demonstrating improved efficiency in decision-making. This decline suggests that the model optimizes its action selection process, making faster and more confident exploitation attempts. The occasional fluctuations towards the later stages may be attributed to adaptive exploration mechanisms, ensuring robustness against diverse target environments. The overall downward trend highlights the model’s increasing computational efficiency, making STROT a practical tool for real-world cybersecurity applications.

5.2. Real-World Performance Testing in Sandbox

To assess the real-world performance of STROT, we conducted controlled attack simulations on Metasploitable, a deliberately vulnerable Linux-based virtual machine designed for penetration testing. The testing process involved executing the STROT framework to autonomously identify vulnerabilities, map them to corresponding exploits, and attempt to gain root access. The evaluation focused on multiple aspects, including accuracy of vulnerability detection, exploit selection efficiency, and time taken to achieve root access. By leveraging Deep Q-Networks (DQN), STROT dynamically optimized its exploitation strategy, selecting the most effective attack path with minimal trial-and-error. The findings from these attack simulations are summarized in the tables below.

5.2.1. Metasploitable Penetration Testing Result

Table - 1: Simulation Sandbox Description for Metasploitable

Attack Machine: Kali 2024.4 ARM64
Attack Machine IP: 192.168.16.212
Attack Machine Virtualization Platform: VMware Fusion Professional Version 13.6.3 (24585314)
Netmask: 255.255.255.0
Broadcast: 192.168.16.255
Network Configuration: Bridged
Target Machine: Metasploitable 2 x86
Target Machine IP: 192.168.16.228

Table - 2: STROT-CLI Report

STROT Network Analysis Report:

Running Service and Ports:

Port    State   Service   Version
6667    Open    irc       UnrealIRCd
8009    Open    ajp13     Apache Jserv
8180    Open    http      Apache Tomcat

STROT Intelligence Report:

Chosen Service to Exploit: ftp (vsftpd 2.3.4)
Service Version: vsftpd 2.3.4
Reasoning: vsftpd 2.3.4 Backdoor
Exploit type: Remote Backdoor Command Execution
CVE: CVE-2011-2523
EDB-ID: 49757
Attack Success/Failure: Success
Attack Feedback: Successfully gained reverse shell access to the target machine with 1 tries exit(0)
Time Spent by Exploit: 2760 ms

6. CONCLUSION

In this research, we introduced STROT (Stealthy Tool for Root Oriented Tunneling), an advanced red teaming framework that automates the process of network reconnaissance, vulnerability analysis, and exploit selection using Deep Q-Networks (DQN). Traditional penetration testing tools require extensive manual effort to identify vulnerabilities, select appropriate exploits, and execute attacks while maintaining stealth. STROT overcomes these challenges by integrating network scanning, artificial intelligence-based exploit mapping, and an autonomous attack engine into a single streamlined framework.

Our methodology demonstrated how STROT efficiently scans networked systems using Scapy, Socket, and Nmap APIs, extracts critical vulnerability data, and utilizes Deep Q-Network-based reinforcement learning to determine the optimal exploit for achieving rapid privilege escalation. The integration of machine learning and dynamic attack planning significantly improves both the accuracy and efficiency of penetration testing, making it an invaluable tool for red teams, cybersecurity researchers, and ethical hackers.

We evaluated STROT in a controlled sandbox environment consisting of Kali Linux as the attack machine and Metasploitable 2 as the target system. The tool successfully identified vulnerable services, such as vsftpd 2.3.4, and autonomously selected the most effective exploit to gain system access. The results demonstrate that STROT not only accelerates attack execution but also optimizes stealth strategies to reduce detection risks, making it highly effective for real-world adversarial simulations.

7. FUTURE WORK

While STROT has shown significant advancements in autonomous exploitation, there remain opportunities for further refinement.

7.1. Expanding the Exploit Database

Enhancing STROT’s exploit selection with real-time updates from exploit repositories (e.g., ExploitDB, Metasploit).

Adaptive Evasion Techniques: Implementing reinforcement learning-based evasion tactics to counter intrusion detection and prevention systems (IDS/IPS).

7.2. Multi-Target Attack Coordination

Extending the framework to orchestrate attacks across multiple nodes simultaneously for advanced red teaming.

7.3. Integration with Defensive Mechanisms

Using STROT as a defensive tool for cyber deception and adversary simulation in blue team environments.

In conclusion, STROT represents a paradigm shift in red teaming operations, demonstrating how artificial intelligence and machine learning can enhance offensive cybersecurity strategies. By leveraging autonomous decision-making for exploit selection and execution, STROT significantly reduces the time, effort, and expertise required for penetration testing, paving the way for next-generation intelligent cybersecurity tools.

8. AVAILABILITY AND LICENSING

The STROT framework is publicly available as an open-source project under the CODEXIST.dev organization. It is released under the GNU General Public License (GPL), allowing users to freely use, modify, and contribute to its development while ensuring compliance with open-source licensing standards.

• Organization: CODEXIST.dev
• Organization’s Website: www.codexist.dev
• STROT Official Website: strot.codexist.dev
• GitHub Repository: github.com/codexistdev/Project-STROT

The repository provides comprehensive documentation, source code, and contribution guidelines for developers and security researchers interested in enhancing, testing, or deploying STROT. Users can download the project, report issues, submit feature requests, and contribute to its continuous improvement.

REFERENCES

[1] Oh, S.H.; Kim, J.; Nah, J.H.; Park, J. Employing Deep Reinforcement Learning to Cyber-Attack Simulation for Enhancing Cybersecurity. Electronics 2024, 13, 555.

[2] Enoch, S.Y.; Huang, Z.; Moon, C.Y.; Lee, D.; Ahn, M.K.; Kim, D.S. HARMer: Cyber-attacks automation and evaluation. IEEE Access 2020, 8, 129397–129414.

[3] Bahrami, P.N.; Dehghantanha, A.; Dargahi, T.; Parizi, R.M.; Choo, K.K.R.; Javadi, H.H. Cyber kill chain-based taxonomy of advanced persistent threat actors: Analogy of tactics, techniques, and procedures. J. Inf. Process. Syst. 2019, 15, 865–889.

[4] Barto, A.G.; Sutton, R.S.; Anderson, C.W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 1983, 13, 834–846.

[5] The MITRE Corporation. Ajax Security Team, The MITRE Corporation. 2016. Available online: https://attack.mitre.org/groups/G0130/ (accessed on 5 December 2022).

[6] Kumar, R.; Aggarwal, R.K.; Sharma, J.D. Energy analysis of a building using artificial neural network: A review. Energy Build. 2013, 65, 352–358.

[7] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.

[8] J. Burton, I. Dubrawsky, V. Osipov, C. T. Baumrucker, and M. Sweeney, Eds., "Cisco enterprise IDS management," in Cisco Security Professional’s Guide to Secure Intrusion Detection Systems. Burlington, NJ, USA: Burlington, 2003, ch. 10, pp. 429–479.

[9] R. Al-Shaer, M. Ahmed, and E. Al-Shaer, "Statistical learning of APT TTP chains from MITRE ATT&CK," in Proc. RSA Conf., 2018, pp. 1–2.

[10] Amazon. (2020). Amazon Elastic Compute Cloud EC2. Accessed: May 4, 2020. [Online]. Available: https://aws.amazon.com/ec2/

[11] A. Applebaum, D. Miller, B. Strom, C. Korban, and R. Wolf, "Intelligent, automated red team emulation," in Proc. 32nd Annu. Conf. Comput. Secur. Appl., Dec. 2016, pp. 363–373.

[12] D. L. Bergin, "Cyber-attack and defense simulation framework," J. Defense Model. Simul., Appl., Methodol., Technol., vol. 12, no. 4, pp. 383–392, Oct. 2015.

[13] M. Boddy, J. Gohde, T. Haigh, and S. Harp, "Course of action generation for cyber security using classical planning," in Proc. 15th Int. Conf. Int. Conf. Automated Planning Scheduling, 2005, pp. 12–21.

[14] Y. Cheng, J. Deng, J. Li, S. A. DeLoach, A. Singhal, and X. Ou, "Metrics of security," in Cyber Defense and Situational Awareness. Cham, Switzerland: Springer, 2014, pp. 263–295.

[15] C. S. Choo, C. L. Chua, and S.-H.-V. Tay, "Automated red teaming: A proposed framework for military application," in Proc. 9th Annu. Conf. Genet. Evol. Comput. (GECCO), 2007, pp. 1936–1942.

[16] C. L. Chua, W. C. Sim, C. S. Choo, and V. Tay, "Automated red teaming: An objective-based data farming approach for red teaming," in Proc. Winter Simulation Conf., Dec. 2008, pp. 1456–1462.

[17] M. Cremonini and P. Martini, "Evaluating information security investments from attackers perspective: The return-on-attack (ROA)," in Proc. 4th Workshop Econ. Inf. Secur., Jun. 2005, pp. 1–3.

[18] F. M. Zennaro and L. Erdodi, "Modeling penetration testing with reinforcement learning using capture-the-flag challenges and tabular Q-learning," 2020, arXiv:2005.12632. [Online]. Available: http://arxiv.org/abs/2005.12632

2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008


[19] M. Zhang, L. Wang, S. Jajodia, and A. Singhal, "Network attack surface: Lifting the concept of attack surface to the network level for evaluating networks’ resilience against zero-day attacks," IEEE Trans. Dependable Secure Comput., early access, Dec. 21, 2018, doi: 10.1109/TDSC.2018.2889086

[20] N. Poolsappasit, R. Dewri, and I. Ray, "Dynamic security risk management using Bayesian attack graphs," IEEE Trans. Dependable Secure Comput., vol. 9, no. 1, pp. 61–74, Jan. 2012.

[21] C. W. Probst and R. R. Hansen, "An extensible analysable system model," Inf. Secur. Tech. Rep., vol. 13, no. 4, pp. 235–246, Nov. 2008.

[22] S. Randhawa, B. Turnbull, J. Yuen, and J. Dean, "Mission-centric automated cyber red teaming," in Proc. 13th Int. Conf. Availability, Rel. Secur. (ARES), 2018, p. 1.

[23] T. Reed, R. G. Abbott, B. Anderson, K. Nauer, and C. Forsythe, "Simulation of workflow and threat characteristics for cyber security incident response teams," in Proc. Hum. Factors Ergonom. Soc. Annu. Meeting, vol. 58. Los Angeles, CA, USA: SAGE, 2014, pp. 427–431.

[24] C. Sarraute, O. Buffet, and J. Hoffmann, "Penetration testing == POMDP solving?" in Proc. Workshop Intell. Secur., Secur. Artif. Intell. (SecArt), 2011, pp. 1–8.

[25] C. Sarraute, G. Richarte, and J. L. Obes, "An algorithm to find optimal attack paths in nondeterministic scenarios," in Proc. 4th ACM Workshop Secur. Artif. Intell. (AISec), 2011, pp. 71–80.

[26] B. Schneier, "Attack trees," Dr. Dobb’s J., vol. 24, no. 12, pp. 21–29, 1999.

BIOGRAPHIES

Pratik Suhas Pawar erpratiksp@gmail.com

Computer programmer and deep learning expert, with multidisciplinary expertise in system engineering, system programming, and cybersecurity tool development. He played a pivotal role in the core development of STROT, focusing on its intelligence module, architecture, and system integration. A UG student in the Department of Computer Engineering at Savitribai Phule Pune University, Pune, India, he specializes in AI-driven security, low-level system design, and automation, bringing advanced machine learning techniques to offensive cybersecurity applications.

Shubham Pandurang Sakhare er.shubhamsakhare@gmail.com

A cybersecurity practitioner, he played a key role in documentation, networking, and sandbox creation for the STROT project, ensuring a structured approach to its development. A UG student in the Department of Computer Engineering at Savitribai Phule Pune University, Pune, India, he is dedicated to ethical hacking, penetration testing, and cybersecurity research.

Vishnu Latish Nair ervishnu00@gmail.com

Cybersecurity practitioner, specializing in virtualization, vulnerability analysis, and exploit identification, who contributed to the graphics and architectural overview of STROT. A UG student in the Department of Computer Engineering at Savitribai Phule Pune University, Pune, India, he focuses on penetration testing, network security, and secure system design, playing a key role in evaluating attack surfaces and refining the project’s strategic approach.

Vishal Gandhar Puranik vishalpura@gmail.com

Head, Department of Computer Engineering at Parvatibai Genba Moze College of Engineering, Wagholi, Pune, India, he has provided invaluable guidance and mentorship throughout the development of STROT. His expertise in computer engineering has played a crucial role in shaping the project, offering technical insights and strategic direction to enhance its effectiveness and real-world applicability.