International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 09 Issue: 05 | May 2022
p-ISSN: 2395-0072
www.irjet.net
Automobile Insurance Claim Fraud Detection using Random Forest and ADASYN
Dhruvang Gondalia1, Omkar Gurav2, Ameya Joshi3, Aniruddha Joshi4, Prof. Sangeetha Selvan5
1,2,3,4UG Student, Dept. of Computer Engineering, Pillai College of Engineering, New Panvel, India
5Assistant Professor, Dept. of Computer Engineering, Pillai College of Engineering, New Panvel, India
---------------------------------------------------------------------***--------------------------------------------------------------------
Abstract - With the increasing number of fraudulent claims in the insurance industry, this issue needs to be contained. Car insurance fraud is the most common of all types of fraudulent claims, so a system to detect and prevent it is necessary. Many fraud detection models have been built using a variety of algorithms and techniques. We used Random Forest as the classifier and ADASYN to balance the dataset. One Hot Encoding was used to resolve the issue of undesirable attribute values that arose while balancing the dataset. The application we created can be used by car insurers to evaluate customer claims more quickly than traditional methods that involve manual tasks. It helps determine whether a claim is genuine or fraudulent at the time the customer files it, and it is more accurate and less prone to fraud than traditional methods. Other techniques such as SVM can be used, but for this particular problem Random Forest appears best suited because it provides significantly better accuracy than the other techniques.

Key Words: ADASYN, SVM, Random Forest, Data Sampling, Insurance Fraud, Fraud Detection, One Hot Encoding

1. INTRODUCTION

Insurance fraud occurs when an insurance provider, advisor, adjuster, or consumer intentionally deceives in order to obtain an illegal gain. There has been an increase in fraudulent insurance claims in recent years, particularly in the automobile insurance industry. Falsifying insurance claim information, exaggerating an insurance claim for an accident, submitting a claim form for damage or injury that never occurred, and making a false claim of car theft are all examples of car insurance fraud. When insurance companies use fraud detection systems, they not only detect fraud but also save millions, if not billions, of dollars that would otherwise be paid to the people who made the fraudulent claims.

2. LITERATURE REVIEW

i. Detecting Fraudulent Insurance Claims Using Random Forests and Synthetic Minority Oversampling Technique: The author used SMOTE to balance the dataset and Random Forest to predict whether a claim is fraudulent. SMOTE with Random Forest gives an accuracy of up to 94%, but this can be improved by using other balancing techniques such as ADASYN, which is an oversampling data balancing technique.

ii. Performance comparative study of machine learning algorithms for automobile insurance fraud detection: The author presented a study comparing ten of the most frequently used machine learning algorithms for detecting fraud in insurance claims. The study shows that the Random Forest algorithm has the best performance for insurance fraud detection.

iii. Detecting Fraudulent Motor Insurance Claims Using Support Vector Machines with Adaptive Synthetic Sampling Method: The authors used ADASYN to balance the dataset; it increases the number of minority class samples by adding entries similar to the existing ones. The base model used in this project was SVM, but the dataset consists of only 1000 rows, of which 25% are fraudulent claims and the rest are genuine claims.

iv. Automobile Insurance Fraud Detection using Supervised Classifiers: The dataset used in this project is not available on the internet. It consists of 11 columns, such as Gender of Policyholder, Police Report File, Model of Car, etc. The author used SMOTE to balance the dataset and tested it with three classifiers: Multi-Layer Perceptron, Decision Tree, and Random Forest. The author found Random Forest to be the best technique for this problem statement.

v. Fraud Detection by Machine Learning: Here the author discusses different types of credit card fraud. He proposed that the dataset should have a 1:1 ratio of fraud to genuine cases. He tested different machine learning algorithms, such as logistic regression, support vector machine, boosted trees, random forest, and neural network, and found random forest to be the best fit for his dataset.

3. DATASET AND PARAMETERS

The experimental dataset used in this study is provided by the user Jwilda on Kaggle [6]. The dataset has 15,420 rows and 33 columns. Of these, 32 are claim features that help to predict the remaining variable, called the class label. Here, FraudFound is our target variable which will contain a
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 104