Based on the Observations, it was found that the hacker might choose to go for few small level of attacks before making a big move. At this point it is assumed that we (being the vendor), are already observant and knowledgeable about emerging trends in cybercrime. Extracting few cents out of few user bank accounts which can easily be neglected by the user when viewing the user copy of the monthly transaction report becomes a task which is not easily noticeable. We will read such patterns and inform the Bank about the occurrences, providing the exact date, location and frequency of transactions. Feeding the data received from the bank in SQlite3 database in Python and then using dictionaries to analyze and separate the targeted anomalies as and when the concerns arise. This is done so that this database can be monitored in real time and any concerns which seem fraudulent can be brought to the bank’s notice at the first instance.
This Solution is built using SQlite3 which is integrated in Python. The information received from the bank, has been split into different tables of SQLite. The idea is to have the database updated in real time and extraction of any fraudulent data which is identified based on certain guidelines in this case which is assumed to be three of them.
- The first anomaly is identified when there is any discrepancy for amount withdrawals less than a certain amount from the user’s account which is unlikely to be have taken place.
- Second Anomaly is based on Frequency. If there are transaction on the same date multiple times.
- Third Anomaly is based on the Preferred location of an individual user which is gathered as per the trend provided.
Each user has been assigned an ID. This ID acts as a primary key/foreign key for other tables. . The following steps describe the working of the Solution in detail:
- Five tables are created
- Preferred Location
- To detect the Fraud using ‘Amount’ parameter, we have set a threshold value of 10$. Any transaction less than this value will be recorded. The date and User Account number for that transaction will be saved in an excel sheet which will be further used to draw the results.
- If there are more than two transactions which are less than 10$, we have recorded that as well.
- We received details from bank which has records of ‘Preferred location’ where a user can use the Credit card. In this Solution we have accepted Zip codes as a record from the bank.
- The Dictionary function was used to create key value pair for Date and Amount. Dictionary function was used to determine the number of transaction that occurred for less than 10$ value.
- These segregated values are then compiled on one data sheet as per the number of Users.
For all the results received from the steps above, an analysis is performed and can be put in a graphical representation shown in Below Figure:
The x-axis in figure 1 is ‘Time and Date’ and Y-axis the ‘Card numbers’ used for the transaction. The small lines represents the transaction amounts during which are greater than 10$, however we have highlighted the transactions for less than 10$ using the taller lines because we are using that as a threshold for this Solution. As we can see from figure 1, that once the data provided by us is integrated into one single form, a pattern of attack could be detected. The area marked in the figure in red represents that: There has been more than 2 transactions, for less than 10$ at non-Preferred locations at the same time. There will be a very less probability that ‘n’ number of users are performing transaction at the same time, and all from the non-preferred location. This could be termed as a possible Compromise of their credit card. However, we leave the decision to the bank to use their ways of communication to reach the user to inform about the anomalies.
I hope this tutorial will surely help you. If you have any questions or problems please let me know.
Happy Hadooping with Patrick..