Using Knowledge Distillation on Augmented Graph Convolutional Networks to Detect Money Laundering in Bitcoin Transactions
Cryptocurrency is more than digital money. It is decentralized digital money built on blockchain technology, meaning that no central authority governs it. Instead, users share the tasks involved in maintaining the currency and its value through a digital network.
Bitcoin, the most popular cryptocurrency today, was also the first. It was introduced in 2008 by an anonymous person or group using the name Satoshi Nakamoto. As a decentralized, digital currency, Bitcoin offers many advantages, but it also has its share of drawbacks.
Anonymity for all users in a decentralized environment also means anonymity for criminals, which makes money laundering, theft, and other malicious activity easier. Governments and institutions quickly realized this and put anti-money laundering (AML) regulations in place for everyone. However, these policies also have shortcomings.
So, can technology be the ultimate solution to detect and mitigate money laundering for a currency it created? Let’s find out in this research project.
The gaps posed real challenges
To find solutions, the project had to overcome some key challenges that the fraud detection field has faced so far:
- Machine learning (ML) could be the answer to money laundering problems. By learning from data, it can provide adaptable techniques to detect fraud. A fair amount of work has already gone into building such ML models. However, almost all of these models reflect real-world scenarios poorly because they rely on synthetic data sets. In addition, the existing literature reports little about the computational cost of running these models.
- As studied by Weber et al. (2019), a model that combined Random Forest with EvolveGCN made significant progress. But there are practical limits to applying it in the real world: it requires excessive storage and compute, which keeps it from being widely applicable.
- A closer look at previous work showed that existing research and models assume incoming data will always be organized and static. This is far from the truth in a real-world scenario, where data is in a constant state of flux because of the sheer number of cryptocurrency transactions and interactions.
- Cryptocurrency fraud detection is in its infancy because the industry itself is only about ten years old. Researchers and data scientists around the world are only beginning to understand the potential it holds and the problems it creates. As a result, technology-assisted fraud detection is not yet very advanced, and investigative agencies can take months to trace a transaction and determine its authenticity.
- One shortcoming of AML regulations is the cost they impose on users. This negatively affects poor and low-income groups as well as refugees and immigrants, who already have limited opportunities to earn money. Add the cost of proving that they are honest participants, and they are even more discouraged from taking part in economic activity.
Building the Bitcoin Fraud Detection Model
The above challenges prompted this project to draw inspiration from several ML techniques and models. It eventually arrived at a new model that builds on the capabilities of the previous models and achieves better overall performance. The final model is compact and matches or tries to exceed the set benchmarks.
A freely available dataset, released through a collaboration between Elliptic and Weber et al. (2019), was used. It is one of the largest labeled cryptocurrency datasets in the industry. The dataset contains Bitcoin transaction data in graph format and stores transactional entities as nodes. Every node has 166 features and one of three labels: legal (licit), illegal (illicit), or unknown. 21% of the data is labeled legal and 2% illegal, while the remaining 77% is unknown.
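As a rough illustration of the structure described above, the sketch below builds a tiny, made-up graph in the same shape as the dataset (nodes with feature vectors and one of three labels) and separates the labeled nodes, which can supervise training, from the unknown ones. The node IDs, feature values, edges, and helper names are hypothetical and are not taken from the actual data.

```python
from collections import Counter

# Hypothetical toy graph: node id -> (feature vector, label).
# The real dataset has 166 features per node; 3 are used here for brevity.
nodes = {
    "tx1": ([0.1, 0.5, 0.0], "legal"),
    "tx2": ([0.9, 0.2, 0.3], "illegal"),
    "tx3": ([0.4, 0.4, 0.1], "unknown"),
    "tx4": ([0.3, 0.8, 0.7], "unknown"),
    "tx5": ([0.2, 0.1, 0.9], "legal"),
}
# Edges link related nodes (made-up examples).
edges = [("tx1", "tx3"), ("tx2", "tx3"), ("tx3", "tx4"), ("tx5", "tx4")]


def label_shares(nodes):
    """Return the fraction of nodes carrying each label."""
    counts = Counter(label for _, label in nodes.values())
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def labeled_subset(nodes):
    """Keep only nodes with a known label; these can supervise training."""
    return {nid: v for nid, v in nodes.items() if v[1] != "unknown"}


shares = label_shares(nodes)
train_nodes = labeled_subset(nodes)
```

On the real data, the same kind of summary would show the strong class imbalance reported above, which is why per-class metrics matter more than plain accuracy.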
The project set out to identify and apply the latest AML techniques for cryptocurrency.
First, it examined the results of applying the chosen techniques to the dataset's features. Then it implemented a graph neural network with classification algorithms to improve learning and classification. Next, it used knowledge distillation to compress the model and reduce its memory footprint. Finally, the project measured the performance of the proposed model and compared it with the benchmarks.
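Knowledge distillation trains a small "student" model to match the softened output distribution of a larger "teacher" model. The sketch below shows one commonly used form of the loss: a temperature-scaled softmax, the KL divergence between teacher and student outputs, and a blend with the usual cross-entropy on the true label. The temperature, the `alpha` weight, and the example logits are assumed values for illustration; the exact loss and hyperparameters used in the thesis are not given in this article.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_class,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-target loss (teacher) and hard-target loss (label)."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # T^2 keeps the soft-loss gradient scale comparable across temperatures.
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_loss = -math.log(softmax(student_logits)[true_class])
    return alpha * soft_loss + (1 - alpha) * hard_loss


# Hypothetical logits for the classes [legal, illegal].
teacher = [1.0, 3.0]
close_student = [0.8, 2.9]   # agrees with the teacher
far_student = [3.0, 1.0]     # disagrees with the teacher
loss_close = distillation_loss(close_student, teacher, true_class=1)
loss_far = distillation_loss(far_student, teacher, true_class=1)
```

A student whose outputs track the teacher's receives a lower loss than one that disagrees, which is the signal that lets a smaller model inherit the larger model's behavior.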
The process went as follows:
- Literature review:
The literature review built an understanding of cryptocurrency and fraud detection. It revealed technological gaps as well as opportunities for improvement. It also formed the basis of the research, from the problem statement to the definition of the objectives.
- Proposed implementation:
The algorithms found during the literature review were implemented in a specific order: first establishing a baseline, then applying the random forest and knowledge distillation, and finally evaluating the model against the baseline.
- Precision, recall and weighted F score:
Precision and recall were the key metrics for judging how well the model would perform in production. Most existing models are evaluated on synthetic data sets that do not reflect real-world scenarios, so their reported results are not reliable. The weighted F score was also used to give the least biased estimate of the new model's performance.
- Model run time:
This was the time the model took to label a given data point. The metric was also checked after applying the proposed knowledge distillation technique.
- Storage space:
To gauge how computationally demanding the model is, model storage space and CPU usage were used as benchmarks. Knowledge distillation was applied to reduce both.
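The precision, recall, and weighted F score listed above can be computed from true and predicted labels as in the following sketch. The label lists are made-up examples. "Weighted" is taken here to mean that each class's F1 score is weighted by its number of true instances, which is one common definition and is an assumption rather than something the source confirms.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class treated as positive."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(per_class_scores(y_true, y_pred, label)[2] * n / total
               for label, n in support.items())


# Hypothetical labels: the illegal class is rare, as in the dataset.
y_true = ["legal"] * 8 + ["illegal"] * 2
y_pred = ["legal"] * 7 + ["illegal"] + ["illegal", "legal"]

illegal_p, illegal_r, illegal_f1 = per_class_scores(y_true, y_pred, "illegal")
legal_p, legal_r, legal_f1 = per_class_scores(y_true, y_pred, "legal")
overall = weighted_f1(y_true, y_pred)
```

In this toy case the rare illegal class scores an F1 of 0.5 while the legal class scores 0.875, so reporting per-class figures alongside the weighted score keeps weak performance on the minority class visible.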
The idea was to implement the future work proposed by Weber et al. (2019).
Node embeddings were built from the node features using EvolveGCN and Random Forest. Then a variant of the decision forest (logistic regression) was used as the output layer. This method was considered the best way to integrate the EvolveGCN and Random Forest algorithms.
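To make the graph convolution step more concrete, the sketch below implements one plain graph convolution layer, H' = ReLU(Â H W), where Â is the adjacency matrix with self-loops, symmetrically normalized. This is a simplified static layer and not EvolveGCN itself, which additionally updates the layer weights across time steps with a recurrent network. The graph, features, and weights are made-up values.

```python
import math


def normalized_adjacency(edges, n):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    degree = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(xi * yj for xi, yj in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: aggregate neighbors, project, apply ReLU."""
    aggregated = matmul(a_hat, features)
    projected = matmul(aggregated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical 3-node path graph 0 - 1 - 2 with 2 features per node.
edges = [(0, 1), (1, 2)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -1.0], [0.5, 1.0]]  # maps 2 features to a 2-dim output

a_hat = normalized_adjacency(edges, 3)
embeddings = gcn_layer(a_hat, features, weights)
```

Each output row mixes a node's own features with those of its neighbors, so vectors like these could then be passed to a downstream classifier such as a random forest.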
Results of the fraud detection model
The results of the proposed models were:
| Model | Type | Results |
|---|---|---|
| EvolveGCN-O | Reference | ● Good classification performance ● Decent run time and disk space usage ● Quite complex |
| d-EvolveGCN | Distilled | ● Improved classification performance ● 10% increase in illegal-class metrics and 8% increase in MicroAvg metrics ● Slightly lower run time ● Less complex and less disk space |
| dNDF | Reference | ● Similar performance to EvolveGCN-O ● Best precision score ● Weaker run-time performance, with a 337% increase in execution time ● 23% more disk space |
| d-dNDF | Distilled | ● Much better classification performance than dNDF ● On par with d-EvolveGCN ● Peak CPU memory usage was the best of all models |
Where can this fraud detection model be used?
The project successfully used existing models to create a robust and accurate algorithm and compressed it for widespread application.
The proposed model is designed to unmask fraudulent transactions in the ever-growing cryptocurrency industry. In addition, it should help discourage malicious individuals from undermining a decentralized monetary system.
Everyone deserves equal opportunity. For people to reap the benefits of cryptocurrency, they need to be included in new financial systems without being deterred by rules and regulations or by the fear of being cheated. This model takes governments one step closer to that goal, using technology to minimize physical and monetary barriers.
The model could also be used for prediction of complaints and defects, prediction of attrition and conversions, spam and anomaly detection, intrusion detection, etc.
Suraj Krishnamoorthy is an upGrad learner. As part of his program, he developed the thesis report titled "Using Knowledge Distillation on Augmented Graph Convolutional Networks to Detect Money Laundering in Bitcoin Transactions".