Hi @brett, as the resident data scientist at Xinja I thought I would clarify how we are currently doing categorisation, and how we might go about solving this problem in the future.
At present Xinja is not leveraging any AI for categorisation - what we use is something called merchant category codes (or MCCs for short).
WTF is an MCC Code?
You may be thinking WTF is an MCC code? Well, in short, the MCCs are an assortment of hundreds of 4-digit codes set by VISA that categorise all merchants that are compatible with VISA globally.
While they’re not perfect, the categories themselves are quite comprehensive. They cover everything from Aquariums (7998) to Art Galleries (5971), and they even cover Snowmobile Dealers (5598).
But how does the actual categorisation work?
Xinja essentially has a lookup table that maps each of the MCCs to the higher-level categories that you know and love, e.g. Eating Out & Drinks, Entertainment, General, etc.
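To make that concrete, here is a rough Python sketch of what a lookup like this could look like. It is illustrative only and not our production code: the names `MCC_TO_CATEGORY`, `DEFAULT_CATEGORY` and `categorise` are made up for the example, the MCC-to-category pairs are guesses, and falling back to General for unknown codes is an assumption.

```python
# Hypothetical lookup table. The MCC -> category pairs are illustrative
# guesses, not Xinja's real mapping.
MCC_TO_CATEGORY = {
    "5812": "Eating Out & Drinks",  # Eating places and restaurants
    "5813": "Eating Out & Drinks",  # Bars and other drinking places
    "7998": "Entertainment",        # Aquariums
    "5598": "General",              # Snowmobile dealers
}

# Assumed fallback for MCCs that are missing from the table.
DEFAULT_CATEGORY = "General"


def categorise(mcc: str) -> str:
    """Return the high-level category for an MCC, or the default if unknown."""
    return MCC_TO_CATEGORY.get(mcc, DEFAULT_CATEGORY)


print(categorise("5812"))  # a restaurant MCC
print(categorise("0000"))  # an MCC that is not in the table
```

The point is that the category depends on the MCC and nothing else, which is why the same merchant always lands in the same bucket regardless of what you actually bought.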
If it’s so simple why does it take so long to categorise?
Because of the nature of the Prepaid Card, we don’t get the enriched financial information until the payment has settled. This means that on occasion it can take some time for the transaction to be properly categorised.
Can we use Machine Learning to do Better?
As you pointed out there are some clear limitations to using only the MCCs. Namely, they’re often not accurate, not adaptive, and don’t take into account the context of the spend.
To solve this issue we would most likely use a machine learning approach referred to as Supervised Learning. This is where we take labelled training data, which consists of features and a target. You can think of the features as data points that encode predictive information about what the true category could be: the MCC, the payment amount, date, time, customer feedback, etc. The target is the true category of the transaction.
In theory, if we collect enough data, our model can approximate how each of the features should be weighted, and thus we should be able to more or less return the true category of the payment in near real time. Additionally, as more data is collected, the model should continue to improve.
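For the curious, here is a toy sketch of the idea in plain Python. It uses a small naive Bayes classifier, which is just one simple supervised learning technique picked for illustration and not necessarily what we would use in practice. The training rows, the feature names (`mcc`, `time_of_day`, `amount`) and the bucketed values are all invented for the example.

```python
from collections import Counter, defaultdict
import math

# Made-up labelled training data: (features, true category).
TRAINING_DATA = [
    ({"mcc": "5812", "time_of_day": "evening", "amount": "medium"}, "Eating Out & Drinks"),
    ({"mcc": "5812", "time_of_day": "midday", "amount": "small"}, "Eating Out & Drinks"),
    ({"mcc": "5813", "time_of_day": "evening", "amount": "small"}, "Eating Out & Drinks"),
    ({"mcc": "7998", "time_of_day": "midday", "amount": "medium"}, "Entertainment"),
    ({"mcc": "7832", "time_of_day": "evening", "amount": "small"}, "Entertainment"),
    ({"mcc": "5598", "time_of_day": "midday", "amount": "large"}, "General"),
]


def train(examples):
    """Count how often each label, and each feature value per label, occurs."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    vocab = defaultdict(set)             # feature -> set of values seen in training
    for features, label in examples:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return label_counts, value_counts, vocab


def predict(model, features):
    """Return the label with the highest smoothed log-probability."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior: how common the label is
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Add-one smoothing so an unseen value does not zero out the score.
            score += math.log((seen + 1) / (count + len(vocab[name]) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(predict(model, {"mcc": "5812", "time_of_day": "evening", "amount": "small"}))
```

Unlike a pure lookup, every feature contributes to the score here, so things like the time of day or the amount can tip a borderline transaction one way or the other. A real system would need far more data and a proper evaluation before anyone should trust it.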
I’ve glossed over how the model algorithms work and how they’re evaluated. However, that’s an article for another day.
Build Vs. Buy
To build this kind of service ourselves requires considerable effort. Collecting data, training the model, putting the model into production, testing, monitoring… It’s an enormous task!
Instead we might want to leverage other fintechs that offer Categorisation as a Service and can do all this for us. What this means is that we can offer a fantastic categorisation experience and focus our resources on other awesome features for our customers.
Categorisation is Key
Spend Categorisation is one of the most effective ways to get context around our own financial behaviour. As such, getting this right is critical to our mission of helping people make better financial decisions.
Hopefully this writeup gives you some insight into how our current systems work and where our thinking is at around categorisation.
Thanks for reaching out