In recent years, supported by national policy, lending to small and micro enterprises has received growing attention and has become an important indicator of a bank's development potential and capabilities. Because the risk is high, however, many banks are reluctant to lend to small and micro enterprises, so avoiding risk and reducing the non-performing rate of these loans is particularly important.

At present, most banks manage the risk of small and micro enterprise loans with a whitelist mechanism, and the whitelist is built from screening rules and risk models. Both the rules and the risk models depend on data about the enterprises and their controllers. For risk management, relevant data can include central bank credit reports, taxation, reputation, finance, intangible assets, and so on. In practice, however, a bank usually holds only the central bank's credit report. On the data side, banks have no advantage over e-commerce companies or ERP software companies, which can analyze small and micro enterprise transaction data directly, as shown in Figure 1.

Fig.1

Data is an important asset for data providers, and applied sensibly it can bring them considerable benefits. However, as user privacy protection becomes stricter, data providers often find it difficult to protect user privacy while also protecting their own data security and commercial interests.

To address this, a new machine learning framework called federated learning was proposed. Federated learning is a promising approach to training models without compromising data privacy and security, as shown in Figure 2. Heterogeneous federated learning keeps each party's data from leaking to the others while achieving performance equal or close to that of a model trained on the full data.

Fig.2

For instance, Webank holds the label Y and the central bank credit feature X3 (related to risk management), and conducts heterogeneous federated modeling with a cooperative company. The cooperative company holds invoice-related data X (such as X1 and X2), as shown in Figure 3. Webank expects to improve its own prediction model. The traditional approach runs into two problems. First, the cooperative company cannot train a model because it does not have the label Y. Second, because of privacy and security concerns, it is not feasible to transfer the cooperative company's invoice data X directly to Webank. Webank, the first Internet bank in China, uses federated learning to solve this problem. Federated learning enables multiple institutions to build a joint model without physically sharing their data.

Before heterogeneous federated learning can start, Webank and the cooperative company must find their common users, identified for example by Taxpayer Identification Number, without either party learning which of the other's users do not match. This technique is called PSI (Private Set Intersection), as shown in Figure 3. Using RSA encryption, the partners can securely find the common users by exchanging encrypted intermediate results instead of raw user data. The specific details can be found at:

https://github.com/FederatedAI/FATE/tree/master/federatedml/statistic/intersect

Fig.3
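The idea behind an RSA-based intersection can be illustrated with a minimal sketch. It is not FATE's code: it assumes a blind-signature style exchange in which the partner holds the RSA key, it uses a toy key built from two small known primes that is far too weak for real use, and every function name and identifier in it is hypothetical.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA key built from two known Mersenne primes (assumption: demo only,
# far too small for real use). The partner would keep D private.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, needs Python 3.8+


def hash_to_int(identifier: str) -> int:
    """Map an identifier (e.g. a taxpayer number) to an integer mod N."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def final_hash(value: int) -> str:
    """Second hash applied to a signature before it is compared."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def blind(ids):
    """Bank side: hide each hashed ID behind a random factor r^E."""
    blinded, factors = [], []
    for x in ids:
        while True:
            r = secrets.randbelow(N - 2) + 2
            if gcd(r, N) == 1:  # r must be invertible mod N
                break
        factors.append(r)
        blinded.append(hash_to_int(x) * pow(r, E, N) % N)
    return blinded, factors


def sign_blinded(blinded):
    """Partner side: sign the blinded values without seeing the IDs."""
    return [pow(y, D, N) for y in blinded]


def partner_tokens(ids):
    """Partner side: hashed signatures of its own IDs, sent to the bank."""
    return {final_hash(pow(hash_to_int(u), D, N)) for u in ids}


def unblind_and_match(ids, signed, factors, tokens):
    """Bank side: strip the random factor and keep IDs whose token matches."""
    common = []
    for x, z, r in zip(ids, signed, factors):
        signature = z * pow(r, -1, N) % N  # equals hash_to_int(x)^D mod N
        if final_hash(signature) in tokens:
            common.append(x)
    return common


if __name__ == "__main__":
    bank_ids = ["TIN-001", "TIN-002", "TIN-003"]      # made-up identifiers
    partner_ids = ["TIN-002", "TIN-003", "TIN-004"]   # made-up identifiers
    blinded, factors = blind(bank_ids)
    signed = sign_blinded(blinded)
    tokens = partner_tokens(partner_ids)
    print(unblind_and_match(bank_ids, signed, factors, tokens))
```

In this flow the partner sees only blinded values and the bank sees only hashed signatures, so neither side can read the other's non-matching identifiers directly. The bank does still learn how many identifiers the partner holds.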

With heterogeneous federated learning, as shown in Figure 4, the two parties can train a federated model under privacy protection, and the federated model outperforms a model built on one side's data alone. To implement federated learning, Webank used FATE (Federated AI Technology Enabler), an open-source project initiated by Webank's AI Department.

Fig.4

Using FATE, Webank trained a federated model with the cooperative company, which holds the invoice data. The federated model is called Heterogeneous Logistic Regression (Hetero-LR). Unlike traditional logistic regression, Hetero-LR keeps a separate part of the model at Webank and at the partner, and each side trains on its own data. The two sides exchange encrypted intermediate results to aggregate the final model gradient and then update the model on each side. The whole training process protects both the data and the model. For more information, see FATE (https://github.com/FederatedAI/FATE/tree/master).
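The data flow of such a vertically split logistic regression can be sketched in a few lines. This is a plaintext simulation only, not FATE's Hetero-LR: the encryption of the exchanged values is omitted entirely, there is no bias term, and the function names, the toy data generator, and the hyperparameters are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def partial_scores(weights, rows):
    """Each party computes w . x on its own feature columns only."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


def local_gradient(residuals, rows):
    """Gradient for one party's weights, from shared residuals and local rows."""
    n = len(rows)
    return [sum(r * row[j] for r, row in zip(residuals, rows)) / n
            for j in range(len(rows[0]))]


def train_vertical_lr(bank_rows, partner_rows, labels, lr=0.5, epochs=200):
    """Bank holds labels and its features; partner holds only its features."""
    w_bank = [0.0] * len(bank_rows[0])
    w_partner = [0.0] * len(partner_rows[0])
    for _ in range(epochs):
        # Partner -> bank: partial scores (would be encrypted in practice).
        s_partner = partial_scores(w_partner, partner_rows)
        s_bank = partial_scores(w_bank, bank_rows)
        # Bank -> partner: residuals (would be encrypted in practice).
        residuals = [sigmoid(a + b) - y
                     for a, b, y in zip(s_bank, s_partner, labels)]
        # Each side updates its own weights using only its own raw features.
        g_bank = local_gradient(residuals, bank_rows)
        g_partner = local_gradient(residuals, partner_rows)
        w_bank = [w - lr * g for w, g in zip(w_bank, g_bank)]
        w_partner = [w - lr * g for w, g in zip(w_partner, g_partner)]
    return w_bank, w_partner


def log_loss(w_bank, w_partner, bank_rows, partner_rows, labels):
    """Average cross-entropy of the joint model."""
    eps = 1e-12
    total = 0.0
    scores = zip(partial_scores(w_bank, bank_rows),
                 partial_scores(w_partner, partner_rows), labels)
    for a, b, y in scores:
        p = min(max(sigmoid(a + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in: X3 at the bank, X1 and X2 at the partner."""
    rng = random.Random(seed)
    bank_rows, partner_rows, labels = [], [], []
    for _ in range(n):
        x3, x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        bank_rows.append([x3])
        partner_rows.append([x1, x2])
        labels.append(1 if x3 + x1 - x2 > 0 else 0)
    return bank_rows, partner_rows, labels


if __name__ == "__main__":
    bank_rows, partner_rows, labels = make_toy_data()
    w_bank, w_partner = train_vertical_lr(bank_rows, partner_rows, labels)
    print(w_bank, w_partner)
    print(log_loss(w_bank, w_partner, bank_rows, partner_rows, labels))
```

The point of the sketch is the split: raw feature rows never leave the party that owns them, and only per-sample scores and residuals cross the boundary. In plaintext those values still reveal information, which is why the real exchange is encrypted.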

This cooperation was very successful and significantly improved model performance. Compared with a model that uses only Webank's central bank credit score, the AUC of the Hetero-LR model increased by 12%. As the model improved, the loan non-performing rate declined noticeably, as shown in Figure 5.

Fig.5
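For readers unfamiliar with the metric, AUC can be read as the probability that the model scores a randomly chosen bad loan higher than a randomly chosen good one. The sketch below computes it by direct pair counting. The function name and the convention that label 1 marks a loan that became non-performing are assumptions, and the example scores are made up rather than taken from the project.

```python
def auc(labels, scores):
    """AUC by pair counting: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up risk scores for four loans; label 1 = became non-performing.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Pair counting is quadratic in the number of loans, so it suits small illustrations; on large portfolios a rank-based computation gives the same value faster.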

With FATE, the data island problem has been addressed in a creative way, which greatly expands the range of applications for artificial intelligence. At the same time, user privacy and institutional data security are much better protected.