
# Machine learning model explainability through Shapley values

by Christiane Ahlheim & Markus Kunesch

October 31, 2019

In the first of our series of technical blogs exploring explainability, we will look at how to use Shapley values to explain black-box models.

Model explainability aims to provide visibility and transparency into the decision-making of a model.

On a global level, this means that we understand which features the model is using, and to what extent, when making a decision. For each feature, we would also want to understand how it is used, depending on the values it takes.

And on a local level, that is, for any individual data point, we would want to see why the model made a certain decision. This can give us more insight into where and why the model might fail.

## What are Shapley values?

Shapley values are a concept from cooperative game theory (see Wikipedia for more info). They measure how much an individual player contributes to the outcome of a game.

To compute them, we look at every coalition (subset) of players and the outcome it achieves.

The Shapley value of player $x$ is the weighted average, over all coalitions that do not include $x$, of the change in outcome when $x$ joins the coalition.

When using Shapley values for model explainability, we can think of each feature as a player, the game being the prediction of the target variable, and the score being the model output when predicting the target.

We calculate Shapley values by taking different coalitions of features, that is, subsets of varying size in which the features keep their true values.

In an ideal world, we would pass only those features to the model, but unfortunately a trained ML model needs a value for every feature it was trained on.

We therefore fill the features that are not part of the current coalition with values spliced in from randomly chosen rows of the dataset.
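A minimal sketch of this filling step, assuming rows are plain Python lists and that all absent features are copied from a single randomly drawn background row (splicing each feature from an independent row is another option). `splice_row` is a hypothetical helper, not a library function.

```python
import random

def splice_row(x, background, coalition, rng):
    """Return a copy of row `x` that keeps the features in `coalition`
    and fills every other feature from one randomly drawn background row.

    x          -- list of feature values for the row being explained
    background -- list of rows (lists) sampled from the dataset
    coalition  -- set of feature indices that keep their true values
    rng        -- random.Random instance, for reproducibility
    """
    donor = rng.choice(background)  # one random row supplies the absent features
    return [x[j] if j in coalition else donor[j] for j in range(len(x))]

rng = random.Random(0)
background = [[0.0, 10.0, 100.0], [1.0, 20.0, 200.0]]  # hypothetical background rows
x = [5.0, 50.0, 500.0]                                  # hypothetical row to explain
hybrid = splice_row(x, background, {0, 2}, rng)
```

Here features 0 and 2 keep their true values, while feature 1 comes from whichever background row was drawn.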

The number of possible coalitions grows exponentially, so in practice we typically need to sample coalitions.

The noise that is introduced by random splicing and the sampling of coalitions causes the Shapley values to be estimates, and means they will have an uncertainty associated with them.

The more coalitions we sample, the smaller this uncertainty. The equation below describes how local Shapley values are calculated:

$\phi_y(x_i; x) = \sum_{S \subseteq \{1, \dots, n\} \setminus \{i\}} \, \frac{|S|! \, (n-1-|S|)!}{n!} \, \big[ f_y(x_S \cup x_i) - f_y(x_S) \big]$

Here, $\phi_y(x_i; x)$ is the Shapley value for feature $x_i$ for one datapoint $x$ in our dataset.

The sum is over all coalitions $S$ not containing feature $i$, and $n$ is the dimensionality of $x$.

The factor $\big[f_y(x_S \cup x_i) - f_y(x_S)\big]$ is the marginal contribution to $f_y(x)$ that feature $x_i$ makes when added to coalition $S$.

This marginal contribution is multiplied by $|S|!$ and $(n-1-|S|)!$, the numbers of permutations of $x_S$ and of $x \setminus (x_S \cup x_i)$, respectively.

Dividing by $n!$ turns the sum into an average over all orderings in which $x$ can be built up one feature at a time, so $\phi_y(x_i; x)$ is the marginal contribution of $x_i$ averaged over those orderings.
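For a handful of features, the sum can be evaluated exactly by enumerating every coalition. The sketch below is a minimal illustration, not the implementation behind the results in this post: `exact_shapley`, the three-feature linear `model` and the two-row `background` are hypothetical, and absent features are filled by averaging over the background rows, in the spirit of the random splicing described above. For a linear model this choice makes the answer easy to check, since the Shapley value of feature $j$ reduces to $w_j (x_j - \bar{b}_j)$, where $\bar{b}_j$ is the background mean of feature $j$.

```python
from itertools import combinations
from math import factorial

def exact_shapley(value, n, i):
    """Exact Shapley value of feature i by enumerating all coalitions S without i.

    value -- maps a frozenset of feature indices S to a number, playing the
             role of f_y(x_S)
    n     -- total number of features
    """
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for size in range(n):  # |S| runs from 0 to n-1
        weight = factorial(size) * factorial(n - 1 - size) / factorial(n)
        for members in combinations(others, size):
            S = frozenset(members)
            phi += weight * (value(S | {i}) - value(S))
    return phi

# Hypothetical linear model and background data, chosen so the answer is known.
w = [2.0, -1.0, 0.5]
background = [[0.0, 1.0, 2.0], [2.0, 3.0, 0.0]]
x = [3.0, 0.0, 4.0]

def model(row):
    return sum(wj * rj for wj, rj in zip(w, row))

def value(S):
    # Average model output when features outside S are taken from each background row.
    hybrids = [[x[j] if j in S else b[j] for j in range(len(x))] for b in background]
    return sum(model(h) for h in hybrids) / len(hybrids)

phis = [exact_shapley(value, len(x), i) for i in range(len(x))]
print(phis)  # close to [4.0, 2.0, 1.5]
```

Besides matching $w_j (x_j - \bar{b}_j)$, the three values add up to `model(x) - value(frozenset())`, that is 8.0 - 0.5 = 7.5. Enumeration needs $2^{n-1}$ coalitions per feature, which is why coalitions have to be sampled for realistic feature counts.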

Shapley values have a number of useful properties and benefits over other measures of feature importance:

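1. Efficiency: the Shapley values of all features sum to the total payout of the game, measured relative to a baseline in which no features are known; here that payout is the model accuracy (globally) or the model output (locally).
2. Symmetry: two features that contribute equally to every coalition have the same Shapley value.
3. Linearity: for a linear ensemble of models, the Shapley value of a feature is the same linear combination of its Shapley values in the individual models.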
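These properties are easy to check numerically on small toy games. A minimal sketch, assuming two hypothetical value functions `v1` and `v2` that stand in for the models of a linear ensemble with assumed weights 0.7 and 0.3. Shapley values are computed here as the average marginal contribution over all $n!$ orderings of the features, which gives the same result as the weighted sum over coalitions.

```python
from itertools import permutations
from math import factorial

def shapley_by_orderings(value, n):
    """Shapley values of all n features: the marginal contribution of each
    feature, averaged over every ordering in which the full set can be built."""
    phi = [0.0] * n
    for order in permutations(range(n)):
        S = frozenset()
        for feature in order:
            phi[feature] += value(S | {feature}) - value(S)
            S = S | {feature}
    return [p / factorial(n) for p in phi]

def v1(S):
    # Hypothetical model 1: features 0 and 1 are interchangeable (only their
    # count matters); feature 2 always adds 3.
    return len(S & {0, 1}) ** 2 + (3.0 if 2 in S else 0.0)

def v2(S):
    # Hypothetical model 2: pays 5 only when features 0 and 2 are both present.
    return 5.0 if {0, 2} <= S else 0.0

def ensemble(S):
    # Linear ensemble with assumed weights 0.7 and 0.3.
    return 0.7 * v1(S) + 0.3 * v2(S)

phi1 = shapley_by_orderings(v1, 3)
phi2 = shapley_by_orderings(v2, 3)
phi_ens = shapley_by_orderings(ensemble, 3)
```

`phi1` comes out as `[2.0, 2.0, 3.0]`: features 0 and 1, which are interchangeable in `v1`, get the same value; the three values add up to `v1` of the full coalition, 7.0 (`v1` of the empty coalition is 0); and `phi_ens` matches `0.7 * phi1 + 0.3 * phi2` up to floating-point rounding.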

Shapley values can be defined on a global level, indicating how the model overall uses the features, and a local level, indicating how the model made a decision for an individual data point.

The local Shapley values sum to the model output and the global Shapley values sum to the overall model accuracy (in both cases relative to a baseline with no features), so they can be interpreted intuitively, independently of the specifics of the model.

In what follows, we’ll walk through an example data set and see how global and local Shapley values can be calculated, visualised, and interpreted.

The dataset we are using is the Lending Club dataset. LendingClub is the world’s largest peer-to-peer lending platform. According to Wikipedia:

> Lending Club enables borrowers to create unsecured personal loans between \$1,000 and \$40,000. Investors can search and browse the loan listings on Lending Club website and select loans that they want to invest in based on the information supplied about the borrower, amount of loan, loan grade, and loan purpose. Investors make money from interest. Lending Club makes money by charging borrowers an origination fee and investors a service fee.

We’ve trained a neural network on Lending Club’s data to predict loan outcomes: charged off versus fully paid.

## Global Shapley values

We’ll start with an example for global Shapley values. They tell us how the model works overall.

For each feature, we draw multiple coalitions as described above and calculate the change in model accuracy as we add the feature in question to the coalition.

The weighted average of this change in accuracy over all drawn coalitions is the estimate of the Shapley value.

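The uncertainty in the estimate is the error in the mean, that is, the standard deviation of the accuracy differences over all drawn coalitions divided by $\sqrt{M}$, where $M$ is the number of coalitions we sampled and $\hat{f}(x^{m}_{+j})$ and $\hat{f}(x^{m}_{-j})$ are the model accuracies with and without feature $j$ added to the $m$-th coalition:

$\sigma_{\hat{\phi}_j} = \frac{1}{\sqrt{M}} \sqrt{\frac{1}{M}\sum_{m=1}^M \Big( \big[\hat{f}(x^{m}_{+j})-\hat{f}(x^{m}_{-j})\big] - \hat{\phi}_{j} \Big)^2}$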
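The procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the Lending Club results: the dataset and the classifier are hypothetical toys, coalitions are drawn by shuffling the features and keeping those that come before feature $j$ (which samples each coalition with its Shapley weight, so a plain mean of the accuracy differences serves as the weighted average), and absent features are spliced in from randomly chosen rows.

```python
import math
import random
import statistics

def accuracy(model, rows, labels):
    """Fraction of rows where the model's predicted class equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def splice_all(rows, donors, keep):
    """Keep the features in `keep`; take every other feature from the donor row."""
    n = len(rows[0])
    return [[r[k] if k in keep else d[k] for k in range(n)]
            for r, d in zip(rows, donors)]

def global_shapley(model, rows, labels, j, n_coalitions, rng):
    """Monte Carlo estimate of feature j's global Shapley value, in units of
    accuracy, together with its standard error."""
    n = len(rows[0])
    deltas = []
    for _ in range(n_coalitions):
        # Random feature ordering; the coalition is every feature placed before j.
        order = list(range(n))
        rng.shuffle(order)
        coalition = set(order[:order.index(j)])
        donors = [rng.choice(rows) for _ in rows]  # one random donor row per row
        acc_with = accuracy(model, splice_all(rows, donors, coalition | {j}), labels)
        acc_without = accuracy(model, splice_all(rows, donors, coalition), labels)
        deltas.append(acc_with - acc_without)
    estimate = statistics.mean(deltas)
    # Population standard deviation (1/M inside the root), divided by sqrt(M).
    std_err = statistics.pstdev(deltas) / math.sqrt(n_coalitions)
    return estimate, std_err

# Hypothetical data: the label depends on feature 0 only; feature 1 is noise.
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]

def model(r):
    # Toy classifier standing in for the trained network.
    return int(r[0] > 0.5)

est0, se0 = global_shapley(model, rows, labels, 0, 50, rng)
est1, se1 = global_shapley(model, rows, labels, 1, 50, rng)
```

Because the toy classifier ignores feature 1, every sampled difference for that feature is exactly zero, so `est1` and `se1` are both 0.0, while `est0` comes out clearly positive.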

We can visualise these results to see which features are more or less important.

The most important feature by far is sub_grade. The error bars are small overall, so we can be fairly confident about both the ordering and the magnitudes of these Shapley values.

If they were larger, we would need to sample more coalitions, which can become costly, particularly for models with a large number of features.

## Local Shapley values

We’ll now move on to calculating local Shapley values. Local Shapley values are calculated on a single-row basis, i.e., for a single datapoint.

There are two things we could look at:

1) What is the distribution of local Shapley values for a given feature across the dataset?

This tells us how the model might be using the feature differently, depending on the value the feature takes.

2) What are the local Shapley values of all features for a specific row in our dataset?

This tells us why the model made a particular decision for a row in our dataset.
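For one row, the same sampling scheme yields an estimate and an error bar for each feature, which is what a per-row bar chart shows. A minimal sketch, assuming a hypothetical model output `score` (for a classifier this could be the predicted probability of one class) and a hypothetical two-row background set; coalitions come from random feature orderings and absent features from one randomly drawn background row.

```python
import math
import random
import statistics

def local_shapley(f, x, background, n_samples, rng):
    """Monte Carlo local Shapley values for every feature of one row x.

    f          -- model output for a single row (e.g. a predicted probability)
    background -- rows used to fill in features outside the coalition
    Returns one (estimate, standard_error) pair per feature.
    """
    n = len(x)
    results = []
    for j in range(n):
        deltas = []
        for _ in range(n_samples):
            order = list(range(n))
            rng.shuffle(order)
            coalition = set(order[:order.index(j)])  # features placed before j
            donor = rng.choice(background)
            without_j = [x[k] if k in coalition else donor[k] for k in range(n)]
            with_j = list(without_j)
            with_j[j] = x[j]  # add feature j's true value to the coalition
            deltas.append(f(with_j) - f(without_j))
        results.append((statistics.mean(deltas),
                        statistics.pstdev(deltas) / math.sqrt(n_samples)))
    return results

def score(r):
    # Hypothetical model output: uses features 0 and 1, ignores feature 2.
    return 2.0 * r[0] - r[1]

background = [[0.0, 0.0, 5.0], [1.0, 2.0, -5.0]]  # hypothetical background rows
x = [3.0, 1.0, 9.0]                                # hypothetical row to explain
phi = local_shapley(score, x, background, 2000, random.Random(1))
```

For this linear toy model the exact local Shapley value of feature $j$ is $w_j (x_j - \bar{b}_j)$, with $\bar{b}_j$ the background mean of feature $j$, giving 5.0, 0.0 and 0.0. The estimates land close to those values, and feature 2 is exactly `(0.0, 0.0)` because the model ignores it.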

### 1) Distribution of local Shapley values

We’ll look at the feature fico_score, which is the third most important feature of our model, based on the global Shapley values.

We calculate the local Shapley value of fico_score for a number of datapoints and plot it against the value that fico_score takes for each datapoint.
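Producing the points for such a plot amounts to computing one feature's local Shapley value for many rows. With only two features, the Shapley formula can be written out by hand: both coalitions that exclude feature 0 get weight 1/2. The sketch below uses a hypothetical two-feature model, with feature 0 in the role of fico_score; `model`, `background` and `rows` are all made up for illustration, and absent features are averaged over the background rows.

```python
import math

def model(r):
    # Hypothetical two-feature model returning a probability; feature 0 plays
    # the role of fico_score.
    return 1.0 / (1.0 + math.exp(-(r[0] - 2.0 * r[1])))

background = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]  # hypothetical background rows

def value(x, S):
    # Average output when features outside S are filled from each background row.
    return sum(model([x[k] if k in S else b[k] for k in range(2)])
               for b in background) / len(background)

def phi_feature0(x):
    # With n = 2, the two coalitions without feature 0 (empty and {1})
    # each get weight 0!1!/2! = 1!0!/2! = 1/2.
    return (0.5 * (value(x, {0}) - value(x, set()))
            + 0.5 * (value(x, {0, 1}) - value(x, {1})))

rows = [[0.5 * i, 0.3 * (i % 4)] for i in range(10)]  # hypothetical rows to explain
pairs = [(x[0], phi_feature0(x)) for x in rows]      # (feature value, Shapley value) points
```

Each pair is one point of a scatter plot of the feature's value against its local Shapley value.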

In the plot, we further colour the data points by their true label and set the marker according to whether the model's prediction is correct or incorrect.

We see that most data points fall closely together, and that the local Shapley values of fico_score range from -0.1, where the feature pushes the model towards the wrong label, to +0.15, where it supports a decision for the true label.

From this plot, we can also identify outliers – points that do not fall within that narrow band, and where the model makes the wrong prediction.

Of course, we would now like to understand better why the model made the wrong prediction.

To that end, we can look at a bar chart like the one before, but this time showing the local Shapley values of all features for the particular data point we are interested in.

The most important feature by far here is application_type – but we see that it is actually pushing the model prediction away from the true label. Why would this happen?

We can plot the distribution of application_type, split by the true label. This shows that application_type is highly imbalanced: only a small number of applications are filed jointly.

The datapoint we were investigating is one of these joint applications. Because there is so little data for this group, our model is less accurate here and cannot be trusted for people filing joint applications.

This example highlights how important explainability is for ML models. Without it, we would have missed this relationship and judged our model on its overall performance alone. By calculating Shapley values, we were able to understand how the model uses features in individual cases to make a decision, and we can now take action to build a more robust model and prevent these misclassifications.

In our next blog we will be looking at different approaches to explainability and how they compare.

