
# How to repair an unfair classifier part 2: Achieving separation fairness

At Faculty, we put a lot of research effort into AI safety, of which fairness is a key component. In the first part of this blog, I defined three different measures of observational fairness: independence, separation and sufficiency. I also showed a way of post-processing the model scores of a classifier to achieve independence.

This time, I’ll describe a post-processing method that we can use to achieve separation. I will first explain the maths behind the algorithm, and then I will take you through a Python example to see the maths at work.

###### Notation

As a reminder, separation fairness means that the probability distribution p_{A, R, Y}(a, r, y) satisfies

p_{R \mid Y, A}(r \mid y, a) = p_{R \mid Y}(r \mid y),

where A, R and Y are the random variables corresponding to the *protected attribute*, *model score* and *true label* respectively. In other words, it is OK if the model scores depend on the protected attribute, but this should only happen through the true label Y.

The graphical model for separation is:

As in the first part of this blog, I will restrict the discussion to the case of binary Y\in\{0, 1\}. In this case, separation means that the classifier will have identical ROC curves for each value of the protected attribute. I will denote such ROC curves by

C_a: \left[0,1\right] \rightarrow \left[0,1\right] \times \left[0,1\right]:

C_a(t) = (\mathrm{FPR}_a(t), \, \mathrm{TPR}_a(t))

where the (A-conditional) false positive rate (FPR) and true positive rate (TPR) are \mathbb{P}(R > t \mid Y = 0, A = a) and \mathbb{P}(R > t \mid Y = 1, A = a) respectively. The variable t is the threshold used to make predictions: our classifier predicts y_{\text{pred}} = 1 whenever R > t, and y_{\text{pred}} = 0 otherwise.
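To make these definitions concrete, here is a minimal sketch that estimates the point C_a(t) for each group from a sample. The function name `group_rates` and the toy arrays are my own, invented for illustration; they are not part of the repo used later in this post.

```python
import numpy as np


def group_rates(y_true, scores, groups, threshold):
    """Return {group: (FPR, TPR)} for the predictions `scores > threshold`."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    y_pred = scores > threshold
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        negatives = in_group & ~y_true
        positives = in_group & y_true
        fpr = y_pred[negatives].mean()  # estimates P(R > t | Y = 0, A = g)
        tpr = y_pred[positives].mean()  # estimates P(R > t | Y = 1, A = g)
        rates[g.item()] = (float(fpr), float(tpr))
    return rates


# Toy data, invented for illustration only.
y_toy = [0, 0, 1, 1, 0, 0, 1, 1]
r_toy = [0.1, 0.6, 0.4, 0.9, 0.2, 0.3, 0.7, 0.8]
a_toy = [0, 0, 0, 0, 1, 1, 1, 1]
print(group_rates(y_toy, r_toy, a_toy, threshold=0.5))
# {0: (0.5, 0.5), 1: (0.0, 1.0)}
```

In this toy sample the two groups land on different (FPR, TPR) points at the same threshold, so the thresholded predictions are not separation-fair.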

###### Algorithm

The problem of separation gets a lot simpler if we are only interested in the binary decision, and not the underlying scores. In this case, the problem of achieving separation fairness is reduced to achieving the same TPR and FPR for each protected group. The algorithm to achieve this was introduced by Hardt, Price and Srebro in their paper *Equality of Opportunity in Supervised Learning*. It goes as follows:

- Find the ROC curves C_a for each protected group a \in A.
- Find the minimal ROC curve (see image below).
- Find the best combination of TPR and FPR that lies on the minimal ROC curve. Call this point X.
- In each group a \in A promote the threshold to a random variable, T_a, such that \langle C_a(t) \rangle_{T_a} = X.
- When making a prediction for a data point in the group a, set the threshold by taking a sample from the distribution for T_a.

The following image illustrates the minimal ROC curve:

*Fig. 1: Minimal ROC curve obtained from the ROC curves of two separate groups.*
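A rough sketch of how such a curve could be computed, assuming that "minimal" means the pointwise minimum of the TPR at each FPR, as the figure suggests. The function `minimal_roc` and the two toy curves are hypothetical and chosen only to show two curves that cross.

```python
import numpy as np


def minimal_roc(curves, grid_size=101):
    """Pointwise minimum of several ROC curves on a common FPR grid.

    `curves` maps each group to a pair (fpr, tpr) of arrays,
    assumed here to be sorted by strictly increasing FPR.
    """
    fpr_grid = np.linspace(0.0, 1.0, grid_size)
    # Linearly interpolate every curve onto the shared grid.
    tprs = [np.interp(fpr_grid, fpr, tpr) for fpr, tpr in curves.values()]
    return fpr_grid, np.min(tprs, axis=0)


# Two toy piecewise-linear ROC curves, invented for illustration.
curves = {
    0: (np.array([0.0, 0.2, 1.0]), np.array([0.0, 0.6, 1.0])),
    1: (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.9, 1.0])),
}
fpr_grid, tpr_min = minimal_roc(curves, grid_size=11)
for f, t in zip(fpr_grid, tpr_min):
    print(f"FPR={f:.1f}  minimal TPR={t:.2f}")
```

With these toy curves, group 1 sets the minimum at low FPR and group 0 sets it at high FPR, so the minimal curve is not the curve of either group alone.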

Some additional things to note here:

• The meaning of *best combination of TPR and FPR* is context-dependent. In practice, there will typically exist an external criterion that allows us to select the best rate.

• Promoting the threshold to a random variable works because, on sufficient data, \text{FPR}(t) becomes \langle \text{FPR}(t) \rangle_T. The same holds for the true positive rate.

• The simplest approach is to promote the threshold for each group to a categorical random variable that can take only two values, t_0 and t_1, with probabilities 1 - p and p respectively. Then, to achieve separation fairness we need to solve the equation

X = C_a(t_0) + p \left( C_a(t_1) - C_a(t_0) \right)

for t_0, t_1 and p, in each group a \in A.
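Once a pair of thresholds has been chosen, the equation above is linear in p. The sketch below solves it in the least-squares sense and reports how far the resulting mixture is from X. It does not search for t_0 and t_1, which requires scanning the group's ROC curve. The function `mixing_probability` and the example points are my own assumptions, not the repo's implementation.

```python
import numpy as np


def mixing_probability(point_t0, point_t1, target):
    """Solve target = C(t0) + p * (C(t1) - C(t0)) for p by least squares.

    Each argument is an (FPR, TPR) pair. Returns (p, residual), where the
    residual is close to 0 only if `target` lies on the segment joining
    the two points.
    """
    p0, p1, x = (np.asarray(v, dtype=float) for v in (point_t0, point_t1, target))
    direction = p1 - p0
    norm_sq = float(direction @ direction)
    if norm_sq == 0.0:
        # Both thresholds give the same rates, so any p works; return 0.5.
        return 0.5, float(np.linalg.norm(x - p0))
    p = float(np.clip((x - p0) @ direction / norm_sq, 0.0, 1.0))
    residual = float(np.linalg.norm(p0 + p * direction - x))
    return p, residual


# Invented example: C(t0) = (0.9, 1.0), C(t1) = (0.1, 0.5), X = (0.4, 0.6875).
p, residual = mixing_probability((0.9, 1.0), (0.1, 0.5), (0.4, 0.6875))
print(f"p={p:.3f}, residual={residual:.3f}")
```

A non-zero residual signals that X is not on the segment between the two chosen points, in which case a different pair of thresholds is needed.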

###### Example

It is quite difficult to write a piece of code that is robust enough to fix any classifier trained on any source of data. Instead, I’ve written a minimal example that works only for the example in this blog. You can get the code from this repo, where you will also find a Jupyter notebook with the same example and the script included here. I’ve also created a synthetic data set for this example. You can get hold of the data here.

In this example, we will incrementally write a Python script to:

- Load the data and train an out-of-the-box logistic regression.
- Show that the predictions made by the logistic regression are not separation-fair.
- Repair the logistic regression using the implementation provided in the repo.
- Show that the repaired predictions are separation-fair on the test set.

**Out-of-the-box logistic regression**

Firstly, we need to load the data, train the model, and analyse the fairness of the model, corresponding to steps 1 and 2 above. Here is the content of *example.py* for these steps:

```
# example.py
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pandas as pd

# utils.py and separation_mvp.py are in the repo
from utils import classification_report
from separation_mvp import SeparatedClassifier

url = "https://raw.githubusercontent.com/omarfsosa/datasets/master/fairness_synthetic_data.csv"
df = pd.read_csv(url)  # (1)

X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(
    df.drop(columns="y"),
    df["y"],
    df["A"],  # (2)
    test_size=.6,  # (3)
    random_state=42,
)

clf = LogisticRegression(solver="lbfgs")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, A_test))  # (4)
```

Notes (corresponding to numbered comments in the example above):

- (1) The data (shown below) consists of a binary target `y` and four features, one of which is the protected attribute `A`. The protected attribute takes only 2 values, 0 or 1. I’ve intentionally created the data so that it is biased: P(y = 1 \mid A = 1) \approx 0.39 while P(y = 1 \mid A = 0) \approx 0.25. This is what the output of `df.head()` looks like:

|   | y | A | X1 | X2 | X3 |
|---|---|---|---|---|---|
| 0 | 0.0 | 0 | 1.0 | 0.0 | 0.750524 |
| 1 | 0.0 | 0 | 0.0 | 1.0 | 0.550230 |
| 2 | 1.0 | 0 | 1.0 | 1.0 | 0.672612 |
| 3 | 0.0 | 0 | 1.0 | 0.0 | 0.329655 |
| 4 | 0.0 | 0 | 1.0 | 0.0 | 0.849663 |

- (2) To fix the classifier later on, we are going to need the group labels for the train and test sets, so we keep track of these in our split.
- (3) We need a large test set because to measure observational fairness we have to approximate expectations with the sample average.
- (4) The `classification_report` shows the FPR and TPR broken down by group.

Running the above script prints out:

| A | TPR | FPR |
|---|---|---|
| 0 | 0.41 | 0.10 |
| 1 | 0.88 | 0.11 |
| All | 0.69 | 0.11 |

The performance in the FPR is similar for both groups, but the TPR is very different. The plot below shows the full ROC curves (code not included):

Note how the ROC curve for group `A=1` lies entirely above the one for group `A=0`. Therefore, in this case, the minimal ROC curve happens to be the same as the ROC curve for the group `A=0`. For this example, I’ve also assumed that the optimal rate for group `A=0` is at the point (FPR, TPR) = (0.26, 0.84) (in practice, the optimal point depends on your specific case and on your trade-off between false positives and false negatives). The next section will show you how to achieve the same classification performance in both groups simultaneously.
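The plotting code is not part of this post, but per-group ROC curves can be obtained with scikit-learn's `roc_curve`. The sketch below is one possible way to do it. The helper `roc_by_group` and the synthetic scores are my own and are unrelated to the data set used in this example.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def roc_by_group(y_true, scores, groups):
    """Return {group: (fpr, tpr, thresholds)}, one ROC curve per group."""
    y_true, scores, groups = map(np.asarray, (y_true, scores, groups))
    return {
        g.item(): roc_curve(y_true[groups == g], scores[groups == g])
        for g in np.unique(groups)
    }


# Synthetic data, invented for illustration: scores are noisier in group 0.
rng = np.random.default_rng(0)
n = 2000
groups = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)
noise_scale = np.where(groups == 0, 1.5, 0.5)
scores = 1 / (1 + np.exp(-(y + noise_scale * rng.normal(size=n))))

roc_curves = roc_by_group(y, scores, groups)
for g, (fpr, tpr, _) in roc_curves.items():
    print(f"Group {g}: AUC = {auc(fpr, tpr):.2f}")
```

Each `(fpr, tpr)` pair can be passed straight to a plotting call such as matplotlib's `plt.plot(fpr, tpr)` to draw the curves.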

**Repairing the logistic regression**

To use the `SeparatedClassifier`, we need access to the scores R predicted by the original logistic regression in both the training and test sets. For the implementation in this example, we also need to provide the desired TPR and FPR ourselves. The following lines continue to build on the previous script, showing how `SeparatedClassifier` is used.

```
# example.py continued
R_train = clf.predict_proba(X_train)[:, 1]
R_test = clf.predict_proba(X_test)[:, 1]

goal_tpr, goal_fpr = 0.84, 0.26
fair_clf = SeparatedClassifier(y_train, R_train, A_train)
fair_clf.fit(goal_fpr, goal_tpr)

# Each entry holds (t0, t1, p) for one group
for k, v in fair_clf.randomized_thresholds.items():
    print(f"Group {k}: t0={v[0]:.2f}, t1={v[1]:.2f}, p={v[2]:.2f}")
```

This prints out:

Group 0: t0=0.22, t1=0.22, p=0.50

Group 1: t0=0.03, t1=0.68, p=0.62

The output shows the thresholds and probabilities that will be used by the `SeparatedClassifier` to make predictions. Note how, for the group `A=0`, `t0` is equal to `t1`. This is expected, because the rate that we are trying to achieve is already achievable in this group without the need for randomisation. The values found for `A=1`, on the other hand, mean that 62% of the time, whenever we encounter a score r belonging to this group, we are going to predict y=1 if r > 0.68 and y=0 otherwise. The rest of the time we are going to predict y=1 if r > 0.03 and y=0 otherwise. Finally, we see how this works on the test set:
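To illustrate the prediction rule just described, here is a minimal sketch of a randomised-threshold predictor. It is not the repo's `fair_predict`: the function `randomized_predict`, its `seed` argument and the test scores are assumptions of mine. Only the thresholds are taken from the output printed above.

```python
import numpy as np


def randomized_predict(scores, groups, thresholds, seed=None):
    """Predict with a threshold drawn independently for every data point.

    `thresholds` maps each group to (t0, t1, p): t1 is used with
    probability p and t0 with probability 1 - p.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    y_pred = np.zeros(scores.shape, dtype=int)
    for g, (t0, t1, p) in thresholds.items():
        in_group = groups == g
        use_t1 = rng.random(in_group.sum()) < p
        drawn = np.where(use_t1, t1, t0)
        y_pred[in_group] = scores[in_group] > drawn
    return y_pred


# Thresholds copied from the output above.
thresholds = {0: (0.22, 0.22, 0.50), 1: (0.03, 0.68, 0.62)}

# Many points from group 1 with the same score r = 0.5 (invented input).
scores = np.full(20_000, 0.5)
groups = np.ones(20_000, dtype=int)
fraction = randomized_predict(scores, groups, thresholds, seed=0).mean()
print(f"Fraction predicted positive: {fraction:.2f}")  # close to 1 - 0.62 = 0.38
```

A score of 0.5 in group `A=1` sits between the two thresholds, so it is predicted positive only when the lower threshold is drawn. Fixing the seed makes the draws reproducible, but the prediction for an individual point still depends on the draw.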

```
# example.py continued
y_pred_fair = fair_clf.fair_predict(R_test, A_test)
print(classification_report(y_test, y_pred_fair, A_test))
```

This prints out:

| A | TPR | FPR |
|---|---|---|
| 0 | 0.84 | 0.26 |
| 1 | 0.84 | 0.26 |
| All | 0.84 | 0.26 |

(results may vary slightly due to the randomisation)

###### Closing remarks

We’ve seen that a robust `SeparatedClassifier` might be difficult to code, but achieving separation via post-processing is actually very simple. A problem with the algorithm used here is that it deliberately damages the performance of the original classifier in order to achieve fairness. If one wishes to reduce this impact on performance, one has to collect more data about the disadvantaged groups. Depending on the problem, however, collecting more data is not always possible or a good thing to do. A second issue with this algorithm is the stochastic nature of the predictions: running `fair_clf.fair_predict(R_test, A_test)` will give similar but different predictions every time the method is called. There are not many circumstances in which this behaviour is acceptable.

In any case, post-processing should be a last resort, only to be used if pre-processing or training-time fairness could not be implemented.