Learning Unit 3: Fairness & Non-Discrimination
Fairness is one of the most widely discussed challenges in modern data science, yet it remains difficult to define and even harder to achieve in practice. This unit examines how unequal social structures, model design choices, and data limitations can create or reinforce discriminatory outcomes. It also highlights how bias arises across the data science life cycle and why technical accuracy alone cannot guarantee equitable results.
Through real examples, we will look at different forms of bias and discrimination, how they appear in data-driven systems, and why careful definitions of fairness are essential for responsible decision making.
Exercises
Task 1 - Bias
This learning unit deals with the question of bias in Machine Learning. For further reading, please refer to the following article: "A Survey on Bias and Fairness in Machine Learning", https://dl.acm.org/doi/10.1145/3457607.
In this learning unit, we will continue our analysis of the CDC Diabetes Health Indicators dataset. We will investigate possible biases in this dataset and train a diabetes prediction model. You can find more information about it in the UCI Machine Learning Repository here. In the Resources section, we have also provided a file called "diabetes_012_health_indicators_BRFSS2015.csv", which is one of the three CSV files in the dataset.
First, let's take another look at the attributes of our dataset. In the last learning unit, we examined the distributions of the attributes and their meanings. This time, we want to take a closer look at the potential problems that can arise from including certain features.
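As a starting point, here is a minimal sketch for loading the provided CSV and checking how potentially sensitive attributes relate to the target. The file name matches the Resources section; the column names (Diabetes_012, Sex, Income) follow the BRFSS 2015 codebook and should be verified against your copy of the data.

```python
import pandas as pd

# Load the BRFSS 2015 diabetes indicators file from the Resources section
df = pd.read_csv("diabetes_012_health_indicators_BRFSS2015.csv")

# Overview of all attributes and their value ranges
print(df.describe().T)

# How is the target distributed within each sex group?
# (0 = no diabetes, 1 = prediabetes, 2 = diabetes)
print(df.groupby("Sex")["Diabetes_012"].value_counts(normalize=True).unstack())

# Income is a coded ordinal feature; check whether diabetes prevalence
# differs across income brackets before deciding to include it
print(df.groupby("Income")["Diabetes_012"].mean())
```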
Task 2 - Fairness Metrics
In this learning unit, we are going to investigate different fairness approaches and techniques to mitigate bias. We will continue using the CDC Diabetes Health Indicators dataset and your predictions from the previous unit.
During the lecture, the following fairness metrics were introduced: Group Fairness, Equalized Odds (equal error rates), and Test Fairness (calibration). In this exercise, we will revisit a few metrics (Group Fairness, Predictive Parity (PPV / Precision), and False Positive Error Rate Balance (FPR)) and briefly explain what exactly they capture.
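To make concrete what these three metrics measure, here is a minimal sketch that computes them per group with plain NumPy. It assumes binary labels and predictions (e.g. diabetes vs. no diabetes) and a coded sensitive attribute; it is an illustration only, not a replacement for the functions provided in fairness_functions.py.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group rates behind the three metrics discussed above."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        tp = np.sum((yp == 1) & (yt == 1))
        fp = np.sum((yp == 1) & (yt == 0))
        tn = np.sum((yp == 0) & (yt == 0))
        rates[g] = {
            # Group Fairness (statistical parity): P(yhat = 1 | group = g)
            "positive_rate": np.mean(yp == 1),
            # Predictive Parity: PPV / precision, P(y = 1 | yhat = 1, group = g)
            "ppv": tp / (tp + fp) if (tp + fp) > 0 else np.nan,
            # False Positive Error Rate Balance: FPR, P(yhat = 1 | y = 0, group = g)
            "fpr": fp / (fp + tn) if (fp + tn) > 0 else np.nan,
        }
    return rates

# Toy example; in the exercise, y_pred would come from your diabetes model
# and group from a sensitive attribute such as Sex
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(group_rates(y_true, y_pred, group))
```

Group Fairness compares how often each group receives a positive prediction, Predictive Parity compares how trustworthy a positive prediction is across groups, and FPR balance compares how often members of each group are wrongly flagged.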
Afterward, you will freely evaluate the dataset and search for (un)fairness. You do not have to implement these metrics yourself; we provide already-implemented functions. You do not need to understand their implementation in detail, but feel free to inspect or adjust them in the fairness_functions.py file.
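One possible way to structure this open-ended search is to compare the gap in each per-group rate across several candidate sensitive attributes. The sketch below reuses the group_rates helper from above (the functions in fairness_functions.py may offer equivalents under different names); the attribute list and the alignment of y_true and y_pred with df are assumptions you would adapt to your own setup.

```python
import numpy as np

# Assumed: df is the loaded dataset, and y_true / y_pred are the labels and
# your model's predictions for the same rows, in the same order.
candidate_attributes = ["Sex", "Age", "Income", "Education"]  # assumed column names

for attr in candidate_attributes:
    rates = group_rates(y_true, y_pred, df[attr].to_numpy())
    for metric in ("positive_rate", "ppv", "fpr"):
        values = [r[metric] for r in rates.values()]
        gap = np.nanmax(values) - np.nanmin(values)
        print(f"{attr:>10} {metric:>14}: max gap across groups = {gap:.3f}")
```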
