Differential Privacy in Data
The amount of randomness, or noise, added to a dataset is controlled by a "privacy loss" parameter, epsilon (ε): the smaller the ε, the more noise is added and the stronger the privacy guarantee.
Differential privacy in data can be implemented in two ways:
Locally – Noise is added to the data at the source, before it is stored in the central repository.
Globally – Raw data is stored directly in the central datastore without adding any noise; a curator adds the noise when a user queries the data.
Figure 1: Local Privacy (data sources → add noise → datastore → querier)
Figure 2: Global Privacy (data sources → raw data → datastore → curator adds noise → querier)
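To make the two placements of noise concrete, here is a minimal Python sketch of both approaches. It is an illustration only: the function names, the sum query, and the sensitivity value are assumptions, not taken from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_noise(scale, size=None):
    # Laplace noise with scale = sensitivity / epsilon.
    return rng.laplace(loc=0.0, scale=scale, size=size)

def local_dp_store(records, epsilon, sensitivity):
    # Local: every data source noises its own record before it reaches
    # the datastore, so the datastore never holds raw values.
    return records + laplace_noise(sensitivity / epsilon, size=len(records))

def global_dp_query(records, epsilon, sensitivity):
    # Global: raw records sit in the datastore; the curator noises the
    # answer once, at query time.
    return np.sum(records) + laplace_noise(sensitivity / epsilon)

salaries = rng.integers(8000, 13000, size=1000).astype(float)
# Sensitivity of a sum query is the largest possible single salary
# (simplified here; in practice it depends on the query and data bounds).
eps, sens = 1.0, 13000.0
print(np.sum(local_dp_store(salaries, eps, sens)))  # noisy: every record perturbed
print(global_dp_query(salaries, eps, sens))         # noisy only at query time
```

The local variant pays a higher accuracy cost, since every record carries its own noise, while the global variant noises each query answer once.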
Impact of applying differential privacy to data (statistical noise)
[Figures 3.1 and 3.2: Histogram of Salary Level, plotting True Value vs. DP Value counts (0–2000) per Salary Category. Approximate share of records per salary range in each figure:]

Salary range       Figure 3.1   Figure 3.2
Less than 8500        19.6%        20%
8500-9000             33.5%        33%
9000-9500             10.0%        10%
9500-10000            13.1%        13%
10000-10500            6.0%         6%
10500-11000            4.3%         4%
11000-12000            1.5%         2%
12000 and above       12.3%        13%
Figures 3.1 and 3.2 demonstrate how noised samples differ from the original data. The noised values are generated under different privacy budgets (controlled by the parameter ε). Even so, there are almost no observable deviations between the histograms.
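The effect can be reproduced in a few lines: add Laplace noise to each histogram bin and compare. This is a minimal sketch; the bin counts and ε values below are illustrative, not the data behind the figures.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative true bin counts for a salary histogram (not the figures' data).
true_counts = np.array([196, 335, 100, 131, 60, 43, 15, 123], dtype=float)

def noisy_histogram(counts, epsilon):
    # Adding or removing one person changes a single bin count by 1, so a
    # histogram over disjoint bins has sensitivity 1, and Laplace noise of
    # scale 1/epsilon per bin satisfies epsilon-differential privacy.
    noised = counts + rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)
    return np.clip(np.round(noised), 0, None)

for eps in (0.1, 1.0):
    dp_counts = noisy_histogram(true_counts, eps)
    print(f"eps={eps}: max bin deviation = {np.max(np.abs(dp_counts - true_counts)):.0f}")
```

Because a histogram has low sensitivity, even a modest privacy budget leaves the overall shape of the distribution intact, which is exactly what the figures show.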
Differential Privacy in ML Algorithms: Here, the trained model does not reveal whether any individual's data was included in the training dataset.
ML models can be made differentially private by the following means (a training sketch follows the list):
By adding noise to the model's weights (e.g., to the gradients during training)
By adding noise to the model's objective function
By adding noise to the model's output
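The first option is the most widely used in practice, via DP-SGD-style training. Below is a simplified NumPy sketch under stated assumptions: the linear model, squared loss, clip_norm, and noise_multiplier are hypothetical choices, and no privacy accounting is shown.

```python
import numpy as np

rng = np.random.default_rng(7)

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    # Per-example gradients of squared loss for a linear model y ≈ X @ w.
    residuals = X @ w - y                 # shape (n,)
    grads = residuals[:, None] * X        # shape (n, d), one row per example
    # 1. Clip each example's gradient norm to bound its influence (sensitivity).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)
    # 2. Sum the clipped gradients and add Gaussian noise calibrated to the bound.
    noisy_sum = grads.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=w.shape)
    # 3. Average and take the usual gradient step.
    return w - lr * noisy_sum / len(X)

X = rng.normal(size=(256, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=256)
w = np.zeros(3)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
print(w)  # roughly recovers the true weights [1, -2, 0.5], up to DP noise
```

Clipping bounds each example's contribution to the update; the Gaussian noise then masks whether any single example participated in training.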