Differential Privacy in Data
The amount of randomness, or noise, added to a dataset is controlled by a "privacy loss" parameter, epsilon (ε): the smaller the ε, the more noise is added and the stronger the privacy guarantee.
Differential privacy in data can be implemented in two ways:
Locally – Noise is added to the data at the source, before it is stored in the central repository.
Globally – Raw data is stored directly in the central datastore without adding any noise; a curator adds the noise when a user queries the data.
Figure 1: Local Privacy (data sources → add noise → datastore → querier)
Figure 2: Global Privacy (data sources → raw data → datastore → curator adds noise → querier)
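To make the two placements of noise concrete, here is a minimal Python sketch of both approaches. It is an illustration only: the function names, the sum query, and the sensitivity value are assumptions, not taken from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_noise(scale, size=None):
    # Laplace noise with scale = sensitivity / epsilon.
    return rng.laplace(loc=0.0, scale=scale, size=size)

def local_dp_store(records, epsilon, sensitivity):
    # Local: every data source noises its own record before it reaches
    # the datastore, so the datastore never holds raw values.
    return records + laplace_noise(sensitivity / epsilon, size=len(records))

def global_dp_query(records, epsilon, sensitivity):
    # Global: raw records sit in the datastore; the curator noises the
    # answer once, at query time.
    return np.sum(records) + laplace_noise(sensitivity / epsilon)

salaries = rng.integers(8000, 13000, size=1000).astype(float)
# Sensitivity of a sum query is the largest possible single salary
# (simplified here; in practice it depends on the query and data bounds).
eps, sens = 1.0, 13000.0
print(np.sum(local_dp_store(salaries, eps, sens)))  # noisy: every record perturbed
print(global_dp_query(salaries, eps, sens))         # noisy only at query time
```

The local variant pays a higher accuracy cost, since every record carries its own noise, while the global variant noises each query answer once.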
Impact of applying differential privacy to data (statistical noise)
[Figures 3.1 and 3.2: Histogram of Salary Level, plotting True Value vs. DP Value counts (0–2000) per Salary Category. Approximate share of records per salary range in each figure:]

Salary range       Figure 3.1   Figure 3.2
Less than 8500        19.6%        20%
8500-9000             33.5%        33%
9000-9500             10.0%        10%
9500-10000            13.1%        13%
10000-10500            6.0%         6%
10500-11000            4.3%         4%
11000-12000            1.5%         2%
12000 and above       12.3%        13%
Figures 3.1 and 3.2 demonstrate how noised samples differ from the original data. The noised values are generated under different privacy budgets (controlled by the parameter ε). Even so, there are almost no observable deviations between the histograms.
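The effect can be reproduced in a few lines: add Laplace noise to each histogram bin and compare. This is a minimal sketch; the bin counts and ε values below are illustrative, not the data behind the figures.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative true bin counts for a salary histogram (not the figures' data).
true_counts = np.array([196, 335, 100, 131, 60, 43, 15, 123], dtype=float)

def noisy_histogram(counts, epsilon):
    # Adding or removing one person changes a single bin count by 1, so a
    # histogram over disjoint bins has sensitivity 1, and Laplace noise of
    # scale 1/epsilon per bin satisfies epsilon-differential privacy.
    noised = counts + rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)
    return np.clip(np.round(noised), 0, None)

for eps in (0.1, 1.0):
    dp_counts = noisy_histogram(true_counts, eps)
    print(f"eps={eps}: max bin deviation = {np.max(np.abs(dp_counts - true_counts)):.0f}")
```

Because a histogram has low sensitivity, even a modest privacy budget leaves the overall shape of the distribution intact, which is exactly what the figures show.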
Differential Privacy in ML Algorithms: Here, the trained model does not reveal whether any individual's data was included in the training dataset.
ML models can be made differentially private by the following means (a training sketch follows the list):
By adding noise to the model's weights (e.g., to the gradients during training)
By adding noise to the model's objective function
By adding noise to the model's output
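The first option is the most widely used in practice, via DP-SGD-style training. Below is a simplified NumPy sketch under stated assumptions: the linear model, squared loss, clip_norm, and noise_multiplier are hypothetical choices, and no privacy accounting is shown.

```python
import numpy as np

rng = np.random.default_rng(7)

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    # Per-example gradients of squared loss for a linear model y ≈ X @ w.
    residuals = X @ w - y                 # shape (n,)
    grads = residuals[:, None] * X        # shape (n, d), one row per example
    # 1. Clip each example's gradient norm to bound its influence (sensitivity).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)
    # 2. Sum the clipped gradients and add Gaussian noise calibrated to the bound.
    noisy_sum = grads.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=w.shape)
    # 3. Average and take the usual gradient step.
    return w - lr * noisy_sum / len(X)

X = rng.normal(size=(256, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=256)
w = np.zeros(3)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
print(w)  # roughly recovers the true weights [1, -2, 0.5], up to DP noise
```

Clipping bounds each example's contribution to the update; the Gaussian noise then masks whether any single example participated in training.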