Digital Innovation & Entrepreneurship
Some may take matters into their own hands and run their own surveys, but they are not data market experts, might not do it well, and may well end up paying far more than they need to. Leaving individual companies to work out what data they need and how to collect it is not an efficient answer to the problem.
Others may turn to consent-based models, which compensate those who share their data. There are two main models in use, but each has its shortcomings. ‘Fixed compensation’ schemes offer a cheaper approach, as they provide uniform payouts to subjects. However, the relatively low level of compensation on offer means they tend not to capture more privacy-sensitive users. This leaves companies back at square one, with biased data that reflects only part of the customer base. To get the best insights, companies need data that is representative, putting as much emphasis on quality as quantity.
“But while data is widely available, it is often unreliable and biased”
In their search for more representative data, companies may turn to the ‘centralised optimisation model’, which tries to tailor the compensation offered to each subject to how sensitive they are to privacy issues. This also has its flaws, as it can encourage consumers to inflate the amount of compensation they demand, creating an inefficient market in which companies overpay for data.
A better option is to develop a system that encourages subjects to get involved by compensating them at an acceptable level, but that does not encourage them to inflate their demands. This enables the collection of truly representative data at a fair price.
The mechanism that my colleagues and I have developed achieves this. It provides for the transparent and consensual collection and trading of data, compensating subjects sufficiently to encourage them to participate while remaining affordable for companies. Under our model, once a request for data has been made to the platform, relevant subjects are sorted into pairs who look identical in terms of their data, but whose privacy concerns are different. This process is called Random Sampling of Rolling Pairs. The platform then compares the prices the two subjects would demand to share their data, under a system akin to a Vickrey auction (also known as a second-price, sealed-bid auction). In our system, the two parties privately disclose what price they demand for their data. The lower bidder is then selected, but they are paid the higher price that was demanded by the other, losing bidder. This process can be repeated until there is a large enough pool of subjects to satisfy the company seeking information. The structure of this mechanism removes the incentive for people to inflate the price they put on their privacy, and ensures that people are compensated at a near-optimal cost from the data buyer’s perspective. At the same time, the data collected is unbiased and reliable.
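To make the pairing rule concrete, here is a minimal Python sketch of a single paired comparison and of the repeated sampling described above. It is an illustration under stated assumptions, not the authors' implementation: the names `Subject`, `run_pair`, `collect` and `payoff` are hypothetical, pairs are drawn at random from the pool without matching subjects on their data, and the losing subject in each pair is simply dropped. How Random Sampling of Rolling Pairs handles either point is not specified in the article.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    name: str
    ask: float  # price this subject privately demands for sharing their data


def run_pair(a: Subject, b: Subject) -> Tuple[Subject, float]:
    """Select the lower bidder and pay them the higher (losing) ask."""
    winner, loser = (a, b) if a.ask <= b.ask else (b, a)
    return winner, loser.ask


def collect(subjects: List[Subject], needed: int, seed: int = 0) -> List[Tuple[Subject, float]]:
    """Draw random pairs until `needed` subjects are selected.

    Assumption: pairs are drawn at random and the loser is discarded;
    the real mechanism also matches the pair on their data.
    """
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    selected = []
    while len(selected) < needed and len(pool) >= 2:
        selected.append(run_pair(pool.pop(), pool.pop()))
    return selected


def payoff(true_cost: float, reported_ask: float, rival_ask: float) -> float:
    """Surplus for a subject with `true_cost` who reports `reported_ask`."""
    me = Subject("me", reported_ask)
    winner, payment = run_pair(me, Subject("rival", rival_ask))
    return payment - true_cost if winner is me else 0.0
```

In this sketch, a subject whose true cost is 5 and who faces a rival asking 8 earns `payoff(5.0, 5.0, 8.0) == 3.0` by reporting honestly. Inflating the ask to 7 leaves the payment unchanged (`payoff(5.0, 7.0, 8.0) == 3.0`), while inflating it to 9 loses the pairing altogether (`payoff(5.0, 9.0, 8.0) == 0.0`). Because the payment is set by the other party's ask, overstating a demand can only lose the pairing, never raise the payout, which is the incentive property the second-price rule is meant to deliver.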
This strikes a far better balance between cost, on the one hand, and the ability to collect accurate data that is in full compliance with GDPR and other regulations, on the other. We have tested the model in simulations using real-world data, showing that it can be done in practice. This matters for individual companies, but it is also important for the future of AI systems that require high-quality data to become more reliable and to serve the best interests of businesses and consumers. At the same time, regulators need to do more to develop and enforce appropriate regulations on data sellers and resellers to ensure there is a fair market mechanism. By doing so, they can ensure that data markets become more efficient and more reliable. That should ultimately deliver better results for the companies that rely on AI and for their end consumers.
Explore more WBS research on Digital Innovation and Entrepreneurship.
wbs.ac.uk | Warwick Business School