self-automated shows that organisations risk hollowing out the expertise that keeps them competitive.
accepting its original output, rather than revising it. “That makes it much harder to detect errors, especially when working on complex subjects under time pressure.”

How can we use generative AI chatbots effectively without them persuading us? The first step is to be aware of this new-found bias and recognise when it occurs. View AI outputs as a ‘draft hypothesis’ rather than a reliable final answer.

Nick Chater, Professor of Behavioural Science at WBS, believes our interactions with other people provide a good template. “Human interaction doesn’t work because we take people’s justifications at face value,” he said. “It works because we cross-examine those justifications and check for consistency over time.”

Using this approach, AI outputs can be evaluated through:
Counterfactual probing, such as asking: “What would you say if this assumption changed?”
Checking consistency and coherence across extended or related interactions.
Cross-examination, such as asking: “Yesterday you said the opposite. If you believe X, how can you now be telling me Y?”

3. The jagged frontier.

Another challenge is understanding when using AI is a help and when it is a hindrance. Professor Lifshitz and her colleagues asked 758 consultants to complete either a strategy-focused task or a creative task. Some were allowed to use ChatGPT, others were not. Those using AI completed both tasks more quickly, but the quality of their work varied. On the creative task, those using ChatGPT performed 40 per cent better than
those who did not. However, those using it for the strategic task were 20 per cent less likely to be right. Despite this, they still sounded more convincing.

Professor Lifshitz said: “One of the main problems is that it is still very hard to know where the limits of AI’s abilities lie.

“Within that jagged frontier, AI can produce high-quality results. But when tasks lie beyond that frontier, you are more likely to make mistakes and the results can seem quite plausible.

“This becomes harder to navigate as those boundaries are not stable, they are constantly moving as the technology evolves. Moreover, real-world workflows often combine both types of tasks.”

While the technology is still evolving, the best way for companies to use AI effectively is for managers to continually experiment with it in a systematic and strategic way before scaling. Companies need to be diligent in mapping workflows and testing each sub-task and its interdependencies. Only then can they make their decision on full workflows.

4. The creativity drain.

Even when algorithms can add value, business leaders need to recognise the risks involved. This includes AI’s potential to suppress creativity. Search engines and LLMs are designed to serve up the most popular answers. However, Professor Lifshitz warns that this can create ‘ideation bubbles’. When it comes to marketing, these similar ideas can lead to homogeneous messages. Human creativity may be less consistent in quality, but it is more likely to produce ideas that stand out. Similarly, innovation is more likely to happen when people are
“Executives need to abandon the notion that simply having a ‘human in the loop’ is sufficient.

“Companies who are serious about using AI effectively need training programmes and continuous coaching for employees on how to operate as centaurs and cyborgs, when to use each approach, and when more automation may be justified for simple, low-risk tasks.”

2. Persuasion bombing.

Another problem with human oversight is the difficulty of inspecting how AI chatbots, which are built on Large Language Models (LLMs), arrive at an answer. During their experiment, the researchers identified a troubling new LLM bias.

As one senior strategy consultant was reviewing a strategy recommendation made by AI, and the market analysis it was based on, she noticed the numbers looked wrong. When she asked the AI tool to reanalyse the data, it doubled down on its original analysis. She then pointed out a specific flaw she had spotted. The chatbot immediately changed tack. It acknowledged that she was right, apologised, and even complimented her “sharp eye for detail”. It then provided an impenetrable wall of information which supported its original recommendation.

This behaviour was typical of a wider pattern the researchers called ‘persuasion bombing’, which became stronger as humans tried to validate AI responses. Steven Randazzo, Co-Chair of the AI Innovation Network at WBS, said: “When challenged, the LLM tried to push consultants towards
Warwick Business School | wbs.ac.uk