The Deloitte AI debacle in Australia shows what can go wrong if AI is adopted blindly

Deloitte was commissioned by Australia’s Department of Employment and Workplace Relations (DEWR) last December to carry out an “independent assurance review” of its ‘targeted compliance framework,’ an automated system that penalizes jobseekers who fail to meet their ‘mutual obligation’ requirements.

The contract was worth around US$290,000. Chris Rudge, a researcher at the University of Sydney, observed that Deloitte’s 237-page report, submitted in July, was riddled with references to non-existent sources and experts.

After reviewing the report, Deloitte confirmed that some of its footnotes and references were inaccurate and acknowledged using Azure OpenAI GPT-4o, a generative AI system, to produce parts of it. The fabricated quotes and references were removed and an amended report was submitted in September. Deloitte also agreed to refund part of its consulting fee.

Such fabricated references arise from the ‘hallucinations’ that generative AI (GenAI) tools are prone to. Examples of AI falsehoods abound.

Shortly after ChatGPT’s launch, Samantha Delouya of Business Insider asked it to rewrite an article about a Jeep factory left idle by rising production costs. ChatGPT produced an almost flawless piece, but attributed fabricated quotes to Carlos Tavares, CEO of Jeep-maker Stellantis. These seemed plausible, the kind of thing a CEO might say when confronted with the difficult task of laying off employees. Yet, they were entirely fictional.

In early 2023, two journalists at the poll-analysis website FiveThirtyEight asked ChatGPT to write a piece on public perceptions of AI chatbots. The chatbot cited a 2021 Pew survey, claiming that 71% of Americans believe the increased capability and sophistication of computers and robotics will benefit society, while only 27% think otherwise.

However, the Pew survey it cited didn’t exist. The FiveThirtyEight journalists did find a 2021 Pew survey, but it pointed the other way: only 18% of respondents felt more excited than concerned about AI, 37% felt more concerned than excited, and 45% felt equally excited and concerned.

AI-fed falsehoods in the media have caused scandals in the US, but hallucinations have affected other fields too. Two New York lawyers were sanctioned in 2023 for submitting a brief that cited non-existent cases generated by ChatGPT.

This June, a senior UK judge warned lawyers that they could face criminal charges if they cite fictitious AI-generated cases in their arguments. Air Canada was held liable in 2024 after a passenger received misleading policy advice from the chatbot on its website. Concerns about AI fabrication have also led academic publishers to retract thousands of papers.

What leads AI tools to hallucinate? Data is AI’s lifeblood; a model learns, adapts and makes decisions based on data. By one frequently cited estimate, GPT-4 was trained on about 7.5 trillion words. To train language models, AI developers use high-quality data from scholarly publications, books, news stories, Wikipedia and filtered internet content.

Still, this is not enough. The rest of the data comes from blog entries, social media posts and website comments, which are of lower quality and may carry biases or prejudices.

Also, remember that GenAI models are ‘stochastic parrots,’ not truth-identifying machines. They make only probabilistic assessments of which words to produce next, without any sentient grasp of what their statements actually mean. Thus, a bot’s ‘reasoning’ may not always be correct.

So AI-generated untruths may be caused by more than just inaccurate inputs; even if the model is trained on truthful material, GenAI output can still be untruthful.

Why is the Deloitte incident significant amid a deluge of AI hallucination cases? Because it touched a national government: Australia’s. While Canberra’s DEWR emphasized that the substance of its welfare-system review was unaffected and that Deloitte’s conclusions and recommendations remained valid, the episode compelled Deloitte to publicly admit to using GenAI for a paid government report. The Australian government has since hinted that more stringent AI-usage provisions may be included in its future consulting contracts.

The Deloitte AI debacle should serve as a warning to professional services firms everywhere. The temptation to use AI for quick drafts and thus pack in more work with fewer people is clear, but the rush to automate reports and the like can damage their reputations.

With better oversight and accountability, Deloitte’s report could have met expectations. But somehow, a report with unvetted AI inputs was sent to a government whose policy choices affect billions of dollars in welfare payments and millions of people.

The allure of AI is irresistible, but we must never forget that it is of value only when it amplifies human expertise. Using AI as a substitute for human intelligence, rather than as a collaborator, is a dangerous path. Humans must stay accountable for AI-aided work.

The author is professor of statistics, Indian Statistical Institute, Kolkata.
