A recent look at how Moore Kingston Smith built an artificial intelligence (AI) tool to improve its own workflows explained that the firm developed its system on top of an AI foundation model. Broadly defined as large-scale, general-purpose platforms trained on vast datasets, AI foundation models can be customised to fit numerous organisational use cases.
With popular examples including GPT-4, Claude and Gemini, foundation models are being enthusiastically adopted by corporates to underpin internal and client-facing AI solutions.
That growing reliance means companies will increasingly have to factor into their assurance work not just the bespoke AI tools they create for specific purposes, but also the foundation models on which those tools are built.
It is a type of assurance with its own unique challenges, which were explored in a special panel discussion at ICAEW’s first-ever AI Assurance Conference.
Operational risks
As Founder and CEO of Stacks, Albert Malikov helps organisations deploy AI agents designed to assist with tasks such as financial close management, reconciliation, analysis and reporting.
He explained that his company works closely with enterprises on both development – using foundation models to create agents – and business use, guiding implementation. “We work with several foundation models and traceability of output is incredibly important,” he said. “On the business-use side, what’s really vital for our customers is to have full explainability of the results and how agents came to their decisions.”
Grant Thornton Audit, Data, Ethics and Governance Leader Franki Hackett cited the importance of evaluating key areas such as accuracy, reliability and robustness. However, through an economic lens, there is a much wider issue for assurance professionals to think about. “We may begin to see vendor over-reliance,” she said. “And we may get to a point where the market is squeezed by just a few foundation model providers working at the top. In audit, we’ll all be familiar with the potential operational and independence risks from a lack of competition.”
In the assessment of Christopher Thomas, Research Associate in the Alan Turing Institute’s Public Policy Programme, the huge complexity of foundation models makes traceability and explainability particularly difficult to evaluate. “Looking into a model to understand why it has generated a certain piece of content or made a particular decision is a real challenge for accountability,” he said. “Then there are novel risks around content generation. For example, hallucinations, which affect accuracy, but also copyright concerns. Where is the training data coming from and what are the rights and privacy implications?”
Thomas also highlighted safety and security risks – particularly around the potential for bad actors to use foundation models to launch targeted streams of malicious or hateful content, or cyber attacks against critical parts of organisations’ IT systems. “That’s a major area that the government is currently looking at,” he said.
Hackett warned professionals against relying too heavily on end-users’ ties to global standards, or the brand value of their vendors, to provide blanket assurance. “A user may believe that if they’re signed up to ISO 27001 and their vendor is Microsoft, they’re all good. But it’s about the reality of how they’re implementing the model in their own environment. So, there’s more for us to do here, in terms of applying professional scepticism.”
Thomas agreed. “Traditionally, organisations have used local solutions, where they work directly with a provider,” he said. “However, the scope to interrogate foundation models is significantly limited, so the need for trust is much higher. An added complication is that these systems are accessed across multiple jurisdictions, creating legal uncertainties.”
Critical lens
As speakers turned to what sort of assurance practices are already emerging for foundation models, Hackett said that so many voluntary frameworks are springing up that “it’s hard not to find yourself cherry picking”. As such, professionals must challenge their own bias.
“You may be considering a framework that looks fairly easy, but may have certain blind spots. And you may be measuring it against one that looks really hard, but may not apply to your client base. Perhaps it’s been developed for military technology and is nowhere near risk-sensitive enough in your specific use areas, yet incredibly over-sensitive in others. So, I urge everyone to take a principles-based approach. Be critical about what’s out there. The Turing Institute has lots of resources and consultancy firms can certainly provide help.”
For Thomas, the process of standardising assurance for foundation models is being hampered by a lack of joined-up thinking between the AI safety community and global standard-setting organisations (SSOs).
“For a decade, SSOs have been developing standards for trustworthy AI and best data practices, and have engaged with downstream developers and adopters,” he said. “But they haven’t really engaged with foundation model governance.”
Meanwhile, national AI safety institutes have been working on the frontier of foundation model evaluation. “We must think about how to feed the safety bodies’ findings into the SSOs’ activities to support the distribution of new assurance practices,” he said. “In parallel, we must feed the SSOs’ decade of work on functional safety and risk management into the safety community.”
Malikov suggested that, even amid the rapidly increasing sophistication of foundation models and the use cases they are applied to, developers could take advantage of existing tools to help with areas such as traceability. “For example, look at LangChain,” he said. “That enables us to review a whole chain of an end-user’s interactions with a particular foundation model.”
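To make Malikov’s point about interaction-level traceability concrete, the sketch below shows one way a developer might log every prompt and response passing through a foundation model. It is illustrative only, not a description of Stacks’ tooling: it assumes a recent langchain-core release, whose public BaseCallbackHandler interface exposes on_llm_start and on_llm_end hooks (exact import paths and hook names vary between LangChain versions and model types), and the AuditTrailHandler class and JSONL log format are hypothetical choices made for the example.

```python
# A minimal traceability sketch: record every prompt sent to a foundation
# model and every response returned, so the full chain of an end-user's
# interactions can be reviewed later. Illustrative only.
import json
import time

from langchain_core.callbacks import BaseCallbackHandler  # assumes langchain-core is installed


class AuditTrailHandler(BaseCallbackHandler):
    """Appends one JSON record per model call to an audit log (hypothetical format)."""

    def __init__(self, log_path: str = "audit_trail.jsonl"):
        self.log_path = log_path

    def _write(self, record: dict) -> None:
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def on_llm_start(self, serialized, prompts, **kwargs):
        # Outbound prompts, timestamped for later review.
        self._write({"event": "llm_start", "ts": time.time(), "prompts": prompts})

    def on_llm_end(self, response, **kwargs):
        # The model's generations, logged against the same trail.
        texts = [g.text for batch in response.generations for g in batch]
        self._write({"event": "llm_end", "ts": time.time(), "generations": texts})
```

A handler along these lines can typically be attached when a chain or model is invoked (for example, via a callbacks entry in the run configuration), giving assurance teams a reviewable record of prompts and outputs without needing access to the foundation model itself.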
Crucially, Hackett and Thomas pointed out, professionals will also need to address the natural overlaps between AI assurance and assurance of environmental, social and governance (ESG) topics.
“Looking at the energy usage associated with foundation models, there are clear concerns about environmental sustainability,” Hackett said. “That will increasingly pose reputational risks.”
Thomas added: “We must also factor in labour impacts on communities that mine rare-earth minerals for use in AI-related hardware. Considering negative externalities is really important when we’re making choices around our use of foundation models.”