Managing risks through better design principles
Organisations need to tackle the risks associated with adopting cognitive technology so that they can benefit from the potential efficiencies and insights that it can offer.
ICAEW's Tech Faculty will explore three key complementary solutions to these risks. First, in this section, we consider better design principles, followed by controls over operation and external assurance.
The table below provides an overview of how these responses map to the key risks identified in the previous section.
|Risk|Responses|
|---|---|
|Inexplicability|Making explainable models|
|Data protection|Data de-identification|
|Bias|Avoidance of bias; data de-identification; increasing accuracy|
|Poor adaptiveness|Increasing accuracy; human in the loop; collars and kill switches; ongoing review|
|Automation risks|Project inventory; human in the loop|
Successful robotic process automation, analytics led by machine learning, and AI all rely on business practices that anticipate the likely pitfalls and proactively design projects to be resistant to them. Creating and promoting standards for how these tools should be developed and used will help to reduce risk and make sure that benefits, such as reduced effort and increased productivity, are achieved. These standards will combine technical specifications (for example, how much data is required or which development tools are preferred) with other considerations, such as sources of bias in training data.
This is especially important as the cost of doing simple robotic process automation or machine learning decreases and more end-user approaches are developed.
Project inventory
As end-user packages for robotic process automation or machine learning become more accessible, a common issue for larger organisations is knowing who in the business is using cognitive technologies. Proper inventory of projects can be difficult, especially with inconsistent definitions of what counts as AI or automation. Having clear internal communication about the processes required to start developing and implementing such tools will help prevent technically capable staff from starting their own projects without proper care and oversight.
An inventory should not only list cognitive automation projects, but also indicate their current status and identify project owners, key stakeholders, risk levels and associated controls.
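As a rough illustration, an inventory entry covering the fields described above could be modelled as a small record type. This is only a sketch: the class names, status values, risk levels and the `needs_attention` check are hypothetical and would need to reflect an organisation's own definitions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):  # hypothetical lifecycle stages
    PROPOSED = "proposed"
    IN_DEVELOPMENT = "in development"
    LIVE = "live"
    RETIRED = "retired"


class RiskLevel(Enum):  # hypothetical three-point scale
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ProjectRecord:
    name: str
    owner: str
    status: Status
    risk_level: RiskLevel
    stakeholders: list[str] = field(default_factory=list)
    controls: list[str] = field(default_factory=list)


def needs_attention(inventory: list[ProjectRecord]) -> list[str]:
    """Return names of live, high-risk projects with no controls recorded."""
    return [
        p.name
        for p in inventory
        if p.status is Status.LIVE
        and p.risk_level is RiskLevel.HIGH
        and not p.controls
    ]


# Illustrative entries only; not real projects.
inventory = [
    ProjectRecord("Invoice matching bot", "Finance ops", Status.LIVE,
                  RiskLevel.MEDIUM, ["Accounts payable"], ["Human in the loop"]),
    ProjectRecord("Credit scoring model", "Risk team", Status.LIVE,
                  RiskLevel.HIGH, ["Compliance"]),
]
flagged = needs_attention(inventory)
print(flagged)
```

Keeping status, ownership, risk and controls in one record makes simple queries possible, such as listing high-risk projects that are live without any recorded control.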
Avoidance of bias
For an approach based on learning from data, it is vital to consider the impact of omissions, errors and biases encoded in that data early in the process. According to Informatics professor Geoffrey Bowker: “Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care.”
This serves to remind us that all data is shaped and altered by the way it is collected, as well as the mindset of those collecting it and deciding which parts are true and important and which are noise or unnecessary. All data contains the thumbprints of its makers and considering how to handle the effects of these imprints is a vital part of technological development.
For example, a Boston city app called Street Bump was developed to passively collect users’ phone accelerometer and location data while driving and use this to detect where potholes and other road issues were causing bumps. The city realised while designing the app that areas with older or poorer residents would have lower levels of smartphone ownership and therefore would be less represented in the dataset. They adjusted for this expected bias in the dataset by building up a baseline, unbiased dataset from their own road engineers before rolling out the app to the wider public.
This approach works well for biases that can be anticipated, but many unknown or unexpected biases may exist, and an organisation cannot adjust for those it has not identified.
Data de-identification
If personal data is in play, then data protection is a concern. The simplest approach is to anonymise data so that it is no longer personal data, for example by collating responses into groups or removing identifying markers such as names or addresses. This can also help to avoid biases in machine learning by denying the algorithm access to sensitive fields. However, great care should be taken that the system does not become biased anyway by latching on to another field that stands in for them, for example by using postcode as a proxy for race.
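One simple way to look for a proxy, before the sensitive field is removed, is to ask how much better the sensitive value can be guessed once a candidate field is known. The sketch below is illustrative only: the function name, field names and records are made up, and a real assessment would use larger samples and more careful statistics.

```python
from collections import Counter, defaultdict


def proxy_strength(records, candidate, sensitive):
    """Compare two guessing strategies for the sensitive field.

    Returns (baseline, with_candidate):
    - baseline: accuracy of always guessing the most common sensitive value
    - with_candidate: accuracy of guessing the most common sensitive value
      within each value of the candidate field
    """
    overall = Counter(r[sensitive] for r in records)
    baseline = max(overall.values()) / len(records)

    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[candidate]][r[sensitive]] += 1
    correct = sum(max(counts.values()) for counts in by_value.values())
    return baseline, correct / len(records)


# Made-up records: postcode area and a sensitive group label.
records = (
    [{"postcode": "AB1", "group": "X"}] * 3
    + [{"postcode": "AB1", "group": "Y"}] * 1
    + [{"postcode": "CD2", "group": "Y"}] * 3
    + [{"postcode": "CD2", "group": "X"}] * 1
)
baseline, with_postcode = proxy_strength(records, "postcode", "group")
print(baseline, with_postcode)
```

In this toy data, knowing the postcode raises the guessing accuracy from 0.5 to 0.75, which suggests that postcode carries information about the sensitive field even after that field is dropped.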
Likewise, if the remaining data is detailed enough, it may be possible to re-identify data subjects, which would reintroduce data protection risks. This could happen if overly detailed data allows identities to be deduced or if, once data is collated, members of unusual demographic combinations are left in a ‘class of one’.
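A basic check for groups that are too small to hide an individual is to count how many records share each combination of potentially identifying fields. The following is a minimal sketch with hypothetical field names and a hypothetical threshold `k`; it is in the spirit of a k-anonymity check rather than a complete re-identification assessment.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=2):
    """Return combinations of field values shared by fewer than k records."""
    counts = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up, already de-identified records (no names or addresses).
records = [
    {"age_band": "30-39", "region": "North"},
    {"age_band": "30-39", "region": "North"},
    {"age_band": "30-39", "region": "South"},
    {"age_band": "30-39", "region": "South"},
    {"age_band": "80+", "region": "North"},
]
risky = small_groups(records, ["age_band", "region"], k=2)
print(risky)
```

Here the single record in the "80+" and "North" combination is flagged. Such groups could be merged into broader bands or suppressed before the data is used.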
Increasing accuracy
Machine learning is a complex process that involves generating large numbers of iterations of a model and measuring their relative success at achieving a goal. The initial model(s) chosen, the way that errors are measured and many other factors can dramatically alter the final model produced. An entity should understand the effect of these decisions and be able to test it, for example by repeating the training process with different assumptions and comparing the results.
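The effect of one such decision, the way errors are measured, can be seen even in a toy setting. In the sketch below the "model" is just a single predicted number, and "training" means trying candidate values and keeping the one with the lowest total error. The data and function names are invented for illustration.

```python
def squared_error(prediction, values):
    """Total squared error: penalises large misses heavily."""
    return sum((v - prediction) ** 2 for v in values)


def absolute_error(prediction, values):
    """Total absolute error: penalises all misses in proportion."""
    return sum(abs(v - prediction) for v in values)


def train(values, error_measure, candidates):
    """Pick the candidate prediction with the lowest total error."""
    return min(candidates, key=lambda c: error_measure(c, values))


# Made-up data with one extreme value.
values = [1, 2, 3, 4, 100]
candidates = range(0, 101)

# Repeat the same training process under two different assumptions.
results = {
    "squared": train(values, squared_error, candidates),
    "absolute": train(values, absolute_error, candidates),
}
print(results)
```

With the same data, the squared-error run settles on 22 (the mean, pulled up by the extreme value), while the absolute-error run settles on 3 (the median). Rerunning training under different assumptions and comparing the outputs in this way shows how sensitive the final model is to each choice.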
Depending on the development process used, existing expertise within the organisation may or may not be required. Robotic process automation would require knowledge of existing business processes, whereas machine learning would not. However, consulting with those currently in charge of the processes being automated can reveal gaps in the assumptions of the model’s designers.
Even if the driver to automate is to reduce costs, care should be taken to retain key staff so that institutional knowledge is not lost altogether. This might be done by keeping them on as expert reviewers of the automated output or by transferring their skills to other parts of the organisation.
Making explainable models
Machine learning approaches can produce models that make decisions through complex, opaque processes that are not readily understandable by a human reviewer. This is sometimes termed a black box model. Black box models can make it difficult to distinguish unexpected insights from errors in reasoning, make it challenging to justify why a particular decision was recommended, and limit what can be learned from the models’ experience. This matters because, under the GDPR, data subjects have a right to know when an entirely automated decision has been made about them, and to have an explanation of how that decision was made.
There is significant research on how to make models optimised by machine learning explainable. This might involve training another AI to test them or using complex mathematical techniques to measure the model’s sensitivity to a range of inputs and combinations of inputs. While this won’t explain perfectly how a model works, gaining a sense of the model’s priorities can help a user understand which features of the input data are the most salient, and shed light on the decision-making process or the causes of any unusual behaviour.
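Measuring sensitivity to inputs can be sketched very simply: change one input at a time by a small amount and record how much the output moves on average. The example below is a minimal illustration, not a recommended method. The `black_box` function is a hypothetical stand-in for an opaque model, and the feature names and data are invented. It also ignores combinations of inputs, which more advanced techniques address.

```python
def black_box(row):
    """Hypothetical stand-in for an opaque model's scoring function."""
    return 3.0 * row["income"] + 0.5 * row["age"] + 0.0 * row["shoe_size"]


def sensitivity(model, rows, feature, delta=1.0):
    """Average absolute change in output when one feature is nudged by delta."""
    changes = []
    for row in rows:
        bumped = dict(row)  # copy so the original row is left unchanged
        bumped[feature] = row[feature] + delta
        changes.append(abs(model(bumped) - model(row)))
    return sum(changes) / len(changes)


# Made-up input rows.
rows = [
    {"income": 20.0, "age": 30.0, "shoe_size": 8.0},
    {"income": 35.0, "age": 52.0, "shoe_size": 10.0},
]
scores = {f: sensitivity(black_box, rows, f) for f in rows[0]}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

The ranking puts `income` first and `shoe_size` last, giving a reviewer a sense of which inputs the model treats as most salient without needing to inspect its internals.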
Next, we look at implementing robust controls.