Low-code analytics is increasingly making the complex art of data science more accessible to business users. To explore how organisations can get started with adopting a low-code solution, ICAEW collaborated with The Operational Research Society to host a webinar with Jamie Crossman-Smith, Managing Director in Grant Thornton's UK Digital Hub, and Kilian Thiel, Head of Strategic Partnerships at KNIME, to share GT's journey with KNIME. They explored the challenges and benefits they have experienced so far at GT, and how they have fostered greater engagement with analytics across the organisation by using low-code technology.
KNIME offers a complete platform for end-to-end data science, from creating analytic models, to deploying them and sharing insights within the organisation, through to data apps and services. In this article, Kilian and Jamie return to address some of the questions from the webinar on this tool and on the adoption of low-code analytics technology.
Which service areas tend to use KNIME?
KNIME – and other low-code tools like it – is used across industries and departments, from financial services and manufacturing to automotive, health care, pharma, and many others. KNIME can be used wherever data is generated, stored, and needs to be analysed and processed. Finance teams are often amongst the largest users of data in organisations, and so are commonly where the greatest opportunity lies.
How easy is it to 'sell' the idea of open-source coded applications to stakeholders, especially with risks around the longevity and stability of open-source platforms and the communities that drive them?
It's worth noting that not all open-source platforms are run in the same way. Some are completely crowdsourced with no clear ownership, and this can lead to issues with software getting updated in ways that don't always benefit all users, or 'forks' in development where competing versions emerge. However, most are still operated under licence, and the open-source element is more about how users support the development of the software for the greater good of the user base, with the copyright holder still retaining ultimate control over releases and development decisions. Therefore, an understanding of what 'open-source' actually means is often key to getting stakeholders on board. As was touched on during the webinar, the cost advantages of open-source can very quickly influence stakeholders, as can the knowledge that a critical mass of global users can accelerate innovation and quality in the evolution of the software, rather than being reliant on a small 'closed shop' team of developers.
KNIME very much follows a licensed model and in fact offers both the main open-source Analytics Platform and more enterprise-level capabilities through its Hub solutions. KNIME usually aligns its development roadmap with customers, partners, and users, and has an active community of people capable of providing the necessary support.
How do clients get comfortable with (a) using correlations to make decisions with analytics, (b) the ethical transparency of ML tools and training data use, and (c) the impact of open-source updates on existing code integrity?
Getting clients comfortable with the use of analytics can be a long process! As with all things, there will be those willing to embrace new technology, and those more hesitant. Introducing analytical capabilities to more amenable clients first will help to ensure that you get good feedback on the experience from their perspective and can allow refinement of approach before it is introduced to those clients who are a little more reluctant to embrace change.
As for the use of ML tools, transparency is often crucial to the adoption of solutions. Being able to explain what was done, how it was done and why you are comfortable that it was done correctly will give clients the assurance that you are using the technology appropriately. For ML-based solutions, being clear with clients about how their data will be used (and ensuring that it is fully anonymised before being fed into an ML system for training), and having the appropriate clauses in place in engagement contracts to permit the use of that data, will make a big difference. That being said, most clients ultimately don't care much about how the answer was reached; they just want to be sure that it is the correct answer. Used well, analytics can be a very powerful tool to drive much more accurate decision making and audit interrogation.
Similar principles apply with open-source technology, and as per the answer to the previous question, explaining what is actually meant by 'open-source' and that it isn't a wild west of software development will address much of the related nervousness (perhaps drawing on popular examples such as the Chrome and Edge web browsers which are built on the open-source Chromium project).
From an audit perspective, if reviewers want to review the logic being applied from the input data to the output data, how can they do this and retain the evidence on the audit file?
Workflow tools such as KNIME or Alteryx carry all of the relevant logic in metadata – the operations used, in which order, the changes to the data structure and any calculations or other logic from node to node. As a result, this metadata can be extracted and interrogated. Indeed, there are known examples of tools that organisations have developed to automatically analyse and summarise workflow metadata and feed this directly into audit documentation.
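To illustrate the idea, the sketch below turns a workflow's metadata into an ordered list of steps that could be dropped into audit documentation. It is a minimal Python example under stated assumptions: the `nodes` and `connections` structures are a hypothetical, simplified stand-in for extracted metadata, not the actual KNIME or Alteryx file format, and `summarise_workflow` is an illustrative name rather than part of either product.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified workflow metadata (NOT a real KNIME/Alteryx format).
nodes = {
    1: {"type": "CSV Reader", "note": "Load ledger extract"},
    2: {"type": "Row Filter", "note": "Keep postings over threshold"},
    3: {"type": "GroupBy", "note": "Total by account"},
    4: {"type": "Excel Writer", "note": "Write summary"},
}
connections = [(1, 2), (2, 3), (3, 4)]  # (source node id, destination node id)


def summarise_workflow(nodes, connections):
    """Return one line per node, in the order the data flows through them."""
    # Map each node to the set of nodes that feed into it.
    predecessors = {node_id: set() for node_id in nodes}
    for source, destination in connections:
        predecessors[destination].add(source)
    # A topological sort guarantees upstream nodes are listed first.
    order = TopologicalSorter(predecessors).static_order()
    return [
        f"Step {step}: {nodes[node_id]['type']} - {nodes[node_id]['note']}"
        for step, node_id in enumerate(order, start=1)
    ]


for line in summarise_workflow(nodes, connections):
    print(line)
```

A reviewer could keep a summary like this on the audit file alongside the workflow itself, so that the logic from input to output is evidenced in plain language.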
The beauty, again, of open-source software, is that anyone can install it to aid the review process, and while some skills and knowledge are required to appropriately review workflows (as should be the case for any technical review of course!), the logic is often much easier to follow than it might be in some Excel VBA or Python scripts. This is especially true if extended functionality is used to annotate workflows, aiding the work of the reviewer further.
In the context of regulated audit, having robust controls in place around the use and review of analytics workflows helps give auditors more confidence to adopt them safely. These controls can include (but may not be limited to) restricting access to the software until appropriate training has been completed, ensuring a rigorous review process with the right level of technical competence, and providing dedicated templates that both support the development of robust workflows and step auditors through the process of documenting the analytic in an unambiguous way.
Tell us more about the functionality. Can you change the levels of granularity that you see? I have found this to be important when relating it to domain experts who don't know analytics.
In KNIME, you can encapsulate parts of your workflows into "components" to hide complexity and share them for reuse with domain experts, and so change the level of granularity in your workflow. In this way, a complex workflow can be hidden within just one or a few components that are easy for everyone to reuse and understand. Alteryx has similar functionality with "containers" and "macros" that allow more complex elements to be packaged into self-contained units.
How can we version control it?
Version control is no different from that of any other file or script. Repositories can be used to manage versions, alongside simpler controls such as filenames or annotations within the workflows that identify when, and by whom, they were last modified. Enterprise-level solutions will usually include some additional version control management. And again, because metadata can be extracted, it is possible to produce comparisons of workflows to identify differences.
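As a rough sketch of such a comparison, the Python below diffs two versions of a workflow's extracted metadata using only the standard library. The dictionaries `v1` and `v2` are hypothetical, simplified extracts (not a real KNIME or Alteryx format), and `diff_workflows` is an illustrative helper name.

```python
import difflib
import json

# Hypothetical, simplified metadata extracts for two versions of one workflow.
v1 = {"nodes": {"2": {"type": "Row Filter", "threshold": 1000}}}
v2 = {"nodes": {"2": {"type": "Row Filter", "threshold": 5000}}}


def diff_workflows(old, new):
    """Return unified-diff lines between two metadata dictionaries."""
    # Sorting keys gives a stable text form, so only real changes show up.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines,
            fromfile="version 1", tofile="version 2", lineterm="",
        )
    )


for line in diff_workflows(v1, v2):
    print(line)
```

Here the only reported change is the filter threshold, which is the kind of difference a reviewer would want flagged between two saved versions.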
What are the limits in terms of data volumes?
Most workflow-based solutions such as KNIME, Alteryx or Microsoft Power Automate do not have any built-in data volume limitations, so the only limiting factor is the computing resources required to store and process that data. During the processing of data, local compute and RAM are usually required (unless server or cloud-based versions are deployed), but the source data and target destinations can be pretty much any location, including SharePoint/Google Drive, databases and even the body or attachments of emails.
If the data is being stored in memory, what are the recommended computer specs?
There is no hard and fast rule on the computing power needed, but ultimately the more complex the operations and the larger the volume of data, the more CPU, RAM and storage will be required. For really large volumes, into the hundreds of millions of records, it is likely that server processing capabilities will be required.
For KNIME Analytics Platform, a minimum of 4 GB RAM and 10 GB free disk space is recommended for installation and buffering. However, more CPUs and more RAM can be utilised to help with processing speed and volume. More on the requirements for KNIME Business Hub can be found in KNIME's documentation.
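A back-of-the-envelope calculation can help gauge whether a dataset is likely to fit in local memory. The Python sketch below is an illustration only: the 8 bytes per value and the overhead multiplier of 2 are assumptions chosen for the example, not figures published by KNIME or any other vendor, and real memory use depends on data types and the operations performed.

```python
def estimate_memory_gib(rows, columns, bytes_per_value=8, overhead=2.0):
    """Very rough in-memory size of a table, in GiB.

    bytes_per_value and overhead are illustrative assumptions:
    8 bytes suits a 64-bit number; text columns can be far larger.
    """
    total_bytes = rows * columns * bytes_per_value * overhead
    return total_bytes / 1024**3


# Example: 100 million rows with 20 numeric columns.
print(f"{estimate_memory_gib(100_000_000, 20):.1f} GiB")
```

Under these assumptions, 100 million rows of 20 numeric columns come to roughly 30 GiB, well beyond a typical laptop, which is consistent with the point that very large volumes tend to need server processing.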
What other platforms are out there (besides KNIME)? What criteria do prospective clients tend to use, to choose among them?
As touched on during the webinar, there are a great many tools on the market, and it's important to choose the tool that is right for you and your organisation. This includes an awareness of the current application stack and the opportunities to integrate with it. Large organisations generally have data warehouses/lakes and use visualisation tools. Common platforms we see include Microsoft Power Automate and Dataiku; smaller firms or organisations that require regular processing of flat files often find tools like KNIME or Alteryx the most useful. Many software platforms, such as Salesforce, also have built-in low/no-code analytics solutions designed specifically to support the analysis of data stored within that platform.
It's worth noting that visualisation tools such as Power BI and Tableau typically have some low/no-code data workflow capabilities. Those capabilities may be sufficient, but equally may lack the functionality needed to deliver complex data analytics at an enterprise level – they are generally designed to support the manipulation of data for presentation purposes, and have fewer of the analytical capabilities that dedicated solutions provide.
The choice of the tool ultimately depends on your own skill level, your users, your budget, and most importantly your use cases. Identifying the need is always the first step to determining the right solution.
How long does it take to get someone up to speed on a tool like KNIME?
How long is a piece of string! A lot depends on their existing level of technical proficiency, their willingness to learn, and their natural aptitude for this style of workflow-based analytics. An average computer-literate user who is used to working with data in some capacity can probably master the basics of tools like KNIME or Alteryx in a few hours. To use these tools effectively, it is important to have a grounding in the basic principles of working with data, so that should also be factored into the training plan. Some larger firms are rolling out training to all employees, regardless of level or experience, that covers these basics alongside hands-on use of analytics tools – this training is typically run as one- or two-day courses. Delivering more advanced solutions will of course require further training and skill development, but there is no shortage of content provided by the major software players to support users on that journey.
How can the upskilling challenge be addressed? Do staff really want to learn to use yet another new piece of software?
It is fair to say that tech overwhelm is a big challenge for many organisations, and the pace of change is making it difficult for employers and employees to keep up. When introducing a new piece of software, first and foremost it's worth being clear on why that software is being introduced, and how it will benefit users. Not everyone will want to use the software immediately – there will be some enthusiastic early adopters who can be utilised as advocates for the new software, but there will be plenty who adopt a more 'wait and see' approach. It's important not to rush those sceptics, and to utilise the early adopters to demonstrate the value and opportunity that the new software brings. As Grant Thornton have seen, by building a strong sense of community around new technology, you create an environment where the most enthusiastic users are excited to engage with like-minded individuals from across the organisation, while those who are more unsure can tap into the support that the community provides.