AI-supported test automation with TOMAS | prodot

Written by Dennis Schiavo | May 27, 2026 6:46:23 AM

Is test automation worthwhile in a project that has been running for years? With the right combination of your own test framework and an AI agent, the answer is different today than it was two years ago.

The initial situation: bugs are found too late - and that is expensive

Test automation is one of the most frequently mentioned quality assurance measures in software projects, and also one of the most frequently postponed. The reasons are well known: the initial outlay seems high, the benefits are difficult to quantify, and in ongoing projects with an established code base it is much harder to get started than in a greenfield project.

The consequences are well documented. According to a study by the IBM Systems Sciences Institute, a defect that is only discovered in production costs on average around 100 times more to fix than one that is caught in the design phase, and still around fifteen times more than one found during development (IBM via Functionize). In absolute terms, industry analyses put the cost of a bug at around 100 US dollars in the requirements phase, 1,500 US dollars in the QA phase and up to 10,000 US dollars in production (CloudQA: How Much Do Software Bugs Cost? 2025).

In addition, an analysis for 2024 shows that around 85% of bugs in web applications are only discovered by users, not in the test phase (CloudQA 2025). Mature engineering organizations, on the other hand, aim for a defect escape rate of less than 10% according to the DORA metrics (Opsera: DER Guide). The gap between these two figures marks one of the greatest opportunities for improvement in modern software development.
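
To make the gap between the two escape rates tangible, the figures cited above can be combined in a short calculation. The following Python sketch is purely illustrative: the per-bug costs are the cited QA and production figures, while the batch of 100 bugs and the function names are assumptions made for this example.

```python
# Illustrative arithmetic only. The per-bug costs are the figures cited above;
# the batch size of 100 bugs is an assumption made for this example.
COST_QA_USD = 1_500           # bug caught in the QA phase
COST_PRODUCTION_USD = 10_000  # bug caught in production (cited upper bound)


def defect_escape_rate(escaped: int, total: int) -> float:
    """Share of all known bugs that were first found after release."""
    if total <= 0:
        raise ValueError("total must be positive")
    return escaped / total


def total_fix_cost(total_bugs: int, escape_rate: float) -> float:
    """Cost if escaped bugs are fixed in production and all others in QA."""
    escaped = total_bugs * escape_rate
    caught = total_bugs - escaped
    return escaped * COST_PRODUCTION_USD + caught * COST_QA_USD


if __name__ == "__main__":
    for rate in (0.85, 0.10):
        print(f"escape rate {rate:.0%}: {total_fix_cost(100, rate):,.0f} USD")
```

Under these assumptions, 100 bugs cost 872,500 US dollars at an escape rate of 85% and 235,000 US dollars at 10%. Real projects will see other numbers, but the direction of the effect is the same.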

This is precisely where our approach comes in: Test automation must become effective quickly, even in projects that have been running for a long time. And it must remain affordable.

The solution: TOMAS meets an AI agent

For several years we have been using our own test framework, TOMAS, which is based on C# and allows automation scripts to be written in an XAML dialect. It was originally developed for automating Windows desktop applications and has since grown into a comprehensive, modular toolset for automating front-end and back-end processes. TOMAS can be integrated into the CI/CD pipeline, and its code base is completely transparent: no vendor lock-in, no black-box tool. That was important to us during development.

However, this did not solve the actual problem: even with a good framework, every single test case takes time to analyze, implement and stabilize. Particularly in long-running projects that were created without test automation in mind, this adds up to an effort that many clients can hardly shoulder alongside their day-to-day business.

Our answer is an AI agent, developed in-house, that generates the test cases automatically. It is connected via specialized skills, integrated into the development environment, and has direct access to both the application under test and the existing TOMAS framework.

How the agent works

The agent is not simply an upstream language model that throws together a few code snippets. We have developed it specifically to be productive in our concrete environment. This means:

  • It knows the conventions of our TOMAS framework, including page object patterns, timeout strategies, retry logic and the reporting format.
  • It has direct access to the application under test and can independently read out UI structures, element hierarchies and interaction paths.
  • It generates test cases in our code base, documents its assumptions and reports where it is unsure.
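
To illustrate what a timeout and retry convention of this kind can look like, here is a minimal Python sketch of a polling helper. It is not TOMAS code (TOMAS scripts are written in an XAML dialect on top of C#); the name `wait_until`, its parameters and its default values are assumptions chosen for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
) -> T:
    """Call probe repeatedly until it returns something other than None.

    Raises TimeoutError if no value arrives before the deadline, so a test
    fails with a clear message instead of hanging.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        time.sleep(interval_s)
```

A generated test would pass in a probe that looks up a UI element and returns `None` while the element is not yet available. Keeping this logic in one shared helper is what allows a single timeout strategy to be enforced across all tests, whether written by hand or generated.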

The result is not a finished, approved test case. This is important to us and should be clear to anyone thinking about AI-supported test automation: what the agent delivers is a qualified draft. Only when it is checked by experienced software and QA engineers does it become a test that can be trusted.

The human remains in the loop

There is a great temptation to accept AI-generated code without a second thought. Our experience shows that this is the fastest route to an unstable test suite. Tests that nobody trusts are worse than no tests at all, because they tie up capacity without providing security.

This is why we have deliberately created a two-stage workflow:

  1. The agent generates the test case based on the specification, the application and the conventions of the framework.
  2. An experienced person checks, completes and releases - with a focus on stability, maintainability and technical correctness.
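
The two stages can be pictured as a simple status model in which nothing runs as a trusted test until a named person has released it. The following Python sketch is a hypothetical illustration, not part of TOMAS or of the agent; all class, field and function names are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DRAFT = "draft"        # stage 1: generated by the agent, not yet trusted
    RELEASED = "released"  # stage 2: checked and released by a person


@dataclass
class GeneratedCase:
    name: str
    assumptions: List[str] = field(default_factory=list)  # documented by the agent
    status: Status = Status.DRAFT
    reviewer: Optional[str] = None

    def release(self, reviewer: str) -> None:
        """Mark the case as released; a named reviewer is mandatory."""
        if not reviewer:
            raise ValueError("a named reviewer is required to release a test")
        self.reviewer = reviewer
        self.status = Status.RELEASED


def trusted_cases(cases: List[GeneratedCase]) -> List[GeneratedCase]:
    """Only released cases count as part of the regression suite."""
    return [case for case in cases if case.status is Status.RELEASED]
```

The design choice in this sketch is that a draft cannot become a released test as a side effect: the only path to `RELEASED` goes through `release`, which records who took responsibility for the review.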

This division has proven its worth. The time saved comes not from skipping the review but from eliminating the initial effort. Whereas it used to take a day to analyze the application and write the first version of a test, today a developer checks the proposal in two hours, corrects weaknesses and releases it. All in all, the effort is significantly lower while the quality remains the same.

Why existing projects in particular benefit

Traditionally, the later you start with test automation, the more expensive it is. Mature code bases often lack clean structures, the interfaces are complex and the documentation lags behind the implementation.

This is exactly where the agent comes into its own. It analyzes the application in its current state, not in the state described by an outdated specification. It recognizes which elements exist in the UI, how they behave and what a test path realistically looks like. For the first time, this makes test automation in existing projects an investment with a manageable risk - and not a modernization project lasting several years.

We can see the effect in the figures: in one of our projects, the rate of bugs that are only found after delivery has fallen from around 25% to less than 9% across three major versions. Part of this is due to structural improvements. However, a growing proportion is due to the increasing test coverage and the speed with which we can add new tests to the test suite.

What we have learned

A number of insights have emerged from our work with the agent that are relevant for any comparable project:

Stability beats scope. Fifteen stable tests are more valuable than fifty unstable ones. The agent can produce a lot; the trick is to find the right balance.
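
One simple way to make "stable" measurable is to run each test several times against the same unchanged build and flag every test whose outcome varies. The following Python sketch shows the idea; the function name and the shape of the input data are assumptions for illustration, not a description of how TOMAS reports results.

```python
from typing import Dict, List


def flaky_tests(runs: Dict[str, List[bool]]) -> List[str]:
    """Names of tests with mixed outcomes across repeated runs of one build.

    runs maps a test name to its pass/fail results (True = passed). A test
    that always passes or always fails is deterministic; one that does both
    on an unchanged build is unstable and should not gate a release yet.
    """
    return sorted(name for name, results in runs.items() if len(set(results)) > 1)
```

For example, `flaky_tests({"login": [True, True, True], "export": [True, False, True]})` returns `["export"]`.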

Prioritization comes from data. Which program components should be automated first? Not those that are the most complex, but those that are used most frequently or whose failure has the greatest impact. Telemetry provides the most reliable answers here.
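
As a sketch of how such a ranking could be derived, the following Python example orders components by usage multiplied by failure impact. The scoring formula, the field names and the impact scale from 1 to 5 are assumptions for illustration; a real project would calibrate them against its own telemetry.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    weekly_uses: int     # taken from telemetry (hypothetical field)
    failure_impact: int  # 1 = minor to 5 = critical, assigned by the team


def automation_order(components: List[Component]) -> List[Component]:
    """Rank components by usage times impact, highest score first."""
    return sorted(
        components,
        key=lambda c: c.weekly_uses * c.failure_impact,
        reverse=True,
    )
```

With this scoring, a heavily used and critical login dialog ranks ahead of a complex but rarely used administration page, which matches the principle described above.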

Pipeline integration is not a bonus, but a requirement. Tests running on a single developer machine are not a safety net. They become a safety net when they run automatically on every build and make results visible.
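
What "make results visible" means in a pipeline can be reduced to two things: a summary that everyone can read and an exit code that fails the build. The following Python sketch shows this gate in its simplest form; the function name and the result format are assumptions, and an actual CI/CD integration depends on the pipeline tool in use.

```python
import sys
from typing import Dict


def pipeline_gate(results: Dict[str, bool]) -> int:
    """Print a summary and return a process exit code.

    Returns 0 if every test passed and 1 otherwise, so that the build
    step fails as soon as a single regression test fails.
    """
    failed = sorted(name for name, passed in results.items() if not passed)
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    print(f"{len(results) - len(failed)} passed, {len(failed)} failed")
    return 1 if failed else 0
```

A build script would pass the return value to `sys.exit`, which is how most CI systems decide whether a step has succeeded.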

The agent is an assistant, not a replacement. Experienced test and development teams become more productive through AI, not superfluous. Those who see it the other way around risk both quality and trust.

The challenges

As convincing as the results are, honesty requires us to say that there are areas we are still working on.

First, the agent is only as good as the information it receives. Complex domain validations, such as industry-specific rules that cannot be read from the UI, still have to be supplied as specifications. We are working on systematically integrating this context into the interface without the agent becoming a data collector.

Second, speed is not an end in itself. We don't measure how many tests the agent produces per hour, but how many bugs are found earlier through automated regression. That is the only key figure that counts.

Conclusion: test automation is becoming more accessible - even for mature projects

The use of AI in test automation does not change the basic principles of good quality assurance. What does change is the effort required to implement these principles. Where test automation in existing projects was long considered "too expensive, too late", it is now becoming a measurably economical option.

For us, the combination of the TOMAS framework and our AI agent is a practical step forward: not a hype technology, but a concrete tool that makes our teams' work more productive and helps our customers achieve quality earlier. Bugs that we find in the sprint are many times cheaper than bugs that come to light in production. This calculation has not changed. The only thing that has changed is how quickly and how affordably we can safeguard the sprint.

Are you considering how test automation can make a measurable contribution to your project - even if the code base has been around for a while? Please feel free to contact us.