Traditionally, software analytics has focused on understanding problems in existing systems: finding code smells, uncovering architectural drift, detecting inconsistencies, and more. For me, its strength has always been in creating transparency: enabling developers and architects to better understand their systems and explain systemic problems to business stakeholders. But what if these kinds of analyses didn’t stop at describing problems? What if we could go one step further, straight into implementing solutions?
From grumbling to action
Artificial intelligence (in which I include parts of data science, data mining, and, of course, large language models (LLMs)) now challenges this long-standing separation between knowing what to do and actually doing it. Instead of producing only reports, overviews, or dashboards, analytics results can now be directly operationalized. We can transform insights into improved solutions such as:
- scratch-refactoring cryptic code by brainstorming design alternatives that better reflect good coding practices and domain concepts, making systems easier for developers to understand
- extracting implicit architectural concepts and guidelines and transforming them into executable checks as well as living architecture documentation that developers can trust and build upon
- generating problem impact analyses along with migration scripts that enable large-scale refactorings, sparing developers tedious, repetitive cleanup tasks (that might otherwise make them think about quitting their job …)
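To make the second point more concrete: an implicit guideline such as “domain code must not depend on infrastructure code” can be turned into a tiny executable check. Here is a minimal Python sketch; the package names and layout are assumptions for illustration, not rules from a real project:

```python
import re
from pathlib import Path

# Hypothetical rule: code in the domain package must not import
# infrastructure code. The package name is an illustrative assumption.
FORBIDDEN = re.compile(r"^import\s+com\.example\.infrastructure\.", re.MULTILINE)

def violations(source_root: str) -> list[str]:
    """Return all Java files in a domain package that import infrastructure code."""
    hits = []
    for java_file in Path(source_root).glob("**/domain/**/*.java"):
        if FORBIDDEN.search(java_file.read_text(encoding="utf-8")):
            hits.append(str(java_file))
    return hits
```

A check like this can run in the build pipeline, so the extracted guideline stays enforced instead of silently eroding.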
Problem-focused analysis first
I strongly believe in the importance of identifying the problem properly instead of rushing to quick solutions. Being able to describe a problem precisely often reveals valuable clues that guide us toward a good solution! Even better: when analyzing a software system with tooling from (graph) data science, you can even get the locations of the problems pointed out to you directly. The results, when properly visualized and communicated, can raise awareness of the severity of the identified problems even at the business level. At the same time, it becomes easier to come up with an idea or approach to fix all the affected parts, because the previously hidden problem becomes much clearer and more tangible.
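To illustrate what “getting the locations of the problems pointed out” can look like, here is a minimal, stdlib-only Python sketch that finds a cyclic dependency in a toy module graph. The module names are made up; in a real analysis, the graph would come from a scanner such as jQAssistant or a similar tool:

```python
from graphlib import TopologicalSorter, CycleError

# Toy module dependency graph; names are made up for illustration.
# Each module maps to the set of modules it depends on.
deps = {
    "billing": {"customers"},
    "customers": {"notifications"},
    "notifications": {"billing"},   # closes a dependency cycle
    "reporting": {"billing"},
}

def find_cycle(graph):
    """Point out one cyclic dependency, a classic structural problem."""
    try:
        TopologicalSorter(graph).prepare()
        return None
    except CycleError as err:
        # The second exception argument is the cycle as a list of modules,
        # with the first and last entry being the same node.
        return err.args[1]
```

Instead of a vague “the architecture has eroded,” you get the exact modules involved in the cycle, which is precisely the kind of result that can be visualized and then handed over to a fix.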
Solution-focused approaches second
LLMs and code transformation frameworks like CodeModder or OpenRewrite now take us even further: I can apply transformation recipes directly to the software system, moving beyond problem detection toward actively fixing systemic problems that matter to the business. Here is how this combination works for the examples from above:
- if you can express what “cryptic” code looks like, you can find all those spots, mine their structures, look for similarities and differences to see the bigger picture, and then pair with an LLM to come up with alternative solutions
- if you go through your complete code base and mine for common identifiers and concepts, you can extract those and let an LLM document their meaning and uncover the rules they follow, which even lets you derive architectural validations
- if you can identify problematic code snippets that are spread throughout the code base, you can let an LLM create a migration script template to which you then add your precise refactoring rules or transformations for those code parts
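As a small taste of the second idea, mining common identifiers and handing them to an LLM can start with something as simple as splitting type names into words and counting them. A hedged Python sketch; the naive CamelCase split and the prompt wording are illustrative assumptions, and the actual LLM call is left out:

```python
import re
from collections import Counter
from pathlib import Path

# Naive CamelCase splitter; acronyms like "HTTPClient" fall apart,
# which is good enough for a first exploration.
CAMEL_CASE = re.compile(r"[a-z]+|[A-Z][a-z]*")

def mine_concepts(source_root: str, top_n: int = 10) -> list:
    """Split Java type names into words and count the most common concepts."""
    words = Counter()
    for java_file in Path(source_root).glob("**/*.java"):
        for part in CAMEL_CASE.findall(java_file.stem):
            words[part.lower()] += 1
    return words.most_common(top_n)

def concept_prompt(concepts: list) -> str:
    """Build an LLM prompt asking for documentation of the mined concepts."""
    listing = "\n".join(f"- {word} ({count} occurrences)" for word, count in concepts)
    return ("These terms appear in the type names of our Java code base. "
            "Describe what each concept likely means and which naming rules "
            "the types seem to follow:\n" + listing)
```

The counted terms ground the LLM in what is actually in the code base, instead of letting it hallucinate a domain vocabulary from thin air.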
Boring approach?
I’m really excited that we as developers can now not only generate large amounts of code with AI, but also remove it before our software systems start to deteriorate significantly. Still, these approaches to modernizing software systems feel very familiar, sometimes even a little boring: you read data, wrangle the data, get results, and take action. I’ve done plenty of these kinds of analyses in the past. But today, the final step, the ability to automate the implementation, feels refreshing and highly valuable to me. It motivates me to continue exploring this path with renewed energy.
And yes, there are plenty of “AI modernization vanity tools” out there that answer questions nobody asked and fix problems your command line or IDE can already handle with a single keystroke. But don’t let yourself get discouraged by news about these kinds of tools! There is plenty of potential in this area if you focus on tools and methods that actually deliver solutions you benefit from. Spoiler: none of this is rocket science.
What’s next?
I’ll share more concrete examples here. Some of the original problems and their analyses were really crazy. But thanks to AI, all of them turned out to be solvable (so in the end they were just “crAIzy”, hence the title of this post). When exactly I’ll write about this, I don’t know yet. The analysis and migration script generation scripts [sic!] for my clients are still not writing themselves (yet). So stay tuned!
For the curious: Here are some demo notebooks that I wrote earlier to prototype some solutions using LLMs:
- Conceptual Integrity Analysis: This notebook analyzes the conceptual integrity of a code base by creating an interactive treemap visualization that shows how well its Java files align with predefined technical and business concepts.
- Fake comment correction using Gemma 3: This notebook uses a local LLM (Google’s Gemma 3 27B model) to detect fake comments in Java code and successfully replaces creative attempts to hide nonsense comments and sarcastic remarks with better-fitting comments.
- As for the Jupyter notebook where I use ASM via JPype in Python to analyze Java bytecode, detect coding problems, and patch them with generated OpenRewrite recipes: well, don’t hold your breath for that one. Oh boy, this one went really crAIzy. 😅
Many thanks to Joachim Praetorius for giving feedback on an earlier version of this blog post.