Microsoft Researchers: We trained AI to find software bugs using hide and seek
Microsoft researchers have developed a deep learning model that was trained to find software bugs without any real-world bugs to learn from.
While there are dozens of tools available for static analysis of code in various languages to detect security vulnerabilities, researchers have been exploring techniques that use machine learning to improve the ability to both detect and fix them. This is because finding and fixing bugs in code can be difficult and expensive, even when AI is used to find them.
Researchers from Microsoft Research Cambridge, UK, detailed their work on BugLab, a Python implementation of “an approach for self-supervised learning to find and fix bugs.” It is “self-supervised” in the sense that the two models behind BugLab were trained without labeled data.
The goal of training without labeled data was driven by the lack of annotated real-world bugs on which to train bug-finding deep learning models. Although a large amount of source code is available for such training, most of it is not annotated.
BugLab targets hard-to-detect bugs rather than critical bugs that traditional program analysis can already find. The approach promises to avoid the costly process of manually coding a model to find these bugs.
The group claims to have found 19 previously unknown bugs in open-source Python packages from PyPI, as detailed in the paper, Self-Supervised Bug Detection and Repair, presented at the Neural Information Processing Systems (NeurIPS) 2021 conference this week.
“BugLab can learn to find and fix bugs, without using labeled data, through a game of ‘hide and seek’”, explain Miltos Allamanis, senior researcher at Microsoft Research, and Marc Brockschmidt, senior research director at Microsoft. Both are authors of the paper.
Beyond reasoning about the structure of a piece of code, they believe bugs can be found “by also understanding the ambiguous natural language clues that software developers leave in code comments, variable names, etc.”
Their approach in BugLab, which uses two competing models, builds on existing self-supervised learning efforts in deep learning, computer vision, and natural language processing (NLP). It also resembles, or is “inspired” by, GANs (generative adversarial networks), the neural networks sometimes used to create deepfakes.
“In our case, we aim to train a bug detection model without using training data from actual bugs”, they note in the paper.
The two BugLab models are a bug selector and a bug detector: “Given existing code that is presumed to be correct, a bug selector model decides whether to introduce a bug, where to introduce it, and its exact form (for example, replacing a “+” with a “-”). Given the choice of the selector, the code is edited to introduce the bug. Then another model, the bug detector, tries to determine if a bug has been introduced in the code, and if so, locate it, and fix it.”
Their models are not a GAN because “BugLab’s bug selector does not generate a new snippet from scratch, but instead rewrites an existing piece of code (assumed to be correct)”.
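The “hide” half of this game can be illustrated with a short Python sketch. It is not BugLab’s code: the function names, the operator table, and the random choice standing in for a learned selector are all assumptions made for illustration. It only shows how rewriting presumed-correct code yields labeled examples for a detector without any human annotation.

```python
import ast
import random

# Hypothetical rewrite table (an assumption, not BugLab's actual rule set):
# each operator maps to a plausible wrong replacement.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Lt: ast.LtE, ast.LtE: ast.Lt}


def candidate_sites(tree):
    """Collect the nodes whose operator could be swapped."""
    sites = []
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and type(node.op) in SWAPS:
            sites.append(node)
        elif isinstance(node, ast.Compare) and type(node.ops[0]) in SWAPS:
            sites.append(node)
    return sites


def hide_bug(source, rng, p_bug=0.5):
    """Toy stand-in for a bug selector.

    Decides whether to inject a bug, where, and in what form.
    Returns (code, label): label is None for untouched code, otherwise
    the line number of the injected bug. A real selector would be a
    trained model; here the decisions are random.
    """
    tree = ast.parse(source)
    sites = candidate_sites(tree)
    if not sites or rng.random() >= p_bug:
        # Unparse even when nothing changes, so formatting gives no hint.
        return ast.unparse(tree), None
    node = rng.choice(sites)
    if isinstance(node, ast.BinOp):
        node.op = SWAPS[type(node.op)]()
    else:
        node.ops[0] = SWAPS[type(node.ops[0])]()
    return ast.unparse(tree), node.lineno


if __name__ == "__main__":
    original = "def total(a, b):\n    return a + b\n"
    code, label = hide_bug(original, random.Random(), p_bug=1.0)
    print(code)
    print("bug on line:", label)
```

Each (code, label) pair could then serve as a training example for a detector, which is the “seek” half of the game. The sketch requires Python 3.9 or later for ast.unparse.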
On the researchers’ test data set of 2,374 real-life bugs in Python packages, they showed that 26% of bugs can be found and fixed automatically.
However, their technique also reported too many false positives, or warnings that weren’t actually bugs. For example, while it found some known bugs, only 19 of the 1,000 warnings reported by BugLab were real bugs.
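To put those figures in perspective, the arithmetic can be worked through in a few lines of Python. The variable and function names are illustrative, and because the 26% figure is rounded in the source, the bug count derived from it is only approximate.

```python
def precision(true_positives, reported):
    """Share of reported warnings that turned out to be real bugs."""
    return true_positives / reported


warnings_reported = 1000   # warnings reported by BugLab
real_bugs = 19             # warnings confirmed as real bugs
print(f"precision: {precision(real_bugs, warnings_reported):.1%}")  # 1.9%

benchmark_bugs = 2374      # real-life bugs in the test data set
fixed_share = 0.26         # rounded share found and fixed automatically
print("approx. bugs fixed:", round(benchmark_bugs * fixed_share))   # 617
```

In other words, roughly 98 of every 100 warnings were false alarms, which is the practical obstacle the researchers describe.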
Training a neural network without using actual bug training data seems like a difficult puzzle to solve. For example, some warnings were obviously not bugs, yet the neural models reported them as such.
“Some of the issues reported were complex enough that it took us (human authors) a few minutes of thinking to conclude that a warning is false,” they note in the paper.
At the same time, the authors say some warnings are ‘obviously’ incorrect to them, but it is not clear why the neural models raise them.
As for the 19 previously unknown bugs they found, they reported 11 on GitHub, of which six have been merged and five are pending approval. Some of the 19 were too minor to bother reporting.