Researchers have found yet another way to defeat text-based CAPTCHA systems, this time with a machine learning algorithm. The new “captcha solver” was developed by scientists from Lancaster University in the UK and from Northwest University and Peking University in China. It is built on the concept of a generative adversarial network (GAN).
The new algorithm is reported to be the most effective solver of captcha security and authentication systems to date. It can defeat versions of text captcha schemes that are widely deployed to protect websites.
What is a generative adversarial network?
Generative adversarial networks are a class of artificial intelligence algorithms used in unsupervised machine learning. They are implemented as a system of two neural networks that compete with each other in a zero-sum game framework. GANs can generate images with many realistic characteristics, which look at least superficially authentic to human observers.
Why are they called adversarial? In short, GANs are deep neural network architectures made up of two networks pitted against each other, hence the name. The potential of these networks is huge because they can learn to mimic any distribution of data, experts say. Furthermore, a GAN-based algorithm needs far fewer data points for training than conventional classification algorithms do, and it can still perform very accurately.
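For readers who want the zero-sum game written out, the standard textbook GAN objective is shown below. The notation is the conventional one and is not taken from the captcha study: G is the generator, D is the discriminator, x is a real sample, and z is random noise fed to the generator. The researchers' exact training objective may differ.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize V by telling real samples from generated ones, while the generator tries to minimize the same quantity by producing samples the discriminator accepts as real. One network's gain is the other's loss, which is what makes the game zero-sum.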
How does the GAN-based captcha solver algorithm work?
The method involves teaching a captcha generator program to produce large numbers of training captchas that are indistinguishable from genuine captchas, the academics explained (http://www.lancaster.ac.uk/sci-tech/about-us/news/new-attack-could-make-website-security-captchas-obsolete). These synthetic captchas are used to rapidly train a solver, which is then refined and tested against real captchas.
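The three stages described above (synthesize training data, train a base solver on it, then refine the solver on a small set of real samples) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: characters are stand-in feature vectors rather than images, the "synthesizer" is a slightly biased copy of the real scheme rather than a trained GAN, and the solver is a nearest-centroid classifier rather than the neural network the researchers used. All names and numbers are hypothetical.

```python
import random

random.seed(0)  # deterministic toy run

CLASSES = "ABCD"  # hypothetical 4-character alphabet
DIM = 8           # hypothetical feature-vector length per character

# Stand-in for the real captcha scheme: one hidden prototype per character.
real_proto = {c: [random.uniform(-1, 1) for _ in range(DIM)] for c in CLASSES}
# Stand-in for the synthesizer: an imperfect imitation of the real scheme.
synth_proto = {c: [v + random.uniform(-0.3, 0.3) for v in p]
               for c, p in real_proto.items()}

def sample(protos, n, noise=0.2):
    """Draw n labelled feature vectors around the given prototypes."""
    data = []
    for _ in range(n):
        label = random.choice(CLASSES)
        vec = [v + random.gauss(0, noise) for v in protos[label]]
        data.append((vec, label))
    return data

def fit_centroids(data):
    """Train the base solver: the per-class mean of the training vectors."""
    sums = {c: [0.0] * DIM for c in CLASSES}
    counts = {c: 0 for c in CLASSES}
    for vec, label in data:
        counts[label] += 1
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
    return {c: [s / max(counts[c], 1) for s in sums[c]] for c in CLASSES}

def fine_tune(centroids, data, weight=0.8):
    """Refine the solver: pull each centroid toward the mean of the real samples."""
    real = fit_centroids(data)
    seen = {label for _, label in data}
    tuned = {}
    for c in CLASSES:
        if c in seen:
            tuned[c] = [(1 - weight) * a + weight * b
                        for a, b in zip(centroids[c], real[c])]
        else:
            tuned[c] = centroids[c]  # no real sample for this class: keep as is
    return tuned

def predict(centroids, vec):
    """Return the class whose centroid is closest to vec."""
    return min(CLASSES,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], vec)))

def accuracy(centroids, data):
    return sum(predict(centroids, v) == y for v, y in data) / len(data)

synthetic = sample(synth_proto, 2000)  # plentiful, auto-labelled synthetic data
real_small = sample(real_proto, 40)    # small, manually labelled real set
real_test = sample(real_proto, 500)    # held-out real data for evaluation

base = fit_centroids(synthetic)        # stage 2: train on synthetic data only
tuned = fine_tune(base, real_small)    # stage 3: refine on a few real samples

print("base solver accuracy: ", accuracy(base, real_test))
print("tuned solver accuracy:", accuracy(tuned, real_test))
```

The script prints the solver's accuracy on held-out "real" samples before and after fine-tuning. The point of the sketch is the division of labor: the bulk of the training data costs nothing to label because it is generated, and only the small refinement set has to be collected and tagged by hand.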
How can this algorithm be used by attackers?
By using a machine-learned automatic captcha generator, threat actors can significantly reduce the effort and time needed to find and manually tag captchas to train their software. This is how the researchers succeeded.
This GAN-based captcha solver only requires 500 genuine captchas, instead of the millions that would normally be needed to effectively train an attack program, the researchers added. What is more, the innovative solver is easy to rebuild, and can be used against new or modified captcha schemes. It doesn’t require a lot of human involvement to work.
The captcha solver was tested on 33 captcha schemes, 11 of which are deployed by some of the world’s most popular websites, such as eBay, Wikipedia and Microsoft. You can refer to the report for further details.
In 2016, three researchers, Suphannee Sivakorn, Jason Polakis, and Angelos D. Keromytis, managed to design an automated attack that could successfully break the CAPTCHAs of Google and Facebook. In order to succeed, the expert trio applied various “tricks” to defeat the CAPTCHA schemes. They also used machine learning to figure out the right CAPTCHA answer.