Monday, January 26, 2009

Captcha cracking in JavaScript with Canvas and neural nets

Everybody’s favourite glass shield to protect web apps are CAPTCHAS. These are the distorted characters displayed on a page that a user has to enter before gaining access or sending off a form. They annoy normal users, are largely inaccessible to blind users or dyslexic people and are not that safe as we think they are. PWNtcha continually reports successful cracks of various captchas on the web using OCR algos and backend systems.

What is pretty amazing though is that now you can even crack the images using JavaScript and Canvas. ShaunF wrote a GreaseMonkey script that automatically solves captchas of the file hosting site Megaupload. There’s a demo of it available.

As John Resig explains in his analysis of the script there’s some pretty nifty work going on:

  1. The HTML 5 Canvas getImageData API is used to get at the pixel data from the Captcha image. Canvas gives you the ability to embed an image into a canvas (from which you can later extract the pixel data back out again).
  2. The script includes an implementation of a neural network, written in pure JavaScript.
  3. The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used - in a sort of crude form of Optical Character Recognition (OCR).

True, Megaupload’s CAPTCHA is pretty basic, but it is still very impressive that you can use JavaScript to crack it. Seems like the getImageData API is something to have a closer look at.

http://ajaxian.com/archives/captcha-cracking-in-javascript-with-canvas-and-neural-nets

No comments: