Are CAPTCHAs Still Bot-hard? Generalized Visual CAPTCHA Solving
with Agentic Vision Language Model
Field Study
We infiltrated a human-driven CAPTCHA farm to evaluate Halligan's ability to generalize to unforeseen visual CAPTCHAs. We selected 2Captcha as our target for three reasons: (1) it allows anyone to join as a worker; (2) it ranks as the second-most popular CAPTCHA-solving service according to domain popularity metrics, enabling us to collect more data; and (3) its API documents solvers for many CAPTCHA services, increasing our likelihood of encountering new types of CAPTCHAs.
Setup
Using 2Captcha's Windows software, we created a throwaway account and deployed Halligan to interact with tasks. Additionally, we performed a man-in-the-middle (MITM) attack using mitmproxy to intercept the communication between the worker software and 2Captcha's server, which gave us more insight into each task. The intercepted data for an incoming CAPTCHA task includes the following fields:
data contains all task-related information needed to load the visual CAPTCHA, including images, the sitekey, and the target website URL.
proxy and cookie contain spoofing-related data that enables workers to solve tasks using the same identity as the site visitor (task requester or customer). This approach helps bypass behavioral and IP-based checks, increasing the task success rate.
reputation, balance, and simple_solving are internal metrics that reflect the current worker's performance and status.
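The interception step could be scripted roughly as follows. This is only a minimal sketch of a mitmproxy addon, not the setup used in the study. It assumes that task payloads arrive as JSON objects with the fields above at the top level; the host name TASK_HOST and the helper split_task are hypothetical placeholders, since neither the endpoint nor the wire format is given here.

```python
import json

# Hypothetical placeholder: the real endpoint used by the worker software is not stated.
TASK_HOST = "tasks.example.invalid"

# Field groups as described in the text; their position in the payload is an assumption.
TASK_FIELDS = ("data",)
SPOOFING_FIELDS = ("proxy", "cookie")
WORKER_FIELDS = ("reputation", "balance", "simple_solving")


def split_task(payload):
    """Group a decoded task payload into task, spoofing, and worker fields."""
    def pick(keys):
        return {key: payload[key] for key in keys if key in payload}

    return {
        "task": pick(TASK_FIELDS),
        "spoofing": pick(SPOOFING_FIELDS),
        "worker": pick(WORKER_FIELDS),
    }


class TaskLogger:
    """mitmproxy addon that records task payloads sent to the worker software."""

    def __init__(self):
        self.tasks = []

    def response(self, flow):
        # mitmproxy calls this hook once for every completed HTTP response.
        if flow.request.pretty_host != TASK_HOST:
            return
        try:
            payload = json.loads(flow.response.get_text())
        except (TypeError, ValueError):
            return  # empty or non-JSON body, ignore it
        if isinstance(payload, dict) and "data" in payload:
            self.tasks.append(split_task(payload))


# mitmproxy loads addons from this module-level list.
addons = [TaskLogger()]
```

Saved as a script, the addon could be loaded with `mitmdump -s <script>.py` while the worker software is configured to route its traffic through the proxy.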
The figure below illustrates the distribution of tasks by CAPTCHA service over the course of the study.
Total Tasks
The figure below displays the total number of tasks, grouped by CAPTCHA service and further divided by CAPTCHA type. Four out of the nine services (xcaptcha, prosopo, amazon, and 2captcha), representing 17 of the 26 CAPTCHA types, are not included in the benchmark.
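A two-level grouping of this kind can be computed with a nested counter. The sketch below is illustrative only: the record keys "service" and "type" and the type labels are hypothetical placeholders, not the field names or CAPTCHA types observed in the study.

```python
from collections import Counter, defaultdict


def tally(tasks):
    """Count tasks per CAPTCHA service, then per CAPTCHA type within each service."""
    counts = defaultdict(Counter)
    for task in tasks:
        counts[task["service"]][task["type"]] += 1
    return counts


# Placeholder records for illustration; these are not study data.
records = [
    {"service": "geetest", "type": "type_a"},
    {"service": "geetest", "type": "type_a"},
    {"service": "geetest", "type": "type_b"},
    {"service": "amazon", "type": "type_c"},
]

counts = tally(records)
for service, by_type in counts.items():
    # Total per service, followed by the per-type breakdown.
    print(service, sum(by_type.values()), dict(by_type))
```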
Known Targeted Domains
The figure below displays the target domains, organized by CAPTCHA service. The intercepted data may be incomplete, and some services (e.g., xcaptcha, 2captcha, mtcaptcha) do not track the page URL, so the data should be considered an estimate. Arkose has the highest concentration of popular domains: all rank within the top 2000, and over half rank within the top 500. Amazon has a large share of government-related domains, while geetest has a large share of finance- and cryptocurrency-related domains.