Timpibot Is Yet Another Badly-Behaved Crawler
Over the last couple of days I have noticed aggressive crawling from a user agent I hadn’t seen before.
It downloads robots.txt but then ignores its directives completely, and it sends dozens of parallel requests per second. This repeatedly triggered safeguards I have put in place on infrastructure I manage, across a diverse set of machines.
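Request floods like that are exactly what per-IP rate limiting catches. A minimal sketch of such a safeguard in nginx (the zone name and the limits here are illustrative, not my actual configuration):

```nginx
# Illustrative: allow at most 2 requests/second per client IP,
# with a small burst allowance; excess requests get HTTP 429.
limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

server {
    location / {
        limit_req zone=perip burst=10 nodelay;
        limit_req_status 429;
    }
}
```

A crawler sending dozens of parallel requests per second trips a limit like this almost immediately.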
The company’s website makes the usual lofty marketing claims about freedom, privacy, being “powered by you”, and so on. All while completely ignoring mine, of course, thank you very much.
They also provide no email contact address or crawler information page, as is customary for trustworthy actors. Instead, there is information about funding rounds. Quite a few red flags right there. Even their so-called “white paper” is a marketing document light on technical detail.
As a result, I have blocked the user agent globally and will firewall every IP encountered, now and in the future. For now, activity appears to be concentrated on three addresses:
220.127.116.11 (18.104.22.168.mobile.tre.se)
22.214.171.124 (adsl-89-217-182-135.adslplus.ch)
126.96.36.199 (dynamic-ip-1868323648.cable.net.co)
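Firewalling those addresses can be sketched in nftables roughly as follows; the table layout and set name are assumptions, not my actual ruleset, and a named set makes it easy to add future IPs:

```
# Illustrative nftables ruleset: drop all traffic from the listed addresses.
table inet filter {
    set crawler_block {
        type ipv4_addr
        elements = { 220.127.116.11, 22.214.171.124, 126.96.36.199 }
    }
    chain input {
        type filter hook input priority 0; policy accept;
        ip saddr @crawler_block drop
    }
}
```

The user-agent block itself belongs in the web server, since the user agent string is only visible at the HTTP layer.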
It is rather beyond me how a search startup could reasonably argue that ignoring webmaster directives is acceptable conduct while crawling. It’s either a deliberate choice or fundamental incompetence regarding their core technology. Judge for yourself which is the more plausible scenario.
I recently encountered a different but similar case where inquiring about the crawler in question led to the audacious claim that robots.txt directives were followed, insofar as an allowance for Googlebot was interpreted as an allowance for their crawler as well. Their justification: excluding most crawlers while allowing Googlebot must be a misconfiguration on the webmaster’s part. Wow.
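For reference, this is the kind of robots.txt at issue. An empty Disallow means “allow everything”, so any compliant crawler other than Googlebot must treat this as a blanket exclusion, not as an invitation:

```
# Allow Googlebot everywhere; exclude every other crawler.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```

There is nothing ambiguous to “interpret” here: a crawler that is not Googlebot falls under the wildcard group and stays out.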
I don’t know whether the attitude of Timpi’s team is similarly cocky, but I would imagine it is.