web crawler – Playing with s.t.e.a.m.

by admin

January 24, 2025

Trap Naughty Web Crawlers in Digestive Juices with Nepenthes

In the olden days of the WWW, you could just put a robots.txt file in the root of your website, and crawling bots from search engines and their kin would (generally) respect the rules in it. These days, however, web crawlers, especially those from large language model (LLM) companies, happily ignore such signs on the […]

by admin

September 3, 2019

This Machine Learning Algorithm is Meta

Suppose you ran a website releasing many articles per day about various topics, all following a general theme. And suppose that your website allowed for a comments section for discussion on those topics. Unless you are brand new to the Internet, you’ll also imagine that the comments section needs at least a little bit of […]

Category: web crawler
