Compare commits

..

15 commits
v1.36 ... main

Author SHA1 Message Date
dark-visitors
4ed17b8e4a Update from Dark Visitors
Some checks failed
/ ai-robots-txt (push) Has been cancelled
/ run-tests (push) Has been cancelled
/ lint-json (push) Has been cancelled
2025-06-17 01:00:21 +00:00
ai.robots.txt
5326c202b5 Merge pull request #154 from paulrudy/main
Some checks are pending
/ ai-robots-txt (push) Waiting to run
/ run-tests (push) Waiting to run
/ lint-json (push) Waiting to run
re-add facebookexternalhit
2025-06-16 15:12:42 +00:00
a31ae1e6d0
Merge pull request #154 from paulrudy/main
re-add facebookexternalhit
2025-06-16 08:12:31 -07:00
paulrudy
7535893aec re-add facebookexternalhit 2025-06-15 16:49:07 -07:00
ai.robots.txt
eb05f2f527 Merge pull request #153 from sergiospagnuolo/Poseidon
Some checks failed
/ ai-robots-txt (push) Has been cancelled
/ run-tests (push) Has been cancelled
/ lint-json (push) Has been cancelled
Update robots.json with new crawler
2025-06-14 14:04:03 +00:00
26a46c409d
Merge pull request #153 from sergiospagnuolo/Poseidon
Update robots.json with new crawler
2025-06-14 07:03:52 -07:00
dark-visitors
2b68568ac2 Update from Dark Visitors
Some checks are pending
/ ai-robots-txt (push) Waiting to run
/ run-tests (push) Waiting to run
/ lint-json (push) Waiting to run
2025-06-14 00:58:11 +00:00
Sérgio Spagnuolo
b05f2fee00
Update robots.json with new crawler
Update with Poseidon Research Crawler as found in nytimes.com/robots.txt
2025-06-13 17:15:13 -03:00
ai.robots.txt
e53d81c66d Merge pull request #152 from ai-robots-txt/MyCentralAIScraperBot
Some checks are pending
/ ai-robots-txt (push) Waiting to run
/ run-tests (push) Waiting to run
/ lint-json (push) Waiting to run
chore(robots.json): adds MyCentralAIScraperBot
2025-06-13 09:28:41 +00:00
Glyn Normington
20e327e74e
Merge pull request #152 from ai-robots-txt/MyCentralAIScraperBot
chore(robots.json): adds MyCentralAIScraperBot
2025-06-13 10:28:32 +01:00
Glyn Normington
8f17718e76
Fix typo 2025-06-13 10:28:12 +01:00
d760f9216f
chore(robots.json): adds MyCentralAIScraperBot 2025-06-12 13:08:29 -07:00
ai.robots.txt
842e2256e8 Merge pull request #150 from ai-robots-txt/semrush-bots
Some checks failed
/ run-tests (push) Has been cancelled
/ lint-json (push) Has been cancelled
/ ai-robots-txt (push) Has been cancelled
chore(robots.json): adds additional SemrushBot user agents
2025-06-12 07:12:00 +00:00
Glyn Normington
229ea20426
Merge pull request #150 from ai-robots-txt/semrush-bots
chore(robots.json): adds additional SemrushBot user agents
2025-06-12 08:11:51 +01:00
14d68f05ba
chore(robots.json): adds additional SemrushBot user agents 2025-06-11 13:50:53 -07:00
7 changed files with 73 additions and 3 deletions

View file

@ -1,3 +1,3 @@
RewriteEngine On RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|aiHitBot|Amazonbot|Andibot|anthropic\-ai|Applebot|Applebot\-Extended|bedrockbot|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-SearchBot|Claude\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google\-CloudVertexBot|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|meta\-externalagent|Meta\-ExternalAgent|meta\-externalfetcher|Meta\-ExternalFetcher|MistralAI\-User/1\.0|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient\.com|Perplexity\-User|PerplexityBot|PetalBot|PhindBot|QualifiedBot|QuillBot|quillbot\.com|SBIntuitionsBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot) [NC] RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|aiHitBot|Amazonbot|Andibot|anthropic\-ai|Applebot|Applebot\-Extended|bedrockbot|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-SearchBot|Claude\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google\-CloudVertexBot|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|meta\-externalagent|Meta\-ExternalAgent|meta\-externalfetcher|Meta\-ExternalFetcher|MistralAI\-User/1\.0|MyCentralAIScraperBot|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient\.com|Perplexity\-User|PerplexityBot|PetalBot|PhindBot|Poseidon\ Research\ Crawler|QualifiedBot|QuillBot|quillbot\.com|SBIntuitionsBot|Scrapy|SemrushBot|SemrushBot\-BA|SemrushBot\-CT|SemrushBot\-OCOB|SemrushBot\-SI|SemrushBot\-SWA|Sidetrade\ indexer\ bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot) [NC]
RewriteRule !^/?robots\.txt$ - [F,L] RewriteRule !^/?robots\.txt$ - [F,L]

View file

@ -1,3 +1,3 @@
@aibots { @aibots {
header_regexp User-Agent "(AI2Bot|Ai2Bot\-Dolma|aiHitBot|Amazonbot|Andibot|anthropic\-ai|Applebot|Applebot\-Extended|bedrockbot|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-SearchBot|Claude\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google\-CloudVertexBot|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|meta\-externalagent|Meta\-ExternalAgent|meta\-externalfetcher|Meta\-ExternalFetcher|MistralAI\-User/1\.0|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient\.com|Perplexity\-User|PerplexityBot|PetalBot|PhindBot|QualifiedBot|QuillBot|quillbot\.com|SBIntuitionsBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot)" header_regexp User-Agent "(AI2Bot|Ai2Bot\-Dolma|aiHitBot|Amazonbot|Andibot|anthropic\-ai|Applebot|Applebot\-Extended|bedrockbot|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-SearchBot|Claude\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google\-CloudVertexBot|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|meta\-externalagent|Meta\-ExternalAgent|meta\-externalfetcher|Meta\-ExternalFetcher|MistralAI\-User/1\.0|MyCentralAIScraperBot|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient\.com|Perplexity\-User|PerplexityBot|PetalBot|PhindBot|Poseidon\ Research\ Crawler|QualifiedBot|QuillBot|quillbot\.com|SBIntuitionsBot|Scrapy|SemrushBot|SemrushBot\-BA|SemrushBot\-CT|SemrushBot\-OCOB|SemrushBot\-SI|SemrushBot\-SWA|Sidetrade\ indexer\ bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot)"
} }

View file

@ -23,6 +23,7 @@ Diffbot
DuckAssistBot DuckAssistBot
EchoboxBot EchoboxBot
FacebookBot FacebookBot
facebookexternalhit
Factset_spyderbot Factset_spyderbot
FirecrawlAgent FirecrawlAgent
FriendlyCrawler FriendlyCrawler
@ -43,6 +44,7 @@ Meta-ExternalAgent
meta-externalfetcher meta-externalfetcher
Meta-ExternalFetcher Meta-ExternalFetcher
MistralAI-User/1.0 MistralAI-User/1.0
MyCentralAIScraperBot
NovaAct NovaAct
OAI-SearchBot OAI-SearchBot
omgili omgili
@ -55,12 +57,17 @@ Perplexity-User
PerplexityBot PerplexityBot
PetalBot PetalBot
PhindBot PhindBot
Poseidon Research Crawler
QualifiedBot QualifiedBot
QuillBot QuillBot
quillbot.com quillbot.com
SBIntuitionsBot SBIntuitionsBot
Scrapy Scrapy
SemrushBot
SemrushBot-BA
SemrushBot-CT
SemrushBot-OCOB SemrushBot-OCOB
SemrushBot-SI
SemrushBot-SWA SemrushBot-SWA
Sidetrade indexer bot Sidetrade indexer bot
TikTokSpider TikTokSpider

View file

@ -1,3 +1,3 @@
if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|aiHitBot|Amazonbot|Andibot|anthropic\-ai|Applebot|Applebot\-Extended|bedrockbot|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-SearchBot|Claude\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google\-CloudVertexBot|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|meta\-externalagent|Meta\-ExternalAgent|meta\-externalfetcher|Meta\-ExternalFetcher|MistralAI\-User/1\.0|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient\.com|Perplexity\-User|PerplexityBot|PetalBot|PhindBot|QualifiedBot|QuillBot|quillbot\.com|SBIntuitionsBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot)") { if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|aiHitBot|Amazonbot|Andibot|anthropic\-ai|Applebot|Applebot\-Extended|bedrockbot|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-SearchBot|Claude\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google\-CloudVertexBot|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|meta\-externalagent|Meta\-ExternalAgent|meta\-externalfetcher|Meta\-ExternalFetcher|MistralAI\-User/1\.0|MyCentralAIScraperBot|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient\.com|Perplexity\-User|PerplexityBot|PetalBot|PhindBot|Poseidon\ Research\ Crawler|QualifiedBot|QuillBot|quillbot\.com|SBIntuitionsBot|Scrapy|SemrushBot|SemrushBot\-BA|SemrushBot\-CT|SemrushBot\-OCOB|SemrushBot\-SI|SemrushBot\-SWA|Sidetrade\ indexer\ bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot)") {
return 403; return 403;
} }

View file

@ -174,6 +174,13 @@
"frequency": "Up to 1 page per second", "frequency": "Up to 1 page per second",
"description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically."
}, },
"facebookexternalhit": {
"operator": "Meta/Facebook",
"respect": "[No](https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2524591313)",
"function": "Ostensibly only for sharing, but likely used as an AI crawler as well",
"frequency": "Unclear at this time.",
"description": "Note that excluding FacebookExternalHit will block incorporating OpenGraph data when sharing in social media, including rich links in Apple's Messages app. [According to Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/), its purpose is \"to crawl the content of an app or website that was shared on one of Meta\u2019s family of apps\u2026\". However, see discussions [here](https://github.com/ai-robots-txt/ai.robots.txt/pull/21) and [here](https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2524591313) for evidence to the contrary."
},
"Factset_spyderbot": { "Factset_spyderbot": {
"operator": "[Factset](https://www.factset.com/ai)", "operator": "[Factset](https://www.factset.com/ai)",
"respect": "Unclear at this time.", "respect": "Unclear at this time.",
@ -314,6 +321,13 @@
"description": "MistralAI-User is for user actions in LeChat. When users ask LeChat a question, it may visit a web page to help answer and include a link to the source in its response.", "description": "MistralAI-User is for user actions in LeChat. When users ask LeChat a question, it may visit a web page to help answer and include a link to the source in its response.",
"respect": "Yes" "respect": "Yes"
}, },
"MyCentralAIScraperBot": {
"operator": "Unclear at this time.",
"respect": "Unclear at this time.",
"function": "AI data scraper",
"frequency": "Unclear at this time.",
"description": "Operator and data use is unclear at this time."
},
"NovaAct": { "NovaAct": {
"operator": "Unclear at this time.", "operator": "Unclear at this time.",
"respect": "Unclear at this time.", "respect": "Unclear at this time.",
@ -398,6 +412,13 @@
"operator": "[phind](https://www.phind.com/)", "operator": "[phind](https://www.phind.com/)",
"respect": "Unclear at this time." "respect": "Unclear at this time."
}, },
"Poseidon Research Crawler": {
"operator": "[Poseidon Research](https://www.poseidonresearch.com)",
"description": "Lab focused on scaling the interpretability research necessary to make better AI systems possible.",
"frequency": "No explicit frequency provided.",
"function": "AI research crawler",
"respect": "Unclear at this time."
},
"QualifiedBot": { "QualifiedBot": {
"description": "Operated by Qualified as part of their suite of AI product offerings.", "description": "Operated by Qualified as part of their suite of AI product offerings.",
"frequency": "No explicit frequency provided.", "frequency": "No explicit frequency provided.",
@ -433,6 +454,27 @@
"operator": "[Zyte](https://www.zyte.com)", "operator": "[Zyte](https://www.zyte.com)",
"respect": "Unclear at this time." "respect": "Unclear at this time."
}, },
"SemrushBot": {
"operator": "[Semrush](https://www.semrush.com/)",
"respect": "[Yes](https://www.semrush.com/bot/)",
"function": "Crawls your site for ContentShake AI tool.",
"frequency": "Roughly once every 10 seconds.",
"description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)."
},
"SemrushBot-BA": {
"operator": "[Semrush](https://www.semrush.com/)",
"respect": "[Yes](https://www.semrush.com/bot/)",
"function": "Crawls your site for ContentShake AI tool.",
"frequency": "Roughly once every 10 seconds.",
"description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)."
},
"SemrushBot-CT": {
"operator": "[Semrush](https://www.semrush.com/)",
"respect": "[Yes](https://www.semrush.com/bot/)",
"function": "Crawls your site for ContentShake AI tool.",
"frequency": "Roughly once every 10 seconds.",
"description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)."
},
"SemrushBot-OCOB": { "SemrushBot-OCOB": {
"operator": "[Semrush](https://www.semrush.com/)", "operator": "[Semrush](https://www.semrush.com/)",
"respect": "[Yes](https://www.semrush.com/bot/)", "respect": "[Yes](https://www.semrush.com/bot/)",
@ -440,6 +482,13 @@
"frequency": "Roughly once every 10 seconds.", "frequency": "Roughly once every 10 seconds.",
"description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)." "description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)."
}, },
"SemrushBot-SI": {
"operator": "[Semrush](https://www.semrush.com/)",
"respect": "[Yes](https://www.semrush.com/bot/)",
"function": "Crawls your site for ContentShake AI tool.",
"frequency": "Roughly once every 10 seconds.",
"description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)."
},
"SemrushBot-SWA": { "SemrushBot-SWA": {
"operator": "[Semrush](https://www.semrush.com/)", "operator": "[Semrush](https://www.semrush.com/)",
"respect": "[Yes](https://www.semrush.com/bot/)", "respect": "[Yes](https://www.semrush.com/bot/)",

View file

@ -23,6 +23,7 @@ User-agent: Diffbot
User-agent: DuckAssistBot User-agent: DuckAssistBot
User-agent: EchoboxBot User-agent: EchoboxBot
User-agent: FacebookBot User-agent: FacebookBot
User-agent: facebookexternalhit
User-agent: Factset_spyderbot User-agent: Factset_spyderbot
User-agent: FirecrawlAgent User-agent: FirecrawlAgent
User-agent: FriendlyCrawler User-agent: FriendlyCrawler
@ -43,6 +44,7 @@ User-agent: Meta-ExternalAgent
User-agent: meta-externalfetcher User-agent: meta-externalfetcher
User-agent: Meta-ExternalFetcher User-agent: Meta-ExternalFetcher
User-agent: MistralAI-User/1.0 User-agent: MistralAI-User/1.0
User-agent: MyCentralAIScraperBot
User-agent: NovaAct User-agent: NovaAct
User-agent: OAI-SearchBot User-agent: OAI-SearchBot
User-agent: omgili User-agent: omgili
@ -55,12 +57,17 @@ User-agent: Perplexity-User
User-agent: PerplexityBot User-agent: PerplexityBot
User-agent: PetalBot User-agent: PetalBot
User-agent: PhindBot User-agent: PhindBot
User-agent: Poseidon Research Crawler
User-agent: QualifiedBot User-agent: QualifiedBot
User-agent: QuillBot User-agent: QuillBot
User-agent: quillbot.com User-agent: quillbot.com
User-agent: SBIntuitionsBot User-agent: SBIntuitionsBot
User-agent: Scrapy User-agent: Scrapy
User-agent: SemrushBot
User-agent: SemrushBot-BA
User-agent: SemrushBot-CT
User-agent: SemrushBot-OCOB User-agent: SemrushBot-OCOB
User-agent: SemrushBot-SI
User-agent: SemrushBot-SWA User-agent: SemrushBot-SWA
User-agent: Sidetrade indexer bot User-agent: Sidetrade indexer bot
User-agent: TikTokSpider User-agent: TikTokSpider

View file

@ -25,6 +25,7 @@
| DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot | | DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot |
| EchoboxBot | [Echobox](https://echobox.com) | Unclear at this time. | Data collection to support AI-powered products. | Unclear at this time. | Supports company's AI-powered social and email management products. | | EchoboxBot | [Echobox](https://echobox.com) | Unclear at this time. | Data collection to support AI-powered products. | Unclear at this time. | Supports company's AI-powered social and email management products. |
| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. |
| facebookexternalhit | Meta/Facebook | [No](https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2524591313) | Ostensibly only for sharing, but likely used as an AI crawler as well | Unclear at this time. | Note that excluding FacebookExternalHit will block incorporating OpenGraph data when sharing in social media, including rich links in Apple's Messages app. [According to Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/), its purpose is "to crawl the content of an app or website that was shared on one of Metas family of apps…". However, see discussions [here](https://github.com/ai-robots-txt/ai.robots.txt/pull/21) and [here](https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2524591313) for evidence to the contrary. |
| Factset\_spyderbot | [Factset](https://www.factset.com/ai) | Unclear at this time. | AI model training. | No information provided. | Scrapes data for AI training. | | Factset\_spyderbot | [Factset](https://www.factset.com/ai) | Unclear at this time. | AI model training. | No information provided. | Scrapes data for AI training. |
| FirecrawlAgent | [Firecrawl](https://www.firecrawl.dev/) | Yes | AI scraper and LLM training | No information provided. | Scrapes data for AI systems and LLM training. | | FirecrawlAgent | [Firecrawl](https://www.firecrawl.dev/) | Yes | AI scraper and LLM training | No information provided. | Scrapes data for AI systems and LLM training. |
| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. |
@ -45,6 +46,7 @@
| meta\-externalfetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | | meta\-externalfetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher |
| Meta\-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | | Meta\-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher |
| MistralAI\-User/1\.0 | Mistral AI | Yes | Takes action based on user prompts. | Only when prompted by a user. | MistralAI-User is for user actions in LeChat. When users ask LeChat a question, it may visit a web page to help answer and include a link to the source in its response. | | MistralAI\-User/1\.0 | Mistral AI | Yes | Takes action based on user prompts. | Only when prompted by a user. | MistralAI-User is for user actions in LeChat. When users ask LeChat a question, it may visit a web page to help answer and include a link to the source in its response. |
| MyCentralAIScraperBot | Unclear at this time. | Unclear at this time. | AI data scraper | Unclear at this time. | Operator and data use is unclear at this time. |
| NovaAct | Unclear at this time. | Unclear at this time. | AI Agents | Unclear at this time. | Nova Act is an AI agent created by Amazon that can use a web browser. It can intelligently navigate and interact with websites to complete multi-step tasks on behalf of a human user. More info can be found at https://darkvisitors.com/agents/agents/novaact | | NovaAct | Unclear at this time. | Unclear at this time. | AI Agents | Unclear at this time. | Nova Act is an AI agent created by Amazon that can use a web browser. It can intelligently navigate and interact with websites to complete multi-step tasks on behalf of a human user. More info can be found at https://darkvisitors.com/agents/agents/novaact |
| OAI\-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | | OAI\-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. |
| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
@ -57,12 +59,17 @@
| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/guides/bots) | Search result generation. | No information. | Crawls sites to surface as results in Perplexity. | | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/guides/bots) | Search result generation. | No information. | Crawls sites to surface as results in Perplexity. |
| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
| PhindBot | [phind](https://www.phind.com/) | Unclear at this time. | AI-enhanced search engine. | No explicit frequency provided. | Company offers an AI agent that uses AI and generate extra web query on the fly | | PhindBot | [phind](https://www.phind.com/) | Unclear at this time. | AI-enhanced search engine. | No explicit frequency provided. | Company offers an AI agent that uses AI and generate extra web query on the fly |
| Poseidon Research Crawler | [Poseidon Research](https://www.poseidonresearch.com) | Unclear at this time. | AI research crawler | No explicit frequency provided. | Lab focused on scaling the interpretability research necessary to make better AI systems possible. |
| QualifiedBot | [Qualified](https://www.qualified.com) | Unclear at this time. | Company offers AI agents and other related products; usage can be assumed to support said products. | No explicit frequency provided. | Operated by Qualified as part of their suite of AI product offerings. | | QualifiedBot | [Qualified](https://www.qualified.com) | Unclear at this time. | Company offers AI agents and other related products; usage can be assumed to support said products. | No explicit frequency provided. | Operated by Qualified as part of their suite of AI product offerings. |
| QuillBot | [Quillbot](https://quillbot.com) | Unclear at this time. | Company offers AI detection, writing tools and other services. | No explicit frequency provided. | Operated by QuillBot as part of their suite of AI product offerings. | | QuillBot | [Quillbot](https://quillbot.com) | Unclear at this time. | Company offers AI detection, writing tools and other services. | No explicit frequency provided. | Operated by QuillBot as part of their suite of AI product offerings. |
| quillbot\.com | [Quillbot](https://quillbot.com) | Unclear at this time. | Company offers AI detection, writing tools and other services. | No explicit frequency provided. | Operated by QuillBot as part of their suite of AI product offerings. | | quillbot\.com | [Quillbot](https://quillbot.com) | Unclear at this time. | Company offers AI detection, writing tools and other services. | No explicit frequency provided. | Operated by QuillBot as part of their suite of AI product offerings. |
| SBIntuitionsBot | [SB Intuitions](https://www.sbintuitions.co.jp/en/) | [Yes](https://www.sbintuitions.co.jp/en/bot/) | Uses data gathered in AI development and information analysis. | No information. | AI development and information analysis | | SBIntuitionsBot | [SB Intuitions](https://www.sbintuitions.co.jp/en/) | [Yes](https://www.sbintuitions.co.jp/en/bot/) | Uses data gathered in AI development and information analysis. | No information. | AI development and information analysis |
| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
| SemrushBot | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). |
| SemrushBot\-BA | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). |
| SemrushBot\-CT | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). |
| SemrushBot\-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | | SemrushBot\-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). |
| SemrushBot\-SI | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). |
| SemrushBot\-SWA | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Checks URLs on your site for SWA tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | | SemrushBot\-SWA | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Checks URLs on your site for SWA tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). |
| Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. | | Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. |
| TikTokSpider | ByteDance | Unclear at this time. | LLM training. | Unclear at this time. | Downloads data to train LLMS, as per Bytespider. | | TikTokSpider | ByteDance | Unclear at this time. | LLM training. | Unclear at this time. | Downloads data to train LLMS, as per Bytespider. |