# Per-Ankh · House of Life
# Sitemap index (canonical host), declared up front so all crawlers (incl. Baidu) reliably pick it up.
# The index links to per-section sub-sitemaps:
#   sitemap-core, sitemap-scholars, sitemap-books, sitemap-archives (oral traditions + LoC archive media),
#   sitemap-chambers, sitemap-collections, sitemap-journals, sitemap-bannedbooks, sitemap-pathways, sitemap-goodnews,
#   sitemap-directory-hubs (directory hub cards), sitemap-directory-orgs (directory organizations),
#   sitemap-foodways, sitemap-archive-tags
# Sub-sitemaps are also listed explicitly below for crawlers (Baidu, Sogou, some regional bots) that
# don't always follow nested entries. Always-chunked sections point at "-1": the
# sitemap edge function 301-redirects bare names to the first chunk, and the index advertises every chunk.
#
# Internal/admin routes (/admin, /auth, /dashboard, /parent-dashboard, /account/,
# /subscribe) are explicitly disallowed in every allowed User-agent group below to keep
# them out of search indexes. Per the robots.txt spec, a named UA group overrides
# `User-agent: *` for that bot, so the rules must be repeated per group.
Sitemap: https://perankharchive.com/sitemap.xml
Sitemap: https://perankharchive.com/sitemap-core.xml
Sitemap: https://perankharchive.com/sitemap-scholars-1.xml
Sitemap: https://perankharchive.com/sitemap-books-1.xml
Sitemap: https://perankharchive.com/sitemap-archives-1.xml
Sitemap: https://perankharchive.com/sitemap-chambers-1.xml
Sitemap: https://perankharchive.com/sitemap-collections-1.xml
Sitemap: https://perankharchive.com/sitemap-journals-1.xml
Sitemap: https://perankharchive.com/sitemap-bannedbooks-1.xml
Sitemap: https://perankharchive.com/sitemap-pathways-1.xml
Sitemap: https://perankharchive.com/sitemap-goodnews-1.xml
Sitemap: https://perankharchive.com/sitemap-directory-hubs-1.xml
Sitemap: https://perankharchive.com/sitemap-directory-orgs-1.xml
Sitemap: https://perankharchive.com/sitemap-foodways-1.xml
Sitemap: https://perankharchive.com/sitemap-archive-tags-1.xml

# ─────────────────────────────────────────────────────────────
# Named "good bot" groups: per the robots.txt spec, a named
# User-agent group fully overrides `User-agent: *` for that bot,
# so every allowed crawler must repeat the /api/ and /functions/
# Disallow lines or it will happily crawl edge-function URLs.
# ─────────────────────────────────────────────────────────────

# Legitimate search engines - Global
User-agent: Googlebot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Bingbot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: DuckDuckBot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Yandex
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Baidu - explicit allow for primary crawler and all known sub-crawlers
User-agent: Baiduspider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Baiduspider-image
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Baiduspider-video
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Baiduspider-news
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Baiduspider-mobile
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Baiduspider-favo
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Baiduspider-cpro
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Baidu JS-rendering crawler (UA: "Baiduspider-render/2.0") - required for SPA pages
User-agent: Baiduspider-render
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Baiduspider-render/2.0
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Pan-African & Regional African Search Engines
User-agent: Mashcor
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: ASKIA
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Nassita
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Afriweb
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: VConnect
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Finelib
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Ananzi
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Aardvark
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Bongoza
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Kayambo
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: GoBATLA
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Caribbean Search Engines
User-agent: CaribbeanLocal
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Is4we
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: KaribSearch
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: YawddyPages
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: KaribGuide
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Asian Regional Search Engines
# Sogou (China) - primary web crawler + named sub-crawlers (news, image, vertical, mobile).
# Each must be listed explicitly because a named UA group fully overrides `User-agent: *`.
User-agent: Sogou web spider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Sogou inst spider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Sogou Pic Spider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Sogou News Spider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Sogou Orion spider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Sogou-Test-Spider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Sogou mobile crawler (smartphone-targeted index).
User-agent: Sogou Mobile Spider
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Sogou next-gen "Spider2" UA observed in access logs.
User-agent: Sogou Spider2
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Naver (South Korea) - Yeti is the primary crawler; Yeti-Mobile is the smartphone variant.
User-agent: Yeti
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Yeti-Mobile
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Naver next-gen crawler UA observed in access logs.
User-agent: Yetibot/3.0
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Naver image/thumbnail crawler - required for Korean image search visibility of scholar portraits and archive media.
User-agent: YetiThumb
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Goo (Japan) - additional UAs beyond ichiro.
User-agent: moget
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: goo
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# South American / Latin American Search & Discovery
# No regional engine has Baidu/Yandex-scale presence; these are best-effort allows for
# decentralized and Latin-American shopping/comparison crawlers that occasionally surface
# Per-Ankh content in regional discovery feeds.
User-agent: yacybot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: SearXNG
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: BuscapeBot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: Bigsearch.ca
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Image-only crawler variants for already-allowed engines.
# Per-Ankh has substantial image content (scholar portraits, archive media); these named
# UAs ensure that imagery surfaces in image search verticals. Video variants intentionally
# skipped: Vimeo embeds are served from Vimeo's own domain.
User-agent: Googlebot-Image
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: bingbot-image
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

User-agent: YandexImages
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# European Privacy-Respecting Search Engines
User-agent: Qwantify
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Mojeek - independent UK crawler with its own index (no Google/Bing reliance).
# Explicit allow for the primary UA.
User-agent: MojeekBot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# ─────────────────────────────────────────────────────────────
# Africa - pan-African and regional search/news crawlers
# ─────────────────────────────────────────────────────────────

# AllAfricaBot - pan-African news aggregator (allafrica.com)
User-agent: AllAfricaBot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# IbouBot - Senegal-based search engine crawler (Ibou)
User-agent: IbouBot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# ─────────────────────────────────────────────────────────────
# Asia - regional search engine crawlers (Vietnam, Korea, Japan)
# Note: Sogou (CN) and Naver Yeti (KR) are already declared above.
# ─────────────────────────────────────────────────────────────

# Cốc Cốc - Vietnamese search engine (umbrella token covers all variants)
User-agent: coccocbot
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Cốc Cốc - web crawler variant
User-agent: coccocbot-web
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Cốc Cốc - image crawler variant
User-agent: coccocbot-image
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Cốc Cốc - fast-discovery variant
User-agent: coccocbot-fast
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Daumoa - Daum (South Korea) search engine crawler (Kakao)
User-agent: Daumoa
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# ichiro - goo.ne.jp (Japan) search engine crawler (NTT Resonant)
User-agent: ichiro
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# ─────────────────────────────────────────────────────────────
# South America - note on coverage
# ─────────────────────────────────────────────────────────────
# Brazil, Argentina, and the wider Spanish/Portuguese-speaking Americas
# are served almost exclusively by Googlebot and Bingbot, both of which
# already have explicit Allow groups above. No active regional South
# American search-engine crawler with a stable, verified user-agent
# token currently warrants its own group. If one emerges (e.g. a
# revived Yacy-based or sovereign LATAM index), add a named group here
# following the same Allow + standard-Disallow template used above.

# Block AI scrapers and data harvesters
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: img2dataset
Disallow: /

User-agent: omgili
Disallow: /

User-agent: Scrapy
Disallow: /

User-agent: DataForSeoBot
Disallow: /

User-agent: magpie-crawler
Disallow: /

User-agent: Timpibot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: SeznamBot
Disallow: /

# Default group for every crawler not named above: public pages allowed, internal routes blocked
User-agent: *
Allow: /
Disallow: /api/
Disallow: /functions/
Disallow: /admin
Disallow: /admin/
Disallow: /auth
Disallow: /dashboard
Disallow: /parent-dashboard
Disallow: /account/
Disallow: /subscribe

# Sitemap (repeated at end for crawlers that scan bottom-up)
Sitemap: https://perankharchive.com/sitemap.xml