robots.txt - Site Access Settings


로아_ 2023. 4. 1. 14:26


Site access settings

siteURL/robots.txt

After building a site, use this file to set how far search engines such as Google, Naver, and Daum may crawl it.

[robots.txt]

User-agent: *
Disallow: /
Allow: /$
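
The rule set above blocks everything except the root page, because `/$` matches only the path `/` itself. Below is a minimal Python sketch of how a crawler might evaluate it, assuming the longest-match rule described in RFC 9309 (the most specific pattern wins, and Allow wins a tie). The helper names `pattern_to_regex` and `is_allowed` are made up for this example and are not part of any library. Python's built-in `urllib.robotparser` does plain prefix matching, so a small hand-written matcher is used here instead.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs for one user-agent group.
    The longest matching pattern wins; on equal length, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# The rules from the example above
RULES = [("Disallow", "/"), ("Allow", "/$")]

print(is_allowed(RULES, "/"))            # root page
print(is_allowed(RULES, "/about.html"))  # any other page
```

With these rules only the root page comes back as allowed. Real crawlers may differ in details, so treat this as an illustration rather than a reference implementation.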

[Exclude all robots from the entire server]

User-agent: *
Disallow: /

[Allow all robots complete access]

User-agent: *
Disallow:

(or create an empty "/robots.txt" file, or don't use one at all)
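
To see the difference between `Disallow: /`, an empty `Disallow:`, and an empty file, here is a rough check using Python's standard `urllib.robotparser`. The agent name `AnyBot`, the `example.com` URL, and the helper `can_fetch` are placeholders invented for this sketch.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, user_agent, url):
    """Parse robots.txt text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
EMPTY_FILE = ""

URL = "https://example.com/page.html"  # placeholder URL

results = {
    "block_all": can_fetch(BLOCK_ALL, "AnyBot", URL),
    "allow_all": can_fetch(ALLOW_ALL, "AnyBot", URL),
    "empty_file": can_fetch(EMPTY_FILE, "AnyBot", URL),
}
print(results)
```

In this parser, an empty `Disallow:` and an empty file both leave the page fetchable, while `Disallow: /` blocks it.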

[Exclude all robots from part of the server]

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
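
A quick way to sanity-check a partial exclusion like this is to feed it to Python's `urllib.robotparser` and probe a few paths. The host `example.com`, the agent `AnyBot`, and the helper `allowed` are assumptions made only for this sketch.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Check a path on a placeholder host for a placeholder bot name."""
    return parser.can_fetch("AnyBot", "https://example.com" + path)


for path in ["/index.html", "/cgi-bin/search", "/tmp/cache.dat", "/tmpfile.html"]:
    print(path, allowed(path))
```

Note that `/tmpfile.html` stays crawlable: rules are matched as path prefixes, so the trailing slash in `/tmp/` limits the rule to that directory.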

[Exclude a single robot]

User-agent: BadBot
Disallow: /
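
As a small check of the single-robot rule, the sketch below asks Python's `urllib.robotparser` about `BadBot` and about another, made-up agent called `GoodBot`. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/page.html"  # placeholder URL

badbot_ok = parser.can_fetch("BadBot", URL)
goodbot_ok = parser.can_fetch("GoodBot", URL)  # hypothetical other crawler
print(badbot_ok, goodbot_ok)
```

Only `BadBot` is refused; agents with no matching group and no `*` group fall through to "allowed". Keep in mind that robots.txt is advisory, so a crawler that ignores it is not actually stopped.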

[Allow a single robot]

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
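
The same kind of check works for the "allow one robot, block the rest" layout. This sketch writes the allowed agent as `Googlebot`, the product token Google documents for its main crawler; `OtherBot` and the URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/page.html"  # placeholder URL

googlebot_ok = parser.can_fetch("Googlebot", URL)
otherbot_ok = parser.can_fetch("OtherBot", URL)  # hypothetical other crawler
print(googlebot_ok, otherbot_ok)
```

The named group takes priority for the matching crawler, and everyone else falls back to the `*` group.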

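[Exclude all files except one: put the files to exclude in a separate directory such as "stuff" and leave the one allowed file in the level above it]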

User-agent: *
Disallow: /~joe/stuff/

[Alternatively, explicitly disallow each page that should not be crawled]

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
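
When the list of blocked pages gets long, writing the lines by hand is error-prone. Here is a rough sketch that builds the rules from a Python list and then verifies them with `urllib.robotparser`. The function `build_disallow_rules` is a hypothetical helper written for this example, and `AnyBot` and `example.com` are placeholders.

```python
from urllib.robotparser import RobotFileParser


def build_disallow_rules(paths, user_agent="*"):
    """Return robots.txt text with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


BLOCKED = ["/~joe/junk.html", "/~joe/foo.html", "/~joe/bar.html"]
robots_txt = build_disallow_rules(BLOCKED)
print(robots_txt)

# Verify the generated rules behave as intended
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("AnyBot", "https://example.com/~joe/foo.html"))
print(parser.can_fetch("AnyBot", "https://example.com/~joe/index.html"))
```

The listed pages are refused, while any other page under `/~joe/` stays crawlable.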

http://www.robotstxt.org/
