What is a web crawler?

An Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing.


A web crawler (also known as a spider or spiderbot) starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit next. This is how search engines 'discover' and keep track of the billions of pages on the web.
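The seed-and-frontier process above can be sketched as a breadth-first traversal. This is a minimal, network-free illustration: the `PAGES` dict stands in for real HTTP fetches, and `crawl`, `get_links`, and `max_pages` are hypothetical names chosen for the example, not part of any real crawler library.

```python
from collections import deque

def crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl: start from seed URLs, follow each discovered link once."""
    frontier = deque(seeds)   # URLs waiting to be visited
    visited = []              # pages in discovery order
    seen = set(seeds)         # avoids re-queuing the same URL
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):   # hyperlinks found on the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# A tiny in-memory "web" standing in for real pages.
PAGES = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],   # link cycle: the 'seen' set prevents an infinite loop
    "d": [],
}

print(crawl(["a"], lambda url: PAGES.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `seen` set is what lets crawlers cope with the web's heavily cyclic link graph; without it, the cycle between `a` and `c` above would loop forever.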

```mermaid
graph LR
  Center["What is web-crawler?"]:::main
  Rel_search_engine["search-engine"]:::related -.-> Center
  click Rel_search_engine "/terms/search-engine"
  Rel_keyword_research["keyword-research"]:::related -.-> Center
  click Rel_keyword_research "/terms/keyword-research"
  Rel_sorting_algorithm["sorting-algorithm"]:::related -.-> Center
  click Rel_sorting_algorithm "/terms/sorting-algorithm"
  classDef main fill:#7c3aed,stroke:#8b5cf6,stroke-width:2px,color:white,font-weight:bold,rx:5,ry:5;
  classDef pre fill:#0f172a,stroke:#3b82f6,color:#94a3b8,rx:5,ry:5;
  classDef child fill:#0f172a,stroke:#10b981,color:#94a3b8,rx:5,ry:5;
  classDef related fill:#0f172a,stroke:#8b5cf6,stroke-dasharray: 5 5,color:#94a3b8,rx:5,ry:5;
  linkStyle default stroke:#4b5563,stroke-width:2px;
```

🧒 Explain Like I'm Five

A web crawler is like a tiny robotic explorer that travels from one website to another, using links as bridges. Every time it finds a new bridge (a link), it crosses it and writes down what it saw on the other side. Thousands of these robots are constantly moving across the web, day and night.

🤓 Expert Deep Dive

Crawlers must follow the 'Robots Exclusion Standard' (robots.txt), which tells them which parts of a site they are allowed to visit. Key challenges for crawlers include 'Spider Traps' (infinite loops of links) and handling JavaScript-heavy sites that require rendering before the links can be found.
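Python's standard library ships a parser for the Robots Exclusion Standard, `urllib.robotparser`, which a polite crawler can consult before fetching a page. The sketch below parses a sample robots.txt from in-memory lines rather than fetching it; the `example.com` URLs and the `MyCrawler` user agent are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt: all user agents are barred from /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse from lines instead of a live fetch

print(rp.can_fetch("MyCrawler", "https://example.com/public/page"))   # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/page"))  # False
```

In a real crawler you would call `rp.set_url(".../robots.txt")` and `rp.read()` to fetch the live file once per site, then check `can_fetch` before every request.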

📚 Sources