What is RAG

Retrieval-Augmented Generation (RAG) is an AI framework that improves the accuracy and reliability of large language models (LLMs) by incorporating external knowledge sources.

RAG combines the strengths of information retrieval and text generation. It first retrieves relevant information from a knowledge base or other external data source based on the user's query. The retrieved information then augments the LLM's prompt, supplying context and facts for generating more accurate, grounded answers. This approach reduces reliance on the LLM's internal parameters, mitigating the risk of producing incorrect or hallucinated information, especially for specialized or up-to-date knowledge. The process typically proceeds by indexing a knowledge base, querying it with the user's question, retrieving the relevant documents, and then feeding those documents to the LLM along with the original question to generate the answer.
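The flow just described can be sketched end to end in a few lines of Python. Everything here is an illustrative stub: `KNOWLEDGE_BASE`, `retrieve`, and `llm` are toy stand-ins, and no real model or vector store is called.

```python
# Minimal end-to-end RAG flow with stubbed components (illustrative only).
KNOWLEDGE_BASE = {
    "rag": "RAG grounds LLM answers in retrieved external documents.",
    "llm": "An LLM generates text from a prompt using learned parameters.",
}

def retrieve(question):
    # Stand-in retriever: pick entries whose key appears in the question.
    return [text for key, text in KNOWLEDGE_BASE.items()
            if key in question.lower()]

def llm(prompt):
    # Stand-in for a real model call; simply echoes the grounded prompt.
    return "Answer based on: " + prompt

def rag_answer(question):
    context = "\n".join(retrieve(question))           # index + retrieve
    prompt = f"Context:\n{context}\n\nQ: {question}"  # augment
    return llm(prompt)                                # generate
```

In a real system, `retrieve` would be a similarity search over a vector index and `llm` an actual model call; the prompt-assembly step in `rag_answer` is the part that stays essentially the same.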

```mermaid
graph LR
  Center["What is RAG"]:::main
  Pre_logic["logic"]:::pre --> Center
  click Pre_logic "/terms/logic"
  Rel_retrieval_augmented_generation["retrieval-augmented-generation"]:::related -.-> Center
  click Rel_retrieval_augmented_generation "/terms/retrieval-augmented-generation"
  Rel_rag_pipeline["rag-pipeline"]:::related -.-> Center
  click Rel_rag_pipeline "/terms/rag-pipeline"
  Rel_reinforcement_learning["reinforcement-learning"]:::related -.-> Center
  click Rel_reinforcement_learning "/terms/reinforcement-learning"
  classDef main fill:#7c3aed,stroke:#8b5cf6,stroke-width:2px,color:white,font-weight:bold,rx:5,ry:5;
  classDef pre fill:#0f172a,stroke:#3b82f6,color:#94a3b8,rx:5,ry:5;
  classDef child fill:#0f172a,stroke:#10b981,color:#94a3b8,rx:5,ry:5;
  classDef related fill:#0f172a,stroke:#8b5cf6,stroke-dasharray: 5 5,color:#94a3b8,rx:5,ry:5;
  linkStyle default stroke:#4b5563,stroke-width:2px;
```

🧒 Explain Like I'm Five

Imagine you're asking a super-smart robot a question, but it only knows things it learned a long time ago. RAG is like giving the robot a quick peek at a library book (the knowledge base) before it answers, so it can tell you the latest and most correct information!

🤓 Expert Deep Dive

RAG architectures fundamentally decouple knowledge acquisition from model inference by augmenting a generative Large Language Model (LLM) with an external, dynamically queried knowledge source. The core components typically include:

  1. Document Indexing: A corpus of documents (e.g., articles, PDFs, web pages) is processed and embedded into a vector space using a pre-trained embedding model (e.g., Sentence-BERT, OpenAI's text-embedding-ada-002). These embeddings are stored in a vector [database](/ko/terms/vector-database) (e.g., Pinecone, Weaviate, FAISS) for efficient similarity search.
  2. Retrieval: Upon receiving a user query, the query is also embedded into the same vector space. A similarity search (e.g., Approximate Nearest Neighbor - ANN) is performed against the vector database to identify the k most relevant document chunks (or passages) based on cosine similarity or dot product.
  3. Augmentation & Generation: The original user query and the retrieved document chunks are concatenated into a single prompt. This augmented prompt is then fed into the LLM. The LLM uses this context to generate a response that is grounded in the retrieved information.
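The three stages above can be sketched in plain Python. The bag-of-words `embed` function and the in-memory `index` list are deliberately naive stand-ins for a real embedding model and vector database; all names here are illustrative.

```python
import math
from collections import Counter

# Sparse bag-of-words "embedding" standing in for a real embedding model.
def embed(text):
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values())) or 1.0
    norm_v = math.sqrt(sum(c * c for c in v.values())) or 1.0
    return dot / (norm_u * norm_v)

# 1. Document indexing: embed the corpus and keep the vectors.
corpus = [
    "RAG augments an LLM prompt with retrieved documents.",
    "Reinforcement learning optimizes a policy via rewards.",
]
index = [(doc, embed(doc)) for doc in corpus]

# 2. Retrieval: embed the query, return the k most similar chunks.
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 3. Augmentation: concatenate retrieved context and the question.
def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

A production pipeline would swap `embed` for a learned encoder, `index` for an ANN-backed vector store, and would chunk documents before indexing, but the data flow is the same.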

Mathematically, the retrieval step can be viewed as finding document embeddings $d_i$ such that their similarity to the query embedding $q$ is maximized:

$i^* = \arg\max_i \text{sim}(q, d_i)$

where $\text{sim}(u, v)$ is a similarity function like cosine similarity: $\frac{u \cdot v}{\|u\| \cdot \|v\|}$.
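This argmax translates directly into code. The 3-dimensional vectors below are toy embeddings chosen purely for illustration.

```python
import math

def cosine_sim(u, v):
    # sim(u, v) = (u . v) / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

q = [1.0, 0.0, 1.0]        # query embedding
d = [[0.9, 0.1, 0.8],      # toy document embeddings d_i
     [0.0, 1.0, 0.0],
     [0.5, 0.5, 0.5]]

# i* = argmax_i sim(q, d_i)
i_star = max(range(len(d)), key=lambda i: cosine_sim(q, d[i]))
```

Here the first document, which points in nearly the same direction as the query, wins the argmax.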

This approach mitigates the 'knowledge cut-off' problem inherent in LLMs and significantly reduces hallucination by providing factual grounding. Advanced RAG techniques explore re-ranking retrieved documents, query expansion, and fine-tuning the retriever and generator components jointly (e.g., REALM, DPR).
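As one concrete instance of the re-ranking idea, a two-stage retrieval can be sketched as follows. Both scoring functions are deliberately naive stand-ins: a production first pass would be an ANN search over embeddings, and the re-ranker would typically be a cross-encoder model.

```python
# Two-stage retrieval sketch: cheap candidate generation, then re-ranking.
def first_pass(query, docs, k=3):
    # Stand-in for ANN search: count shared words with the query.
    q = set(query.lower().split())
    def overlap(d):
        return len(q & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def rerank(query, candidates):
    # Stand-in re-ranker: score by density of query terms in the document.
    q = set(query.lower().split())
    def density(d):
        words = d.lower().split()
        return sum(w in q for w in words) / len(words)
    return sorted(candidates, key=density, reverse=True)
```

The point of the second stage is that a more expensive, more accurate scorer only has to look at the small candidate set the first stage produced.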

🔗 Related Terms

Prerequisites: [logic](/terms/logic)
