Transformer Architecture

A neural network architecture based on self-attention.

Unlike earlier recurrent models (RNNs), which read a sequence one token at a time, Transformers process an entire sequence at once. Their self-attention mechanism lets every word attend directly to every other word, so the model can capture relationships between distant words in a sentence regardless of how far apart they are.
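To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not a reference implementation: the helper names (`softmax`, `matmul`, `self_attention`), the toy input, and the identity projection matrices are all hypothetical, and real models use learned weights and optimized tensor libraries.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x has one row per token; w_q, w_k, w_v are projection matrices
    (assumed given here, learned during training in a real model).
    """
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of the keys
    scores = matmul(q, k_t)  # one score for every pair of tokens
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy sequence of three 2-dimensional token vectors (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` describes how strongly one token attends to every token in the sequence, including distant ones. No row depends on another, which is why all positions can be computed at the same time.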

Related terms: attention-mechanism (/terms/attention-mechanism), transformer (/terms/transformer), natural-language-processing (/terms/natural-language-processing)

🧒 Explain Like I'm 5

A Transformer is a revolutionary way for computers to read. Instead of reading word by word, it looks at the whole page at once to understand how every word relates to the others.

🤓 Expert Deep Dive

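The Transformer introduced the multi-head attention mechanism, in which several attention operations run side by side and their outputs are combined. It eliminates recurrence: no position has to wait for the previous one to be processed, which allows massive parallelization during training. It is the backbone of BERT, GPT, and T5.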
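The following is a simplified sketch of multi-head attention in plain Python. It makes one loud assumption: the learned query, key, value, and output projections of a real Transformer are left out, so each head simply attends over its own slice of the input features. The function names (`attention`, `split_heads`, `multi_head_attention`) and the toy input are hypothetical.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention; q, k, v are lists of row vectors."""
    d_k = len(k[0])
    output = []
    for q_i in q:  # each position is independent of the others
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        w = softmax(scores)
        output.append(
            [sum(w_j * v_j[c] for w_j, v_j in zip(w, v)) for c in range(len(v[0]))]
        )
    return output


def split_heads(x, num_heads):
    """Split each token vector into num_heads equal slices."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in x] for h in range(num_heads)]


def multi_head_attention(x, num_heads):
    """Run attention once per head, then concatenate the head outputs.

    Simplification: a real Transformer applies learned projections
    before splitting and after concatenating; they are omitted here.
    """
    heads = split_heads(x, num_heads)
    head_outputs = [attention(h, h, h) for h in heads]
    return [
        [value for head in head_outputs for value in head[i]]
        for i in range(len(x))
    ]


# Toy sequence: three tokens with four features each (made-up values).
x = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(x, num_heads=2)
```

The loop over positions inside `attention` carries no state from one iteration to the next, unlike the hidden state of an RNN. That independence is what lets hardware process every position, and every head, in parallel.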
