Transformer

A Transformer is a deep learning model that uses self-attention to weigh the importance of different parts of the input as it processes them. It performs especially well on tasks such as natural language processing.

Transformers, introduced in the paper "Attention Is All You Need", revolutionized the field of AI, particularly natural language processing (NLP). Unlike recurrent neural networks (RNNs), which process data one step at a time, Transformers use self-attention to analyze all input positions at once, which allows parallelization and faster training. This architecture lets the model capture relationships between different parts of the input, improving performance on tasks such as machine translation, text summarization, and question answering. Self-attention lets the model focus on the most relevant parts of the input sequence regardless of their position, so it captures long-range dependencies effectively.

        graph LR
  Center["Transformer"]:::main
  Pre_neural_network["neural-network"]:::pre --> Center
  click Pre_neural_network "/terms/neural-network"
  Pre_linear_algebra["linear-algebra"]:::pre --> Center
  click Pre_linear_algebra "/terms/linear-algebra"
  Pre_deep_learning["deep-learning"]:::pre --> Center
  click Pre_deep_learning "/terms/deep-learning"
  Rel_transformer_architecture["transformer-architecture"]:::related -.-> Center
  click Rel_transformer_architecture "/terms/transformer-architecture"
  classDef main fill:#7c3aed,stroke:#8b5cf6,stroke-width:2px,color:white,font-weight:bold,rx:5,ry:5;
  classDef pre fill:#0f172a,stroke:#3b82f6,color:#94a3b8,rx:5,ry:5;
  classDef child fill:#0f172a,stroke:#10b981,color:#94a3b8,rx:5,ry:5;
  classDef related fill:#0f172a,stroke:#8b5cf6,stroke-dasharray: 5 5,color:#94a3b8,rx:5,ry:5;
  linkStyle default stroke:#4b5563,stroke-width:2px;

🧒 Explain Like I'm 5

It's like a super-smart reader that can look at all the words in a sentence at once and figure out which words are most important to understand the meaning of each individual word.

🤓 Expert Deep Dive

The Transformer model's success stems from its ability to model dependencies without regard to their distance in the input or output sequences. The self-attention mechanism computes a weighted sum of value vectors, where the weight assigned to each value comes from the compatibility of its corresponding key vector with a query vector: a dot product scaled by $\sqrt{d_k}$ and normalized with a softmax. This allows direct modeling of relationships between any two positions in the sequence. Multi-head attention extends this by letting the model jointly attend to information from different representation subspaces at different positions. The encoder uses stacked self-attention and position-wise feed-forward layers, while the decoder adds masked self-attention (to prevent attending to future tokens) and encoder-decoder attention. The absence of recurrence makes the model highly parallelizable, leading to faster training on modern hardware compared to RNNs. However, self-attention compares every position with every other position, so its cost grows quadratically with sequence length ($O(n^2)$). This remains a bottleneck for very long sequences and has prompted research into more efficient variants.
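The attention computation described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it assumes plain Python lists instead of a tensor library, a single head, and no learned projections (queries, keys, and values are all set to the same toy input). The names `softmax`, `attention`, and `causal` are illustrative choices, not part of any library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights); weights[i][j] is how much position i
    attends to position j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Compatibility of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if causal:
            # Decoder-style mask: position i may not look at positions j > i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(row, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy input (assumed for illustration): 3 positions, 2 dimensions, Q = K = V = x.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(x, x, x)
masked_out, masked_w = attention(x, x, x, causal=True)
```

The weight matrix `w` has one row and one column per position, which is where the quadratic cost comes from. With `causal=True`, every weight above the diagonal is zero, so the first position can only attend to itself.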

🔗 Related Terms

Prerequisites: neural-network, linear-algebra, deep-learning

📚 Sources

1. Attention Is All You Need

2. The Transformer: Novel Neural Network Architecture for Language Understanding

3. Language Modeling with Transformers