Sequence Networks
A sequence network can take a sequence as input, produce a sequence as output, or both. We categorize them into three main types:
- Vec2Seq
- Seq2Vec
- Seq2Seq
1. Vec2Seq (Sequence Generation)
A Vec2Seq model maps a fixed-dimensional input vector to an arbitrary-length output sequence, where \(N_{\infty}\) denotes the unbounded output length and \(C\) the per-token dimension:
$$
f_{\theta}:\mathbb{R}^{D}\to\mathbb{R}^{N_{\infty}\cdot C}
$$
Marginalizing over the hidden states \(h_{1:T}\), the output distribution is:
$$
p(y_{1:T}|x)=\sum_{h_{1:T}}p(y_{1:T},h_{1:T}|x)=\sum_{h_{1:T}}\prod_{t=1}^{T}p(y_{t}|h_{t})p(h_{t}|h_{t-1},y_{t-1},x)
$$
Notation:
\(h_t\): hidden state at time \(t\)
\(p(h_1|h_0,y_0,x) = p(h_1|x)\): initial hidden state distribution
For categorical and real-valued outputs, respectively:
$$
p(y_t|h_t) = \text{Cat}(y_t | \text{softmax}(W_{hy} h_t + b_y))
$$
$$
p(y_t|h_t) = \mathcal{N}(y_t | W_{hy} h_t + b_y, \sigma^2 I)
$$
This generative model is called a Recurrent Neural Network (RNN).
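To make this concrete, here is a minimal NumPy sketch of such a decoder: a deterministic tanh hidden-state update conditioned on \(x\), \(h_{t-1}\), and the previous output, with a categorical head. All weight names, shapes, and the tanh choice are illustrative assumptions of this sketch, not taken from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, C = 4, 8, 5  # input dim, hidden dim, vocab size (illustrative)

# Randomly initialized parameters (shapes are assumptions).
W_xh = rng.normal(scale=0.1, size=(H, D))  # conditions h_t on x
W_hh = rng.normal(scale=0.1, size=(H, H))  # recurrence h_{t-1} -> h_t
W_yh = rng.normal(scale=0.1, size=(H, C))  # feeds back y_{t-1}
b_h = np.zeros(H)
W_hy = rng.normal(scale=0.1, size=(C, H))  # output head
b_y = np.zeros(C)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def vec2seq_sample(x, T):
    """Sample y_{1:T} ~ p(y_{1:T} | x) with a deterministic hidden state."""
    h = np.zeros(H)
    y_prev = np.zeros(C)  # one-hot of y_{t-1}; zeros at t = 1
    ys = []
    for _ in range(T):
        # h_t = phi(W_xh x + W_hh h_{t-1} + W_yh y_{t-1} + b_h)
        h = np.tanh(W_xh @ x + W_hh @ h + W_yh @ y_prev + b_h)
        # p(y_t | h_t) = Cat(softmax(W_hy h_t + b_y))
        p = softmax(W_hy @ h + b_y)
        y = rng.choice(C, p=p)
        ys.append(int(y))
        y_prev = np.eye(C)[y]
    return ys

print(vec2seq_sample(rng.normal(size=D), T=6))
```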
2. Seq2Vec (Sequence Classification)
A Seq2Vec model compresses an entire input sequence into a single fixed-dimensional output:
$$
f_{\theta}:\mathbb{R}^{T D} \to \mathbb{R}^{C}
$$
The output is a class label \(y \in \{1, \dots, C\}\), predicted from the final hidden state \(h_T\):
$$
p(y|x_{1:T}) = \text{Cat}(y | \text{softmax}(W h_T))
$$
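A minimal NumPy sketch of this classifier, assuming a tanh hidden-state update; the weight names and shapes are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D, H, C, T = 4, 8, 3, 10  # dims and sequence length (illustrative)

W_xh = rng.normal(scale=0.1, size=(H, D))
W_hh = rng.normal(scale=0.1, size=(H, H))
b_h = np.zeros(H)
W = rng.normal(scale=0.1, size=(C, H))  # classification head

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def seq2vec_classify(xs):
    """Run the RNN over x_{1:T} and classify from the final state h_T."""
    h = np.zeros(H)
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # h_t from x_t, h_{t-1}
    return softmax(W @ h)  # p(y | x_{1:T})

print(seq2vec_classify(rng.normal(size=(T, D))))
```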
Better results are obtained if the hidden states depend on both past and future context, giving rise to bidirectional RNNs:
$$
h_t^{\rightarrow} = \varphi(W_{xh}^{\rightarrow} x_t + W_{hh}^{\rightarrow} h_{t-1}^{\rightarrow} + b_h^{\rightarrow})
$$
$$
h_t^{\leftarrow} = \varphi(W_{xh}^{\leftarrow} x_t + W_{hh}^{\leftarrow} h_{t+1}^{\leftarrow} + b_h^{\leftarrow})
$$
$$
h_t = [h_t^{\rightarrow}, h_t^{\leftarrow}]
$$
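Below is a sketch of the two passes, assuming \(\varphi = \tanh\) and mean-pooling of the concatenated states before the classification head; the pooling choice is an assumption of this sketch, and reading off \([h_T^{\rightarrow}, h_1^{\leftarrow}]\) is a common alternative.

```python
import numpy as np

rng = np.random.default_rng(2)
D, H, C, T = 4, 8, 3, 10  # dims and sequence length (illustrative)

def make_params():
    return (rng.normal(scale=0.1, size=(H, D)),  # W_xh
            rng.normal(scale=0.1, size=(H, H)),  # W_hh
            np.zeros(H))                         # b_h

fwd, bwd = make_params(), make_params()      # separate forward/backward params
W = rng.normal(scale=0.1, size=(C, 2 * H))   # head over concatenated states

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def run(xs, params, reverse=False):
    """One directional pass; returns the hidden state at every step."""
    W_xh, W_hh, b_h = params
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    h, states = np.zeros(H), np.zeros((len(xs), H))
    for t in order:
        h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)
        states[t] = h
    return states

def birnn_classify(xs):
    # h_t = [h_t^->, h_t^<-], concatenated per step
    h_cat = np.concatenate([run(xs, fwd), run(xs, bwd, reverse=True)], axis=1)
    return softmax(W @ h_cat.mean(axis=0))  # mean-pool, then classify (assumption)

print(birnn_classify(rng.normal(size=(T, D))))
```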
3. Seq2Seq (Sequence-to-Sequence Models)
A Seq2Seq model maps an input sequence \(x_{1:T}\) to an output sequence \(y_{1:T'}\), and is commonly implemented with an RNN encoder-decoder, LSTM, GRU, or Transformer architecture. The output distribution factorizes autoregressively:
$$
p(y_{1:T'}|x_{1:T}) = \prod_{t=1}^{T'} p(y_t | y_{< t}, x_{1:T})
$$
Notation:
\(x_{1:T}\): input sequence of length \(T\)
\(y_{1:T'}\): output sequence of length \(T'\)
\(y_{< t}\): previously generated tokens
Seq2Seq models are widely used in machine translation, text summarization, and speech recognition.
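As a sketch of the factorization above, the following NumPy code scores a target sequence with a minimal RNN encoder-decoder: the encoder compresses \(x_{1:T}\) into a final state that initializes the decoder, which then evaluates each \(\log p(y_t | y_{<t}, x_{1:T})\) term. The zero-vector start token and all parameter names are assumptions of this sketch, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
D, H, C = 4, 8, 5  # input dim, hidden dim, output vocab size (illustrative)

W_xe = rng.normal(scale=0.1, size=(H, D))   # encoder: input -> hidden
W_he = rng.normal(scale=0.1, size=(H, H))   # encoder recurrence
W_yd = rng.normal(scale=0.1, size=(H, C))   # decoder: previous token -> hidden
W_hd = rng.normal(scale=0.1, size=(H, H))   # decoder recurrence
W_out = rng.normal(scale=0.1, size=(C, H))  # decoder output head

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def seq2seq_logprob(xs, ys):
    """log p(y_{1:T'} | x_{1:T}) = sum_t log p(y_t | y_{<t}, x_{1:T})."""
    h = np.zeros(H)
    for x in xs:  # encoder: summarize x_{1:T} into the final state
        h = np.tanh(W_xe @ x + W_he @ h)
    logp = 0.0
    y_prev = np.zeros(C)  # start "token": zero vector (assumption)
    for y in ys:  # decoder: one autoregressive step per output token
        h = np.tanh(W_yd @ y_prev + W_hd @ h)
        logp += np.log(softmax(W_out @ h)[y])
        y_prev = np.eye(C)[y]
    return logp

print(seq2seq_logprob(rng.normal(size=(6, D)), [0, 3, 1, 4]))
```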
