Keras and PyTorch are popular frameworks for building programs with deep learning. The former, Keras, is more precisely an abstraction layer on top of TensorFlow and makes it possible to prototype models quickly. Similar abstraction layers have been developed on top of PyTorch, such as PyTorch Ignite or PyTorch Lightning. They are not yet as mature as Keras, but they are worth a try!
I found few resources or articles comparing code written in both Keras and PyTorch, so in this article I will walk through such an example to highlight the key differences in syntax and naming between the two frameworks. This article is the first of a series: after comparing syntaxes here, I will demonstrate a practical example on sentiment classification comparing both frameworks in a subsequent article to be published later this month.
Prepare the data
The first comparison is on how data is loaded and prepared. Loading data can be achieved in a very similar fashion in both frameworks, by subclassing the utils.Sequence class in Keras and the utils.data.Dataset class in PyTorch.
In Keras you would have something like
import numpy as np
from tensorflow.keras.utils import Sequence
from pathlib import Path

class CustomGenerator(Sequence):
    def __init__(self, path):
        self.path = Path(path)
        self.filenames = list(self.path.glob("**/*.npy"))

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        fn = self.filenames[index]
        vector = np.load(fn)
        return vector
And here is the same code in PyTorch.
import numpy as np
import torch
from torch.utils.data import Dataset
from pathlib import Path

class CustomDataset(Dataset):
    def __init__(self, path):
        self.path = Path(path)
        self.filenames = list(self.path.glob("**/*.npy"))

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        fn = self.filenames[index]
        vector = torch.from_numpy(np.load(fn))
        return vector
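Neither class is iterated over by hand in practice. In Keras, a Sequence instance can be passed directly to model.fit, while in PyTorch the Dataset is usually wrapped in a DataLoader that takes care of batching and shuffling. Here is a minimal sketch of the PyTorch side, assuming a hypothetical data/ directory of .npy files of identical shape:

from torch.utils.data import DataLoader

dataset = CustomDataset("data/")  # "data/" is a hypothetical directory of .npy files
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    # each batch stacks 32 vectors returned by __getitem__
    print(batch.shape)
    break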
Define and build models
In Keras, we can define and build a model at the same time. In the following example, we use the Sequential model (https://keras.io/api/models/sequential/) to build an LSTM network with an embedding layer and a single-neuron output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

def build_model(vocab_size=50, embedding_dim=16, hidden_size=8):
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_dim))
    model.add(LSTM(hidden_size))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
And here is the same architecture in PyTorch.
from torch import nn

class CustomModel(nn.Module):
    def __init__(self, vocab_size=50, embedding_dim=16, hidden_size=8):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_size)
        self.linear = nn.Linear(hidden_size, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        output = self.encoder(x)
        output, _ = self.lstm(output)
        output = output[-1]  # keep the output of the last time step only
        output = self.linear(output)
        output = self.activation(output)
        return output
The __init__ function instantiates the different modules of the network, while the actual computation is defined in the forward function. We still need the pieces that compile provides in the Keras example. However, as you will see when models are trained, in PyTorch the model, the loss and the optimizer are defined separately and called when needed in the training loop. So we only need to define a criterion corresponding to the same loss and the same optimizer as above.
import torch

model = CustomModel()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())
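These objects are then used explicitly in the training loop, which replaces Keras' model.fit. Here is a minimal sketch, assuming a train_loader (a DataLoader yielding batches of inputs and labels) and a number of epochs num_epochs, neither of which is defined in this article:

model.train()  # set the model in training mode
for epoch in range(num_epochs):
    for inputs, labels in train_loader:  # hypothetical DataLoader of (inputs, labels) batches
        optimizer.zero_grad()             # reset gradients accumulated at the previous step
        outputs = model(inputs)           # forward pass, dispatched to forward()
        loss = criterion(outputs, labels)
        loss.backward()                   # backpropagate
        optimizer.step()                  # update the weights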
In most cases, default parameters in Keras will match defaults in PyTorch, as is the case for the Adam optimizer and the BCE (Binary Cross-Entropy) loss.
To summarize, here is a comparison table of the two syntaxes.
| Keras | PyTorch |
|---|---|
| Model.call | Module.forward |
| layers.Layer | nn.Module |
| layers.Dense | nn.Linear (without activation) |
| layers.LSTM(return_sequences=True) | nn.LSTM |
| utils.Sequence | utils.data.Dataset |
| activation="sigmoid" parameter to a layer object | nn.Sigmoid() |
| loss="binary_crossentropy" parameter to model.compile | nn.BCELoss |
| optimizer="adam" parameter to model.compile | torch.optim.Adam() |
Manipulate tensors
Both frameworks have their own syntactic specificities for manipulating tensors. Here we compare PyTorch with TensorFlow directly, since tensor operations in Keras are delegated to the TensorFlow backend.
Shape of tensors
PyTorch has both the .shape attribute and the .size() method, which are equivalent ways to access the shape of a tensor.
t = torch.zeros((4, 3))
print(t.shape, t.size()) # Both equal to (4, 3)
t.shape[1], t.size(1) # Both equal to 3
TensorFlow has only .shape.
import tensorflow as tf

t = tf.zeros((4, 3))
print(t.shape)     # .size() is not available
print(t.shape[1])  # 3
Order of dimensions
Keras usually orders dimensions as (batch_size, seq_len, input_dim)
, whereas Pytorch prefers to order them by default as (seq_len, batch_size, input_dim)
. In PyTorch, recurrent networks like LSTM, GRU have a switch parameter batch_first
which, if set to True
, will expect inputs to be of shape (seq_len, batch_size, input_dim)
. However modules like Transformer do not have such parameter. In this case, the input will have to be adapted. To do so, you can switch dimensions in Pytorch using .transpose
method.
data = torch.Tensor(tensor_with_batch_first)
data = data.transpose(0, 1)  # switch the first and second dimensions (transpose returns a new tensor)
The order chosen by PyTorch is more natural from a parallel-computing viewpoint: a recurrent layer is applied to the whole batch in parallel at each step of the sequence, so we iterate over the seq_len dimension, which comes first. The order preferred by Keras is more natural in terms of model architecture, since we would rather think of one input sequence being fed to the model and simply duplicate the operation over a batch.
Initialize vectors
PyTorch has a syntax very similar to NumPy's.
torch.ones(2, 4, 1)  # Tensor of size (2, 4, 1) filled with 1. Same syntax for torch.zeros
torch.eye(3)         # Identity matrix of size (3, 3)
Good news! All of the above are also available in TensorFlow, with the shape passed as a single tuple.
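For example, the TensorFlow counterparts look like this:

import tensorflow as tf

tf.ones((2, 4, 1))  # Tensor of shape (2, 4, 1) filled with 1. Same for tf.zeros
tf.eye(3)           # Identity matrix of size (3, 3)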
In addition, PyTorch has torch.full, the equivalent of numpy.full, for filling a tensor with a single value. TensorFlow has tf.fill.
torch.full((2, 4), fill_value=3.14) # Fill a (2, 4) matrix with 3.14 value.
tf.fill((2, 4), value=3.14)
Here is how to sample a matrix of random numbers:
torch.randn(2, 3) # Sample from N(0, 1) a matrix of size (2, 3)
tf.random.normal(shape=[2, 3])
torch.randint(low=10, high=20, size=(2, 5)) # Sample uniformly a (2, 5) matrix of integers in [10, 20)
tf.random.uniform(shape=[2, 5], minval=10, maxval=20, dtype=tf.int64)
For reproducibility, the seed of the random number generator can be set with:
torch.random.manual_seed(0)
tf.random.set_seed(0)
Conclusion
While Keras and PyTorch have very similar data-loading logic, their syntax differs quite a bit for the rest. PyTorch has a pythonic syntax, while Keras is designed for writing short and concise programs without spending too much time spelling out the building blocks. There are many more points of comparison, but I hope this article gives some insight into both frameworks. For the sake of completeness, I share some resources I found that compare Keras and PyTorch.
- Comparison and speed benchmark of Keras and PyTorch with a ConvNet architecture: https://deepsense.ai/keras-or-pytorch/
- A multi-GPU framework comparison: https://medium.com/@iliakarmanov/multi-gpu-rosetta-stone-d4fa96162986
- A rosetta stone repository between deep learning frameworks: https://github.com/ilkarman/DeepLearningFrameworks/
- Comparison of Keras and PyTorch on image classification: https://deepsense.ai/keras-vs-pytorch-avp-transfer-learning/
In the next article, I will present a practical implementation for sentiment classification with comparison in both Keras and PyTorch.