Keras and PyTorch are popular frameworks for building programs with deep learning. The former, Keras, is more precisely an abstraction layer on top of TensorFlow and makes it possible to prototype models quickly. Similar abstraction layers have been developed on top of PyTorch, such as PyTorch Ignite or PyTorch Lightning. They are not yet as mature as Keras, but they are worth a try!
I found few resources or articles comparing code written in both Keras and PyTorch, so in this article I will walk through such an example to highlight the key differences in syntax and naming between the two frameworks. This article is the first of a series: after comparing syntaxes here, I will demonstrate a practical example on sentiment classification comparing both frameworks in a subsequent article to be published later this month.
Prepare the data
The first comparison is on how data is loaded and prepared. Loading data can be achieved in a very similar fashion in both frameworks, by subclassing the utils.Sequence class in Keras and the utils.data.Dataset class in PyTorch.
In Keras you would have something like
import numpy as np
from tensorflow.keras.utils import Sequence
from pathlib import Path

class CustomGenerator(Sequence):
    def __init__(self, path):
        self.path = Path(path)
        self.filenames = list(self.path.glob("**/*.npy"))

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        fn = self.filenames[index]
        vector = np.load(fn)
        return vector
And here is the same code in PyTorch.
import numpy as np
import torch
from torch.utils.data import Dataset
from pathlib import Path

class CustomDataset(Dataset):
    def __init__(self, path):
        self.path = Path(path)
        self.filenames = list(self.path.glob("**/*.npy"))

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        fn = self.filenames[index]
        vector = torch.from_numpy(np.load(fn))
        return vector
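Neither class is iterated over by hand in practice. In Keras, a Sequence instance can be passed directly to model.fit, while in PyTorch the Dataset is usually wrapped in a DataLoader that takes care of batching and shuffling. Here is a minimal sketch of the PyTorch side, assuming a hypothetical data/ directory of .npy files of identical shape:

from torch.utils.data import DataLoader

dataset = CustomDataset("data/")  # "data/" is a hypothetical directory of .npy files
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    # each batch stacks 32 vectors returned by __getitem__
    print(batch.shape)
    break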
Define and build models
In Keras, we can define and build a model at the same time. In the following example, we use the Sequential model (https://keras.io/api/models/sequential/) to build an LSTM network with an embedding layer and a single-neuron output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

def build_model(vocab_size=50, embedding_dim=16, hidden_size=8):
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_dim))
    model.add(LSTM(hidden_size))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
And here is the same architecture in PyTorch.
from torch import nn

class CustomModel(nn.Module):
    def __init__(self, vocab_size=50, embedding_dim=16, hidden_size=8):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_size)
        self.linear = nn.Linear(hidden_size, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        output = self.encoder(x)
        output, _ = self.lstm(output)
        output = output[-1]  # keep the output of the last time step only
        output = self.linear(output)
        output = self.activation(output)
        return output
The __init__ function instantiates the different modules of the network, while the actual computation is defined in the forward function. We still need the pieces that compile provides in the Keras example. However, as you will see when models are trained, in PyTorch the model, the loss and the optimizer are defined separately and called when needed in the training loop. So we only need to define a criterion corresponding to the same loss and the same optimizer as above.
import torch

model = CustomModel()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())
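These objects are then used explicitly in the training loop, which replaces Keras' model.fit. Here is a minimal sketch, assuming a train_loader (a DataLoader yielding batches of inputs and labels) and a number of epochs num_epochs, neither of which is defined in this article:

model.train()  # set the model in training mode
for epoch in range(num_epochs):
    for inputs, labels in train_loader:  # hypothetical DataLoader of (inputs, labels) batches
        optimizer.zero_grad()             # reset gradients accumulated at the previous step
        outputs = model(inputs)           # forward pass, dispatched to forward()
        loss = criterion(outputs, labels)
        loss.backward()                   # backpropagate
        optimizer.step()                  # update the weights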
In most cases, default parameters in Keras will match defaults in PyTorch, as is the case for the Adam optimizer and the BCE (Binary Cross-Entropy) loss.
To summarize, here is a comparison table of the two syntaxes.
| Keras | PyTorch |
|---|---|
| Model.call | Module.forward |
| layers.Layer | nn.Module |
| layers.Dense | nn.Linear (without activation) |
| layers.LSTM(return_sequences=True) | nn.LSTM |
| utils.Sequence | utils.data.Dataset |
| activation="sigmoid" parameter to a layer object | nn.Sigmoid() |
| loss="binary_crossentropy" parameter to model.compile | nn.BCELoss |
| optimizer="adam" parameter to model.compile | torch.optim.Adam() |
Manipulate tensors
Both frameworks have their own syntactic specificities for manipulating tensors. Here we compare PyTorch with TensorFlow directly, since tensor operations in Keras are delegated to the TensorFlow backend.
Shape of tensors
PyTorch has both the .shape attribute and the .size() method, which are equivalent ways to access the shape of a tensor.
t = torch.zeros((4, 3))
print(t.shape, t.size()) # Both equal to (4, 3)
t.shape[1], t.size(1) # Both equal to 3
TensorFlow has only .shape.
import tensorflow as tf

t = tf.zeros((4, 3))
print(t.shape)     # .size() is not available
print(t.shape[1])  # 3
Order of dimensions
Keras usually orders dimensions as (batch_size, seq_len, input_dim)
, whereas Pytorch prefers to order them by default as (seq_len, batch_size, input_dim)
. In PyTorch, recurrent networks like LSTM, GRU have a switch parameter batch_first
which, if set to True
, will expect inputs to be of shape (seq_len, batch_size, input_dim)
. However modules like Transformer do not have such parameter. In this case, the input will have to be adapted. To do so, you can switch dimensions in Pytorch using .transpose
method.
data = torch.Tensor(tensor_with_batch_first)
data = data.transpose(0, 1)  # switch the first and second dimensions (transpose returns a new tensor)
The order chosen by PyTorch is more natural from a parallel-computing viewpoint: a recurrent layer is applied to the whole batch in parallel at each step of the sequence, so we iterate over the seq_len dimension, which comes first. The order preferred by Keras is more natural in terms of model architecture, since we would rather think of one input sequence being fed to the model and simply duplicate the operation over a batch.
Initialize vectors
PyTorch has a syntax very similar to NumPy's.
torch.ones(2, 4, 1)  # Tensor of size (2, 4, 1) filled with 1. Same syntax for torch.zeros
torch.eye(3)         # Identity matrix of size (3, 3)
Good news! All of the above are also available in TensorFlow, with the shape passed as a single tuple.
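For example, the TensorFlow counterparts look like this:

import tensorflow as tf

tf.ones((2, 4, 1))  # Tensor of shape (2, 4, 1) filled with 1. Same for tf.zeros
tf.eye(3)           # Identity matrix of size (3, 3)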
In addition, PyTorch has torch.full, the equivalent of numpy.full, for filling a tensor with a single value. TensorFlow has tf.fill.
torch.full((2, 4), fill_value=3.14) # Fill a (2, 4) matrix with 3.14 value.
tf.fill((2, 4), value=3.14)
Here is how to sample a matrix of random numbers:
torch.randn(2, 3) # Sample from N(0, 1) a matrix of size (2, 3)
tf.random.normal(shape=[2, 3])
torch.randint(low=10, high=20, size=(2, 5)) # Sample uniformly a (2, 5) matrix of integers in [10, 20)
tf.random.uniform(shape=[2, 5], minval=10, maxval=20, dtype=tf.int64)
For reproducibility, the seed of the random number generator can be set with:
torch.random.manual_seed(0)
tf.random.set_seed(0)
Conclusion
While Keras and PyTorch have very similar data-loading logic, their syntax differs quite a bit for the rest. PyTorch has a pythonic syntax, while Keras is designed for writing short and concise programs without spending too much time spelling out the building blocks. There are many more points of comparison, but I hope this article gives some insight into both frameworks. For the sake of completeness, I share some resources I found that compare Keras and PyTorch.
- Comparison and speed benchmark of Keras and PyTorch with a ConvNet architecture: https://deepsense.ai/keras-or-pytorch/
- A multi-GPU framework comparison: https://medium.com/@iliakarmanov/multi-gpu-rosetta-stone-d4fa96162986
- A rosetta stone repository between deep learning frameworks: https://github.com/ilkarman/DeepLearningFrameworks/
- Comparison of Keras and PyTorch on image classification: https://deepsense.ai/keras-vs-pytorch-avp-transfer-learning/
In the next article, I will present a practical implementation for sentiment classification with comparison in both Keras and PyTorch.