Multi-label classification encoding with TensorFlow
From integers to multi-hot encoding
Multi-label classification problems arise when an observation can belong to more than one class. They are common in practice; video classification is one example. To solve multi-label classification problems with TensorFlow, we need to express the label variable using multi-hot encoding. For this post, assume our classification problem has 5 classes and that each observation can belong to one or more of them.
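To make the target format concrete, here is a minimal plain-Python sketch of what multi-hot encoding produces, independent of TensorFlow. The helper name `to_multi_hot` is hypothetical and exists only for illustration.

```python
def to_multi_hot(class_indices, num_classes):
    """Return a list with 1 at every listed class index and 0 elsewhere.

    Hypothetical helper for illustration; class indices are zero-based.
    """
    encoded = [0] * num_classes
    for index in class_indices:
        encoded[index] = 1
    return encoded


# An observation that belongs to the second and third of 5 classes.
print(to_multi_hot([1, 2], 5))  # [0, 1, 1, 0, 0]
```

The rest of the post builds the same vector with TensorFlow ops so that it works on tensors and batches.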
import tensorflow as tf
TensorFlow version used:
print(tf.__version__)
Here is a single observation that belongs to the second and third classes (indices 1 and 2, since class indices are zero-based).
indices = tf.constant([1, 2])  # We want to generate [0, 1, 1, 0, 0]
one_hot = tf.one_hot(indices=indices, depth=5)
multi_hot = tf.reduce_max(one_hot, axis=0)  # collapse the one-hot rows (axis 0) into a single vector
one_hot.shape
multi_hot
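One reason to reduce with a maximum rather than a sum: if a class index happens to appear twice for the same observation, the maximum still yields only 0 and 1 entries. A small sketch of the difference, using a made-up input with a repeated index:

```python
import tensorflow as tf

# Hypothetical input: class 2 is listed twice for the same observation.
indices = tf.constant([1, 2, 2])
one_hot = tf.one_hot(indices=indices, depth=5)  # shape (3, 5)

by_max = tf.reduce_max(one_hot, axis=0)  # stays binary: [0, 1, 1, 0, 0]
by_sum = tf.reduce_sum(one_hot, axis=0)  # counts duplicates: [0, 1, 2, 0, 0]

print(by_max.numpy())
print(by_sum.numpy())
```

If the indices are guaranteed to be unique per observation, both reductions give the same result.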
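A batch of observations usually has a different number of labels per observation, so we store the indices in a ragged tensor.
indices = tf.ragged.constant([[1, 2], [1], [3, 2]])
# We want:
# [
#   [0, 1, 1, 0, 0],
#   [0, 1, 0, 0, 0],
#   [0, 0, 1, 1, 0]
# ]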
one_hot = tf.one_hot(indices=indices, depth=5)
multi_hot = tf.reduce_max(one_hot, axis=1)  # collapse the per-observation one-hot rows (axis 1)
one_hot.shape
multi_hot
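If the labels arrive as a dense tensor padded to a fixed width rather than as a ragged tensor, the same idea should still apply. The sketch below assumes -1 as the padding value, which is an illustrative choice and not part of the steps above. It relies on `tf.one_hot` emitting an all-zero row for indices outside the range [0, depth).

```python
import tensorflow as tf

# The same three observations, padded to width 2 with -1 (assumed padding value).
padded = tf.constant([[1, 2], [1, -1], [3, 2]])

# Out-of-range indices such as -1 produce a row filled with the off value (0).
one_hot = tf.one_hot(indices=padded, depth=5)  # shape (3, 2, 5)
multi_hot = tf.reduce_max(one_hot, axis=1)     # shape (3, 5)

# Rows: [0, 1, 1, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 1, 0]
print(multi_hot.numpy())
```

The padded form avoids ragged tensors at the cost of fixing a maximum number of labels per observation in advance.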