Example: let's take $V(s) = \mathbb{E}_{\pi}[G_t \mid s]$ for state $s$ as our baseline $b$. Roughly, this is the average return you get from $s$ when following the policy, so you would expect the return of any particular action to be only slightly better or worse than $b$. So if $V(s) = 5$ and the return of our action is $4$: $4 - 5 = -1$. If $V(s) = -5$ and the return of our action is $-6$: $-6 - (-5) = -1$. These two actions have wildly different raw returns ($4$ vs $-6$), but in the context of their situation both are only a bit bad ($-1$). Without the baseline, the second action would look extremely bad ($-6$) even though, given the context, it is only slightly bad ($-1$).
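The arithmetic above can be sketched as a tiny advantage calculation (the function name `advantage` is just illustrative):

```python
def advantage(return_g, baseline_v):
    """How much better an action's return was than expected at that state."""
    return return_g - baseline_v

# Two actions with very different raw returns...
a1 = advantage(4.0, 5.0)    # return 4 at a state where V(s) = 5
a2 = advantage(-6.0, -5.0)  # return -6 at a state where V(s) = -5

# ...are equally "slightly bad" once the baseline is subtracted.
print(a1, a2)  # -1.0 -1.0
```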
By the way, $V(s)$ can come from a neural network that has learnt to predict it from observed returns.
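As a minimal sketch of what "learning $V(s)$" means, here is a toy value estimator fit by regressing observed returns on state features; a linear model stands in for the neural network, and all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(256, 4))            # toy state features
true_w = np.array([1.0, -2.0, 0.5, 3.0])      # hypothetical "true" values
returns = states @ true_w + rng.normal(scale=0.1, size=256)  # noisy G_t samples

# Fit V(s) = states @ w by gradient descent on mean squared error,
# i.e. train the baseline to predict the return from each state.
w = np.zeros(4)
lr = 0.05
for _ in range(500):
    v = states @ w                                   # current estimates V(s)
    grad = states.T @ (v - returns) / len(returns)   # MSE gradient w.r.t. w
    w -= lr * grad

# The learned baseline now approximates the expected return per state.
print(np.allclose(w, true_w, atol=0.2))  # True
```

In a real setup the linear model would be a neural network and the regression targets would be returns collected while running the policy, but the training loop has the same shape: minimize the squared error between $V(s)$ and observed $G_t$.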