PyTorch | Special Max Functions
Here I will examine PyTorch's most commonly used max functions and the functions derived from them.
Table of Contents:
- argmax vs. max
- softmax vs. softmax with temperature vs. logsoftmax
- logsoftmax vs. crossentropy
argmax vs. max
The max function returns both the values and the indices, while argmax returns just the indices.
import torch

bs = 16   # batch size
c = 10    # number of classes
inp = torch.rand(bs, c)
print(inp)
val, ind = inp.max(0)            # max over dim 0: one value/index per column
print(val, ind)
ind = torch.argmax(inp)          # no dim: input is flattened, single index
print(ind)
ind = torch.argmax(inp, dim=-1)  # max index per row
print(ind)
Out:
tensor([[0.8666, 0.9182, 0.9221, 0.0614, 0.6921, 0.7990, 0.7242, 0.0850, 0.5979,
0.5338],
[0.7762, 0.4818, 0.3937, 0.3368, 0.9466, 0.8684, 0.2778, 0.3191, 0.2533,
0.9955],
[0.8546, 0.2984, 0.2758, 0.2672, 0.7125, 0.8232, 0.6431, 0.6561, 0.4301,
0.1402],
[0.9645, 0.4607, 0.7776, 0.1615, 0.8907, 0.0280, 0.3052, 0.0927, 0.6251,
0.8051],
[0.2568, 0.1202, 0.8515, 0.8270, 0.4015, 0.1637, 0.0748, 0.9239, 0.1780,
0.7411],
[0.2186, 0.2376, 0.9768, 0.4281, 0.1449, 0.6928, 0.2812, 0.6096, 0.3918,
0.0351],
[0.9216, 0.5829, 0.5170, 0.9244, 0.7714, 0.4040, 0.8010, 0.7712, 0.7016,
0.3892],
[0.4871, 0.7730, 0.6121, 0.0313, 0.0185, 0.3041, 0.4070, 0.7756, 0.9997,
0.5943],
[0.1915, 0.9406, 0.0933, 0.8587, 0.3801, 0.8114, 0.2761, 0.0516, 0.4894,
0.4485],
[0.5983, 0.5491, 0.2320, 0.0991, 0.9607, 0.6197, 0.8853, 0.4235, 0.9316,
0.2995],
[0.1200, 0.7855, 0.0331, 0.5068, 0.4881, 0.1931, 0.2154, 0.8744, 0.1847,
0.8747],
[0.9883, 0.3393, 0.6095, 0.9093, 0.9551, 0.5177, 0.1650, 0.0240, 0.4617,
0.2242],
[0.2640, 0.2038, 0.9602, 0.2644, 0.9622, 0.2912, 0.3437, 0.9473, 0.3629,
0.5049],
[0.3104, 0.7076, 0.9408, 0.2040, 0.5645, 0.4079, 0.3781, 0.8250, 0.6933,
0.3200],
[0.4698, 0.4686, 0.2541, 0.3936, 0.5561, 0.4596, 0.4405, 0.3231, 0.0915,
0.2346],
[0.9101, 0.9437, 0.9523, 0.6999, 0.8808, 0.1516, 0.6469, 0.1138, 0.0067,
0.4576]])
tensor([0.9883, 0.9437, 0.9768, 0.9244, 0.9622, 0.8684, 0.8853, 0.9473, 0.9997,
0.9955]) tensor([11, 15, 5, 6, 12, 1, 9, 12, 7, 1])
tensor(78)
tensor([2, 9, 0, 0, 7, 2, 3, 8, 1, 4, 9, 0, 4, 2, 4, 2])
argmax() also accepts a dim parameter. If dim is None, the input is flattened and a single index is returned; in the example above, the flattened index 78 corresponds to row 7, column 8 (78 = 7 × 10 + 8), where the global maximum 0.9997 sits. With dim=-1 we get 16 outputs, one per row. You will often see the last dimension written as dim=-1, which for this 2-D input is equivalent to dim=1.
ind = torch.argmax(inp, dim=1)
print(ind)
ind = torch.argmax(inp, dim=-1)
print(ind)
val, ind = inp.max(-1)
print(val, ind)
Out:
tensor([3, 5, 3, 0, 0, 6, 6, 8, 7, 4, 0, 6, 6, 7, 0, 5])
tensor([3, 5, 3, 0, 0, 6, 6, 8, 7, 4, 0, 6, 6, 7, 0, 5])
tensor([0.8815, 0.8207, 0.9557, 0.8296, 0.9502, 0.9504, 0.9463, 0.9528, 0.9314,
0.8503, 0.9170, 0.8854, 0.9080, 0.9811, 0.9985, 0.9126]) tensor([3, 5, 3, 0, 0, 6, 6, 8, 7, 4, 0, 6, 6, 7, 0, 5])
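As a side note (not shown in the output above), both max() and argmax() also accept keepdim=True, which keeps the reduced dimension with size 1; a minimal sketch:
val, ind = inp.max(-1, keepdim=True)           # val and ind have shape (16, 1) instead of (16,)
print(val.shape, ind.shape)
ind = torch.argmax(inp, dim=-1, keepdim=True)  # same idea for argmax
print(ind.shape)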
softmax vs. softmax with temperature vs. logsoftmax
softmax returns, for each input value, a probability of being the maximum, according to the formula:
$\operatorname {softmax}( x_i ) = { e^{x_i} \over \sum_{j=1}^k { e^{x_j} } }$
The true maximum will get the highest probability value. You may use softmax as:
- torch.nn.functional.softmax()
- torch.softmax()
- nn.Softmax()
All three are equivalent.
Example: softmax variations
import torch
import torch.nn as nn

inp = torch.rand(2, 3)
m = nn.Softmax(dim=1)
sm = m(inp)
print(sm)
sm = torch.softmax(inp, dim=1)
print(sm)
m = nn.LogSoftmax(dim=1)
lsm = m(inp)
print(lsm)
Out:
tensor([[0.2007, 0.3282, 0.4711],
[0.2266, 0.4048, 0.3686]])
tensor([[0.2007, 0.3282, 0.4711],
[0.2266, 0.4048, 0.3686]])
tensor([[-1.6059, -1.1142, -0.7527],
[-1.4844, -0.9044, -0.9980]])
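The functional form torch.nn.functional.softmax() is not used above; a minimal sketch checking that all three variants agree on the same input:
import torch
import torch.nn as nn
import torch.nn.functional as F

inp = torch.rand(2, 3)
a = F.softmax(inp, dim=1)      # functional form
b = torch.softmax(inp, dim=1)  # tensor-level function
c = nn.Softmax(dim=1)(inp)     # module form
print(torch.allclose(a, b), torch.allclose(a, c))  # True True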
One addition to the regular softmax() function is a temperature parameter, which divides the inputs before the exponentiation. Here is how this works:
inp = torch.rand(2, 3)  # input tensor
t = 0.1                 # temperature
out = torch.softmax(inp / t, dim=1)
print(out)
out = torch.softmax(inp, dim=1)
print(out)
Out:
tensor([[1.6607e-04, 6.5169e-01, 3.4814e-01],
[8.9896e-01, 2.0861e-02, 8.0183e-02]])
tensor([[0.1840, 0.4208, 0.3952],
[0.4046, 0.2777, 0.3177]])
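With temperature $T$ the formula becomes $\operatorname {softmax}( x_i ) = { e^{x_i / T} \over \sum_{j=1}^k { e^{x_j / T} } }$. A temperature below 1 (here $T=0.1$) sharpens the distribution toward the maximum, while a temperature above 1 flattens it toward uniform.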
A few important notes about softmax():
- it is a generalization of the logistic function used in logistic regression; with softmax() it is called multinomial logistic regression
- the softmax() probabilities over all the inputs add up to 1
- calculating log_softmax() is numerically more stable than calculating log() after softmax() (see the sketch after this list)
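A minimal sketch of that last point, using a hand-rolled (naive) softmax so the overflow is visible; torch.softmax itself already subtracts the maximum internally, so it is safe:
import torch
import torch.nn.functional as F

x = torch.tensor([[1000.0, -1000.0]])
naive = x.exp() / x.exp().sum(-1, keepdim=True)  # exp(1000) overflows to inf
print(naive.log())               # tensor([[nan, -inf]])
print(F.log_softmax(x, dim=-1))  # tensor([[0., -2000.]])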
logsoftmax vs. crossentropy
In PyTorch you can use:
- the class torch.nn.LogSoftmax(...)
- or the function torch.nn.functional.log_softmax(...)
Functions in PyTorch use _ as a word separator, while classes use CamelCase. By convention, torch.nn.functional is imported as F and torch.nn as nn:
import torch.nn.functional as F
import torch.nn as nn
You can implement log_softmax() yourself:
def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)
This follows the log-sum-exp formulation; for full numerical stability, PyTorch and TensorFlow additionally subtract the maximum before exponentiating (the log-sum-exp trick), as sketched below.
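A minimal sketch of that max-subtraction variant (the name log_softmax_stable is just an illustration):
def log_softmax_stable(x):
    # subtract the row-wise max before exponentiating, so exp() never sees
    # large positive values; mathematically identical to log_softmax() above
    m = x.max(-1, keepdim=True).values
    return x - m - (x - m).exp().sum(-1, keepdim=True).log()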
Entropy (Shannon entropy) is a measurable quantity associated with a system, or mathematically with a random variable and its probability distribution, that describes how disordered that system is.
The idea is that the system contains particles, each carrying some information (energy), so the entropy is the weighted sum of the information of all particles.
I imagine entropy as a force that drains energy from the system so that the energy disappears, something like a heat-dissipation force.
When the entropy is maximized, heat dissipation is maximized for every particle.
Put another way: entropy is maximized when the uncertainty of every element is maximized.
For an isolated system, the maximum-entropy state is the state of equilibrium; the system stays there because any change would reduce the entropy.
In the analogy above, this equilibrium state is the state of maximum dissipation.
Cross-entropy is a measure built on two random variables (two systems): it quantifies the difference between two probability distributions, and it is used as a loss function.
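In formulas: for a distribution $p$ the entropy is $H(p) = -\sum_{i=1}^k p_i \log p_i$, and for a target distribution $p$ and a predicted distribution $q$ the cross-entropy is $H(p, q) = -\sum_{i=1}^k p_i \log q_i$. With a one-hot target, the cross-entropy reduces to $-\log q_{target}$, which is exactly what the code below computes.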
In PyTorch, you can just write:
loss = F.cross_entropy(x, target)
which is equivalent to:
lp = F.log_softmax(x, dim=-1)
loss = F.nll_loss(lp, target)
Note that the nll_loss() function doesn't contain any logarithmic operation inside; the log operation lives in log_softmax() only. nll_loss() simply negates the selected log-probabilities and averages them over the full batch (the default reduction='mean').
Example: CrossEntropy function
import torch

batch_size, n_classes = 10, 5
x = torch.randn(batch_size, n_classes)
print("x:", x)
target = torch.randint(n_classes, size=(batch_size,), dtype=torch.long)
print("target:", target)

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def nll_loss(p, target):
    return -p[range(target.shape[0]), target].mean()

pred = log_softmax(x)
print("pred:", pred)
ohe = torch.zeros(batch_size, n_classes)
ohe[range(ohe.shape[0]), target] = 1
print("ohe:", ohe)
pe = pred[range(target.shape[0]), target]
print("pe:", pe)
mean = pred[range(target.shape[0]), target].mean()
print("mean:", mean)
negmean = -mean
print("negmean:", negmean)
loss = nll_loss(pred, target)
print("loss:", loss)
Out:
x: tensor([[ 1.5837, -1.3132, 1.5513, 1.4422, 0.8072],
[ 1.1740, 1.9250, 0.4258, -1.0320, -0.4650],
[-1.2447, -0.5360, -1.4950, 1.2020, 1.2724],
[ 0.2300, 0.2587, -0.4463, -0.1397, -0.3617],
[-0.7983, 0.7742, 0.0035, 0.9963, -0.7926],
[ 0.7575, -0.8008, 0.7995, 0.0448, 0.6621],
[-1.7153, 0.7672, -0.6841, -0.4826, -0.8614],
[ 0.0263, 0.7244, 0.8751, -1.0226, -1.3762],
[ 0.0192, -0.4368, -0.4010, -1.0660, 0.0364],
[-0.5120, -1.4871, 0.6758, 1.2975, 0.2879]])
target: tensor([0, 4, 3, 0, 0, 4, 1, 2, 4, 2])
pred: tensor([[-1.2094, -4.1063, -1.2418, -1.3509, -1.9859],
[-1.3601, -0.6091, -2.1083, -3.5661, -2.9991],
[-3.3233, -2.6146, -3.5736, -0.8766, -0.8063],
[-1.3302, -1.3015, -2.0065, -1.7000, -1.9220],
[-2.7128, -1.1403, -1.9109, -0.9181, -2.7070],
[-1.2955, -2.8538, -1.2535, -2.0081, -1.3909],
[-3.0705, -0.5881, -2.0394, -1.8379, -2.2167],
[-1.7823, -1.0841, -0.9334, -2.8311, -3.1847],
[-1.2936, -1.7496, -1.7138, -2.3788, -1.2764],
[-2.5641, -3.5393, -1.3764, -0.7546, -1.7643]])
ohe: tensor([[1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 1., 0.],
[1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1.],
[0., 0., 1., 0., 0.]])
pe: tensor([-1.2094, -2.9991, -0.8766, -1.3302, -2.7128, -1.3909, -0.5881, -0.9334,
-1.2764, -1.3764])
mean: tensor(-1.4693)
negmean: tensor(1.4693)
loss: tensor(1.4693)
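As a check (a small sketch reusing the x and target above), the built-in F.cross_entropy() should produce the same value up to floating-point rounding:
import torch.nn.functional as F

# built-in cross-entropy; should match the manual loss above, tensor(1.4693)
print("builtin:", F.cross_entropy(x, target))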
…
tags: functions - softmax - argmax - logistic regression - softmax with temperature - logsoftmax - numerical stability - max - pytorch - multinomial logistic regression & category: pytorch