Sorting and ranking

Sorting data by a criterion is another important built-in operation. Sorting lexicographically by row or column labels is already described in the section Reordering and sorting from levels; here we mainly look at sorting by values with DataFrame.sort_values and Series.sort_values. The first cell briefly recalls index-based sorting with Series.sort_index in descending order:

[1]:

import numpy as np
import pandas as pd


rng = np.random.default_rng()
s = pd.Series(rng.normal(size=7))

s.sort_index(ascending=False)

[1]:

6   -0.287551
5   -0.073895
4    0.077808
3    0.647918
2    1.370572
1   -0.071934
0    0.823556
dtype: float64
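
Because the values above are random, here is a reproducible sketch with a small, made-up Series (the name `s_demo` and its labels are illustrative). It contrasts the two approaches: `sort_index` orders by the index labels, whereas `sort_values` orders by the values themselves.

```python
import pandas as pd

# Small fixed Series so the result is reproducible (illustrative data)
s_demo = pd.Series([3.0, -1.0, 2.0], index=["b", "c", "a"])

# Sort by the index labels in descending order: c, b, a
by_index = s_demo.sort_index(ascending=False)

# Sort by the values in ascending order: -1.0 (c), 2.0 (a), 3.0 (b)
by_value = s_demo.sort_values()

print(by_index)
print(by_value)
```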

By default, missing values are sorted to the end of the Series:

[2]:

s = pd.Series(rng.normal(size=7))
s[s < 0] = np.nan

s.sort_values()

[2]:

5    0.502380
3    1.347849
4    1.488811
0         NaN
1         NaN
2         NaN
6         NaN
dtype: float64
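
The position of missing values can be changed with the `na_position` parameter of `sort_values`. A minimal sketch with made-up values (the name `s_nan` is illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative Series with two missing values
s_nan = pd.Series([2.0, np.nan, 1.0, np.nan])

# Default (na_position="last"): missing values go to the end
nan_last = s_nan.sort_values()

# na_position="first" moves them to the front instead
nan_first = s_nan.sort_values(na_position="first")

print(nan_last)
print(nan_first)
```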

With a DataFrame you can sort along both axes. With by you specify the column (or, with axis=1, the row) whose values determine the order:

[3]:

df = pd.DataFrame(rng.normal(size=(7, 3)))

df.sort_values(by=2, ascending=False)
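
The random frame above only has integer column labels. As a sketch with named columns (the frame `people` and its data are made up for illustration), `by` also accepts a list of columns, and `ascending` can then be a list with one direction per column:

```python
import pandas as pd

# Illustrative data with named columns (made up for this sketch)
people = pd.DataFrame({
    "city": ["Berlin", "Hamburg", "Berlin", "Hamburg"],
    "age": [34, 28, 41, 25],
    "name": ["Ada", "Ben", "Cem", "Dana"],
})

# Sort by city ascending, then by age descending within each city
result = people.sort_values(by=["city", "age"], ascending=[True, False])
print(result)
```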

You can also reorder the columns by the values in one or more rows, using axis=1 together with by:

[4]:

df.sort_values(axis=1, by=[0, 1], ascending=False)
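
A small deterministic sketch of the same idea (the frame `wide` and its labels are illustrative assumptions): with axis=1, `by` names a row label, and the columns are reordered according to the values in that row.

```python
import pandas as pd

# Illustrative frame: the column order will follow the values in row "r1"
wide = pd.DataFrame(
    {"a": [3, 1], "b": [1, 5], "c": [2, 0]},
    index=["r1", "r2"],
)

# axis=1 reorders the columns; by refers to a row label here
cols_sorted = wide.sort_values(by="r1", axis=1)
print(cols_sorted)  # columns in the order b, c, a
```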

Ranking

DataFrame.rank and Series.rank assign ranks from 1 up to the number of valid (non-missing) data points. By default, DataFrame.rank ranks each column separately:

[5]:

df.rank()
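
A minimal sketch with made-up values shows how missing values are treated: with the default `na_option="keep"`, a NaN receives no rank, so the ranks only run from 1 to the number of valid points. The names `vals` and `frame` are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative values; the NaN is not a valid data point
vals = pd.Series([10.0, np.nan, 30.0, 20.0])
ranks = vals.rank()  # 1.0, NaN, 3.0, 2.0
print(ranks)

# For a DataFrame, each column is ranked independently (axis=0 is the default)
frame = pd.DataFrame({"x": [3, 1, 2], "y": [10, 30, 20]})
frame_ranks = frame.rank()
print(frame_ranks)
```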

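If there are ties, by default (method="average") every value in a tied group receives the average of the ranks the group occupies. The following cell creates ties by appending copies of the last two rows of df: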

[6]:

df2 = pd.concat([df, df[5:]])

df2.rank()
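
To see the averaging without random data, here is a sketch with a hand-picked Series (illustrative values, name `tied` chosen for this example) in which two entries tie for positions 2 and 3:

```python
import pandas as pd

# The two 5s would occupy ranks 2 and 3
tied = pd.Series([7, 3, 5, 5])

avg = tied.rank()  # method="average" is the default
print(avg)  # both 5s get (2 + 3) / 2 = 2.5
```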

With method="min", on the other hand, all values in a tied group receive the smallest rank of that group:

[7]:

df2.rank(method="min")
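
Using hand-picked illustrative values again, method="min" and method="max" differ only in which end of the tied group's rank range they assign:

```python
import pandas as pd

# The two 5s would occupy ranks 2 and 3 (illustrative data)
tied = pd.Series([7, 3, 5, 5])

rank_min = tied.rank(method="min")  # both 5s get rank 2
rank_max = tied.rank(method="max")  # both 5s get rank 3
print(rank_min)
print(rank_max)
```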

The following methods are available for resolving ties:

Method	Description
`average`	default: assigns the average rank of the group to each of its entries
`min`	uses the minimum rank for the whole group
`max`	uses the maximum rank for the whole group
`first`	assigns the ranks in the order in which the values appear in the data
`dense`	like `method='min'`, but the ranks always increase by 1 between groups, regardless of how many equal items a group contains
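
To compare the methods side by side, the following sketch ranks one small, made-up Series with each method from the table and collects the results in a DataFrame (the names `data`, `methods` and `comparison` are illustrative):

```python
import pandas as pd

# Illustrative data with two groups of ties (the 3s and the 5s)
data = pd.Series([7, 3, 5, 5, 3])
methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per tie-breaking method
comparison = pd.DataFrame({m: data.rank(method=m) for m in methods})
comparison.insert(0, "value", data)
print(comparison)
```

In the `first` column, the two 3s receive ranks 1 and 2 in the order in which they appear, while `dense` leaves no gaps between groups, so the largest rank is 3 rather than 5.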