Adding, changing and deleting data
With many data sets, you may want to perform a transformation based on the values in an array, series or column in a DataFrame. As an example, we look at the first few Unicode characters:
[1]:
import numpy as np
import pandas as pd
[2]:
df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
        "Decimal": [0, 1, 2, 3, 4, 5],
        "Octal": ["001", "002", "003", "004", "004", "005"],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
    }
)
df
[2]:
| | Code | Decimal | Octal | Key |
|---|---|---|---|---|
| 0 | U+0000 | 0 | 001 | NUL |
| 1 | U+0001 | 1 | 002 | Ctrl-A |
| 2 | U+0002 | 2 | 003 | Ctrl-B |
| 3 | U+0003 | 3 | 004 | Ctrl-C |
| 4 | U+0004 | 4 | 004 | Ctrl-D |
| 5 | U+0005 | 5 | 005 | Ctrl-E |
Adding data
Suppose you want to add a column that assigns each character to the C0 or C1 set of control codes:
[3]:
control_code = {
    "u+0000": "C0",
    "u+0001": "C0",
    "u+0002": "C0",
    "u+0003": "C0",
    "u+0004": "C0",
    "u+0005": "C0",
}
The map method of a series accepts a function or a dict-like object containing a mapping. Here we have a small problem: the keys in control_code are lowercase, whereas the codes in our DataFrame are uppercase. Therefore, we first convert each value to lowercase using the str.lower method:
[4]:
lowercased = df["Code"].str.lower()
lowercased
[4]:
0 u+0000
1 u+0001
2 u+0002
3 u+0003
4 u+0004
5 u+0005
Name: Code, dtype: object
[5]:
df["Control code"] = lowercased.map(control_code)
df
[5]:
| | Code | Decimal | Octal | Key | Control code |
|---|---|---|---|---|---|
| 0 | U+0000 | 0 | 001 | NUL | C0 |
| 1 | U+0001 | 1 | 002 | Ctrl-A | C0 |
| 2 | U+0002 | 2 | 003 | Ctrl-B | C0 |
| 3 | U+0003 | 3 | 004 | Ctrl-C | C0 |
| 4 | U+0004 | 4 | 004 | Ctrl-D | C0 |
| 5 | U+0005 | 5 | 005 | Ctrl-E | C0 |
We could also have passed a function that does all the work:
[6]:
df["Code"].map(lambda x: control_code[x.lower()])
[6]:
0 C0
1 C0
2 C0
3 C0
4 C0
5 C0
Name: Code, dtype: object
Using map is a convenient way to perform element-by-element transformations and other data cleansing operations.
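The two ways of calling map differ when a value has no entry in the mapping: a dict yields NaN, whereas a function that indexes the dict directly, like the lambda above, would raise a KeyError. The following is a minimal sketch of that difference; the extra code "U+0080" and the default label "unknown" are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample: "U+0080" is deliberately absent from the mapping.
codes = pd.Series(["U+0000", "U+0005", "U+0080"])
control_code = {"u+0000": "C0", "u+0005": "C0"}

# A dict passed to map yields NaN for values without a matching key.
mapped = codes.str.lower().map(control_code)

# A function built on dict.get can supply a default value instead.
with_default = codes.map(lambda x: control_code.get(x.lower(), "unknown"))
```

Checking the result with isna() after a dict-based map is a quick way to spot values that the mapping does not cover.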
Modifying data
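The replace method can be used to replace certain values with others. Without regex=True, only values that match exactly are replaced, so the first call below leaves the series unchanged. With regex=True, the pattern is matched within each string: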
[7]:
s = pd.Series(["Manpower", "man-made", np.nan])
[8]:
s.replace("Man", "Personal")
[8]:
0 Manpower
1 man-made
2 NaN
dtype: object
[9]:
s.replace("[Mm]an", "Personal", regex=True)
[9]:
0 Personalpower
1 Personal-made
2 NaN
dtype: object
[10]:
s.replace(["[Mm]an", np.nan], ["Personal", 0], regex=True)
[10]:
0 Personalpower
1 Personal-made
2 0
dtype: object
[11]:
s.replace(["[Mm]an", np.nan], ["Personal", len(s)], regex=True)
[11]:
0 Personalpower
1 Personal-made
2 3
dtype: object
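To make the difference between whole-value and substring replacement more explicit, here is a small sketch. The sample values, including the replacement "handmade", are assumptions for illustration only; Series.str.replace is shown as an alternative to replace with regex=True when only string substitution is needed.

```python
import pandas as pd

s = pd.Series(["Manpower", "man-made", "Man"])

# Without regex=True, replace only touches values equal to "Man".
exact = s.replace("Man", "Personal")

# A dict maps several exact values to their replacements in one call.
with_dict = s.replace({"Man": "Personal", "man-made": "handmade"})

# str.replace substitutes the pattern inside every string.
substrings = s.str.replace("[Mm]an", "Personal", regex=True)
```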
Deleting data
Deleting one or more entries from an axis is easy if you already have an index array or list without those entries.
Since building such an index may require a little set logic, the drop method offers a shortcut: it returns a new object without the specified value(s):
[12]:
s = pd.Series(np.random.randn(7))
s
[12]:
0 0.301876
1 -0.022517
2 0.150433
3 -1.280364
4 0.311058
5 -0.312778
6 0.241354
dtype: float64
[13]:
new = s.drop(2)
new
[13]:
0 0.301876
1 -0.022517
3 -1.280364
4 0.311058
5 -0.312778
6 0.241354
dtype: float64
[14]:
new = s.drop([2, 3])
new
[14]:
0 0.301876
1 -0.022517
4 0.311058
5 -0.312778
6 0.241354
dtype: float64
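The set logic that drop saves you from can be sketched as follows. This is only an illustration with made-up values; it uses Index.difference to build the labels to keep and compares the result with drop. The errors="ignore" argument is shown because drop otherwise raises a KeyError for labels that do not exist.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0])

# Manual route: compute the labels to keep, then select them.
keep = s.index.difference([1, 2])
via_loc = s.loc[keep]

# drop produces the same result directly and leaves s untouched.
via_drop = s.drop([1, 2])

# Missing labels raise a KeyError unless errors="ignore" is passed.
unchanged = s.drop([99], errors="ignore")
```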
With DataFrames, index values can be deleted from either axis. To illustrate this, we will first create a sample DataFrame:
[15]:
data = {
    "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
    "Decimal": [0, 1, 2, 3, 4, 5],
    "Octal": ["001", "002", "003", "004", "004", "005"],
    "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
}
df = pd.DataFrame(data)
df
[15]:
| | Code | Decimal | Octal | Key |
|---|---|---|---|---|
| 0 | U+0000 | 0 | 001 | NUL |
| 1 | U+0001 | 1 | 002 | Ctrl-A |
| 2 | U+0002 | 2 | 003 | Ctrl-B |
| 3 | U+0003 | 3 | 004 | Ctrl-C |
| 4 | U+0004 | 4 | 004 | Ctrl-D |
| 5 | U+0005 | 5 | 005 | Ctrl-E |
[16]:
df.drop([0, 1])
[16]:
| | Code | Decimal | Octal | Key |
|---|---|---|---|---|
| 2 | U+0002 | 2 | 003 | Ctrl-B |
| 3 | U+0003 | 3 | 004 | Ctrl-C |
| 4 | U+0004 | 4 | 004 | Ctrl-D |
| 5 | U+0005 | 5 | 005 | Ctrl-E |
You can also remove values from the columns by passing axis=1 or axis="columns":
[17]:
df.drop("Decimal", axis=1)
[17]:
| | Code | Octal | Key |
|---|---|---|---|
| 0 | U+0000 | 001 | NUL |
| 1 | U+0001 | 002 | Ctrl-A |
| 2 | U+0002 | 003 | Ctrl-B |
| 3 | U+0003 | 004 | Ctrl-C |
| 4 | U+0004 | 004 | Ctrl-D |
| 5 | U+0005 | 005 | Ctrl-E |
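As an alternative to the axis argument, drop also accepts the index and columns keywords, which some readers find easier to follow. The sketch below uses a shortened, hypothetical DataFrame named sample so that it does not interfere with df.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002"],
        "Decimal": [0, 1, 2],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B"],
    }
)

# Equivalent to sample.drop("Decimal", axis=1)
no_decimal = sample.drop(columns="Decimal")

# Equivalent to sample.drop([0])
no_first = sample.drop(index=[0])

# Rows and columns can be removed in a single call.
both = sample.drop(index=[0], columns=["Decimal"])
```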
Many methods that change the size or shape of a Series or DataFrame, such as drop, can modify the object in place instead of returning a new one:
[18]:
df.drop(0, inplace=True)
df
[18]:
| | Code | Decimal | Octal | Key |
|---|---|---|---|---|
| 1 | U+0001 | 1 | 002 | Ctrl-A |
| 2 | U+0002 | 2 | 003 | Ctrl-B |
| 3 | U+0003 | 3 | 004 | Ctrl-C |
| 4 | U+0004 | 4 | 004 | Ctrl-D |
| 5 | U+0005 | 5 | 005 | Ctrl-E |
Warning:
Be careful with the inplace parameter, as the data is removed from the original object and cannot be restored from it.
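One way to guard against this is to keep a copy before dropping in place, or to assign the result of drop to a new name. The following is a minimal sketch with a hypothetical two-row DataFrame; it also shows that drop returns None when inplace=True is passed.

```python
import pandas as pd

sample = pd.DataFrame({"Code": ["U+0000", "U+0001"], "Decimal": [0, 1]})

# Keep an independent copy before modifying the object in place.
backup = sample.copy()

# With inplace=True, drop modifies sample and returns None.
result = sample.drop(0, inplace=True)

# Without inplace, the original stays intact and a new object is returned.
trimmed = backup.drop(0)
```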