{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "28f76326",
   "metadata": {},
   "source": [
    "# Apply\n",
    "\n",
    "Die am allgemeinsten einsetzbare `GroupBy`-Methode ist `apply`. Sie teilt das zu bearbeitende Objekt  auf, ruft die übergebene Funktion auf jedem Teil auf und versucht dann, die Teile miteinander zu verketten."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd02d5f5",
   "metadata": {},
   "source": [
    "Nehmen wir an, wir wollen die fünf größten `hit`-Werte nach Gruppen auswählen. Hierzu schreiben wir zunächst eine Funktion, die die Zeilen mit den größten Werten in einer bestimmten Spalte auswählt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a5bbe4fa",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.103478Z",
     "iopub.status.busy": "2026-05-21T16:34:54.103283Z",
     "iopub.status.idle": "2026-05-21T16:34:54.315510Z",
     "shell.execute_reply": "2026-05-21T16:34:54.315072Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.103460Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1b99d975",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.316044Z",
     "iopub.status.busy": "2026-05-21T16:34:54.315919Z",
     "iopub.status.idle": "2026-05-21T16:34:54.323503Z",
     "shell.execute_reply": "2026-05-21T16:34:54.323199Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.316034Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "                 en         6073.0   7716.0   6547.0\n",
       "PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "                 en            NaN      NaN      NaN\n",
       "Python Basics    de          427.0    276.0    525.0\n",
       "                 en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"2021-12\": [30134, 6073, 4873, None, 427, 95],\n",
    "        \"2022-01\": [33295, 7716, 3930, None, 276, 226],\n",
    "        \"2022-02\": [19651, 6547, 2573, None, 525, 157],\n",
    "    },\n",
    "    index=[\n",
    "        [\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            \"Python Basics\",\n",
    "            \"Python Basics\",\n",
    "        ],\n",
    "        [\"de\", \"en\", \"de\", \"en\", \"de\", \"en\"],\n",
    "    ],\n",
    ")\n",
    "df.index.names = [\"Title\", \"Language\"]\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "da3c09a8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.323850Z",
     "iopub.status.busy": "2026-05-21T16:34:54.323774Z",
     "iopub.status.idle": "2026-05-21T16:34:54.327983Z",
     "shell.execute_reply": "2026-05-21T16:34:54.327660Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.323844Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "                 en         6073.0   7716.0   6547.0\n",
       "PyViz Tutorial   de         4873.0   3930.0   2573.0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def top(df, n=5, column=\"2021-12\"):\n",
    "    return df.sort_values(by=column, ascending=False)[:n]\n",
    "\n",
    "\n",
    "top(df, n=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45248446",
   "metadata": {},
   "source": [
    "Wenn wir nun z.B. nach Titeln gruppieren und `apply` mit dieser Funktion aufrufen, erhalten wir Folgendes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "deaf5428",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.328473Z",
     "iopub.status.busy": "2026-05-21T16:34:54.328382Z",
     "iopub.status.idle": "2026-05-21T16:34:54.332851Z",
     "shell.execute_reply": "2026-05-21T16:34:54.332651Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.328465Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            2021-12  2022-01  2022-02\n",
       "Title            Title            Language                           \n",
       "Jupyter Tutorial Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "                                  en         6073.0   7716.0   6547.0\n",
       "PyViz Tutorial   PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "                                  en            NaN      NaN      NaN\n",
       "Python Basics    Python Basics    de          427.0    276.0    525.0\n",
       "                                  en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_titles = df.groupby(\"Title\")\n",
    "\n",
    "grouped_titles.apply(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08227af2",
   "metadata": {},
   "source": [
    "Was ist hier passiert? Die obere Funktion wird für jede Zeilengruppe des DataFrame aufgerufen, und dann werden die Ergebnisse mit [pandas.concat](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) zusammengefügt, wobei die Teile mit den Gruppennamen gekennzeichnet werden. Das Ergebnis hat daher einen hierarchischen Index, dessen innere Ebene Indexwerte aus dem ursprünglichen DataFrame enthält."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b37bee0a",
   "metadata": {},
   "source": [
    "Wenn ihr eine Funktion an `apply` übergebt, die andere Argumente oder Schlüsselwörter benötigt, könnt ihr diese nach der Funktion übergeben:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0c89d8e6",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.333233Z",
     "iopub.status.busy": "2026-05-21T16:34:54.333176Z",
     "iopub.status.idle": "2026-05-21T16:34:54.337450Z",
     "shell.execute_reply": "2026-05-21T16:34:54.337154Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.333226Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            2021-12  2022-01  2022-02\n",
       "Title            Title            Language                           \n",
       "Jupyter Tutorial Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "PyViz Tutorial   PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "Python Basics    Python Basics    de          427.0    276.0    525.0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_titles.apply(top, n=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96cc6370",
   "metadata": {},
   "source": [
    "Wir haben nun die grundlegende Verwendungsweise von `apply` gesehen. Was innerhalb der übergebenen Funktion geschieht, ist sehr vielseitig und bleibt euch überlassen; sie muss nur ein pandas-Objekt oder einen Einzelwert zurückgeben. Im Folgend werden wir daher hauptsächlich Beispielen zeigen, die euch Anregungen geben können, wie ihr verschiedene Probleme mit `groupby` lösen könnt."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8db08c56",
   "metadata": {},
   "source": [
    "Zunächst vergegenwärtigen wir uns nochmal an `describe`, aufgerufen über dem `GroupBy`-Objekt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a0ebb337",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.339233Z",
     "iopub.status.busy": "2026-05-21T16:34:54.339102Z",
     "iopub.status.idle": "2026-05-21T16:34:54.352732Z",
     "shell.execute_reply": "2026-05-21T16:34:54.352489Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.339222Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">2021-12</th>\n",
       "      <th colspan=\"5\" halign=\"left\">2022-01</th>\n",
       "      <th colspan=\"8\" halign=\"left\">2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>...</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <td>2.0</td>\n",
       "      <td>18103.5</td>\n",
       "      <td>17013.696262</td>\n",
       "      <td>6073.0</td>\n",
       "      <td>12088.25</td>\n",
       "      <td>18103.5</td>\n",
       "      <td>24118.75</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>20505.5</td>\n",
       "      <td>...</td>\n",
       "      <td>26900.25</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>13099.0</td>\n",
       "      <td>9265.927261</td>\n",
       "      <td>6547.0</td>\n",
       "      <td>9823.0</td>\n",
       "      <td>13099.0</td>\n",
       "      <td>16375.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <td>1.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>4873.00</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>4873.00</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>...</td>\n",
       "      <td>3930.00</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <td>2.0</td>\n",
       "      <td>261.0</td>\n",
       "      <td>234.759451</td>\n",
       "      <td>95.0</td>\n",
       "      <td>178.00</td>\n",
       "      <td>261.0</td>\n",
       "      <td>344.00</td>\n",
       "      <td>427.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>251.0</td>\n",
       "      <td>...</td>\n",
       "      <td>263.50</td>\n",
       "      <td>276.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>341.0</td>\n",
       "      <td>260.215295</td>\n",
       "      <td>157.0</td>\n",
       "      <td>249.0</td>\n",
       "      <td>341.0</td>\n",
       "      <td>433.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                 2021-12                                                    \\\n",
       "                   count     mean           std     min       25%      50%   \n",
       "Title                                                                        \n",
       "Jupyter Tutorial     2.0  18103.5  17013.696262  6073.0  12088.25  18103.5   \n",
       "PyViz Tutorial       1.0   4873.0           NaN  4873.0   4873.00   4873.0   \n",
       "Python Basics        2.0    261.0    234.759451    95.0    178.00    261.0   \n",
       "\n",
       "                                    2022-01           ...                     \\\n",
       "                       75%      max   count     mean  ...       75%      max   \n",
       "Title                                                 ...                      \n",
       "Jupyter Tutorial  24118.75  30134.0     2.0  20505.5  ...  26900.25  33295.0   \n",
       "PyViz Tutorial     4873.00   4873.0     1.0   3930.0  ...   3930.00   3930.0   \n",
       "Python Basics       344.00    427.0     2.0    251.0  ...    263.50    276.0   \n",
       "\n",
       "                 2022-02                                                 \\\n",
       "                   count     mean          std     min     25%      50%   \n",
       "Title                                                                     \n",
       "Jupyter Tutorial     2.0  13099.0  9265.927261  6547.0  9823.0  13099.0   \n",
       "PyViz Tutorial       1.0   2573.0          NaN  2573.0  2573.0   2573.0   \n",
       "Python Basics        2.0    341.0   260.215295   157.0   249.0    341.0   \n",
       "\n",
       "                                    \n",
       "                      75%      max  \n",
       "Title                               \n",
       "Jupyter Tutorial  16375.0  19651.0  \n",
       "PyViz Tutorial     2573.0   2573.0  \n",
       "Python Basics       433.0    525.0  \n",
       "\n",
       "[3 rows x 24 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = grouped_titles.describe()\n",
    "\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ccdb4d5",
   "metadata": {},
   "source": [
    "Wenn ihr innerhalb von `GroupBy` eine Methode wie `describe` aufruft, ist dies eigentlich nur eine Abkürzung für:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "967709f7",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.353165Z",
     "iopub.status.busy": "2026-05-21T16:34:54.353077Z",
     "iopub.status.idle": "2026-05-21T16:34:54.364029Z",
     "shell.execute_reply": "2026-05-21T16:34:54.363422Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.353158Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"8\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>count</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>18103.500000</td>\n",
       "      <td>20505.500000</td>\n",
       "      <td>13099.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>17013.696262</td>\n",
       "      <td>18087.084356</td>\n",
       "      <td>9265.927261</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>6073.000000</td>\n",
       "      <td>7716.000000</td>\n",
       "      <td>6547.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>12088.250000</td>\n",
       "      <td>14110.750000</td>\n",
       "      <td>9823.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>18103.500000</td>\n",
       "      <td>20505.500000</td>\n",
       "      <td>13099.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>24118.750000</td>\n",
       "      <td>26900.250000</td>\n",
       "      <td>16375.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>30134.000000</td>\n",
       "      <td>33295.000000</td>\n",
       "      <td>19651.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"8\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>count</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>4873.000000</td>\n",
       "      <td>3930.000000</td>\n",
       "      <td>2573.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"8\" valign=\"top\">Python Basics</th>\n",
       "      <th>count</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>261.000000</td>\n",
       "      <td>251.000000</td>\n",
       "      <td>341.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>234.759451</td>\n",
       "      <td>35.355339</td>\n",
       "      <td>260.215295</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>95.000000</td>\n",
       "      <td>226.000000</td>\n",
       "      <td>157.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>178.000000</td>\n",
       "      <td>238.500000</td>\n",
       "      <td>249.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>261.000000</td>\n",
       "      <td>251.000000</td>\n",
       "      <td>341.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>344.000000</td>\n",
       "      <td>263.500000</td>\n",
       "      <td>433.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>427.000000</td>\n",
       "      <td>276.000000</td>\n",
       "      <td>525.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             2021-12       2022-01       2022-02\n",
       "Title                                                           \n",
       "Jupyter Tutorial count      2.000000      2.000000      2.000000\n",
       "                 mean   18103.500000  20505.500000  13099.000000\n",
       "                 std    17013.696262  18087.084356   9265.927261\n",
       "                 min     6073.000000   7716.000000   6547.000000\n",
       "                 25%    12088.250000  14110.750000   9823.000000\n",
       "                 50%    18103.500000  20505.500000  13099.000000\n",
       "                 75%    24118.750000  26900.250000  16375.000000\n",
       "                 max    30134.000000  33295.000000  19651.000000\n",
       "PyViz Tutorial   count      1.000000      1.000000      1.000000\n",
       "                 mean    4873.000000   3930.000000   2573.000000\n",
       "                 std             NaN           NaN           NaN\n",
       "                 min     4873.000000   3930.000000   2573.000000\n",
       "                 25%     4873.000000   3930.000000   2573.000000\n",
       "                 50%     4873.000000   3930.000000   2573.000000\n",
       "                 75%     4873.000000   3930.000000   2573.000000\n",
       "                 max     4873.000000   3930.000000   2573.000000\n",
       "Python Basics    count      2.000000      2.000000      2.000000\n",
       "                 mean     261.000000    251.000000    341.000000\n",
       "                 std      234.759451     35.355339    260.215295\n",
       "                 min       95.000000    226.000000    157.000000\n",
       "                 25%      178.000000    238.500000    249.000000\n",
       "                 50%      261.000000    251.000000    341.000000\n",
       "                 75%      344.000000    263.500000    433.000000\n",
       "                 max      427.000000    276.000000    525.000000"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def desc(x):\n",
    "    return x.describe()\n",
    "\n",
    "\n",
    "grouped_titles.apply(desc)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd2395bc",
   "metadata": {},
   "source": [
    "## Suppressing the group keys\n",
    "\n",
    "In the previous examples, you saw that the resulting object has a hierarchical index formed from the group keys together with the indices of the individual pieces of the original object. You can disable this by passing `group_keys=False` to `groupby`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4f3cbc07",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.364522Z",
     "iopub.status.busy": "2026-05-21T16:34:54.364423Z",
     "iopub.status.idle": "2026-05-21T16:34:54.368679Z",
     "shell.execute_reply": "2026-05-21T16:34:54.368408Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.364514Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "Python Basics    de          427.0    276.0    525.0\n",
       "Jupyter Tutorial en         6073.0   7716.0   6547.0\n",
       "Python Basics    en           95.0    226.0    157.0\n",
       "PyViz Tutorial   en            NaN      NaN      NaN"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_lang = df.groupby(\"Language\", group_keys=False)\n",
    "\n",
    "grouped_lang.apply(top)"
   ]
  },
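  {
   "cell_type": "markdown",
   "id": "3c9e1a70",
   "metadata": {},
   "source": [
    "For comparison, here is a minimal, self-contained sketch that runs the same function with and without group keys. The frame `toy`, the key `labels` and the function `largest` are hypothetical names made up for this illustration; they are not part of the tutorial data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d2f8b14",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, unrelated to the tutorial statistics\n",
    "toy = pd.DataFrame(\n",
    "    {\"hits\": [3, 1, 2, 5]},\n",
    "    index=pd.Index([\"a\", \"b\", \"c\", \"d\"], name=\"page\"),\n",
    ")\n",
    "# External grouping key, aligned with the rows of toy via its index\n",
    "labels = pd.Series([\"x\", \"y\", \"x\", \"y\"], index=toy.index, name=\"group\")\n",
    "\n",
    "\n",
    "def largest(part):\n",
    "    # Return the row with the largest value in the hits column\n",
    "    return part.nlargest(1, \"hits\")\n",
    "\n",
    "\n",
    "with_keys = toy.groupby(labels).apply(largest)\n",
    "without_keys = toy.groupby(labels, group_keys=False).apply(largest)\n",
    "\n",
    "# Index level names: group key plus original index vs. original index only\n",
    "print(list(with_keys.index.names))\n",
    "print(list(without_keys.index.names))"
   ]
  },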
  {
   "cell_type": "markdown",
   "id": "92c83367",
   "metadata": {},
   "source": [
    "## Quantile and bucket analysis\n",
    "\n",
    "As already described in [Discretisation and grouping](discretisation.ipynb), pandas provides some tools, in particular `cut` and `qcut`, for splitting data into buckets with bins of your choice or by sample quantiles. Combining these functions with `groupby` makes it convenient to run a bucket or quantile analysis on a dataset. Consider a simple random dataset and a categorisation into buckets of equal length with `cut`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "796c9936",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.369049Z",
     "iopub.status.busy": "2026-05-21T16:34:54.368987Z",
     "iopub.status.idle": "2026-05-21T16:34:54.372834Z",
     "shell.execute_reply": "2026-05-21T16:34:54.372464Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.369043Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     (-0.462, 1.407]\n",
       "1     (-0.462, 1.407]\n",
       "2     (-0.462, 1.407]\n",
       "3     (-0.462, 1.407]\n",
       "4     (-0.462, 1.407]\n",
       "5    (-2.331, -0.462]\n",
       "6    (-2.331, -0.462]\n",
       "7     (-0.462, 1.407]\n",
       "8    (-2.331, -0.462]\n",
       "9      (1.407, 3.275]\n",
       "Name: data1, dtype: category\n",
       "Categories (4, interval[float64, right]): [(-4.208, -2.331] < (-2.331, -0.462] < (-0.462, 1.407] < (1.407, 3.275]]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rng = np.random.default_rng()\n",
    "df2 = pd.DataFrame(\n",
    "    {\n",
    "        \"data1\": rng.normal(size=1000),\n",
    "        \"data2\": rng.normal(size=1000),\n",
    "    },\n",
    ")\n",
    "\n",
    "quartiles = pd.cut(df2.data1, 4)\n",
    "\n",
    "quartiles[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c28aa4f8",
   "metadata": {},
   "source": [
    "The categorical `Series` returned by `cut` can be passed directly to `groupby`. We could therefore compute a set of group statistics for the four bins as follows (the variable is called `quartiles`, although `cut` produces bins of equal length rather than true quartiles):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f1517742",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.373288Z",
     "iopub.status.busy": "2026-05-21T16:34:54.373215Z",
     "iopub.status.idle": "2026-05-21T16:34:54.380545Z",
     "shell.execute_reply": "2026-05-21T16:34:54.380143Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.373282Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_40670/157931318.py:12: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.\n",
      "  grouped_quart = df2.groupby(quartiles)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-4.208, -2.331]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-4.200342</td>\n",
       "      <td>-2.366172</td>\n",
       "      <td>22</td>\n",
       "      <td>-2.697025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-1.505625</td>\n",
       "      <td>1.049164</td>\n",
       "      <td>22</td>\n",
       "      <td>-0.019913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-2.331, -0.462]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-2.329520</td>\n",
       "      <td>-0.466649</td>\n",
       "      <td>296</td>\n",
       "      <td>-1.064579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.886552</td>\n",
       "      <td>3.427402</td>\n",
       "      <td>296</td>\n",
       "      <td>-0.029285</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-0.462, 1.407]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-0.459433</td>\n",
       "      <td>1.406223</td>\n",
       "      <td>589</td>\n",
       "      <td>0.405802</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.840433</td>\n",
       "      <td>3.120917</td>\n",
       "      <td>589</td>\n",
       "      <td>-0.041455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(1.407, 3.275]</th>\n",
       "      <th>data1</th>\n",
       "      <td>1.411315</td>\n",
       "      <td>3.275479</td>\n",
       "      <td>93</td>\n",
       "      <td>1.938677</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.142100</td>\n",
       "      <td>2.717809</td>\n",
       "      <td>93</td>\n",
       "      <td>0.140287</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             min       max  count      mean\n",
       "data1                                                      \n",
       "(-4.208, -2.331] data1 -4.200342 -2.366172     22 -2.697025\n",
       "                 data2 -1.505625  1.049164     22 -0.019913\n",
       "(-2.331, -0.462] data1 -2.329520 -0.466649    296 -1.064579\n",
       "                 data2 -2.886552  3.427402    296 -0.029285\n",
       "(-0.462, 1.407]  data1 -0.459433  1.406223    589  0.405802\n",
       "                 data2 -2.840433  3.120917    589 -0.041455\n",
       "(1.407, 3.275]   data1  1.411315  3.275479     93  1.938677\n",
       "                 data2 -2.142100  2.717809     93  0.140287"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def stats(group):\n",
    "    return pd.DataFrame(\n",
    "        {\n",
    "            \"min\": group.min(),\n",
    "            \"max\": group.max(),\n",
    "            \"count\": group.count(),\n",
    "            \"mean\": group.mean(),\n",
    "        }\n",
    "    )\n",
    "\n",
    "\n",
    "grouped_quart = df2.groupby(quartiles)\n",
    "\n",
    "grouped_quart.apply(stats)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b006fddf",
   "metadata": {},
   "source": [
    "These were buckets of equal length; to compute buckets that each contain the same number of observations, based on sample quantiles, we can use `qcut`. I pass `labels=False` to get only the quantile numbers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "70589dd6",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.381068Z",
     "iopub.status.busy": "2026-05-21T16:34:54.380990Z",
     "iopub.status.idle": "2026-05-21T16:34:54.388435Z",
     "shell.execute_reply": "2026-05-21T16:34:54.388115Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.381061Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>data1</th>\n",
       "      <td>-4.200342</td>\n",
       "      <td>-0.681646</td>\n",
       "      <td>250</td>\n",
       "      <td>-1.342533</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.886552</td>\n",
       "      <td>3.427402</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.010124</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>data1</th>\n",
       "      <td>-0.680027</td>\n",
       "      <td>0.076445</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.273673</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.552269</td>\n",
       "      <td>3.120917</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.030725</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2</th>\n",
       "      <th>data1</th>\n",
       "      <td>0.083643</td>\n",
       "      <td>0.761484</td>\n",
       "      <td>250</td>\n",
       "      <td>0.403456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.840433</td>\n",
       "      <td>2.677849</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.029197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">3</th>\n",
       "      <th>data1</th>\n",
       "      <td>0.762138</td>\n",
       "      <td>3.275479</td>\n",
       "      <td>250</td>\n",
       "      <td>1.392207</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.815744</td>\n",
       "      <td>2.725009</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.011862</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  min       max  count      mean\n",
       "data1                                           \n",
       "0     data1 -4.200342 -0.681646    250 -1.342533\n",
       "      data2 -2.886552  3.427402    250 -0.010124\n",
       "1     data1 -0.680027  0.076445    250 -0.273673\n",
       "      data2 -2.552269  3.120917    250 -0.030725\n",
       "2     data1  0.083643  0.761484    250  0.403456\n",
       "      data2 -2.840433  2.677849    250 -0.029197\n",
       "3     data1  0.762138  3.275479    250  1.392207\n",
       "      data2 -2.815744  2.725009    250 -0.011862"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quartiles_samp = pd.qcut(df2.data1, 4, labels=False)\n",
    "grouped_quart_samp = df2.groupby(quartiles_samp)\n",
    "\n",
    "grouped_quart_samp.apply(stats)"
   ]
  },
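  {
   "cell_type": "markdown",
   "id": "7a41c6e9",
   "metadata": {},
   "source": [
    "To make the difference between the two functions concrete, the following sketch compares them on a small, deliberately skewed series. The values in `values` are made up for illustration: `cut` produces bins of equal length, so almost all observations land in the first bin, whereas `qcut` produces bins that each hold the same number of observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b60d3f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical, skewed sample values chosen only for illustration\n",
    "values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])\n",
    "\n",
    "equal_length = pd.cut(values, 4)  # four bins of equal length\n",
    "equal_count = pd.qcut(values, 4, labels=False)  # four bins of equal size\n",
    "\n",
    "# Number of observations per bin, in bin order\n",
    "cut_counts = equal_length.value_counts().sort_index().tolist()\n",
    "qcut_counts = equal_count.value_counts().sort_index().tolist()\n",
    "\n",
    "print(cut_counts)\n",
    "print(qcut_counts)"
   ]
  },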
  {
   "cell_type": "markdown",
   "id": "b8043d46",
   "metadata": {},
   "source": [
    "## Filling data with group-specific values\n",
    "\n",
    "When cleaning up missing data, you will in some cases remove observations with `dropna`, but in other cases you may want to fill the missing values (`NA`) with a fixed value or a value derived from the data. `fillna` is the right tool for this; here, for example, I fill the missing values with the mean:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "417345b1",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.389493Z",
     "iopub.status.busy": "2026-05-21T16:34:54.389114Z",
     "iopub.status.idle": "2026-05-21T16:34:54.392105Z",
     "shell.execute_reply": "2026-05-21T16:34:54.391805Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.389478Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0         NaN\n",
       "1    0.411457\n",
       "2    0.122992\n",
       "3         NaN\n",
       "4   -0.110075\n",
       "5   -0.494890\n",
       "6         NaN\n",
       "7    0.124568\n",
       "dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(rng.normal(size=8))\n",
    "s[::3] = np.nan\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f2e0d0eb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.392615Z",
     "iopub.status.busy": "2026-05-21T16:34:54.392546Z",
     "iopub.status.idle": "2026-05-21T16:34:54.394891Z",
     "shell.execute_reply": "2026-05-21T16:34:54.394565Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.392609Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.010811\n",
       "1    0.411457\n",
       "2    0.122992\n",
       "3    0.010811\n",
       "4   -0.110075\n",
       "5   -0.494890\n",
       "6    0.010811\n",
       "7    0.124568\n",
       "dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.fillna(s.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac4979f5",
   "metadata": {},
   "source": [
    "The following examples use the tutorial data in `df` again, which is divided into German- and English-language editions."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07edf8ca",
   "metadata": {},
   "source": [
    "Suppose you want the fill value to vary by group. These values can be predefined, and since each group passed to the function has a `name` attribute, you can use it with `apply`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "eb7970fb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.395437Z",
     "iopub.status.busy": "2026-05-21T16:34:54.395373Z",
     "iopub.status.idle": "2026-05-21T16:34:54.399900Z",
     "shell.execute_reply": "2026-05-21T16:34:54.399698Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.395431Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">de</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">en</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>3469.0</td>\n",
       "      <td>3469.0</td>\n",
       "      <td>3469.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    2021-12  2022-01  2022-02\n",
       "Language Title            Language                           \n",
       "de       Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "         PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "         Python Basics    de          427.0    276.0    525.0\n",
       "en       Jupyter Tutorial en         6073.0   7716.0   6547.0\n",
       "         PyViz Tutorial   en         3469.0   3469.0   3469.0\n",
       "         Python Basics    en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fill_values = {\"de\": 10632, \"en\": 3469}\n",
    "\n",
    "\n",
    "def fill(g):\n",
    "    return g.fillna(fill_values[g.name])\n",
    "\n",
    "\n",
    "df.groupby(\"Language\").apply(fill)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0514aa28",
   "metadata": {},
   "source": [
    "You can also group the data and use `apply` with a function that calls `fillna` on each data chunk, here filling each column with its group mean:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "b32b4aea",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.400213Z",
     "iopub.status.busy": "2026-05-21T16:34:54.400143Z",
     "iopub.status.idle": "2026-05-21T16:34:54.405198Z",
     "shell.execute_reply": "2026-05-21T16:34:54.404905Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.400207Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">de</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>19651.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>2573.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">en</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>6073.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>6547.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>3084.0</td>\n",
       "      <td>3971.0</td>\n",
       "      <td>3352.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <th>en</th>\n",
       "      <td>95.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    2021-12  2022-01  2022-02\n",
       "Language Title            Language                           \n",
       "de       Jupyter Tutorial de        30134.0  33295.0  19651.0\n",
       "         PyViz Tutorial   de         4873.0   3930.0   2573.0\n",
       "         Python Basics    de          427.0    276.0    525.0\n",
       "en       Jupyter Tutorial en         6073.0   7716.0   6547.0\n",
       "         PyViz Tutorial   en         3084.0   3971.0   3352.0\n",
       "         Python Basics    en           95.0    226.0    157.0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def fill_mean(g):\n",
    "    return g.fillna(g.mean())\n",
    "\n",
    "\n",
    "df.groupby(\"Language\").apply(fill_mean)"
   ]
  },
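  {
   "cell_type": "markdown",
   "id": "c2e57f08",
   "metadata": {},
   "source": [
    "As an alternative sketch, not taken from the original tutorial, the same group-wise filling can be written with `transform`, which returns the group mean aligned to the original rows. The frame `toy` and its columns `lang` and `hits` are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8134ab5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data with one missing value per language\n",
    "toy = pd.DataFrame(\n",
    "    {\n",
    "        \"lang\": [\"de\", \"de\", \"de\", \"en\", \"en\", \"en\"],\n",
    "        \"hits\": [10.0, np.nan, 30.0, 1.0, 3.0, np.nan],\n",
    "    }\n",
    ")\n",
    "\n",
    "# transform returns the group mean aligned to the original rows,\n",
    "# so the result can be passed directly to fillna\n",
    "group_means = toy.groupby(\"lang\")[\"hits\"].transform(\"mean\")\n",
    "filled = toy[\"hits\"].fillna(group_means)\n",
    "\n",
    "print(filled.tolist())"
   ]
  },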
  {
   "cell_type": "markdown",
   "id": "857cdfe4",
   "metadata": {},
   "source": [
    "## Grouped weighted average\n",
    "\n",
    "Since operations between columns of a `DataFrame` or between two `Series` are possible, we can, for example, compute the weighted average per group:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "b925bb61",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.405681Z",
     "iopub.status.busy": "2026-05-21T16:34:54.405612Z",
     "iopub.status.idle": "2026-05-21T16:34:54.409227Z",
     "shell.execute_reply": "2026-05-21T16:34:54.408936Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.405675Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>data</th>\n",
       "      <th>weights</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>de</td>\n",
       "      <td>81741</td>\n",
       "      <td>0.105997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>de</td>\n",
       "      <td>25669</td>\n",
       "      <td>0.509308</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>de</td>\n",
       "      <td>13488</td>\n",
       "      <td>0.283457</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>de</td>\n",
       "      <td>32126</td>\n",
       "      <td>0.587351</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>en</td>\n",
       "      <td>41678</td>\n",
       "      <td>0.284316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>en</td>\n",
       "      <td>92022</td>\n",
       "      <td>0.661866</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>en</td>\n",
       "      <td>74278</td>\n",
       "      <td>0.869102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>en</td>\n",
       "      <td>43758</td>\n",
       "      <td>0.871160</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  category   data   weights\n",
       "0       de  81741  0.105997\n",
       "1       de  25669  0.509308\n",
       "2       de  13488  0.283457\n",
       "3       de  32126  0.587351\n",
       "4       en  41678  0.284316\n",
       "5       en  92022  0.661866\n",
       "6       en  74278  0.869102\n",
       "7       en  43758  0.871160"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rng = np.random.default_rng()\n",
    "df3 = pd.DataFrame(\n",
    "    {\n",
    "        \"category\": [\"de\", \"de\", \"de\", \"de\", \"en\", \"en\", \"en\", \"en\"],\n",
    "        \"data\": rng.integers(100000, size=8),\n",
    "        \"weights\": rng.random(8),\n",
    "    },\n",
    ")\n",
    "\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfd945a2",
   "metadata": {},
   "source": [
    "The weighted average of `data` per category, using `weights` as the weights, is then:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "de364549",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.409930Z",
     "iopub.status.busy": "2026-05-21T16:34:54.409675Z",
     "iopub.status.idle": "2026-05-21T16:34:54.413010Z",
     "shell.execute_reply": "2026-05-21T16:34:54.412791Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.409918Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "category\n",
       "de    29896.945738\n",
       "en    65302.434726\n",
       "dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_cat = df3.groupby(\"category\")\n",
    "\n",
    "\n",
    "def get_wavg(g):\n",
    "    # Mean of `data` within one group, weighted by `weights`\n",
    "    return np.average(g[\"data\"], weights=g[\"weights\"])\n",
    "\n",
    "\n",
    "grouped_cat.apply(get_wavg, include_groups=False)"
   ]
  },
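  {
   "cell_type": "markdown",
   "id": "c41d7e0a",
   "metadata": {},
   "source": [
    "As a cross-check, the weighted average can also be written out without `apply`, as the sum of `data * weights` divided by the sum of `weights` in each group. The following is only a minimal sketch on a small hypothetical `DataFrame`; the names `demo`, `weighted` and `manual` are assumptions for illustration and are not used elsewhere in this notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f2a9b36",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical example data, chosen so that the result is easy to check by hand\n",
    "demo = pd.DataFrame(\n",
    "    {\n",
    "        \"category\": [\"de\", \"de\", \"en\", \"en\"],\n",
    "        \"data\": [10, 20, 30, 50],\n",
    "        \"weights\": [1.0, 3.0, 1.0, 1.0],\n",
    "    },\n",
    ")\n",
    "\n",
    "# Vectorised weighted mean: sum(data * weights) / sum(weights) per category\n",
    "weighted = demo[\"data\"] * demo[\"weights\"]\n",
    "manual = (\n",
    "    weighted.groupby(demo[\"category\"]).sum()\n",
    "    / demo.groupby(\"category\")[\"weights\"].sum()\n",
    ")\n",
    "\n",
    "# de: (10 * 1 + 20 * 3) / 4 = 17.5, en: (30 + 50) / 2 = 40.0\n",
    "manual"
   ]
  },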
  {
   "cell_type": "markdown",
   "id": "485bde5b",
   "metadata": {},
   "source": [
    "## Correlation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffaf9697",
   "metadata": {},
   "source": [
    "An interesting task is to compute a `DataFrame` of percentage changes and then examine how these changes correlate within each group."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2018813",
   "metadata": {},
   "source": [
    "To do this, we first create a function that computes the pairwise correlation of the `2021-12` column with each column of the group:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "2731ed0c",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.413372Z",
     "iopub.status.busy": "2026-05-21T16:34:54.413308Z",
     "iopub.status.idle": "2026-05-21T16:34:54.415225Z",
     "shell.execute_reply": "2026-05-21T16:34:54.414913Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.413366Z"
    }
   },
   "outputs": [],
   "source": [
    "def corr(x):\n",
    "    # Correlate every column of the group with its `2021-12` column\n",
    "    return x.corrwith(x[\"2021-12\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9d312fa",
   "metadata": {},
   "source": [
    "Next, we compute the percentage change:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "33a3d392",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.415691Z",
     "iopub.status.busy": "2026-05-21T16:34:54.415569Z",
     "iopub.status.idle": "2026-05-21T16:34:54.419892Z",
     "shell.execute_reply": "2026-05-21T16:34:54.419571Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.415681Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/hk/s8m0bblj0g10hw885gld52mc0000gn/T/ipykernel_40670/3358811060.py:1: FutureWarning: The default fill_method='pad' in DataFrame.pct_change is deprecated and will be removed in a future version. Either fill in any non-leading NA values prior to calling pct_change or specify 'fill_method=None' to not fill NA values.\n",
      "  pcts = df.pct_change().dropna()\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>en</th>\n",
       "      <td>-0.798467</td>\n",
       "      <td>-0.768253</td>\n",
       "      <td>-0.666836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>-0.197596</td>\n",
       "      <td>-0.490669</td>\n",
       "      <td>-0.606996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>-0.912374</td>\n",
       "      <td>-0.929771</td>\n",
       "      <td>-0.795958</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>-0.777518</td>\n",
       "      <td>-0.181159</td>\n",
       "      <td>-0.700952</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            2021-12   2022-01   2022-02\n",
       "Title            Language                              \n",
       "Jupyter Tutorial en       -0.798467 -0.768253 -0.666836\n",
       "PyViz Tutorial   de       -0.197596 -0.490669 -0.606996\n",
       "                 en        0.000000  0.000000  0.000000\n",
       "Python Basics    de       -0.912374 -0.929771 -0.795958\n",
       "                 en       -0.777518 -0.181159 -0.700952"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pcts = df.pct_change().dropna()\n",
    "\n",
    "pcts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ff5fdec",
   "metadata": {},
   "source": [
    "Finally, we group these percentage changes by the index level `Language` and apply `corr` to each group. A single pair of columns can be correlated in the same way with a one-line function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "8a566a2a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.420311Z",
     "iopub.status.busy": "2026-05-21T16:34:54.420232Z",
     "iopub.status.idle": "2026-05-21T16:34:54.424279Z",
     "shell.execute_reply": "2026-05-21T16:34:54.423989Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.420304Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>de</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.699088</td>\n",
       "      <td>0.99781</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          2021-12   2022-01  2022-02\n",
       "Language                            \n",
       "de            1.0  1.000000  1.00000\n",
       "en            1.0  0.699088  0.99781"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_language = pcts.groupby(\"Language\")\n",
    "\n",
    "by_language.apply(corr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "96f1a586",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.427395Z",
     "iopub.status.busy": "2026-05-21T16:34:54.427298Z",
     "iopub.status.idle": "2026-05-21T16:34:54.430130Z",
     "shell.execute_reply": "2026-05-21T16:34:54.429773Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.427389Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language\n",
       "de    1.000000\n",
       "en    0.699088\n",
       "dtype: float64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_language.apply(lambda g: g[\"2021-12\"].corr(g[\"2022-01\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9800b02b",
   "metadata": {},
   "source": [
    "## Performance problems with `apply`\n",
    "\n",
    "Because `apply` typically calls the passed function once for each value in a `Series` (or once for each group of a `GroupBy` object), thousands of values mean thousands of function calls. Unless the function itself relies on NumPy functions, this bypasses the fast vectorised operations of pandas and falls back on slow Python. For example, we previously grouped the data by title and then called our `top` function with `apply`. Let's measure the time this takes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "b6815e84",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:54.431088Z",
     "iopub.status.busy": "2026-05-21T16:34:54.430983Z",
     "iopub.status.idle": "2026-05-21T16:34:57.721745Z",
     "shell.execute_reply": "2026-05-21T16:34:57.721445Z",
     "shell.execute_reply.started": "2026-05-21T16:34:54.431081Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "399 μs ± 23.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "grouped_titles.apply(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1035c064",
   "metadata": {},
   "source": [
    "We can also do without `apply` and pass the `DataFrame` directly to our `top` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "111d1777",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:34:57.722186Z",
     "iopub.status.busy": "2026-05-21T16:34:57.722110Z",
     "iopub.status.idle": "2026-05-21T16:35:00.975265Z",
     "shell.execute_reply": "2026-05-21T16:35:00.974949Z",
     "shell.execute_reply.started": "2026-05-21T16:34:57.722178Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "39.9 μs ± 785 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "top(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67aeb04c",
   "metadata": {},
   "source": [
    "This call is about 10 times faster (39.9 μs compared with 399 μs)."
   ]
  },
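  {
   "cell_type": "markdown",
   "id": "e7a013cd",
   "metadata": {},
   "source": [
    "Note that calling `top` on the whole `DataFrame` sorts all rows together rather than within each group. If the largest rows per group are needed, one possible vectorised alternative is to sort once and then take the first rows of each group with `GroupBy.head`. The following is a minimal sketch on a hypothetical `DataFrame`; `demo_hits` and its columns `Title` and `hits` are assumptions for illustration, not the data used above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b64f2d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical example data with two groups\n",
    "demo_hits = pd.DataFrame(\n",
    "    {\n",
    "        \"Title\": [\"a\", \"a\", \"a\", \"b\", \"b\", \"b\"],\n",
    "        \"hits\": [3, 9, 5, 7, 1, 8],\n",
    "    },\n",
    ")\n",
    "\n",
    "# Sort the whole frame once, then keep the first two rows of each group\n",
    "demo_hits.sort_values(\"hits\", ascending=False).groupby(\"Title\").head(2)"
   ]
  },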
  {
   "cell_type": "markdown",
   "id": "81fe0de3",
   "metadata": {},
   "source": [
    "## Optimising `apply` with Cython\n",
    "\n",
    "It is not always this easy to find an alternative to `apply`. Numerical operations such as our `top` function can, however, be made faster with [Cython](https://cython.org/). To use Cython in Jupyter, we load the following [IPython magic](../ipython/magics.ipynb):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "c8b32bd2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:35:00.975684Z",
     "iopub.status.busy": "2026-05-21T16:35:00.975605Z",
     "iopub.status.idle": "2026-05-21T16:35:01.219304Z",
     "shell.execute_reply": "2026-05-21T16:35:01.218908Z",
     "shell.execute_reply.started": "2026-05-21T16:35:00.975676Z"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext Cython"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "662eb122",
   "metadata": {},
   "source": [
    "Then we can define our `top` function with Cython:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "4c016e1a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:35:01.219944Z",
     "iopub.status.busy": "2026-05-21T16:35:01.219874Z",
     "iopub.status.idle": "2026-05-21T16:35:01.277139Z",
     "shell.execute_reply": "2026-05-21T16:35:01.276878Z",
     "shell.execute_reply.started": "2026-05-21T16:35:01.219938Z"
    }
   },
   "outputs": [],
   "source": [
    "%%cython\n",
    "def top_cy(df, n=5, column=\"2021-12\"):\n",
    "    return df.sort_values(by=column, ascending=False)[:n]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "ea728ba7",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-21T16:35:01.277568Z",
     "iopub.status.busy": "2026-05-21T16:35:01.277490Z",
     "iopub.status.idle": "2026-05-21T16:35:04.554226Z",
     "shell.execute_reply": "2026-05-21T16:35:04.553950Z",
     "shell.execute_reply.started": "2026-05-21T16:35:01.277561Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "399 μs ± 2.95 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "grouped_titles.apply(top_cy)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57a39279",
   "metadata": {},
   "source": [
    "This has not really gained us anything yet: the runtime is practically unchanged. A further optimisation would be to declare the types in the Cython code, for example with `cpdef`. However, we would then have to restructure our function, since a `DataFrame` could no longer be passed to it."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.13 Kernel",
   "language": "python",
   "name": "python313"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}