{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2065a177",
   "metadata": {},
   "source": [
    "# Gruppenoperationen\n",
    "\n",
    "Mit `groupby` ist ein Prozess gemeint, der einen oder mehrere der folgenden Schritte umfasst:\n",
    "\n",
    "* **Split** teilt die Daten in Gruppen nach bestimmten Kriterien auf\n",
    "* **Apply** wendet eine Funktion unabhängig auf jede Gruppe an\n",
    "* **Combine** kombiniert die Ergebnisse in einer Datenstruktur\n",
    "\n",
    "In der ersten Phase des Prozesses werden die in einem pandas-Objekt enthaltenen Daten, sei es eine Series, ein DataFrame oder etwas anderes, in Gruppen aufgeteilt, die auf einem oder mehreren Schlüsseln basieren. Die Aufteilung wird auf einer bestimmten Achse eines Objekts durchgeführt. Ein DataFrame kann zum Beispiel nach seinen Zeilen oder seinen Spalten gruppiert werden. Danach wird auf jede Gruppe eine Funktion angewendet, die einen neuen Wert erzeugt. Schließlich werden die Ergebnisse all dieser Funktionsanwendungen in einem Ergebnisobjekt kombiniert. Die Form des Ergebnisobjekts hängt normalerweise davon ab, was mit den Daten gemacht wird.\n",
    "\n",
    "Jeder Gruppierungsschlüssel kann viele Formen annehmen, und die Schlüssel müssen nicht alle vom gleichen Typ sein:\n",
    "* Eine Liste oder ein Array von Werten, die die gleiche Länge wie die zu gruppierende Achse haben\n",
    "* Ein Wert, der einen Spaltennamen in einem DataFrame angibt\n",
    "* Ein Dict oder eine Series, die eine Entsprechung zwischen den Werten auf der Achse, die gruppiert wird, und den Gruppennamen darstellt\n",
    "* Eine Funktion, die auf dem Achsenindex oder den einzelnen Beschriftungen im Index aufgerufen wird\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "\n",
    "**Hinweis:**\n",
    "\n",
    "Die drei letztgenannten Methoden sind Abkürzungen, um ein Array von Werten zu erzeugen, die für die Aufteilung des Objekts verwendet werden.\n",
    "</div>\n",
    "\n",
    "Macht euch keine Sorgen, wenn dies alles abstrakt erscheint. Im Laufe dieses Kapitels werde ich viele Beispiele für all diese Methoden geben. Für den Anfang hier ein kleiner Tabellendatensatz als DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "26872ed8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "920e4738",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Jupyter Tutorial</td>\n",
       "      <td>de</td>\n",
       "      <td>19651.0</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jupyter Tutorial</td>\n",
       "      <td>en</td>\n",
       "      <td>4722.0</td>\n",
       "      <td>3497.0</td>\n",
       "      <td>4009.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>PyViz Tutorial</td>\n",
       "      <td>de</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Python Basics</td>\n",
       "      <td>de</td>\n",
       "      <td>525.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Python Basics</td>\n",
       "      <td>en</td>\n",
       "      <td>157.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>226.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Title Language  2021-12  2022-01  2022-02\n",
       "0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
       "1  Jupyter Tutorial       en   4722.0   3497.0   4009.0\n",
       "2    PyViz Tutorial       de   2573.0   4873.0   3930.0\n",
       "3              None     None      NaN      NaN      NaN\n",
       "4     Python Basics       de    525.0    427.0    276.0\n",
       "5     Python Basics       en    157.0     85.0    226.0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"Title\": [\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            None,\n",
    "            \"Python Basics\",\n",
    "            \"Python Basics\",\n",
    "        ],\n",
    "        \"Language\": [\"de\", \"en\", \"de\", None, \"de\", \"en\"],\n",
    "        \"2021-12\": [19651, 4722, 2573, None, 525, 157],\n",
    "        \"2022-01\": [30134, 3497, 4873, None, 427, 85],\n",
    "        \"2022-02\": [33295, 4009, 3930, None, 276, 226],\n",
    "    }\n",
    ")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "647e625c",
   "metadata": {},
   "source": [
    "Angenommen, ihr möchtet den Summe der Spalte `02/2022` anhand der Beschriftungen von `Title` berechnen. Es gibt mehrere Möglichkeiten, dies zu tun. Eine davon ist der Zugriff auf `02/2022` und der Aufruf von `groupby` mit der Spalte (einer Series) in `Title`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d2f62153",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.SeriesGroupBy object at 0x106fd4440>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped = df[\"2022-02\"].groupby(df[\"Title\"])\n",
    "\n",
    "grouped"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0f2e825",
   "metadata": {},
   "source": [
    "Diese `grouped`-Variable ist nun ein spezielles `SeriesGroupBy`-Objekt. Es hat noch nichts berechnet, außer einigen Zwischendaten über den Gruppenschlüssel `df['Title']`. Die Idee ist, dass dieses Objekt über alle Informationen verfügt, die benötigt werden, um eine Operation auf jede der Gruppen anzuwenden. Zur Berechnung der Gruppenmittelwerte können wir beispielsweise die Methode `sum` des `GroupBy`-Objekts aufrufen:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "2939a291",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title\n",
       "Jupyter Tutorial    37304.0\n",
       "PyViz Tutorial       3930.0\n",
       "Python Basics         502.0\n",
       "Name: 2022-02, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19bcb625",
   "metadata": {},
   "source": [
    "Später werde ich mehr darüber erklären, was passiert, wenn ihr `.sum()` aufruft. Wichtig ist hier, dass die Daten (eine Reihe) durch Aufteilung der Daten auf den Gruppenschlüssel aggregiert wurden, wodurch eine neue Reihe entsteht, die nun durch die eindeutigen Werte in der Spalte `Title` indiziert ist. Der resultierende Index ist `Title`, weil `groupby(df['Title']` dies tat."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "424fdc8a",
   "metadata": {},
   "source": [
    "Hätten wir stattdessen mehrere Arrays als Liste übergeben, würden wir etwas anderes erhalten:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4ceeb0cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language  Title           \n",
       "de        Jupyter Tutorial    19651.0\n",
       "          PyViz Tutorial       2573.0\n",
       "          Python Basics         525.0\n",
       "en        Jupyter Tutorial     4722.0\n",
       "          Python Basics         157.0\n",
       "Name: 2021-12, dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sums = df[\"2021-12\"].groupby([df[\"Language\"], df[\"Title\"]]).sum()\n",
    "\n",
    "sums"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "969abdd0",
   "metadata": {},
   "source": [
    "Hier haben wir die Daten anhand von zwei Schlüsseln gruppiert, und die resultierende Reihe hat nun einen hierarchischen Index, der aus den beobachteten eindeutigen Schlüsselpaaren besteht:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c72eb2c6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Title</th>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>Python Basics</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>de</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>525.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Title     Jupyter Tutorial  PyViz Tutorial  Python Basics\n",
       "Language                                                 \n",
       "de                 19651.0          2573.0          525.0\n",
       "en                  4722.0             NaN          157.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sums.unstack()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59025fe7",
   "metadata": {},
   "source": [
    "Häufig befinden sich die Gruppierungsinformationen in demselben DataFrame wie die Daten, die ihr bearbeiten möchtet. In diesem Fall könnt ihr Spaltennamen (egal ob es sich um Zeichenketten, Zahlen oder andere Python-Objekte handelt) als Gruppenschlüssel übergeben:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "4d7fab97",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Language</th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jupyter Tutorial</th>\n",
       "      <td>deen</td>\n",
       "      <td>24373.0</td>\n",
       "      <td>33631.0</td>\n",
       "      <td>37304.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <td>de</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Python Basics</th>\n",
       "      <td>deen</td>\n",
       "      <td>682.0</td>\n",
       "      <td>512.0</td>\n",
       "      <td>502.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 Language  2021-12  2022-01  2022-02\n",
       "Title                                               \n",
       "Jupyter Tutorial     deen  24373.0  33631.0  37304.0\n",
       "PyViz Tutorial         de   2573.0   4873.0   3930.0\n",
       "Python Basics        deen    682.0    512.0    502.0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"Title\").sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae165388",
   "metadata": {},
   "source": [
    "Hierbei fällt auf, dass das Ergebnis keine Spalte `Language` enthält. Da es sich bei `df['Language']` nicht um numerische Daten handelt, stört sie im Tabellenlayout und wird daher automatisch aus dem Ergebnis ausgeschlossen. Standardmäßig werden alle numerischen Spalten aggregiert."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1a4f155e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>33295.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>3497.0</td>\n",
       "      <td>4009.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>2573.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>3930.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>525.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>157.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>226.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2021-12  2022-01  2022-02\n",
       "Title            Language                           \n",
       "Jupyter Tutorial de        19651.0  30134.0  33295.0\n",
       "                 en         4722.0   3497.0   4009.0\n",
       "PyViz Tutorial   de         2573.0   4873.0   3930.0\n",
       "Python Basics    de          525.0    427.0    276.0\n",
       "                 en          157.0     85.0    226.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Title\", \"Language\"]).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd25c6d2",
   "metadata": {},
   "source": [
    "Unabhängig vom Ziel der Verwendung von `groupby` ist eine allgemein nützliche `groupby`-Methode `size`, die eine Serie mit den Gruppengrößen zurückgibt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "8ec18d81",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language\n",
       "de    3\n",
       "en    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Language\"]).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b2b057e",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "\n",
    "**Hinweis:**\n",
    "\n",
    "Alle fehlenden Werte in einem Gruppenschlüssel werden standardmäßig aus dem Ergebnis ausgeschlossen. Dieses Verhalten kann deaktiviert werden, indem `dropna=False` an `groupby` übergeben wird.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "fcbd7160",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Language\n",
       "de     3\n",
       "en     2\n",
       "NaN    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"Language\", dropna=False).size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "06598c88",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title             Language\n",
       "Jupyter Tutorial  de          1\n",
       "                  en          1\n",
       "PyViz Tutorial    de          1\n",
       "Python Basics     de          1\n",
       "                  en          1\n",
       "NaN               NaN         1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Title\", \"Language\"], dropna=False).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e3d1d0b",
   "metadata": {},
   "source": [
    "## Iteration über Gruppen\n",
    "\n",
    "Das von `groupby` zurückgegebene Objekt unterstützt Iteration und erzeugt eine Folge von 2-Tupeln, die den Gruppennamen zusammen mit dem Datenpaket enthalten. Betrachten wir das Folgende:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "3d297d82",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Jupyter Tutorial\n",
      "              Title Language  2021-12  2022-01  2022-02\n",
      "0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
      "1  Jupyter Tutorial       en   4722.0   3497.0   4009.0\n",
      "PyViz Tutorial\n",
      "            Title Language  2021-12  2022-01  2022-02\n",
      "2  PyViz Tutorial       de   2573.0   4873.0   3930.0\n",
      "Python Basics\n",
      "           Title Language  2021-12  2022-01  2022-02\n",
      "4  Python Basics       de    525.0    427.0    276.0\n",
      "5  Python Basics       en    157.0     85.0    226.0\n"
     ]
    }
   ],
   "source": [
    "for name, group in df.groupby(\"Title\"):\n",
    "    print(name)\n",
    "    print(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aaa800a6",
   "metadata": {},
   "source": [
    "Bei mehreren Schlüsseln ist das erste Element des Tupels ein Tupel von Schlüsselwerten:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "33bb6b4d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('Jupyter Tutorial', 'de')\n",
      "              Title Language  2021-12  2022-01  2022-02\n",
      "0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
      "('Jupyter Tutorial', 'en')\n",
      "              Title Language  2021-12  2022-01  2022-02\n",
      "1  Jupyter Tutorial       en   4722.0   3497.0   4009.0\n",
      "('PyViz Tutorial', 'de')\n",
      "            Title Language  2021-12  2022-01  2022-02\n",
      "2  PyViz Tutorial       de   2573.0   4873.0   3930.0\n",
      "('Python Basics', 'de')\n",
      "           Title Language  2021-12  2022-01  2022-02\n",
      "4  Python Basics       de    525.0    427.0    276.0\n",
      "('Python Basics', 'en')\n",
      "           Title Language  2021-12  2022-01  2022-02\n",
      "5  Python Basics       en    157.0     85.0    226.0\n"
     ]
    }
   ],
   "source": [
    "for (i1, i2), group in df.groupby([\"Title\", \"Language\"]):\n",
    "    print((i1, i2))\n",
    "    print(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa7ac14c",
   "metadata": {},
   "source": [
    "Als nächstes wollen wir ein `dict` der Daten als Einzeiler ausgeben:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "7080879c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Jupyter Tutorial':               Title Language  2021-12  2022-01  2022-02\n",
       " 0  Jupyter Tutorial       de  19651.0  30134.0  33295.0\n",
       " 1  Jupyter Tutorial       en   4722.0   3497.0   4009.0,\n",
       " 'PyViz Tutorial':             Title Language  2021-12  2022-01  2022-02\n",
       " 2  PyViz Tutorial       de   2573.0   4873.0   3930.0,\n",
       " 'Python Basics':            Title Language  2021-12  2022-01  2022-02\n",
       " 4  Python Basics       de    525.0    427.0    276.0\n",
       " 5  Python Basics       en    157.0     85.0    226.0}"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "books = dict(list(df.groupby(\"Title\")))\n",
    "\n",
    "books"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1614d9d",
   "metadata": {},
   "source": [
    "Standardmäßig gruppiert `groupby` auf `axis=0`, aber ihr könnt auch auf jeder der anderen Achsen gruppieren. Zum Beispiel könnten wir die Spalten unseres Beispiels `df` hier nach `dtype` gruppieren wie folgt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "d5974c77",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title        object\n",
       "Language     object\n",
       "2021-12     float64\n",
       "2022-01     float64\n",
       "2022-02     float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "01dc9b39",
   "metadata": {},
   "outputs": [],
   "source": [
    "grouped = df.T.groupby(df.dtypes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "bd81bddb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "float64\n",
      "               0       1       2    3      4      5\n",
      "2021-12  19651.0  4722.0  2573.0  NaN  525.0  157.0\n",
      "2022-01  30134.0  3497.0  4873.0  NaN  427.0   85.0\n",
      "2022-02  33295.0  4009.0  3930.0  NaN  276.0  226.0\n",
      "object\n",
      "                         0                 1               2     3  \\\n",
      "Title     Jupyter Tutorial  Jupyter Tutorial  PyViz Tutorial  None   \n",
      "Language                de                en              de  None   \n",
      "\n",
      "                      4              5  \n",
      "Title     Python Basics  Python Basics  \n",
      "Language             de             en  \n"
     ]
    }
   ],
   "source": [
    "for dtype, group in grouped:\n",
    "    print(dtype)\n",
    "    print(group)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2baec4a6",
   "metadata": {},
   "source": [
    "## Auswählen einer Spalte oder Untergruppe von Spalten\n",
    "\n",
    "Die Indizierung eines `GroupBy`-Objekts, das aus einem DataFrame mit einem Spaltennamen oder einem Array von Spaltennamen erstellt wurde, hat den Effekt einer Spaltenunterteilung für die Aggregation. Dies bedeutet, dass:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e5e8fd94",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x11c9eae40>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"Title\")[\"2021-12\"]\n",
    "df.groupby(\"Title\")[[\"2022-01\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce99850b",
   "metadata": {},
   "source": [
    "sind vereinfachte Schreibweisen für:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "96205ee8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x11ca3dfd0>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"2021-12\"].groupby(df[\"Title\"])\n",
    "df[[\"2022-01\"]].groupby(df[\"Title\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6865f7b5",
   "metadata": {},
   "source": [
    "Insbesondere bei großen Datensätzen kann es wünschenswert sein, nur einige Spalten zu aggregieren. Um zum Beispiel im vorhergehenden Datensatz die Summe nur für die Spalte `01/2022` zu berechnen und das Ergebnis als DataFrame zu erhalten, könnten wir schreiben:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "46d4b885",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2022-01</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Title</th>\n",
       "      <th>Language</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>30134.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>3497.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>4873.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>427.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           2022-01\n",
       "Title            Language         \n",
       "Jupyter Tutorial de        30134.0\n",
       "                 en         3497.0\n",
       "PyViz Tutorial   de         4873.0\n",
       "Python Basics    de          427.0\n",
       "                 en           85.0"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"Title\", \"Language\"])[[\"2022-01\"]].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52345b61",
   "metadata": {},
   "source": [
    "Das von dieser Indizierungsoperation zurückgegebene Objekt ist ein gruppierter DataFrame, wenn eine Liste oder ein Array übergeben wird, oder eine gruppierte Serie, wenn nur ein einzelner Spaltenname als Skalar übergeben wird:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "ca221529",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.SeriesGroupBy object at 0x11c9ae650>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_grouped = df.groupby([\"Title\", \"Language\"])[\"2022-01\"]\n",
    "\n",
    "series_grouped"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "9167f78a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Title             Language\n",
       "Jupyter Tutorial  de          30134.0\n",
       "                  en           3497.0\n",
       "PyViz Tutorial    de           4873.0\n",
       "Python Basics     de            427.0\n",
       "                  en             85.0\n",
       "Name: 2022-01, dtype: float64"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_grouped.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3d46fdc",
   "metadata": {},
   "source": [
    "## Gruppierung mit `dicts` und `Series`\n",
    "\n",
    "Gruppierungsinformationen können auch in einer anderen Form als einem Array vorliegen:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "bc2df07e",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.iloc[2:3, [2, 3]] = np.nan"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec42307f",
   "metadata": {},
   "source": [
    "Angenommen, ich habe eine Gruppenkorrespondenz für die Spalten und möchte die Spalten nach Gruppen zusammenfassen:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "fff0698e",
   "metadata": {},
   "outputs": [],
   "source": [
    "mapping = {\"2021-12\": \"Dec 2021\", \"2022-01\": \"Jan 2022\", \"2022-02\": \"Feb 2022\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29f128ac",
   "metadata": {},
   "source": [
    "Nun könnte aus diesem `dict` ein Array konstruiert werden, um es an `groupby` zu übergeben, aber stattdessen können wir auch einfach das `dict` übergeben:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "a35f838f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Dec 2021</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>4722.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>525.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb 2022</th>\n",
       "      <td>33295.0</td>\n",
       "      <td>4009.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>226.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jan 2022</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>3497.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                0       1       2  3      4      5\n",
       "Dec 2021  19651.0  4722.0       0  0  525.0  157.0\n",
       "Feb 2022  33295.0  4009.0  3930.0  0  276.0  226.0\n",
       "Jan 2022  30134.0  3497.0       0  0  427.0   85.0"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_column = df.T.groupby(mapping)\n",
    "\n",
    "by_column.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acc8d67c",
   "metadata": {},
   "source": [
    "Die gleiche Funktionalität gilt für `Series`, die als eine Abbildung mit fester Größe betrachtet werden können:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "adad743a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2021-12    Dec 2021\n",
       "2022-01    Jan 2022\n",
       "2022-02    Feb 2022\n",
       "dtype: object"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_series = pd.Series(mapping)\n",
    "\n",
    "map_series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "5be77438",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Dec 2021</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb 2022</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jan 2022</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0  1  2  3  4  5\n",
       "Dec 2021  1  1  0  0  1  1\n",
       "Feb 2022  1  1  1  0  1  1\n",
       "Jan 2022  1  1  0  0  1  1"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.T.groupby(map_series).count()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee329ff2",
   "metadata": {},
   "source": [
    "## Gruppieren mit Funktionen\n",
    "\n",
    "Die Verwendung von Python-Funktionen ist im Vergleich zu einem `Dict` oder einer `Series` eine allgemeinere Methode zur Definition einer Gruppenzuordnung. Jede Funktion, die als Gruppenschlüssel übergeben wird, wird einmal pro Indexwert aufgerufen, wobei die Rückgabewerte als Gruppennamen verwendet werden. Betrachtet konkret den Beispiel-DataFrame aus dem vorherigen Abschnitt, das die Titel als Indexwerte enthält. Angenommen, Wenn ihr nach der Länge der Namen gruppieren wollt, könnt ihr zwar ein Array mit den Längen der Strings berechnen, aber es ist einfacher, die Funktion `len` zu übergeben:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "48553e76",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(\n",
    "    [\n",
    "        [19651, 30134, 33295],\n",
    "        [4722, 3497, 4009],\n",
    "        [2573, 4873, 3930],\n",
    "        [525, 427, 276],\n",
    "        [157, 85, 226],\n",
    "    ],\n",
    "    index=[\n",
    "        \"Jupyter Tutorial\",\n",
    "        \"Jupyter Tutorial\",\n",
    "        \"PyViz Tutorial\",\n",
    "        \"Python Basics\",\n",
    "        \"Python Basics\",\n",
    "    ],\n",
    "    columns=[\"2021-12\", \"2022-01\", \"2022-02\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "737dfa93",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    2021-12  2022-01  2022-02\n",
       "13        2        2        2\n",
       "14        1        1        1\n",
       "16        2        2        2"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(len).count()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc755a53",
   "metadata": {},
   "source": [
    "Das Mischen von Funktionen mit Arrays, Dicts oder Series ist kein Problem, da alles intern in Arrays umgewandelt wird:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "5976d024",
   "metadata": {},
   "outputs": [],
   "source": [
    "languages = [\"de\", \"en\", \"de\", \"de\", \"en\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "8a49da12",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>2021-12</th>\n",
       "      <th>2022-01</th>\n",
       "      <th>2022-02</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">13</th>\n",
       "      <th>de</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <th>de</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">16</th>\n",
       "      <th>de</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       2021-12  2022-01  2022-02\n",
       "13 de        1        1        1\n",
       "   en        1        1        1\n",
       "14 de        1        1        1\n",
       "16 de        1        1        1\n",
       "   en        1        1        1"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([len, languages]).count()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e2622a9",
   "metadata": {},
   "source": [
    "## Gruppierung nach Indexebenen\n",
    "\n",
    "Eine letzte praktische Funktion für hierarchisch indizierte Datensätze ist die Möglichkeit der Aggregation anhand einer der Indexebenen einer Achse. Schauen wir uns ein Beispiel an:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "ce3a50f6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>Month</th>\n",
       "      <th colspan=\"2\" halign=\"left\">2021-12</th>\n",
       "      <th colspan=\"2\" halign=\"left\">2022-01</th>\n",
       "      <th colspan=\"2\" halign=\"left\">2022-02</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>Version</th>\n",
       "      <th>latest</th>\n",
       "      <th>stable</th>\n",
       "      <th>latest</th>\n",
       "      <th>stable</th>\n",
       "      <th>latest</th>\n",
       "      <th>stable</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Jupyter Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>30134.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>33295.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>4722.0</td>\n",
       "      <td>1825.0</td>\n",
       "      <td>3497.0</td>\n",
       "      <td>2576.0</td>\n",
       "      <td>4009.0</td>\n",
       "      <td>3707.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>de</th>\n",
       "      <td>2573.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NaN</th>\n",
       "      <th>NaN</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Python Basics</th>\n",
       "      <th>de</th>\n",
       "      <td>525.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>en</th>\n",
       "      <td>157.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>226.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Month                 2021-12          2022-01          2022-02        \n",
       "Version                latest  stable   latest  stable   latest  stable\n",
       "Jupyter Tutorial de   19651.0     0.0  30134.0     0.0  33295.0     0.0\n",
       "                 en    4722.0  1825.0   3497.0  2576.0   4009.0  3707.0\n",
       "PyViz Tutorial   de    2573.0     0.0   4873.0     0.0   3930.0     0.0\n",
       "NaN              NaN      NaN     NaN      NaN     NaN      NaN     NaN\n",
       "Python Basics    de     525.0     0.0    427.0     0.0    276.0     0.0\n",
       "                 en     157.0     0.0     85.0     0.0    226.0     0.0"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "version_hits = [\n",
    "    [19651, 0, 30134, 0, 33295, 0],\n",
    "    [4722, 1825, 3497, 2576, 4009, 3707],\n",
    "    [2573, 0, 4873, 0, 3930, 0],\n",
    "    [None, None, None, None, None, None],\n",
    "    [525, 0, 427, 0, 276, 0],\n",
    "    [157, 0, 85, 0, 226, 0],\n",
    "]\n",
    "df = pd.DataFrame(\n",
    "    version_hits,\n",
    "    index=[\n",
    "        [\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"Jupyter Tutorial\",\n",
    "            \"PyViz Tutorial\",\n",
    "            None,\n",
    "            \"Python Basics\",\n",
    "            \"Python Basics\",\n",
    "        ],\n",
    "        [\"de\", \"en\", \"de\", None, \"de\", \"en\"],\n",
    "    ],\n",
    "    columns=[\n",
    "        [\"2021-12\", \"2021-12\", \"2022-01\", \"2022-01\", \"2022-02\", \"2022-02\"],\n",
    "        [\"latest\", \"stable\", \"latest\", \"stable\", \"latest\", \"stable\"],\n",
    "    ],\n",
    ")\n",
    "df.columns.names = [\"Month\", \"Version\"]\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "1ab217b9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">Jupyter Tutorial</th>\n",
       "      <th>PyViz Tutorial</th>\n",
       "      <th>NaN</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Python Basics</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>de</th>\n",
       "      <th>en</th>\n",
       "      <th>de</th>\n",
       "      <th>NaN</th>\n",
       "      <th>de</th>\n",
       "      <th>en</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2021-12</th>\n",
       "      <td>19651.0</td>\n",
       "      <td>6547.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>525.0</td>\n",
       "      <td>157.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-01</th>\n",
       "      <td>30134.0</td>\n",
       "      <td>6073.0</td>\n",
       "      <td>4873.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>427.0</td>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-02</th>\n",
       "      <td>33295.0</td>\n",
       "      <td>7716.0</td>\n",
       "      <td>3930.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>276.0</td>\n",
       "      <td>226.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Jupyter Tutorial         PyViz Tutorial  NaN Python Basics       \n",
       "                      de      en             de  NaN            de     en\n",
       "Month                                                                    \n",
       "2021-12          19651.0  6547.0         2573.0  0.0         525.0  157.0\n",
       "2022-01          30134.0  6073.0         4873.0  0.0         427.0   85.0\n",
       "2022-02          33295.0  7716.0         3930.0  0.0         276.0  226.0"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.T.groupby(level=\"Month\").sum()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.13 Kernel",
   "language": "python",
   "name": "python313"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}