Intake-GUI: Exploring data in a graphical user interface
Intake GUI has been re-implemented so that it can be used not only in Jupyter notebooks but also in other web applications. It displays the contents of all installed catalogs and lets you find and select local and remote catalogs, and search and select entries within them.
Intake supports the division of labor between data engineers who curate, manage, and deploy data, and data scientists who analyse and visualise data without having to know how it’s stored.
The Intake GUI is based on Panel, which offers a composable dashboard solution for displaying plots, images, tables, text and widgets. Panel works both in a Jupyter notebook and in a standalone Tornado application.
From a data engineer’s point of view, this means that you can deploy the Intake GUI at an endpoint and offer it to your data users as a data exploration tool. It also means that the GUI is easy to adapt and reorganise, for example to insert your own logo, reuse parts of it in your own applications or add new functions.
In the future, Intake-GUI should also allow the input of user parameters as well as the editing and saving of catalogs.
[1]:
import intake
intake.gui
[1]:
The GUI contains three main areas:
1. a list of catalogs. The builtin catalog shown by default contains the datasets installed on the system, just like intake.cat.
2. a list of the sources in the currently selected catalog.
3. a description of the currently selected source.
Ad 1: Catalogs
No catalog is currently displayed in the list of catalogs. However, under the three main areas there are three buttons that can be used to add, remove, or search catalogs.
The buttons are also available through the API, e.g. for Add Catalog with:
[2]:
intake.gui.add("./us_crime/us_crime.yaml")
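To try this without an existing catalog, a catalog file can first be written to disk. The following is a minimal sketch: the helper `write_demo_catalog` is hypothetical, and the trimmed set of fields is an illustrative assumption modelled on the us_crime entry in this document, not the contents of the actual us_crime.yaml.

```python
import tempfile
from pathlib import Path

# Hypothetical, trimmed-down catalog modelled on the us_crime entry.
# {{ CATALOG_DIR }} stands for the directory that contains the catalog file.
CATALOG_TEXT = """\
sources:
  us_crime:
    description: US Crime data
    driver: csv
    args:
      urlpath: "{{ CATALOG_DIR }}/data/crime.csv"
"""


def write_demo_catalog(directory):
    """Write the demo catalog into `directory` and return its path."""
    path = Path(directory) / "us_crime.yaml"
    path.write_text(CATALOG_TEXT, encoding="utf-8")
    return path


catalog_path = write_demo_catalog(tempfile.mkdtemp())
print(catalog_path)
```

The resulting path could then be passed to intake.gui.add(str(catalog_path)). The sketch only writes the catalog itself; the referenced data/crime.csv still has to exist next to it before the source can be read.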
Remote catalogs are for example available at
Ad 2: Sources
Selecting a source from the list updates the descriptive text on the left side of the user interface.
This is also available via the API:
[3]:
intake.gui.sources
[3]:
[name: us_crime
container: dataframe
plugin: ['csv']
driver: ['csv']
description: US Crime data [UCRDataTool](https://www.ucrdatatool.gov/Search/Crime/State/StatebyState.cfm)
direct_access: forbid
user_parameters: []
metadata:
plots:
line_example:
kind: line
y: ['Robbery', 'Burglary']
x: Year
violin_example:
kind: violin
y: ['Burglary rate', 'Larceny-theft rate', 'Robbery rate', 'Violent Crime rate']
group_label: Type of crime
value_label: Rate per 100k
invert: True
args:
urlpath: {{ CATALOG_DIR }}/data/crime.csv]
This is a list of regular Intake data source entries. To look at the first entry, we can enter the following:
[4]:
source = intake.gui.sources[0]
source.gui
[4]:
[5]:
intake.gui.source.description
[5]:
[6]:
cat = intake.open_catalog("./us_crime/us_crime.yaml")
cat.gui
[6]:
[7]:
source = intake.gui.sources[0]
[8]:
source
name: us_crime
container: dataframe
plugin: ['csv']
driver: ['csv']
description: US Crime data [UCRDataTool](https://www.ucrdatatool.gov/Search/Crime/State/StatebyState.cfm)
direct_access: forbid
user_parameters: []
metadata:
plots:
line_example:
kind: line
y: ['Robbery', 'Burglary']
x: Year
violin_example:
kind: violin
y: ['Burglary rate', 'Larceny-theft rate', 'Robbery rate', 'Violent Crime rate']
group_label: Type of crime
value_label: Rate per 100k
invert: True
args:
urlpath: {{ CATALOG_DIR }}/data/crime.csv
[9]:
cat.us_crime.plot.bivariate(
"Burglary rate",
"Property crime rate",
legend=False,
width=500,
height=400
) * cat.us_crime.plot.scatter(
"Burglary rate",
"Property crime rate",
color="black",
size=15,
legend=False,
) + cat.us_crime.plot.table(
["Burglary rate", "Property crime rate"],
width=350,
height=350
)
[9]:
Ad 3: Source view
As soon as catalogs are loaded and the desired sources have been selected, they are available under the attribute intake.gui.sources. Each source entry has methods and can be opened as a data source like any catalog entry. For Source: UCMerced_LandUse_by_landuse, the entry looks like this:
name: UCMerced_LandUse_by_landuse
container: None
plugin: []
description: All images matching given landuse from UCMerced_LandUse/Image.
direct_access: forbid
user_parameters: [{'name': 'landuse', 'description': 'which landuse to collect', 'type': 'str', 'default': 'airplane'}]
metadata:
args:
urlpath: s3://earth-data/UCMerced_LandUse/Images/{{ landuse }}/{{ landuse }}{id:2d}.tif
storage_options:
anon: True
concat_dim: id
coerce_shape: [256, 256]
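The `{{ landuse }}` placeholders in urlpath are filled from the user parameter of the same name, which defaults to airplane. The following is a minimal sketch of that substitution using a plain regular expression as a stand-in for the catalog's template rendering. The helper `render_urlpath` is hypothetical, and the override value forest is only an illustrative assumption.

```python
import re

# Values copied from the entry above.
urlpath = (
    "s3://earth-data/UCMerced_LandUse/Images/"
    "{{ landuse }}/{{ landuse }}{id:2d}.tif"
)
user_parameters = [
    {
        "name": "landuse",
        "description": "which landuse to collect",
        "type": "str",
        "default": "airplane",
    },
]


def render_urlpath(template, parameters, **overrides):
    """Fill {{ name }} placeholders from parameter defaults or overrides."""
    values = {p["name"]: p["default"] for p in parameters}
    values.update(overrides)

    def substitute(match):
        # Unknown placeholders are left untouched.
        return str(values.get(match.group(1), match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


default_path = render_urlpath(urlpath, user_parameters)
forest_path = render_urlpath(urlpath, user_parameters, landuse="forest")
print(default_path)
print(forest_path)
```

The single-brace `{id:2d}` part is not a user parameter and is therefore passed through unchanged by this sketch.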
Below the list of sources there is a row of buttons for working with the selected data source. Plot opens a sub-window that displays the predefined plots (i.e. those specified in the YAML catalog) for the selected source.
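To make the idea of a predefined plot more concrete, the sketch below takes the plots metadata of the us_crime source and renders each entry as the text of a roughly equivalent plot call. It assumes that kind selects the plot type and that the remaining keys are passed on as keyword arguments. The helpers `split_plot_spec` and `describe_call` are hypothetical and are not part of Intake or its plotting backend.

```python
# Plot definitions copied from the metadata of the us_crime source.
plots = {
    "line_example": {
        "kind": "line",
        "y": ["Robbery", "Burglary"],
        "x": "Year",
    },
    "violin_example": {
        "kind": "violin",
        "y": [
            "Burglary rate",
            "Larceny-theft rate",
            "Robbery rate",
            "Violent Crime rate",
        ],
        "group_label": "Type of crime",
        "value_label": "Rate per 100k",
        "invert": True,
    },
}


def split_plot_spec(spec):
    """Separate the plot kind from the remaining keyword arguments."""
    kwargs = dict(spec)  # copy, so the original definition is unchanged
    kind = kwargs.pop("kind")
    return kind, kwargs


def describe_call(spec):
    """Render a plot definition as the text of an equivalent plot call."""
    kind, kwargs = split_plot_spec(spec)
    arguments = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    return f"source.plot.{kind}({arguments})"


for name, spec in plots.items():
    print(name, "->", describe_call(spec))
```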
See also: