{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## **Quickstart Guide**\n", "\n", "This guide covers the standard usage pattern and basic functionality to help you get started with twinLab. In this Jupyter notebook we will:\n", "\n", "1. Upload a dataset to twinLab.\n", "2. Use `Emulator.train` to train a surrogate model.\n", "3. Use the model to make a prediction with `Emulator.predict`.\n", "4. Visualise the results and their uncertainty.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Third-party imports\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Project imports\n", "import twinlab as tl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Your twinLab information**\n", "\n", "Confirm your twinLab version:\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'cloud': '2.0.0',\n", " 'modal': '0.2.0',\n", " 'library': '1.2.0',\n", " 'image': 'jasper-twinlab-deployment'}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tl.versions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also view your user information, including how many credits you have.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'username': 'jasper@digilab.co.uk', 'credits': 0}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tl.user_information()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Upload a dataset**\n", "\n", "A dataset must be provided either as a `pandas.DataFrame` object or as a file path pointing to a CSV file that can be parsed into a `pandas.DataFrame` object. 
**Both must be formatted with clearly labelled columns.** Here, we will label the input (predictor) variable `x` and the output variable `y`. twinLab expects data in column-feature format: each row represents a single data sample and each column represents a data feature.\n", "\n", "twinLab contains a `Dataset` class with attributes and methods to process, view and summarise the dataset. Each dataset must be created with an id (`dataset_id` below), which is used to access it. The dataset can then be uploaded using the `upload` method.\n" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " x y\n", "0 0.696469 -0.817374\n", "1 0.286139 0.887656\n", "2 0.226851 0.921553\n", "3 0.551315 -0.326334\n", "4 0.719469 -0.832518\n", "5 0.423106 0.400669\n", "6 0.980764 -0.164966\n", "7 0.684830 -0.960764\n", "8 0.480932 0.340115\n", "9 0.392118 0.845795" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Dataframe is uploading.\n", "Processing dataset\n", "Dataset example_data was processed.\n" ] } ], "source": [ "x = [\n", "    0.6964691855978616,\n", "    0.28613933495037946,\n", "    0.2268514535642031,\n", "    0.5513147690828912,\n", "    0.7194689697855631,\n", "    0.42310646012446096,\n", "    0.9807641983846155,\n", "    0.6848297385848633,\n", "    0.48093190148436094,\n", "    0.3921175181941505,\n", "]\n", "\n", "y = [\n", "    -0.8173739564129022,\n", "    0.8876561174050408,\n", "    0.921552660721474,\n", "    -0.3263338765412979,\n", "    -0.8325176123242133,\n", "    0.4006686354731812,\n", "    -0.16496626502368078,\n", "    -0.9607643657025954,\n", "    0.3401149876855609,\n", "    0.8457949914442409,\n", "]\n", "\n", "# Create the dataframe from the lists above\n", "df = pd.DataFrame({\"x\": x, \"y\": y})\n", "\n", "# View the dataset before uploading\n", "display(df)\n", "\n", "# Define the name of the dataset\n", "dataset_id = \"example_data\"\n", "\n", "# Initialise a Dataset object\n", "dataset = tl.Dataset(id=dataset_id)\n", "\n", "# Upload the dataset\n", "dataset.upload(df, verbose=True)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **Train an emulator**\n", "\n", "The `Emulator` class is used to train and implement your surrogate models. As with datasets, an id is defined; this is the name under which the model is saved in the cloud. Training arguments are passed using a `TrainParams` object, a class that contains the parameters needed to train your model. 
To train the model, call the `Emulator.train` method, passing the `TrainParams` object as an argument.\n" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model example_emulator has begun training.\n", "Training complete!\n", "\n" ] } ], "source": [ "# Initialise emulator\n", "emulator_id = \"example_emulator\"\n", "\n", "emulator = tl.Emulator(id=emulator_id)\n", "\n", "# Define the training parameters for your emulator\n", "# (train_test_ratio=1.0 uses all of the data for training)\n", "params = tl.TrainParams(train_test_ratio=1.0)\n", "\n", "# Train the emulator using the train method\n", "emulator.train(\n", "    dataset=dataset, inputs=[\"x\"], outputs=[\"y\"], params=params, verbose=True\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **Predict using the trained emulator**\n", "\n", "The surrogate model is now trained and saved to the cloud under the `emulator_id`, so it can be used to make predictions. First define a dataset of inputs for which you want to find outputs; ensure that this is a `pandas.DataFrame` object. Then call `Emulator.predict` with the evaluation dataset as its argument.\n" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " x\n", "0 0.000000\n", "1 0.007874\n", "2 0.015748\n", "3 0.023622\n", "4 0.031496\n", ".. ...\n", "123 0.968504\n", "124 0.976378\n", "125 0.984252\n", "126 0.992126\n", "127 1.000000\n", "\n", "[128 rows x 1 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " y y\n", "0 0.617689 0.656265\n", "1 0.629105 0.640576\n", "2 0.640630 0.624421\n", "3 0.652252 0.607809\n", "4 0.663957 0.590755\n" ] } ], "source": [ "# Define the inputs at which to evaluate the emulator\n", "x_eval = np.linspace(0, 1, 128)\n", "\n", "# Convert to a dataframe\n", "df_eval = pd.DataFrame({\"x\": x_eval})\n", "display(df_eval)\n", "\n", "# Predict the results: the first element holds the means,\n", "# the second the standard deviations\n", "predictions = emulator.predict(df_eval)\n", "result_df = pd.concat([predictions[0], predictions[1]], axis=1)\n", "df_mean, df_stdev = result_df.iloc[:, 0], result_df.iloc[:, 1]\n", "\n", "# Convert to NumPy arrays for plotting\n", "df_mean, df_stdev = df_mean.values, df_stdev.values\n", "print(result_df.head())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **Viewing the results**\n", "\n", "`Emulator.predict` returns the mean prediction for each input together with its standard deviation, which makes it straightforward to visualise the uncertainty in the results.\n" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot parameters\n", "nsigs = [1, 2]  # Band widths in standard deviations\n", "# nsigs = [0.674, 1.960, 2.576]  # 50%, 95% and 99% intervals\n", "color = \"blue\"\n", "alpha = 0.5\n", "plot_training_data = True\n", "plot_model_mean = True\n", "plot_model_bands = True\n", "\n", "# Plot results\n", "grid = df_eval[\"x\"]\n", "mean = df_mean\n", "err = df_stdev\n", "if plot_model_bands:\n", "    label = r\"Model prediction\"\n", "    # Empty fill so that the bands share a single legend entry\n", "    plt.fill_between(grid, np.nan, np.nan, lw=0, color=color, alpha=alpha, label=label)\n", "    for isig, nsig in enumerate(nsigs):\n", "        plt.fill_between(\n", "            grid,\n", "            mean - nsig * err,\n", "            mean + nsig * err,\n", "            lw=0,\n", "            color=color,\n", "            alpha=alpha / (isig + 1),\n", "        )\n", "if plot_model_mean:\n", "    label = r\"Model prediction\" if not plot_model_bands else None\n", "    plt.plot(grid, mean, color=color, alpha=alpha, label=label)\n", "if plot_training_data:\n", "    plt.plot(df[\"x\"], df[\"y\"], \".\", color=\"black\", label=\"Training data\")\n", "plt.xlim((0.0, 1.0))\n", "plt.xlabel(r\"$x$\")\n", "plt.ylabel(r\"$y$\")\n", "plt.legend()\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **Deleting datasets and emulators**\n", "\n", "To keep your cloud storage tidy, delete your datasets and emulators when you are finished with them. 
`Emulator.delete` and `Dataset.delete` remove the emulator and the dataset from cloud storage, respectively.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Delete dataset\n", "dataset.delete()\n", "\n", "# Delete emulator\n", "emulator.delete()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 4 }