Development#

Notes for developers. If you want to get involved, please do! We welcome all kinds of contributions, for example:

  • docs fixes/clarifications
  • bug reports
  • bug fixes
  • feature requests
  • pull requests
  • tutorials

Workflows#

We don't mind whether you use a branching or forking workflow. However, please only push to your own branches: pushing to other people's branches is often a recipe for disaster, and in our experience it is never required, so it is best avoided.

Try to keep your merge requests as small as possible (focus on one thing if you can). This makes life much easier for reviewers, which allows contributions to be accepted at a faster rate.

Installation#

For development, we rely on uv for all our dependency management. To get started, you will need to make sure that uv is installed (instructions here).

This project is a uv workspace, which means that it contains more than one Python package. uv commands will by default target the root bookshelf package, but if you wish to target another package you can use the --package flag.
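As a sketch, targeting a specific workspace member looks like the following (the package name "bookshelf-producer" is a placeholder, not necessarily a real member of this workspace):

```shell
# Run a command against the root bookshelf package (the default):
uv run bookshelf --help

# Target another workspace member instead; substitute a real
# member of the workspace for "bookshelf-producer":
uv run --package bookshelf-producer pytest
```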

For all of our work, we use our Makefile. You can read the commands out of the Makefile and run them by hand if you wish, but we generally discourage this because it can be error-prone. To create your environment, run make virtual-environment.

If there are any issues, the messages from the Makefile should guide you through them. If not, please raise an issue in the issue tracker.

Language#

We use British English for our development. We do this for consistency with the broader work context of our lead developers.

Versioning#

This package follows the version format described in PEP 440 and uses Semantic Versioning to describe how the version should change depending on the updates to the code base.
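As a rough illustration of how Semantic Versioning bump rules work, the helper below applies "major", "minor" and "patch" bumps to a MAJOR.MINOR.PATCH string. It is purely illustrative; it is not part of bookshelf or of the tooling used in the release process:

```python
def bump(version: str, rule: str) -> str:
    """Apply a SemVer-style bump rule to a MAJOR.MINOR.PATCH version string.

    Illustrative helper only, not part of bookshelf.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":  # breaking changes
        return f"{major + 1}.0.0"
    if rule == "minor":  # backwards-compatible new features
        return f"{major}.{minor + 1}.0"
    if rule == "patch":  # backwards-compatible bug fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump rule: {rule}")


print(bump("1.2.3", "minor"))  # → 1.3.0
```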

Our commit messages are written to follow the conventional commits standard, which makes it easy to find the commits that matter when traversing the commit history.
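For example, conventional commit messages take the form "type(scope): description". The types below (fix, feat, docs) come from the standard; the scopes and descriptions are made up for illustration:

```
fix(notebooks): handle a missing metadata file
feat: add an --output flag to the run command
docs: clarify the release process
```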

Note

We don't use the commit messages from conventional commits to automatically generate the changelog and release documentation.

The notebooks that generate the datasets#

The top-level directory notebooks contains the notebooks used to produce the Books. Each notebook corresponds with a single Volume (collection of Books with the same name).

Each notebook also has a corresponding .yaml file containing the latest metadata for the Book. See the NotebookMetadata schema (bookshelf.schema.NotebookMetadata) for the expected format of this file.

Creating a new Volume#

  • Start by copying example.py and example.yaml and renaming them to the name of the new Volume. This provides a simple example to get started.
  • Update {volume}.yaml with the correct metadata
  • Update the fetch and processing steps as needed, adding additional Resources to the Book as needed.
  • Run the notebook and check the output
  • TODO: Perform the release procedure to upload the built Book to the remote BookShelf (bookshelf save {volume})
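The steps above can be sketched as a shell session. "myvolume" stands in for the name of your new Volume, and /tmp/bookshelf-test is an arbitrary local output directory:

```shell
# Copy the example notebook and metadata under the new Volume's name:
cp notebooks/example.py notebooks/myvolume.py
cp notebooks/example.yaml notebooks/myvolume.yaml

# After editing the metadata and the fetch/processing steps,
# run the notebook into a local directory and check the output:
uv run bookshelf run --output /tmp/bookshelf-test myvolume

# Upload the built Book to the remote BookShelf:
uv run bookshelf save myvolume
```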

Updating a Volume's version#

  • Update the version attribute in the metadata file
  • Modify other metadata attributes as needed
  • Update the data fetching and processing steps in the notebook
  • Run the notebook and check the output
  • TODO: Perform the release procedure to upload the built Book to the remote BookShelf

Testing a notebook locally#

You can run a notebook with a specified output directory for local testing:

uv run bookshelf run --output /path/to/custom/directory <notebook_name>

The generated Book can then be used directly from the local directory. Note that the path to the custom directory needs to include the version of the Book. When loading the Book, you must also specify the version and the edition, otherwise it will query the remote bookshelf.

import bookshelf

shelf = bookshelf.BookShelf("/path/to/custom/directory/{version}")
edition = 1

new_book = shelf.load("{notebook_name}", version="{version}", edition=edition)

When updating an existing Book, remember to increase the version or the edition to make sure you load your newly generated data rather than the old data.

Releasing#

Releasing is semi-automated via a CI job. The CI job requires the type of version bump that will be performed to be manually specified. See the poetry docs for the list of available bump rules.

Standard process#

The steps required are the following:

  1. Bump the version: manually trigger the "bump" stage from the latest commit in main (pipelines are here). A valid "bump_rule" (see https://python-poetry.org/docs/cli/#version) will need to be specified via the "BUMP_RULE" CI variable (see https://docs.gitlab.com/ee/ci/variables/). This will then trigger a release, including publication to PyPI.

  2. Download the artefacts from the release job. The release_notes.md artefact will be pre-filled with the list of changes included in this release; you can find it in the release-bundle zip file in the artefacts section. The announcements section should be completed manually to highlight any particularly notable changes or other announcements (or deleted if not relevant for this release).

  3. Once the release notes are filled out, use them to make a release.

  4. That's it, release done: make noise on your social media of choice, do whatever else.

  5. Enjoy the newly available version