PyData Global 2025

projspec: what's this project anyway?
2025-12-09, General Track

Most code and related workflows take place in "projects": directories with descriptive metadata. So many project types are in use these days that it is hard to know what is contained where. projspec solves this for the majority of the Python data ecosystem, so that you can introspect your projects, act on them, and search across all your projects, local or remote.


Daily workflows in pydata usually occur in the context of projects: a directory tree of files, with special metadata files describing those contents. Many metadata specifications are in use, one for each of the many tools that operate on projects, storing information in small YAML, TOML or JSON files, or in the pyproject.toml file for Python-specific projects. This model encompasses not only the majority of the environment management tools and task runners in pydata (uv, pixi, poetry, etc.) but also other essential tools (e.g., git), data definitions (e.g., Hugging Face datasets), deployment (briefcase, helm, wheel) and workflow-specific metadata (e.g., pyscript).

The range of possible metadata is bewildering! Most projects show how to invoke their functionality in README files, with the first step being to download some specific tool. In some ways, all this flexibility has taken us backwards. There is no easy way to tell what type a project is and what definitions it contains without reading the supporting documentation and browsing specific files, or even downloading the whole thing and running a specific tool against it.

projspec aspires to be a layer over the most common pydata-related project types. It provides introspection of project type and contents from the metadata definitions, and this can be done on remote project directories too. For each project type, we infer a set of "contents" (things that are defined in the project and inherently part of it) and "artifacts" (things the project can make or do, usually by calling a subprocess). A project can be of multiple types at once: a project designed to be executed with pixi, for instance, still likely contains git information and may also have dataset declarations, things that pixi is not concerned with. Projects may also contain sub-projects of the same or a different type, e.g., a conda recipe alongside a code library.
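To make the idea concrete, here is a minimal sketch of type inference from marker files. It is not projspec's API: the `MARKERS` table, the `ProjectInfo` class and the `inspect` function are hypothetical names chosen for illustration, and the sketch leaves out contents and artifacts entirely. It only shows how one directory can match several types at once and hold a sub-project of a different type.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile

# Hypothetical marker table for illustration; not projspec's real detection rules.
MARKERS = {
    "git": ".git",
    "pixi": "pixi.toml",
    "python": "pyproject.toml",
    "conda-recipe": "meta.yaml",
}


@dataclass
class ProjectInfo:
    path: Path
    types: list[str] = field(default_factory=list)
    children: dict[str, "ProjectInfo"] = field(default_factory=dict)


def inspect(path: Path, depth: int = 1) -> ProjectInfo:
    """Infer project types from marker files, then look `depth` levels down for sub-projects."""
    info = ProjectInfo(path=path)
    # A directory may match several markers, so it can have several types.
    info.types = sorted(
        name for name, marker in MARKERS.items() if (path / marker).exists()
    )
    if depth > 0:
        subdirs = sorted(
            p for p in path.iterdir() if p.is_dir() and not p.name.startswith(".")
        )
        for child in subdirs:
            sub = inspect(child, depth - 1)
            if sub.types:  # keep only subdirectories that look like projects
                info.children[child.name] = sub
    return info


def demo() -> ProjectInfo:
    """Build a throwaway pixi + git project with a conda recipe inside, then inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / ".git").mkdir()
        (root / "pixi.toml").write_text("[workspace]\n")
        (root / "recipe").mkdir()
        (root / "recipe" / "meta.yaml").write_text("package: {}\n")
        return inspect(root)
```

Calling `demo()` yields a project typed as both git and pixi, with a `recipe` sub-project typed as a conda recipe. A real implementation would parse the metadata files rather than only test for their presence.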

projspec, due to be released in time for this talk, will provide a handy API for working with projects of many types, covering both introspection and running actions. It will have a way to index many projects, local or remote, and to query that index with complex criteria to find the project that matches your needs: one that contains certain datasets, depends on specific library versions or is capable of creating particular output types. We will demonstrate all of this!
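As a rough illustration of what querying an index with combined criteria could look like, the sketch below filters a small in-memory list of records. Everything in it is an assumption made for the example: the `Record` fields, the `query` function and the sample entries are hypothetical and do not reflect how projspec stores or searches its index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One indexed project; all fields are hypothetical, for illustration only."""
    location: str
    types: frozenset[str]
    dependencies: frozenset[str] = frozenset()
    artifacts: frozenset[str] = frozenset()


def query(index, *, types=(), depends_on=(), produces=()):
    """Return locations of records that satisfy every criterion given."""
    return [
        r.location
        for r in index
        if set(types) <= r.types
        and set(depends_on) <= r.dependencies
        and set(produces) <= r.artifacts
    ]


# Made-up sample entries mixing local and remote locations.
INDEX = [
    Record("/home/me/proj-a", frozenset({"git", "pixi"}),
           frozenset({"pandas"}), frozenset({"wheel"})),
    Record("s3://bucket/proj-b", frozenset({"git", "python"}),
           frozenset({"numpy", "pandas"}), frozenset({"wheel", "docs"})),
    Record("/home/me/proj-c", frozenset({"conda-recipe"}),
           frozenset({"numpy"}), frozenset({"conda-package"})),
]
```

For example, `query(INDEX, depends_on=["pandas"], produces=["wheel"])` selects the first two entries, and adding `types=["pixi"]` narrows the result to the first one only.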


Prior Knowledge Expected:

No