Adventures in Rye

Rye is a fantastic packaging solution for Python. See Python Package Management – Rye is All You Need by Elliot Simpson (Kiwi PyCon 2024).

There have been a couple of things to work out though.

Installing Rye on A Colleague’s Windows Computer in a Corporate Setting

The installation didn’t seem to work properly. Solution – switch off the VPN when trying to install Rye. Success! 🙂

Guardrails-AI

Most versions of guardrails-ai worked fine with Rye, but with 0.5.10 I couldn’t run the guardrails hub commands needed to install valid_sql e.g.

guardrails hub install hub://guardrails/valid_sql

So, having activated the virtual environment, I did a pip installation instead

python -m pip install guardrails-ai

and then added the dependency to pyproject.toml so rye wouldn’t remove it if I ran rye sync later. Installing valid_sql from guardrails hub worked after that.

wxPython

I ran:

rye add wxpython

and it built successfully. Unfortunately, the build didn’t include support for webview.

Problem – trying to run the following didn’t work:

wx.html2.WebView.New() -> NotImplementedError

To build support for webview I needed to:

sudo apt install libwebkit2gtk-4.1-dev

and

sudo apt install clang

to avoid the error:

error: command 'clang' failed: No such file or directory

(I haven’t checked whether libwebkit2gtk-4.1-dev is strictly required i.e. whether removing it breaks things.) I then removed wxPython:

rye remove wxpython

Then, before running:

rye add wxpython --sync

I deleted ~/.cache/uv/built-wheels-v2/index/b2a7eb67d4c26b82/wxpython

Everything built correctly after that.

Internal Python Imports Without Tears

Flaky, Breaky Internal Importing

Maybe you’ve been lucky so far and never experienced the flaky, breaky side of internal importing in Python – for example, mysterious ModuleNotFoundError exceptions. But if internal importing has been a problem, typically when code is imported from a multi-folder code base, it is good to know there is at least one guaranteed solution.

If you want your internal imports to work reliably, the following rules will guarantee success. Other approaches may also work, at least some of the time, but the rules presented below will always work. Following these rules will also make it possible to run modules as scripts [1] no matter where they occur in the code structure. This can be especially useful during development.

Tear-Free Rules [2]

  • Be Absolute
  • Anchor Well
  • Folder Imports Bad!
  • Don’t Just “Run” Code

#1 Be Absolute

Use absolute importing everywhere in your code package, not just where you think internal importing is broken.

Python has both relative and absolute importing. Relative importing is relative to the file doing the importing and uses dots e.g. one for the current folder, two for the parent folder, three for the grandparent etc. How many dots you need, and what your import looks like, depends on where you are in the folder structure. Absolute importing starts from an anchor point which is the same wherever the file doing the importing is.

DO:

import charm.tables.freq
import charm.conf
import charm.utils.stats.parametric.ttest

DON’T:

from ..stats.parametric import ttest

from . import ttest  ## importing same thing but doing it from another module

#2 Anchor Well

Anchor absolute imports from the code package folder. E.g. if we have a module in a location like: /home/pygrant/projects/charm/charm/tables/freq.py we would import it like:

import charm.tables.freq

assuming the code package folder was the rightmost charm folder.

It is common to use the same name for the surrounding folder as the code package folder but we don’t have to, and the following example might make the situation clearer. If we have a module in a location like: /home/pygrant/projects/charm_project/charm/tables/freq.py we would import it from charm (the code package folder) not charm_project (the surrounding folder).

DO:

import charm.tables.freq

DON’T:

import tables.freq  ## <===== missing the anchor i.e. the code package folder

#3 Folder Imports Bad!

Don’t import folders – instead import modules or attributes of modules.

DO:

import charm.tables.freq

DON’T:

import charm.tables  ## <== a folder - so you might only be importing
                     ## the __init__.py under that folder if there is one
charm.tables.freq.get_html()  ## may fail - freq itself was never imported
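Here is a minimal sketch of why importing a folder is fragile. It builds a throwaway package in a temporary directory (charm_demo, tables, and freq are hypothetical stand-ins for the charm examples above, and the __init__.py files are assumed to be empty) and shows that importing the folder alone does not load the modules inside it.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: <surrounding>/charm_demo/tables/freq.py
# (charm_demo is a hypothetical stand-in for the charm package)
surrounding = Path(tempfile.mkdtemp())
tables_dir = surrounding / "charm_demo" / "tables"
tables_dir.mkdir(parents=True)
(surrounding / "charm_demo" / "__init__.py").write_text("")
(tables_dir / "__init__.py").write_text("")
(tables_dir / "freq.py").write_text("def get_html():\n    return '<table></table>'\n")

# Put the surrounding folder on sys.path so the package can be found
sys.path.insert(0, str(surrounding))

import charm_demo.tables  # imports the folder i.e. only runs its (empty) __init__.py

# The module inside the folder has not been loaded by the folder import
freq_loaded_by_folder_import = hasattr(charm_demo.tables, "freq")

import charm_demo.tables.freq  # import the module itself

html = charm_demo.tables.freq.get_html()
print(freq_loaded_by_folder_import, html)
```

If some other piece of code had already imported charm_demo.tables.freq, the attribute would be there and the folder import would appear to work, which is exactly the kind of accidental success that makes these problems hard to track down.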

#4 Don’t Just “Run” Code

One doesn’t simply run code. Code is always executed with a particular context, often implicitly. Use one of the ways that works i.e.

  • that puts the surrounding folder in the sys.path, so Python can find your modules to actually import their contents
  • and resolves module names (in particular, folders used with the dot notation) – once again, so Python can find your modules

Either use the -m option

python -m code_package.folder.folder.script  <-- without .py extension

or

ensure your IDE has the surrounding folder (the folder containing the code_package folder) in its Python path (sys.path), possibly defined in quirky, unintuitive IDE-specific ways

You can always run the following one-liner just before the failing import to see what is in your Python path:

import sys; print('\n'.join(sys.path))
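The difference between running with -m and running a file directly can be sketched as follows. This is an illustration only: it creates a throwaway package (charm_demo, helper, and report are hypothetical names) and runs the same script both ways in a subprocess, assuming no PYTHONPATH setting already points at the temporary folder.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway layout: <surrounding>/charm_demo/utils/{helper.py, report.py}
surrounding = Path(tempfile.mkdtemp())
utils_dir = surrounding / "charm_demo" / "utils"
utils_dir.mkdir(parents=True)
(surrounding / "charm_demo" / "__init__.py").write_text("")
(utils_dir / "__init__.py").write_text("")
(utils_dir / "helper.py").write_text("GREETING = 'hello'\n")
(utils_dir / "report.py").write_text(
    "import charm_demo.utils.helper as helper\n"  # absolute, anchored import
    "print(helper.GREETING)\n"
)

# 1) Run as a module from the surrounding folder:
#    the current directory (the surrounding folder) is put on sys.path
as_module = subprocess.run(
    [sys.executable, "-m", "charm_demo.utils.report"],
    cwd=surrounding, capture_output=True, text=True,
)

# 2) Run the file directly:
#    the script's own folder (utils) is put on sys.path instead
as_file = subprocess.run(
    [sys.executable, str(utils_dir / "report.py")],
    cwd=surrounding, capture_output=True, text=True,
)

print("with -m:", as_module.stdout.strip())
print("direct :", as_file.stderr.strip().splitlines()[-1])
```

The direct run fails because charm_demo cannot be found from inside the utils folder, even though the import statement itself is identical in both runs.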

Final Comments

I have seen mysterious internal importing problems impact numerous Python developers. The Python import system is very flexible and extensible but it is far from simple. Flaky, breaky internal importing is definitely not only a problem for beginners.

Confusion is increased by such factors as:

  • Python ignoring repeated imports of the same module (name caching). This is a perfectly logical behaviour but it means a faulty import in one piece of code might be ignored in favour of a working import in another piece of code. Or vice versa – a working import statement might be ignored because of an earlier faulty import. Remember Rule #1 – “Use absolute importing everywhere in your code package”
  • IDE quirks e.g. in VS Code I was advised by a friend that the following was necessary for success:

    In .vscode/settings.json add:

    "terminal.integrated.env.windows": {
        "PYTHONPATH": "${workspaceFolder}"
    }

    where the PYTHONPATH value points to the surrounding folder, either relative to the workspace folder using the ${...} syntax, or as an absolute path. Also put this in .env as an absolute path to the surrounding folder:
    PYTHONPATH=<path-to-surrounding folder>

    Simple right? 😉

    PyCharm seems to require using the correct content root and allowing the Content Roots to be added to the PYTHONPATH. If the project is created off the surrounding folder this is probably the default behaviour but if this doesn’t happen it is not obvious how to fix the problem.
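The name caching mentioned in the first bullet above can be seen in a small sketch. The module name noisy_demo is hypothetical and exists only for this illustration; the module is written to a temporary folder so the example is self-contained.

```python
import sys
import tempfile
from pathlib import Path

# A throwaway module whose body announces when it is executed
# (noisy_demo is a hypothetical name used only for this illustration)
folder = Path(tempfile.mkdtemp())
(folder / "noisy_demo.py").write_text("print('module body executed')\nVALUE = 1\n")
sys.path.insert(0, str(folder))

import noisy_demo  # first import: the module body runs
import noisy_demo  # repeat import: nothing runs, the cached module is reused

# The cache is the sys.modules dictionary, keyed by the full module name
is_cached = sys.modules["noisy_demo"] is noisy_demo
print(is_cached)
```

Because the cache is keyed by name, whichever import statement runs first decides what every later import of that name receives.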

You don’t necessarily have to follow the “rules” above to get your internal imports working, but why take the risk? Follow the rules and then you can turn your precious attention to other programming issues. Here they are again:

  • Be Absolute
  • Anchor Well
  • Folder Imports Bad!
  • Don’t Just “Run” Code
  1. Running a script means actually doing something (e.g. writing a file, making an API call, etc)
    rather than just defining something without running it (e.g. defining a function or a class).
  2. The “Tears” theme is a nod to the popular statistics book “Statistics Without Tears”.

Python Versions – What They Add

This is a personal list of what I really liked (and disliked) about each release of Python.

3.11

  • 19% speed improvement
  • more informative KeyError handling
  • TOML batteries included

3.10

  • Better messages for syntax errors e.g. "SyntaxError: '{' was never closed" for line 1, where the curly brace started, rather than "SyntaxError: invalid syntax" for line 3, which was an innocent line
  • Note – Structural Pattern Matching should be considered more of an anti-feature given its problems and its limited benefits for a dynamically typed language like Python

3.9

  • String methods removesuffix and removeprefix (NOT the same as rstrip() and lstrip(), which strip any of the supplied characters rather than the exact string). Note – absence of underscores in method names
  • Union operator for dicts (new dict which is an update of the first by the second) e.g. up_to_date_dict = orig_dict | fresh_dict
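A short sketch of both 3.9 features. The file name and dictionary contents are invented for the example.

```python
filename = "data_vcs.csv"

# removesuffix removes the exact trailing string, once
print(filename.removesuffix(".csv"))  # data_vcs

# rstrip removes ANY trailing characters found in the set {'.', 'c', 's', 'v'}
print(filename.rstrip(".csv"))  # data_

orig_dict = {"a": 1, "b": 2}
fresh_dict = {"b": 20, "c": 3}

# A new dict: orig_dict updated by fresh_dict (values from fresh_dict win)
up_to_date_dict = orig_dict | fresh_dict
print(up_to_date_dict)  # {'a': 1, 'b': 20, 'c': 3}
```

Neither original dictionary is changed by the union operator.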

3.8

  • f-string support for self-documenting expressions e.g. f"{var=}"
  • walrus operator := (an antifeature with costs that outweigh benefits)
  • positional-only parameters (so can change names without breaking code) – like extending list comprehensions to cover dictionaries and sets, it completes the coverage as you’d expect
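A minimal sketch of two of the 3.8 features listed above. The function area and its parameter names are hypothetical.

```python
total = 42
debug_text = f"{total=}"  # self-documenting expression: name and value together
print(debug_text)  # total=42

def area(width, height, /):
    """Parameters before the slash can only be passed by position."""
    return width * height

print(area(3, 4))  # 12

try:
    area(width=3, height=4)  # keyword use is rejected
except TypeError as err:
    keyword_error = str(err)
    print(keyword_error)
```

Because callers cannot use the names width and height, the function author is free to rename them later without breaking any calling code.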

3.7

  • Nothing

3.6

  • f-strings (massive)
  • Underscores in numbers e.g. 3_500_000 – as a data guy this is huge
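A small sketch of underscores in numeric literals, plus the matching format specifier for displaying them. The variable name is arbitrary.

```python
population = 3_500_000  # the underscores are ignored by Python
print(population == 3500000)  # True

# Separators can be added back when displaying numbers
print(f"{population:_}")  # 3_500_000
print(f"{population:,}")  # 3,500,000
```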

Second Impressions of Python Pattern Matching

Less is More

One of the beautiful things about Python is its simplicity. We don’t want it to end up like those languages which have a designed-by-committee feel where everyone gets to add features and there are many ways of doing everything. Every feature that is added not only has to have some value but that value must outweigh the cost of the additional complexity. The more features, the more learning is required before being able to understand and edit other people’s code; the greater the risk that learning will be broad and shallow; and the larger the bug surface for the language. Simple is good.

In the previous blog post I concluded that Pattern Matching was a positive addition to the language. After looking further into the gotchas of Python Pattern Matching, and listening to the arguments of a friend (you know who you are :-)), I have become much less sure. On balance, I suspect Python Pattern Matching probably doesn’t pass the Must-Be-Really-Valuable-To-Justify-the-Increased-Complexity test.

Arguments Against Python Pattern Matching

Significant Gotchas

This section includes some material from the previous blog post but with a different emphasis and more detail.

Similarity to object instantiation misleading

Imagine we have a Point class:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


case Point(x, y): seems to me to be an obvious way of looking for such a Point object and unpacking its values into x and y but, with the class as defined, it isn’t allowed. It is a perfectly valid syntax for instantiating a Point object but we are not instantiating an object and supplying the object to the case condition – instead we are supplying a pattern to be matched and unpacked. We have to have a firm grasp on the notion that Python patterns are not objects. If we forget we get a TypeError:

case Point(0, 999):
TypeError: Point() accepts 0 positional sub-patterns (2 given)

Note, we must match the parameter names (the left side) but can unpack to any variable names we like (the right side). For example, all of the following will work:

case Point(x=x, y=y):
case Point(x=lon, y=lat):

case Point(x=apple, y=banana):

but

case Point(a=x, b=y):

will not.

It is a bit disconcerting being forced to use what feel like keyword arguments in our patterns when the original class definition is optionally positional. We should expect lots of mistakes here and it’ll require concentration to pick them up in code review.

Similarity to isinstance misleading

case Point:, case int:, case str:, case float: don’t work as you might expect. The proper approach is to supply parentheses: using the example of integer patterns, we need case int():, or, if we want to “capture” the value into, say, x, case int(x):. But if we don’t know about the need for parentheses, or we slip up (easy to do), these inadvertent patterns will match anything and assign it to the name Point or int or str etc. Definitely NOT what we want.

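For example, after an accidental case str: has matched, the name str refers to the matched value, so the builtin str is now broken – hopefully obviously.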

The only protection against making this mistake is when you accidentally do this before other case conditions – e.g.

case int:
^
SyntaxError: name capture 'int' makes remaining patterns unreachable

Otherwise you’re on your own and completely free to make broken code. This will probably be a common error because of our experience with isinstance where we supply the type e.g. isinstance(x, int). Which reminds me of a passage in Through the Looking-Glass, and What Alice Found There by Lewis Carroll.

‘Crawling at your feet,’ said the Gnat (Alice drew her feet back in some alarm), ‘you may observe a Bread-and-Butterfly. Its wings are thin slices of Bread-and-butter, its body is a crust, and its head is a lump of sugar.’

‘And what does it live on?’

‘Weak tea with cream in it.’

A new difficulty came into Alice’s head. ‘Supposing it couldn’t find any?’ she suggested.

‘Then it would die, of course.’

‘But that must happen very often,’ Alice remarked thoughtfully.

‘It always happens,’ said the Gnat. After this, Alice was silent for a minute or two, pondering.

Summary

There will be lots of mistakes when using the match case even in the most common cases. And more complex usage is not readable without learning a lot more about Pattern Matching. See PEP 622.

Readable but only if you understand the mini-language

Basically it is yet another mini-language to learn.

Arguments for Python Pattern Matching

Sometimes we need to pattern match and a match / case syntax is quite elegant. The way values are unpacked into names is also really nice.

We will certainly love this feature if we are moving away from duck typing and adopting the programming style of statically-typed languages like Scala. But maybe we shouldn’t encourage this style of programming by making it easier to write in Python.

Verdict

On balance, Python Pattern Matching doesn’t seem to pass the Must-Be-Really-Valuable-To-Justify-the-Increased-Complexity test. And I’m not alone in wondering this (Musings on Python’s Pattern Matching). I enjoyed coming to grips with the syntax but I think it is like for else and Python Enums – best avoided. But we will see. We can’t always tell what the future of a feature will be – maybe it will turn out to be very useful and one day there will be a third blog post ;-).

First Impressions of Python Pattern Matching

Controversy – Python Adding Everything But the Kitchen Sink?

Is Python piling in too many new features taken from other languages? Is Pattern Matching yet another way of doing things for a language which has prided itself on there being one obvious way of doing things? In short, is Python being ruined by people who don’t appreciate the benefits of Less Is More?

Short answer: No ;-). Long answer: see below. Changed answer: see Second Impressions of Python Pattern Matching

I appreciate arguments for simplicity, and I want the bar for new features to be high, but I am glad Pattern Matching made its way in. It will be the One Obvious Way for doing, errr, pattern matching. Pattern matching may not be as crucial in a dynamically typed language like Python but it is still useful. And the syntax is nice too. match and case are pretty self-explanatory. Of course, there are some gotchas to watch out for but pattern matching is arguably one of the most interesting additions to Python since f-strings in Python 3.6. So let’s have a look and see what we can do with it.

What is Pattern Matching?

Not having used pattern matching before in other languages I wasn’t quite sure how to think about it. According to Tomáš Karabela I’ve been using something similar without realising it (Python 3.10 Pattern Matching in Action). But what is Pattern Matching? And why would I use it in Python?

I have found it useful to think of Pattern Matching as a Switch statement on steroids.

Pattern Matching is a Switch Statement on Steroids

The switch aspect is the way the code passes through various expressions until it matches. That makes total sense – one of the earliest things we need to do in programming is respond according to the value / nature of something. Even SQL has CASE WHEN statements.

The steroids aspect has two parts:

Unpacking

Unpacking is beautiful and elegant so it is a real pleasure to find it built into Python’s Pattern Matching. case (x, y): looks for a two-item sequence (e.g. a two-tuple) and unpacks it into x and y names ready to use in the code under the case condition. case str(x): looks for a string and assigns the name x to it. Handy.

Type Matching

Duck typing can be a wonderful thing but sometimes it is useful to match on type. case Point(x=x, y=y): only matches if an instance of the Point class. case int(x): only matches if an integer. Note – case Point(x, y): doesn’t work in a case condition because positional sub-patterns aren’t allowed. Confused? More detail on possible gotchas below:

Gotchas

Sometimes you think exactly the same as the language feature, sometimes not. Here are some mistakes I made straight away. Typically they were caused by a basic misunderstanding of how Pattern Matching “thinks”.

Patterns aren’t Objects

case Point(x, y): seems to me to be an obvious way of looking for a Point object and unpacking its values into x and y but it isn’t allowed. It is the correct syntax for instantiating a Point object but we are not instantiating an object and supplying the object to the case condition – instead we are supplying a pattern to be matched and unpacked. We have to have a firm grasp on the notion that Python patterns are not objects.

Patterns Ain’t Objects

If we forget we get a TypeError:

case Point(0, 999):
TypeError: Point() accepts 0 positional sub-patterns (2 given)

Note, you must match the parameter names (the left side) but can unpack to any variable names you like (the right side). It may feel a bit odd being forced to use what feel like keyword arguments when the original class definition is positional but we must remember that we aren’t making an object – we are designing a pattern and collecting variables / names.

case Point(x=0, y=y): the x= and y= are the required syntax for a pattern. We insist on x being 0 but y can be anything (which we add the name y to). We could equally have written case Point(x=0, y=y_val): or case Point(x=0, y=spam):.

case Point:, case int:, case str:, case float: don’t work as you might expect. They match anything and assign it to the name Point or int or str etc. Definitely NOT what you want. The only protection is when you accidentally do this before other case conditions – e.g.

case int:
^
SyntaxError: name capture 'int' makes remaining patterns unreachable

This might become a common error because of our experience with isinstance where we supply the type e.g. isinstance(x, int). Remember:

case Patterns have to be Patterns

Instead, using the example of integer patterns, we need case int():, or, if we want to “capture” the value into, say, x, case int(x):.

Guards and Traditional Switch Statements

It is very common in switch / case when statements to have conditions. Sometimes it is the whole point of the construct – we supply one value and respond differently according to its values / attributes. E.g. in rough pseudocode if temp < 0 freezing, if > 100 boiling, otherwise normal. In Pattern Matching value conditions are secondary. We match on a pattern first and then, perhaps evaluating the unpacked variables in an expression, we apply a guard condition.

case float(x) if abs(x) < 100:
    ...
case float(x) if abs(x) < 200:
    etc

Depending on how it’s used we could think of “Pattern Matching” as “Pattern and Guard Condition Matching”.

The most similar to a classic switch construct would be:

match val:
    case val if val < 10:
        ...
    case val if val < 20:
        ...
    # etc

One final thought: there seems to be nothing special about the “default” option (using switch language) – namely, case _:. It merely captures anything that hasn’t already been matched and puts _ as the name i.e. it is a throwaway variable. We could capture and use that value with a normal variable name although that is optional because there’s nothing stopping us from referencing the original name fed into match. But, for example, case mopup: would work.

How to play with it on Linux

Make an image using a Dockerfile e.g. the following, based on Install Python3 in Ubuntu Docker (I added vim and a newer Ubuntu image, and changed apt-get to apt even though it allegedly has an unstable CLI):

FROM ubuntu:20.04

RUN apt update && apt install -y software-properties-common gcc && \
add-apt-repository -y ppa:deadsnakes/ppa

RUN apt update && apt install -y python3.10 python3-distutils python3-pip python3-apt vim

Then build the image:

docker build --tag pyexp .

(don’t forget the dot at the end – that’s a reference to the path to find Dockerfile)

Then make container:

docker create --name pyexp_cont pyexp

and run it with access to bash command line

docker container run -it pyexp /bin/bash

Useful Links

Pattern matching tutorial for Pythonic code | Pydon’t

Python 3.10 Pattern Matching in Action

PEP 622

Problem upgrading Dokuwiki after long neglect

When you upgrade Dokuwiki (brilliant wiki BTW) it does it in-place. Unfortunately, I hadn’t upgraded for about 5 years in spite of numerous warnings. As a result, the deprecated.php file bit me: it assumed there were specific files in my instance of Dokuwiki that needed deprecating, and it broke because my system was so old that those files weren’t there ;-). Once I figured the problem out, the solution was simply to comment out the appropriate parts of deprecated.php. After all, there is no need to deprecate something you don’t even have.

Reminder: include

ini_set('display_errors', 1); ini_set('display_startup_errors', 1); error_reporting(E_ALL);

in index.php when trying to identify problems running a PHP app. And remove it afterwards.

An easy way of using SuperHELP

Many people find using the command line a bit daunting and Jupyter notebooks unfamiliar so here is another way of using SuperHELP that may be much easier.

  1. Put the following at the top of your script:

    import superhelp
    superhelp.this()


  2. Run the script
  3. See the advice
  4. Learn something; make changes 🙂

If you don’t want the default web output you can specify other output such as ‘cli’ (command line interface) or ‘md’ (markdown):

import superhelp
superhelp.this(output='md')

If you don’t want the default ‘Extra’ level of detail you can specify a different detail_level (‘Brief’ or ‘Main’) e.g.

import superhelp
superhelp.this(output='md', detail_level='Brief')

or:

import superhelp
superhelp.this(detail_level='Main')

Edited to remove reference to explicit file_path argument e.g. superhelp.this(__file__). It is not needed anymore thanks to the Python inspect library. Also displayer -> output and message_level -> detail_level.

Why SuperHELP for Python?

I created SuperHELP to make it easier for people to write good Python code. You can find the project at https://pypi.org/project/superhelp/


To summarise the rationale for SuperHELP I have thought of a few taglines:

SuperHELP – Python help that really helps!

SuperHELP – Help for Humans!

and even

SuperHELP – Make Python Pythonic!

Some context might help explain:

Python has secured its position in the top tier of programming languages and more people than ever are learning to write Python. But, let’s be honest, a lot of Python code being written is not what it could be. Even an elegant language like Python can be written badly or in a way that is hard to read or maintain.

To make it easier for people to write good Python we are already well-served by IDEs from IDLE upwards. People have easy access to style linters and IDEs clearly signal syntax errors and basic mistakes like unused variables. But wouldn’t it be great if people could check their code to see if there are better, more Pythonic ways of doing things? Or to learn more about basic language features and Python data structures than the standard help can offer?

That’s where SuperHELP can play a role.

So what exactly is SuperHELP? Basically it is an advice engine. SuperHELP reads a snippet of Python and provides advice, warnings, and basic information based on what it finds. For example, it might notice a function docstring is missing and show a template for adding one. Or identify the use of a named tuple and explain how to add docstrings to individual fields or the named tuple as a whole.

The intention is to make sure that everyone, from beginners upwards, learns something useful. Even an advanced Python programmer might not appreciate the benefits of using functools.wraps when creating their own decorator. Or an experienced Java programmer might not realise that Python properties are a much better option than getters and setters.

So how can people use SuperHELP? For most people, the easiest way will be to open a binder web notebook and enter their code there.


Of course, because SuperHELP is a pip package https://pypi.org/project/superhelp/ it can be installed alongside Python on a machine and used directly from the terminal e.g.

shelp --code "people = ['Tomas', 'Sal']" --output html --detail-level Main

If the output chosen is html (the default) output looks like:

And if --output cli (command line interface i.e. terminal or console) is selected, output looks like:

In all likelihood, there will be other ways of making SuperHELP advice more readily available – probably through integration with other platforms and processes.

SuperHELP needs your help:

  • If you have any ideas, or the ability to help in some way, please contact me at superhelp@p-s.co.nz.
  • Spread the word about SuperHELP through your social networks.

ImageMagick cache resources exhausted resolved

My sofastatistics application relies on ImageMagick to convert PDFs to PNGs. The sort of command run under the hood was:

convert -density 1200 -borderColor "#ff0000" -border 1x1 -fuzz 1% -trim "/home/g/projects/sofastats_proj/storage/img_processing/pdf2img_testing/KEEPME/raw_pdf.pdf" "/home/g/projects/sofastats_proj/storage/img_processing/pdf2img_testing/density_1200_fuzz_on_#ff0000.png"

Recently, commands like this stopped working properly on my development machine. They wouldn’t handle high resolutions (600dpi seemed to be the limit for the images I was handling) and it took a very long time to complete.

I finally worked out what was going on by running the same tests on different machines.

Seemingly modest differences in CPU specs can create massive differences in the time required to convert PDFs to PNGs. What takes 4 seconds on an i7 can take 71 seconds on an i5. And creating a 1200 dpi image might take 0.5 minutes on an i7 and 18.5 minutes on an i5. So the slowdown was because I had shifted from a fast desktop to a (more convenient but slower) laptop.

The second issue was the error message about cache resources exhausted. This happened on a range of fast and slow machines and the amount of RAM seemed irrelevant. Interestingly, the problem only occurred on Ubuntu 17.04 and not 16.10. The reason was the policy.xml settings in /etc/ImageMagick-6/. It seems the following was set too low:

<policy domain="resource" name="disk" value="1GiB"/>

I changed it to:

<policy domain="resource" name="disk" value="10GiB"/>

and it would successfully create high-resolution PNGs even if it took a long time.
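If you want to check the resource limits in a policy file without reading it by eye, something like the following sketch might help. It parses a cut-down, made-up policy string rather than the real file, and only assumes the policymap / policy layout shown in the entry above.

```python
import xml.etree.ElementTree as ET

# A cut-down, hypothetical policy file containing only the entry discussed above
policy_text = """
<policymap>
  <policy domain="resource" name="disk" value="1GiB"/>
</policymap>
"""

root = ET.fromstring(policy_text)

# Collect every resource policy as name -> value
resource_limits = {
    policy.get("name"): policy.get("value")
    for policy in root.iter("policy")
    if policy.get("domain") == "resource"
}
print(resource_limits)  # {'disk': '1GiB'}
```

For a real installation, ET.parse() on the policy.xml path would be the starting point instead of the hard-coded string.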

Hmmm – now I change the disk setting back and I am still able to make the higher-resolution images, even after rebooting. WAT?!

One other note – settings in policy.xml cannot be loosened through arguments supplied to the convert program via the CLI – they can only be tightened. It looks like these changes are all about security concerns with the intention of preventing malicious resource starvation.


Eclipse Neon with Pydev on Ubuntu 17.04

  1. Check whether 32 or 64 bit – System Settings > Details > Overview – then download from https://www.eclipse.org/downloads/
  2. Right click on eclipse-inst-linux32.tar.gz and select Extract Here
  3. cd into eclipse-installer/ then run ./eclipse-inst
  4. Choose top item (standard Java version of eclipse)
  5. Make desktop icon for launcher as per step 7 here – How to install Eclipse using its installer remembering to point to java-neon instead of java-mars etc. Drag file onto launcher and icon will add itself and be operational
  6. If you don’t see toolbars in eclipse modify exec line Exec=env SWT_GTK3=0 as per eclipse doesn’t work with ubuntu 16.04
  7. I also added pydev in the usual way using link to http://www.pydev.org/updates as per http://www.pydev.org/download.html