Second Impressions of Python Pattern Matching

Less is More

One of the beautiful things about Python is its simplicity. We don’t want it to end up like those languages which have a designed-by-committee feel where everyone gets to add features and there are many ways of doing everything. Every feature that is added not only has to have some value but that value must outweigh the cost of the additional complexity. The more features, the more learning is required before being able to understand and edit other people’s code; the greater the risk that learning will be broad and shallow; and the larger the bug surface for the language. Simple is good.

In the previous blog post I concluded that Pattern Matching was a positive addition to the language. After looking further into the gotchas of Python Pattern Matching, and listening to the arguments of a friend (you know who you are :-)), I have become much less sure. On balance, I suspect Python Pattern Matching doesn’t pass the Must-Be-Really-Valuable-To-Justify-the-Increased-Complexity test.

Arguments Against Python Pattern Matching

Significant Gotchas

This section includes some material from the previous blog post but with a different emphasis and more detail.

Similarity to object instantiation misleading

Imagine we have a Point class:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


case Point(x, y): seems to me to be an obvious way of looking for such a Point object and unpacking its values into x and y, but it isn’t allowed. It is perfectly valid syntax for instantiating a Point object, but we are not instantiating an object and supplying that object to the case condition – instead we are supplying a pattern to be matched and unpacked. We have to have a firm grasp on the notion that Python patterns are not objects. If we forget, we get a TypeError:

case Point(0, 999):
TypeError: Point() accepts 0 positional sub-patterns (2 given)

Note, we must match the attribute names on the left side (here they mirror the parameter names) but can unpack to any variable names we like on the right side. For example, all of the following will work:

case Point(x=x, y=y):
case Point(x=lon, y=lat):
case Point(x=apple, y=banana):

but

case Point(a=x, b=y):

will not.

It is a bit disconcerting being forced to use what feel like keyword arguments in our patterns when the original class definition is optionally positional. We should expect lots of mistakes here and it’ll require concentration to pick them up in code review.

Similarity to isinstance misleading

case Point:, case int:, case str:, case float: don’t work as you might expect. The proper approach is to supply parentheses: using the example of integer patterns, we need case int():, or, if we want to “capture” the value into, say, x, case int(x):. But if we don’t know about the need for parentheses, or we slip up (easy to do), these inadvertent patterns will match anything and assign it to the name Point or int or str etc. Definitely NOT what we want.

the builtin str is now broken – hopefully obviously

The only protection against making this mistake is when you accidentally do this before other case conditions – e.g.

case int:
^
SyntaxError: name capture 'int' makes remaining patterns unreachable

Otherwise you’re on your own and completely free to make broken code. This will probably be a common error because of our experience with isinstance where we supply the type e.g. isinstance(x, int). Which reminds me of a passage in Through the Looking-Glass, and What Alice Found There by Lewis Carroll.

‘Crawling at your feet,’ said the Gnat (Alice drew her feet back in some alarm), ‘you may observe a Bread-and-Butterfly. Its wings are thin slices of Bread-and-butter, its body is a crust, and its head is a lump of sugar.’

‘And what does it live on?’

‘Weak tea with cream in it.’

A new difficulty came into Alice’s head. ‘Supposing it couldn’t find any?’ she suggested.

‘Then it would die, of course.’

‘But that must happen very often,’ Alice remarked thoughtfully.

‘It always happens,’ said the Gnat. After this, Alice was silent for a minute or two, pondering.

Summary

There will be lots of mistakes when using match / case, even in the most common cases. And more complex usage is not readable without learning a lot more about Pattern Matching. See PEP 622.

Readable but only if you understand the mini-language

Basically it is yet another mini-language to learn.

Arguments for Python Pattern Matching

Sometimes we need to pattern match and a match / case syntax is quite elegant. The way values are unpacked into names is also really nice.

We will certainly love this feature if we are moving away from duck typing and adopting the programming style of statically-typed languages like Scala. But maybe we shouldn’t encourage this style of programming by making it easier to write in Python.

Verdict

On balance, Python Pattern Matching doesn’t seem to pass the Must-Be-Really-Valuable-To-Justify-the-Increased-Complexity test. I enjoyed coming to grips with the syntax but I think it is like for else and Python Enums – best avoided. But we will see. We can’t always tell what the future of a feature will be – maybe it will turn out to be very useful and one day there will be a third blog post ;-).

First Impressions of Python Pattern Matching

Controversy – Python Adding Everything But the Kitchen Sink?

Is Python piling in too many new features taken from other languages? Is Pattern Matching yet another way of doing things for a language which has prided itself on there being one obvious way of doing things? In short, is Python being ruined by people who don’t appreciate the benefits of Less Is More?

Short answer: No ;-). Long answer: see below. Changed answer: see Second Impressions of Python Pattern Matching

I appreciate arguments for simplicity, and I want the bar for new features to be high, but I am glad Pattern Matching made its way in. It will be the One Obvious Way for doing, errr, pattern matching. Pattern matching may not be as crucial in a dynamically typed language like Python but it is still useful. And the syntax is nice too. match and case are pretty self-explanatory. Of course, there are some gotchas to watch out for but pattern matching is arguably one of the most interesting additions to Python since f-strings in Python 3.6. So let’s have a look and see what we can do with it.

What is Pattern Matching?

Not having used pattern matching before in other languages I wasn’t quite sure how to think about it. According to Tomáš Karabela I’ve been using something similar without realising it (Python 3.10 Pattern Matching in Action). But what is Pattern Matching? And why would I use it in Python?

I have found it useful to think of Pattern Matching as a Switch statement on steroids.

Pattern Matching is a Switch Statement on Steroids

The switch aspect is the way the code passes through various expressions until it matches. That makes total sense – one of the earliest things we need to do in programming is respond according to the value / nature of something. Even SQL has CASE WHEN statements.

The steroids aspect has two parts:

Unpacking

Unpacking is beautiful and elegant so it is a real pleasure to find it built into Python’s Pattern Matching. case (x, y): looks for a two-tuple and unpacks into x and y names ready to use in the code under the case condition. case str(x): looks for a string and assigns the name x to it. Handy.

Type Matching

Duck typing can be a wonderful thing but sometimes it is useful to match on type. case Point(x=x, y=y): only matches if an instance of the Point class. case int(x): only matches if an integer. Note – case Point(x, y): doesn’t work in a case condition because positional sub-patterns aren’t allowed. Confused? More detail on possible gotchas below:

Gotchas

Sometimes you think exactly the same as the language feature, sometimes not. Here are some mistakes I made straight away. Typically they were caused by a basic misunderstanding of how Pattern Matching “thinks”.

Patterns aren’t Objects

case Point(x, y): seems to me to be an obvious way of looking for a Point object and unpacking its values into x and y, but it isn’t allowed. It is the correct syntax for instantiating a Point object, but we are not instantiating an object and supplying that object to the case condition – instead we are supplying a pattern to be matched and unpacked. We have to have a firm grasp on the notion that Python patterns are not objects.

Patterns Ain’t Objects

If we forget we get a TypeError:

case Point(0, 999):
TypeError: Point() accepts 0 positional sub-patterns (2 given)

Note, you must match the parameter names (the left side) but can unpack to any variable names you like (the right side). It may feel a bit odd being forced to use what feel like keyword arguments when the original class definition is positional but we must remember that we aren’t making an object – we are designing a pattern and collecting variables / names.

case Point(x=0, y=y): the x= and y= are the required syntax for a pattern. We insist on x being 0 but y can be anything (and whatever it is gets bound to the name y). We could equally have written case Point(x=0, y=y_val): or case Point(x=0, y=spam):.

case Point:, case int:, case str:, case float: don’t work as you might expect. They match anything and assign it to the name Point or int or str etc. Definitely NOT what you want. The only protection is when you accidentally do this before other case conditions – e.g.

case int:
^
SyntaxError: name capture 'int' makes remaining patterns unreachable

This might become a common error because of our experience with isinstance where we supply the type e.g. isinstance(x, int). Remember:

case Patterns have to be Patterns

Instead, using the example of integer patterns, we need case int():, or, if we want to “capture” the value into, say, x, case int(x):.

Guards and Traditional Switch Statements

It is very common in switch / case when statements to have conditions. Sometimes it is the whole point of the construct – we supply one value and respond differently according to its values / attributes. E.g. in rough pseudocode if temp < 0 freezing, if > 100 boiling, otherwise normal. In Pattern Matching value conditions are secondary. We match on a pattern first and then, perhaps evaluating the unpacked variables in an expression, we apply a guard condition.

case float(x) if abs(x) < 100:
    ...
case float(x) if abs(x) < 200:
    etc

Depending on how it’s used we could think of “Pattern Matching” as “Pattern and Guard Condition Matching”.

The most similar to a classic switch construct would be:

match value:
    case val if val < 10:
        ...
    case val if val < 20:
        ...
    etc

One final thought: there seems to be nothing special about the “default” option (using switch language) – namely, case _:. It merely matches anything that hasn’t already been matched. The _ is a wildcard, so, unlike a normal name, nothing is actually bound to it. We could capture and use the value with a normal variable name instead, although that is optional because there’s nothing stopping us from referencing the original name fed into match. For example, case mopup: would work.

How to play with it on Linux

Make an image using a Dockerfile, e.g. the following, based on Install Python3 in Ubuntu Docker. I added vim and a newer Ubuntu image, and changed apt-get to apt (even though it allegedly has an unstable CLI interface):

FROM ubuntu:20.04

RUN apt update && apt install -y software-properties-common gcc && \
    add-apt-repository -y ppa:deadsnakes/ppa

RUN apt update && apt install -y python3.10 python3-distutils python3-pip python3-apt vim

docker build --tag pyexp .

(don’t forget the dot at the end – that’s a reference to the path to find Dockerfile)

Then make container:

docker create --name pyexp_cont pyexp

and run it with access to bash command line

docker container run -it pyexp /bin/bash

Useful Links

Pattern matching tutorial for Pythonic code | Pydon’t

Python 3.10 Pattern Matching in Action

PEP 622

Python Enum Gotcha

There is plenty of useful information in Why You Should Use More Enums In Python – A gentle introduction to enumerations in Python. After reading it I decided to look into using Enums more. Unfortunately I hit a major Gotcha quite quickly.

Basically, equality comparisons against plain values don’t work with vanilla Enum (unlike IntEnum). Checking the official Python documentation seemed to confirm this understanding:

“Comparisons against non-enumeration values will always compare not equal (again, IntEnum was explicitly designed to behave differently …)” — https://docs.python.org/3/library/enum.html

In the snippet below, Pieces subclasses vanilla Enum and has, at least from my point of view, very unexpected results. Pieces2 is based on IntEnum and behaves as might be expected.

import enum

class Pieces(enum.Enum):
    PAWN = 8
    ROOK = 2
    BISHOP = 2


print(2 == Pieces.ROOK) ## False WAT?! Always not equal to non enums
print(Pieces.ROOK == 2) ## False WAT?!
print(Pieces.BISHOP == Pieces.ROOK) ## True
print(Pieces.PAWN == Pieces.ROOK) ## False


class Pieces2(enum.IntEnum):
    PAWN = 8
    ROOK = 2
    BISHOP = 2


print(2 == Pieces2.ROOK) ## True
print(Pieces2.ROOK == 2) ## True
print(Pieces2.BISHOP == Pieces2.ROOK) ## True
print(Pieces2.PAWN == Pieces2.ROOK) ## False

It is easy to imagine this behaviour creating baffling bugs. Interesting.
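For anyone sticking with vanilla Enum, a brief sketch of ways to compare safely. It reuses the Pieces class from the snippet above and relies only on standard enum behaviour. It also shows why BISHOP and ROOK compare equal: two names with the same value are aliases for a single member.

```python
import enum


class Pieces(enum.Enum):
    PAWN = 8
    ROOK = 2
    BISHOP = 2   # same value as ROOK, so BISHOP becomes an alias of ROOK


print(Pieces.ROOK.value == 2)         # True: compare the underlying value explicitly
print(Pieces(2) is Pieces.ROOK)       # True: or look the member up by value
print(Pieces.BISHOP is Pieces.ROOK)   # True: an alias, not a separate member
print([p.name for p in Pieces])       # ['PAWN', 'ROOK']: aliases are skipped when iterating
```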

An easy way of using SuperHELP

Many people find using the command line a bit daunting and Jupyter notebooks unfamiliar so here is another way of using SuperHELP that may be much easier.

  1. Put the following at the top of your script:

    import superhelp
    superhelp.this()


  2. Run the script
  3. See the advice
  4. Learn something; make changes 🙂

If you don’t want the default web output you can specify other output such as ‘cli’ (command line interface) or ‘md’ (markdown):

import superhelp
superhelp.this(output='md')

If you don’t want the default ‘Extra’ level of detail you can specify a different detail_level (‘Brief’ or ‘Main’) e.g.

import superhelp
superhelp.this(output='md', detail_level='Brief')

or:

import superhelp
superhelp.this(detail_level='Main')

Edited to remove reference to explicit file_path argument e.g. superhelp.this(__file__). It is not needed anymore thanks to the Python inspect library. Also displayer -> output and message_level -> detail_level.

Why SuperHELP for Python?

I created SuperHELP to make it easier for people to write good Python code. You can find the project at https://pypi.org/project/superhelp/

To summarise the rationale for SuperHELP I have thought of a few taglines:

SuperHELP – Python help that really helps!

SuperHELP – Help for Humans!

and even

SuperHELP – Make Python Pythonic!

Some context might help explain:

Python has secured its position in the top tier of programming languages and more people than ever are learning to write Python. But, let’s be honest, a lot of Python code being written is not what it could be. Even an elegant language like Python can be written badly or in a way that is hard to read or maintain.

To make it easier for people to write good Python we are already well-served by IDEs from IDLE upwards. People have easy access to style linters and IDEs clearly signal syntax errors and basic mistakes like unused variables. But wouldn’t it be great if people could check their code to see if there are better, more Pythonic ways of doing things? Or to learn more about basic language features and Python data structures than the standard help can offer?

That’s where SuperHELP can play a role.

So what exactly is SuperHELP? Basically it is an advice engine. SuperHELP reads a snippet of Python and provides advice, warnings, and basic information based on what it finds. For example, it might notice a function docstring is missing and show a template for adding one. Or identify the use of a named tuple and explain how to add docstrings to individual fields or the named tuple as a whole.

The intention is to make sure that everyone, from beginners upwards, learns something useful. Even an advanced Python programmer might not appreciate the benefits of using functools.wraps when creating their own decorator. Or an experienced Java programmer might not realise that Python properties are a much better option than getters and setters.
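To illustrate the functools.wraps point, here is a minimal sketch. The decorator and function names are hypothetical and this is not taken from SuperHELP itself. Without wraps, the decorated function loses its own name and docstring.

```python
import functools


def with_wraps(func):
    @functools.wraps(func)       # copies __name__, __doc__ etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@with_wraps
def add(a, b):
    """Return a + b."""
    return a + b


@without_wraps
def sub(a, b):
    """Return a - b."""
    return a - b


print(add.__name__, add.__doc__)   # add Return a + b.
print(sub.__name__, sub.__doc__)   # wrapper None
```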

So how can people use SuperHELP? For most people, the easiest way will be to open a binder web notebook and enter their code there.

Of course, because SuperHELP is a pip package https://pypi.org/project/superhelp/ it can be installed alongside Python on a machine and used directly from the terminal e.g.

shelp --code "people = ['Tomas', 'Sal']" --output html --detail-level Main

If the output chosen is html (the default), the advice is displayed as a web page. If --output cli (command line interface i.e. terminal or console) is selected, the advice is displayed in the terminal instead.

In all likelihood, there will be other ways of making SuperHELP advice more readily available – probably through integration with other platforms and processes.

SuperHELP needs your help:

  • If you have any ideas, or the ability to help in some way, please contact me at superhelp@p-s.co.nz.
  • Spread the word about SuperHELP through your social networks.