Duck Hinting – Reconciling Python Type Hinting & Duck Typing

Type hinting and Duck Typing are both central parts of modern Python. How can we get them to work together?

Type Hinting

Type hinting allows us to indicate what types a function expects as arguments and what types it will return. E.g.

def get_greeting(name: str, age: int) -> str:

As the name makes clear, type hints are hints only but there are tools that enable type hinting to be checked and enforced.

Type hinting is still maturing in Python and more recent versions are less verbose and more readable e.g. int | float rather than typing.Union[int, float]

Initially I didn’t like type hinting. I suspected it was a costly and ritualistic safety behaviour rather than a way of writing better code. And people can certainly use type hinting like that. But I have changed my overall position on Python type hinting.

There are at least three ways to use type hinting in Python:

  • To improve readability and reduce confusion. For example, what is the following function expecting for the date parameter?

    def myfunc(date):

    Is it OK if I supply date as a number (20220428) or must it be a string (“20220428”) or maybe a datetime.date object? Type hinting can remove that confusion:

    def myfunc(date: int):

    It is now much more straightforward to consume this function and to modify it with confidence. I strongly recommend using type hinting like this.

    The following is very readable in my view:

    def get_data(date: int, *, show_chart=False) -> pd.DataFrame:

    Note: there is no need to add : bool when the meaning is obvious (unless wanting to use static checking as discussed below). It just increases the noise-to-signal ratio in the code.

    def get_data(date: int, *, show_chart: bool=False) -> pd.DataFrame:

    In a similar vein, if a parameter obviously expects an integer or a float I don’t add a type hint. For example type hinting for the parameters below reduces readability for negligible practical gain:

    def create_coord(x: int | float, y: int | float) -> Coord:

    ## knowing mypy considers int a subtype of float
    def create_coord(x: float, y: float) -> Coord:

    Instead

    def create_coord(x, y) -> Coord:

    is probably the right balance. To some extent it is a matter of personal taste.

    I hope people aren’t deterred from using basic type hinting to increase readability by the detail required to fully implement type hinting.
  • To enable static checks with the aim of preventing type-based bugs – which potentially makes sense when working on a complex code base worked on by multiple coders. Not so sure about ordinary scripts – the costs can be very high (see endless Stack Overflow questions on Type Hinting complexities and subtleties).
  • For some people I suspect type hinting is a ritual self-soothing behaviour which functions to spin out the stressful decision-making parts of programming. Obviously I am against this especially when it makes otherwise beautiful, concise Python code “noisy” and less readable.

Duck Typing

Python follows a Duck Typing philosophy – we look for matching behaviour not specific types. If it walks like a duck and quacks like a duck it’s a duck!

For example, we might not care whether we get a tuple or a list as long as we can reference the items by index. Returning to the Duck illustration, we don’t test for the DNA of a Duck (its type) we check for behaviours we’ll rely on e.g. can it quack?
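
The tuple-or-list point can be sketched as follows. The function name first_and_last is hypothetical; the only thing it relies on is that its argument supports indexing.

```python
def first_and_last(items):
    # Relies only on indexing behaviour, not on a specific type
    return items[0], items[-1]


print(first_and_last(("a", "b", "c")))  # a tuple works
print(first_and_last(["a", "b", "c"]))  # so does a list
print(first_and_last("abc"))            # and so does a str
```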

There are pros and cons to every approach to typing but I like the way Python’s typing works: strong typing (1 != ‘1’); dynamic typing (defined at run-time); and duck typing (anything as long as it quacks).

Structural Type Hinting using Protocol

If we were able to blend type hinting with duck typing we would get something where we could specify accepted types based on the behaviours they support.

Fortunately this is very easy in Python using Protocol. Below I contrast Nominal Type Hinting (based on the names of types) with Structural Type Hinting (based on internal details of types e.g. behaviours / methods)

Full Example in Code

from typing import Protocol

class AttackerClass:
    def attack(self):
       pass

class Soldier(AttackerClass):
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Soldier {self.name} swings their sword!")

class Archer(AttackerClass):
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Archer {self.name} fires their arrow!")

class Catapult:
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Catapult {self.name} hurls their fireball!")

## only accept instances of AttackerClass (or its subclasses)
def all_attack_by_type(units: list[AttackerClass]):
    for unit in units:
        unit.attack()

s1 = Soldier('Tim')
s2 = Soldier('Sal')
a1 = Archer('Cal')
c1 = Catapult('Mech')

all_attack_by_type([s1, s2, a1])

## will run but wouldn't pass a static check
## because c1 not an AttackerClass instance
## (or the instance of a subclass)

## comment out next line if checking with mypy etc - will fail
all_attack_by_type([s1, s2, a1, c1])

class AttackerProtocol(Protocol):
    def attack(self) -> None:
        ...  ## idiomatic to use ellipsis

def all_attack_duck_typed(units: list[AttackerProtocol]):
    for unit in units:
        unit.attack()

## will run as before even though c1 included
## but will also pass a static check
all_attack_duck_typed([s1, s2, a1, c1])

But what if we cannot or should not add the required method to an object directly, e.g. because it lives in library code? That is the topic of the next blog post.

Python Named Tuples vs Data Classes

Named tuples (collections.namedtuple) are a very convenient way of gathering data together so it can be passed around. They provide a guaranteed data structure and parts can be referenced by name. It is even possible to set defaults, albeit in a clunky way. And they’re fast and lightweight (see https://death.andgravity.com/namedtuples). What more could you want!

Example:

"""
Process Points Of Interest (POIs)
"""
from collections import namedtuple
POI = namedtuple('POI', 'x, y, lbl')

pois = []
for … :
    pois.append(POI(x, y, city))
… (maybe in a different module)
for poi in pois:
    logging.info(f"{poi.lbl}")

.lbl will always refer to a label and I can add more fields to POI in future and this code here will still work. If I change lbl to label in POI this will break hard and obviously.

An alternative is data classes – a more readable version of standard named tuples that makes defaults much easier to use. Hmmmm – maybe these are better. They arrived in 3.7 so instead of:

from collections import namedtuple
## Note - defaults apply to rightmost first
POI = namedtuple('POI', 'x, y, lbl', defaults=['Unknown'])

We can now use:

from dataclasses import dataclass
@dataclass(frozen=True, order=True)
class POI:
    x: float
    y: float
    lbl: str = 'Unknown'

The readable type hinting is nice too and the data structure self-documents.

Not bad but I have concluded data classes are generally an inferior version of what we can make with typing.NamedTuple. Instead of:

from dataclasses import dataclass
@dataclass(frozen=True, order=True)
class POI:
    x: float
    y: float
    lbl: str = 'Unknown'

We can use:

from typing import NamedTuple
class POI(NamedTuple):
    x: float
    y: float
    lbl: str = 'Unknown'

which has immutability and ordering out of the box not to mention having less boilerplate.

Data classes are only superior as far as I can see when mutability is wanted (frozen=False).

Another black mark against data classes – you have to import asdict from dataclasses to create a dictionary whereas namedtuples (and NamedTuples) have the _asdict() method built in.

Note – it is very easy to swap from NamedTuple to dataclass should the need for mutability ever arise – all the field definitions are the same.
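
The comparison above can be checked with a small sketch. POIData is a hypothetical name used here only so that the data class and the NamedTuple can sit side by side in one script.

```python
from dataclasses import asdict, dataclass
from typing import NamedTuple


class POI(NamedTuple):
    x: float
    y: float
    lbl: str = 'Unknown'


@dataclass(frozen=True, order=True)
class POIData:  # hypothetical data class twin of POI
    x: float
    y: float
    lbl: str = 'Unknown'


poi = POI(1.0, 2.0)
print(poi.lbl)                       # default applied
print(poi < POI(1.0, 3.0))           # ordering compares fields like a tuple
print(poi._asdict())                 # dict conversion is built in
print(asdict(POIData(1.0, 2.0)))     # data classes need the asdict import

try:
    poi.x = 5.0                      # NamedTuples are immutable
except AttributeError as e:
    print(f"Cannot modify: {e}")
```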

So the verdict is in favour of typing.NamedTuple.

That’s a lot of detail on something minor but I do use small data structures of this sort a lot to make code safer and more readable. Vanilla tuples are usually dangerous (they force you to reference items by index, which is brittle and unreadable); and dictionaries require consistent use of keys and it is potentially painful to change keys. NamedTuples seem like a perfect balance in Python. Make them a standard part of your coding practice.

Python Versions – What They Add

This is a personal list of what I really liked (and disliked) about each release of Python.

3.11

  • 19% speed improvement
  • more informative KeyError handling
  • TOML batteries included

3.10

  • Better messages for syntax errors e.g. "SyntaxError: '{' was never closed" for line 1 where the curly brace started rather than "SyntaxError: invalid syntax" for line 3 which was an innocent line
  • Note – Structural Pattern Matching should be considered more of an anti-feature given its problems and its limited benefits for a dynamically typed language like Python

3.9

  • String methods removesuffix and removeprefix (NOT the same as rstrip() and lstrip(), which strip any of the supplied characters rather than a whole substring). Note – absence of underscores in method names
  • Union operator for dicts (new dict which is an update of the first by the second) e.g. up_to_date_dict = orig_dict | fresh_dict
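
A short sketch of the two 3.9 features listed above, assuming Python 3.9 or later. The sample strings and dictionary contents are made up for illustration.

```python
filename = "access.csv"

# removesuffix removes the exact substring, once, if present
stem = filename.removesuffix(".csv")
# rstrip strips any trailing characters found in the set {'.', 'c', 's', 'v'}
over_stripped = filename.rstrip(".csv")
print(stem, over_stripped)

test_name = "test_login".removeprefix("test_")
print(test_name)

orig_dict = {"a": 1, "b": 2}
fresh_dict = {"b": 20, "c": 30}
# New dict; values from the right-hand dict win, the originals are untouched
up_to_date_dict = orig_dict | fresh_dict
print(up_to_date_dict)
```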

3.8

  • f-strings support = for self-documenting expressions e.g. f"{var=}"
  • walrus operator := (an antifeature with costs that outweigh benefits)
  • positional-only parameters (so can change names without breaking code) – like extending list comprehensions to cover dictionaries, sets, and tuples it completes the coverage as you’d expect
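
A small sketch of two of the 3.8 items above, assuming Python 3.8 or later. The function name area and its parameters are hypothetical.

```python
var = 42
# The = specifier prints the expression text as well as its value
label = f"{var=}"
print(label)


def area(width, height, /):
    # Everything before the / can only be passed positionally,
    # so width and height can be renamed later without breaking callers
    return width * height


print(area(2, 3))

try:
    area(width=2, height=3)  # keyword use is rejected
except TypeError as e:
    print(f"TypeError: {e}")
```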

3.7

  • Nothing

3.6

  • f-strings (massive)
  • Underscores in numbers e.g. 3_500_000 – as a data guy this is huge

Extracting from Youtube on Ubuntu

How to extract a video from Youtube and optionally to extract mp3 from the video. On Ubuntu 20.10:

Install youtube-dl (for downloading from youtube):

sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl

Install ffmpeg for sound extraction:

sudo apt install ffmpeg

How to extract video (note – has to be your paths to python3 and youtube-dl):

/usr/bin/python3 /usr/local/bin/youtube-dl 'https://www.youtube.com/watch?v=<youtubeid>'

How to extract sound:

ffmpeg -i '<title>-<youtubeid>.mkv' '<title>.mp3'

With thanks to 3 Easy Ways to Download YouTube Videos in Ubuntu and Other Linux Distributions; How to Extract Audio From Video in Ubuntu and Other Linux Distributions; and Youtube-dl: Python not found (18.04)

Second Impressions of Python Pattern Matching

Less is More

One of the beautiful things about Python is its simplicity. We don’t want it to end up like those languages which have a designed-by-committee feel where everyone gets to add features and there are many ways of doing everything. Every feature that is added not only has to have some value but that value must outweigh the cost of the additional complexity. The more features, the more learning is required before being able to understand and edit other people’s code; the greater the risk that learning will be broad and shallow; and the larger the bug surface for the language. Simple is good.

In the previous blog post I concluded that Pattern Matching was a positive addition to the language. After looking further into the gotchas of Python Pattern Matching, and listening to the arguments of a friend (you know who you are :-)), I have become much less sure. On balance, I suspect Python Pattern Matching probably doesn’t pass the Must-Be-Really-Valuable-To-Justify-the-Increased-Complexity test.

Arguments Against Python Pattern Matching

Significant Gotchas

This section includes some material from the previous blog post but with a different emphasis and more detail.

Similarity to object instantiation misleading

Imagine we have a Point class:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


case Point(x, y): seems to me to be an obvious way of looking for such a Point object and unpacking its values into x and y but it isn’t allowed for a plain class like this one (it defines no __match_args__). It is a perfectly valid syntax for instantiating a Point object but we are not instantiating an object and supplying the object to the case condition – instead we are supplying a pattern to be matched and unpacked. We have to have a firm grasp on the notion that Python patterns are not objects. If we forget we get a TypeError:

case Point(0, 999):
TypeError: Point() accepts 0 positional sub-patterns (2 given)

Note, we must match the parameter names (the left side) but can unpack to any variable names we like (the right side). For example, all of the following will work:

case Point(x=x, y=y):
case Point(x=lon, y=lat):

case Point(x=apple, y=banana):

but

case Point(a=x, b=y):

will not.

It is a bit disconcerting being forced to use what feel like keyword arguments in our patterns when the original class definition is optionally positional. We should expect lots of mistakes here and it’ll require concentration to pick them up in code review.

Similarity to isinstance misleading

case Point:, case int:, case str:, case float: don’t work as you might expect. The proper approach is to supply parentheses: using the example of integer patterns, we need case int():, or, if we want to “capture” the value into, say, x, case int(x):. But if we don’t know about the need for parentheses, or we slip up (easy to do), these inadvertent patterns will match anything and assign it to the name Point or int or str etc. Definitely NOT what we want.

the builtin str is now broken – hopefully obviously

The only protection against making this mistake is when you accidentally do this before other case conditions – e.g.

case int:
^
SyntaxError: name capture 'int' makes remaining patterns unreachable

Otherwise you’re on your own and completely free to make broken code. This will probably be a common error because of our experience with isinstance where we supply the type e.g. isinstance(x, int). Which reminds me of a passage in Through the Looking-Glass, and What Alice Found There by Lewis Carroll.

‘Crawling at your feet,’ said the Gnat (Alice drew her feet back in some alarm), ‘you may observe a Bread-and-Butterfly. Its wings are thin slices of Bread-and-butter, its body is a crust, and its head is a lump of sugar.’

‘And what does it live on?’

‘Weak tea with cream in it.’

A new difficulty came into Alice’s head. ‘Supposing it couldn’t find any?’ she suggested.

‘Then it would die, of course.’

‘But that must happen very often,’ Alice remarked thoughtfully.

‘It always happens,’ said the Gnat. After this, Alice was silent for a minute or two, pondering.

Summary

There will be lots of mistakes when using the match case even in the most common cases. And more complex usage is not readable without learning a lot more about Pattern Matching. See PEP 622.

Readable but only if you understand the mini-language

Basically it is yet another mini-language to learn.

Arguments for Python Pattern Matching

Sometimes we need to pattern match and a match / case syntax is quite elegant. The way values are unpacked into names is also really nice.

We will certainly love this feature if we are moving away from duck typing and adopting the programming style of statically-typed languages like Scala. But maybe we shouldn’t encourage this style of programming by making it easier to write in Python.

Verdict

On balance, Python Pattern Matching doesn’t seem to pass the Must-Be-Really-Valuable-To-Justify-the-Increased-Complexity test. And I’m not alone in wondering this (Musings on Python’s Pattern Matching). I enjoyed coming to grips with the syntax but I think it is like for else and Python Enums – best avoided. But we will see. We can’t always tell what the future of a feature will be – maybe it will turn out to be very useful and one day there will be a third blog post ;-).

First Impressions of Python Pattern Matching

Controversy – Python Adding Everything But the Kitchen Sink?

Is Python piling in too many new features taken from other languages? Is Pattern Matching yet another way of doing things for a language which has prided itself on there being one obvious way of doing things? In short, is Python being ruined by people who don’t appreciate the benefits of Less Is More?

Short answer: No ;-). Long answer: see below. Changed answer: see Second Impressions of Python Pattern Matching

I appreciate arguments for simplicity, and I want the bar for new features to be high, but I am glad Pattern Matching made its way in. It will be the One Obvious Way for doing, errr, pattern matching. Pattern matching may not be as crucial in a dynamically typed language like Python but it is still useful. And the syntax is nice too. match and case are pretty self-explanatory. Of course, there are some gotchas to watch out for but pattern matching is arguably one of the most interesting additions to Python since f-strings in Python 3.6. So let’s have a look and see what we can do with it.

What is Pattern Matching?

Not having used pattern matching before in other languages I wasn’t quite sure how to think about it. According to Tomáš Karabela I’ve been using something similar without realising it (Python 3.10 Pattern Matching in Action). But what is Pattern Matching? And why would I use it in Python?

I have found it useful to think of Pattern Matching as a Switch statement on steroids.

Pattern Matching is a Switch Statement on Steroids

The switch aspect is the way the code passes through various expressions until it matches. That makes total sense – one of the earliest things we need to do in programming is respond according to the value / nature of something. Even SQL has CASE WHEN statements.

The steroids aspect has two parts:

Unpacking

Unpacking is beautiful and elegant so it is a real pleasure to find it built into Python’s Pattern Matching. case (x, y): looks for a two-item sequence (such as a two-tuple or a two-item list) and unpacks into x and y names ready to use in the code under the case condition. case str(x): looks for a string and assigns the name x to it. Handy.

Type Matching

Duck typing can be a wonderful thing but sometimes it is useful to match on type. case Point(x=x, y=y): only matches if an instance of the Point class. case int(x): only matches if an integer. Note – case Point(x, y): doesn’t work in a case condition here because a plain class like Point accepts no positional sub-patterns unless it defines __match_args__. Confused? More detail on possible gotchas below:

Gotchas

Sometimes you think exactly the same as the language feature, sometimes not. Here are some mistakes I made straight away. Typically they were caused by a basic misunderstanding of how Pattern Matching “thinks”.

Patterns aren’t Objects

case Point(x, y): seems to me to be an obvious way of looking for a Point object and unpacking its values into x and y but it isn’t allowed. It is the correct syntax for instantiating a Point object but we are not instantiating an object and supplying the object to the case condition – instead we are supplying a pattern to be matched and unpacked. We have to have a firm grasp on the notion that Python patterns are not objects.

Patterns Ain’t Objects

If we forget we get a TypeError:

case Point(0, 999):
TypeError: Point() accepts 0 positional sub-patterns (2 given)

Note, you must match the parameter names (the left side) but can unpack to any variable names you like (the right side). It may feel a bit odd being forced to use what feel like keyword arguments when the original class definition is positional but we must remember that we aren’t making an object – we are designing a pattern and collecting variables / names.

case Point(x=0, y=y): the x= and y= are the required syntax for a pattern. We insist on x being 0 but y can be anything (which we add the name y to). We could equally have written case Point(x=0, y=y_val): or case Point(x=0, y=spam):.

case Point:, case int:, case str:, case float: don’t work as you might expect. They match anything and assign it to the name Point or int or str etc. Definitely NOT what you want. The only protection is when you accidentally do this before other case conditions – e.g.

case int:
^
SyntaxError: name capture 'int' makes remaining patterns unreachable

This might become a common error because of our experience with isinstance where we supply the type e.g. isinstance(x, int). Remember:

case Patterns have to be Patterns

Instead, using the example of integer patterns, we need case int():, or, if we want to “capture” the value into, say, x, case int(x):.

Guards and Traditional Switch Statements

It is very common in switch / case when statements to have conditions. Sometimes it is the whole point of the construct – we supply one value and respond differently according to its values / attributes. E.g. in rough pseudocode if temp < 0 freezing, if > 100 boiling, otherwise normal. In Pattern Matching value conditions are secondary. We match on a pattern first and then, perhaps evaluating the unpacked variables in an expression, we apply a guard condition.

case float(x) if abs(x) < 100:
    ...
case float(x) if abs(x) < 200:
    ...
etc

Depending on how it’s used we could think of “Pattern Matching” as “Pattern and Guard Condition Matching”.

The most similar to a classic switch construct would be:

match val:
    case val if val < 10:
        ...
    case val if val < 20:
        ...
    etc

One final thought: there seems to be nothing special about the “default” option (using switch language) – namely, case _:. It merely captures anything that hasn’t already been matched and puts _ as the name i.e. it is a throwaway variable. We could capture and use that value with a normal variable name although that is optional because there’s nothing stopping us from referencing the original name fed into match. But, for example, case mopup: would work.

How to play with it on Linux

Make image using Dockerfile e.g. the following based on Install Python3 in Ubuntu Docker (I added vim and a newer Ubuntu image plus changed apt-get to apt, even though it allegedly has an unstable CLI):

FROM ubuntu:20.04

RUN apt update && apt install -y software-properties-common gcc && \
add-apt-repository -y ppa:deadsnakes/ppa

RUN apt update && apt install -y python3.10 python3-distutils python3-pip python3-apt vim

docker build --tag pyexp .

(don’t forget the dot at the end – that’s a reference to the path to find Dockerfile)

Then make container:

docker create --name pyexp_cont pyexp

and run it with access to bash command line

docker container run -it pyexp /bin/bash

Useful Links

Pattern matching tutorial for Pythonic code | Pydon’t

Python 3.10 Pattern Matching in Action

PEP 622

Python Enum Gotcha

There is plenty of useful information in Why You Should Use More Enums In Python – A gentle introduction to enumerations in Python. After reading it I decided to look into using Enums more. Unfortunately I hit a major Gotcha quite quickly.

Basically, comparisons against plain values don’t work with vanilla Enum (unlike IntEnum). Checking the official Python documentation seemed to confirm this understanding:

“Comparisons against non-enumeration values will always compare not equal (again, IntEnum was explicitly designed to behave differently …)” — https://docs.python.org/3/library/enum.html

In the snippet below, Pieces subclasses vanilla Enum and has, at least from my point of view, very unexpected results. Pieces2 is based on IntEnum and behaves as might be expected.

import enum

class Pieces(enum.Enum):
    PAWN = 8
    ROOK = 2
    BISHOP = 2


print(2 == Pieces.ROOK) ## False WAT?! Always not equal to non enums
print(Pieces.ROOK == 2) ## False WAT?!
print(Pieces.BISHOP == Pieces.ROOK) ## True
print(Pieces.PAWN == Pieces.ROOK) ## False


class Pieces2(enum.IntEnum):
    PAWN = 8
    ROOK = 2
    BISHOP = 2


print(2 == Pieces2.ROOK) ## True
print(Pieces2.ROOK == 2) ## True
print(Pieces2.BISHOP == Pieces2.ROOK) ## True
print(Pieces2.PAWN == Pieces2.ROOK) ## False

It is easy to imagine this behaviour creating baffling bugs. Interesting.
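
One way to sidestep the surprise, sketched here with a cut-down version of Pieces, is to compare against the member's .value or to convert the raw value into a member first. This is an illustration of what vanilla Enum offers, not a recommendation taken from the enum documentation.

```python
import enum


class Pieces(enum.Enum):
    PAWN = 8
    ROOK = 2


raw = 2
print(Pieces.ROOK == raw)        # False: a member never equals a plain value
print(Pieces.ROOK.value == raw)  # True: compare the underlying value
print(Pieces(raw) is Pieces.ROOK)  # True: look the member up by value
```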

Problem upgrading Dokuwiki after long neglect

When you upgrade Dokuwiki (brilliant wiki BTW) it does it in-place. Unfortunately, I hadn’t upgraded for about 5 years in spite of numerous warnings, so the deprecated.php file bit me: it assumed my instance of Dokuwiki contained specific files that needed deprecating, and it broke because the system was so old that those files weren’t there ;-). Once I figured the problem out the solution was simply to comment out the appropriate parts of deprecated.php. After all, there is no need to deprecate something you don’t even have.

Reminder: include

ini_set('display_errors', 1); ini_set('display_startup_errors', 1); error_reporting(E_ALL);

in index.php when trying to identify problems running a PHP app. And remove it afterwards.

An easy way of using SuperHELP

Many people find using the command line a bit daunting and Jupyter notebooks unfamiliar so here is another way of using SuperHELP that may be much easier.

  1. Put the following at the top of your script:

    import superhelp
    superhelp.this()


  2. Run the script
  3. See the advice
  4. Learn something; make changes 🙂

If you don’t want the default web output you can specify other output such as ‘cli’ (command line interface) or ‘md’ (markdown):

import superhelp
superhelp.this(output='md')

If you don’t want the default ‘Extra’ level of detail you can specify a different detail_level (‘Brief’ or ‘Main’) e.g.

import superhelp
superhelp.this(output='md', detail_level='Brief')

or:

import superhelp
superhelp.this(detail_level='Main')

Edited to remove reference to explicit file_path argument e.g. superhelp.this(__file__). It is not needed anymore thanks to the Python inspect library. Also displayer -> output and message_level -> detail_level.

Why SuperHELP for Python?

I created SuperHELP to make it easier for people to write good Python code. You can find the project at https://pypi.org/project/superhelp/

To summarise the rationale for SuperHELP I have thought of a few taglines:

SuperHELP – Python help that really helps!

SuperHELP – Help for Humans!

and even

SuperHELP – Make Python Pythonic!

Some context might help explain:

Python has secured its position in the top tier of programming languages and more people than ever are learning to write Python. But, let’s be honest, a lot of Python code being written is not what it could be. Even an elegant language like Python can be written badly or in a way that is hard to read or maintain.

To make it easier for people to write good Python we are already well-served by IDEs from IDLE upwards. People have easy access to style linters and IDEs clearly signal syntax errors and basic mistakes like unused variables. But wouldn’t it be great if people could check their code to see if there are better, more Pythonic ways of doing things? Or to learn more about basic language features and Python data structures than the standard help can offer?

That’s where SuperHELP can play a role.

So what exactly is SuperHELP? Basically it is an advice engine. SuperHELP reads a snippet of Python and provides advice, warnings, and basic information based on what it finds. For example, it might notice a function docstring is missing and show a template for adding one. Or identify the use of a named tuple and explain how to add docstrings to individual fields or the named tuple as a whole.

The intention is to make sure that everyone, from beginners upwards, learns something useful. Even an advanced Python programmer might not appreciate the benefits of using functools.wraps when creating their own decorator. Or an experienced Java programmer might not realise that Python properties are a much better option than getters and setters.

So how can people use SuperHELP? For most people, the easiest way will be to open a binder web notebook and enter their code there.

Of course, because SuperHELP is a pip package https://pypi.org/project/superhelp/ it can be installed alongside Python on a machine and used directly from the terminal e.g.

shelp --code "people = ['Tomas', 'Sal']" --output html --detail-level Main

If the output chosen is html (the default) output looks like:

And if --output cli (command line interface i.e. terminal or console) is selected, output looks like:

In all likelihood, there will be other ways of making SuperHELP advice more readily available – probably through integration with other platforms and processes.

SuperHELP needs your help:

  • If you have any ideas, or the ability to help in some way, please contact me at superhelp@p-s.co.nz.
  • Spread the word about SuperHELP through your social networks.