Internal Python Imports Without Tears

Flaky, Breaky Internal Importing

Maybe you’ve been lucky so far and never experienced the flaky, breaky side of internal importing in Python – for example, mysterious ModuleNotFoundError exceptions. But if internal importing has been a problem, typically when code is imported from a multi-folder code base, it is good to know there is at least one guaranteed solution.

If you want your internal imports to work reliably, the following rules will guarantee success. Other approaches may also work, at least some of the time, but the rules presented below will always work. Following these rules will also make it possible to run modules as scripts1 no matter where they occur in the code structure. This can be especially useful during development.

Tear-Free Rules2

  • Be Absolute
  • Anchor Well
  • Folder Imports Bad!
  • Don’t Just “Run” Code

#1 Be Absolute

Use absolute importing everywhere in your code package, not just where you think internal importing is broken.

Python has both relative and absolute importing. Relative importing is relative to the file doing the importing and uses dots e.g. one dot for the current folder, two for the parent folder, three for the grandparent etc. How many dots you need, and what your import looks like, depends on where you are in the folder structure. Absolute importing starts from an anchor point which is the same no matter where the file doing the importing sits.

DO:

import charm.tables.freq
import charm.conf
import charm.utils.stats.parametric.ttest

DON’T:

from ..stats.parametric import ttest

from . import ttest  ## importing same thing but doing it from another module

#2 Anchor Well

Anchor absolute imports from the code package folder. E.g. if we have a module in a location like: /home/pygrant/projects/charm/charm/tables/freq.py we would import it like:

import charm.tables.freq

assuming the code package folder was the rightmost charm folder.

It is common to use the same name for the surrounding folder as the code package folder but we don’t have to and the following example might make the situation more clear. If we have a module in a location like: /home/pygrant/projects/charm_project/charm/tables/freq.py we would import it from charm (the code package folder) not charm_project (the surrounding folder).

DO:

import charm.tables.freq

DON’T:

import tables.freq  ## <===== missing the anchor i.e. the code package folder

#3 Folder Imports Bad!

Don’t import folders – instead import modules or attributes of modules.

DO:

import charm.tables.freq

DON’T:

import charm.tables  ## <== a folder - so you might be importing
                     ## the __init__.py under that folder if there is one
charm.tables.freq.get_html()  ## may fail unless the __init__.py imports freq

#4 Don’t Just “Run” Code

One doesn’t simply run code. Code is always executed in a particular context, often set implicitly. Use one of the ways of running code that actually works, i.e. one which:

  • puts the surrounding folder in sys.path, so Python can find your modules and actually import their contents
  • resolves module names (in particular, folders used with the dot notation) – once again, so Python can find your modules

Either use the -m option

python -m code_package.folder.folder.script  <-- without the .py extension

or

ensure your IDE has the surrounding folder (the folder surrounding the code_package) in its Python path (sys.path) – possibly defined in quirky, unintuitive IDE-specific ways

You can always run the following one-liner just before the problem point to see what is in your Python path:

import sys; print('\n'.join(sys.path))
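To see the difference in action, here is a minimal, self-contained sketch that builds a throwaway package (borrowing the hypothetical charm layout from Rule #2) and contrasts running a module file directly with running it via -m:

```python
"""
Sketch: why 'python -m charm.tables.freq' works
when 'python charm/tables/freq.py' fails.
"""
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  ## the surrounding folder
    tables = root / 'charm' / 'tables'
    tables.mkdir(parents=True)
    (root / 'charm' / '__init__.py').write_text('')
    (tables / '__init__.py').write_text('')
    (root / 'charm' / 'conf.py').write_text("SETTING = 'on'\n")
    ## freq.py uses an absolute import anchored at the code package folder
    (tables / 'freq.py').write_text(
        "import charm.conf\n"
        "print(charm.conf.SETTING)\n")

    ## DON'T: running the file directly puts charm/tables (the script's own
    ## folder), not the surrounding folder, at the front of sys.path
    direct = subprocess.run(
        [sys.executable, str(tables / 'freq.py')],
        capture_output=True, text=True, cwd=root)

    ## DO: -m puts the current folder (here, the surrounding folder) on
    ## sys.path and resolves the dotted module name
    dash_m = subprocess.run(
        [sys.executable, '-m', 'charm.tables.freq'],
        capture_output=True, text=True, cwd=root)

print('direct run failed:', direct.returncode != 0)
print('-m run printed:', dash_m.stdout.strip())
```

The file names are made up for the sketch; the point is only the sys.path behaviour of the two ways of running the same module.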

Final Comments

I have seen mysterious internal importing problems impact numerous Python developers. The Python import system is very flexible and extensible but it is far from simple. Flaky, breaky internal importing is definitely not only a problem for beginners.

Confusion is increased by such factors as:

  • Python ignoring repeated imports of the same module (modules are cached by name in sys.modules). This is perfectly logical behaviour but it means a faulty import in one piece of code might be ignored in favour of a working import in another piece of code. Or vice versa – a working import statement might be ignored because of an earlier faulty import. Remember Rule #1 – “Use absolute importing everywhere in your code package”
  • IDE quirks e.g. in VS Code I was advised by a friend that the following was necessary for success:

    In .vscode/settings.json add:

    "terminal.integrated.env.windows": {
        "PYTHONPATH": "${workspaceFolder}"
    }

    where ${workspaceFolder} points “to the surrounding folder”, either relative to the workspace folder using the ${...} syntax, or as an absolute path. Also put this in .env as an absolute path to the surrounding folder:
    PYTHONPATH=<path-to-surrounding folder>

    Simple right? 😉

    PyCharm seems to require using the correct content root and allowing the Content Roots to be added to the PYTHONPATH. If the project is created off the surrounding folder this is probably the default behaviour but if this doesn’t happen it is not obvious how to fix the problem.
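The module-caching point above can be demonstrated in a few lines: Python stores each imported module in sys.modules, keyed by name, and any repeated import is just a cache lookup rather than a re-execution of the module's code.

```python
import sys
import json

first = json

## a repeated import is a cheap lookup in the sys.modules cache,
## not a re-execution of the module's code
import json
assert json is first
assert sys.modules['json'] is first

## importing under another name still returns the same cached object
import json as json_again
assert json_again is first

print('cached modules include json:', 'json' in sys.modules)
```

This is why whichever import runs first "wins" the cache entry, making a faulty import elsewhere in the code base able to mask (or be masked by) a working one.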

You don’t necessarily have to follow the “rules” above to get your internal imports working, but why take the risk? Follow the rules and then you can turn your precious attention to other programming issues. Here they are again:

  • Be Absolute
  • Anchor Well
  • Folder Imports Bad!
  • Don’t Just “Run” Code
  1. Running a script means actually doing something (e.g. writing a file, making an API call, etc)
    rather than just defining something without running it (e.g. defining a function or a class). ↩︎
  2. The “Tears” theme is a nod to the popular statistics book “Statistics Without Tears” ↩︎

The When of Python Project

The When of Python is a fledgling community initiative. The goal is to effectively shrink Python so it fits our brains by providing guidance on when we should use particular language features (and when we should not).

For example, should we use collections.namedtuple, typing.NamedTuple, or dataclasses.dataclass? Should we use the walrus operator? StrEnum? Structural Pattern Matching? Comprehensions? Lambda? etc.

Find out more at https://whenof.python.nz/blog. The project can be followed at https://twitter.com/WhenOfPython and the video of the Kiwi PyCon talk which launched the project is at https://t.co/MgGi6kQeme. There is also a Proof-of-Concept app you can check out at https://whenof.python.nz

Making External Objects Play Nicely with Type Hinting

The Problem

We want our function to type hint such that only objects with a particular method e.g. .attack(), can be received. Something like this:

def all_attack(units: Sequence[HasAttackMethod]):

One of the objects only has a .fire() method. How do we make sure it can meet the requirement and be used in the same way as the other objects?

Extend the External Object – Make a Wrapper

Here we simply make a light wrapper around the object that doesn’t change the interface apart from adding the missing method. The new method is added to the object using setattr and everything else is delegated to the original object.

Now we can supply the extended object and it will function as required. It will also meet the required type specification, meeting the Attacker Protocol by having an .attack() method.

"""
OK - we have one weird external object - how do we handle it?
"""
from functools import partial
from typing import Protocol, Sequence


class Soldier:
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Soldier {self.name} swings their sword!")

class Archer:
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Archer {self.name} fires their arrow!")

class Catapult:
    def __init__(self, name):
        self.name = name
    def fire(self):  ## not the required name
        print(f"Catapult {self.name} hurls their fireball!")

s1 = Soldier('Tim')
s2 = Soldier('Sal')
a1 = Archer('Cal')
c1 = Catapult('Mech')


class Extend:
    def __init__(self, original, method_name, method):
        self.original = original
        ## being a method, self needs to come first
        method_with_self_arg = partial(method, self=self)
        ## add method e.g. .attack()
        setattr(self, method_name, method_with_self_arg)
    def __getattr__(self, name):
        """
        Delegate everything other than the added method
        to the original object (methods and ordinary
        attributes alike - getattr returns bound methods
        ready to be called)
        """
        return getattr(self.original, name)

def catapult_attack(self):
    self.original.fire()

c1_extended = Extend(original=c1, method_name='attack',
    method=catapult_attack)

class Attacker(Protocol):
    def attack(self) -> None:
        ...

def all_attack(units: Sequence[Attacker]):
    for unit in units:
        unit.attack()

all_attack([s1, s2, a1, c1_extended])
print(c1_extended.name)

Making a Polymorpher

Another approach is to make a polymorpher function. This relies upon the classes pip package https://pypi.org/project/classes/. See also: https://classes.readthedocs.io/en/latest/pages/supports.html#regular-types

As at the time of writing there seem to be problems making this code pass mypy which defeats the purpose. The explanation seems to be some brittleness in the (very promising) classes package but that is not fully confirmed.

For example, if you were making a game and you needed a function to only accept objects with an attack() method you could do it like this:

def all_attack(units: Sequence[Supports[RequiredAttackType]]):
    for unit in units:
        unit.attack()

There are some nasty and opaque steps required to make this work. These can be concentrated in one function to make it easy to use this duck hinting approach. In the code below the nastiest part is in polymorpher(). The function is included to show how it works and so others can potentially improve on it.

"""
Purpose: in a static-typing context,
how do we enable external objects
to comply with type requirements in a function?

E.g. all_attack(units: Sequence[RequiredType])
which only permits sequences of the required type.

That is no problem with Soldier and Archer
as they have attack methods.
But Catapult doesn't have an attack method.
How do we give it one without:
* changing the source code
* monkey patching

And how do we type hint (enforced by mypy) so that only objects which have the attack method can be received?

Basically it would be nice if we could add an attack method to
Catapult using a decorator e.g. like

@attack.instance(Catapult)
def _fn_catapult(catapult: Catapult):
    print(f"Catapult {catapult.name} hurls their fireball!")

And, instead of saying which types are acceptable,
follow a duck typing approach and say
what behaviours are required e.g. must have an attack method. E.g.

def all_attack(units: Sequence[Supports[RequiredAttackType]]):
    …

To make that all work, we have made a nasty, opaque function (polymorpher) that gives us both parts needed - the decorator,
and the behaviour type.
"""

from typing import Protocol, Sequence

## pip installation required
from classes import AssociatedType, Supports, typeclass

## UNITS *********************************************************

class Soldier:
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Soldier {self.name} swings their sword!")

class Archer:
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Archer {self.name} fires their arrow!")

## assume this class defined an object we can't / shouldn't modify
class Catapult:
    def __init__(self, name):
        self.name = name

s1 = Soldier('Tim')
s2 = Soldier('Sal')
a1 = Archer('Cal')
c1 = Catapult('Mech')

## ATTACKING *******************************************************
## Opaque, somewhat nasty function that makes it easy to make external
## objects polymorphic so they'll work with a function requiring
## compliant objects

def polymorpher(*, method_name: str):
    ## want separate types depending on method_name
    RequiredType = type(method_name, (AssociatedType, ), {})
    @typeclass(RequiredType)
    def fn(instance):
        pass
    def self2pass(self):
        pass
    RequiredProtocol = type(
        f'{method_name}_protocol',
        (Protocol, ),
        {'_is_runtime_protocol': True,
         method_name: self2pass,
        })
    @fn.instance(protocol=RequiredProtocol)
    def _fn_run_method(obj: RequiredProtocol):
        return getattr(obj, method_name)()
    return fn, RequiredType

## method_name and returned fn don't have to share name
attack, RequiredAttackType = polymorpher(method_name='attack')

## now easy to make external objects polymorphic
## so they'll work with a function requiring compliant objects

@attack.instance(Catapult)
def _fn_catapult(catapult: Catapult):
    print(f"Catapult {catapult.name} hurls their fireball!")

def all_attack(units: Sequence[Supports[RequiredAttackType]]):
    for unit in units:
        attack(unit)  ## note - not unit.attack()

## UNITs ATTACKing ****************************************************

units = [s1, s2, a1, c1]
all_attack(units)

Wrap Up

Type hinting and duck typing are both parts of modern Python and they need to play together well – ideally in a simpler and standardised way. The code above hopefully assists with that evolution.

See also https://www.daan.fyi/writings/python-protocols and https://medium.com/alan/python-typing-with-mypy-progressive-type-checking-on-a-large-code-base-74e13356bd3a

And a big thanks to Ben Denham for doing most of the heavy lifting with making the polymorpher (almost ;-)) work.

Duck Hinting – Reconciling Python Type Hinting & Duck Typing

Type hinting and Duck Typing are both central parts of modern Python. How can we get them to work together?

Type Hinting

Type hinting allows us to indicate what types a function expects as arguments and what types it will return. E.g.

def get_greeting(name: str, age: int) -> str:

As the name makes clear, type hints are hints only but there are tools that enable type hinting to be checked and enforced.

Type hinting is still maturing in Python and more recent versions are less verbose and more readable e.g. int | float rather than typing.Union[int, float]

Initially I didn’t like type hinting. I suspected it was a costly and ritualistic safety behaviour rather than a way of writing better code. And people can certainly use type hinting like that. But I have changed my overall position on Python type hinting.

There are at least three ways to use type hinting in Python:

  • To improve readability and reduce confusion. For example, what is the following function expecting for the date parameter?

    def myfunc(date):

    Is it OK if I supply date as a number (20220428) or must it be a string (“20220428”) or maybe a datetime.date object? Type hinting can remove that confusion

    def myfunc(date: int):

    It is now much more straightforward to consume this function and to modify it with confidence. I strongly recommend using type hinting like this.

    The following is very readable in my view:

    def get_data(date: int, *, show_chart=False) -> pd.DataFrame:

    Note: there is no need to add : bool when the meaning is obvious (unless wanting to use static checking as discussed below). It just increases the noise-to-signal ratio in the code.

    def get_data(date: int, *, show_chart: bool=False) -> pd.DataFrame:

    On a similar vein, if a parameter obviously expects an integer or a float I don’t add a type hint. For example type hinting for the parameters below reduces readability for negligible practical gain:

    def create_coord(x: int | float, y: int | float) -> Coord:

    ## knowing mypy considers int a subtype of float
    def create_coord(x: float, y: float) -> Coord:

    Instead

    def create_coord(x, y) -> Coord:

    is probably the right balance. To some extent it is a matter of personal taste.

    I hope people aren’t deterred from using basic type hinting to increase readability by the detail required to fully implement type hinting.
  • To enable static checks with the aim of preventing type-based bugs – which potentially makes sense when working on a complex code base worked on by multiple coders. Not so sure about ordinary scripts – the costs can be very high (see endless Stack Overflow questions on Type Hinting complexities and subtleties).
  • For some people I suspect type hinting is a ritual self-soothing behaviour which functions to spin out the stressful decision-making parts of programming. Obviously I am against this especially when it makes otherwise beautiful, concise Python code “noisy” and less readable.

Duck Typing

Python follows a Duck Typing philosophy – we look for matching behaviour not specific types. If it walks like a duck and quacks like a duck it’s a duck!

For example, we might not care whether we get a tuple or a list as long as we can reference the items by index. Returning to the Duck illustration, we don’t test for the DNA of a Duck (its type) we check for behaviours we’ll rely on e.g. can it quack?

There are pros and cons to every approach to typing but I like the way Python’s typing works: strong typing (1 != '1'); dynamic typing (types resolved at run-time); and duck typing (anything as long as it quacks).
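Each of those three aspects of Python typing can be shown in a couple of lines (a minimal sketch):

```python
## strong typing: no implicit coercion between int and str
assert 1 != '1'
try:
    1 + '1'
except TypeError:
    print('strong typing: int + str raises TypeError')

## dynamic typing: names are bound to types at run-time and can be rebound
x = 1
x = 'one'  ## perfectly legal

## duck typing: anything we can index by position will do, tuple or list
def first_item(items):
    return items[0]

assert first_item((10, 20)) == 10
assert first_item([10, 20]) == 10
```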

Structural Type Hinting using Protocol

If we were able to blend type hinting with duck typing we would get something where we could specify accepted types based on the behaviours they support.

Fortunately this is very easy in Python using Protocol. Below I contrast Nominal Type Hinting (based on the names of types) with Structural Type Hinting (based on internal details of types e.g. behaviours / methods)

Full Example in Code

from typing import Protocol

class AttackerClass:
    def attack(self):
        pass

class Soldier(AttackerClass):
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Soldier {self.name} swings their sword!")

class Archer(AttackerClass):
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Archer {self.name} fires their arrow!")

class Catapult:
    def __init__(self, name):
        self.name = name
    def attack(self):
        print(f"Catapult {self.name} hurls their fireball!")

## only accept instances of AttackerClass (or its subclasses)
def all_attack_by_type(units: list[AttackerClass]):
    for unit in units:
        unit.attack()

s1 = Soldier('Tim')
s2 = Soldier('Sal')
a1 = Archer('Cal')
c1 = Catapult('Mech')

all_attack_by_type([s1, s2, a1])

## will run but wouldn't pass a static check
## because c1 is not an AttackerClass instance
## (or an instance of a subclass)

## comment out next line if checking with mypy etc - will fail
all_attack_by_type([s1, s2, a1, c1])

class AttackerProtocol(Protocol):
    def attack(self) -> None:
        ...  ## idiomatic to use an ellipsis

def all_attack_duck_typed(units: list[AttackerProtocol]):
    for unit in units:
        unit.attack()

## will run as before even though c1 included
## but will also pass a static check
all_attack_duck_typed([s1, s2, a1, c1])
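As a side note (an addition beyond the code above), a Protocol can also be checked at run-time with isinstance if it is decorated with typing.runtime_checkable; the check is structural, looking only for the required method:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class AttackerProtocol(Protocol):
    def attack(self) -> None:
        ...

class Catapult:
    def attack(self):
        print("Catapult hurls their fireball!")

class Wall:  ## hypothetical non-attacker for contrast
    def defend(self):
        print("Wall stands firm!")

## isinstance checks structure (does it have .attack()?), not the class name
assert isinstance(Catapult(), AttackerProtocol)
assert not isinstance(Wall(), AttackerProtocol)
```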

But what if we cannot or should not modify an object by adding the required method to it directly e.g. code in library code? That is the topic of the next blog post.

Python Named Tuples vs Data Classes

Named tuples (collections.namedtuple) are a very convenient way of gathering data together so it can be passed around. They provide a guaranteed data structure and parts can be referenced by name. It is even possible to set defaults, albeit in a clunky way. And they’re fast and lightweight (see https://death.andgravity.com/namedtuples). What more could you want!

Example:

"""
Process Points Of Interest (POIs)
"""
import logging
from collections import namedtuple

POI = namedtuple('POI', 'x, y, lbl')

pois = []
for ... :  ## elided - looping over source data
    pois.append(POI(x, y, city))

## ... (maybe in a different module)
for poi in pois:
    logging.info(f"{poi.lbl}")

.lbl will always refer to a label and I can add more fields to POI in future and this code here will still work. If I change lbl to label in POI this will break hard and obviously.

An alternative is data classes – a more readable version of standard named tuples that makes defaults much easier to use. Hmmmm – maybe these are better. They arrived in 3.7 so instead of:

from collections import namedtuple
## Note - defaults apply to rightmost first
POI = namedtuple('POI', 'x, y, lbl', defaults=['Unknown'])

We can now use:

from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class POI:
    x: float
    y: float
    lbl: str = 'Unknown'

The readable type hinting is nice too and the data structure self-documents.

Not bad but I have concluded data classes are generally an inferior version of what we can make with typing.NamedTuple. Instead of:

from dataclasses import dataclass
@dataclass(frozen=True, order=True)
class POI:
    x: float
    y: float
    lbl: str = 'Unknown'

We can use:

from typing import NamedTuple
class POI(NamedTuple):
    x: float
    y: float
    lbl: str = 'Unknown'

which has immutability and ordering out of the box, not to mention less boilerplate.
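A quick sketch of what comes free with typing.NamedTuple: defaults, ordering, and immutability:

```python
from typing import NamedTuple

class POI(NamedTuple):
    x: float
    y: float
    lbl: str = 'Unknown'

p1 = POI(1.0, 2.0, 'Alpha')
p2 = POI(1.0, 2.0)  ## default applies to lbl

assert p2.lbl == 'Unknown'
assert p1 < POI(2.0, 0.0)  ## ordering out of the box (tuple comparison)

try:
    p1.x = 99.9  ## immutability out of the box
except AttributeError:
    print('immutable, as expected')
```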

Data classes are only superior as far as I can see when mutability is wanted (frozen=False).

Another black mark against data classes – you have to import asdict from dataclasses to create a dictionary whereas namedtuples (and NamedTuples) have the _asdict() method built in.
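The dictionary-conversion difference side by side (a minimal sketch, with made-up POI classes):

```python
from dataclasses import dataclass, asdict
from typing import NamedTuple

class PoiNT(NamedTuple):
    x: float
    y: float

@dataclass
class PoiDC:
    x: float
    y: float

## built-in method on the NamedTuple - no extra import needed
assert PoiNT(1.0, 2.0)._asdict() == {'x': 1.0, 'y': 2.0}
## the data class needs asdict imported from dataclasses
assert asdict(PoiDC(1.0, 2.0)) == {'x': 1.0, 'y': 2.0}
```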

Note – it is very easy to swap from NamedTuple to dataclass should the need for mutability ever arise – all the field definitions are the same.

So the verdict is in favour of typing.NamedTuple.

That’s a lot of detail on something minor but I do use small data structures of this sort a lot to make code safer and more readable. Vanilla tuples are usually dangerous (they force you to reference items by index, which is brittle and unreadable); and dictionaries require consistent use of keys and it is potentially painful to change keys. NamedTuples seem like a perfect balance in Python. Make them a standard part of your coding practice.

Python Versions – What They Add

This is a personal list of what I really liked (and disliked) about each release of Python.

3.11

  • 19% speed improvement
  • more informative KeyError handling
  • TOML batteries included

3.10

  • Better messages for syntax errors e.g. "SyntaxError: '{' was never closed" reported for line 1, where the curly brace was opened, rather than a confusing error reported for a later, innocent line
  • Note – Structural Pattern Matching should be considered more of an anti-feature given its problems and its limited benefits for a dynamically typed language like Python

3.9

  • String methods removesuffix and removeprefix (NOT the same as rstrip(), which strips a set of characters rather than a literal suffix). Note – absence of underscores in the method names
  • Union operator for dicts (new dict which is an update of the first by the second) e.g. up_to_date_dict = orig_dict | fresh_dict
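Both 3.9 additions in action; note how rstrip treats its argument as a set of characters to strip, not a literal suffix:

```python
## removesuffix removes a literal suffix string
assert 'names.json'.removesuffix('.json') == 'names'
## rstrip strips ANY trailing characters in the set {'.', 'j', 's', 'o', 'n'},
## so the final 's' of 'names' goes too!
assert 'names.json'.rstrip('.json') == 'name'

## dict union: a new dict which is the first updated by the second
orig_dict = {'host': 'localhost', 'port': 8000}
fresh_dict = {'port': 8080}
up_to_date_dict = orig_dict | fresh_dict
assert up_to_date_dict == {'host': 'localhost', 'port': 8080}
```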

3.8

  • f-strings gained self-documenting debugging support e.g. f"{var=}"
  • walrus operator := (an antifeature with costs that outweigh benefits)
  • positional-only parameters (so parameter names can be changed without breaking calling code) – just as list comprehensions were extended to cover dictionaries and sets, this completes the coverage you’d expect (keyword-only parameters already existed)
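Minimal sketches of the two 3.8 features worth keeping (the divide function is made up for illustration):

```python
## self-documenting f-string: handy for quick debugging output
var = 42
assert f"{var=}" == 'var=42'

## positional-only parameters: everything before the / must be passed
## positionally, so the names can be renamed later without breaking callers
def divide(dividend, divisor, /):
    return dividend / divisor

assert divide(10, 4) == 2.5
## divide(dividend=10, divisor=4) would raise a TypeError
```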

3.7

  • Nothing

3.6

  • f-strings (massive)
  • Underscores in numbers e.g. 3_500_000 – as a data guy this is huge

Problem upgrading Dokuwiki after long neglect

When you upgrade Dokuwiki (a brilliant wiki BTW) it does it in-place. Unfortunately, I hadn’t upgraded for about 5 years in spite of numerous warnings, so the deprecated.php file bit me: it assumed specific files were present in my instance of Dokuwiki that needed deprecating, and broke when the system was so old they weren’t ;-). Once I figured the problem out the solution was simply to comment out the appropriate parts of deprecated.php. After all, there is no need to deprecate something you don’t even have.

Reminder: include

ini_set('display_errors', 1); ini_set('display_startup_errors', 1); error_reporting(E_ALL);

in index.php when trying to identify problems running a PHP app. And remove it afterwards.