ImageMagick cache resources exhausted resolved

My sofastatistics application relies on ImageMagick to convert PDFs to PNGs. The sort of command run under the hood was:

convert -density 1200 -borderColor "#ff0000" -border 1x1 -fuzz 1% -trim "/home/g/projects/sofastats_proj/storage/img_processing/pdf2img_testing/KEEPME/raw_pdf.pdf" "/home/g/projects/sofastats_proj/storage/img_processing/pdf2img_testing/density_1200_fuzz_on_#ff0000.png"
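
A minimal sketch of how an application might assemble this command from Python. This is not the actual SOFA Statistics code; build_convert_cmd, pdf_to_png, and their parameter names are hypothetical. Passing the arguments as a list means no shell quoting is needed for values such as #ff0000.

```python
import subprocess


def build_convert_cmd(pdf_path, png_path, density=1200,
                      border_colour="#ff0000", fuzz_pct=1):
    """Return the argument list for the ImageMagick convert call shown above."""
    return [
        "convert",
        "-density", str(density),
        "-borderColor", border_colour,
        "-border", "1x1",
        "-fuzz", f"{fuzz_pct}%",
        "-trim",
        pdf_path,
        png_path,
    ]


def pdf_to_png(pdf_path, png_path, **kwargs):
    """Run convert; raises CalledProcessError if ImageMagick reports failure."""
    subprocess.run(build_convert_cmd(pdf_path, png_path, **kwargs), check=True)
```

Lowering density (e.g. density=600) is the obvious knob to turn when a machine struggles with the higher resolutions.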

Recently, commands like this stopped working properly on my development machine. They wouldn’t handle high resolutions (600dpi seemed to be the limit for the images I was handling) and they took a very long time to complete.

I finally worked out what was going on by running the same tests on different machines.

Seemingly modest differences in CPU specs can create massive differences in the time required to convert PDFs to PNGs. What takes 4 seconds on an i7 can take 71 seconds on an i5. And creating a 1200 dpi image might take 0.5 minutes on an i7 and 18.5 minutes on an i5. So the slowdown was because I had shifted from a fast desktop to a (more convenient but slower) laptop.

The second issue was the error message about cache resources exhausted. This happened on a range of fast and slow machines and the amount of RAM seemed irrelevant. Interestingly, the problem only occurred on Ubuntu 17.04 and not 16.10. The reason was the policy.xml settings in /etc/ImageMagick-6/. It seems the following was set too low:

<policy domain="resource" name="disk" value="1GiB"/>
I changed it to:
<policy domain="resource" name="disk" value="10GiB"/>
and it would successfully create high-resolution PNGs even if it took a long time.

Hmmm – now I change the disk setting back and I am still able to make the higher-resolution images, even after rebooting. WAT?!

One other note – settings in policy.xml cannot be loosened through arguments supplied to the convert program via the CLI – they can only be tightened. It looks like these changes are all about security concerns with the intention of preventing malicious resource starvation.
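
To see which resource limits a policy file sets, something like the following sketch could help. The function name resource_limits and the sample XML are made up for illustration (the memory value in particular is invented); only the disk line mirrors the setting discussed above.

```python
import xml.etree.ElementTree as ET

# Invented sample in the same shape as policy.xml
SAMPLE_POLICY = """\
<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="disk" value="1GiB"/>
</policymap>
"""


def resource_limits(root):
    """Return {name: value} for every resource policy under root."""
    return {
        policy.get("name"): policy.get("value")
        for policy in root.iter("policy")
        if policy.get("domain") == "resource"
    }


print(resource_limits(ET.fromstring(SAMPLE_POLICY)))
```

For the real file, ET.parse("/etc/ImageMagick-6/policy.xml").getroot() should supply the root element, assuming the file is readable by the current user.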


Eclipse Neon with Pydev on Ubuntu 17.04

  1. Check whether 32 or 64 bit – System Settings > Details > Overview – then download from https://www.eclipse.org/downloads/
  2. Right click on eclipse-inst-linux32.tar.gz and select Extract Here
  3. cd into eclipse-installer/ then run ./eclipse-inst
  4. Choose top item (standard Java version of eclipse)
  5. Make a desktop icon for the launcher as per step 7 of “How to install Eclipse using its installer”, remembering to point to java-neon instead of java-mars etc. Drag the file onto the launcher and the icon will add itself and be operational
  6. If you don’t see toolbars in eclipse, modify the Exec line in the launcher so it starts with Exec=env SWT_GTK3=0 as per “eclipse doesn’t work with ubuntu 16.04”
  7. I also added pydev in the usual way using link to http://www.pydev.org/updates as per http://www.pydev.org/download.html

Shifting my git repo up a folder

My SOFA Statistics git repo was in a location that made sense at the time but became increasingly annoying. And I needed to restructure things anyway to prepare for snap packaging. Time for a shift.

Step 1 – shift existing .git folder and .gitignore
Step 2 – shift all the folders and content
Step 3 – git mv to relocate all folders and files relative to the new .git location so git recognises that the files are the same but relocated, i.e. it keeps all the history

Launchpad – Bazaar to Git

I’ve stored my SOFA Statistics code on launchpad since 2009 and used bazaar (bzr) to do it. But a lot has changed since then and I now use git on a daily basis in my job. So I’d much rather use git for SOFA. Fortunately that is now possible on launchpad.

I found https://help.launchpad.net/Code/Git to be useful apart from the migration instructions. These didn’t work for me. For example, I had no luck with sudo apt-get install bzr-fastimport. Instead I found https://design.canonical.com/2015/01/converting-projects-between-git-and-bazaar/.

Need ~/.bazaar/plugins

If plugins folder not there, cd ~/.bazaar; mkdir plugins

cd ~/.bazaar/plugins


bzr branch lp:bzr-fastimport fastimport

cd ~/projects/SOFA/sofastatistics/sofa.repo/sofa.main/

git init

bzr fast-export --plain . | git fast-import

gitk --all

YES! It’s all there

Archive .bzr in case

USER is launchpad-p-s in my case (yes – a strange choice which made sense at the time)

PROJECT is sofastatistics

So as per
[url "git+ssh://USER@git.launchpad.net/"]
insteadof = lp:

added the following to ~/.gitconfig

[url "git+ssh://launchpad-p-s@git.launchpad.net/"]
insteadof = lp:

Note – if not using lp: etc I had trouble with my ssh key – possibly something to do with confusion between user g and user launchpad-p-s.

I own my own project so to implement git remote add origin lp:PROJECT I ran:

git remote add origin lp:sofastatistics

Note: this only works if the insteadof setting has been added to ~/.gitconfig as described earlier (see https://help.launchpad.net/Code/Git)

Otherwise I would have to git remote add origin git+ssh://git.launchpad.net/sofastatistics

As per https://launchpad.net/PROJECT/+configure-code. I.e.

https://launchpad.net/sofastatistics/+configure-code

Confirmed by making a check folder then cloning the code in: git clone git://git.launchpad.net/~launchpad-p-s/sofastatistics
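
To make the insteadof setting above less mysterious, here is a small Python sketch that imitates the rewriting git performs: a URL that starts with the insteadof value has that prefix swapped for the configured base URL, and the longest matching prefix wins. The function rewrite_url is a hypothetical illustration, not part of git.

```python
def rewrite_url(url, insteadof_rules):
    """Apply git-style insteadOf rewriting.

    insteadof_rules maps a prefix to replace (e.g. "lp:") to the base URL
    that takes its place. The longest matching prefix wins.
    """
    matches = [prefix for prefix in insteadof_rules if url.startswith(prefix)]
    if not matches:
        return url  # no rule applies, so the URL is used as-is
    prefix = max(matches, key=len)
    return insteadof_rules[prefix] + url[len(prefix):]


rules = {"lp:": "git+ssh://launchpad-p-s@git.launchpad.net/"}
print(rewrite_url("lp:sofastatistics", rules))
# prints git+ssh://launchpad-p-s@git.launchpad.net/sofastatistics
```

So lp:sofastatistics is shorthand for the full git+ssh URL including the launchpad user name.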

Messed up Thunderbird folders – sharks circling

We’ve all done it – messed up something so badly while trying to do something clever we’d sell our first-born just to get back to where we were (see https://xkcd.com/349/). And if we succeed at merely restoring the status quo, we’re pitifully grateful.

It all started with trying to find some lost photos that Shotwell couldn’t find the originals of. Presumably they had been linked to only and then the originals deleted leaving only the reference and the thumbnail behind. So here was the plan:

  1. Identify all email attachments which are images
  2. See if any of them have the same name as the missing images
  3. Open email based on date and sender to recover image
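
The plan above can be sketched with the standard-library mailbox module. This is a reconstruction rather than the script I actually ran: find_image_attachments is a hypothetical name, and it assumes the Thunderbird folder is stored as a single mbox file.

```python
import mailbox


def find_image_attachments(mbox_path, wanted_names):
    """Return (filename, sender, date) for each wanted image attachment."""
    wanted = {name.lower() for name in wanted_names}
    hits = []
    # create=False: fail rather than create an empty mailbox on a bad path
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            for part in msg.walk():
                if part.get_content_maintype() != "image":
                    continue  # not an image part
                filename = part.get_filename()
                if filename and filename.lower() in wanted:
                    hits.append((filename, msg["From"], msg["Date"]))
    finally:
        box.close()
    return hits
```

The sender and date in each hit are enough to find the email again in Thunderbird. Running this against a copy of the mbox file, with Thunderbird closed, is a sensible precaution.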

The good news is the plan worked for lots of the missing photos. Thanks to Python3, import mailbox, and import email. The bad news was when I opened Thunderbird the next day. The folder I had been working on was missing. So in addition to my missing photos I also had seemingly lost 1.8GB of emails.

tldr; 1) Close TB; 2) rename the missing folder in your file system and delete the .msf version; 3) Open TB; 4) Close TB; restore original name; 5) Open TB – success? Inspired by http://kb.mozillazine.org/Empty_folders

Now back to the original problem. But I should probably run a full backup first.

Simple flask app on heroku – all steps (almost)

Note – instructions assume Ubuntu Linux.

See Getting Started with Python on Heroku (Flask) for the official instructions. The instructions below tackle things differently and include redis-specific steps.

Don’t need postgresql for my app even though needed for heroku demo app. Using redis for simple key-value store.

Main reason for each step is indicated in bold at start. There are lots of steps but there are lots of things being achieved. And each purpose only requires a few steps so probably hard to streamline any further.

  1. APP & BEST PRACTICE
    >> sudo apt-get install python3 python3-pip python-virtualenv git ruby redis-server redis-tools
  2. HEROKU
    Get free heroku account
  3. HEROKU
    Install heroku toolbelt (see Heroku setup). Sets up virtualenvwrapper for you too (one less thing to figure out)
  4. HEROKU
    Do the once-ever authentication
    >> heroku login
  5. APP
    Make project folder e.g.
    >> mkdir ~/projects/myproj
  6. APP
    >> cd ~/projects/myproj
  7. HEROKU
    >> echo "web: python main.py" > Procfile
  8. HEROKU & BEST PRACTICE
    >> git init
  9. HEROKU & BEST PRACTICE
    >> mkvirtualenv sticky

    So requirements for a specific project can be separated from other projects – lets heroku identify actual requirements. Normally “workon sticky” thereafter; deactivate to exit virtual env

  10. APP
    >> pip3 install flask
    Note – installed within virtualenv
  11. HEROKU
    Save the following as requirements.txt – needed by heroku so it knows the dependencies. Update version of redis as appropriate. gunicorn is a better approach than the flask test server
    flask
    gunicorn
    redis==2.10.3
  12. HEROKU
    So we can use Python 3.4 instead of the current default of 2.7:
    >> echo "python-3.4.3" > runtime.txt
  13. APP & HEROKU

    Make a toy app to get started from.

    Note – modify the standard demo flask app to add a port to ease eventual heroku deployment. Otherwise the app will fail because of a problem with the port when running

    heroku ps:scale web=1

    Starting process with command `python main.py`
    ...
    Web process failed to bind to $PORT within 60 seconds of launch

    Here is an example (may need updating if flask changes):

    import os
    from flask import Flask
    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello World!"

    if __name__ == "__main__":
        port = int(os.environ.get("PORT", 33507))
        app.run(host='0.0.0.0', port=port)

  14. BEST PRACTICE
    >> deactivate
  15. Make a module to make it easier to work with redis – let’s call it store.py:

    import os
    import urllib.parse
    import redis

    url = urllib.parse.urlparse(os.environ.get('REDISTOGO_URL',
        'redis://localhost:6379'))
    redis = redis.Redis(host=url.hostname, port=url.port, db=0,
        password=url.password)

    We can then use redis like this:
    from store import redis

  16. APP
    Keep building app locally. The following is good for redis: Redis docs. And Flask’s docs are always good: Flask Docs – Minimal Application
  17. HEROKU & BEST PRACTICE

    Before deploying to production:

    1. Update git otherwise you’ll be deploying old code – heroku uses git for deployment
    2. set app.debug to False (although no rush when just getting started and not expecting the app to get hit much)
    3. probably switch to gunicorn sooner or later (will need to change Procfile to
      web: gunicorn main:app --workers $WEB_CONCURRENCY
      )
    4. Example nginx.conf:

      # As long as /etc/nginx/sites-enabled/ points to
      # this conf file nginx can use it to work with
      # the server_name defined (the name of the file
      # doesn't matter - only the server_name setting)
      # sudo ln -s /home/vagrant/src/nginx.conf ...
      #     ... /etc/nginx/sites-enabled/myproj.com
      # Confirm this link is correct
      # e.g. less /etc/nginx/sites-enabled/myproj.com

      server {
          listen 80;
          server_name localhost;

          location /static { # static content is

              # handled directly by NGINX which means
              # the nginx user (www-data) will need
              # read permissions to this folder
              root /home/vagrant/src;

          }

          location / { # all else passed to Gunicorn to handle
              # Pass to wherever I bind Gunicorn to serve to
              # Only gunicorn needs rights to read, write,
              # and execute scripts in the app folders
              proxy_pass http://127.0.0.1:8888;
          }
      }

    5. Example gunicorn.conf
      import multiprocessing

      bind = "127.0.0.1:8888" # ensure nginx passes to this port
      logfile = "/home/vagrant/gunicorn.log"
      workers = multiprocessing.cpu_count() * 2 + 1

  18. HEROKU
    >> heroku create

    Should now be able to browse to the url supplied as stdout from command e.g.
    https://not-real-1234.herokuapp.com/. Note – not working yet – still need to deploy to new app

    >> git push heroku master

    Must then actually spin up the app:

    >> heroku ps:scale web=1

    A shortcut for opening is

    >> heroku open

  19. HEROKU
    Add redis support. Do this after the first deployment, otherwise the command fails with:

    ! No app specified.
    ! Run this command from an app folder or specify which app to use with --app APP.

    >> heroku addons:create redistogo

    Note – need to register credit card to use any add-ons, even if free ones. Go to https://heroku.com/verify
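
Steps 13 and 15 above both take their settings from environment variables that heroku sets, with a fallback for local development. The sketch below pulls that pattern out into two small functions that can be tried without flask or redis installed. The names web_port and redis_settings, and the example URL, are made up for illustration.

```python
import os
from urllib.parse import urlparse


def web_port(environ=os.environ, default=33507):
    """Port to bind to: $PORT if set (as on heroku), else the local default."""
    return int(environ.get("PORT", default))


def redis_settings(environ=os.environ):
    """Split the redis URL into the keyword arguments passed to redis.Redis."""
    url = urlparse(environ.get("REDISTOGO_URL", "redis://localhost:6379"))
    return {
        "host": url.hostname,
        "port": url.port,
        "db": 0,
        "password": url.password,
    }


# Made-up values standing in for what a hosted environment might supply
fake_env = {"PORT": "5000",
            "REDISTOGO_URL": "redis://user:secret@example.com:9876/"}
print(web_port(fake_env))
print(redis_settings(fake_env))
print(redis_settings({}))  # local fallback: localhost:6379, no password
```

Keeping the environment lookup in one place makes it easier to run the same code locally and on heroku.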

Some other points: when developing on a different machine, I needed to supply my public key to heroku from that other machine (Permission denied (publickey) when deploying heroku code. fatal: The remote end hung up unexpectedly).

heroku keys:add ~/.ssh/id_rsa.pub

And the full sequence for upgrading your app after the prerequisites have been fulfilled is:

  1. git commit to local repo
  2. Then git push to heroku
  3. Then run heroku ps:scale web=1 again

And I had a problem with redis when I switched from Python 2 to 3 – my heroku push wouldn’t work. By looking at the logs (>> heroku logs --tail) I found that import imap wouldn’t work. Searching on that error showed I needed a newer version of redis than the one I had foolishly pinned in requirements.txt.

Complex good – complicated bad

In the Zen of Python we are taught that complex is better than complicated. Which is fair enough if we understand the terms as follows:

It is ok if something is complex so long as it is not complicated.

complex: composed of many interconnected parts; compound; composite

complicated: difficult to analyze or understand

Complex vs Complicated

Any decent web framework is going to be a bit complex because of all the moving parts it has to handle. But it should make sense and be logically structured enough to avoid being overly complicated.

Eclipse and PyDev on Utopic

I upgraded to Utopic (Utopic Unicorn a.k.a 14.10) and eclipse wouldn’t complete loading anymore. Solution:

Download latest plain vanilla Eclipse from the standard downloads page. And feel free to donate something too.

sudo su

chown -R root:root /home/username/Downloads/eclipse && mv /home/username/Downloads/eclipse /opt

ln -s /opt/eclipse/eclipse /usr/local/bin/eclipse && exit

Start by running:

eclipse

It didn’t even break PyDev so my luck’s finally turning ;-).

https://www.tumblr.com/search/install+eclipse+ubuntu

IDLE3 as default for py files on Ubuntu

Yes – I know, there are better alternatives to IDLE out there, but I am used to it for quick and dirty changes to python files (I use eclipse + pydev for more serious work). And I am increasingly making the switch to Python 3. So when I double click on a py file, odds are I want to open it with IDLE for Python 3 not Python 2.

Start by making sure you have a desktop file like the following:

gksudo gedit /usr/share/applications/idle-python3.4.desktop

[Desktop Entry]
Name=IDLE (using Python-3.4)
Comment=Integrated Development Environment for Python (using Python-3.4)
Exec=/usr/bin/idle-python3.4
Icon=/usr/share/pixmaps/python3.4.xpm
Terminal=false
Type=Application
Categories=Application;Development;
StartupNotify=true

Then make the desktop entry the default for python files:

gedit ~/.local/share/applications/mimeapps.list

[Default Applications]
text/x-python=idle-python3.4.desktop

Note – no trailing semi-colon.
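
To check which desktop entry a mimeapps.list file names as the default for a MIME type, a short sketch with configparser can help. The function default_app is a hypothetical helper and the sample text simply mirrors the entry above.

```python
import configparser

SAMPLE = """\
[Default Applications]
text/x-python=idle-python3.4.desktop
"""


def default_app(mimeapps_text, mime_type):
    """Return the first desktop file listed for mime_type, or None."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(mimeapps_text)
    value = parser.get("Default Applications", mime_type, fallback=None)
    if value is None:
        return None
    # Tolerate a semicolon-separated list by taking the first non-empty entry
    entries = [entry for entry in value.split(";") if entry]
    return entries[0] if entries else None


print(default_app(SAMPLE, "text/x-python"))
```

To inspect the real file, read its contents and pass the text in as mimeapps_text.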

And in Linux Mint:

ls /usr/share/applications/

identify appropriate .desktop file

gedit /usr/share/applications/defaults.list

add the appropriate .desktop file reference at the front of the python line as appropriate.

Saddest Programming Concept Ever

Python has spoiled me for other languages – I accept that – but I still wasn’t fully prepared for some of the horrors I discovered in JavaScript. Which made the satiric article by James Mickens, “To Wash It All Away”, all the more enjoyable. Here is a slice I especially liked:

Much like C, JavaScript uses semicolons to terminate many kinds of statements. However, in JavaScript, if you forget a semicolon, the JavaScript parser can automatically insert semicolons where it thinks that semicolons might ought to possibly maybe go. This sounds really helpful until you realize that semicolons have semantic meaning. You can’t just scatter them around like you’re the Johnny Appleseed of punctuation. Automatically inserting semicolons into source code is like mishearing someone over a poor cell-phone connection, and then assuming that each of the dropped words should be replaced with the phrase “your mom.” This is a great way to create excitement in your interpersonal relationships, but it is not a good way to parse code. Some JavaScript libraries intentionally begin with an initial semicolon, to ensure that if the library is appended to another one (e.g., to save HTTP roundtrips during download), the JavaScript parser will not try to merge the last statement of the first library and the first statement of the second library into some kind of semicolon-riven statement party. Such an initial semicolon is called a “defensive semicolon.” That is the saddest programming concept that I’ve ever heard, and I am fluent in C++.