New Project to Develop Open Source Statistics Program

SOFA (Statistics Open For All) is a new, open source statistics program currently under development (see http://www.sofastatistics.com). The development version of SOFA can already connect directly to a range of different databases and lets users display results in an attractive format ready to share or put in a spreadsheet. A packaged release is being prepared.

SOFA focuses on being easy to use rather than on providing a comprehensive array of tests. So while SOFA won’t replace sophisticated statistics systems like R, there is a good chance it will do what a large number of people need and do it well.

SOFA is written in Python and will work on PCs, Macs, and Linux computers (e.g. Ubuntu).

People interested in using a free statistics program are welcome to contact the project (http://sofastatistics.com/contact.php).

wxWebkit on Intrepid

wxWebKit is a very important cross-platform control for the wxPython GUI toolkit. It enables you to display complex HTML pages, including live pages from the web, for things such as tabular output or help files.

Fortunately, wxWebkit can be made to work on Ubuntu Intrepid. Indeed, I am editing this blog item right now from within a wxPython wxWebKit control :-). The process is not straightforward at this time, and the result is not properly installed as such, but you can start testing it after following the steps suggested below.

[Update – there is now a deb file you can use thanks to Christoph Willing. Use the following commands:
# NB the next line is one long command line
sudo wget http://www.vislab.uq.edu.au/debuntu/sources.list.d/intrepid.list -O /etc/apt/sources.list.d/uqvislab.list

sudo apt-get update

sudo apt-get install python-webkitwx
]

I was never able to get wxWebKit to compile and work following the instructions here http://wxwebkit.wxcommunity.com/index.php?n=Main.Requirements. I expect this will change in the future.

Here are some instructions that have worked for me on more than one machine. My deepest thanks go to Christoph Willing and Kevin Ollivier.

Get the correct wx swig deb from http://www.vislab.uq.edu.au/debuntu/intrepid/swigwx1.3_1.3.29_i386.deb and install it.

Get a patched bakefile deb from http://www.vislab.uq.edu.au/debuntu/intrepid/bakefile_0.2.5-1_i386.deb and install it.

Get intrepid_prereqs and i_files.tar.gz from http://www.vislab.uq.edu.au/research/accessgrid/software/debuntu/wxwebkit/.

Run the prereqs file from the folder you have stored it in e.g. Desktop:

cd ~/Desktop
sudo bash intrepid_prereqs

Then extract the i_files folder from i_files.tar.gz and put it (the folder with its contents, not just the contents) under: /usr/include/wx-2.8/wx/wxPython
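If you would rather script the extraction step, here is a minimal sketch using Python's standard tarfile module. The function name extract_archive is made up for illustration, and writing under /usr/include will need root rights (e.g. run the script with sudo).

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir; return its top-level entries."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    # Reduce every member name to its first path component,
    # e.g. "i_files/foo.i" -> "i_files"
    return sorted(set(os.path.normpath(name).split(os.sep)[0] for name in names))


# Hypothetical usage, keeping the folder (not just its contents) intact:
# extract_archive("i_files.tar.gz", "/usr/include/wx-2.8/wx/wxPython")
```

Because the archive is extracted as-is, the i_files folder itself ends up under the destination, which is what the step above asks for.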

OK – the preparation is done. Now to get the source by checking out the subversion repository:

cd ~
svn checkout http://svn.webkit.org/repository/webkit/trunk WebKit

Then cd into ~/WebKit/WebKitTools/Scripts and run:
./build-webkit --wx --wx-args="wxgc wxpython" 2>&1 | tee op

NB if your build fails for some reason and you want to run it again, run clean first:
./build-webkit --wx --wx-args="wxgc wxpython" --clean

The file op will be made in ~/WebKit/WebKitTools/Scripts so you can check what happened. If your build is successful there will be a clear message to that effect at the end. Regrettably, an absence of error messages is not the presence of success ;-).
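Since the log can be long, a small helper can pull out the parts worth reading. This is only a sketch: the function name summarize_build_log is invented here, and it makes no assumption about the exact wording of the success message, so it simply returns the lines mentioning "error" plus the tail of the file for you to read.

```python
import io


def summarize_build_log(path, tail=15):
    """Return (lines mentioning 'error', the last `tail` lines) of a build log."""
    with io.open(path, encoding="utf-8", errors="replace") as log:
        lines = [line.rstrip("\n") for line in log]
    error_lines = [line for line in lines if "error" in line.lower()]
    return error_lines, lines[-tail:]


# Hypothetical usage from ~/WebKit/WebKitTools/Scripts:
# errors, ending = summarize_build_log("op")
# print("\n".join(ending))
```

An empty error list proves nothing on its own, so the tail is returned as well; that is where the build's final message should appear.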

To test wxWebKit you will need to save the following file in the ~/WebKit/WebKitBuild/Release folder (this is a minor variation of the standard test file to sidestep some bugs).

#!/usr/bin/python

# Copyright (C) 2007 Kevin Ollivier All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY APPLE COMPUTER, INC. ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE COMPUTER, INC. OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import wx
import webview

class TestPanel(wx.Panel):
    def __init__(self, parent, log, frame=None):
        wx.Panel.__init__(self, parent, -1,
            style=wx.TAB_TRAVERSAL|wx.CLIP_CHILDREN|wx.NO_FULL_REPAINT_ON_RESIZE)
        self.log = log
        self.current = "http://wxPython.org/"
        self.frame = frame

        if frame:
            self.titleBase = frame.GetTitle()

        sizer = wx.BoxSizer(wx.VERTICAL)
        btnSizer = wx.BoxSizer(wx.HORIZONTAL)

        self.webview = webview.WebView(self, -1)

        btn = wx.Button(self, -1, "Open", style=wx.BU_EXACTFIT)
        self.Bind(wx.EVT_BUTTON, self.OnOpenButton, btn)
        btnSizer.Add(btn, 0, wx.EXPAND|wx.ALL, 2)

        btn = wx.Button(self, -1, "<--", style=wx.BU_EXACTFIT)
        self.Bind(wx.EVT_BUTTON, self.OnPrevPageButton, btn)
        btnSizer.Add(btn, 0, wx.EXPAND|wx.ALL, 2)

        btn = wx.Button(self, -1, "-->", style=wx.BU_EXACTFIT)
        self.Bind(wx.EVT_BUTTON, self.OnNextPageButton, btn)
        btnSizer.Add(btn, 0, wx.EXPAND|wx.ALL, 2)

        btn = wx.Button(self, -1, "Stop", style=wx.BU_EXACTFIT)
        self.Bind(wx.EVT_BUTTON, self.OnStopButton, btn)
        btnSizer.Add(btn, 0, wx.EXPAND|wx.ALL, 2)

        btn = wx.Button(self, -1, "Refresh", style=wx.BU_EXACTFIT)
        self.Bind(wx.EVT_BUTTON, self.OnRefreshPageButton, btn)
        btnSizer.Add(btn, 0, wx.EXPAND|wx.ALL, 2)

        txt = wx.StaticText(self, -1, "Location:")
        btnSizer.Add(txt, 0, wx.CENTER|wx.ALL, 2)

        self.location = wx.ComboBox(self, -1, "",
            style=wx.CB_DROPDOWN|wx.PROCESS_ENTER)

        self.Bind(wx.EVT_COMBOBOX, self.OnLocationSelect, self.location)
        self.location.Bind(wx.EVT_KEY_UP, self.OnLocationKey)
        self.location.Bind(wx.EVT_CHAR, self.IgnoreReturn)
        btnSizer.Add(self.location, 1, wx.EXPAND|wx.ALL, 2)

        sizer.Add(btnSizer, 0, wx.EXPAND)
        sizer.Add(self.webview, 1, wx.EXPAND)

        self.webview.LoadURL(self.current)
        self.location.Append(self.current)

        # self.webview.Bind(webview.EVT_WEBVIEW_STATE_CHANGED, self.OnStateChanged)

        self.SetSizer(sizer)

    def OnStateChanged(self, event):
        statusbar = self.GetParent().GetStatusBar()
        if statusbar:
            if event.GetState() == webview.WEBVIEW_STATE_NEGOTIATING:
                statusbar.SetStatusText("Contacting " + event.GetURL())
            elif event.GetState() == webview.WEBVIEW_STATE_TRANSFERRING:
                statusbar.SetStatusText("Loading " + event.GetURL())
            elif event.GetState() == webview.WEBVIEW_STATE_STOP:
                statusbar.SetStatusText("")
                self.location.SetValue(event.GetURL())
                self.GetParent().SetTitle("wxWebView - " + self.webview.GetPageTitle())

    def OnLocationKey(self, evt):
        if evt.GetKeyCode() == wx.WXK_RETURN:
            URL = self.location.GetValue()
            self.location.Append(URL)
            self.webview.LoadURL(URL)
        else:
            evt.Skip()

    def IgnoreReturn(self, evt):
        if evt.GetKeyCode() != wx.WXK_RETURN:
            evt.Skip()

    def OnLocationSelect(self, evt):
        url = self.location.GetStringSelection()
        self.webview.LoadURL(url)

    def OnOpenButton(self, event):
        dlg = wx.TextEntryDialog(self, "Open Location",
            "Enter a full URL or local path",
            self.current, wx.OK|wx.CANCEL)
        dlg.CentreOnParent()

        if dlg.ShowModal() == wx.ID_OK:
            self.current = dlg.GetValue()
            self.webview.LoadURL(self.current)

        dlg.Destroy()

    def OnPrevPageButton(self, event):
        self.webview.GoBack()

    def OnNextPageButton(self, event):
        self.webview.GoForward()

    def OnStopButton(self, evt):
        self.webview.Stop()

    def OnRefreshPageButton(self, evt):
        self.webview.Reload()

class wkFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, -1, "WebKit in wxPython!")

        self.panel = TestPanel(self, -1)
        # self.panel.webview.LoadURL("http://www.wxwidgets.org/")
        self.CreateStatusBar()

class wkApp(wx.App):
    def OnInit(self):
        self.webFrame = wkFrame()
        self.SetTopWindow(self.webFrame)
        self.webFrame.Show()

        return True

app = wkApp(redirect=False)
app.MainLoop()

Assuming you named that file my_wxwebkit_test.py, cd into ~/WebKit/WebKitBuild/Release and run

export LD_LIBRARY_PATH=`pwd`

so that the library files can be found in spite of their unexpected location (because we haven’t properly installed our files into the standard locations). Then run

python my_wxwebkit_test.py

and enjoy :-).
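The same two steps can be wrapped in a small launcher so the library path does not have to be exported by hand each time. This is a sketch under assumptions: the helper names are invented, and unlike the export command above it prepends the folder to any existing LD_LIBRARY_PATH rather than replacing it.

```python
import os
import subprocess
import sys


def env_with_library_path(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + existing
    else:
        env["LD_LIBRARY_PATH"] = lib_dir
    return env


def run_test_script(script="my_wxwebkit_test.py", lib_dir=None):
    """Run the test script from lib_dir (default: current folder)."""
    lib_dir = lib_dir or os.getcwd()
    return subprocess.call([sys.executable, script],
                           env=env_with_library_path(lib_dir), cwd=lib_dir)


# Hypothetical usage:
# run_test_script(lib_dir=os.path.expanduser("~/WebKit/WebKitBuild/Release"))
```

The variable has to be set before the child Python process starts, which is why it is passed through env rather than assigned to os.environ inside the test script itself.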

Better Python Console Gedit Plugin

Just came across Better Python Console – which is a plugin for Gedit, the main text editor in Ubuntu. Think of it as being like IDLE. I use Eclipse with PyDev and the PyDev extension for heavy development and the console for anything quick and dirty.

Get the plugin, plus some basic info, here –
http://live.gnome.org/Gedit/Plugins/BetterPythonConsole/Walkthrough

Remember, to work successfully, the betterpythonconsole needs the plugin file (“betterpythonconsole.gedit-plugin”) copied into the /usr/lib/gedit-2/plugins folder. The betterpythonconsole folder (containing 4 or so scripts only) needs to be there as well.
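The copy step can be scripted with shutil if you install the plugin on more than one machine. A rough sketch only: install_gedit_plugin is a made-up name, the file and folder names are the ones mentioned above, and copying into /usr/lib/gedit-2/plugins will need root rights.

```python
import os
import shutil


def install_gedit_plugin(source_dir, plugins_dir,
                         plugin_file="betterpythonconsole.gedit-plugin",
                         plugin_folder="betterpythonconsole"):
    """Copy the .gedit-plugin file and its script folder into plugins_dir."""
    target_file = os.path.join(plugins_dir, plugin_file)
    target_folder = os.path.join(plugins_dir, plugin_folder)
    shutil.copy(os.path.join(source_dir, plugin_file), target_file)
    # Replace any earlier copy of the folder so stale scripts do not linger
    if os.path.isdir(target_folder):
        shutil.rmtree(target_folder)
    shutil.copytree(os.path.join(source_dir, plugin_folder), target_folder)
    return target_file, target_folder


# Hypothetical usage (download folder is an assumption):
# install_gedit_plugin(os.path.expanduser("~/Desktop"), "/usr/lib/gedit-2/plugins")
```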

Eclipse in Ubuntu with PyDev and Pydev Extension

Installing eclipse and pydev is done from synaptic and is straightforward. Installing pydev extensions happens from Help > Software Updates > Find and Install > Search for new features to install. Add a remote site with the URL http://pydev.sourceforge.net/updates/

NB we only want pydev, not the optional pydev mylyn integration (if you get nagged for things about mylyn you forgot to uncheck that part 😉 ).

When installing plugins, start eclipse from the terminal as root
sudo eclipse
otherwise you lack the rights to add certain folders etc. See http://ubuntuforums.org/showthread.php?t=568894.

Then set interpreter for python: Window > Preferences > PyDev > Interpreter – Python > New
to /usr/bin/python2.5 etc.

Kexi review (Access bruiser but not Access Killer yet)

After my disappointing experiences with OpenOffice Base last year I was worried that Kexi wouldn’t be much good. But it was – within its limits. The interface seemed excellent and intuitive, and even though it wasn’t anywhere near as familiar to me as the Access interface (many thousands of hours), I found myself getting quite fast at it quite easily. My real issues relate to scripting. It was exciting to see that I could use python with Kexi as my scripting language, but it is not clear yet how to add lots of sophisticated functionality, e.g. when I update this field I want to check 3 other values and some data in the database, enable or disable some widgets, and produce some messageboxes. This may well be easy (someone please correct me) but I don’t have lots of time to gamble on this if it is not ready. If I can get that sorted (including lots of good documentation on how to control and read from the widgets using python), I will take the next step and check out Kugar – the reporting part of Kexi. If all these parts work, and there is at least one decent book on Kexi in English, we might actually have an Access Killer. Kexi + MySQL + Kugar could be a winning combination for rapid application development – especially by non-programmers.

[update version 2.2 is released and looks promising – http://www.koffice.org/news/koffice-2-2-released/]

PyInstaller Round 2

Round 1 was nearly 18 months ago (PyInstaller1.2) and enabled me to successfully deploy a GUI application as a folder (although the XP button styles weren’t quite right – solved below).

Round 2 lasted about 8 hours and involved pyInstaller 1.3 on XP with python 2.5, wxPython 2.8.7.1, and win32com 2.1.0. If I’d had these tips at the beginning it would have taken 20 minutes max 🙁 :

  1. Don’t name any folders “python”. It shouldn’t matter and usually it doesn’t, but sometimes it does. Use mypy or something similar instead. That could have saved 6 hours right there 😉 . If a module was referred to as python.msaccess, for example, the python part would be treated as a package and expected to have an __init__.py file. NB everything worked fine except when it was processed into an executable by pyinstaller.
  2. When testing the build process of a spec file, set console to True (or 1) and debug to True (or 1).
  3. When running the build process from a batch file, add the command
    pause
    as the final line. Then you can see all the errors, if any, and have a chance at fixing them. You can also add something like raw_input("Hit Enter to continue") at the end of Config.py etc. to ensure files like upx are configured successfully.
  4. To identify problems with the executable, run it from a batch file, and include pause as the final command on its own line. NB make the exe with debug=True and console=True for these steps (revert when the issues are fixed).
  5. Set console=False (or 0) so that XP buttons look more attractive than the older, rectangular form.
    [Screenshots: rounded XP buttons vs rectangular XP buttons]

    XP and wxPython and XP buttons etc
  6. If doing a single file executable, remember to add a.binaries after a.scripts
    a.scripts, a.binaries,
    and set exclude_binaries=False
  7. It really is quite easy to edit a spec file – it is just python after all.
  8. Scripts should be without comma separation and with single backslashes in the makespec process (Batch file), and with commas and double backslashes in the python spec file.
  9. Icon images etc can be kept in the same folder as the executable for ease of portability. Inside the script they should not have an absolute path.
  10. Start with a simple HelloWorld test as per the very helpful http://www.thescripts.com/forum/thread579554.html to check all systems are working before tackling a more complex, real-world example which will require getting down and dirty with the spec document.
  11. python "…Build.py" "….." won’t work on my system; I need "C:\Python25\python.exe" "etc …"
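Tip 9 (keep icons next to the executable and avoid absolute paths) can be sketched as follows. The helper names app_dir and resource_path are invented for illustration, and the sketch assumes the packaged program has a sys.frozen attribute set, which should be checked against the PyInstaller version in use.

```python
import os
import sys


def app_dir():
    """Folder holding the executable (if frozen) or the script being run."""
    if getattr(sys, "frozen", False):
        # Assumption: the packager sets sys.frozen in the built executable
        return os.path.dirname(os.path.abspath(sys.executable))
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def resource_path(filename, base_dir=None):
    """Build a path to a resource stored alongside the application."""
    return os.path.join(base_dir or app_dir(), filename)


# Hypothetical usage inside the GUI code:
# icon = wx.Icon(resource_path("myapp.ico"), wx.BITMAP_TYPE_ICO)
```

Resolving paths at run time like this means the whole folder can be moved to another machine without editing the script.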

There is excellent documentation available in the PyInstaller Manual.

UPDATE: got a mysterious error on a script that worked well until it was processed by pyinstaller. Without all the gruesome details, it was because I had the wxPython application set to not redirect its output AND I had some print statements tucked away in some code. At some point (when the printed output reached 4096 bytes?) up popped an [Error 9] Bad file descriptor error. In future, diagnose by making an output.txt file and setting redirect to true. What is in the output file?
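In wxPython the redirect is controlled by the redirect and filename arguments to wx.App, e.g. wx.App(redirect=True, filename="output.txt"). The same idea in plain Python, for code that does not go through wx, might look like the sketch below; the class name is invented and the file name output.txt is just the one suggested above.

```python
import sys


class redirect_output_to_file(object):
    """Context manager sending print output and tracebacks to a file."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self._file = open(self.path, "a")
        self._saved = (sys.stdout, sys.stderr)
        sys.stdout = sys.stderr = self._file
        return self._file

    def __exit__(self, exc_type, exc, tb):
        # Always restore the original streams, even after an exception
        sys.stdout, sys.stderr = self._saved
        self._file.close()
        return False


# Hypothetical usage around the application's entry point:
# with redirect_output_to_file("output.txt"):
#     main()
```

With no console attached, stray print statements then land in the file instead of a closed file descriptor, and the file shows which code was doing the printing.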

UPDATE: Use the -w parameter to skip the DOS window during execution.

UPDATE: Pyinstaller 1.5 with Python 2.6 (Round 3)

Sluggish MySQL because of ContentProtect

A program I wrote using python and MySQL ran much, much slower on my development computer than on the client’s computer. And their computer seemed about the same spec as mine. I upgraded from python 2.4 to 2.5. Nope. Was it the RAM? Nope – they had half a GB to my 1 GB. Was it the hard drive? The L1 and L2 cache? Who could tell.

A friend (a very good friend) suggested it might be a MySQL setting. I was looking around in there when I noticed the named pipe alternative to the TCP/IP protocol. TCP/IP. Hmmmm I had sort of noticed the wireless network icon flickering in the taskbar when running MySQL. Could it be ….. ContentProtect!?

I reconfigured MySQL to not use TCP/IP:

connMySQL = MySQLdb.connect(host=maint.DB_HOST, user=maint.DB_USER,
passwd=maint.DB_PWD, db=maint.DB_DATABASE,
named_pipe=maint.NAMED_PIPE)

http://mysql-python.sourceforge.net/MySQLdb.html

… and the program suddenly flew! I could barely keep up with the screen output. Unfortunately my favourite MySQL manager, SQLyog, did not support named pipes.

http://www.webyog.com/forums//index.php?s=035d8d492234b760704ed30c35331bdd&showtopic=2314&view=findpost&p=10917

But if I switched on support for TCP/IP the code relying on named pipes stopped working. I would get “error 2017: can’t open named pipe to host”.

The answer was to manually add one line to my my.ini file – namely:
enable-named-pipe

Now I could use the named pipes where possible, thus sidestepping the numbing effect of ContentProtect, while still being able to use tools that required or expected TCP/IP.
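To confirm the option is actually present, my.ini can be read with Python's configparser (Python 3 module name). This is a sketch under assumptions: the function name is made up, the option is assumed to sit in the [mysqld] section, and the my.ini path shown in the usage comment is only a guess at a typical install location.

```python
import configparser


def named_pipes_enabled(ini_text, section="mysqld"):
    """True if enable-named-pipe appears in the given section of my.ini text."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(ini_text)
    return (parser.has_section(section)
            and parser.has_option(section, "enable-named-pipe"))


# Hypothetical usage (path is an assumption):
# with open(r"C:\Program Files\MySQL\MySQL Server 5.0\my.ini") as f:
#     print(named_pipes_enabled(f.read()))
```

allow_no_value matters here because enable-named-pipe is a bare flag with no "= value" part, which the parser would otherwise reject.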

A useful reference is on:
http://dev.mysql.com/doc/refman/5.0/en/can-not-connect-to-server.html#can-not-connect-to-server-on-windows

Running MySQL scripts (.sql files) from python

This should have been more obvious. All I wanted was to run a simple script e.g. myscript.sql in MySQL from python in Windows.

Some things worked fine from the DOS prompt but failed from within python (and RUN for that matter).

Here is the answer in the form of a simple function (NB to get your indentation right !):

def run_sql_script(scriptname):
    "Run a script in MySQL"
    import subprocess
    import time
    # Adjacent string literals are joined into one long command line.
    # DB_HOST, DB_USER and DB_PWD are expected to be defined at module level.
    args = ("\"C:\\Program Files\\MySQL\\MySQL Server 5.0\\bin\\mysql.exe\" "
        "-h%s -u%s -p%s --database=databasename "
        "< C:/Projects/projectname/3_scripts/%s"
        % (DB_HOST, DB_USER, DB_PWD, scriptname))
    #print(args)
    child = subprocess.Popen(args=args, shell=True, executable="C:\\windows\\system32\\cmd.exe")
    #need to check whether finished or not every so often
    running = True
    elapsed = 0
    while running:
        time.sleep(10)
        elapsed = elapsed + 10
        if child.poll() is None:
            # poll() returns None while the process is still running
            elapsed_mins = float(elapsed)/60
            print("%.2f minutes elapsed running %s" % (elapsed_mins, scriptname))
        else:
            print("Finished running script " + scriptname)
            running = False

Key points: can't use call, must use full Popen and explicitly name the shell (and use it!).

NB scripts can run for a long time e.g. 30 minutes so it is a good idea to make the function keep the user informed.
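The progress loop can be pulled out into a reusable helper that also reports the exit code. A minimal sketch: the function name wait_with_progress and its parameters are invented, and the usage line launches a stand-in command rather than mysql.exe.

```python
import subprocess
import sys
import time


def wait_with_progress(child, label, poll_seconds=10, report=None):
    """Poll a Popen object until it exits, reporting elapsed time as it goes."""
    if report is None:
        report = lambda msg: sys.stdout.write(msg + "\n")
    elapsed = 0
    while child.poll() is None:  # None means the process is still running
        time.sleep(poll_seconds)
        elapsed += poll_seconds
        report("%.2f minutes elapsed running %s" % (elapsed / 60.0, label))
    report("Finished running %s (exit code %s)" % (label, child.returncode))
    return child.returncode


# Stand-in usage; replace the command with the Popen call for your script:
# child = subprocess.Popen([sys.executable, "-c", "print('working')"])
# wait_with_progress(child, "demo", poll_seconds=1)
```

Returning the exit code gives the caller a way to tell a script that finished from one that failed part-way through.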

If you want to kill it, open Task Manager and kill mysql.exe.

In Windows there are apparently some horrible compromises to be made when doing some simple things. See ...adventures-in-python-launching-subprocesses/

There is an alternative approach but it never seems to end the subprocess (cmd's /k switch keeps the shell open after the command has run, whereas /c closes it):

#args = "cmd /k \"C:\\Program Files\\MySQL\\MySQL Server 5.0\\bin\\mysql.exe\" -h%s -u%s -p%s --database=databasename < C:\\Projects\\projectname\\test.sql" % (DB_HOST, DB_USER, DB_PWD)
#print args
#http://www.realtechnews.com/posts/2777 re: the /k !!!!!

Extracting text from PDFs using python and pdftotext

The answer was reasonably simple but it was very gruelling to obtain ;-). Firstly, the false leads:

1) Prescript proved to be an out-of-date, unsupported waste of time.

2) Ghostscript has never had much emphasis on user-friendliness or documentation. Was hoping to use its pdf2ascii functionality. Can’t remember precisely what happened but I think it only generated error messages for me.

3) pyPdf looks promising (the text extract functionality is still quite recent) but it didn’t get the text in the correct order – should probably revisit it later:

import pyPdf
"""http://pybrary.net/pyPdf/"""

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    #content = " ".join(content.replace("\xa0", " ").strip().split())
    return content

print getPDFContent("pdfs/test.pdf")

—-

But I repeat – watch this option for the future. The developer is right onto it, as can be seen from the comment for the extractText method from the pdf.py module:

# Locate all text drawing commands, in the order they are provided in the
# content stream, and extract the text. This works well for some PDF
# files, but poorly for others, depending on the generator used. This will
# be refined in the future. Do not rely on the order of text coming out of
# this function, as it will change if this function is made more
# sophisticated.
#

# Stability: Added in v1.7, will exist for all future v1.x releases. May
# be overhauled to provide more ordered text in the future.
# @return a string object

http://pybrary.net/pyPdf/

4) pdftotext – bingo

Install pdftotext (a breeze in Ubuntu via Synaptic). In Windows refer to the brilliant, user-friendly documentation of Jeff Porter www.ire.org/training/nettour/pdf/PDFTOTEXT.pdf for step-by-step instructions.

http://www.foolabs.com/xpdf/download.html

pdftotext is part of XPDF – “Xpdf is an open source viewer for Portable Document Format (PDF) files. (These are also sometimes also called ‘Acrobat’ files, from the name of Adobe’s PDF software.) The Xpdf project also includes a PDF text extractor, PDF-to-PostScript converter, and various other utilities.

Xpdf runs under the X Window System on UNIX, VMS, and OS/2. The non-X components (pdftops, pdftotext, etc.) also run on Win32 systems and should run on pretty much any system with a decent C++ compiler. ” http://www.foolabs.com/xpdf/about.html

XPDF is GPL2

The python code is barely there but you can see the possibilities:

import os
os.system("C:\\ … xpdf\\pdftotext -layout C:\\ … xpdf\\test.pdf")
raw_input("Finished")

The text came out in the correct order thanks to the -layout option.
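A slightly fuller version of the same call, using subprocess with an argument list so that paths containing spaces need no extra quoting. This is a sketch, not tested against every Xpdf build: the function names are invented, and it assumes pdftotext is on the PATH unless the exe argument says otherwise.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path=None, exe="pdftotext"):
    """Build the pdftotext command line, keeping the page layout."""
    cmd = [exe, "-layout", pdf_path]
    if txt_path is not None:
        # Without this argument pdftotext writes next to the PDF as .txt
        cmd.append(txt_path)
    return cmd


def convert_pdf(pdf_path, txt_path=None, exe="pdftotext"):
    """Run pdftotext and return its exit code (0 means success)."""
    return subprocess.call(pdftotext_command(pdf_path, txt_path, exe))


# Hypothetical usage:
# convert_pdf("pdfs/test.pdf", "pdfs/test.txt")
```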