TypeError: expected an object with the buffer interface
For the past few months, I've been 'defensive coding' with respect to Python 3.x; basically, if there is a construct that:
- will be broken under 3.x
and
- the alternate (which is not broken) is supported under 2.6+
I've been trying to use that instead.
Here is a "3K-ism" that got me, and that was completely unanticipated. I encountered it when running my Python environment description script under Python 3.x.
It seems that if I try to use any string methods on strings returned from a subprocess (i.e., the stdout buffer contents when Popen is called with stdout=subprocess.PIPE), I get a:
TypeError: expected an object with the buffer interface
A minimal but complete code fragment that generates this error is:
#! /usr/bin/env python
import subprocess
p = subprocess.Popen(["cat", "test.txt"], shell=False, stdout=subprocess.PIPE)
p.wait()
lines = p.stdout.read()
lines.replace('e', 'x')
This problem does not occur if the file is read directly using a file object. The reason for the problem is clear when printing the contents of the stdout buffer:
b'The quick brown fox jumps over the lazy dog.\n'
As can be seen, the result is a byte string (a sequence of bytes) rather than a unicode string. It seems that subprocess.Popen returns the contents of its pipes as bytes, and client code now has to take the extra step of converting them into strings to use them as such. So, for example, this works perfectly:
#! /usr/bin/env python
import subprocess
p = subprocess.Popen(["cat", "test.txt"], shell=False, stdout=subprocess.PIPE)
p.wait()
lines = str(p.stdout.read(), "utf-8")
lines.replace('e', 'x')
While the above works fine in Python 3.x, it is broken under 2.6 and below, as str() does not accept the second argument specifying the encoding.
So the most generic way to do it is to call decode("utf-8") on the byte string object:
#! /usr/bin/env python
import subprocess
p = subprocess.Popen(["cat", "test.txt"], shell=False, stdout=subprocess.PIPE)
p.wait()
lines = p.stdout.read().decode("utf-8")
lines.replace('e', 'x')
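The failure and the fix can also be reproduced without a subprocess at all, since the behaviour comes from the bytes type rather than from Popen. The following is only a minimal sketch: the byte literal stands in for what the pipe returned, and the variable names (raw, failed, fixed, fixed_bytes) are mine. The exact wording of the TypeError message varies between 3.x releases.

```python
# The same bytes the pipe returned in the post, written as a literal
# (the b'' prefix is accepted by Python 2.6+ as well as 3.x).
raw = b'The quick brown fox jumps over the lazy dog.\n'

try:
    # Under Python 3 this mixes bytes with str arguments and raises TypeError.
    # Under Python 2, bytes is just str, so the call succeeds.
    raw.replace('e', 'x')
    failed = False
except TypeError:
    failed = True

# Decoding first yields a text string on both 2.6+ and 3.x,
# so the ordinary string methods work again.
text = raw.decode("utf-8")
fixed = text.replace('e', 'x')

# Alternative that stays in bytes: pass bytes arguments instead of str.
fixed_bytes = raw.replace(b'e', b'x')

print(failed)
print(fixed)
```

Staying in bytes avoids having to guess an encoding, but every argument then has to be bytes too, so decoding once at the boundary is usually less cluttered.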
I am not sure if I like the extra layer of complexity, however trivial it may be ...
Comments
9 comments posted
Thank you very much for your analysis of the problem and proposed solutions!
The issue bit me when attempting to find substrings in the output of readline() and readlines().
thanks, that's helpful!
This was super-helpful to me too! Thanks!
it was so helpful.
thanks
Thanks, just what I needed :)
parse_subprocess_pipes()" that wraps up the byte to unicode string conversion as well as other post-processing might be the way to go, as it would reduce code clutter even further.
Interesting: in Python 2.x I prefer s.decode('UTF-8') to unicode(s, 'UTF-8'), *especially* if "s" is a longer expression returning a string, like in your example.
Back to the subprocess module: because of complicated OS-level issues, reading from stdout is dangerous and could result in a dead-lock (the child process might be blocked trying to write a large amount of data into std*err*, while the parent blocks reading from the child's std*out*). You'd better use p.communicate().
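A minimal sketch of the p.communicate() approach, combined with the decode step from the post. The helper name run_and_decode is hypothetical, UTF-8 is an assumed encoding for the child's output, and a child Python process stands in for "cat test.txt" so the example does not depend on that file existing.

```python
import subprocess
import sys


def run_and_decode(cmd, encoding="utf-8"):
    """Hypothetical helper: run cmd and return (stdout_text, stderr_text, returncode)."""
    p = subprocess.Popen(cmd, shell=False,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes until EOF and then waits for the child,
    # so the child cannot block on a full stderr pipe while the parent is
    # blocked reading stdout.
    out, err = p.communicate()
    return out.decode(encoding), err.decode(encoding), p.returncode


# Stand-in child process that writes one line to each stream.
child = "import sys; sys.stdout.write('out line\\n'); sys.stderr.write('err line\\n')"
out, err, code = run_and_decode([sys.executable, "-c", child])

# The decoded output is ordinary text, so string methods work on 2.6+ and 3.x.
replaced = out.replace('e', 'x')
print(replaced)
```

Because communicate() buffers everything in memory, this suits commands with modest output; very large outputs would need a different approach.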