Embed Python and Import Modules in C/C++

This is a post I have owed a friend for ages. In one of our now-defunct hacklab meetings, we had an extensive talk about Python that covered many things I had never heard of, like list comprehensions. One topic that was left aside, though, was embedding Python in C/C++ code, which we all knew was possible but none of us knew how to do. Since then, and maybe because of my happy experiences embedding Lua and QtScript in C++, I promised I'd look into it and, using the Python documentation, some blog posts and a lot of trial and error, I got it working a couple of months ago. You can view a full code example in my most recently published project, QGLFlush. That code doesn't deal with more advanced stuff like sharing objects between Python and C/C++, and neither does this post, which is not intended to be a complete tutorial. Also, everything here was written for Python 2.6. I haven't tested the Python C API in Python 3.0; the overall approach should carry over, but at least the print statements and the PyString_* calls used below would need changing there.

For starters, the Python C API is extremely flexible and simple to use. You first initialize the Python engine using the Py_Initialize() function, then you do whatever you need to do with Python, and finally you clean up using Py_Finalize(). The following code is a minimal example of the overall approach:

#include <Python.h>

int main()
{
    Py_Initialize();
    PyRun_SimpleString("print 'Hello World'");
    Py_Finalize();
    return 0;
}

Simple, huh? To compile and run that code, add the Python 2.6 include directory to the include path and link against the python2.6 library. You'll obviously need the development packages for Python in order to have the Python include directory. On my system, assuming the code was saved in main.c and the binary will be named PyTest, the compile command would be the following (the library goes after the source file, since some linkers resolve symbols in command-line order):

gcc -o PyTest -I/usr/include/python2.6 main.c -lpython2.6

But to be fair to the Python language, there's nothing very interesting to do with it until you import modules. To do so, simply add the import statements you need to the string to evaluate. For example, if I wanted to calculate the MD5 digest of my name, I would do something like this:

PyRun_SimpleString("import hashlib\n"
                   "m=hashlib.md5()\n"
                   "m.update('David E. Narváez')\n"
                   "print m.hexdigest()");

Of course, as the code gets more and more complex, you won't want to keep all of it in a C string, so you may want to use the PyRun_SimpleFile() function to execute the contents of a file instead.

On the other hand, and this is what makes this post different from most of the other posts I've seen about embedding Python in C/C++, I was interested in importing a module from C/C++ and making it available in the Python environment. Why would I want to do that? Well, imagine you want to supply an embedded interpreter in your hashing C/C++ application. Wouldn't it be convenient for users to be able to call hashing routines without bothering to import the hashlib module? After all, they would be importing that module every time they use the Python environment supplied by your application.

To do this, we must first understand what the import statement really does in Python: besides doing the obvious (importing the module), it also creates a name in the Python environment that references the imported module. So despite the fact that there's a good-looking PyImport_ImportModule() function in the Python C API, it only does the first part of the job. You'll then need to register that module under a name that will reference it. One other thing you'll need to know is what a module is added to (what we have so far called "the Python environment"): in fact, modules are imported into other modules, and since PyRun_SimpleString() runs everything under the __main__ module, we first need to get a reference to that module and then add the hashlib module to it. In our case, to import the hashlib module from C/C++, we'd do this:

Py_Initialize();
PyObject * mainModule = PyImport_AddModule("__main__");
PyObject * hashlibModule = PyImport_ImportModule("hashlib");
PyModule_AddObject(mainModule, "hashlib", hashlibModule);
PyRun_SimpleString("m=hashlib.md5()\n"
                   "m.update('David E. Narváez')\n"
                   "print m.hexdigest()");
Py_Finalize();

Notice how we can remove the import hashlib instruction from the string to run, since that module was already imported. It's also interesting to notice that in the PyModule_AddObject() call we could have given any other name to refer to the hashlib module. You most likely don't want to do that in your code, so that users don't get confused, but it is still worth mentioning.
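To make the two steps concrete, here is a rough pure-Python sketch of what the C calls above amount to. It is written in Python 3 syntax (unlike the Python 2.6 snippets in this post), and the namespace dictionary and the digests alias are made up for the example; in the embedded case the namespace would be the __main__ module.

```python
# Step 1: load the module. This is roughly the job of PyImport_ImportModule().
module = __import__("hashlib")

# Step 2: bind the module object to a name in the namespace the code will run in.
# 'namespace' is a hypothetical stand-in for the __main__ module.
namespace = {}
namespace["hashlib"] = module   # the usual name
namespace["digests"] = module   # any other name works just as well

# The code run in that namespace needs no import statement of its own.
exec("a = hashlib.md5(b'David').hexdigest()\n"
     "b = digests.md5(b'David').hexdigest()", namespace)

print(namespace["a"] == namespace["b"])  # prints True
```

Both names point at the very same module object, which is why the two digests match.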

That's all good, but there are many other popular digest algorithms available in the hashlib module, which you would use in Python with code similar to this:

from hashlib import md5, sha1
name = 'David E. Narváez'
m = md5()
s = sha1()
m.update(name)
s.update(name)
print m.hexdigest()
print s.hexdigest()

This brings us to the problem of implementing the from hashlib import md5, sha1 statement using the Python C API, which is still simple, yet a little more complex than just importing the module. What this statement actually does is import two members of the hashlib module (md5 and sha1) into the __main__ module and make them available through names spelled the same way. The following code will do that from C/C++:

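/* Build the list of names to import from hashlib */
PyObject * subModules = PyList_New(0);
PyObject * md5Name = PyString_FromString("md5");
PyObject * sha1Name = PyString_FromString("sha1");
PyList_Append(subModules, md5Name);   /* PyList_Append() does not steal the reference... */
PyList_Append(subModules, sha1Name);
Py_DECREF(md5Name);                   /* ...so we release ours once the list holds it */
Py_DECREF(sha1Name);
PyObject * hashlibImports = PyImport_ImportModuleEx("hashlib", NULL, NULL, subModules);
/* Fetch each member and register it in __main__ under the same name */
PyObject * md5Module = PyObject_GetAttrString(hashlibImports, "md5");
PyObject * sha1Module = PyObject_GetAttrString(hashlibImports, "sha1");
PyModule_AddObject(mainModule, "md5", md5Module);
PyModule_AddObject(mainModule, "sha1", sha1Module);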

I must admit this piece of code looks a lot more alarming than the other examples, yet most of the complexity lies in the creation of a Python list that holds the names of the objects to be imported from the hashlib module, not in the importing procedure itself. The string to be run by PyRun_SimpleString() no longer needs to include the from hashlib import md5, sha1 line, since the Python C API has already taken care of it.
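In Python terms, the C code above does roughly what the following sketch does by hand. It uses Python 3 syntax, and namespace is a made-up stand-in for the __main__ module. The fourth argument of __import__() plays the role of the list passed to PyImport_ImportModuleEx(). For a top-level module like hashlib it makes no visible difference, but for a dotted name it decides whether you get back the top-level package or the submodule.

```python
import os

namespace = {}  # hypothetical stand-in for the __main__ module

# Import hashlib, passing the names we are after as the "from list".
hashlib_module = __import__("hashlib", None, None, ["md5", "sha1"])

# Fetch each member and bind it under the same name.
for name in ("md5", "sha1"):
    namespace[name] = getattr(hashlib_module, name)

# The code run in that namespace can use sha1 without importing anything.
exec("digest = sha1(b'David').hexdigest()", namespace)
print(len(namespace["digest"]))  # a SHA-1 hex digest has 40 characters

# With a dotted name, the from list changes what comes back.
print(__import__("os.path") is os)                          # True: the top-level package
print(__import__("os.path", fromlist=["join"]) is os.path)  # True: the submodule
```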

Finally, what about the from hashlib import * statement, which imports everything under the hashlib module? Well, the procedure resembles the one just described, but since we don't know in advance the names of the members to be imported, we need to iterate through the members of the C object that references the hashlib module, using their attribute names as the names that will reference them in the Python code. Although iterating through the attribute names of an object sounds like a complex task in languages like C and C++, it's pretty simple to do using the Python C API:

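PyObject * hashlibModule = PyImport_ImportModule("hashlib");
PyObject * moduleAttributes = PyObject_Dir(hashlibModule);
Py_ssize_t pos = 0;
Py_ssize_t attrSize = PyList_Size(moduleAttributes);
while (pos < attrSize)
{
    /* PyList_GetItem() returns a borrowed reference to the attribute name */
    PyObject * attrValue = PyList_GetItem(moduleAttributes, pos);
    const char * attrName = PyString_AsString(attrValue);
    /* Skip names starting with an underscore (__name__, __doc__, ...) so they
       don't overwrite the ones __main__ already has */
    if (attrName[0] != '_')
        PyModule_AddObject(mainModule, attrName, PyObject_GetAttr(hashlibModule, attrValue));
    pos++;
}
Py_DECREF(moduleAttributes);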
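It is worth comparing that loop with what Python itself does for a star import: it takes the names listed in the module's __all__ if there is one, and otherwise every name that does not start with an underscore. A rough sketch of that rule in Python 3 syntax follows. The star_import() function and the namespace dictionary are hypothetical names for the example, and dir() is used as an approximation of the module's own dictionary.

```python
import hashlib


def star_import(module, namespace):
    """Copy into namespace the names that 'from module import *' would bind."""
    names = getattr(module, "__all__", None)
    if names is None:
        # No __all__: take every public name, i.e. those without a leading underscore.
        names = [name for name in dir(module) if not name.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)


namespace = {"__name__": "__main__"}  # hypothetical stand-in for the __main__ module
star_import(hashlib, namespace)

print("md5" in namespace)       # True
print(namespace["__name__"])    # still __main__, not hashlib
```

If you want the embedded environment to behave like a real star import, honoring __all__ in the C loop would be the natural next step.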

And that's it. Again, you can check QGLFlush for an example of how to import OpenGL modules in your Python interpreter or play around with these examples to get deeper into customizing the Python environment you wish to provide to the users of your application.