Handling templated types and basic type conversions.
Posted on April 25, 2014 by Arjun Comar

I’ve been hard at work on a set of bindings to the OpenCV library, especially its C++ API, and I’ve been incorporating them into an idiomatic Haskell library. In the course of doing so I’ve written a few small bits of code that are more general than is strictly necessary for an OpenCV binding, and I think others would find them useful. I’ve also not seen any other attempt at a library like this. I’m calling it Foreign.CPP for the time being, as a parallel to Foreign.C from base. The idea is to provide access to commonly used types within C++ libraries that are difficult to access directly from within Haskell. This post should also be interesting to anyone who’s ever thought about trying to access C++ from languages that don’t have a good C++ FFI but do have a good C FFI (read: most of them). Most of what I’m talking about is not at all particular to Haskell. You can find the library itself on GitHub, but keep in mind, it’s very much a work in progress!

Handling std::string

The easiest place to start was also the first problematic type I encountered. With the start of the 3.0 branch, OpenCV has dropped its own hand-rolled string type in favor of the more ubiquitous std::string. This is a positive change for the OpenCV library as a whole, but it poses a bit of a problem for my automatically generated bindings. The C bindings were easy to make work by simply adding #include <string>. But on the Haskell side, the type just isn’t provided anywhere I could find.

So I added it myself. I gave it a very simple interface at first. Foreign.C gives an interface to CString, which is a thin veneer over Ptr CChar, or char* in C. Since std::string also provides a conversion to const char* via its c_str() method, that seems like a good bridging point.

I’ve found the bindings-DSL library to be an easy and fast way to write bindings to C libraries, and I tend to make use of it when I can. The library provides a couple of nice utilities, but I mostly use it for the convenient foreign import declarations. Perhaps that’s a silly reason to include a dependency, but it feels like a light one, and I haven’t had reason to drop it yet.

So with that in mind, the first interface to std::string I wrote was (actually, it still is!) very straightforward.

#opaque_t stdstring             --this translates into a plain data declaration with no constructors
#ccall create_std_string, IO (Ptr <stdstring>)                  
#ccall cstring_to_std_string, CString -> IO (Ptr <stdstring>)   
#ccall std_string_to_cstring, Ptr <stdstring> -> IO CString

So all this gives us is a way to create a std::string from within Haskell and a way to convert to and from CStrings. It turns out we don’t need much more than this. You don’t actually want to work directly with this type in Haskell, since that makes you incompatible with the rest of the ecosystem. But now there’s a way to get in and get out when you need it. You do also need to know the length of the string to convert to and from Haskell Strings, but that function is in the same style as the above.

On the C side, this looks more or less the same. With one caveat anyway.

#include <string>

typedef std::string stdstring;

extern "C" {
    stdstring* create_std_string() {
        return new stdstring;
    }
}

Woah, new! And we’re returning a raw pointer! This is awful C++ style! Yep, it really is.
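For completeness, the remaining wrappers declared on the Haskell side might look roughly like the sketch below. The names cstring_to_std_string and std_string_to_cstring come from the declarations above; std_string_length is a hypothetical name for the length helper, and the bodies are an assumption about how such wrappers could be written rather than the library's actual code.

```cpp
#include <cstddef>
#include <string>

typedef std::string stdstring;

extern "C" {
    // Copy a NUL-terminated C string into a new heap-allocated std::string.
    stdstring* cstring_to_std_string(const char* s) {
        return new stdstring(s);
    }

    // Borrow the string's internal buffer. The pointer is only valid while
    // *s is alive and unmodified, so the caller should copy it promptly.
    const char* std_string_to_cstring(const stdstring* s) {
        return s->c_str();
    }

    // Hypothetical helper: length in bytes, excluding the terminating NUL.
    std::size_t std_string_length(const stdstring* s) {
        return s->size();
    }
}
```

Note that std_string_to_cstring hands out a borrowed pointer, so the Haskell side would want to copy the characters out before the std::string is changed or freed.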

Memory Management

Here’s what’s really going on. So far, I’ve been talking about interoperating with C++ from within Haskell, and that I’ve got this neat little library that makes it a little easier. That’s actually not anywhere near the whole story. The truth is, interoperating with C++ means first providing a translation layer from C++ to C and then from C into Haskell. Each of these abstractions and translation layers is imperfect and prone to leaking. You can see the first bit of leakage here.

That’s a pointer to a heap allocated string being returned. And it’s allocated with new. That means it needs to be released with delete when we’re done with it. The C++ idiom is to never work directly with a pointer like this; it’s almost always a better idea to work with automatically managed memory by stack allocating whenever possible. If that’s unworkable and you really need to heap-allocate, the correct way to handle this is to make use of a smart pointer. The STL provides one, Boost provides one, and OpenCV has its own (and its own memory management scheme).

Except smart pointers are an unworkable solution in C for a vast number of reasons, the first of which is a lack of templating. So you immediately have to give up type safety of any kind, which means you need to know the type inside the smart pointer, and there’s no compiler inference to help you. And you automatically lose anything like a destructor to automatically free the memory when the pointer goes out of scope. Well, sort of. You could keep that by leaving that in the C++ subset of the code base, and assuming its existence in the C subset.
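To make the problem concrete, here is a rough sketch of what pushing a smart pointer through a C interface could look like. All three function names are hypothetical, and std::shared_ptr stands in for whichever smart pointer a library uses. The handle has to be a void*, so the caller must simply know what is inside it, and nothing releases the reference automatically.

```cpp
#include <memory>
#include <string>

extern "C" {
    // C has no templates, so the smart pointer is hidden behind an untyped handle.
    void* make_shared_string(const char* s) {
        return new std::shared_ptr<std::string>(std::make_shared<std::string>(s));
    }

    // Nothing checks that the handle really wraps a std::string;
    // passing any other handle here is undefined behaviour.
    const char* shared_string_get(void* handle) {
        return (*static_cast<std::shared_ptr<std::string>*>(handle))->c_str();
    }

    // No destructor runs on the C side, so the reference must be dropped by hand.
    void release_shared_string(void* handle) {
        delete static_cast<std::shared_ptr<std::string>*>(handle);
    }
}
```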

But even that doesn’t translate into Haskell where memory is managed by a garbage collector provided by the runtime. You can’t rely on values going out of scope and deallocating from the stack – hell, anything declared on the stack is already gone by the time Haskell gets the value. Then once you have the value, if you’ve stuck it in a smart pointer, you’ve got this unityped dynamic value that relies on its own implementation of reference counting. Even that might be workable with some Dynamic and Typeable magic. But the fact that the garbage collector is going to use free on the pointers once they’re garbage collected makes this even more difficult. Maybe someone smarter than me can come up with a solution to this problem that preserves the C++-ism of smart pointers, but I dropped them immediately as an unworkable path.

Instead, I’ve made the raw pointers available in Haskell and rely on the garbage collector to manage the memory for me. Of course, the freeing behavior is now decidedly wrong, but it’s fixable. I’ll cover how in a later post, since this one has digressed far enough, and it’s time to get back to the main topic.

So in sum, we drop any idea of managing memory from the C or C++ sides and rely on the Haskell runtime to deal with it instead. Hence we force heap allocation and return a raw pointer, making absolutely everybody cringe.
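One common way to repair the freeing behavior, which may or may not be the approach the later post takes, is to export a C-callable function that calls delete and let the Haskell side register it as a finalizer. The name destroy_std_string below is hypothetical.

```cpp
#include <string>

typedef std::string stdstring;

extern "C" {
    stdstring* create_std_string() {
        return new stdstring;
    }

    // Hypothetical counterpart to create_std_string: memory obtained with
    // new must be released with delete, never with free.
    void destroy_std_string(stdstring* s) {
        delete s;  // deleting a null pointer is a no-op
    }
}
```

On the Haskell side, the address of such a function could be imported with a foreign import of "&destroy_std_string" and attached to the pointer with newForeignPtr from Foreign.ForeignPtr, so that the garbage collector runs the C++ destructor instead of free.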

Templated Types in C++

Ok, so now we’ve got a somewhat usable std::string within Haskell. That’s nice, but it’s far from the only type from the STL that we need to work with. The STL being what it is, the Standard Template Library, many of the most important types we need to work with are templated. This is difficult because while Haskell deals easily with these polymorphic types, the C++ implementation of them is irritating at best.

In short, the C++ compiler generates a new monomorphic function for every use of a templated function and therefore what amounts to a new type for every use of a templated type. This is done to maintain backwards compatibility with C. Since C also lacks overloaded functions, these generated functions can’t be compiled with the same name either, and so the compiler mangles the names of the function symbols before introducing them into the compiled archive. This is the entire source of the difficulty of interoperating with C++ in the first place, and the whole reason we had to introduce a C wrapper around calls to C++ functions.

So when we want to work with a templated type, we have the difficult proposition of replicating the work the C++ compiler does. We have to refer to the monomorphic type, we have to generate the appropriate functions for that type, and we need to mangle the function names to get around the lack of overloading within C. In a future post, I’ll discuss how we can get back a lot of the features we lose by doing this within Haskell, via type classes and type families.
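As a small illustration of doing the compiler's job by hand, the sketch below wraps two instantiations of a function template in extern "C" functions. The template sum_elements and the wrapper names are made up for this example; only the suffix convention mirrors the one used for the vector bindings.

```cpp
#include <vector>

// A function template: the compiler emits one monomorphic function per
// instantiation, each under its own mangled symbol name.
template <typename T>
T sum_elements(const std::vector<T>& v) {
    T total = T();
    for (const T& x : v) {
        total += x;
    }
    return total;
}

// extern "C" functions can be neither templates nor overloads, so every
// instantiation we want to export needs its own hand-mangled name.
extern "C" {
    int sum_vectori(const std::vector<int>* v) { return sum_elements<int>(*v); }
    float sum_vectorf(const std::vector<float>* v) { return sum_elements<float>(*v); }
}
```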

std::vector

So the canonical type we want access to is of course std::vector. It’s used everywhere, and even OpenCV with its plethora of array-like types makes use of std::vector on a regular basis. Haskell has its own Vector type and some kind of conversion between the two would be nice.

Thankfully, a strong analogy exists between the std::string <--> String (or Text or ByteString if you prefer) situation and the std::vector <--> Vector situation we’d like to tackle now. Just like character arrays (CString) provide an intermediary type that both Haskell and C++ understand, arbitrary arrays provide an intermediary type that can be referenced by both.

Let’s start by considering the monomorphic case of vector<int>, and we’ll begin on the C++ -> C side this time.

#include <cstddef>
#include <vector>

typedef std::vector<int> vector_int;   //make a monomorphic type synonym we can reference in C and Haskell
vector_int* create_std_vectori();
vector_int* carray_to_std_vectori(int* a, size_t len);   //a bare pointer carries no length, so pass it explicitly
int* std_vectori_to_carray(vector_int* v);
size_t std_vectori_length(vector_int* v);

So this largely looks like the same story as std::string! We can convert to and from the equivalent C array type (passing the array’s length along when converting from C), we can create empty vectors, and we can get the length of the vector. Name mangling is accomplished by embedding a little type info into the name of the function. We could use the full name of the type but that would be tedious and unnecessary. Some useful and unique suffix is sufficient. I chose i for int, f for float, and p2f for Point2_<float>.

On the Haskell side we can use the identical approach we used in the std::string case.

#opaque_t vector_int
#ccall create_std_vectori, IO (Ptr <vector_int>)
#ccall carray_to_std_vectori, Ptr CInt -> CSize -> IO (Ptr <vector_int>)
-- and so on and so forth

Wonderful. This also works just fine when compiled. And with GHC 7.8, this works within ghci as well! No luck with ghc-mod though. (Anyone else having issues with C++ dependencies and ghc-mod? See my post on linking issues.)
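For reference, the C++ side of the vector_int functions could be implemented along these lines. This is a sketch under two assumptions: that the array conversion takes an explicit length, as the macro version further down does, and that the conversion back to a C array simply exposes the vector's own buffer rather than copying it.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<int> vector_int;

extern "C" {
    vector_int* create_std_vectori() {
        return new vector_int;
    }

    // Copy len ints starting at a into a new heap-allocated vector.
    vector_int* carray_to_std_vectori(int* a, std::size_t len) {
        return new vector_int(a, a + len);
    }

    // Borrow the vector's contiguous storage; the pointer is invalidated
    // if the vector is resized or destroyed.
    int* std_vectori_to_carray(vector_int* v) {
        return v->data();
    }

    std::size_t std_vectori_length(vector_int* v) {
        return v->size();
    }
}
```

Because std_vectori_to_carray returns a borrowed pointer, a Haskell caller would pair it with std_vectori_length and copy the elements out (for example with peekArray from Foreign.Marshal.Array) before letting go of the vector.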

Switching to a Macro-based Approach

Of course, this is insane to actually try and use. Every time you want to declare a new vector type, you have to copy and paste these blocks and fix all the types, suffixes, etc. This is exactly why macros exist, so let’s write one to simplify the process of adding new vector types.

This is pretty straightforward on the C side:

#define ADD_VECTOR_HEADERS(t, tn) \
    typedef std::vector< t > vector_##t; \
    vector_##t * create_std_vector##tn(); \
    vector_##t * carray_to_std_vector##tn( t * a, size_t len ); \
    t * std_vector##tn##_to_carray( vector_##t * v ); \
    size_t std_vector##tn##_length( vector_##t * v);

with a similar macro, ADD_VECTOR_IMPL(t, tn), to generate the corresponding implementations. Both of these macros are defined within interop.hpp.
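The implementation macro is not shown here, so the following is only a guess at its shape, not the contents of interop.hpp. It repeats the header macro with std:: written out explicitly so the sketch compiles on its own, then instantiates both macros for float.

```cpp
#include <cstddef>
#include <vector>

// Declarations, following the header macro from the post.
#define ADD_VECTOR_HEADERS(t, tn) \
    typedef std::vector< t > vector_##t; \
    vector_##t * create_std_vector##tn(); \
    vector_##t * carray_to_std_vector##tn( t * a, std::size_t len ); \
    t * std_vector##tn##_to_carray( vector_##t * v ); \
    std::size_t std_vector##tn##_length( vector_##t * v );

// Hypothetical implementations matching those declarations.
#define ADD_VECTOR_IMPL(t, tn) \
    vector_##t * create_std_vector##tn() { return new vector_##t; } \
    vector_##t * carray_to_std_vector##tn( t * a, std::size_t len ) { \
        return new vector_##t( a, a + len ); \
    } \
    t * std_vector##tn##_to_carray( vector_##t * v ) { return v->data(); } \
    std::size_t std_vector##tn##_length( vector_##t * v ) { return v->size(); }

// One pair of macro calls per element type we want to export.
extern "C" {
    ADD_VECTOR_HEADERS(float, f)
    ADD_VECTOR_IMPL(float, f)
}
```

One limitation of pasting the type name into vector_##t is that t has to be a single identifier. A type spelled with several tokens, such as Point2_<float>, would need a one-word typedef before it could be passed to these macros.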

On the Haskell side, I put this together on a first pass for use with the CPP preprocessor:

#define declare_vector_funcs(t, tn, ct) \
    #opaque_t vector_##t \
    #ccall create_std_vector##tn , IO (Ptr <vector_##t##>) \
    #ccall carray_to_std_vector##tn , Ptr ct -> CSize -> IO (Ptr <vector_##t##>) \
    #ccall std_vector##tn##_to_carray , Ptr <vector_##t##> -> IO (Ptr ct) \
    #ccall std_vector##tn##_length , Ptr <vector_##t##> -> IO CSize

This, of course, fails utterly to work, because it expands into hsc macro declarations that the preprocessor sticks into the generated .hs file. Whoops. After asking around on Stack Overflow, I wound up directly calling the macros those hsc macros expand into. The net effect is that with a declaration like:

#declare_vector_funcs int, i, CInt

we can now generate the boilerplate that makes vector_int available in Haskell. Progress! Adding new vector types is slightly less painful.

Goals for a User Interface

Well, not really. As it stands, you’ll need to actually edit my library, or add its source code to your project, in order to actually make use of these vector types. That’s not really a workable solution, and it’s probably why no one has bothered to write this library before.

But I don’t think this is the end of the story. By sticking these macros into one C header, I can make these macros available to others. And so you’ll be able to depend on foreign-cpp and add just the macro calls you need to get access to the monomorphic vector types. And once I’ve added the type classes and type families that are necessary, adding a new vector type should only require editing two files.

As much as I’d like to do better than that and provide a C++-like capability to automatically generate the correct boilerplate, I’m not sure it’s possible even with tools like quasiquoting and Template Haskell at my disposal, since I don’t believe either of those tools would let me generate the necessary C code, compile it, and have it ready and available at link time. Perhaps I’m wrong though, and I or someone else will come up with a way to do exactly that. If it’s possible, it’d be a neat trick that would be useful for both this library and my OpenCV bindings. And I’m sure others would find it incredibly useful as well.
