Today I was still having issues getting the RowStream class to return a result from the MongoDB query. I needed to keep track of the results from the query, but every time I tried to save the result object in memory it would get destroyed and I would run into a segmentation fault. My main goal was to find the source of this, because up until now I really wasn't sure where the problem was.
I stepped through the code with the debugger, and as best I could tell there is no simple way to copy the result object. What I did instead was add the result documents to a StringArray, which I was able to create and keep in memory just fine. Once I did that I could see that MongoDB was returning results and that my queries were reaching MongoDB, which meant I was close.
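The idea of copying each result into storage that I own can be sketched with the standard library alone. This is only an illustration under assumptions: `FakeCursor`, `collectResults`, and `runQuery` are hypothetical names, `std::vector<std::string>` stands in for StringArray, and the real plugin would have to turn each driver document into text first (for example with something like `bsoncxx::to_json`).

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a driver cursor: the views it hands out point
// into storage owned by the cursor, so they dangle once it is destroyed.
class FakeCursor
{
public:
    explicit FakeCursor(std::vector<std::string> docs) : buffer(std::move(docs)) {}

    std::vector<std::string_view> views() const
    {
        std::vector<std::string_view> result;
        for (const std::string &doc : buffer)
            result.emplace_back(doc);
        return result;
    }

private:
    std::vector<std::string> buffer;
};

// Deep-copy every document so the results outlive the cursor.
std::vector<std::string> collectResults(const FakeCursor &cursor)
{
    const std::vector<std::string_view> views = cursor.views();
    std::vector<std::string> owned;
    owned.reserve(views.size());
    for (std::string_view doc : views)
        owned.emplace_back(doc); // copies the characters, not just the pointer
    assert(owned.size() == views.size());
    return owned;
}

// The cursor lives only inside this function; the returned strings are
// independent copies and stay valid after it goes out of scope.
std::vector<std::string> runQuery()
{
    std::vector<std::string> docs = {R"({"name":"a"})", R"({"name":"b"})"};
    FakeCursor cursor(std::move(docs));
    return collectResults(cursor);
}
```

Holding on to the views themselves instead of the copies would leave pointers into freed memory, which is the kind of mistake that tends to show up as a segmentation fault.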
Once I got the results from the MongoDB query I was ready to hand them off to the next stage of the execution. The MongoDBRowStream class goes through each result row and creates an ECL row using the MongoDBRowBuilder class. Then the ECL rows get written to the disk of the workunit, where the user can see them. Most of this was copying the row builder and row stream classes from the Couchbase plugin and repurposing them as MongoDBRowBuilder and MongoDBRowStream. I was able to get a dataset to return from a MongoDB query, and that felt pretty good. I immediately started testing and quickly figured out that the current way of building documents is grossly insufficient. Right now I can only build very simple documents made of flat key-value pairs that must be passed in as function parameters.
I started creating a simple parser that goes through the embedded script and looks for symbols that I can use to build a document. I started off by looking for a few basic identifiers. First, the key in every key-value pair is followed by a colon. That means I can look for any character that is allowed in a MongoDB key and then scan forward to the colon, which gives me pointers to the beginning and end of the key. This was easy to do in a loop, and once I found a key I just had to look for a value. Sometimes a document or an array is opened right after a key. Both of these are special cases that we would have to take care of later. First I wanted to get simple key-value pairs inserted into the builder. Since values can be surrounded by quotation marks, I needed a case for that. A value can also be an argument of the function in which the embed statement is defined. All of these cases needed to be handled, and to do that I had to write quite a bit of code. The Couchbase plugin doesn't have to worry about this as much because its query function just takes a query string as an argument, whereas in MongoDB's case we have to pass documents into functions.
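A rough sketch of that scanning loop is below. It is not the plugin's parser: `scanPairs`, `Pair`, and `ValueKind` are hypothetical names, keys are assumed to be unquoted and made of letters, digits, `_` and `.`, and a leading `$` is an assumed marker for a value that comes from a function argument.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class ValueKind { Quoted, Param, Bare };

struct Pair
{
    std::string key;
    std::string value;
    ValueKind kind = ValueKind::Bare;
};

// Assumption: keys are unquoted and use only these characters.
static bool isKeyChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
}

static bool isSpace(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

std::vector<Pair> scanPairs(const std::string &script)
{
    std::vector<Pair> pairs;
    const size_t n = script.size();
    size_t i = 0;
    while (i < n)
    {
        if (!isKeyChar(script[i])) { ++i; continue; }

        // Candidate key: a run of key characters followed by a colon.
        const size_t keyStart = i;
        while (i < n && isKeyChar(script[i])) ++i;
        const size_t keyEnd = i;
        while (i < n && isSpace(script[i])) ++i;
        if (i >= n || script[i] != ':') continue; // not a key after all
        ++i;                                      // step over the colon
        while (i < n && isSpace(script[i])) ++i;
        if (i >= n) break;

        Pair p;
        p.key = script.substr(keyStart, keyEnd - keyStart);
        if (script[i] == '"')
        {
            // Quoted value: take everything up to the closing quote.
            const size_t close = script.find('"', i + 1);
            if (close == std::string::npos) break; // unterminated string
            p.value = script.substr(i + 1, close - i - 1);
            p.kind = ValueKind::Quoted;
            i = close + 1;
        }
        else if (script[i] == '{' || script[i] == '[')
        {
            // Subdocuments and subarrays are not handled by this sketch;
            // the key that opens them is dropped.
            continue;
        }
        else
        {
            // Bare value, or a function argument marked with '$'.
            const bool isParam = script[i] == '$';
            if (isParam) ++i;
            const size_t valStart = i;
            while (i < n && script[i] != ',' && script[i] != '}' && !isSpace(script[i])) ++i;
            p.value = script.substr(valStart, i - valStart);
            p.kind = isParam ? ValueKind::Param : ValueKind::Bare;
        }
        pairs.push_back(p);
    }
    return pairs;
}
```

For an input such as `{name: "Ann", age: 30, city: $town}` this yields three pairs, with `town` tagged as a parameter so that a later step can look it up among the function arguments.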
Once I had simple key-value pairs being inserted and users could pass in parameters through the function arguments, I needed to move on to supporting subdocuments and subarrays. I had quite a lot of trouble with this, because I was using the bsoncxx::builder::stream builder and it was not trivial to get it working the way I wanted. The way I envisioned the stream builder working is that I could just read an open bracket in the script and open a document, then close it when I read a close bracket. This didn't exactly work, because the builder expects a key with every document that is opened, so you can't just look for an open bracket; you have to look for a key and then an open bracket. I got it half working, with subdocuments seeming to work well, but when I wanted to add subarrays I ran into a whole different problem.
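The "key first, then open bracket" rule can be sketched without the driver by recording which builder calls a script would trigger. Everything here is hypothetical: `traceBuilderCalls` is an illustrative name, the call names in the trace only mimic the stream builder's vocabulary, and treating array elements as keyless values is my assumption about how arrays would be handled, not a description of the plugin.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Walks a script and records the builder calls it would need.
std::vector<std::string> traceBuilderCalls(const std::string &script)
{
    auto isWord = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };

    std::vector<std::string> calls;
    std::vector<char> open;  // stack of currently open '{' and '['
    std::string pendingKey;  // key seen but not yet given a value
    size_t i = 0;
    while (i < script.size())
    {
        const char c = script[i];
        if (isWord(c))
        {
            const size_t start = i;
            while (i < script.size() && isWord(script[i])) ++i;
            const std::string word = script.substr(start, i - start);
            size_t j = i;
            while (j < script.size() && script[j] == ' ') ++j;
            if (j < script.size() && script[j] == ':')
            {
                pendingKey = word; // remember the key until its value shows up
                i = j + 1;
            }
            else
            {
                calls.push_back(pendingKey.empty()
                                    ? "value(" + word + ")"
                                    : "key_value(" + pendingKey + ", " + word + ")");
                pendingKey.clear();
            }
            continue;
        }
        if (c == '{' || c == '[')
        {
            const std::string opener = (c == '{') ? "open_document" : "open_array";
            if (open.empty())
            {
                // The outermost document is the builder itself.
                if (c != '{') throw std::runtime_error("script must start with '{'");
            }
            else if (open.back() == '[')
            {
                calls.push_back(opener); // assumed: array elements carry no key
            }
            else
            {
                if (pendingKey.empty())
                    throw std::runtime_error("a subdocument or subarray needs a key");
                calls.push_back("key(" + pendingKey + ")");
                calls.push_back(opener);
                pendingKey.clear();
            }
            open.push_back(c);
        }
        else if (c == '}' || c == ']')
        {
            const char expected = (c == '}') ? '{' : '[';
            if (open.empty() || open.back() != expected)
                throw std::runtime_error("unbalanced brackets");
            open.pop_back();
            if (open.empty())
                calls.push_back("finalize");
            else
                calls.push_back(c == '}' ? "close_document" : "close_array");
        }
        ++i;
    }
    if (!open.empty()) throw std::runtime_error("unclosed bracket");
    return calls;
}
```

With `{name: Ann, address: {city: Rome}, tags: [a, b]}` the trace opens `address` and `tags` only after their keys have been seen, while an input like `{{a: 1}}` is rejected because the inner document has no key to attach to.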