
Week 4

Updated: Jun 21, 2022

Day 1

Today my main focus was understanding the interfaces that the Couchbase plugin uses to get and return the data in the records that are passed to it. I identified a few important classes to focus on first; I was especially interested in the RowBuilder and RecordBinder interfaces being used. Going over where they are defined in the eclhelper and eclrtl files was helpful, but it didn't give me much context about when they get used.

The first thing I did was create two classes that inherit the RowBuilder and RecordBinder interfaces and add all of their method stubs to the mongodbembed.cpp/hpp files. Once I finished that, I rebuilt the plugin, installed it again, ran my ECL file against it, and confirmed that it connected to MongoDB and inserted a document. I want to be able to put MongoDB commands inside an EMBED block of code, and in order to write ECL that calls into the plugin I had to understand that flow a little better. I wasn't sure which function would be called when the engine sees that code. Dan suggested setting breakpoints with gdb against a specific binary and pid to see when, or if, the plugin is being invoked by the engine. I tried getting it set up, but I wasn't sure if it was working, because every time I tried to set a breakpoint on my code it couldn't find the file. It would ask whether the symbol was part of a shared library that gets linked at runtime, which I'm pretty sure is what happens with the libmongodbembed.so file. I answered yes and then tried to set a breakpoint on the MongoDBEmbedFunctionContext function that calls the MongoDB code, just to make sure my code was actually running.
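For reference, the kind of gdb session I was attempting looked roughly like this (a sketch, not a verified recipe; telling gdb to keep the breakpoint pending is the standard answer to its shared-library question, since the symbol only exists once libmongodbembed.so gets loaded):

(gdb) set breakpoint pending on
(gdb) break MongoDBEmbedFunctionContext::getBooleanResult
(gdb) continue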

When I tested the same ECL code that had worked before, it suddenly wasn't running. It would compile, but the job would never complete. The only way to get jobs running again was to stop the cluster and start it back up. There wasn't any output from the debugger, so I didn't think it had stopped at a breakpoint, and even after I closed out of the debugger it wouldn't complete the jobs that had queued up. I assumed this was just my novice-level understanding of gdb and that I was using it wrong, so I looked over some tutorials and the gdb documentation; I was probably pointing at the wrong binary and needed to try something else. When I restarted the engine after using the debugger, I noticed that the code I had been trying actually returned a result based on the ECL return type I declared for the function. That meant the return functions I had stubbed out were actually being called by the engine. I tried it with a few different datatypes to make sure I was right and that the EmbedFunctionContext class was the one being called. After this I tried to get the debugger working again, but I couldn't get it to set any breakpoints. My goal for tomorrow is to follow up with Dan and get some breakpoints set so I can get a better idea of which functions are being called and when.
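As a hypothetical illustration of what the engine is doing (the EMBED body is a placeholder since the command syntax isn't implemented yet, and I'm assuming the plugin registers under the name mongodb): the declared ECL return type decides which result stub gets invoked, so a BOOLEAN function routes through getBooleanResult, a STRING through getStringResult, and so on.

IMPORT mongodb;

BOOLEAN insertWorked() := EMBED(mongodb)
    /* MongoDB command script would go here */
ENDEMBED;

OUTPUT(insertWorked()); // drives MongoDBEmbedFunctionContext::getBooleanResult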


Day 2

Today my goal was to make some progress debugging the HPCC-Platform code and gain some understanding of how it works and when it calls into the plugin. I spent a lot of time messing around with gdb at the command line and wasn't able to figure much out. I would run gdb, point it at /opt/HPCCSystems/bin/agentexec, and provide the pid of the running process. I was trying to set breakpoints on functions in the MongoDB plugin, but gdb wasn't able to find them. The main reason is that the shared library for the plugin only gets loaded at runtime, so gdb couldn't find any of the functions I was trying to locate. I tried a lot of things to get it to look in the right place, like including the file path along with the command or pointing it at the location of the shared library file. After getting stumped, I decided it would probably be better to configure debugging through VS Code instead of the command line. In the end this was fairly simple, but there is little documentation available for the launch.json settings you need to include in your configuration, and very few guides that actually worked. I started by installing gdb on my local machine. This was necessary because VS Code doesn't have a built-in debugger for C++; you have to install a few extensions and tell it which debugger to use before you can use the VS Code debug environment.
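One sanity check that helps here (a sketch; agentexec and the pid are from my setup): attach to the running process and ask gdb which shared libraries it has actually mapped, which shows whether libmongodbembed.so is loaded yet.

sudo gdb /opt/HPCCSystems/bin/agentexec PID
(gdb) info sharedlibrary mongodbembed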

To install gdb on Windows I installed MinGW and used its installation manager to install gdb for me. This seemed like the simplest way to do it; the other approaches I saw were more involved. Once I had gdb on my local machine, I installed gdbserver on the remote machine where the build was running. It already had gdb installed, but I wanted to make sure gdbserver was there too. The first thing I realized was that I would have to open another port on the VM for gdbserver to use to communicate with VS Code. Once that was done, I started working on the launch.json file to configure the debug settings in VS Code properly. Even though the target is technically remote, VS Code is already running on the VM over ssh, so after a bit of trial and error I realized that the server address I needed in the launch.json file was 'localhost:PORT' and not the IP address of the VM. Once the port was open and I had the correct server address, I was ready to try debugging again. To start the whole process you run gdbserver on the remote machine, and to attach it to an already running program you use this command:

gdbserver --attach :PORT PID

Since the process I was trying to debug runs as root, I had to use sudo to get access to it. Once the gdbserver process was running on the remote machine, I was able to connect to it by going to Run and Debug in VS Code and running the configuration I added to launch.json. Once connected, I tried adding a breakpoint on MongoDBEmbedFunctionContext::getBooleanResult, because I had noticed that function being called and returning true. Since the function doesn't actually do anything and returns immediately, I was able to confirm that all of the get functions in the MongoDBEmbedFunctionContext class were being called based on the return type I declared. The debugger still wasn't finding the other functions I wanted to look at more closely, even though I am positive they are being called somewhere. While going through gdb at the command line I had noticed a function called executeWorkunit that took a workunit id as an argument and did some things with it before eventually forking a child and exiting. I added that as a breakpoint to see if I could get the debugger to stop at anything, and that actually worked. That means my debugger setup is in fact correct and isn't the reason the other functions can't be found. I stepped through a lot of the code once I got into executeWorkunit, but I still couldn't find where the plugin methods were being called from, which was puzzling. Dan had mentioned earlier in the day that my ECL code might be too simple, and that the ECL engine's optimizer was essentially 'folding' my code and running it differently. That means my next goal is to beef up the ECL code to include more datatypes and more functions. I began looking through the ECL Language Reference Manual on the HPCC Systems website and some example ECL code from Couchbase. There are also code examples in the playground on the ECL Watch app.
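For anyone fighting with the same setup, the shape of the launch.json entry is roughly this (a sketch, assuming the C/C++ extension's cppdbg debugger; PORT is the port opened for gdbserver):

{
    "name": "Attach to gdbserver",
    "type": "cppdbg",
    "request": "launch",
    "program": "/opt/HPCCSystems/bin/agentexec",
    "cwd": "${workspaceFolder}",
    "MIMode": "gdb",
    "miDebuggerServerAddress": "localhost:PORT"
}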

I have a lot to review, as it has been several months since I did the introductory training on ECL and worked with that language. My goal for tomorrow is to refamiliarize myself with it and write some more code to test my plugin with.


Day 3

My plan for today was to work on my knowledge of the ECL language so that I can write more complicated code to run on the HPCC cluster. The problem I kept hitting while debugging is that my code gets folded by the optimizer and never fully executes anywhere I can see it in the debugger. I wasn't quite sure how this was happening, but if I could make my code more complex, with multiple datatypes for the arguments and return types, then the system wouldn't be able to optimize as much of it away and I could see more of the execution. Before writing the ECL code I spent a little time in the debugger trying different breakpoints to see if I could find anything that related back to MongoDB. Unfortunately, I wasn't able to learn much from the eclagent binary; even after stepping through and into a lot of the function calls it made, I still wasn't getting anywhere.

So now it was time to change up the ECL code a bit and add some more variation to the calls. I took a dataset with only two entries and created a function that takes it as an argument and returns a boolean. Then I created another function that takes a string as an argument and returns a dataset with the same layout as the one passed to the first function. Then I called them both. Now I had not one but two functions and a lot more types to work with. When I went to run the job it failed almost immediately. The error from the ECL Watch app was that there cannot be more than one active instance object. The instance object is what MongoDB uses to encapsulate the driver, so it must stay alive for the entire time you are using the driver. This was honestly a pretty good outcome, since it gave me a new direction to go in, and it will be important to solve early: the rest of the plugin depends on multiple processes all being able to access the cluster at will.
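The test script was along these lines (illustrative names and layout, with placeholder EMBED bodies; each EMBED block gets its own embed context, which is why running two of them tried to create two driver instance objects at once):

layout := RECORD
    STRING name;
    INTEGER score;
END;

ds := DATASET([{'Anna', 1}, {'Ben', 2}], layout);

BOOLEAN insertMany(DATASET(layout) recs) := EMBED(mongodb)
    /* insert command would go here */
ENDEMBED;

DATASET(layout) findByName(STRING nm) := EMBED(mongodb)
    /* find command would go here */
ENDEMBED;

OUTPUT(insertMany(ds));
OUTPUT(findByName('Anna'));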

Couchbase uses something similar for handling its connection pool, so I wanted to look through the code to see how they do it. They use a ConnectionCache object to keep track of all the active connections. That is a little different from MongoDB, which ships a ready-made class for this called mongocxx::pool. I wanted to read into this a little more, so I looked at MongoDB's documentation for setting this kind of thing up. They have an example file that shows how to create a heap-allocated mongocxx::instance that can be shared by all of the currently running threads. Their example was a little hard to follow because it doesn't show very much and there isn't any description of what they are doing. They have documentation for the methods they used, but that was confusing too. I think it will take a little longer to understand what pool actually does and how best to use it. I want to keep looking over the CouchbaseConnection and CouchbaseConnectionCache objects, because there is a lot of good information there for how I can implement my MongoDBConnection class. Those classes are some of the only ones that don't implement any big interfaces from the HPCC System, so they are very tailored to Couchbase, but it is still good to see how they organized their calls to Couchbase.

I started to create a class that handles the mongocxx::instance. I pretty much copied it from the tutorial, but I don't think it will work that easily; I will need to modify it a little for my specific use case. Tomorrow my goal is to get that class working and have multiple threads connecting to MongoDB Atlas using the same instance object.


Day 4 & 5

Today my main goal was to get the MongoDBConnection class working so that multiple threads can share a single mongocxx::instance reference. The first thing I did was finish creating the class from the example and add the functionality into the EmbedFunctionContext method so that it uses the shared reference. This didn't work at first, and it took a lot of messing around with the definitions to get it to function properly. The class has a static function that creates the instance reference, and a configure method that configures it for a certain URI and adds a client to a pool. This way all of the separate threads can have different connection strings but share one instance and execute concurrently. To get the configure method that initializes the connection pool working, I needed std::call_once to ensure the instance only gets configured the first time the embed context is called.
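Stripped down, the class looks something like this (a sketch: the member names are mine, the one-instance-plus-pool pattern follows the mongocxx pooling example, and a real version would need something like a map of pools to honor different connection strings per thread):

#include <mongocxx/instance.hpp>
#include <mongocxx/pool.hpp>
#include <mongocxx/uri.hpp>
#include <memory>
#include <mutex>
#include <string>

class MongoDBConnection
{
public:
    // One shared object per process; the mongocxx::instance member is
    // created exactly once and outlives every call into the driver.
    static MongoDBConnection &createInstance()
    {
        static MongoDBConnection single;
        return single;
    }

    // Build the pool the first time an embed context asks for one.
    // Only the first URI wins in this sketch.
    void configure(const std::string &connectionString)
    {
        std::call_once(poolFlag, [&]() {
            pool = std::make_unique<mongocxx::pool>(mongocxx::uri{connectionString});
        });
    }

    // Check a client out of the pool; it is returned automatically
    // when the entry goes out of scope on the calling thread.
    mongocxx::pool::entry getClient()
    {
        return pool->acquire();
    }

private:
    MongoDBConnection() = default;

    mongocxx::instance driverInstance; // may only exist once per process
    std::unique_ptr<mongocxx::pool> pool;
    std::once_flag poolFlag;
};

Each embed context can then do something along the lines of:

auto &conn = MongoDBConnection::createInstance();
conn.configure(connectionString); // no-op after the first call
auto client = conn.getClient();   // per-thread client from the pool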

After I got all the kinks worked out, I did some testing to make sure everything worked the way I expected. The code for inserting the documents runs when the EmbedFunctionContext gets created, and that happens every time a new EMBED statement is used in a function definition. I made sure it could add two documents at the same time and then tested it with three. It worked perfectly, and I didn't see any reason why more than that would cause new issues. Once I was sure that was working, I really wanted to start taking input from the ECL code. I tried adding the RowBuilder and RowBinder classes to my plugin, but they only caused more issues; I wasn't really sure what I was doing, I was just trying to trace the code a little to see which methods get called after the EmbedFunctionContext call. I am pretty sure I have it figured out, but next I will focus mainly on pinning down the execution steps to see how the data gets exchanged between methods.

