Tackling iBatis N+1 Problem

Let’s say you have a list of objects and each of those objects contains a list of another object. Filling these objects from your database involves a select statement to gather the first parent list of objects, followed by another select statement for every object in that list (N select statements). Add those together and you’re left with N+1 SQL statements to build one object. Now, this may not necessarily be view as a bad thing if you must use every sub-list in the objects, but, if you’re anything like me, you’ll use two of them. So now you’ve hit your database N+1 times to get the functionality that could be handled in one call. That’s a problem.

Luckily, most object/relational mapping frameworks (like iBatis for example) provide a solution. There are several ways of getting around this excess of statements. The first is to merge your two select statements into one bigger statement using joins and telling iBatis what should be a part of the sub-list of objects. This provides a nice option if you have simple hierarchies of objects, but your statement will quickly get messy if you have four or more levels of sublists. The second option is to map your list object to a select statement and use lazy joining. What this means is that the parent object’s sublist will be filled with unloaded select statements that will only be run when you try to access something in one of the subobjects.

Before I give the solutions, let me first provide our starting functionality (ridden with N+1 statements).

<sqlMap namespace="someNamespace">

<resultMap
id="parentResult"
class="com.adamdoyle.project.domain.ParentObject">
<result property="id" column="id" />
<result property="name" column="name" />
</resultMap>

<resultMap
id="childResult"
class="com.adamdoyle.project.domain.ChildObject">
<result property="id" column="id" />
<result property="parentId" column="parent_id" />
</resultMap>

<select
id="selectParent"
parameterClass="map"
resultMap="parentResult">
SELECT * FROM parent_table;
</select>

<select
id="selectChildrenForParent"
parameterClass="int"
resultMap="childResult">
SELECT * FROM child_table
WHERE parent_id = #parentId:NUMERIC#;
</select>

</sqlMap>

And then somewhere in your java code you would need something similar to the following:

List parents =
executeQueryForList("someNamespace.selectParent", null);

for (int parentCount = 0;
parentCount < parents.size();
parentCount++) {

ParentObject parent =
(ParentObject)parents.remove(parentCount);
List children =
executeQueryForList("someNamespace.selectChildrenForParent",
parent.getId())
parent.setChildList(children);
parents.add(parentCount, parent);
}

So now not only are you using N+1 select statements, you are also doing extra work to map your objects to your database. Unnecessary extra work, that is.

Prerequisite:

Before the following two solutions will work, you’ll need to make sure that you have lazy loading enabled in iBatis. In your main sqlMap.xml file, verify that lazyLoadingEnabled=”true” is present in the settings portion.

Solution one:

For this solution we are going to merge our two select statements into one all encompassing statement using joins. Here is the new SQL map.

<sqlMap namespace="someNamespace">

<resultMap
id="parentResult"
class="com.adamdoyle.project.domain.ParentObject"
groupBy="id">
<result property="id" column="id" />
<result property="name" column="name" />
<result property="childList"
resultMap="someNamespace.childResult"/>
</resultMap>

<resultMap
id="childResult"
class="com.adamdoyle.project.domain.ChildObject">
<result property="id" column="child_id" />
<result property="parentId" column="parent_id" />
</resultMap>

<select
id="selectParent"
parameterClass="map"
resultMap="parentResult">
SELECT * FROM parent_table
LEFT JOIN child_table
ON parent_table.id = child_table.parent_id
GROUP BY id;
</select>

</sqlMap>

In your java code you will only need this now:

List parents =
executeQueryForList("someNamespace.selectParent", null);

Simple, no? Every parent object inside the list will now have a SQL statement lazily loaded for its children list, ready to run as soon as you try and access it. The points of interest with this new method is the groupBy parameter of the parentResult. This prevents loading identical objects into the original parent list. Second is the childList property of the parentResult map which tells which variable (childList) is defined by which resultMap (childResult). Other than that the only change is the actual syntax involved in the SQL statement.

Although the above code handles the problem, I don’t really like it. First, as I stated earlier, your SQL statement can start to get incredibly bloated, ugly, and illegible (job security – yes, headaches later – also yes). Also, I’ve had issues with children-less parents giving errors. There’s probably some way to fix this, I just don’t know it (someone?).

I prefer the second solution to this problem, because it looks much more elegant in your SQL map files and has no real issues that I can find. Here it is:

<sqlMap namespace="someNamespace">

<resultMap
id="parentResult"
class="com.adamdoyle.project.domain.ParentObject">
<result property="id" column="id" />
<result property="name" column="name" />
<result property="childList" column="id"
select="someNamespace.selectChildrenForParent"/>
</resultMap>

<resultMap
id="childResult"
class="com.adamdoyle.project.domain.ChildObject">
<result property="id" column="id" />
<result property="parentId" column="parent_id" />
</resultMap>

<select
id="selectParent"
parameterClass="map"
resultMap="parentResult">
SELECT * FROM parent_table;
</select>

<select
id="selectChildrenForParent"
parameterClass="int"
resultMap="childResult">
SELECT * FROM child_table
WHERE parent_id = #parentId:NUMERIC#;
</select>

</sqlMap>

The java code will be identical to the first solution, the only thing that changes is how things are handled in your SQL map. What this does is maps your list (named childList in this example) to a select statement. In the column attribute you place the variable that you would like to pass onto the child select statement (in this case the parent’s id). In my opinion, this works stupendously. All of your SQL statements are broken up cleanly yet you still only run one SQL statement (further SQL statements will be run later if you need to access the data). Fantastic.

But what if you need to pass two variables to your child select statement? Luckily, that is also handled by iBatis. In your resultMap for parentResult, simply change column=”id” to column=”{parentId=id,someOtherId=some_other_id}”. Inside your child select statement, you can then use #parentId# and #someOtherId# to further narrow your search. Double fantastic.

Both solutions can handle more complex examples where your child objects have their own children, which have their own children, which have their own children, etc… Handy dandy.

If you noticed any glaringly bad errors in my code or explanation, please don’t hesitate to let me know. Thanks.

On a side note, if any knows of an easy way for me to display my code samples, I would surely appreciate it. I don’t just want to escape the problem characters (like <) but would enjoy a box around it with line numbers and whatnot (I’m sure it’s there – I’ve seen it – I’m just new to the whole WordPress thing). That would be amazing. Let me know.

Reinvent the wheel

Everyone always says that you shouldn’t reinvent the wheel. I don’t necessarily believe this to be a concrete statement, although in some instances it works. I didn’t feel like spending the time/effort to create a blog and knew that people had already gone through the trouble, exo facto I used WordPress (sidenote: I’m not sure if “exo facto” is an actual phrase or not, I just like using it). However, in other instances I think it is absolutely crucial to reinvent the wheel to a certain level, usually when the preexisting wheel, for lack of a better word, sucks.

An interesting aspect involves not reinventing the wheel while reinventing the bigger wheel. By this I mean focus should be placed on making your code (or whatever) reusable, so that when you’re doing something very similar a short while down the time line you can speed up the process a little bit. In addition to the time benefit, you will also significantly cut down on stupid errors (ie: typos) that will keep you up a little later into the night.

My usual rule of thumb for turning a small piece of code into a function happens when I begin to type that code (or something very similar) in a different location for the second time. At this point I take a few moments, turn the code into a function, update the original location to use the function, and add the function call to where I’m currently working. Tada. Suddenly I never have to worry about implementing that functionality again. If something is similar but different enough that you can’t do this (as is the case with iBatis SQL maps), I avoid reinventing the wheel by copying a similar SQL map, and changing just what I need to change in order to work with my new object. If I were to try and retype the entire map from memory I would surely fail at it; firstly, because I have no idea what the exact syntax for the entire map is, and, secondly, I’m notorious for making a typo in a location that isn’t completely destructive but will prevent something from working in precisely three days time.

Because of these reasons I am definitely a fan of reinventing the wheel when the time calls for it. If noone ever invented the wheel because they thought the current process was good enough, we would be at a lack for several important aspects of our society. I would give some examples here, but I don’t want to. So just believe me. Thanks.

I suppose that in the final moments of my first post I should explain a little bit about myself. I am a Computer Science student at the University of Kansas. I’m currently six hours from graduating, at which point I’ll transition to the life of a Graduate student. I recently quit my job creating web applications for a local power company and took up a job creating web applications for a website that provides education and tools to secondary education teachers (www.transitioncoalition.org).

In addition to my day job (that pays the bills), I work as a co-developer on a start-up called Foodpatio.com. Foodpatio more-or-less provides its users with the ability to order food from local restaurants with a level of simplicity and ease that should be required. If you think the idea sounds familiar, you’re semi-right (I suppose that’s why I started this post on why you *should* reinvent the wheel). Foodpatio has a couple of selling points that make it unique and worthwhile. First of all, you can order from multiple restaurants on the same order, and we take care of splitting it up for you. Secondly, we’re working to provide countless ways to access and order food so that it is always a convenient process. Thirdly, specials are entirely integrated into the normal ordering process so that you will always receive the best price for what you want. Foodpatio is the product of close to a year of late nights and countless design changes and is nearing its release into the world debutante style. I’m pretty excited about it.