Slicing a ResultSet breaks subsequent count() calls.

Bug #620508 reported by Leonard Richardson
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Storm
Fix Released
High
Thomas Herve

Bug Description

After making an optimization to lazr.batchnavigator I was blindsided by this backtrace in a Launchpad test:

      File "/home/leonardr/canonical/lp-sourcedeps/eggs/lazr.batchnavigator-1.2.1-py2.6.egg/lazr/batchnavigator/z3batching/batch.py", line 69, in trueSize
        if self.start >= self.listlength:
      File "/home/leonardr/canonical/lp-sourcedeps/eggs/zope.cachedescriptors-3.5.0-py2.6.egg/zope/cachedescriptors/property.py", line 71, in __get__
        value = func(inst)
      File "/home/leonardr/canonical/lp-sourcedeps/eggs/lazr.batchnavigator-1.2.1-py2.6.egg/lazr/batchnavigator/z3batching/batch.py", line 48, in listlength
        self._listlength = self._get_length(self.list)
      File "/home/leonardr/canonical/lp-sourcedeps/eggs/lazr.batchnavigator-1.2.1-py2.6.egg/lazr/batchnavigator/z3batching/batch.py", line 43, in _get_length
        return len(results)
      File "/home/leonardr/canonical/lp-branches/optimized-length/lib/canonical/launchpad/webapp/batching.py", line 32, in __len__
        return self.context.count()
      File "/home/leonardr/canonical/lp-sourcedeps/eggs/storm-0.17-py2.6-linux-i686.egg/storm/store.py", line 1241, in count
        return int(self._aggregate(lambda expr: Count(expr, distinct), expr))
      File "/home/leonardr/canonical/lp-sourcedeps/eggs/storm-0.17-py2.6-linux-i686.egg/storm/store.py", line 1228, in _aggregate
        subquery = replace_columns(self._get_select(), columns)
      File "/home/leonardr/canonical/lp-sourcedeps/eggs/storm-0.17-py2.6-linux-i686.egg/storm/store.py", line 1771, in replace_columns
        "__contains__() does not yet support set "
    FeatureError: __contains__() does not yet support set expressions that combine ORDER BY with LIMIT/OFFSET

It would be nice if this query worked; it would let us use the batchnavigator optimization in Launchpad. As it is, I'm afraid to use that optimization because I'm afraid it'll cause OOPSes on Launchpad. (Yesterday I was able to get OOPSes on Launchpad with simple web service requests, which is why I'm scrambling to undo the optimization, but today it seems to be working.)

I'm very inexperienced with Storm and I can't figure out how to find the specific SQL query that is causing the problem. To me it looks like a complicated data structure with no string representation. I'm also not sure why a 'count' query even needs a LIMIT or OFFSET--seems like that could be stripped at some level.

Presumably this feature isn't supported because it's difficult to implement; but there must be other solutions to my problem besides just making this code work. I'd appreciate some advice on how to avoid this problem, and I'll provide whatever information you ask for.

Related branches

summary: - "__contains__() does not yet support set expressions that combine ORDER
- BY with LIMIT/OFFSET"
+ Slicing a ResultSet breaks subsequent len() calls.
Revision history for this message
Leonard Richardson (leonardr) wrote : Re: Slicing a ResultSet breaks subsequent len() calls.

OK, here is my real problem. 'foo' is a complex ResultSet.

>>> len(foo)
9
>>> tmp = foo[1:3]
>>> len(foo)
Traceback (most recent call last):
...
FeatureError: __contains__() does not yet support set expressions that combine ORDER BY with LIMIT/OFFSET

Here's the problem. When I slice the ResultSet, ResultSet.__getitem__ makes a copy of the ResultSet and configures the copy with the desired offset and limit. But the original and the copy share the same .select member. (This is an object from the storm.expr module.) When I call len(foo), ._select.limit and ._select.offset are populated for both the original and the copy.

When I call len() the second time. some code in _get_select() runs that looks like it should normalize the data.

        if self._select is not Undef:
            if self._order_by is not Undef:
                self._select.order_by = self._order_by
            if self._limit is not Undef: # XXX UNTESTED!
                self._select.limit = self._limit
            if self._offset is not Undef: # XXX UNTESTED!
                self._select.offset = self._offset
            return self._select

But it doesn't work in this case. This code only overrides .limit or .offset if the new value is not Undef. So if I slice the list again:

>>> tmp2 = foo[1:6]

The old .limit '2' will be replaced by '5' and I will get the correct slice. But when I call len(), the new limit is Undef, so the old value for .limit is left in place, and I get the FeatureError.

There are two solutions. Either change _get_select() to override a non-Undef value with Undef, or change the copy() operation to copy ._select.

Revision history for this message
Leonard Richardson (leonardr) wrote :

It looks like this bug was fixed in Storm 0.17?

Revision history for this message
Leonard Richardson (leonardr) wrote :

Never mind, it's not fixed on 0.17. I was looking at a hacked version of 0.17 on my own computer. This bug is still a big problem. I do have a workaround which I will apply to Launchpad.

Revision history for this message
Gary Poster (gary) wrote :

This is biting us again on bug 627442. If fixing the problem is not a lot more work for us than continuing to work around it, we'd be interested. Would someone offer some guidance, per Leonard's comments? Thank you!

Jamu Kakar (jkakar)
Changed in storm:
importance: Undecided → High
milestone: none → 0.18
status: New → Triaged
summary: - Slicing a ResultSet breaks subsequent len() calls.
+ Slicing a ResultSet breaks subsequent count() calls.
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

<niemeyer> lifeless: I see
<niemeyer> lifeless: Looks like an obvious bug indeed
<lifeless> are the solutions discussed in the bug appropriate ?
<niemeyer> lifeless: Probably not entirely
<niemeyer> lifeless: Discussing:
<niemeyer> """
<niemeyer> There are two solutions. Either change _get_select() to override a non-Undef value with Undef, or change the copy() operation to copy ._select.
<niemeyer> """
<niemeyer> The later probably means change copy() to *not* copy select (since it's already copying it nowadays)
<niemeyer> But it'd be wrong even then
<niemeyer> _select in this case is used exclusively on set operations
<niemeyer> So if it's not copied, the copy becomes incorrect immediately
<niemeyer> Turning _select into Undef would have the same result
<niemeyer> lifeless: That additional complexity is used specifically by _set_expr()
<niemeyer> lifeless: The right fix is likely in the direction of copying the outer expression contained in _select during copy(), or something similar (and less hackish)

Revision history for this message
Thomas Herve (therve) wrote :

I have written a test for the problem, and a possible fix: http://pastebin.ubuntu.com/486640/

I may not get all the implications, but it seems like a reasonable change.

Revision history for this message
Jamu Kakar (jkakar) wrote :

The test and code in the paste look good to me.

Thomas Herve (therve)
Changed in storm:
assignee: nobody → Thomas Herve (therve)
Thomas Herve (therve)
Changed in storm:
status: Triaged → Fix Committed
Gary Poster (gary)
Changed in storm:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.