[gecode-users] Cloning problems

Thu Feb 12 14:36:03 CET 2009

Yes, 32 bits.

Filip

> One quick question: 32 or 64 bits? Looks like 32, right?
> 
> Christian
> 
> --
> Christian Schulte, www.it.kth.se/~cschulte/
> 
> -----Original Message-----
> From: users-bounces at gecode.org [mailto:users-bounces at gecode.org] On Behalf
> Of Filip Konvicka
> Sent: Thursday, February 12, 2009 2:21 PM
> To: users at gecode.org
> Cc: Luboš Moric
> Subject: [gecode-users] Cloning problems
> 
> Hi,
> 
> [Sorry, this is a looong message...]
> 
> we're hunting a serious bug that occurs during space cloning in 2.2.0. 
> The bug occurs very rarely, but we have a testcase that triggers this 
> behavior.
> 
> We have many constraints in the problem instance and the solver should 
> post as many propagators as possible. We have a custom branching for 
> this, which posts one propagator at a time in commit(), while the 
> alternative is not to post the propagator (i.e. a no-op). Because we're 
> only looking for the first solution, in the case of a failure we no 
> longer need the path back to the root in the recomputation tree, so we 
> decided to use our own simple search engine for this. The standard DFS 
> search engine exhibits exactly the same behavior (both with 
> recomputation on and off), and we don't see any problems with our search 
> engine.
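
For readers unfamiliar with this kind of branching, the setup described above can be sketched roughly as follows. This is a self-contained toy in plain C++, not Gecode code: ToyConstraint, ToyState, commit and firstSolution are hypothetical names, a single interval stands in for the FloatVars, and copying ToyState by value stands in for clone().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy "propagator": restricts one interval to [lo, hi].
struct ToyConstraint { double lo, hi; };

// Hypothetical toy "space": the current interval plus a record of
// which constraints have been posted so far.
struct ToyState {
    double lo, hi;
    std::vector<bool> posted;
    bool failed() const { return lo > hi; }
};

// Alternative 0 posts constraint i; alternative 1 is a no-op.
void commit(ToyState& s, const std::vector<ToyConstraint>& cs,
            std::size_t i, unsigned alt) {
    if (alt == 0) {
        if (cs[i].lo > s.lo) s.lo = cs[i].lo;
        if (cs[i].hi < s.hi) s.hi = cs[i].hi;
        s.posted[i] = true;
    }
}

// Depth-first search for the first solution. Each child works on its
// own copy of the parent state (the toy analogue of cloning).
bool firstSolution(const ToyState& s, const std::vector<ToyConstraint>& cs,
                   std::size_t i, ToyState& out) {
    if (s.failed()) return false;
    if (i == cs.size()) { out = s; return true; }
    for (unsigned alt = 0; alt < 2; ++alt) {
        ToyState child = s;
        commit(child, cs, i, alt);
        if (firstSolution(child, cs, i + 1, out)) return true;
    }
    return false;
}

void example() {
    std::vector<ToyConstraint> cs = {{0, 5}, {6, 10}, {2, 4}};
    ToyState root{0.0, 10.0, std::vector<bool>(cs.size(), false)};
    ToyState sol{};
    assert(firstSolution(root, cs, 0, sol));
    // The second constraint conflicts with the first and is skipped.
    assert(sol.posted[0] && !sol.posted[1] && sol.posted[2]);
}
```

In this toy, the no-op alternative cannot fail when its parent has not failed, so a failure is always resolved at the level where it occurred. That is the toy counterpart of not needing the path back to the root.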
> 
> Everything seems to work for the vast majority of the test cases, but 
> there are a few instances that cause problems (probably) during cloning 
> (though they could also be caused by some earlier bad subscribe or 
> unsubscribe). From our point of view, there is nothing wrong or special 
> about the instances. The crashes occur at the same location both on 
> Linux and Windows, in both release and debug builds. Changing memory 
> management (e.g. never deleting Spaces in the search engine) can cause 
> the crash to occur at slightly different places (e.g. some propagation 
> during status() after clone() finishes).
> 
> One particular case we're looking at now crashes at core.icc:2270, where 
> f[0] is a bad pointer (0xfeeefeee on Windows). We're not sure how this 
> can happen - we know that in this case n==2 at core.icc:2255, so idx[0] 
> is a bad pointer at core.icc:2252. This is also what Valgrind says on 
> Linux (bad read of size 4).
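
As an aside on reading such pointer values: 0xFEEEFEEE is the pattern the Windows debug heap writes into memory released with HeapFree, so a pointer with that value usually means it was read from memory that had already been freed. Below is a minimal sketch of a helper that recognises a few well-known 32-bit fill patterns while debugging. FillPattern, classifyFill and describe are hypothetical names, not part of Gecode or of any Windows API.

```cpp
#include <cassert>
#include <cstdint>

// Well-known 32-bit fill patterns seen on Windows/MSVC (hypothetical enum).
enum class FillPattern {
    None,           // not a recognised pattern
    WinHeapFreed,   // 0xFEEEFEEE: Windows debug heap, freed by HeapFree
    CrtFreed,       // 0xDDDDDDDD: MSVC debug CRT, freed heap memory
    CrtUninitHeap,  // 0xCDCDCDCD: MSVC debug CRT, uninitialised heap memory
    CrtGuard,       // 0xFDFDFDFD: MSVC debug CRT, guard bytes around a block
    StackUninit     // 0xCCCCCCCC: MSVC runtime checks, uninitialised stack
};

// Classify a 32-bit value (e.g. a suspicious pointer on a 32-bit build).
FillPattern classifyFill(std::uint32_t value) {
    switch (value) {
        case 0xFEEEFEEEu: return FillPattern::WinHeapFreed;
        case 0xDDDDDDDDu: return FillPattern::CrtFreed;
        case 0xCDCDCDCDu: return FillPattern::CrtUninitHeap;
        case 0xFDFDFDFDu: return FillPattern::CrtGuard;
        case 0xCCCCCCCCu: return FillPattern::StackUninit;
        default:          return FillPattern::None;
    }
}

// Short human-readable hint for each pattern.
const char* describe(FillPattern p) {
    switch (p) {
        case FillPattern::WinHeapFreed:  return "freed heap memory (HeapFree)";
        case FillPattern::CrtFreed:      return "freed heap memory (debug CRT)";
        case FillPattern::CrtUninitHeap: return "uninitialised heap memory";
        case FillPattern::CrtGuard:      return "guard bytes around a heap block";
        case FillPattern::StackUninit:   return "uninitialised stack memory";
        default:                         return "no known fill pattern";
    }
}

void example() {
    // The value reported for f[0] on Windows.
    assert(classifyFill(0xFEEEFEEEu) == FillPattern::WinHeapFreed);
}
```

A match is only a hint: the same bit pattern can occur by chance, and on Linux there is no such fill, which is why the Valgrind report is the more direct evidence there.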
> 
> When we were trying to debug the other cases, we found out that the 
> subscription list in a variable in the cloned space contained an actor 
> link that was probably copied incorrectly, as it looked like a pure 
> ActorLink like Space::a_actors, having a totally different address than 
> the rest of the actors (probably belonging to the original space 
> object). When we tried to find out when this actor link entered the 
> list, we ended up in VarImp<VIC>::update again.
> 
> We're (of course:-)) using FloatVars in the model, and we eliminated all 
> other kinds of variables and propagators. In our case, pc_max==1 and 
> free_bits==0.
> 
> We find it difficult to understand what is happening during cloning. We 
> would appreciate it if someone explained the basic idea. We only have 
> FloatVars, propagators and one branching (no advisors or other types of 
> actors or branchings).
> 
> We know how VarImp<VIC>::resize works, that's easy. In 
> VarImp<VIC>::enter, we can't see why you do "--idx[0];" as the first 
> iteration of the for loop overwrites it (as long as pc>0, of course). 
> Maybe it is just an optimization. As for VarImp<VIC>::update, we can only 
> guess... we suspect that a) the original x->idx[0] is destroyed somewhere 
> so it needs to get restored from a memcpy backup at idx[0], b) 
> ActorLink::_prev is probably used to map old actors to new ones (thus 
> the "->prev()"). We did not dig deep enough to be sure though, so we'd 
> welcome some guidance here.
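
To make guess b) concrete, here is a minimal sketch of the general forwarding-pointer technique, assuming the mapping works the way we suspect. It is a toy model, not Gecode's implementation: ToyActor, ToyVar, ToySpace, cloneSpace and forwardingDemo are hypothetical names, and the real update code may differ in every detail.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy actor link; "prev" doubles as a forwarding pointer during cloning.
struct ToyActor {
    ToyActor* prev = nullptr;
    int id = 0;
};

// Toy variable with a list of subscribed actors (non-owning pointers).
struct ToyVar {
    std::vector<ToyActor*> subscribers;
};

struct ToySpace {
    std::vector<std::unique_ptr<ToyActor>> actors;
    std::vector<std::unique_ptr<ToyVar>> vars;
};

std::unique_ptr<ToySpace> cloneSpace(ToySpace& src) {
    auto dst = std::make_unique<ToySpace>();
    std::vector<ToyActor*> savedPrev;
    // Step 1: copy each actor and leave a forwarding pointer to the
    // copy in the original's prev field (remembering the old value).
    for (auto& a : src.actors) {
        auto copy = std::make_unique<ToyActor>();
        copy->id = a->id;
        savedPrev.push_back(a->prev);
        a->prev = copy.get();
        dst->actors.push_back(std::move(copy));
    }
    // Step 2: copy each variable, translating every subscriber
    // through the forwarding pointer (the "->prev" step).
    for (auto& v : src.vars) {
        auto cv = std::make_unique<ToyVar>();
        for (ToyActor* s : v->subscribers) cv->subscribers.push_back(s->prev);
        dst->vars.push_back(std::move(cv));
    }
    // Step 3: restore the original prev fields.
    for (std::size_t i = 0; i < src.actors.size(); ++i)
        src.actors[i]->prev = savedPrev[i];
    return dst;
}

// Returns true if the clone's subscriptions point at the clone's actors.
bool forwardingDemo() {
    ToySpace s;
    for (int i = 0; i < 2; ++i) {
        s.actors.push_back(std::make_unique<ToyActor>());
        s.actors.back()->id = i;
    }
    s.vars.push_back(std::make_unique<ToyVar>());
    s.vars[0]->subscribers = {s.actors[1].get(), s.actors[0].get()};
    auto c = cloneSpace(s);
    const auto& subs = c->vars[0]->subscribers;
    return subs.size() == 2 && subs[0] == c->actors[1].get() &&
           subs[1] == c->actors[0].get() && s.actors[0]->prev == nullptr;
}
```

In this toy, a subscription entry that refers to something not copied in step 1 would be translated through a prev field that still holds an ordinary list link, and the clone would end up with a pointer into the original space. That resembles the stray actor link we saw, although the toy does not show that this is what actually happens in Gecode.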
> 
> Cheers,
> Filip
> 
> 