[gecode-users] Cloning problems
Filip Konvička
filip.konvicka at logis.cz
Thu Feb 12 14:36:03 CET 2009
Yes, 32 bits.
Filip
> One quick question: 32 or 64 bits? Looks like 32, right?
>
> Christian
>
> --
> Christian Schulte, www.it.kth.se/~cschulte/
>
> -----Original Message-----
> From: users-bounces at gecode.org [mailto:users-bounces at gecode.org] On Behalf
> Of Filip Konvicka
> Sent: Thursday, February 12, 2009 2:21 PM
> To: users at gecode.org
> Cc: Luboš Moric
> Subject: [gecode-users] Cloning problems
>
> Hi,
>
> [Sorry, this is a looong message...]
>
> we're hunting a serious bug that occurs during space cloning in 2.2.0.
> The bug occurs very rarely, but we have a testcase that triggers this
> behavior.
>
> We have many constraints in the problem instance and the solver should
> post as many propagators as possible. We have a custom branching for
> this, which posts one propagator at a time in commit(), while the
> alternative is not to post the propagator (i.e. a no-op). Because we're
> only looking for the first solution, in the case of a failure we no
> longer need the path back to the root in the recomputation tree, so we
> decided to use our own simple search engine for this. The standard DFS
> search engine exhibits exactly the same behavior (both with
> recomputation on and off), and we don't see any problems with our search
> engine.
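
The search described in the paragraph above can be sketched as a toy model. This is not Gecode code: ToySpace, ToyConstraint, post and post_greedily are hypothetical names, and the "space" is a single interval domain. It only illustrates the idea of trying "post the constraint" on a clone and falling back to the no-op alternative on failure, without keeping any path back to the root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a space: one variable with domain [lo, hi].
struct ToySpace {
    double lo = 0.0;
    double hi = 10.0;
    bool failed() const { return lo > hi; }
};

// Hypothetical stand-in for a constraint: intersect the domain with [lo, hi].
struct ToyConstraint {
    double lo;
    double hi;
};

// First alternative of the branching: post the constraint.
// The second alternative is a no-op, so it needs no function.
void post(ToySpace& s, const ToyConstraint& c) {
    if (c.lo > s.lo) s.lo = c.lo;
    if (c.hi < s.hi) s.hi = c.hi;
}

// Try each constraint on a copy of the current space. If the copy stays
// consistent, it becomes the current space; otherwise it is discarded and
// the search continues from the unchanged space (the no-op alternative).
std::vector<bool> post_greedily(ToySpace& s,
                                const std::vector<ToyConstraint>& cs) {
    std::vector<bool> posted(cs.size(), false);
    for (std::size_t i = 0; i < cs.size(); ++i) {
        ToySpace trial = s;      // plays the role of clone()
        post(trial, cs[i]);      // plays the role of commit() + status()
        if (!trial.failed()) {
            s = trial;
            posted[i] = true;
        }
    }
    assert(!s.failed());         // holds as long as the start space was consistent
    return posted;
}
```

Because the no-op alternative can never fail a consistent space, the first solution is reached without ever backtracking above the current node, which is why only the current space needs to be kept.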
>
> Everything seems to work for the vast majority of the test cases, but
> there are a few instances that cause problems, probably during cloning
> (though an earlier bad subscribe or unsubscribe could also be the
> cause). From our point of view, there is nothing wrong or special
> about the instances. The crashes occur at the same location both on
> Linux and Windows, in both release and debug builds. Changing memory
> management (e.g. never deleting Spaces in the search engine) can cause
> the crash to occur at slightly different places (e.g. some propagation
> during status() after clone() finishes).
>
> One particular case we're looking at now crashes at core.icc:2270, where
> f[0] is a bad pointer (0xfeeefeee on Windows). We're not sure how this
> can happen - we know that in this case n==2 at core.icc:2255, so idx[0]
> is a bad pointer at core.icc:2252. This is also what Valgrind says on
> Linux (bad read of size 4).
>
> When we were trying to debug the other cases, we found out that the
> subscription list in a variable in the cloned space contained an actor
> link that was probably copied incorrectly: it looked like a plain
> ActorLink like Space::a_actors, with an address totally different from
> the rest of the actors (it probably belongs to the original space
> object). When we tried to find out when this actor link entered the
> list, we ended up in VarImp<VIC>::update again.
>
> We're (of course:-)) using FloatVars in the model, and we eliminated all
> other kinds of variables and propagators. In our case, pc_max==1 and
> free_bits==0.
>
> We find it difficult to understand what is happening during cloning. We
> would appreciate it if someone explained the basic idea. We only have
> FloatVars, propagators and one branching (no advisors or other types of
> actors or branchings).
>
> We know how VarImp<VIC>::resize works, that's easy. In
> VarImp<VIC>::enter, we can't see why you do "--idx[0];", as the first
> iteration of the for loop overwrites it (as long as pc>0, of course).
> It may be just an optimization. As for VarImp<VIC>::update, we can only
> guess. We suspect that a) the original x->idx[0] is destroyed somewhere,
> so it needs to be restored from a memcpy backup at idx[0], and b)
> ActorLink::_prev is probably used to map old actors to new ones (thus
> the "->prev()"). We did not dig deep enough to be sure though, so we'd
> welcome some guidance here.
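
Guess b) above describes what is usually called a forwarding pointer. The following is a toy sketch of that general technique only, not Gecode's implementation: Actor, Var, World and clone_world are hypothetical names, and how Gecode really restores its links is not claimed here. While copying, each original actor's prev field temporarily points at its copy, so a subscription list can be translated with a single "->prev" lookup per entry.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy types; these are not Gecode's ActorLink or VarImp.
struct Actor {
    Actor* prev = nullptr;  // list link, reused as forwarding pointer during copying
    int id = 0;
};

struct Var {
    std::vector<Actor*> subscribers;  // actors subscribed to this variable
};

struct World {
    std::vector<Actor*> actors;
    std::vector<Var*> vars;
    World() = default;
    World(const World&) = delete;
    World& operator=(const World&) = delete;
    ~World() {
        for (Actor* a : actors) delete a;
        for (Var* v : vars) delete v;
    }
};

std::unique_ptr<World> clone_world(World& src) {
    std::unique_ptr<World> dst(new World);
    std::vector<Actor*> saved;  // original prev links, restored in pass 3
    saved.reserve(src.actors.size());

    // Pass 1: copy every actor and leave a forwarding pointer in the original.
    for (Actor* a : src.actors) {
        Actor* c = new Actor;
        c->id = a->id;
        dst->actors.push_back(c);
        saved.push_back(a->prev);
        a->prev = c;
    }

    // Pass 2: copy every variable, translating subscribers via the forwarding pointer.
    for (Var* v : src.vars) {
        Var* w = new Var;
        for (Actor* a : v->subscribers) w->subscribers.push_back(a->prev);
        dst->vars.push_back(w);
    }

    // Pass 3: restore the original's links and build the copy's own links.
    for (std::size_t i = 0; i < src.actors.size(); ++i) {
        src.actors[i]->prev = saved[i];
        dst->actors[i]->prev = (i > 0) ? dst->actors[i - 1] : nullptr;
    }
    return dst;
}
```

In this toy, a subscription entry that refers to an actor not visited in pass 1 would be translated through a stale prev link and end up pointing at an unrelated object. That would look similar to the stray link described earlier, but whether it is what happens in the reported crash is not established.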
>
> Cheers,
> Filip