[gecode-users] Cloning problems

Filip Konvička filip.konvicka at logis.cz
Fri Feb 13 15:49:46 CET 2009


Hi,

we just discussed the very same idea. It seems that during cloning, an 
actor is present in the variable subscription list, but not in the 
Space::a_actors list, so we thought that maybe an actor disappeared 
without unsubscribing. We're using ES_SUBSUMED(this, home) everywhere in 
floatvar propagators to indicate entailment, I'll check whether that is 
done everywhere.

Thanks,
Filip

> Hi Mikael and I just had one more thing you could check: the behaviour could
> also be due to a propagator that says subsumption but does not delete all of
> its dependencies via cancel.
>
> Christian
>
> --
> Christian Schulte, www.ict.kth.se/~cschulte/
>
>
> -----Original Message-----
> From: Christian Schulte [mailto:cschulte at kth.se] 
> Sent: Friday, February 13, 2009 11:12 AM
> To: 'Filip Konvička'; 'users at gecode.org'
> Cc: 'Luboš Moric'
> Subject: RE: [gecode-users] Cloning problems
>
> Hi,
>
> that sounds ugly ;-) Before I start with rambling about cloning, a tip:
> Gecode supports unary branchings (with a single alternatiuve only). So you
> could have a unary branching and don't have to have a mock-up empty branch.
> The search engines even optimize that case. If you are looking for an
> example, check the assign function in gecode/int.hh for assigning variables.
>
> Okay, cloning it is. So, there are two stages: copying and updating. During
> copying you first create a new space and copy all actors. While copying all
> actors you copy each variable and its domain used by an actor and collect a
> list of all these variables (actually a list for each variable type). The
> only thing that you do not do is copying the dependency array of a variable
> (the array of pointers that point to propagators that depend on a variable).
> Then in updating, you update the dependency arrays (and also deal with
> advisors).
>
> Updating a dependency array just copies for each old entry the new entry
> (stored via a forwarding pointer in each actor) and restore some information
> for variables that has been stored in the copy of the dependency array.
> After that you cleanup the space, the actors, and you are done.
>
> In VarImp<VIC>::enter, the --idx[0] makes room for an additional entry. Then
> you keep moving actors to a free entry and by this create a new free entry
> for the propagation condition i. When the loop is done, you know that there
> is a free entry for propagation condition pc and that's where the propagator
> is entered.
>
> Your guess about x->idx[0] in VarImp<VIC>::update is correct: this actually
> stores the forwarding pointer for a variable implementation. Also your
> suspicion about ActorLink::prev is correct: there the forwarding pointer for
> actors is stored.
>
> Let me come up with some idea where the trouble could be:
>  - you forget to update a variable or view at some place. Either in your
> branching, propagator, or model. 
>    Then you can see an invalid pointer in the dependency array.
>    You might want to change the default constructor of VarBase in
> gecode/kernel/var.icc and the default constructor
>    of VarViewBase in gecode/kernel/view.icc so that they set varimp to NULL
> (actually, I just did that yesterday in the trunk as well). Then you will be
> sure that when you forget to update, you'll see a NULL pointer.
>  - you commit a heinous crime in that you change the dependencies of a
> variable with cancel/subscribe during 
>    cloning. That is illegal and will for sure break everything. Note that
> creating a propagator during
>    cloning will commit that crime, too.
>
> It is very very unlikely (but not impossible, of course) that the cloning in
> Gecode has a snatch but, as said, very unlikely given that its used by
> anything else.
>
> Hope that helps. Please do not hesitate to ask some more
> Christian
>
> --
> Christian Schulte, www.ict.kth.se/~cschulte/
>
>
> -----Original Message-----
> From: users-bounces at gecode.org [mailto:users-bounces at gecode.org] On Behalf
> Of Filip Konvicka
> Sent: Thursday, February 12, 2009 2:21 PM
> To: users at gecode.org
> Cc: Luboš Moric
> Subject: [gecode-users] Cloning problems
>
> Hi,
>
> [Sorry, this is a looong message...]
>
> we're hunting a serious bug that occurs during space cloning in 2.2.0. 
> The bug occurs very rarely, but we have a testcase that triggers this 
> behavior.
>
> We have many constraints in the problem instance and the solver should 
> post as many propagators as possible. We have a custom branching for 
> this, which posts one propagator at a time in commit(), while the 
> alternative is not to post the propagator (i.e. a no-op). Because we're 
> only looking for the first solution, in the case of a failure we no 
> longer need the path back to the root in the recomputation tree, so we 
> decided to use our own simple search engine for this. The standard DFS 
> search engine exhibits exactly the same behavior (both with 
> recomputation on and off), and we don't see any problems with our search 
> engine.
>
> Everything seems to work for the vast majority of the test cases, but 
> there are a few instances that cause problems (probably) during cloning 
> (can be probably also be caused by some earlier bad subscibe or 
> unsubscribe). From our point of view, there is nothing wrong or special 
> about the instances. The crashes occur at the same location both on 
> Linux and Windows, in both release and debug builds. Changing memory 
> management (e.g. never deleting Spaces in the search engine) can cause 
> the crash to occur at slightly different places (e.g. some propagation 
> during status() after clone() finishes).
>
> One particular case we're looking at now crashes at core.icc:2270, where 
> f[0] is a bad pointer (0xfeeefeee at Windows). We're not sure how this 
> can happen - we know that in this case n==2 at core.icc:2255, so idx[0] 
> is bad pointer at core.icc:2252. This is also what Valgrind says on 
> Linux (bad read of size 4).
>
> When we were trying to debug the other cases, we found out that the 
> subscription list in a variable in the cloned space contained an actor 
> link that was probably copied incorrectly as it seemed as a pure 
> ActorLink like Space::a_actors, having a totally different address than 
> the rest of the actors (probably belonging to the original space 
> object). When we tried to find out when this actor link entered the 
> list, we ended up in VarImp<VIC>::update again.
>
> We're (of course:-)) using FloatVars in the model, and we eliminated all 
> other kinds of variables and propagators. In our case, pc_max==1 and 
> free_bits==0.
>
> We find it difficult to understand what is happening during cloning. We 
> would appreciate if someone explaned the basic idea. We only have 
> floatvars, propagators and one branching (no advisors or other types of 
> actors/branchings/advisors).
>
> We know how VarImp<VIC>::resize works, that's easy. In 
> VarImp<VIC>::enter, we can't see why you do "--idx[0];" as the first 
> iteration of the for cycle overwrites it (as long as pc>0, of course). 
> May be just optimization of course. As for VarImp<VIC>::update, we only 
> guess...we suspect that a) the original x->idx[0] is destroyed somewhere 
> so it needs to get restored from a memcpy backup at idx[0], b) 
> ActorLink::_prev is probably used to map old actors to new ones (thus 
> the "->prev()". We did not dig deep enough to be sure though, so we'd 
> welcome some guidance here.
>
> Cheers,
> Filip
>
>
> _______________________________________________
> Gecode users mailing list
> users at gecode.org
> https://www.gecode.org/mailman/listinfo/gecode-users
>
>   


-- 
Filip Konvička

LOGIS, s.r.o.
tel. +420 556 841 100
mob. +420 736 758 714





More information about the gecode-users mailing list