[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [patches] Track resolv.conf, avoid need for explicit res_init() or nscd in long-running applications



On Jun 20, 2011, at 9:38 PM, Paul Pluzhnikov wrote:

> On Wed, May 25, 2011 at 6:47 PM, Paul Pluzhnikov <ppluzhnikov@xxxxxxxxxx> wrote:
>> On Wed, May 25, 2011 at 6:20 PM, Paul Pluzhnikov <ppluzhnikov@xxxxxxxxxx> wrote:
>> 
>>> We've just tracked a bug to this patch:
>>> http://www.eglibc.org/cgi-bin/viewcvs.cgi/trunk/libc/resolv/res_libc.c?rev=9102&r1=8608&r2=9102
>> 
>> Oops. I see this has already been noted:
>> http://www.eglibc.org/archives/patches/msg00901.html
>> 
>> Sorry about the noise.
> 
> It has been pointed out to me that the fix above is still buggy :-(
> 
> Effectively it's doing double-checked locking. Consider:
> 
> 1. two threads (A and B) use the resolver, loading some version of resolv.conf
> 2. the threads go idle
> 3. some time later, resolv.conf changes
> 4. a long time -- hours, maybe -- elapses, with no further changes or activity
> 5. both threads wake up, and both call __res_maybe_init() at the same time
> 6. both threads stat() the file and see the new timestamp
> 7. thread A updates last_mtime to the new timestamp
> 8. thread B checks last_mtime, sees that it's current
> 9. thread B checks __res_initstamp, which has not yet changed
> 10. thread B goes ahead using the hours-old resolv.conf data
> 11. thread A now updates __res_initstamp, but it's too late
> 
> The next time thread B calls __res_maybe_init(), it will _probably_ get
> the right data.  (Theoretically, an arbitrary amount of time and number
> of requests can elapse after thread A updates last_time and before thread
> A updates __res_initstamp, all of which will use the stale resolv.conf data.)

The analysis and the patch look correct to me.  I'll apply the fix in couple of days to give Joseph and everyone else time to comment.

One point that is not immediately clear from your patch, is why you're making declaration of 'lock' unconditional.  IIUC, this is because 'lock' is now guarding access and modification of 'last_mtime', while previously it was used only to guard '__res_initstamp'.

Thank you for fixing this,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics