nscd crashed with SIGSEGV in start_thread()

Bug #256157 reported by Daniel J Blueman
This bug affects 5 people
Affects           Status         Importance   Assigned to   Milestone
glibc (Ubuntu)    Fix Released   Undecided    Unassigned
  Intrepid        Fix Released   Undecided    Unassigned

Bug Description

I've seen nscd reported as crashing a number of times, sometimes a fair while after logging in.

Perhaps this relates to the state stored in /var/cache/nscd/{group,passwd,services}. I note that the default /etc/nscd.conf declares the hosts cache persistent, yet there is no /var/cache/nscd/hosts, which is surprising...

ProblemType: Crash
Architecture: amd64
CrashCounter: 1
Dependencies:
 libgcc1 1:4.3.1-8ubuntu3
 gcc-4.3-base 4.3.1-8ubuntu3
 findutils 4.4.0-2ubuntu3
 libc6 2.8~20080505-0ubuntu6
DistroRelease: Ubuntu 8.10
ExecutablePath: /usr/sbin/nscd
Package: nscd 2.8~20080505-0ubuntu6
ProcAttrCurrent: unconfined
ProcCmdline: /usr/sbin/nscd
ProcEnviron: PATH=/sbin:/usr/sbin:/bin:/usr/bin
Signal: 11
SourcePackage: glibc
StacktraceTop:
 ?? () from /usr/sbin/nscd
 ?? () from /usr/sbin/nscd
 ?? () from /usr/sbin/nscd
 start_thread () from /lib/libpthread.so.0
 clone () from /lib/libc.so.6
Title: nscd crashed with SIGSEGV in start_thread()
Uname: Linux 2.6.26-4-generic x86_64
UserGroups:

Daniel J Blueman (danielblueman) wrote :
Apport retracing service (apport) wrote : Symbolic stack trace

StacktraceTop:
 ?? () from /usr/sbin/nscd
 ?? () from /usr/sbin/nscd
 ?? () from /usr/sbin/nscd
 start_thread () from /lib/libpthread.so.0
 clone () from /lib/libc.so.6

Apport retracing service (apport) wrote : Symbolic threaded stack trace
Daniel J Blueman (danielblueman) wrote :

Having traced this a number of times, I have consistently seen this assertion fail:

nscd: mem.c:335: gc: Assertion `off_alloc <= db->head->first_free' failed.

This occurs during garbage collection, when the code is calculating slack space in its tree layout:

      ref_t off_alloc = (byte * BITS + cnt) * BLOCK_ALIGN;
      assert (off_alloc <= db->head->first_free);

...it detects an overlap. This always fires after a new thread has been created:

# valgrind --trace-children=yes /usr/sbin/nscd -d
==7565== Memcheck, a memory error detector.
==7565== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==7565== Using LibVEX rev 1854, a library for dynamic binary translation.
==7565== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==7565== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==7565== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==7565== For more details, rerun with: -v
==7565==
7565: handle_request: request received (Version = 2) from PID 7063
7565: GETFDGR
7565: provide access to FD 6, for group
7565: handle_request: request received (Version = 2) from PID 7586
7565: GETFDPW
7565: provide access to FD 4, for passwd
7565: handle_request: request received (Version = 2) from PID 7600
7565: GETFDPW
7565: provide access to FD 4, for passwd
7565: handle_request: request received (Version = 2) from PID 7603
7565: GETFDPW
7565: provide access to FD 4, for passwd
7565: handle_request: request received (Version = 2) from PID 7651
7565: GETFDPW
7565: provide access to FD 4, for passwd
7565: handle_request: request received (Version = 2) from PID 7651
7565: GETFDHST
7565: handle_request: request received (Version = 2) from PID 7651
7565: GETHOSTBYNAME (sony)
7565: Reloading "root" in password cache!
7565: Reloading "gdm" in password cache!
7565: Reloading "haldaemon" in password cache!
7565: Reloading "messagebus" in password cache!
7565: Reloading "polkituser" in password cache!
7565: Reloading "daniel" in password cache!
7565: Reloading "sshd" in password cache!
7565: Reloading "ntp" in password cache!
7565: Reloading "nobody" in password cache!
7565: Reloading "7" in password cache!
7565: Reloading "1" in password cache!
7565: Reloading "110" in password cache!
7565: Reloading "115" in password cache!
7565: Reloading "root" in password cache!
7565: Reloading "gdm" in password cache!
7565: Reloading "haldaemon" in password cache!
7565: Reloading "messagebus" in password cache!
7565: Reloading "polkituser" in password cache!
7565: Reloading "daniel" in password cache!
7565: Reloading "sshd" in password cache!
7565: Reloading "ntp" in password cache!
7565: Reloading "nobody" in password cache!
7565: Reloading "7" in password cache!
7565: Reloading "1" in password cache!
7565: Reloading "110" in password cache!
7565: Reloading "115" in password cache!
7565: Reloading "root" in password cache!
7565: Reloading "gdm" in password cache!
7565: Reloading "haldaemon" in password cache!
7565: Reloading "messagebus" in password cache!
7565: Reloading "polkituser" in password cache!
7565: Reloading "daniel" in password cache!
7565: Reloading "sshd" in password cache!
7565: Reload...
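To make the failing check in gc() concrete, here is a minimal C sketch of how a mark-and-sweep bitmap maps a bit position back to a byte offset, using the same `(byte * BITS + cnt) * BLOCK_ALIGN` expression and the same invariant as the assertion quoted above. The bitmap type, the value of `BLOCK_ALIGN` (8) and the helper name `first_free_offset` are assumptions for illustration, not nscd's actual definitions; only the offset formula and the assertion come from mem.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; nscd defines its own. */
#define BLOCK_ALIGN 8u
typedef uint8_t bitmap_t;               /* hypothetical stand-in for BITMAP_T */
#define BITS (8u * sizeof(bitmap_t))    /* mark bits per bitmap element */

/* Hypothetical helper: return the byte offset of the first unmarked (free)
   block in the mark bitmap, or first_free if all covered blocks are in use. */
static size_t first_free_offset(const bitmap_t *mark, size_t nmark,
                                size_t first_free)
{
    for (size_t byte = 0; byte < nmark; ++byte) {
        if (mark[byte] == (bitmap_t)~0u)
            continue;                   /* every block in this element is live */

        size_t cnt = 0;
        while (mark[byte] & (1u << cnt))
            ++cnt;                      /* index of the first clear bit */

        size_t off_alloc = (byte * BITS + cnt) * BLOCK_ALIGN;
        /* Same invariant as mem.c: a free block cannot start beyond the
           end of the allocated area. */
        assert(off_alloc <= first_free);
        return off_alloc;
    }
    return first_free;
}
```

With a consistent bitmap, the first clear bit sits at most at index `first_free / BLOCK_ALIGN`, so the assertion holds; it can only fire when the marks and `first_free` disagree, i.e. when the GC's bookkeeping has been corrupted.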

Changed in glibc:
status: New → Confirmed
Daniel J Blueman (danielblueman) wrote :

This patch by Ulrich Drepper, shipped in Fedora 9, looks like the fix:

2008-06-11 Ulrich Drepper <email address hidden>
 * nscd/mem.c (gc): Initialize obstack earlier so that if we jump
 out we don't use uninitialized memory.

http://cvs.fedoraproject.org/viewvc/rpms/glibc/F-9/glibc-nscd.patch?view=co

I'll try to isolate the specific fix, which will make an SRU easier.
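The changelog entry describes a classic C cleanup bug: if gc() can leave early through a shared exit label, and the obstack is only initialized after that early jump, the cleanup code at the label runs on uninitialized memory. The sketch below shows the pattern, assuming "jump out" refers to a `goto` to a common cleanup label. A plain heap pointer stands in for the obstack, `gc_sketch` is a hypothetical name, and the early-exit condition (every block already in use) is illustrative rather than copied from mem.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical gc()-style routine with one shared exit label.  A heap
   pointer stands in for nscd's obstack.  Returns the number of mark bytes
   that still contain at least one free block. */
static int gc_sketch(const unsigned char *mark, size_t nmark)
{
    assert(mark != NULL || nmark == 0);

    int moves = 0;
    /* The fix in miniature: set up the scratch storage before the first
       `goto out`, so the cleanup at `out` never sees an indeterminate value
       (the changelog's "initialize obstack earlier"). */
    size_t *plan = NULL;

    /* Skip leading bytes whose blocks are all in use (all bits set). */
    size_t byte = 0;
    while (byte < nmark && mark[byte] == 0xFF)
        ++byte;
    if (byte == nmark)
        goto out;                  /* everything in use: nothing to compact */

    plan = malloc((nmark - byte) * sizeof *plan);
    if (plan == NULL)
        goto out;

    /* Record every remaining byte that still has a free block. */
    for (size_t i = byte; i < nmark; ++i)
        if (mark[i] != 0xFF)
            plan[moves++] = i;

out:
    free(plan);                    /* free(NULL) is a no-op, so every path is safe */
    return moves;
}
```

Had `plan` been assigned only after the first `goto out`, the fully-used case would pass an indeterminate pointer to `free`, which is the same class of bug the changelog entry fixes for the obstack.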

Daniel J Blueman (danielblueman) wrote :

Various fixes have been made to the garbage collector in mem.c:

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=history;f=nscd/mem.c;h=e821729dab3575c698129243951b91d3a1e55d18;hb=HEAD

This patch includes the various fixes that the garbage collector in the current Intrepid nscd lacks:

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blobdiff;f=nscd/mem.c;h=e821729dab3575c698129243951b91d3a1e55d18;hp=21f2ae821dcb62d248e0ab9958779ab49564e78c;hb=HEAD;hpb=563574c13dfd9bb1069761d7ca5ccf65f1dae6c9

Thus, this is what needs patching in. Is anyone able to help?

Daniel J Blueman (danielblueman) wrote :

I backported Ulrich Drepper's (glibc maintainer) upstream nscd fixes, and have been testing them for a week.

I've attached my debdiff, and have confirmed that it resolves the problem on i686 and amd64 on Ubuntu 8.10.

Daniel J Blueman (danielblueman) wrote :

debdiff fix attached

Changed in glibc:
status: Confirmed → Fix Committed
Daniel J Blueman (danielblueman) wrote :

I have verified that this bug is fixed in Jaunty in glibc-2.9. I marked this bug report as 'Fix Committed' since I've attached the tested fix; perhaps this isn't the right state?

Chris Coulson (chrisccoulson) wrote :

Daniel has confirmed via e-mail that this bug is fixed in Jaunty, so setting to Fix Released.

Changed in glibc:
status: Fix Committed → Fix Released
Daniel J Blueman (danielblueman) wrote :

SRU justification:

impact:
 1. in use, nscd can crash frequently, which disables caching and increases network load and lookup latency
  -> therefore high impact in an NFS and/or multi-user environment
 2. users may see a notification that nscd has crashed, degrading the desktop experience

resolution:
 -> backported Ulrich Drepper's (glibc maintainer) upstream fixes for nscd corruption and crashing

patch:
 -> minimal patch attached

testcase:
 1. enable nscd in a multi-user and/or NFS environment where many passwd and group lookups will occur
 2. generate lookups
 3. nscd may hit the assertion after two garbage-collection intervals (~60s); if it does not, keep generating lookups
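Step 2 of the testcase ("generate lookups") can be scripted. Below is a minimal C sketch, assuming a glibc system where the standard `getpwnam` and `getgrgid` calls are forwarded to nscd while the daemon is running; `generate_lookups` and its parameters are hypothetical names. It repeats passwd and group lookups with a pause between rounds, so the daemon sees steady load across several garbage-collection intervals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical load generator: run `rounds` rounds of passwd and group
   lookups for each name in `users`, sleeping `delay_ms` milliseconds
   between rounds.  Returns the number of lookups that found an entry. */
static unsigned long generate_lookups(const char *const *users, size_t nusers,
                                      unsigned rounds, long delay_ms)
{
    assert(users != NULL || nusers == 0);

    unsigned long hits = 0;
    struct timespec pause = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };

    for (unsigned r = 0; r < rounds; ++r) {
        for (size_t i = 0; i < nusers; ++i) {
            struct passwd *pw = getpwnam(users[i]);     /* passwd lookup */
            if (pw == NULL)
                continue;
            ++hits;
            if (getgrgid(pw->pw_gid) != NULL)           /* group lookup */
                ++hits;
        }
        if (delay_ms > 0)
            nanosleep(&pause, NULL);
    }
    return hits;
}
```

For example, calling `generate_lookups(users, n, 120, 1000)` with a list of local account names runs for roughly two minutes, which should cover the ~60s window from step 3; running several copies in parallel adds concurrent requests.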

potential regression:
 -> none identifiable

Matthias Klose (doko) wrote :

Uploaded; waiting for SRU approval. Renamed the patch to any/cvs-...
Please recheck when the package is built in the archive.

Changed in glibc:
status: New → Fix Committed
Martin Pitt (pitti) wrote :

Waiting for verification of bug 305901 before processing this, to avoid stacking too many changes on top of each other.

Martin Pitt (pitti) wrote :

Accepted glibc into intrepid-proposed; please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Mickaël Carlier (mickael-carlier) wrote :

Hi,
no more crashes since the update!
Good job!

Daniel J Blueman (danielblueman) wrote :

I've been testing nscd on x86-64 for a few days on two separate systems, and the fix looks good. I need more exposure on i686 before marking this verified, which will be done in the next few days.

Daniel J Blueman (danielblueman) wrote :

I've been unable to trigger any crashes or regressions with this updated nscd over the last 8 days in multiple environments, so I am happy that it is ready for general release. Updating tags.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.8~20080505-0ubuntu9

---------------
glibc (2.8~20080505-0ubuntu9) intrepid-proposed; urgency=low

  [Daniel J Blueman]
  * Add debian/patches/any/cvs-nscd-crash-fix.diff: address nscd
    daemon crashing in mem.c (LP: #256157).

 -- Matthias Klose <email address hidden> Mon, 19 Jan 2009 09:38:23 +0100

Changed in glibc:
status: Fix Committed → Fix Released