corosync fails to start in unprivileged container

Bug #1918735 reported by Dan Streetman
This bug affects 1 person
Affects              Status         Importance   Assigned to     Milestone
corosync (Debian)    Fix Released   Unknown
corosync (Ubuntu)    Fix Released   Medium       Dan Streetman
  Focal              Fix Released   Medium       Dan Streetman
  Groovy             Fix Released   Medium       Dan Streetman
  Hirsute            Fix Released   Medium       Dan Streetman

Bug Description

[impact]

corosync fails to start in an unprivileged container.

[test case]

Install corosync in an unprivileged container and check the service status:

root@corosync-f:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2021-03-11 21:28:52 UTC; 17s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 14552 (code=exited, status=15)

Mar 11 21:28:51 corosync-f corosync[14552]: [TOTEM ] Initializing transport (Kronosnet).
Mar 11 21:28:52 corosync-f corosync[14552]: [TOTEM ] knet_handle_new failed: File name too long (36)
Mar 11 21:28:52 corosync-f corosync[14552]: [KNET ] transport: Failed to set socket buffer via force option 33: Operation not permitted
Mar 11 21:28:52 corosync-f corosync[14552]: [KNET ] transport: Unable to set local socketpair receive buffer: File name too long
Mar 11 21:28:52 corosync-f corosync[14552]: [KNET ] handle: Unable to initialize internal hostsockpair: File name too long
Mar 11 21:28:52 corosync-f corosync[14552]: [MAIN ] Can't initialize TOTEM layer
Mar 11 21:28:52 corosync-f corosync[14552]: [MAIN ] Corosync Cluster Engine exiting with status 15 at main.c:1531.
Mar 11 21:28:52 corosync-f systemd[1]: corosync.service: Main process exited, code=exited, status=15/n/a
Mar 11 21:28:52 corosync-f systemd[1]: corosync.service: Failed with result 'exit-code'.
Mar 11 21:28:52 corosync-f systemd[1]: Failed to start Corosync Cluster Engine.

[regression potential]

Any regression would most likely cause corosync to fail to start, or to run with reduced performance.

[scope]

This fix is needed in Focal, Groovy, and Hirsute.

corosync already starts in a Bionic container, so Bionic is not affected.

I opened an upstream PR:
https://github.com/corosync/corosync/pull/623

This is also related to bug 1911904 and bug 1828228.


Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I can confirm the issue by comparing an LXD container with a VM guest.
But given that the errors are about corosync being unable to set some device options, and that there is no obvious "container mode" switch, I wonder if this will end up needing upstream work.

Also, FYI, this seems to be a resurfacing of bug 1828228
=> https://bugs.launchpad.net/auto-package-testing/+bug/1828228/comments/8

@Dan, are you intending to work on/fix this with your SEG hat on, or did you just spot it and want to report it for us?

Revision history for this message
Dan Streetman (ddstreet) wrote :

> Also FYI this seems to be a bit of a re-surface of bug 1828228

yes, and that was unfortunately 'fixed' by just ignoring the error :(

> are you intending to work-on/fix this

yep, I already fixed bug 1911904 upstream and will include this fix in the SRU for that bug

Changed in corosync (Ubuntu Hirsute):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Groovy):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Hirsute):
importance: Undecided → Medium
Changed in corosync (Ubuntu Groovy):
importance: Undecided → Medium
Changed in corosync (Ubuntu Focal):
importance: Undecided → Medium
Changed in corosync (Ubuntu Hirsute):
status: New → In Progress
Changed in corosync (Ubuntu Focal):
status: New → In Progress
Changed in corosync (Ubuntu Groovy):
status: New → In Progress
Dan Streetman (ddstreet)
description: updated
Dan Streetman (ddstreet)
description: updated
description: updated
Changed in corosync (Debian):
status: Unknown → New
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 3.1.0-2ubuntu2

---------------
corosync (3.1.0-2ubuntu2) hirsute; urgency=medium

  * d/p/lp1911904-Don-t-lock-all-current-and-future-memory-if-can-t-in.patch:
    - Don't mlockall() if setrlimit() fails (LP: #1911904)
  * d/p/lp1918735-try-unprivileged-knet-handle-new.patch:
    - Retry knet_handle_new without privileged flag (LP: #1918735)
  * d/t: don't skip tests now that we fixed crashing in container

 -- Dan Streetman <email address hidden> Wed, 10 Mar 2021 12:55:26 -0500

Changed in corosync (Ubuntu Hirsute):
status: In Progress → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Dan, or anyone else affected,

Accepted corosync into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/3.0.3-2ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in corosync (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-groovy
Changed in corosync (Ubuntu Focal):
status: In Progress → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Dan, or anyone else affected,

Accepted corosync into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/3.0.3-2ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed-focal
Revision history for this message
Dan Streetman (ddstreet) wrote :

root@lp1918735-g:~# dpkg -l|grep corosync
ii corosync 3.0.3-2ubuntu3 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu3 amd64 cluster engine common library
root@lp1918735-g:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-03-24 13:10:31 UTC; 17s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2886 (code=exited, status=15)

Mar 24 13:10:31 lp1918735-g corosync[2886]: [TOTEM ] Initializing transport (Kronosnet).
Mar 24 13:10:31 lp1918735-g corosync[2886]: [TOTEM ] knet_handle_new failed: File name too long (36)
Mar 24 13:10:31 lp1918735-g corosync[2886]: [KNET ] transport: Failed to set socket buffer via force option 33: Operation not permitted
Mar 24 13:10:31 lp1918735-g corosync[2886]: [KNET ] transport: Unable to set local socketpair receive buffer: File name too long
Mar 24 13:10:31 lp1918735-g corosync[2886]: [KNET ] handle: Unable to initialize internal hostsockpair: File name too long
Mar 24 13:10:31 lp1918735-g corosync[2886]: [MAIN ] Can't initialize TOTEM layer
Mar 24 13:10:31 lp1918735-g corosync[2886]: [MAIN ] Corosync Cluster Engine exiting with status 15 at main.c:1531.
Mar 24 13:10:31 lp1918735-g systemd[1]: corosync.service: Main process exited, code=exited, status=15/n/a
Mar 24 13:10:31 lp1918735-g systemd[1]: corosync.service: Failed with result 'exit-code'.
Mar 24 13:10:31 lp1918735-g systemd[1]: Failed to start Corosync Cluster Engine.

root@lp1918735-g:~# dpkg -l|grep corosync
ii corosync 3.0.3-2ubuntu3.1 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu3.1 amd64 cluster engine common library
root@lp1918735-g:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-03-24 13:12:16 UTC; 7s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 4880 (corosync)
      Tasks: 9 (limit: 76563)
     Memory: 84.2M
     CGroup: /system.slice/corosync.service
             └─4880 /usr/sbin/corosync -f

Mar 24 13:12:16 lp1918735-g corosync[4880]: [QUORUM] Members[0]:
Mar 24 13:12:16 lp1918735-g corosync[4880]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Mar 24 13:12:16 lp1918735-g corosync[4880]: [QB ] server name: votequorum
Mar 24 13:12:16 lp1918735-g corosync[4880]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Mar 24 13:12:16 lp1918735-g corosync[4880]: [QB ] server name: quorum
Mar 24 13:12:16 lp1918735-g corosync[4880]: [TOTEM ] A new membership (1.5) was formed. Members joined: 1
Mar 24 13:12:16 lp1918735-g corosync[4880]: [CPG ] downlist left_list: 0 received
Mar 24 13:12:16 lp1...


tags: added: verification-done-groovy
removed: verification-needed-groovy
Revision history for this message
Dan Streetman (ddstreet) wrote :

root@lp1918735-f:~# dpkg -l | grep corosync
ii corosync 3.0.3-2ubuntu2 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu2 amd64 cluster engine common library
root@lp1918735-f:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-03-24 13:14:36 UTC; 12s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 15358 (code=exited, status=15)

Mar 24 13:14:36 lp1918735-f corosync[15358]: [TOTEM ] Initializing transport (Kronosnet).
Mar 24 13:14:36 lp1918735-f corosync[15358]: [TOTEM ] knet_handle_new failed: File name too long (36)
Mar 24 13:14:36 lp1918735-f corosync[15358]: [KNET ] transport: Failed to set socket buffer via force option 33: Operation not permitted
Mar 24 13:14:36 lp1918735-f corosync[15358]: [KNET ] transport: Unable to set local socketpair receive buffer: File name too long
Mar 24 13:14:36 lp1918735-f corosync[15358]: [KNET ] handle: Unable to initialize internal hostsockpair: File name too long
Mar 24 13:14:36 lp1918735-f corosync[15358]: [MAIN ] Can't initialize TOTEM layer
Mar 24 13:14:36 lp1918735-f corosync[15358]: [MAIN ] Corosync Cluster Engine exiting with status 15 at main.c:1531.
Mar 24 13:14:36 lp1918735-f systemd[1]: corosync.service: Main process exited, code=exited, status=15/n/a
Mar 24 13:14:36 lp1918735-f systemd[1]: corosync.service: Failed with result 'exit-code'.
Mar 24 13:14:36 lp1918735-f systemd[1]: Failed to start Corosync Cluster Engine.

root@lp1918735-f:~# dpkg -l | grep corosync
ii corosync 3.0.3-2ubuntu2.1 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 3.0.3-2ubuntu2.1 amd64 cluster engine common library
root@lp1918735-f:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-03-24 13:15:50 UTC; 6s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 16869 (corosync)
      Tasks: 9 (limit: 76563)
     Memory: 84.0M
     CGroup: /system.slice/corosync.service
             └─16869 /usr/sbin/corosync -f

Mar 24 13:15:50 lp1918735-f corosync[16869]: [QUORUM] Members[0]:
Mar 24 13:15:50 lp1918735-f corosync[16869]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Mar 24 13:15:50 lp1918735-f corosync[16869]: [QB ] server name: votequorum
Mar 24 13:15:50 lp1918735-f corosync[16869]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Mar 24 13:15:50 lp1918735-f corosync[16869]: [QB ] server name: quorum
Mar 24 13:15:50 lp1918735-f corosync[16869]: [TOTEM ] A new membership (1.5) was formed. Members joined: 1
Mar 24 13:15:50 lp1918735-f corosync[16869]: [CPG ] downlist left_list: 0...


tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 3.0.3-2ubuntu3.1

---------------
corosync (3.0.3-2ubuntu3.1) groovy; urgency=medium

  * d/p/lp1911904-Don-t-lock-all-current-and-future-memory-if-can-t-in.patch:
    - Don't mlockall() if setrlimit() fails (LP: #1911904)
  * d/p/lp1918735-try-unprivileged-knet-handle-new.patch:
    - Retry knet_handle_new without privileged flag (LP: #1918735)
  * d/t: don't skip tests now that we fixed crashing in container

 -- Dan Streetman <email address hidden> Wed, 10 Mar 2021 12:58:00 -0500

Changed in corosync (Ubuntu Groovy):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for corosync has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 3.0.3-2ubuntu2.1

---------------
corosync (3.0.3-2ubuntu2.1) focal; urgency=medium

  * d/p/lp1911904-Don-t-lock-all-current-and-future-memory-if-can-t-in.patch:
    - Don't mlockall() if setrlimit() fails (LP: #1911904)
  * d/p/lp1918735-try-unprivileged-knet-handle-new.patch:
    - Retry knet_handle_new without privileged flag (LP: #1918735)
  * d/t: don't skip tests now that we fixed crashing in container

 -- Dan Streetman <email address hidden> Wed, 10 Mar 2021 13:00:12 -0500

Changed in corosync (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in corosync (Debian):
status: New → Fix Released
Revision history for this message
Hybrid512 (walid-moghrabi) wrote :

This doesn't seem to be fixed in Focal (deploying the Masakari charm on OpenStack with Juju 2.9.5):

Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync notice [MAIN ] Corosync Cluster Engine 3.0.3 starting up
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync info [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync warning [MAIN ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync warning [MAIN ] Could not set priority -2147483648: Permission denied (13)
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync warning [MAIN ] Could not increase RLIMIT_MEMLOCK, not locking memory: Operation not permitted (1)
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync notice [TOTEM ] Initializing transport (Kronosnet).
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync warning [TOTEM ] knet_handle_new failed, trying unprivileged: File name too long (36)
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync info [TOTEM ] totemknet initialized
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Failed to set socket buffer via force option 33: Operation not permitted
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Unable to set local socketpair receive buffer: File name too long
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] handle: Unable to initialize internal hostsockpair: File name too long
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Failed to set socket buffer via option 8 to value 8388608: capped at 425984
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Continuing regardless, as the handle is not privileged. Expect poor performance!
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Failed to set socket buffer via option 7 to value 8388608: capped at 425984
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Continuing regardless, as the handle is not privileged. Expect poor performance!
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Failed to set socket buffer via option 8 to value 8388608: capped at 425984
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Continuing regardless, as the handle is not privileged. Expect poor performance!
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Failed to set socket buffer via option 7 to value 8388608: capped at 425984
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Continuing regardless, as the handle is not privileged. Expect poor performance!
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Failed to set socket buffer via option 8 to value 8388608: capped at 425984
Jul 05 13:15:57 [15946] juju-97b917-3-lxd-3 corosync error [KNET ] transport: Continuing regardless, as the handle is not privileged. Expect poor performance!
Jul 05 13:15:57 [159...

Revision history for this message
Dan Streetman (ddstreet) wrote :

> Doesn't seem to be fixed in Focal

> Continuing regardless, as the handle is not privileged. Expect poor performance!

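that is the fix: corosync now retries knet_handle_new without the privileged flag and keeps running, just with reduced performance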
