[PUP10218] PUPPET INCORRECTLY DETECTING STALE PIDFILE CREATED 20200107 UPDATED

[#PUP-10218] Puppet incorrectly detecting stale pidfile

[PUP-10218] Puppet incorrectly detecting stale pidfile Created: 2020/01/07  Updated: 2021/05/19  Resolved: 2020/02/13

Status:

Resolved

Project:

Puppet

Component/s:

None

Affects Version/s:

PUP 6.11.1

Fix Version/s:

PUP 6.13.0


Type:

Bug

Priority:

Normal

Reporter:

Marcin Deranek

Assignee:

Luchian Nemes

Resolution:

Fixed

Votes:

0

Labels:

resolved-issue-added

Remaining Estimate:

Not Specified

Time Spent:

Not Specified

Original Estimate:

Not Specified


Attachments:

pidlock.patch

Template:

PUP Bug Template

Agent OS:

CentOS 7

Master OS:

CentOS 7

Team:

Night's Watch

Story Points:

2

Sprint:

NW - 2020-02-05, NW - 2020-02-19

Method Found:

Needs Assessment

Release Notes:

Bug Fix

Release Notes Summary:

Fixed pidfile lock removal when the Puppet agent is started as a lightweight process (LWP) and is terminated abnormally on POSIX operating systems.

QA Risk Assessment:

Needs Assessment


 Description 

 

Puppet Version: 6.11.1
Puppet Server Version: 6.11.1
OS Name/Version: CentOS 7

When the Puppet agent is terminated abnormally (e.g. killed with the KILL signal), it can fail to detect the stale PID file it left behind. The code in question is this:

puppet/lib/ruby/vendor_ruby/puppet/util/pidlock.rb

def clear_if_stale
  # `errors` is set up outside this excerpt.
  begin
    # Signal 0 delivers nothing; it only checks whether lock_pid exists.
    Process.kill(0, lock_pid)
  rescue *errors
    return @lockfile.unlock
  end

  if Puppet.features.posix?
    # Both ps calls exit non-zero if ps cannot find lock_pid.
    procname = Puppet::Util::Execution.execute(["ps", "-p", lock_pid, "-o", "comm="]).strip
    args     = Puppet::Util::Execution.execute(["ps", "-p", lock_pid, "-o", "args="]).strip
    @lockfile.unlock unless procname =~ /ruby/ && args =~ /puppet/ || procname =~ /puppet(-.*)?$/
  elsif Puppet.features.microsoft_windows?
    # On Windows, we're checking if the filesystem path name of the running
    # process is our vendored ruby:
    exe_path = Puppet::Util::Windows::Process::get_process_image_name_by_pid(lock_pid)
    @lockfile.unlock unless exe_path =~ /\\bin\\ruby.exe$/
  end
end

Process.kill(0, pid) checks whether a process with the given PID exists. The problem is that Process.kill matches lightweight processes (LWPs) as well as regular processes, so the call succeeds if either one currently exists with that ID. When it succeeds, the code goes on to look up the command name for that ID. Unfortunately, ps -p only considers processes, not LWPs. If the stale file contains an ID that now belongs to an LWP, ps fails and Puppet can never recover on its own (the stale lock file has to be removed by hand), because LWPs are usually spawned by long-running daemons.

Please find attached a patch (tested on CentOS 7) that addresses this issue: it makes sure the ps command also considers LWPs. Without it, you might run into the error shown below.
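The mismatch described above can be sketched in a few lines of Ruby. This is only an illustration, not Puppet's code: classify_lock_pid, signal_alive? and ps_sees? are hypothetical names, and the two checks are injectable so the decision logic can be exercised without real processes.

```ruby
# Hypothetical helpers, not part of Puppet.

# True if signal 0 can be addressed to the ID (process or, on Linux, LWP).
def signal_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # nothing with that ID exists
rescue Errno::EPERM
  true  # it exists, but belongs to another user
end

# True if `ps -p` exits 0 for the ID. Output is discarded.
def ps_sees?(pid)
  system("ps", "-p", pid.to_s, "-o", "comm=",
         out: File::NULL, err: File::NULL) == true
end

# Classifies what the ID stored in a lock file currently refers to.
def classify_lock_pid(pid,
                      signal_check: ->(p) { signal_alive?(p) },
                      ps_check: ->(p) { ps_sees?(p) })
  return :absent unless signal_check.call(pid) # lock is clearly stale
  return :process if ps_check.call(pid)        # ps can report its name
  :signal_only                                 # answers signal 0, unknown to ps -p
end
```

A result of :signal_only corresponds to the situation in this report: the existence check passes, the ps lookup fails, and the lock is never cleared.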

Desired Behavior:

Puppet agent starts up and runs correctly.

Actual Behavior:

# puppet agent --test
Error: Could not run Puppet configuration client: Execution of 'ps -p 2181 -o comm=' returned 1: 




 Comments 

 

Comment by Josh Cooper [ 2020/01/07 ]

Thanks for the patch, Marcin Deranek! A couple of notes: the code will need to account for POSIX platforms that don't support -q; for example, macOS reports that q is an illegal option. If you want to make a contribution to Puppet, could you submit a pull request? If not, that's fine too; we can make the fix ourselves.
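One way to account for platforms without -q is to choose the ps flag per platform. The following is a rough sketch under stated assumptions, not the fix that was merged: ps_pid_flag and ps_command are hypothetical helpers, and treating every Linux ps as supporting -q is itself an assumption (older procps releases may not).

```ruby
require 'rbconfig'

# Hypothetical helper: picks the ps option used to select a single ID.
# Assumption: -q is only attempted on Linux; everything else keeps -p.
def ps_pid_flag(host_os = RbConfig::CONFIG['host_os'])
  host_os =~ /linux/i ? '-q' : '-p'
end

# Builds the argument vector for a ps lookup with the given output format,
# e.g. "comm=" or "args=".
def ps_command(pid, format, host_os = RbConfig::CONFIG['host_os'])
  ['ps', ps_pid_flag(host_os), pid.to_s, '-o', format]
end
```

For example, ps_command(2181, 'comm=', 'darwin19') keeps the -p form that macOS accepts, while a host_os containing "linux" switches to -q.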

Comment by Ciprian Badescu [ 2020/02/06 ]

Marcin Deranek, can you provide us with the steps to reproduce the issue? How did you start the puppet process as an LWP?

Comment by nobody [ 2021/05/18 ]

I have the same problem with `puppet-agent-5.5.22-1.el7.x86_64`. Can someone reopen the issue?

Comment by Ciprian Badescu [ 2021/05/19 ]

Puppet 5 is EOL.

Can you try with Puppet 6, or manually apply the patch from the fix? https://github.com/puppetlabs/puppet/pull/7958/files#diff-566af3cdae401e0290417d4d75a10bb7b2d2fb60d801ffd9923fc6878eb0c0bf

Comment by nobody [ 2021/05/19 ]

Sad to hear that. OK, understood, thanks for your time.

Generated at Sat Dec 04 03:38:15 PST 2021 using Jira 8.13.2#813002-sha1:c495a97c0445fc6ed348f0d5238c2bc2e2c2ef37.




