pam_unix hanging but nsswitch.conf says files


#1  07-23-2008, 04:56 PM

Sorry for the multi-post; I decided afterward that I should have
cross-posted.

Folks,

I have an Oracle EBS system that has hung a couple of days
in a row. At first I saw new entries in /var/log/messages and
traced them to power management, so I planned to work
in the BIOS the next time it happened.

The problem is that it's now starting to hang again, and the
only entries near all 3 hangs are about pam_unix. So I started
digging into how new sessions authenticate. Sure enough,
new cron jobs stop running several hours before anyone
notices a problem. Existing Oracle sessions keep running
SQL, but new ones hang at the creation point. New ssh
sessions hang just after asking for my password, but
existing ones work. I have a few root shells running on the
console, so I can do any sort of debugging I want.

The load average is consistent with my estimate of the number
of hung sessions: load around 12-15 but CPU utilization
no more than 10%. Neither netstat nor lsof suggests there
are any hung network sessions.
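A minimal sketch for checking that estimate, assuming Linux with the procps ps: the Linux load average counts processes in uninterruptible sleep (state D) as well as runnable ones, so a load of 12-15 with CPU under 10% fits a pile of blocked processes. The function names list_dstate and count_dstate are illustrative, not from any tool; the wchan column shows the kernel function each process is sleeping in, which may hint at what the hung logins are waiting on.

```shell
#!/bin/sh
# List processes in uninterruptible sleep (state starting with D).
# Linux counts these in the load average even though they use no CPU.
# Columns: state, PID, kernel wait channel, full command line.
list_dstate() {
    ps -eo stat=,pid=,wchan=,args= | awk '$1 ~ /^D/'
}

# Number of D-state processes right now.
count_dstate() {
    list_dstate | wc -l | tr -d ' '
}

echo "processes in D state: $(count_dstate)"
if [ -r /proc/loadavg ]; then
    # The first three fields are the 1-, 5- and 15-minute load averages.
    echo "load average: $(cut -d ' ' -f 1-3 /proc/loadavg)"
fi
list_dstate
```

If the count roughly tracks the load average, the hung sessions are probably what is driving it.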

The very puzzling part is /etc/nsswitch.conf has

passwd: files

as its only option. No NIS, LDAP or any other sort of
network authentication. No sign of disk problems during this
hang or the previous one, though there was a 30-second burp
on the FCAL link to the array an hour before the first hang
was noticed.
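A quick way to confirm which sources each lookup actually uses, as a sketch (nss_sources is a hypothetical helper name). pam_unix also reads shadow, and session setup looks up group membership, so it may be worth checking those lines as well as passwd.

```shell
#!/bin/sh
# Print the sources listed for one database (passwd, shadow, group, ...)
# in an nsswitch.conf-style file. Text after '#' is treated as a comment.
nss_sources() {
    db=$1
    conf=${2:-/etc/nsswitch.conf}
    sed 's/#.*//' "$conf" |
        awk -v db="$db" '$1 == db ":" { $1 = ""; sub(/^ +/, ""); print }'
}

if [ -r /etc/nsswitch.conf ]; then
    for db in passwd shadow group; do
        printf '%s: %s\n' "$db" "$(nss_sources "$db")"
    done
fi
```

Anything other than "files" on those three lines would point back toward a network lookup.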

The only processes that used network authentication
were smbd and nmbd, so I ran chkconfig smb off and
service smb stop to remove that possibility.

At this point I figure I'll need to reboot the box by the end
of the day and I really want a plan of action by then to
keep it from having the same problem again tomorrow.

Has anyone seen PAM cause a hang with passwd: files?

Thanks in advance!
#2  08-01-2008, 02:04 PM

Doug Freyburger wrote:
> Has anyone seen pam cause a hang with passwd: files?


It's time to install some OS patches.
A possible workaround is to disable nscd/pwgrd (the caching
daemons), or to disable the relevant line in /etc/pam.conf.
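A sketch of how one might check both suggestions on a Linux box (pam_refs is a hypothetical helper name). Linux-PAM normally uses per-service files under /etc/pam.d and ignores /etc/pam.conf when that directory exists, so on Linux the line to disable would usually live in one of those files.

```shell
#!/bin/sh
# Print "file:line: text" for every non-comment line in the PAM service
# files under a directory (default /etc/pam.d) that mentions a module.
pam_refs() {
    module=$1
    dir=${2:-/etc/pam.d}
    awk -v m="$module" '$0 !~ /^[ \t]*#/ && index($0, m) {
        print FILENAME ":" FNR ": " $0
    }' "$dir"/*
}

# Is the name service cache daemon running? pgrep exits 0 if it is.
if pgrep -x nscd >/dev/null 2>&1; then
    echo "nscd is running"
fi
```

For example, pam_refs pam_unix.so lists every service (sshd, crond, login, ...) whose authentication goes through pam_unix.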

#3  08-03-2008, 07:53 AM

On Aug 1, 7:04 pm, Michael Tosch wrote:
> Doug Freyburger wrote:
> > Has anyone seen pam cause a hang with passwd: files?
>
> It's time for installing some OS-patches.
> A work-around is maybe to disable nscd/pwgrd (caching),
> or disable some line in /etc/pam.conf


Review the limits on the number of processes (threads) and the
maximum number of open files with:
ulimit -a
Maybe there is a resource problem, e.g. with client sessions to
Oracle.
Count the number of processes with:
ps -ef | wc -l
(the count includes the header line).

Also check with top whether there is high CPU consumption or
a lot of I/O wait.
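As a sketch, assuming the procps ps (proc_count and near_limit are hypothetical names), one could count a single user's processes and compare them with that user's process limit. The 90% warning level is an arbitrary choice.

```shell
#!/bin/sh
# Count the processes owned by one user. "-o pid=" suppresses the header
# line, so unlike ps -ef | wc -l nothing needs to be subtracted.
proc_count() {
    ps -u "$1" -o pid= | wc -l | tr -d ' '
}

# Succeed if COUNT is at least 90% of LIMIT; "unlimited" never matches.
near_limit() {
    count=$1
    limit=$2
    [ "$limit" = "unlimited" ] && return 1
    [ "$count" -ge $((limit * 9 / 10)) ]
}

echo "processes owned by $(id -un): $(proc_count "$(id -un)")"
```

For example, run from bash as the oracle user: near_limit "$(proc_count oracle)" "$(ulimit -u)" && echo "oracle is near its process limit". ulimit -u is the bash spelling for max user processes; some other shells use a different flag.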
#4  08-03-2008, 05:13 PM

Michael Tosch wrote:
> Doug Freyburger wrote:
>
> > Has anyone seen pam cause a hang with passwd: files?
>
> It's time for installing some OS-patches.
> A work-around is maybe to disable nscd/pwgrd (caching),
> or disable some line in /etc/pam.conf


Thank you to everyone who sent suggestions on the group or by
e-mail!

Here's the bug that apparently caused the problem:

The standard behavior of "audit" is to stop logging when its logging
directory hits 80% full. On Red Hat Enterprise Linux 3 there's
a bug where it stops logins instead of stopping logging. Turn off
the "audit" subsystem and you stop tickling the bug.
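A minimal sketch of a cron-able check against that 80% threshold, assuming df supports the POSIX -P output format. usage_pct and over_threshold are hypothetical names, and checking /var is an assumption about where the audit logs are written.

```shell
#!/bin/sh
# Print the percentage of space used on the filesystem holding a path,
# using POSIX-format df output (-P keeps each filesystem on one line).
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the filesystem holding $1 is at or above $2 percent used.
over_threshold() {
    [ "$(usage_pct "$1")" -ge "$2" ]
}

if over_threshold /var 80; then
    echo "WARNING: filesystem holding /var is $(usage_pct /var)% full"
fi
```

Running something like this from cron at a lower threshold (say 70) would give warning before audit reaches its 80% cutoff.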

On this particular system /var isn't separated from /. One of
the first items on my health check for any production system is
to put /var and /tmp on their own filesystems, so that a full disk
there can't hang the whole box, but I'd only been taking care of
this particular client for negative one week when this happened.
They rushed to authorize the hours, starting a week early.
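To see whether /var and /tmp already sit on their own filesystems, one can compare device numbers, as in this sketch (dev_of and same_fs are hypothetical names; GNU stat takes -c %d, BSD stat takes -f %d).

```shell
#!/bin/sh
# Print the device number of the filesystem holding a path.
# Try GNU stat first, then fall back to BSD stat syntax.
dev_of() {
    stat -c %d "$1" 2>/dev/null || stat -f %d "$1"
}

# Succeed if two paths live on the same filesystem.
same_fs() {
    [ "$(dev_of "$1")" = "$(dev_of "$2")" ]
}

for dir in /var /tmp; do
    if [ -d "$dir" ] && same_fs "$dir" /; then
        echo "$dir is on the root filesystem"
    fi
done
```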

So now I need to:

1) Confirm that's the problem by waiting a week without
seeing the problem come back.

2) Trim logs to get / well under 80%. Install canned log
trimming scripts that use Red Hat utilities as well as
find.

3) Get a new LUN on the SAN ready to go as /mnt and
migrate to it as /var at the next maintenance reboot in a
couple of weeks.

4) Use up2date to bring all kernel and non-kernel packages
up to date at the next maintenance reboot in a couple of
weeks.

5) Then, and only then, turn audit back on. Note to self:
read any white papers on the UNIX auditing systems.
I started in engineering support, and much of my
production experience has been on systems with small drives
or high-load systems that needed audit turned off, so I
do not know it well enough.

6) Do what I consider good health-check audits on the
hosts at this client and start working on recommendations.
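For step 2, a find-based sketch that only lists rotated logs older than a given age, so the list can be reviewed before anything is deleted. old_logs is a hypothetical name, and the name patterns (*.N and *.gz) are assumptions about how the logs are rotated. Note that find's -mtime +N rounds ages down to whole days.

```shell
#!/bin/sh
# List rotated logs (names ending in .<digit> or .gz) under DIR that find
# considers older than DAYS days. Nothing is deleted; review the output first.
old_logs() {
    dir=$1
    days=$2
    find "$dir" -type f \( -name '*.[0-9]' -o -name '*.gz' \) \
        -mtime +"$days" -print
}
```

For example, old_logs /var/log 30. Once the list looks right, the same find expression with -exec rm -f {} + in place of -print would delete those files.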
All times are GMT -4.