|
#1
|
| I reported a core file from our product but it turns out there are 2 core dumps. The second (which I reported) is overwriting the /core file. Rather than renaming an existing core file before writing another one, or simply using a unique name when creating the core file, HP-UX just overwrites an existing file named "core" with the new one. The real problem was the one identified by the first core file, which got stepped on by the second.

There are several daemons running for the product which could produce core files. If more than one of them cores, they should get different filenames in which to save the dump. On Solaris, the core file gets named "core. After each test, we check for core files in several places and save them elsewhere. Then the core files are deleted so any new ones found after the next test are known to have been generated during that test.

We wrote a script that scans for core files and appends a timestamp string to the file name. So, for example, if "/core" is found then it gets renamed to "/core., the script doesn't consume the CPU). It seemed to work when I tested it, but it is possible that another core file gets produced within the 2-second window (or a 1-second window if we had the script wait only that long). It's not yet a startup script so we just run "nohup session.

Seems there should be a more elegant scheme of saving core files than having them all use the same name and overwrite the last one. Is there a better way? |
|
#2
|
| "Vanguard" wrote:

> I reported a core file from our product but it turns out there are 2
> core dumps. The second (which I reported) is overwriting the /core
> file.

This is how all UNIX machines used to work. If you don't want that, create a unique directory for each daemon and chdir() into it.

> On Solaris, the core file gets named "core.

Does not. On newer Solaris machines, core filename can be configured (as you described above, or in many other ways; "man coreadm").

> Seems there should be a more elegant scheme of saving core files than
> having them all use the same name and overwrite the last one. Is there
> a better way?

See above for one possible (and portable) solution. Another solution (writing core into core.pid) can be found in this thread: http://forums1.itrc.hp.com/service/f...hreadId=898593

I'll repeat it here (in case the above page goes away):

By A. Clay Stephenson, Jun 6, 2005 17:36:25 GMT

The only option for HP-UX is to optionally append the PID to "core" but the file is always written to the current working directory.

    echo " core_addpid/W1" | adb -w /stand/vmunix /dev/kmem

This will ONLY change the memory image of the running kernel and leave the object file, /stand/vmunix, untouched. It's a safe command and this is a common practice for changing kernel values that otherwise can't be modified. You could also force the write to the object file, but I prefer to simply change the image in /dev/kmem. If you want this to be a "permanent" change, then rather than writing the object file I prefer to set up a startup script in /sbin/init.d.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email. |
|
#3
|
| "Paul Pluzhnikov" wrote in news:m3slh369g5.fsf-at-somewhere.in.california.localhost...

> "Vanguard" wrote:
>
>> I reported a core file from our product but it turns out there are 2
>> core dumps. The second (which I reported) is overwriting the /core
>> file.
>
> This is how all UNIX machines used to work.
>
> If you don't want that, create a unique directory for each daemon
> and chdir() into it.

Unfortunately for QA, I have to test the product as it is packaged. Also, many of the scripts have relative paths so it wouldn't work for me to rearrange the hierarchy of the directories. I'll have to look at the start script used to load all the daemons. Even if I changed it to cd to the subdirectories before loading each daemon, several reside in the same directory. They'd still step on each other's core file.

Just to be sure, is the core file put in the current working directory at the time it gets produced? Or does it get saved in whatever was the current directory at the time the daemon was loaded? If it is the current directory at the time the daemon got loaded then maybe I can modify the startup script for the daemons to make and change to a subdirectory under daemon. Then the core file would get put into the log subdirectory by that daemon's name.

Of course, this won't help when the product or test scripts force a reload of the failed daemon for the next test in the long testlist and the same daemon dumps core again, overwriting the core file it produced earlier. The separate log subdirectory for each daemon would stop other daemons from stepping on each other's core files, but it would not stop a daemon from stepping on its own. The enterprise product will attempt to recover (i.e., daemons watch each other) and will restart a failed and required daemon, but if it fails again then the previous core file gets stepped on.

>> On Solaris, the core file gets named "core.
>
> Does not. On newer Solaris machines, core filename can be configured
> (as you described above, or in many other ways; "man coreadm").

On the 3 Solaris boxes that I get to use, after a test has completed and if a core file was produced, the filename is as I mentioned. I suppose it is possible that someone defined a rename script like we are now trying on HP-UX, but I doubt it since it is not described in the procedures for setting up the host after reimaging it (i.e., if such a script is being used, it won't be after we reimage the host). I've looked in the Perl scripts that we use for the automated testing and haven't found that they do the rename.

From what the developer said, and also from what I've Googled, there is no coreadm on HP-UX. I did see mention of savecrash and some config file (forgot its path but didn't see anything in the config file that would dictate the filenaming scheme for core files). I did try scanning the man page for coreadm just for background but got interrupted, so I didn't see how a scheme is specified for the filenaming of core files. I'll have to check tomorrow if I get time. |
|
#4
|
| "Vanguard" wrote:

>> If you don't want that, create a unique directory for each daemon
>> and chdir() into it.
>
> Unfortunately for QA, I have to test the product as it is
> packaged. Also, many of the scripts have relative paths so it wouldn't

What I meant is: have each daemon create a directory for itself:

    char buf[1024];
    sprintf(buf, "/tmp/%s.%d", DAEMON_NAME, getpid());
    if (-1 == mkdir(buf, 0700)) { ... error handling ... }
    if (-1 == chdir(buf)) { ... error handling ... }

Do this after the daemon has read all of its config files, but before it starts "servicing requests".

> Just to be sure, is the core file put in the current working directory
> at the time it gets produced?

Yes.

>>> On Solaris, the core file gets named "core.
>>
>> Does not. On newer Solaris machines, core filename can be configured
>> (as you described above, or in many other ways; "man coreadm").
>
> On the 3 Solaris boxes that I get to use, after a test has completed
> and if a core file was produced, the filename is as I mentioned.

Because the *default* Solaris coreadm.conf does that. But if you count on that, your "enterprise" system will spectacularly fail to find cores on a machine that was reconfigured to save core files elsewhere.

> From what the developer said, and also from what I've Googled, there
> is no coreadm on HP-UX. I did see mention of savecrash and some
> config file

savecrash is *not* what you are interested in. It is used to save crash data when the system itself panics. It has nothing to do with user-level core files.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email. |
|
#5
|
| Paul Pluzhnikov wrote:

> "Vanguard" wrote:
>
> >> If you don't want that, create a unique directory for each daemon
> >> and chdir() into it.
> >
> > Unfortunately for QA, I have to test the product as it is
> > packaged. Also, many of the scripts have relative paths so it wouldn't
>
> What I meant is: have each daemon create a directory for itself:
>
>     char buf[1024];
>     sprintf(buf, "/tmp/%s.%d", DAEMON_NAME, getpid());
>     if (-1 == mkdir(buf, 0700)) { ... error handling ... }
>     if (-1 == chdir(buf)) { ... error handling ... }
>
> Do this after the daemon has read all of its config files, but
> before it starts "servicing requests".

I'll pass this on to the developers about adding code in the daemon so it manages where its core file gets saved and its name. They already have an daemon. Thanks. |