|
#1
|
| Hi guys,

we are running RDB (Oracle Rdb V7.0-61) and SQLSERVICES (v7.1-59) on a 5-node cluster across 2 sites, under OpenVMS V7.3.

site 1:
  MP1 sys$sysroot = DSA200:[SYS0.]
  MP2 sys$sysroot = DSA200:[SYS1.]

site 2:
  OP1 sys$sysroot = DSA100:[SYS0.]
  OP2 sys$sysroot = DSA100:[SYS1.]
  QRM sys$sysroot = DSA300:[SYS0.]

Our application runs on node MP1 and uses the database DSA618:[DB_DISK001.Database]DB.RDB.

For a failover scenario we need to do an RMU/OPEN DSA618:[DB_DISK001.Database]DB.RDB on node OP1. Are there any problems doing this?

RDB is started on nodes MP1 and OP1, but in normal operation the database DSA618:[DB_DISK001.Database]DB.RDB is opened only on node MP1.

Thanks for your answers,
N. Manser

SYSMAN> do rmu/show system
%SYSMAN-I-OUTPUT, command execution on node QRM
%DCL-W-IVVERB, unrecognized command verb - check validity and spelling
 \RMU\
%SYSMAN-I-OUTPUT, command execution on node OP2
%DCL-W-ACTIMAGE, error activating image RDMPRV
-CLI-E-IMGNAME, image file DSA100:[SYS1.SYSCOMMON.][SYSLIB]RDMPRV.EXE;8
-SYSTEM-F-PROTINSTALL, protected images must be installed
%SYSMAN-I-OUTPUT, command execution on node OP1
Oracle Rdb V7.0-61 on node OP1 7-NOV-2006 17:40:14.74
    - monitor started 8-APR-2006 22:29:10.31 (uptime 212 19:11:04)
    - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;107"
database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
    - first opened 8-APR-2006 22:30:00.82 (elapsed 212 19:10:13)
    - current after-image journal file is DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1575
    - AIJ Log Server is active
    - 2 active database users
    - database also open on these nodes:
        MP1
%SYSMAN-I-OUTPUT, command execution on node MP2
%DCL-W-ACTIMAGE, error activating image RDMPRV
-CLI-E-IMGNAME, image file DSA200:[SYS1.SYSCOMMON.][SYSLIB]RDMPRV.EXE;8
-SYSTEM-F-PROTINSTALL, protected images must be installed
%SYSMAN-I-OUTPUT, command execution on node MP1
Oracle Rdb V7.0-61 on node MP1 7-NOV-2006 17:40:13.59
    - monitor started 2-NOV-2006 07:36:25.92 (uptime 5 10:03:47)
    - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;79"
database DSA618:[DB_DISK001.Database]DB.RDB;1
    - first opened 2-NOV-2006 08:32:01.47 (elapsed 5 09:08:12)
    * database is opened by an operator
    - current after-image journal file is DB_DISKA01:[AIJ]AIJ25.AIJ;1
    - global buffer count is 30000; 22250 global buffers free
    - maximum global buffer count per user is 100
    - global section resides in system space
    - AIJ Log Server is active
    - 156 active database users
database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
    - first opened 2-NOV-2006 07:37:17.02 (elapsed 5 10:02:56)
    - current after-image journal file is DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1578
    - AIJ Log Server is active
    - 2 active database users
    - database also open on these nodes:
        OP1 |
|
#2
|
| I'd guess the RDB software startup has not been run as it installs those images. |
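One way to check that guess on a node that reported PROTINSTALL is sketched below. It is a sketch under assumptions, not a verified procedure: the image name comes from the error text in the first post and the startup file from later in this thread, and nothing here has been run against this cluster. Note that the original poster later says Rdb is only meant to run on MP1 and OP1, so the errors on MP2 and OP2 may simply be expected.

```
$! Run on the node that reported PROTINSTALL (MP2 or OP2 in the output above).
$! Assumption: the standard (not multi-version) image name shown in the error text.
$ INSTALL LIST SYS$LIBRARY:RDMPRV.EXE     ! reports an error if the image is not installed
$!
$! Only if Rdb is actually wanted on this node: run the Rdb startup, which is
$! what the guess above says installs the protected images.
$ @SYS$STARTUP:RMONSTART
$ RMU/SHOW SYSTEM                         ! should now list the monitor instead of failing
```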
|
#3
|
| Nazim wrote:
> Hi guys,
>
> we are running RDB (Oracle Rdb V7.0-61), SQLSERVICES (v7.1-59) on a 5
> node cluster on 2 sites.

Make sure that you've executed RMUSTART70 (or RMONSTART) on all the nodes.

If you are using multi-version Rdb, then you'll probably need to execute SYS$SHARE:RDB$SETVER prior to using RMU. If this is the case, you could build a little DCL procedure to do:

  $ @SYS$SHARE:RDB$SETVER 70
  $ RMU/SHOW SYSTEM

and execute that procedure from SYSMAN.

--
opinions expressed here are mine and mine alone and certainly are not intended in any way to express or represent any opinions or commitment of oracle corporation.

norman lastovica / oracle rdb engineering |
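The suggestion above can be sketched as a small command procedure plus a SYSMAN session. The procedure name SHOW_RDB.COM and the logical name CLUSTER_TOOLS are hypothetical and do not appear in the thread. Because the five nodes boot from three different system disks (DSA100, DSA200, DSA300), the sketch assumes the procedure lives on a disk that every node has mounted and that the logical name is defined on each node.

```
$! SHOW_RDB.COM  (hypothetical file name)
$! Select the Rdb V7.0 environment, then ask the local monitor what is open.
$ @SYS$SHARE:RDB$SETVER 70
$ RMU/SHOW SYSTEM
$ EXIT
```

```
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> DO @CLUSTER_TOOLS:SHOW_RDB.COM
SYSMAN> EXIT
```

On a node where Rdb is not installed at all (QRM in the output above), the procedure would presumably still fail, just with a different message.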
|
#4
|
| Norman Lastovica wrote:
> make sure that you've executed RMUSTART70 (or RMONSTART) on all
> the nodes.
>
> If you are using multi-version Rdb, then you'll probably need to execute
> SYS$SHARE:RDB$SETVER prior to using RMU. If this is the case, you could
> build a little DCL procedure to do:
>
> $ @SYS$SHARE:RDB$SETVER 70
> $ RMU/SHOW SYSTEM
>
> and execute that procedure from SYSMAN.

For our site RDB needs to run only on the MP1 and OP1 nodes, so on both nodes the startup sequence is:

  .....
  $ @SYS$STARTUP:RMONSTART ! RDB V7.0-6
  $ @SYS$STARTUP:SQLSRV$STARTUP71
  $ @SYS$STARTUP DAL$START_TR_MON.COM DISK$DTC_COMMON:[DDAL.DATABASE] -
  .........

My question is: are there any issues with opening a database with RMU/OPEN/WAIT/ACCESS=unrestricted when node MP1 is down?

regards,
Nazim Manser |
|
#5
|
| Hi Nazim,

If this system runs anything other than "Mom and Dad's corner Deli VAT return" then I suspect that you (or the company you support) are in big trouble! Get yourself a professional DBA and pay them what they ask to do the job properly. The questions turning up here (and more so in the ITRC) about Rdb are truly frightening. I wish I could find out who these companies are and turn up to their next risk-assessment or shareholders' meeting :-(

Anyway, no one can answer your question directly unless they know a bit more about MP and OP. I suggest "yes", but if you've never tried a failover before then what are the extra machines there for? The fact that you appear to be running Data Distributor raises an eyebrow, but my advice is to open the database on *all* nodes and use them *all*, *all* of the time, in possibly a wide-area cluster configuration.

Rdb engineering hates clusters 'cos Norm doesn't get to use his beloved Row-Ca$h, but don't let that bother you.

Regards Richard Maher |
|
#6
|
| Richard Maher wrote:
> Hi Nazim,
>
> If this system runs anything other than "Mom and Dad's corner Deli VAT return"
> then I suspect that you (or the company you support) are in big trouble!
> Get yourself a professional DBA and pay them what they ask to do the job
> properly. The questions turning up here (and more so in the ITRC) about Rdb
> are truly frightening. I wish I could find out who these companies are and
> turn up to their next risk-assessment or shareholders' meeting :-(

That is why I was assigned the task of ensuring a correct failover strategy.

> Anyway, no one can answer your question directly unless they know a bit more
> about MP and OP. I suggest "yes", but if you've never tried a failover before
> then what are the extra machines there for? The fact that you appear to be
> running Data Distributor raises an eyebrow, but my advice is to open the
> database on *all* nodes and use them *all*, *all* of the time, in possibly a
> wide-area cluster configuration.

MP1 and OP1 are on 2 sites but share the same file system. To be precise, the file layout of the RDB stuff is as follows:

  root file location : dsa618:[db_disk001.Database]
  RDA & SNP files:     dsa618:[db_disk001.Database]
                       dsa618:[db_disk002.Database]
                       dsa618:[db_disk003.Database]
  AIJ files:           dsa616:[db_diskA01.Database]
                       dsa616:[db_diskA02.Database]
  RUJ files:           dsa617:[rdms$ruj]

MP1>sh dev dsa618

Device                 Device           Error  Volume       Free      Trans  Mnt
 Name                  Status           Count   Label       Blocks    Count  Cnt
DSA618:                Mounted              0  DMG_DB       32582436   7696    4
$1$DGA230:   (OP2)     ShadowSetMember      0  (member of DSA618:)
$1$DGA430:   (MP1)     ShadowSetMember      0  (member of DSA618:)

MP1>sh dev dsa621

Device                 Device           Error  Volume       Free      Trans  Mnt
 Name                  Status           Count   Label       Blocks    Count  Cnt
DSA621:                Mounted              0  DMG_DB2      12936924      5    4
$1$DGA214:   (OP2)     ShadowSetMember      0  (member of DSA621:)
$1$DGA414:   (MP1)     ShadowSetMember      0  (member of DSA621:)

MP1>sh dev dsa616

Device                 Device           Error  Volume       Free      Trans  Mnt
 Name                  Status           Count   Label       Blocks    Count  Cnt
DSA616:                Mounted              0  DMG_AIJ       8673228    100    4
$1$DGA210:   (OP2)     ShadowSetMember      0  (member of DSA616:)
$1$DGA410:   (MP1)     ShadowSetMember      0  (member of DSA616:)

MP1>sh dev dsa617

Device                 Device           Error  Volume       Free      Trans  Mnt
 Name                  Status           Count   Label       Blocks    Count  Cnt
DSA617:                Mounted              0  DMG_RUJ      17359776    165    4
$1$DGA211:   (OP2)     ShadowSetMember      0  (member of DSA617:)
$1$DGA411:   (MP1)     ShadowSetMember      0  (member of DSA617:)

Usually an RMU/OPEN on that database is done only on MP1. I would like to know what happens when, in case of failover (MP1 crashes), I do an RMU/OPEN on the OP1 node.

As it is a mission-critical production database, I want to be 100% sure before updating our documentation.

So as you say, the RMU/OPEN should be done on both MP1 and OP1 as soon as they reboot. Correct?

I am new (2 months) and I inherited the task of supporting the application and its underlying RDB.

regards,
Nazim Manser |
|
#7
|
| Hi Nazim,

> that is why i was assigned the task to ensure correct failover
> strategy.

And you're a contractor, right? (Or your boss is a contractor?) Let's hope the customer's not reading this, eh :-) I'd love to know how much the contract's for, but then it's Cologne and not Munich and it's none of my business.

Anyway, is there not a UAT or other test environment that this can be tested in first?

I'll assume not. When you do an RMU/DUMP/HEADER on the database(s), do they say the number of cluster nodes is "1"? If they do, then you'll have to make sure the databases are closed on MP1 before trying to open them on OP1. If not, just open them up on both nodes, fire up the application on both nodes (if it's cluster tolerant) and get the application testing people involved.

> so as you say, the RMU/open should be done on both MP1 and OP1 as soon
> as they reboot. correct ?

No, I was suggesting that the beauty of VMS clusters and Rdb is that you don't have to "fail-over" because, personally, I would open the database and the application on all of the nodes all of the time. If MP1 goes down then there would be a pregnant pause followed by MP1 users having to log in again, but that's it. The cluster took a lickin' but it kept on tickin'. With Rdb partitioned lock trees and all the work VMS engineering has been doing with the DLM *and* the new interconnect stuff coming along, I see no point in restricting a database to one node. (Never have :-)

The fact that you're using Data Distributor (why?) leads me to suspect that not all disks are accessible cluster-wide or there's something dodgy with the application. Our DR used to be copying RBFs over to the mirror machine, restoring them and rolling forward AIJs. Once every couple of years we'd be forced to run in DR for a week and then switch back with no loss of data. They were *never* able to get the Unix systems to achieve the same thing! (They'd just get someone to log on and that would be that, i.e. production never shifted.) VMS guys were moving to a Disaster Tolerant setup when I left.

My *guess* is everything will be ok except for DNS cache flushes and hard-coded SQL/Services server names. (But then, if I was getting paid to do it, I'd make sure :-)

Regards Richard Maher |
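The check and the close-before-open order described in this post can be sketched roughly as follows. This is an illustration under assumptions, not a tested failover procedure: the output file name HEADER.TXT is made up, the database file specification is the one given in the first post, and the string "node count" is taken from the header-dump excerpt the original poster shows later in the thread. What Rdb does when MP1 has crashed without a clean close is not covered here and would need to be tested.

```
$! 1. Dump the root file header and look for the node limit.
$ RMU/DUMP/HEADER/OUTPUT=HEADER.TXT DSA618:[DB_DISK001.Database]DB.RDB
$ SEARCH HEADER.TXT "node count"
$!
$! 2. If the limit is 1: close the database on MP1 first.
$!    (Run on MP1, which is only possible while MP1 is still up.)
$ RMU/CLOSE/WAIT DSA618:[DB_DISK001.Database]DB.RDB
$!
$! 3. Then, on OP1, open the database and confirm what the local monitor sees.
$ RMU/OPEN/ACCESS=UNRESTRICTED/WAIT DSA618:[DB_DISK001.Database]DB.RDB
$ RMU/SHOW SYSTEM
```

If the header reports a node count greater than 1, step 2 is not needed according to the advice above, and the database can simply be opened on both nodes.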
|
#8
|
| Richard Maher wrote:
> Hi Nazim,
>
> > that is why i was assigned the task to ensure correct failover
> > strategy.
>
> And you're a contractor, right? (Or your boss is a contractor?) Let's hope the
> customer's not reading this, eh :-) I'd love to know how much the contract's
> for, but then it's Cologne and not Munich and it's none of my business.

It is neither Cologne nor Munich. Yes, I am a contractor; my boss is permanent, but only for the past year, so he inherited the stuff as it is. My role is to implement the failover scenario of our app, including the underlying RDB.

The RDB stuff was implemented a long time ago; the team has since left and the handover to my boss was not done properly. (Since he has been here everything has worked fine; the database was last opened over a year ago.)

MP1>rmu/show system sql$database
Oracle Rdb V7.0-61 on node EVAMP1 10-OCT-2006 11:22:47.97
    - monitor started 6-NOV-2004 11:12:09.67 (uptime 703 00:10:38)
    - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;77"
database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1
    - first opened 11-NOV-2005 14:18:37.78 (elapsed 332 21:04:10)
    * database is opened by an operator
    - current after-image journal file is TPZH_DAL_DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1559
    - AIJ Log Server is active
    - 2 active database users
    - database also open on these nodes:
        OP1

this is our prod database:

database DSA618:[DB_DISK001.Database]DMG_DB.RDB;1
    - first opened 15-MAY-2005 07:08:43.83 (elapsed 513 04:14:04)
    * database is opened by an operator
    - current after-image journal file is DB_DISKA02:[AIJ]AIJ18.AIJ;1
    - global buffer count is 30000; 20550 global buffers free
    - maximum global buffer count per user is 100
    - global section resides in system space
    - AIJ Log Server is active
    - 190 active database users

> Anyway, is there not a UAT or other test environment that this can be tested
> in first?

Unfortunately the UAT environment is on a standalone VMS machine.

> I'll assume not. When you do an RMU/DUMP/HEADER on the database(s), do they
> say the number of cluster nodes is "1"? If they do, then you'll have to make sure
> the databases are closed on MP1 before trying to open them on OP1. If not,
> just open them up on both nodes, fire up the application on both nodes
> (if it's cluster tolerant) and get the application testing people involved.

In the RMU/DUMP/HEADER output there is no reference to cluster nodes. The only thing I can find is the node count; the DDAL database is the one that is also open on OP1.

MP1>search ddal_dump.txt node
Maximum node count is 16
 - WARNING: Maximum node count is 16 instead of 1

MP1>search dmg_dump.txt node
Maximum node count is 1    ----> yes.

But what if MP1 crashes? Is there any danger in opening the database on the other node? Our application is designed to run on only 1 node at a time, but the RDB can also be opened on OP1 as a standby solution.

OK, before doing this I must close the database on MP1, then open it on MP1 and OP1.

I have to implement the application failover scenario on the VMS side, and the testing can only be done in a very restricted window at the weekend. I first have to work out the theoretical part, then schedule a test plan.

> > so as you say, the RMU/open should be done on both MP1 and OP1 as soon
> > as they reboot. correct ?
>
> No, I was suggesting that the beauty of VMS clusters and Rdb is that you
> don't have to "fail-over" because, personally, I would open the database and
> the application on all of the nodes all of the time. If MP1 goes down then
> there would be a pregnant pause followed by MP1 users having to log in
> again, but that's it. The cluster took a lickin' but it kept on tickin'.
> With Rdb partitioned lock trees and all the work VMS engineering has been
> doing with the DLM *and* the new interconnect stuff coming along, I see no
> point in restricting a database to one node. (Never have :-)

This was done by another team, and they did not document why they did it like this.

> The fact that you're using Data Distributor (why?) leads me to suspect that
> not all disks are accessible cluster-wide or there's something dodgy with
> the application. Our DR used to be copying RBFs over to the mirror machine,
> restoring them and rolling forward AIJs. Once every couple of years
> we'd be forced to run in DR for a week and then switch back with no loss of
> data. They were *never* able to get the Unix systems to achieve the same
> thing! (They'd just get someone to log on and that would be that, i.e.
> production never shifted.) VMS guys were moving to a Disaster Tolerant setup
> when I left.

By Data Distributor, do you mean the DDAL$TR_DB.RDB? All the DSAn disks are accessible cluster-wide.

> My *guess* is everything will be ok except for DNS cache flushes and
> hard-coded SQL/Services server names. (But then, if I was getting paid to do
> it, I'd make sure :-)

The application-specific SQL/Services are set up identically on MP1 and OP1. The DNS cache switch also needs to be checked with the downstream applications which connect to our RDB, but that's another story.

regards,
Nazim Manser
> > > > > > > > Anyway no one can answer your question directly unless they know a bit > more > > > about MP and OP. I suggest "yes" but if you've never tried a failover > before > > > then what are the extra machines there for. The fact that you appear to > be > > > running Data Distributor raises an eyebrow, but my advice is to open the > > > database on *all* nodes and use them *all* *all* of the time in possibly > a > > > wide-are cluster configuration. > > > > > > > MP1 and OP1 are on 2 sites but share the samefile system. > > to be precise > > > > the file layout of the RDB stuff is as follows: > > > > root file location : dsa618:[db_disk001.Database] > > RDA & SNP files: dsa618:[db_disk001.Database] > > dsa618:[db_disk002.Database] > > dsa618:[db_disk003.Database] > > AIJ files: dsa616:[db_diskA01.Database] > > dsa616:[db_diskA02.Database] > > RUJ files dsa617:[rdms$ruj] > > > > > > MP1>sh dev dsa618 > > > > Device Device Error Volume Free > > Trans Mnt > > Name Status Count Label Blocks > > Count Cnt > > DSA618: Mounted 0 DMG_DB 32582436 > > 7696 4 > > $1$DGA230: (OP2) ShadowSetMember 0 (member of DSA618 ![]() > > $1$DGA430: (MP1) ShadowSetMember 0 (member of DSA618 ![]() > > MP1>sh dev dsa621 > > > > Device Device Error Volume Free > > Trans Mnt > > Name Status Count Label Blocks > > Count Cnt > > DSA621: Mounted 0 DMG_DB2 12936924 > > 5 4 > > $1$DGA214: (OP2) ShadowSetMember 0 (member of DSA621 ![]() > > $1$DGA414: (MP1) ShadowSetMember 0 (member of DSA621 ![]() > > MP1>sh dev dsa616 > > > > Device Device Error Volume Free > > Trans Mnt > > Name Status Count Label Blocks > > Count Cnt > > DSA616: Mounted 0 DMG_AIJ 8673228 > > 100 4 > > $1$DGA210: (OP2) ShadowSetMember 0 (member of DSA616 ![]() > > $1$DGA410: (MP1) ShadowSetMember 0 (member of DSA616 ![]() > > MP1>sh dev dsa617 > > > > Device Device Error Volume Free > > Trans Mnt > > Name Status Count Label Blocks > > Count Cnt > > DSA617: Mounted 0 DMG_RUJ 17359776 > > 165 4 > > $1$DGA211: (OP2) 
ShadowSetMember 0 (member of DSA617 ![]() > > $1$DGA411: (MP1) ShadowSetMember 0 (member of DSA617 ![]() > > > > > > usually a RMU/open on that Database is done only on MP1. > > i would like to know what happens, when in case of failover (MP1 > > crashes) i do a RMU/open on OP1 node. > > > > as it is a mission critical production Database, i want to be sure 100% > > before updating our documentation. > > > > so as you say, the RMU/open should be done on both MP1 and OP1 as soon > > as they reboot. correct ? > > > > i an new (2 months) and i inherited, the task to support the > > application and its underklying RDB. > > > > regards, > > > > Nazim Manser > > > > > Rdb engineering hates clusters 'cos Norm doesn't get to use his beloved > > > Row-Ca$h, but don't let that bother you. > > > > > > Regards Richard Maher > > > > > > "Nazim" > > > news:1162918356.022076.305610-at-b28g2000cwb.googlegr oups.com... > > > > Hi guys, > > > > > > > > we are running RDB (Oracle Rdb V7.0-61), SQLSERVICES (v7.1-59) on a 5 > > > > node cluster on 2 sites. > > > > OpenVMS V7.3 > > > > > > > > > > > > site 1: > > > > > > > > MP1 sys$sysroot = DSA200:[SYS0.] > > > > MP2 sys$sysroot = DSA200:[SYS1.] > > > > > > > > site 2: > > > > > > > > OP1 sys$sysroot = DSA100:[SYS0.] > > > > OP2 sys$sysroot = DSA100:[SYS1.] > > > > QRM sys$sysroot = DSA300:[SYS0.] > > > > > > > > our application runs on node MP1 and uses the following Database > > > > database DSA618:[DB_DISK001.Database]DB.RDB > > > > > > > > but for failover scenario we need to do a RMU/OPEN > > > > DSA618:[DB_DISK001.Database]DB.RDB on node OP1, are there any problems doing > > > > this ? 
> > > > > > > > RDB is started on nodes MP1 and OP1 but in normal operations the Database > > > > database DSA618:[DB_DISK001.Database]DB.RDB is opened only on node MP1 > > > > > > > > thanks for your answers > > > > > > > > N.Manser > > > > > > > > > > > > > > > > SYSMAN> do rmu/show system > > > > %SYSMAN-I-OUTPUT, command execution on node QRM > > > > %DCL-W-IVVERB, unrecognized command verb - check validity and spelling > > > > \RMU\ > > > > %SYSMAN-I-OUTPUT, command execution on node OP2 > > > > %DCL-W-ACTIMAGE, error activating image RDMPRV > > > > -CLI-E-IMGNAME, image file > DSA100:[SYS1.SYSCOMMON.][SYSLIB]RDMPRV.EXE;8 > > > > -SYSTEM-F-PROTINSTALL, protected images must be installed > > > > %SYSMAN-I-OUTPUT, command execution on node OP1 > > > > Oracle Rdb V7.0-61 on node OP1 7-NOV-2006 17:40:14.74 > > > > - monitor started 8-APR-2006 22:29:10.31 (uptime 212 19:11:04) > > > > - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;107" > > > > database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1 > > > > - first opened 8-APR-2006 22:30:00.82 (elapsed 212 19:10:13) > > > > - current after-image journal file is > > > > DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1575 > > > > - AIJ Log Server is active > > > > - 2 active database users > > > > - database also open on these nodes: > > > > MP1 > > > > %SYSMAN-I-OUTPUT, command execution on node MP2 > > > > %DCL-W-ACTIMAGE, error activating image RDMPRV > > > > -CLI-E-IMGNAME, image file > DSA200:[SYS1.SYSCOMMON.][SYSLIB]RDMPRV.EXE;8 > > > > -SYSTEM-F-PROTINSTALL, protected images must be installed > > > > %SYSMAN-I-OUTPUT, command execution on node MP1 > > > > Oracle Rdb V7.0-61 on node MP1 7-NOV-2006 17:40:13.59 > > > > - monitor started 2-NOV-2006 07:36:25.92 (uptime 5 10:03:47) > > > > - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;79" > > > > database DSA618:[DB_DISK001.Database]DB.RDB;1 > > > > - first opened 2-NOV-2006 08:32:01.47 (elapsed 5 09:08:12) > > > > * database is opened by an operator > > > > - current 
after-image journal file is DB_DISKA01:[AIJ]AIJ25.AIJ;1 > > > > - global buffer count is 30000; 22250 global buffers free > > > > - maximum global buffer count per user is 100 > > > > - global section resides in system space > > > > - AIJ Log Server is active > > > > - 156 active database users > > > > database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1 > > > > - first opened 2-NOV-2006 07:37:17.02 (elapsed 5 10:02:56) > > > > - current after-image journal file is > > > > DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1578 > > > > - AIJ Log Server is active > > > > - 2 active database users > > > > - database also open on these nodes: > > > > OP1 > > > > > > |
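The rule discussed in this post (a header that reports a maximum node count of 1 means the database has to be closed on MP1 before it is opened on OP1) can be checked mechanically against the saved dump files. What follows is only a minimal sketch in Python, assuming the header text contains a line of the form "Maximum node count is N" exactly as in the `search` output above; the function names are made up for illustration and are not part of RMU.

```python
import re


def max_node_count(dump_text):
    """Return N from the first 'Maximum node count is N' line, or None if absent."""
    match = re.search(r"Maximum node count is (\d+)", dump_text)
    return int(match.group(1)) if match else None


def must_close_before_open_elsewhere(dump_text):
    """True when the header allows a single node only (hypothetical helper)."""
    count = max_node_count(dump_text)
    if count is None:
        raise ValueError("no 'Maximum node count' line found")
    return count == 1


# Text copied from the two search results quoted in the post.
ddal_dump = (
    "Maximum node count is 16\n"
    "- WARNING: Maximum node count is 16 instead of 1\n"
)
dmg_dump = "Maximum node count is 1\n"

print("DDAL:", must_close_before_open_elsewhere(ddal_dump))
print("DMG: ", must_close_before_open_elsewhere(dmg_dump))
```

This only reads what the header says; it does not tell you whether the application itself tolerates running against a database that is open on two nodes.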
|
#9
|
| Hi Nazim,

> it is neither cologne nor munich.

Frankfurt? How's the Spring workload shaping up? :-)

Anyway, what downtime window do you have? All I can suggest, on the information that you've given, is that you go in early one Sunday morning and shut down the application on MP1, followed by a close of all the databases. Then *before anything else* do a full off-line backup of all databases (probably followed by a complete rmu/verify if you haven't been doing them). On second thoughts, best not to ask too many questions eh :-) Then open the database(s) and applications up on OP1 and let the testers do their work. If the system startups/UAFs/logicals/configs and specs are the same then I foresee no problems. Does the RDMS$RUJ logical point to the same place on all nodes? Anything in sys$specific?

In summary Nazim, apart from the suck-it-and-see approach, I see no way forward.

The one question I'd be sure to ask yourself before attempting the fail-over is "when was the last time that I've had to do a production restore in anger?". If the answer ends up "Buggered if I know!" then I suggest that you practice restoring the database to the test box, maybe rolling forward AIJs, enabling AIJs again.

Are you running circular AIJs or single/extensible? ALS? You don't say you're running hot-standby but you are running DDAL; what transfers will stop when you switch over?

Do you have a support contract? If so, call Oracle Rdb support for help. If not, someone should bring this to the attention of the manager of the dickhead that made that decision! Probably the same dickhead that sacked all the real DBAs in the first place :-(

You're on your own. Good luck.

Regards Richard Maher

$ pipe rmu/dump/head mf_personnel | sea sys$pipe node
    Maximum node count is 16
    - WARNING: Maximum node count is 16 instead of 1 |
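The question of whether RDMS$RUJ points to the same place on every node can be answered by comparing the per-node output of a SYSMAN `do show log` command. The following is a rough sketch in Python, assuming the output keeps the usual "%SYSMAN-I-OUTPUT, command execution on node X" header followed by `"name" [mode] = "value"` lines; the sample text and both function names are illustrative assumptions, not output captured from this cluster.

```python
import re

NODE_HEADER = re.compile(r"%SYSMAN-I-OUTPUT, command execution on node (\S+)")
# Matches lines such as:  "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
TRANSLATION = re.compile(r'"([^"]+)"\s*(?:\[\w+\]\s*)?=\s*"([^"]+)"')


def translations_by_node(sysman_text):
    """Map each node name to the (logical, value) pairs printed for it."""
    per_node = {}
    node = None
    for line in sysman_text.splitlines():
        header = NODE_HEADER.search(line)
        if header:
            node = header.group(1)
            per_node[node] = []
            continue
        pair = TRANSLATION.search(line)
        if pair and node is not None:
            per_node[node].append((pair.group(1), pair.group(2)))
    return per_node


def nodes_that_differ(per_node):
    """Return the nodes whose translations differ from the first node listed."""
    if not per_node:
        return []
    reference = next(iter(per_node.values()))
    return sorted(n for n, pairs in per_node.items() if pairs != reference)


# Illustrative sample in the assumed format (two nodes, identical definitions).
sample = '''%SYSMAN-I-OUTPUT, command execution on node OP1
   "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
1  "DB_DISKRUJ" [exec] = "DSA617:[RDMS$RUJ]" [terminal] (LNM$SYSTEM_TABLE)
%SYSMAN-I-OUTPUT, command execution on node MP1
   "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
1  "DB_DISKRUJ" [exec] = "DSA617:[RDMS$RUJ]" [terminal] (LNM$SYSTEM_TABLE)
'''

per_node = translations_by_node(sample)
print(nodes_that_differ(per_node))
```

An empty list means every node resolved the logical the same way. A definition that exists only in a node's sys$specific area would show up as a differing node, which is the case the question is really probing for.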
|
#10
|
| Richard Maher wrote:
> Hi Nazim,
>
> > it is neither cologne nor munich.
> Frankfurt? How's the Spring workload shaping up? :-)

Neither; we are located outside Germany.

> Anyway, what downtime window do you have?

In production, the maintenance window must be scheduled in advance and involves a lot of other teams (downstream applications). Before such a test is done in prod, it must have been tested successfully in the test environment.

So I have to do it in the test environment first; the problem is that it consists of only one standalone machine. I have to ask the people responsible to configure the test environment similarly to prod (i.e. a 2-node cluster with quorum disk and shared storage).

> All I can suggest, on the
> information that you've given, is that you go in early one Sunday morning
> and shut down the application on MP1 followed by a close of all the
> databases. Then *before anything else* do a full off-line backup of all
> databases (probably followed by a complete rmu/verify if you haven't been
> doing them.

We back up the Rdb database daily in hot backup mode; the database remains open. If I do $rmu/verify/root, is that sufficient? It takes 13 minutes.

> On second thoughts, best not to ask too many questions eh :-)
> Then open the database(s) and applications up on OP1 and let the testers do
> their work. If the System Startups/UAFs/logicals/configs and specs are the
> same then I forsee no problems.

I have checked:

system startups: from a common source
UAF, rightslist: from a common area
logicals and config: centralized; only a parameter decides on which node the app runs.

> Does the RDMS$RUJ logical point to the same
> place on all nodes? Anything in sys$specific?

SYSMAN> do show log /all /full rdms$ruj
%SYSMAN-I-OUTPUT, command execution on node QRM
   "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
1  "DB_DISKRUJ" [exec] = "DSA617:[RDMS$RUJ]" [terminal] (LNM$SYSTEM_TABLE)
%SYSMAN-I-OUTPUT, command execution on node OP2
   "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
1  "DB_DISKRUJ" [exec] = "DSA617:[RDMS$RUJ]" [terminal] (LNM$SYSTEM_TABLE)
%SYSMAN-I-OUTPUT, command execution on node OP1
   "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
1  "DB_DISKRUJ" [exec] = "DSA617:[RDMS$RUJ]" [terminal] (LNM$SYSTEM_TABLE)
%SYSMAN-I-OUTPUT, command execution on node MP2
   "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
1  "DB_DISKRUJ" [exec] = "DSA617:[RDMS$RUJ]" [terminal] (LNM$SYSTEM_TABLE)
%SYSMAN-I-OUTPUT, command execution on node MP1
   "RDMS$RUJ" [exec] = "DB_DISKRUJ" (LNM$SYSTEM_TABLE)
1  "DB_DISKRUJ" [exec] = "DSA617:[RDMS$RUJ]" [terminal] (LNM$SYSTEM_TABLE)

> In summary Nazim, apart from the suck-it-and-see approach, I see no way
> forward.
>
> The one question I'd be sure to ask yourself before attempting the fail-over
> is "when was the last time that I've had to do a production restore in
> anger?". If the answer ends up "Buggered if I know!" then I suggest that you
> practice restoring the database to the test box, maybe rolling forward AIJs,
> enabling AIJs again.

On the test box AIJ is disabled. I will have to enable it once they agree to increase the disk space.

Disk space status in test:

root file, SNP and RDA  ----> DSA1: 2.5 GB free of a total of 18 GB
SNP and RDA             ----> DSA2: 1.6 GB free of a total of 18 GB
SNP, RDA and RUJ        ----> DSA3: 2.8 GB free of a total of 18 GB

Disk space status on prod:

root file, RDA & SNP    ----> DSA618: 15.7 GB free of a total of 69.5 GB
RDA & SNP               ----> DSA621: 6.3 GB free of a total of 8 GB
AIJ                     ----> DSA616: 4.2 GB free of a total of 8 GB
RUJ                     ----> DSA617: almost all free of 8 GB total

> Are you running circular AIJs or single/extensible?

    - After-image journaling is enabled
    - Database is configured for 70 journals
    - Reserved journal count is 70
    - Available journal count is 36
    - LogMiner is disabled
    - Journal switches to next available when full
    - 1 journal has been modified with transaction data
    - 34 journals can be created while database is active
    - Journal "AIJ35" is current
    - All journals are accessible
    - Shutdown time is 120 minutes
    - Backup operation is automatic via server
    - Backup uses no-quiet-point
    - Default backup filename edits are not used
    - Log server startup is AUTOMATIC
    - Operator notification is enabled for the following operators
        Central Cluster
    - Journal overwrite is disabled
    - AIJ cache on "electronic disk" is disabled
    - Default journal allocation is 250000 blocks
    - Default journal extension is 25000 blocks
        Default extension ignored because multiple journals active
    - Default journal initialization is 250000 blocks

> ALS?

OpenVMS V7.3 on node MP1 16-NOV-2006 19:51:47.90 Uptime 14 12:25:31
  Pid      Process Name   State  Pri     I/O        CPU        Page flts  Pages
23200430   RDMS_MONITOR   LEF     15   160402   0 00:01:09.79    110471     88
23200561   RDM_ALS_1      HIB     15   211180   0 00:02:56.15       219    314
23200643   RDM_ALS_2      HIB     15     6253   0 00:01:18.45       402    491

> You don't say
> you're running hot-standby but you are running DDAL; what transfers will
> stop when you switch over?

The DDAL only replicates a subset of the data, the part destined for the public. Its purpose is selective distribution, not availability.

> Do you have a support contract?

I was engaged to do the work. :-)

> If so call Oracle Rdb support for help. If
> not, someone should bring this to the attention of the manager of the
> dickhead that made that decision! Probably the same dickhead that sacked all
> the real DBAs in the first place :-(
>
> You're on your own. Good-Luck.
> > Regards Richard Maher > > $ pipe rmu/dump/head mf_personnel | sea sys$pipe node > Maximum node count is 16 > - WARNING: Maximum node count is 16 instead of 1 > > "Nazim" > news:1162990172.492625.213000-at-f16g2000cwb.googlegr oups.com... > > > > Richard Maher schrieb: > > > > > Hi Nazim, > > > > > > > that is why i was assigned the task to ensure correct failover > > > > strategy. > > > > > > And you're a contractor right? (Or you boss is a contractor?) Let's hope > the > > > customers not reading this eh :-) I'd love to know how much the > contract's > > > for, but then it's Cologne and not Munich and it's none of my business. > > > > > > > it is neither cologne nor munich. > > > > yes i am contractor, my boss is permanent and only since 1 year, so he > > inherited the stuff as it is. > > My role is to implement the failover scenario of our app, including the > > underlying RDB. > > the RDB stuff was implemented long time ago, and the team left since > > and the handover was not done correctly to my boss. (since he was there > > all worked fine, last time the Database was opened is over a year ago. 
> > > > > > MP1>rmu/show system sql$database > > Oracle Rdb V7.0-61 on node EVAMP1 10-OCT-2006 11:22:47.97 > > - monitor started 6-NOV-2004 11:12:09.67 (uptime 703 00:10:38) > > - monitor log filename is "SYS$SYSROOT:[SYSEXE]RDMMON.LOG;77" > > > > database DSA0:[DDAL.DATABASE]DDAL$TR_DB.RDB;1 > > - first opened 11-NOV-2005 14:18:37.78 (elapsed 332 21:04:10) > > * database is opened by an operator > > - current after-image journal file is > > TPZH_DAL_DB_DISKA01:[AIJ]DDAL_AIJ001.AIJ;1559 > > - AIJ Log Server is active > > - 2 active database users > > - database also open on these nodes: > > OP1 > > > > this is our prod Database > > > > database DSA618:[DB_DISK001.Database]DMG_DB.RDB;1 > > - first opened 15-MAY-2005 07:08:43.83 (elapsed 513 04:14:04) > > * database is opened by an operator > > - current after-image journal file is DB_DISKA02:[AIJ]AIJ18.AIJ;1 > > - global buffer count is 30000; 20550 global buffers free > > - maximum global buffer count per user is 100 > > - global section resides in system space > > - AIJ Log Server is active > > - 190 active database users > > > > > > > Anyway, is there not a UAT or other test environment that this can be > tested > > > in first? > > > > > > > unfortunately the UAT environment is on a standalone VMS machine > > > > > > > I'll assume not. When you do an RMU/DUMP/HEADER on the database(s) do > they > > > say number of cluster nodes is "1"? If they do then you'll have to make > sure > > > the databases are closed on MP1 before trying to open them on OP1. If > not > > > just open them up on both nodes and fire up the application on both > nodes > > > (if it's cluster tolerant) and get the application testing people > involved. > > > > > > on the RMU/DUMP/HEADER there is no reference of cluster nodes, the only > > reference is > > in the case of the DDAL database is also open on OP1. > > > > it is the node numbers. 
> > MP1>search ddal_dump.txt node
> >     Maximum node count is 16
> >     - WARNING: Maximum node count is 16 instead of 1
> > MP1>search dmg_dump.txt node
> >     Maximum node count is 1     ----> yes.
> >
> > But what if MP1 crashes? Is there any danger in opening the database on
> > the other node?
> >
> > Our application is designed to run on only one node at a time, but the
> > RDB can also be opened on OP1 as a standby solution.
> > OK, before doing this I must close the database on MP1, then open it on MP1 and OP1.
> >
> > I have to implement the application failover scenario on the VMS side,
> > and the testing activities can only be done in a very restricted window
> > on the weekend.
> >
> > I first have to work out the theoretical part, then schedule a test
> > plan.
> >
> > > > so as you say, the RMU/open should be done on both MP1 and OP1 as soon
> > > > as they reboot. correct ?
> > >
> > > No, I was suggesting that the beauty of VMS clusters and Rdb is that you
> > > don't have to "fail-over" because, personally, I would open the database and
> > > the application on all of the nodes all of the time. If MP1 goes down then
> > > there would be a pregnant-pause followed by MP1 users having to log in
> > > again, but that's it. The cluster took a lickin' but it kept on tickin'.
> > > With Rdb partitioned lock trees and all the work VMS engineering has been
> > > doing with the DLM *and* the new interconnect stuff coming along, I see no
> > > point in restricting a database to one node. (Never have :-)
> >
> > This was done by another team, and they did not document why they did
> > it like this.
> >
> > > The fact that you're using Data Distributor (why?) leads me to suspect that
> > > not all disks are accessible cluster wide or there's something dodgy with
> > > the application. Our DR used to be copying RBFs over to the mirror machine
> > > and restoring them and rolling forward AIJs. Once every couple of years
> > > we'd be forced to run in DR for a week and then switch back with no loss of
> > > data. They were *never* able to get the Unix systems to achieve the same
> > > thing! (They'd just get someone to log on and that would be that. i.e.
> > > production never shifted) VMS guys were moving to a Disaster Tolerant set up
> > > when I left.
> >
> > By Data Distributor, do you mean the DDAL$TR_DB.RDB?
> >
> > All the DSAn disks are accessible clusterwide.
> >
> > > My *guess* is everything will be ok except for DNS cache flushes and
> > > hard-coded SQL/Services server names. (But then, if I was getting paid to do
> > > it, I'd make sure :-)
> >
> > The application-specific SQL/Services are set up identically on MP1 and
> > OP1.
> >
> > The DNS cache switch also needs to be checked with the downstream applications
> > which connect to our RDB, but that's another story.
> >
> > regards,
> >
> > Nazim Manser
> >
> > > Regards Richard Maher
> > >
> > > "Nazim"
> > > news:1162981579.381769.299700-at-k70g2000cwa.googlegroups.com...
> > > >
> > > > Richard Maher wrote:
> > > >
> > > > > Hi Nazim,
> > > > >
> > > > > If this system runs anything other than "Mom and Dad's corner Deli VAT return"
> > > > > then I suspect that you (or the company you support) are in big trouble!
> > > > > Get yourself a professional DBA and pay them what they ask to do the job
> > > > > properly. The questions turning up here (and more so in the ITRC) about Rdb
> > > > > are truly frightening. I wish I could find out who these companies are and
> > > > > turn up to their next risk-assessment or shareholders meeting :-(
> > > >
> > > > That is why I was assigned the task of ensuring a correct failover
> > > > strategy.
> > > >
> > > > > Anyway no one can answer your question directly unless they know a bit more
> > > > > about MP and OP.
> > > > > I suggest "yes" but if you've never tried a failover before
> > > > > then what are the extra machines there for. The fact that you appear to be
> > > > > running Data Distributor raises an eyebrow, but my advice is to open the
> > > > > database on *all* nodes and use them *all* *all* of the time in possibly a
> > > > > wide-area cluster configuration.
> > > >
> > > > MP1 and OP1 are on 2 sites but share the same file system.
> > > > To be precise, the file layout of the RDB stuff is as follows:
> > > >
> > > > root file location: dsa618:[db_disk001.Database]
> > > > RDA & SNP files:    dsa618:[db_disk001.Database]
> > > >                     dsa618:[db_disk002.Database]
> > > >                     dsa618:[db_disk003.Database]
> > > > AIJ files:          dsa616:[db_diskA01.Database]
> > > >                     dsa616:[db_diskA02.Database]
> > > > RUJ files:          dsa617:[rdms$ruj]
> > > >
> > > > MP1>sh dev dsa618
> > > >
> > > > Device                  Device           Error    Volume         Free  Trans Mnt
> > > >  Name                   Status           Count     Label        Blocks Count Cnt
> > > > DSA618:                 Mounted              0  DMG_DB        32582436  7696   4
> > > > $1$DGA230:      (OP2)   ShadowSetMember      0  (member of DSA618:)
> > > > $1$DGA430:      (MP1)   ShadowSetMember      0  (member of DSA618:)
> > > >
> > > > MP1>sh dev dsa621
> > > >
> > > > Device                  Device           Error    Volume         Free  Trans Mnt
> > > >  Name                   Status           Count     Label        Blocks Count Cnt
> > > > DSA621:                 Mounted              0  DMG_DB2       12936924     5   4
> > > > $1$DGA214:      (OP2)   ShadowSetMember      0  (member of DSA621:)
> > > > $1$DGA414:      (MP1)   ShadowSetMember      0  (member of DSA621:)
> > > >
> > > > MP1>sh dev dsa616
> > > >
> > > > Device                  Device           Error    Volume         Free  Trans Mnt
> > > >  Name                   Status           Count     Label        Blocks Count Cnt
> > > > DSA616:                 Mounted              0  DMG_AIJ        8673228   100   4
> > > > $1$DGA210:      (OP2)   ShadowSetMember      0  (member of DSA616:)
> > > > $1$DGA410:      (MP1)   ShadowSetMember      0  (member of DSA616:)
> > > >
> > > > MP1>sh dev dsa617
> > > >
> > > > Device                  Device           Error    Volume         Free  Trans Mnt
> > > >  Name                   Status           Count     Label        Blocks Count Cnt
> > > > DSA617:                 Mounted              0  DMG_RUJ       17359776   165   4
> > > > $1$DGA211:      (OP2)   ShadowSetMember      0  (member of DSA617:)
> > > > $1$DGA411:      (MP1)   ShadowSetMember      0  (member of DSA617:)
> > > >
> > > > Usually an RMU/OPEN on that database is done only on MP1.
> > > > I would like to know what happens when, in case of failover (MP1
> > > > crashes), I do an RMU/OPEN on node OP1.
> > > >
> > > > As it is a mission-critical production database, I want to be 100% sure
> > > > before updating our documentation.
> > > >
> > > > So as you say, the RMU/OPEN should be done on both MP1 and OP1 as soon
> > > > as they reboot. Correct?
> > > >
> > > > I am new (2 months) and I inherited the task of supporting the
> > > > application and its underlying RDB.
> > > >
> > > > regards,
> > > >
> > > > Nazim Manser
> > > >
> > > > > Rdb engineering hates clusters 'cos Norm doesn't get to use his beloved
> > > > > Row-Ca$h, but don't let that bother you.
> > > > >
> > > > > Regards Richard Maher
> > > > >
> > > > > "Nazim"
> > > > > news:1162918356.022076.305610-at-b28g2000cwb.googlegroups.com...
> > > > > > Hi guys,
> > > > > >
> > > > > > we are running RDB (Oracle Rdb V7.0-61), SQLSERVICES (v7.1-59) on a 5
> > > > > > node cluster on 2 sites.
> > > > > > OpenVMS V7.3
> > > > > >
> > > > > > site 1:
> > > > > >
> > > > > > MP1 sys$sysroot = DSA200:[SYS0.]
> > > > > > MP2 sys$sysroot = DSA200:[SYS1.]
> > > > > >
> > > > > > site 2:
> > > > > >
> > > > > > OP1 sys$sysroot = DSA100:[SYS0.]
> > > > > > OP2 sys$sysroot = DSA100:[SYS1.]
> > > > > > QRM sys$sysroot = DSA300:[SYS0.]
> > > > > >
> > > > > > our application runs on node MP1 and uses the following Database
> > > > > > database DSA618:[DB_DISK001.Database]DB.RDB
> > > > > >
> > > > > > but for failover scenario we need to do a RMU/OPEN
> > > > > > DSA618:[DB_DISK001.Database]DB.RDB on node OP1, are there any problems doing
> > > > > > this ?
|
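The node-count check discussed in the thread (save the RMU/DUMP/HEADER output to a file, then look for the "Maximum node count" line) could be automated roughly as below. This is only a sketch under stated assumptions: the function names are hypothetical, the only format relied on is the "Maximum node count is N" line shown in the SEARCH output above, and the rule "a count of 1 means close on MP1 before opening on OP1" is simply Richard's advice restated, not something verified against Rdb documentation.

```python
import re
from typing import Optional

# Matches the header line as it appears in the thread's SEARCH output.
# Lines such as "- WARNING: Maximum node count is 16 instead of 1" are
# skipped because they do not start with the phrase itself.
NODE_COUNT = re.compile(r"^\s*Maximum node count is (\d+)", re.MULTILINE)


def max_node_count(dump_text: str) -> Optional[int]:
    """Return the node count found in saved RMU/DUMP/HEADER text, or None."""
    match = NODE_COUNT.search(dump_text)
    return int(match.group(1)) if match else None


def must_close_before_open_elsewhere(dump_text: str) -> bool:
    """True when the database is limited to one node (assumption: count == 1
    means it has to be closed on the current node before another node opens it)."""
    count = max_node_count(dump_text)
    if count is None:
        raise ValueError("no 'Maximum node count' line found in dump text")
    return count == 1


if __name__ == "__main__":
    # Sample text copied from the two SEARCH results quoted in the thread.
    ddal = ("    Maximum node count is 16\n"
            "  - WARNING: Maximum node count is 16 instead of 1\n")
    dmg = "    Maximum node count is 1\n"
    for name, text in (("DDAL", ddal), ("DMG", dmg)):
        print(name, max_node_count(text), must_close_before_open_elsewhere(text))
```

Run against the two dumps quoted above, this flags the DMG production database as single-node and the DDAL database as multi-node, which matches what the thread concludes by eye.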