一套Windows上的11.2.0.1单实例数据库在database open阶段出现了ORA-07445:core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []内部错误,具体的出错日志如下:
LOG CONTENT=======================ALERT.LOG============================Starting ORACLE instance (normal)LICENSE_MAX_SESSION = 0LICENSE_SESSIONS_WARNING = 0Picked latch-free SCN scheme 2Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DESTARCH: Warning; less destinations available than specifiedby LOG_ARCHIVE_MIN_SUCCEED_DEST init.ora parameterAutotune of undo retention is turned on.IMODE=BRILAT =842011-08-01 13:13:47.068000 +08:00LICENSE_MAX_USERS = 0SYS auditing is disabledStarting up:Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing options.Using parameter settings in server-side spfile C:\APP\PRODUCT\11.2.0\DBHOME_1\DATABASE\SPFILEG11R2.ORASystem parameters with non-default values:  _spin_count              = 2000  processes                = 500  event                    = "10500 trace name context forever,level 8:10013 trace name context forever,level 10:10015 trace name context forever,level 10"  sga_max_size             = 600M  shared_pool_size         = 152M  large_pool_size          = 32M  java_pool_size           = 4M  streams_pool_size        = 0  _db_file_direct_io_count = 12  sga_target               = 0  memory_target            = 0  control_files            = "C:\APP\ORADATA\G11R2\CONTROLFILE\O1_MF_6VWCSH9J_.CTL"  control_files            = "C:\APP\FLASH_RECOVERY_AREA\G11R2\CONTROLFILE\O1_MF_6VWCSHNF_.CTL"  db_block_checksum        = "TRUE"  db_block_size            = 8192  db_cache_size            = 196M  _shared_io_pool_size     = 0  compatible               = "11.2.0.0.0"  log_archive_dest_2       = "service=stdby optional lgwr sync affirm valid_for=(online_logfiles,all_roles)"  log_buffer               = 10485760  db_create_file_dest      = "C:\app\oradata"  db_recovery_file_dest    = "C:\app\flash_recovery_area"  db_recovery_file_dest_size= 500000M  undo_tablespace          = "UNDOTBS1"  _kgl_bucket_count        = 2  remote_login_passwordfile= "EXCLUSIVE"  db_domain                = ""  session_cached_cursors   = 300  audit_file_dest          = "C:\APP\ADMIN\G11R2\ADUMP"  optimizer_features_enable= "10.2.0.4"  audit_trail              = "DB"  cell_offload_plan_display= "ALWAYS"  db_name                  = "G11R2"  open_cursors             = 3000  _optimizer_extended_cursor_sharing_rel= "NONE"  pga_aggregate_target     = 300M  diagnostic_dest          = "C:\APP"2011-08-01 13:13:48.164000 +08:00PMON started with pid=2, OS id=984VKTM started with pid=3, OS id=3656 at elevated priorityVKTM running at (10)millisec precision with DBRM quantum (100)msGEN0 started with pid=4, OS id=5824DIAG started with pid=5, OS id=5832DBRM started with pid=6, OS id=2784PSP0 started with pid=7, OS id=2500DIA0 started with pid=8, OS id=5320MMAN started with pid=9, OS id=4128DBW0 started with pid=10, OS id=5852LGWR started with pid=11, OS id=3960CKPT started with pid=12, OS id=4472SMON started with pid=13, OS id=5788RECO started with pid=14, OS id=6036MMON started with pid=15, OS id=5740MMNL started with pid=16, OS id=2112ORACLE_BASE from environment = C:\appalter database mount exclusive2011-08-01 13:13:52.390000 +08:00Sweep [inc][135908]: completedNSS2 started with pid=19, OS id=2728Sweep [inc][135901]: completedSuccessful mount of redo thread 1, with mount id 2704081164Database mounted in Exclusive Mode2011-08-01 13:13:53.413000 +08:00Lost write protection disabled2011-08-01 13:13:54.578000 +08:00Sweep [inc][135897]: completedSweep [inc2][135908]: completedSweep [inc2][135901]: completedSweep [inc2][135897]: completed2011-08-01 13:13:55.788000 +08:00Completed: alter database mount exclusivealter database openBeginning crash recovery of 1 threads parallel recovery started with 3 processes2011-08-01 13:13:56.959000 +08:00Started redo scanCompleted redo scan read 0 KB redo, 0 data blocks need recoveryStarted redo application at Thread 1: logseq 867, block 88140, scn 9122496Recovery of Online Redo Log: Thread 1 Group 3 Seq 867 Reading mem 0  Mem# 0: C:\APP\ORADATA\G11R2\ONLINELOG\O1_MF_3_6VWCSMPO_.LOG  Mem# 1: C:\APP\FLASH_RECOVERY_AREA\G11R2\ONLINELOG\O1_MF_3_6VWCSNGX_.LOGCompleted redo application of 0.00MBCompleted crash recovery at Thread 1: logseq 867, block 88140, scn 9142497 0 data blocks read, 0 data blocks written, 0 redo k-bytes read2011-08-01 13:13:58.738000 +08:00LGWR: STARTING ARCH PROCESSESARC0 started with pid=22, OS id=47842011-08-01 13:13:59.765000 +08:00ARC0: Archival startedLGWR: STARTING ARCH PROCESSES COMPLETEARC0: STARTING ARCH PROCESSESARC1 started with pid=24, OS id=2780ARC2 started with pid=25, OS id=1288ARC1: Archival startedLGWR: Primary database is in MAXIMUM AVAILABILITY modeARC2: Archival startedARC1: Becoming the 'no FAL' ARCHARC1: Becoming the 'no SRL' ARCHARC2: Becoming the heartbeat ARCHLGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWRARC3 started with pid=26, OS id=38762011-08-01 13:14:00.828000 +08:00ARC3: Archival startedARC0: STARTING ARCH PROCESSES COMPLETENSS2 started with pid=19, OS id=51562011-08-01 13:14:29.008000 +08:00ORA-16198: LGWR received timedout error from KSR2011-08-01 13:14:35.980000 +08:00Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_lgwr_3960.trc:ORA-16198: Timeout incurred on internal channel during remote archivalLGWR: Error 16198 verifying archivelog destination LOG_ARCHIVE_DEST_2Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZEDLGWR: Continuing...ARCH: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch2011-08-01 13:14:38.629000 +08:00Trying to expand controlfile section 11 for Oracle Managed FilesException [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_arc0_4784.trc  (incident=136091):ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136091\g11r2_arc0_4784_i136091.trc2011-08-01 13:14:40.283000 +08:00Trace dumping is performing id=[cdmp_20110801131440]2011-08-01 13:14:52.417000 +08:00Sweep [inc][136091]: completedSweep [inc2][136091]: completed2011-08-01 13:14:59.805000 +08:00ARC2: Detected ARCH process failureARC2: STARTING ARCH PROCESSESARC0 started with pid=19, OS id=50162011-08-01 13:15:00.836000 +08:00ARC0: Archival startedARC2: STARTING ARCH PROCESSES COMPLETE2011-08-01 13:15:36.689000 +08:00Deleted Oracle managed file C:\APP\FLASH_RECOVERY_AREA\G11R2\ARCHIVELOG\2011_08_01\O1_MF_1_866_73DFKWRK_.ARC2011-08-01 13:15:38.013000 +08:00Error 12154 received logging on to the standbyErrors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc:ORA-12154: TNS:could not resolve the connect identifier specifiedARCH: Error 12154 Creating archive log file to 'stdby'Trying to expand controlfile section 11 for Oracle Managed FilesException [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc  (incident=136051):ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc2011-08-01 13:15:39.680000 +08:00Trace dumping is performing id=[cdmp_20110801131539]2011-08-01 13:15:42.782000 +08:00PMON (ospid: 984): terminating the instance due to error 3972011-08-01 13:15:50.520000 +08:00Instance terminated by PMON, pid = 984=============================g11r2_ora_4852_i136051.trc=============================Dump file c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trcOracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing optionsWindows NT Version V6.1 Service Pack 1CPU                 : 4 - type 586, 2 Physical CoresProcess Affinity    : 0x0x00000000Memory (Avail/Total): Ph:2122M/3566M, Ph+PgF:5413M/7130M, VA:1084M/2047MInstance name: g11r2Redo thread mounted by this instance: 1Oracle process number: 17Windows thread id: 4852, image: ORACLE.EXE (SHAD)*** 2011-08-01 13:15:38.527*** SESSION ID:(197.1) 2011-08-01 13:15:38.527*** CLIENT ID:() 2011-08-01 13:15:38.527*** SERVICE NAME:() 2011-08-01 13:15:38.527*** MODULE NAME:(oradim.exe) 2011-08-01 13:15:38.527*** ACTION NAME:() 2011-08-01 13:15:38.527Dump continued from file: c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trcORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []========= Dump for incident 136051 (ORA 7445 [kcflfi()+466]) ========----- Beginning of Customized Incident Dump(s) -----Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionProcess Id: 0x000010bc  Thread Id : 0x000012f4    Time : Mon Aug 01 13:15:38Excp. Code: 0xc0000094  Excp. Type: INT_DIVIDE    Flags: 0x00000000------------------- Registers ----------------------------eip = 0500282e esp = 0d9f525c ebp = 0d9f577c edi = 37eefe00 esi = 00000265eax = 00000265 ebx = 00000000 ecx = 089ee234 edx = 00000000ecs = 0000001b eds = 00000023 ees = 00000023 ess = 00000023egs = 00000000 efs = 0000003beflags = 00010246------------------- End of Registers ---------------------*** 2011-08-01 13:15:38.536dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----alter database open----------- messages from pre-loading .sym files:Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are differentSymbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different----------- end of messages from pre-loading .sym files----- Call Stack Trace -----calling              call     entry                argument values in hexlocation             type     point                (? means dubious value)-------------------- -------- -------------------- ----------------------------Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are differentSymbol file C:\app\product\11.2.0\dbhome_1\BIN\oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are differentEnumerateLoadedModules64 failed with error -1073741819Symbol file oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are differentSymbol file oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different__VInfreq__kcflfi()           00000000+466_kccrszf()+287       CALLrel  _kcflfi()            0 318345B8 34 31C0DD40 4000                                                   265 4 7FFFFFFF 1 0 0_kccrsd_expd()+1418  CALLrel  _kccrszf()           D9F7CEC 268 264_kccwnc_reuse_expan  CALLrel  _kccrsd_expd()       D9F7CEC B 38d()+640__VInfreq__kccwnc()  CALLrel  _kccwnc_reuse_expan  D9F7CEC B 26+235                          d()_krse_arc_complete(  CALLrel  _kccwnc()            D9F7CEC D9F6D38 B)+1615_krse_arc_driver_co  CALLrel  _krse_arc_complete(  D9F78ACre()+1307                     )_krse_arc_driver()+  CALLrel  _krse_arc_driver_co  D9F7CEC 1 D9F7C6C 0 0 D9F7CC8274                           re()                 0 0 0 0 0 0 0_krsq_arch_to_force  CALLrel  _krse_arc_driver()   D9F7CEC 1 D9F7C6C 0 0 D9F7CC8_switch()+196                                      0 0 0 0 0 0 0__VInfreq__kcttsc()  CALLrel  _krsq_arch_to_force  D9F7CEC 1+129                          _switch()_kcfopd()+1504       CALLrel  _kcttsc()            2_adbdrv()+16700      CALLrel  _kcfopd()            0 0 0 0 D9FBBF8_opiexe()+13594      CALLrel  _adbdrv()            4A C0000094 33644518 D9FBD38                                                   6D60697 2F3FC5F0_opiosq0()+6248      CALLrel  _opiexe()            4 0 D9FC704_kpooprx()+277       CALLrel  _opiosq0()           3 E D9FC970 A4 0_kpoal8()+632        CALLrel  _kpooprx()           D9FF074 D9FD3F8 13 1 0 A4_opiodr()+1248       CALLreg  00000000             5E 1C D9FF070___dyn_tls_init_cal  CALLreg  00000000             5E 1C D9FF070 1lback()+2935122_opitsk()+1404       CALL???  00000000             C9A10E8 5E D9FF070 0 D9FED00                                                   D9FF19C 53E52E 0 D9FF1C8_opiino()+980        CALLrel  _opitsk()            0 0_opiodr()+1248       CALLreg  00000000             3C 4 D9FFBC4_opidrv()+1201       CALLrel  _opiodr()            3C 4 D9FFBC4 0_sou2o()+55          CALLrel  _opidrv()            3C 4 D9FFBC4_opimai_real()+124   CALLrel  _sou2o()             D9FFBD4 3C 4 D9FFBC4_opimai()+125        CALLrel  _opimai_real()       2 D9FFBFC_OracleThreadStart@  CALLrel  _opimai()            2 D9FFF3C 0 70 FFFFFFFF4()+830                                            FFFFFFFF___dyn_tls_init_cal  CALLptr  00000000             901FF6C D9FFFD4 776437F5lback()+366382316                                  901FF6C 765D34CB 0___dyn_tls_init_cal  CALLreg  00000000             901FF6C 765D34CB 0 0 901FF6Clback()+367384440                                  0___dyn_tls_init_cal  CALLrel  ___dyn_tls_init_cal  401326 901FF6C 0 0 0 0lback()+367384392             lback()+36738440300000000             CALL???  00000000--------------------- Binary Stack Dump ---------------------..................
从以上日志中可以看到在”Trying to expand controlfile section 11 for Oracle Managed Files“扩扎控制文件过程中出现了 _kccwnc_reuse_expan->_kccrsd_expd->_kccrszf->_kcflfi->_VInfreq__kcflfi() 函数的7445错误,kcf意为(manages and coordinates operations on the control file(s),kcf.c),是在处理日志文件中引发了INT_DIVIDE_BY_ZERO除数为零的代码bug。 通过7445和kcflfi关键词在MOS上搜索没有太大的发现,说明该Bug的处罚几率非常低,正好让我碰到说明是某些特殊参数的设置引起了该问题。 目标锁定启动日志中的非默认隐藏参数"_db_file_direct_io_count",该参数决定了直接路径读写的IO大小,从9i开始该参数的单位调整为bytes而非原先的blocks,之前因为对该参数进行一些测试所以设置了一个较小值。
Parameter: DB_FILE_DIRECT_IO_COUNT~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Versions:	8.0 - 8.1                This parameter is hidden in 9.0 onwards. Parameter type:        integer Parameter class:       dynamic, scope = ALTER SYSTEM DEFERRED Default value:         64 Range of values:       operating system-dependentDescription:~~~~~~~~~~~~DB_FILE_DIRECT_IO_COUNT is used to specify the number of blocks to be usedfor IO operations done by backup, restore or direct path read and writefunctions. The IO buffer size is a product of DB_FILE_DIRECT_IO_COUNT andDB_BLOCK_SIZE. The IO buffer size cannot exceed max_IO_size for yourplatform.Assigning a high value to this parameter results in greater use of PGA orSGA memory.o In Oracle8i, minimize the number of I/O requests by setting the  DB_FILE_DIRECT_IO_COUNT instance parameter so that  DB_BLOCK_SIZE x DB_FILE_DIRECT_IO_COUNT = max_io_size of system  In Oracle8i the default for this is 64 blocks.  (In Oracle9i, it is replaced by _DB_FILE_DIRECT_IO_COUNT which governs   the size of direct I/Os in BYTES (not blocks). The default is 1Mb but   will be sized down if the max_io_size of the system is smaller.)ORA-19863 during RMAN duplicateApplies to:Oracle Server - Enterprise Edition - Version: 10.2.0.3This problem can occur on any platform.Symptoms-- Problem Statement:Duplicate failed during the datafile restore stage:Starting restore at 2008-Apr-09 09:28:24using channel ORA_AUX_DISK_1channel ORA_AUX_DISK_1: starting datafile backupset restorechannel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup setrestoring datafile 00003 to /u06/oradata/hcmprdc/sysaux01.dbf...restoring datafile 00121 to /u06/oradata/hcmprdc/waapp.dbfchannel ORA_AUX_DISK_1: reading from backup piece/u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03002: failure of Duplicate Db command at 04/09/2008 09:28:28RMAN-03015: error occurred in stored script Memory ScriptORA-19870: error reading backup piece /u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1ORA-19863: device block size 1040384 is larger than max allowed: 262144CauseThe database parameter _db_file_direct_io_count in the target and auxiliary instance does not match.Solution-- To implement the solution:Ensure that parameter _db_file_direct_io_count on the target and auxiliary database the same_DB_FILE_DIRECT_IO_COUNT need to be set to the same value between the source databasewhere the backup was taken and the target database where the backup is being restored.2.0 Size of Input/Output Buffers================================a. input buffers----------------NOTE : DB_FILE_DIRECT_IO_COUNT is not available in Oracle9i onwards.       In Oracle9i, it is replaced by a hidden _DB_FILE_DIRECT_IO_COUNT which       governs the size of direct I/Os in BYTES (not blocks). The default is       1Mb butwill be sized down if the max_io_size of the system is smaller.The input buffer size is:  buffersize = db_block_size * db_file_direct_io_countAs there are 4 input buffers, the total input buffer memory use per channel is: memory(input) = #buffers * #files * buffersize               = 4 * #files * buffersizeFor example, if 2 channels are used, and each of these channels backs up 3files, then for each channel memory(input) = 4 * 3 * db_block_size * db_file_direct_io_countb. output buffers-----------------For disk channels, the output buffer size is:  buffersize = db_block_size * db_file_direct_io_countFor SBT_TAPE channels, the output buffer size in Oracle8/8i is o/s dependant. (On Solaris,this defaults to 64k) On 9i/10g it defaults to 256k for all platforms. The BLKSIZE argument to 'allocate channel...' can beused to override the default value.As there are 4 output buffers,  memory(output) = #buffers * buffersize                 = 4 * buffersize
一般来说使用该隐藏参数的默认值即可,通过重置该参数后修复启动问题:
SQL> select * from v$version;BANNER--------------------------------------------------------------------------------Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionPL/SQL Release 11.2.0.1.0 - ProductionCORE    11.2.0.1.0      ProductionTNS for 32-bit Windows: Version 11.2.0.1.0 - ProductionNLSRTL Version 11.2.0.1.0 - ProductionSQL> alter system reset "_db_file_direct_io_count" scope=spfile;System altered.SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ FROM SYS.x$ksppi x, SYS.x$ksppcv y WHERE x.inst_id = USERENV ('Instance') AND y.inst_id = USERENV ('Instance') AND x.indx = y.indxAND x.ksppinm LIKE '%db_file_direct_io_count%'/NAME                           VALUE                DESCRIB------------------------------ -------------------- ------------------------------_db_file_direct_io_count       1048576              Sequential I/O buf sizewindows上的11gr2默认该参数为1MB