Friday, 22 August 2014

Prewarning for HA System Switchovers Triggered by Unexpected Restarts of Non-Key Processes and for DCServer and porttrunk_agent Process Startup Failure

Problem 1: Unexpected Restart of Some Non-key Processes Triggers an Automatic Switchover of the HA System


Trigger condition:


An automatic switchover of the HA system is triggered if certain non-key processes restart automatically and unexpectedly more than three consecutive times in the following U2000 versions running on Solaris or Linux:


U2000 V100R006C02SPC300


U2000 V100R006C02SPC301


U2000 V100R006C02SPC302


Symptom:


An automatic switchover of the HA system is performed.


Identification method:


Run the following command, in which the Toolkit process is used as an example, to view process configurations:


# cat /opt/HWENGR/NMSApp/flag/nonCriticalProcessList |grep ToolkitService


If the command output is empty, the Toolkit process is configured incorrectly and the restoreProcessTool must be used to rectify the problem. If the command output is not empty, no action is required.
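The check above can be wrapped in a small helper so that it can be repeated for several processes. This is a minimal sketch, not part of the official procedure: the function name check_noncritical and its status labels are hypothetical, and it assumes, as the command above does, that a process is correctly configured whenever its name appears in the list file.

```shell
#!/bin/sh
# Sketch: report whether a process name appears in the non-critical process list.
# check_noncritical is a hypothetical helper name, not part of U2000.

check_noncritical() {
    list_file=$1      # for example /opt/HWENGR/NMSApp/flag/nonCriticalProcessList
    process_name=$2   # for example ToolkitService

    if [ ! -r "$list_file" ]; then
        echo "UNKNOWN: cannot read $list_file"
        return 2
    fi

    # Same test as "cat ... | grep ...": empty grep output means the entry is missing.
    # Output is discarded; only the exit status of grep is used.
    if grep "$process_name" "$list_file" >/dev/null 2>&1; then
        echo "OK: $process_name is listed as a non-key process"
        return 0
    fi

    echo "MISCONFIGURED: $process_name is missing from $list_file"
    return 1
}
```

For example, check_noncritical /opt/HWENGR/NMSApp/flag/nonCriticalProcessList ToolkitService prints an OK line when no action is required and a MISCONFIGURED line when the restoreProcessTool is needed.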


Root cause:


Some non-key processes are incorrectly configured as key processes. When such a process unexpectedly restarts more than three consecutive times, the HA system switches over automatically.


Impact and risk:


An automatic switchover of the HA system may occur.


Workarounds:


Use the restoreProcessTool to rectify the problem.


To obtain the tool, access and choose Software Center > Controlled Tool (Mini-tool Software) > Network OSS&Service(25) > U2000 Maintenance Tools(16) > Cross-domain Maintenance Tools(3) > restoreProcessTool.


For details about how to use the restoreProcessTool to correct process configurations, see restoreProcessTool User Guide.


Preventive measures:


This problem will be resolved in U2000 V100R006C02SPC303, which is scheduled for release in late September 2013.


Problem 2: DCServer and porttrunk_agent Processes Fail to Start After an HA System Switchover Is Performed When the nmsuser ID or nmsgroup ID of the Operating System on the Primary Site Is Different from That on the Secondary Site


Trigger condition:


This problem occurs when both of the following conditions are met:


• On U2000 V100R006C02SPC302 (Solaris/Linux), the nmsuser ID or nmsgroup ID of the operating system on the primary site is different from that on the secondary site.


• An automatic or manual switchover of the HA system is performed.
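The first condition can be checked before any switchover by comparing the output of the standard id command on both sites. The following is a minimal sketch under stated assumptions: the helper names extract_ids and compare_site_ids are hypothetical, and the two input lines are assumed to be the output of "id nmsuser" collected manually on the primary and secondary sites.

```shell
#!/bin/sh
# Sketch: compare the numeric nmsuser/nmsgroup IDs reported by "id nmsuser"
# on two sites. Both helper names are hypothetical.

# Reduce a line such as "uid=100(nmsuser) gid=100(nmsgroup)" to "100:100".
extract_ids() {
    echo "$1" | sed -n 's/^uid=\([0-9][0-9]*\).* gid=\([0-9][0-9]*\).*$/\1:\2/p'
}

# $1: id output from the primary site, $2: id output from the secondary site.
compare_site_ids() {
    primary_ids=$(extract_ids "$1")
    secondary_ids=$(extract_ids "$2")

    if [ -z "$primary_ids" ] || [ -z "$secondary_ids" ]; then
        echo "UNKNOWN: could not parse id output"
        return 2
    fi

    if [ "$primary_ids" = "$secondary_ids" ]; then
        echo "MATCH: $primary_ids"
        return 0
    fi

    echo "MISMATCH: primary=$primary_ids secondary=$secondary_ids"
    return 1
}
```

A MISMATCH result means the first trigger condition is met, so the next switchover may expose the problem.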


Symptom:


The DCServer and porttrunk_agent processes on the secondary site fail to start after the switchover is performed.


Identification method:


Step 1 Verify that the DCServer and porttrunk_agent processes fail to start.


Step 2 Run the following command to check the owner and group of the filesync, porttrunking, unitedmgr, and DCDB directories. The problem is present if the directories are owned by the nmsuser ID and nmsgroup ID of the original primary site (that is, the primary site before the switchover), or by any other user and group, instead of by nmsuser and nmsgroup.


# ls -lrst /opt/sybase/data
2 drwxr-x--- 3 100 100 512 May 27 01:37 filesync
2 drwxr-x--- 2 100 100 512 May 27 01:37 porttrunking
2 drwxr-x--- 4 100 100 512 May 27 01:38 unitedmgr
4 drwxr-x--- 2 100 100 1536 May 30 19:50 DCDB


In this example, the owner and group of the four directories are both 100 and equal to the nmsuser ID and nmsgroup ID on the original primary site. The problem identification must be based on the actual owner and group information in the command output.
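The comparison in Step 2 can also be done numerically, which avoids misreading the listing. The sketch below is illustrative only: check_dir_owners is a hypothetical helper, the data directory is passed as an argument rather than hard-coded, and the expected UID and GID are assumed to be the local numeric IDs of nmsuser and nmsgroup on the site being checked.

```shell
#!/bin/sh
# Sketch: compare the numeric owner and group of the four directories named
# in Step 2 against the IDs expected on this site.
# check_dir_owners is a hypothetical helper name.

check_dir_owners() {
    data_dir=$1       # directory that contains the four subdirectories
    expected_uid=$2   # local numeric ID of nmsuser
    expected_gid=$3   # local numeric ID of nmsgroup
    status=0

    for name in filesync porttrunking unitedmgr DCDB; do
        dir="$data_dir/$name"
        if [ ! -d "$dir" ]; then
            echo "MISSING: $name"
            status=1
            continue
        fi

        # "ls -ldn" prints the numeric owner in field 3 and the numeric group in field 4.
        actual=$(ls -ldn "$dir" | awk '{print $3 ":" $4}')
        if [ "$actual" = "$expected_uid:$expected_gid" ]; then
            echo "OK: $name $actual"
        else
            echo "WRONG: $name is $actual, expected $expected_uid:$expected_gid"
            status=1
        fi
    done

    return $status
}
```

A nonzero return status means at least one directory is missing or is not owned by the expected IDs, which matches the condition described in Step 2.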


Root cause:


After synchronization is performed between the primary and secondary sites in the HA system, the owner ID and group ID of the files related to the DCServer and porttrunk_agent processes on the secondary site become the nmsuser ID and nmsgroup ID of the primary site. If the nmsuser ID or nmsgroup ID of the operating system on the primary site is different from that on the secondary site, these IDs do not resolve to nmsuser and nmsgroup on the secondary site, so the related processes fail to read and write the files.


Impact and risk:


The DCServer and porttrunk_agent processes fail to start, affecting their functions.


Process and function description:


DCServer: Provides NE software management and data disaster tolerance functions. You can use these functions to upgrade NE software, install patches, back up and restore data for disaster tolerance, and perform automatic upgrades of case-shaped devices using the plug-and-play function.


porttrunk_agent: Aggregates multiple independent ports into a single port.


Workarounds:


Use the restoreNmsuserIdTool to rectify the problem.


To obtain the tool, access and choose Software Center > Controlled Tool (Mini-tool Software) > Network OSS&Service(25) > U2000 Maintenance Tools(16) > Cross-domain Maintenance Tools(3) > restoreNmsuserIdTool.


For details about how to use the restoreNmsuserIdTool to correct the owner and group of the files related to the DCServer and porttrunk_agent processes, see restoreNmsuserIdTool User Guide.


Preventive measures:


This problem will be resolved in U2000 V100R006C02SPC303, which is scheduled for release in late September 2013.

