Wednesday, October 3, 2012

SBL-NET-01201: Internal: connect() failed: %1





Applies to:


Siebel CRM - Version: 8.1.1 [21112] to 8.1.1 [21112] - Release: V8 to V8
Information in this document applies to any platform.



Symptoms


Customer reported the following:

In our new Siebel TEST environment, 2 application servers are failing with a "Handshake failed" error.



This is a newly built Siebel environment with multiple servers:

Ntsydasu304 - gateway and app server

Ntsydasu303 - App Server

Ntsydasu1186 - App Server

Ntsydwbu117 - Web Server



The Siebel gateway is up and running, and all the application servers are
started. But 2 application servers, Ntsydasu304 and Ntsydasu1186, are
failing with a "Handshake failed" error.




Cause



The issue seems to be caused either by the port number being used by
another non-Siebel process, or by the mapping between the hostnames of the
servers and their IP addresses. This analysis was based on the fact that
when srvrmgr tried to communicate with ServerMgr, it threw the errors
below:


- Handshake(siebel://ntsydasu1186:49162/es_obfstsb1/servermgr/ntsydasu1186) on conn 0x3121330 ok

- connect() to ntsydasu303:49168 failed (err=10060 | Connection timed out.)

- connect() to ntsydasu304:49168 failed (err=10060 | Connection timed out.)






Solution


For the benefit of other readers:

It was suggested to the customer to try the following:

telnet ntsydasu1186 49162
telnet ntsydasu303 49168
telnet ntsydasu304 49168

The first one is expected to succeed. If the second and third also
succeed, please shut down the Siebel servers and run telnet again
against the last two servers, to verify whether another non-Siebel
process is listening on port 49168.

If the second and third fail or do not respond, please try telnet
against the IP address. If that succeeds, there seems to be a problem
with the host-to-IP mapping; please verify it.
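
The telnet checks above can also be scripted. The following is only a minimal sketch, assuming Python 3 is available on a machine in the same network; the function name check_endpoint and the 5-second timeout are illustrative choices and not part of any Siebel tooling. It separates a name resolution failure from a refused connection (host up, nothing listening) and from a timeout (packets dropped or host unreachable).

```python
import socket


def check_endpoint(host, port, timeout=5.0):
    """Return a short diagnosis string for a TCP connection to host:port.

    check_endpoint is a hypothetical helper for illustration only.
    """
    try:
        # Resolve the name first so a bad host-to-IP mapping is reported separately.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"name resolution failed: {exc}"
    addresses = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return f"connected ({', '.join(addresses)})"
    except socket.timeout:
        # No answer at all: a firewall dropping packets or an unreachable host.
        return f"timed out ({', '.join(addresses)})"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return f"refused ({', '.join(addresses)})"
    except OSError as exc:
        return f"failed: {exc}"
```

For example, check_endpoint("ntsydasu303", 49168) mirrors the second telnet command. If the call times out by name, repeating it with the server's IP address would help show whether the host-to-IP mapping is at fault.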



By following the above steps, the customer was able to identify the
cause of the problem. They opened up all the required ports and the
problem was resolved.










Applies to:


Siebel System Software - Version: 7.7.2.1 SIA [18353] and later   [Release: V7 and later ]
z*OBSOLETE: Microsoft Windows Server 2003

Product Release: V7 (Enterprise)

Version: 7.7.2.1 [18353] Hi Tech

Database: Oracle 9.2.0.6

Application Server OS: Microsoft Windows 2003 Server SP1

Database Server OS: Sun Solaris 9



This document was previously published as Siebel SR 38-2264779307.



Symptoms


SBL-SMI-00033, SBL-NET-01201

Hello,

In migrating from Siebel 6.3 to 7.7, the environment was moved from regional
servers, one in the US and one in Sweden, to a single Siebel server in Germany. I
have opened this SR because we are having connectivity problems when users access Siebel from the
non-European companies that did not exist in 6.3.

We are looking for suggestions on how to
trace or solve this issue. Our users use VPN software to access the network and
therefore Siebel. We have problems with the connected clients, but they are worse for Siebel Mobile
Web Clients. We have no Siebel Dedicated Web Clients.


Best regards-






Cause


Configuration/ Setup


Solution



Message 1


For the benefit of other readers. The customer found that during
synchronization over VPN several clients would experience disconnection.




Analysis of the log files on the client revealed the following:

SisnTcpIp    SisnSockWarning    2    0    2005-07-13 13:16:15     1380: [TCPIP-client] connect() to 163.157.2.202:40400 failed (err=10060 | Connection timed out.)

GenericLog    GenericError    1    0    2005-07-13 13:16:15    (commapi.cpp (298) err=1801201 sys=10060) SBL-NET-01201: Internal: connect() failed: Connection timed out.

GenericLog    GenericError    1    0    2005-07-13 13:16:15    (commapi.cpp (298) err=1700175 sys=2) SBL-DCK-00175: Cannot open connection to 163.157.2.202. The Synch Manager component on the server is most likely unavailable.



These errors indicate a network-related problem, not a defect in the Siebel software.



After further troubleshooting we believed that possibly packets were being dropped due to size.



The customer was referred to SR # 38-743187351. Here is some additional information on this registry setting:

http://www.microsoft.com/resources/documentation/Windows/2000/server/reskit/en-us/Default.asp?url=/resources/documentation/Windows/2000/server/reskit/en-us/regentry/58752.asp





In addition the following suggestions were made and brought back to the Network group:



In order to test this theory out you can try a couple of things.





Message 2



1) Test whether this is the issue by changing the MTU setting as
suggested in the SR on SW and the MS link provided. This is a
client-side fix. If the synch no longer fails after implementing the
change and a healthy connection can be established over and over, then
you have a good indication that this was the problem. A global fix
would then be required. This needs to be worked out by the Network
group in that case, rather than going from client PC to client PC and
changing the setting.

2) You can try to run 'netstat -a' during a synch session. A healthy
connection will have the value ESTABLISHED or LISTENING. If you get a
value of SYN_SENT, this indicates a network connection problem and
most likely shows that packets are being dropped.

3) DSL connections are more susceptible to this issue, and it may be
possible to alter the MTU setting on the user's router. Again, this is a
client-side fix. You may run into the issue of users having different
routers and settings, so this approach is not as attractive.
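
The netstat check in suggestion 2 can be automated while a synch session is running. This is a rough sketch only, assuming the output of 'netstat -an' has been captured as text and that the TCP state is the last column and the remote address the column before it (true for the usual Windows and Linux layouts, but worth verifying on your platform). The function name syn_sent_peers is made up for this example.

```python
def syn_sent_peers(netstat_output):
    """Return the remote addresses of TCP connections stuck in SYN_SENT.

    Assumes each TCP line ends with '<remote address> <state>', as in the
    usual `netstat -an` layout; other layouts would need adjusting.
    """
    peers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].upper().startswith("TCP"):
            continue  # skip headers, UDP lines and blank lines
        if fields[-1] == "SYN_SENT":
            peers.append(fields[-2])
    return peers


# Illustrative capture; the local address is invented, the remote address
# is the Synch Manager endpoint from the log excerpt above.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    10.0.0.5:1380          163.157.2.202:40400    SYN_SENT
"""
print(syn_sent_peers(SAMPLE))
```

A remote address that stays in the returned list across repeated captures would suggest that the connection request is not being answered, which fits the dropped-packet theory.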



Siebel Technical Support










Applies to:


Siebel System Software - Version 7.5.3 [16157] and later
z*OBSOLETE: Microsoft Windows 2000

Product Release: V7 (Enterprise)

Version: 7.5.3 [16157]

Database: Oracle 9i

Application Server OS: Microsoft Windows 2000 Advanced Server SP 4

Database Server OS: HP-UX 11i



This document was previously published as Siebel SR 38-1719219409.

***Checked for relevance on 11-NOV-2010***







Symptoms


SBL-NET-01201, SBL-SSM-00003


Hi,



We are experiencing occasional HTTP 500 errors with our Inbound HTTP EAI
interface. We have two web servers dedicated to EAI requests that are
load balanced using Windows 2000 NLB. These pass requests in a load
balanced Resonate environment to two application servers both of which
run the EAI Object Manager.



The vast majority of transactions are successful with each app server
processing 30,000+ tasks each day. However, both of our web server logs
show the following occasional error.







GenericLog GenericError 1 2005-01-17 16:31:27 (smconn.cpp 5(367) err=1801201 sys=10060) SBL-NET-01201: Internal: connect() failed: Connection timed out.

GenericLog GenericError 1 2005-01-17 16:31:27 (ssmsismgr.cpp 83(256) err=5600003 sys=0) SBL-SSM-00003: Error opening SISNAPI connection

GenericLog GenericError 1 2005-01-17 16:31:27 Login failed for Login name : tibcoadmin

GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Open Session failed (0x6ce5) after 22.9520 seconds.

GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Impersonate failed. Login failed attempting to connect to %1

GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Set Error Response (User: tibcoadmin Session: Error: 00027877 Message: Login failed attempting to connect to siebel.TCPIP.None.None://10.97.251.155:2320/sbl01p/EAIObjMgr_enu)

GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Error Child Messages : <0> Login failed attempting to connect to siebel.TCPIP.None.None://10.97.251.155:2320/sbl01p/EAIObjMgr_enu<1> Login failed. SBL-SSM-00003: Error opening SISNAPI connection

GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] HTTP Status 500 : Error The service request could not be processed. Please check that the user name and password are correct, and that the request format is correct

If the problem persists, please contact the system administrator to get more detailed information and to check the system configuration.

GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Login failed. SBL-SSM-00003: Error opening SISNAPI connection




This error happens approximately 30 times a day on each web server
and although this is a small % of total transactions it is important we
eliminate these errors.


At the times the errors occur, we have found nothing in the app
server logs and nothing indicating a problem in the Resonate Message
console. None of our hardware is under load (every server has an
abundance of CPU and memory available), and there are no indications of
network problems.



We are also curious about the time it takes for the timeout to occur. The
value of 22.9xxx seconds appears consistently in our logs, and we are
wondering where this timeout is set. Is it configurable, or is it an
internal value within the SWSE?









Cause


The recommended TCP/IP settings were not implemented.



Solution




The customer implemented the TCP/IP registry changes described by the EVT
tool on their web servers and application servers. After these changes
the timeouts almost completely disappeared.



The HTTP_INACTIVE_CONN_TIMEOUT and SERVER_INACTIVE_CONN_TIMEOUT parameters
only need to be set on each node (machine) that is running Resonate in a
Siebel Enterprise application, but EVT recommended them on the web servers
too.

Change request # 12-SWZLYA has been logged to address this.



The TCP* parameters are suggested for the Windows platform. Information
about these parameters is available on the Microsoft site or in "What is true benefit, if any, of changing TCP registry parameters as recommended by EVT?" (Doc ID 499134.1).



These parameters are important as Siebel Server must have network access
to other Siebel components, such as the Siebel Gateway Name Server, and
the Siebel Database server and SWEApps.


References


NOTE:499134.1 - Benefit of Changing TCP Registry Parameters as Recommended by EVT Utility










Applies to:


Siebel eConfigurator - Version: 7.8.2.8 SIA [19237] and later   [Release: V7 and later ]
Information in this document applies to any platform.



Goal



The setup at the customer's end, all on Solaris:

- The Production Environment has 4 Remote eConfigurator Servers (PRD_ISS1, PRD_ISS2, PRD_ISS3, PRD_ISS4)

- The following Parameters have been set on each of the 5 Production OMs (PRD_OM1, PRD_OM2, PRD_OM3, PRD_OM4, PRD_OM5):

* Product Configurator - Remote Server Name - PRD_ISS1;PRD_ISS2;PRD_ISS3;PRD_ISS4

* Product Configurator - Use Remote Service - True

Issue:

- We had a hardware failure of PRD_ISS4. The physical server was off-line.
- The OMs were still attempting to send requests to PRD_ISS4.
- The request was hanging on PRD_ISS4 for 4 minutes. This was causing 25% of Production user sessions to hang.

Questions for this:
Is this normal behavior? This is causing OMs dependent on the ISS session to hang.
Can we dynamically change this parameter to remove an ISS server from the pool?
Are there additional settings required to have this work correctly?
Let me know what needs to be rectified to prevent the 4-minute delay when a single eConfigurator server is offline.

The issue is that Siebel Callcenter AppServers keep sending requests to an
ISS server that is no longer available. We need to develop a plan to get
a fix or workaround for this issue. Immediate questions are:

1) What is the recommendation on timeout value change? Is the timeout value change possible?

2) What does Oracle recommend in the case of another similar failure with one of the ISS servers becoming unavailable?

3) Even with a lower timeout value, I suspect that we will still see a
delay and error with Callcenter trying to connect to the failed server.
How can this situation be avoided or minimized? (i.e. a patch with a
smarter load-balancing mechanism?)

What we are seeing, however, differs depending on whether the
configurator server is down or just the eConfigurator component is
disabled. For example, with eProdCfgTimeOut set on the AOM to 5 seconds:

- If the eCfg Server is shut down, we get a 4-minute timeout (the user sees
this as a hang) before the configurator session is rerouted to another server.

- If the eCfg Server is online but the Configurator component is offline, we see the 5-second timeout.



Solution



eProdCfgTimeOut is the setting, in seconds, that determines how long
the application server tries to initiate a connection with the remote
configurator server before returning an error to the user.
However, irrespective of the timeout setting, requests are still
routed to the remote server. This setting only determines how
long the application server tries to contact the remote server before
returning an error.

Answers :

Regarding the timeout we need to distinguish between 2 scenarios:

1) The Siebel server is down but the operating system is still up and
running. In this case the parameter eProdCfgTimeOut is used to check
whether a connection can be made within the defined timeframe (i.e. 5
seconds). Please note that here we still have a running TCP/IP stack (the
OS is up and running) which can respond to the connection attempt and
returns an error because nothing is listening on the port (the eCfg OM is
down). So this will always work.

2) This is the bad situation. The whole machine is down, gone, destroyed,
not reachable, etc. In this case no TCP/IP stack is running on the other
side, and a TCP/IP request will simply wait for a specific time before it
returns with an error message. The wait depends on the operating system:
on Windows you have to wait around a minute, on Solaris about 3-4 minutes.
It is an OS parameter which can be changed, but it is not a Siebel
parameter.

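That is why the customer sees this problem: they are using Solaris. There are some approaches to this issue:

a) Use a hardware load balancer. This is not documented or tested, so we
cannot tell whether it will work for remote eConfigurator.

b) Change the Solaris OS parameter for the TCP/IP timeout setting:
Parameter = tcp_time_wait_interval
Default = 240000 ms (2MSL according to RFC 1122) = 4 minutes

Please discuss this setting with Sun. You could change this to 10000. A
description can be found online, e.g.
http://www.sean.de/Solaris/soltune.html#tcp_time_wait_interval

Note that tcp_time_wait_interval controls how long a closed connection
stays in the TIME_WAIT state. On Solaris, the time spent waiting while a
connection is being established is normally governed by
tcp_ip_abort_cinterval (default 180000 ms), so please confirm with Sun
which parameter applies to this situation.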
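
The difference between the two scenarios can be reproduced outside Siebel. The sketch below is not Siebel's routing logic, and the name first_reachable is invented for illustration. It only shows, assuming plain Python 3 sockets, how a client that sets its own connect timeout can move on to the next server after a few seconds instead of waiting for the operating system default.

```python
import socket


def first_reachable(servers, timeout=5.0):
    """Return the first (host, port) that accepts a TCP connection, else None.

    Each attempt is bounded by `timeout` seconds per address, so a machine
    that is completely down costs roughly that long rather than the
    operating system's default connect timeout.
    """
    for host, port in servers:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Covers a refused connection (host up, port closed), a timeout
            # (host silent) and name resolution errors; try the next one.
            continue
    return None
```

A refused connection (scenario 1) returns almost immediately because the remote TCP/IP stack answers. A silent host (scenario 2) uses up the full timeout, which is why that case depends on the timeout in effect and not on the application.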

Additional Comments :

Whenever you see the error message

SBL-NET-01201: Internal: connect() failed: Connection timed out

in the log file, we are dealing with a network issue and not a Siebel
issue. In this case we always see the delay, which is a TCP/IP matter and
operating system dependent. This happens when the machine that should
answer is down.

Therefore, in your case, Siebel works fine. The current issue is that the
delay does not seem to respond to your OS settings.
Again, whenever we see the error message "SBL-NET-01201:
Internal: connect() failed: Connection timed out", this is not a
Siebel issue but a network issue (i.e. the machine is down). In this case
the delay depends on the TCP/IP stack and its implementation. This
is outside the control of Siebel and is no longer an eConfigurator
issue.

Therefore, if the parameter we suggested earlier does not have the
expected effect, kindly address this with a service request in the Core
Server Technologies area and also with Sun, as it is a Solaris issue.

One of the possible solutions we discussed was changing TCP/IP
parameters, but this affects the whole machine: the OS and all software
running on it. We can only speak for the Siebel part, so this should be
discussed directly between Sun and the customer. Changing this parameter
would probably also require a check of their network.


We have also raised an Enhancement Request # 10559309
to see if there is any possibility of addressing this in the long term.
Enhancement Requests are reviewed, prioritized and if found viable,
implemented in a future release.

Additional suggestion:

In cases when a machine is down and the administrator knows it, you may consider the following approaches:


a.) Remove the machine from the network and put in a simple PC with the
same IP address. This should avoid the 4-minute timeout problem.


b.) Try Administration - Product > Cache Admin. The idea is
to set up the cache without using the broken machine. This would avoid
the timeouts as well.
For more information about Configurator caching, please refer to
Performance Tuning Guide 7.8 > Tuning Siebel Configurator for
Performance > Administering Siebel Configurator Caching > Cache
Management for Siebel Configurator
(http://download.oracle.com/docs/cd/B31104_02/books/PerformTun/PerformTunConfigISS13.html#wp1063160)








Applies to:


Siebel CRM - Version: 8.1.1.5 [21229] and later   [Release: V8 and later ]
Information in this document applies to any platform.



Symptoms



On : 8.1.1.5 [21229] version, System Admin



After installing and configuring the Siebel application, the following error occurs.



ERROR

-----------------------

Server Status is "Handshake failed"


The following error appears in the srvrmgr.log file:


SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:12:21 1: [SESSMGR] Open(siebel://machine1:49174/mbprod/servermgr/machine1, 60, -1)

SisnTcpIp SisnSockWarning 2 000000084f3e0040:0 2012-02-17 16:13:36 1: [TCPIP-client] connect() to machine1:49174 failed (err=78 | Connection timed out)

SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SMCONN] Failed to open connection to (siebel://machine1:49174/mbprod/servermgr/machine1) in 75 sec(s)

GenericLog GenericError 1 000000084f3e0040:0 2012-02-17 16:13:36 (smconn.cpp (284) err=1180849 sys=78) SBL-NET-01201: Internal: connect() failed: Connection timed out

SisnTcpIp SisnSockDetail 4 000000084f3e0040:0 2012-02-17 16:13:36 1: [TCPIP-client] socket() closed descriptor = 5 from :0 to :55452

SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] Error closing connection object

SessMgr MsgReceive 5 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] CB: conn 0x0, url NULL, mbuf 0x0, mlen 0, err 3670029

SessMgr SessMgrGeneric 4 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] conn 0x0: found error code (3670029), error info (NULL)

SessMgr ConnClose 5 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] conn 0x0: ctx 0x20753b48, url 0x<?INT?> cleaned up

SisnapiLayerLog Trace 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SISNAPI]: releasing connection (0x20753e20), refCount = 0

SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] Open has taken 74.9 seconds so far, timing out

GenericLog GenericError 1 000000084f3e0040:0 2012-02-17 16:13:36 (ssmsismgr.cpp (544536816) err=0 sys=-752920388) SBL-GEN-00000: (ssmsismgr.cpp: 544536816) error code = 0, system error = -752920388, msg1 = (null), msg2 = (null), msg3 = (null), msg4 = (null)




STEPS

-----------------------

The issue can be reproduced at will with the following steps:


1. Install the Siebel application

2. Configure it

3. The server status is "Handshake failed"


Cause



The issue was caused by incorrect settings in the hosts file.




Solution



The customer resolved the issue and updated the SR with the following information:


"The issue was resolved after rectifying the hosts file. It had two entries with two different IPs for the same hostname."

Thank you.

Oracle Product Support - Siebel CRM
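
The hosts file problem described in this last note (one hostname listed with two different IPs) is easy to miss by eye. The following is a small sketch for spotting it, assuming the standard hosts file layout of an IP address followed by one or more names, with '#' starting a comment. The function name conflicting_hosts and the sample addresses are invented for illustration.

```python
from collections import defaultdict


def conflicting_hosts(hosts_text):
    """Map each hostname that appears with more than one IP to its sorted IPs."""
    seen = defaultdict(set)
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            seen[name.lower()].add(ip)  # hostnames are case-insensitive
    return {name: sorted(ips) for name, ips in seen.items() if len(ips) > 1}


# Hypothetical hosts file content; the IP addresses are made up.
SAMPLE = """
127.0.0.1   localhost
10.0.0.11   machine1
10.0.0.12   machine1   # stale entry left over from an earlier build
"""
print(conflicting_hosts(SAMPLE))
```

To check a real file, pass in the content of /etc/hosts on UNIX or %SystemRoot%\System32\drivers\etc\hosts on Windows. An empty result means no hostname is mapped to more than one address.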

 

