SNMP Perl question: Limits on gettable?

Discussion:


o***@LEFerguson.com

2016-05-30 01:21:59 UTC

Is this an appropriate venue to ask about the perl SNMP routines?

I am using the package version 5.7.3 on Ubuntu 15.40.

Generally it works, but I have one routine that queries a Cisco table as shown below. In most cases it works, but for one large ASA with a lot of tunnels it fails to return all rows in the table, returning what appears to be a random subset of about half of them. Using the snmpwalk command works fine.

The code is as follows:

use Socket;   # inet_aton / inet_ntoa
use SNMP;     # Net-SNMP Perl bindings

my %snmpparms;
$snmpparms{Community} = $community;
$snmpparms{DestHost} = inet_ntoa(inet_aton($IP));
$snmpparms{Version} = "2";
$snmpparms{UseSprintValues} = '1';
$snmpparms{UseEnums} = '0';
$snmpparms{UseNumeric} = '0';
$snmpparms{NonIncreasing} = '1';
$snmpparms{Timeout} = 10000000; # need long timeout for large tables over WAN
my $sess = SNMP::Session->new(%snmpparms);

# Now pull in the correlation table so we know which are real tunnels
my $RtnCorrHash = $sess->gettable('CISCO-IPSEC-FLOW-MONITOR-MIB::cikePeerCorrTable');

I've tried various combinations of the options in the session parameters without any change. I've also tried placing a column list on the gettable without any change.

I had this same problem with another large table and just converted it to specific calls to getbulk, but I rather liked using gettable in this case.

There is no error returned from the gettable call. The data returned is correct, just incomplete.

There are 54 items in the table, and each one looks sort of like this; only 25 rows were returned the last few times I tried (I am not sure if it always stops at 25 or not):

'1.13.51.56.46.49.52.48.46.49.51.52.46.49.56.1.11.53.48.46.50.52.49.46.50.46.55.55.500.25126' => {
'cikePeerCorrIntIndex' => '500',
'cikePeerCorrRemoteValue' => '50.xxx.xxx.77',
'cikePeerCorrRemoteType' => '1',
'cikePeerCorrSeqNum' => '25126',
'cikePeerCorrIpSecTunIndex' => '25126',
'cikePeerCorrLocalValue' => '38.xxx.xxx.18',
'cikePeerCorrLocalType' => '1'
},

The Cisco device is fairly new and on a high speed LAN; snmpwalk returns data very quickly, so I do not think this is some kind of timeout issue or anything related to packet fragmentation. I am pulling a lot of data from it via zabbix and all those queries work fine.

Any ideas why it will not return all values?

Linwood

o***@LEFerguson.com

2016-06-01 16:45:12 UTC


No one? Wrong place to ask? Ill-formed question? Any hints?

From: ***@LEFerguson.com [mailto:***@LEFerguson.com]
Sent: Sunday, May 29, 2016 9:22 PM
To: net-snmp-***@lists.sourceforge.net
Subject: SNMP Perl question: Limits on gettable?


Hans Jørgen Jakobsen

2016-06-03 18:38:40 UTC


Post by o***@LEFerguson.com
No one? Wrong place to ask? Ill formed question? Any hints?

Post by o***@LEFerguson.com
Generally it is working, but I have one routine that attempts to query a cisco table as below, and in most cases it works, but for one large ASA with a lot of tunnels, it fails to return all rows in the table, returning what appears to be a random subset of about half of them. Using the snmpwalk command works fine.
my %snmpparms;
$snmpparms{Community} = $community;
$snmpparms{DestHost} = inet_ntoa(inet_aton($IP));
$snmpparms{Version} = "2";
$snmpparms{UseSprintValues} = '1';
$snmpparms{UseEnums} = '0';
$snmpparms{UseNumeric} = '0';
$snmpparms{NonIncreasing} = '1';
$snmpparms{Timeout}=10000000; # need long timeout for large tables over WAN
$sess = new SNMP::Session(%snmpparms);
# Now pull in the correlation table so we know which are real tunnels
my $RtnCorrHash = ( $sess->gettable('CISCO-IPSEC-FLOW-MONITOR-MIB::cikePeerCorrTable') );
I've tried various combinations of the options in the session parameters without any change. I've also tried placing a column list on the gettable without any change.
I had this same problem with another large table and just converted it to specific calls to getbulk, but I rather liked using gettable in this case.
There is no error returned from the gettable call. The data returned is correct just incomplete.
'1.13.51.56.46.49.52.48.46.49.51.52.46.49.56.1.11.53.48.46.50.52.49.46.50.46.55.55.500.25126' => {

You might wish you had put some xx in above line :-)

Post by o***@LEFerguson.com
'cikePeerCorrIntIndex' => '500',
'cikePeerCorrRemoteValue' => '50.xxx.xxx.77',
'cikePeerCorrRemoteType' => '1',
'cikePeerCorrSeqNum' => '25126',
'cikePeerCorrIpSecTunIndex' => '25126',
'cikePeerCorrLocalValue' => '38.xxx.xxx.18',
'cikePeerCorrLocalType' => '1'
},
The Cisco device is fairly new and on a high speed LAN; snmpwalk returns data very quickly, so I do not think this is some kind of timeout issue or anything related to packet fragmentation. I am pulling a lot of data from it via zabbix and all those queries work fine.
Any ideas why it will not return all values?

There might be errors in the implementation of gettable... Problems I would look for:

1) This MIB uses two text values as index. This might be a challenge.
Does it handle text of different lengths?

2) I think the table is dynamic. What happens when a row (dis)appears
while walking?
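
On point 1: a variable-length text index is carried in the instance OID as a length followed by one sub-identifier per character, so rows with longer strings have longer OIDs. A minimal Python sketch of that decoding, assuming the layout visible in the sample row (type, length-prefixed text, type, length-prefixed text, integer, integer). The function names and the sample values (documentation addresses) are made up for illustration and are not part of Net-SNMP:

```python
def decode_peer_corr_index(index: str) -> dict:
    """Split a dotted index, assuming the layout
    type, length-prefixed text, type, length-prefixed text, int, int."""
    subids = [int(s) for s in index.split(".")]
    pos = 0

    def take_int():
        nonlocal pos
        value = subids[pos]
        pos += 1
        return value

    def take_text():
        nonlocal pos
        length = subids[pos]
        chars = subids[pos + 1 : pos + 1 + length]
        if len(chars) != length:
            raise ValueError("index shorter than its length prefix claims")
        pos += 1 + length
        return "".join(chr(c) for c in chars)

    parts = {
        "local_type": take_int(),
        "local_value": take_text(),
        "remote_type": take_int(),
        "remote_value": take_text(),
        "int_index": take_int(),
        "seq_num": take_int(),
    }
    if pos != len(subids):
        raise ValueError("trailing sub-identifiers after the expected fields")
    return parts


def encode_text(text: str) -> str:
    """Length-prefixed encoding of a string as dotted sub-identifiers."""
    return ".".join(str(n) for n in [len(text), *map(ord, text)])


# Made-up row using documentation addresses, not real peers.
sample = ".".join(
    ["1", encode_text("192.0.2.10"), "1", encode_text("198.51.100.7"), "500", "42"]
)
decoded = decode_peer_corr_index(sample)
print(decoded["local_value"], decoded["remote_value"], len(sample.split(".")))
```

Because each text part carries its own length, two rows in the same table can have indexes of different lengths, which is the case any table-walking code has to handle.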

Try capturing with tcpdump to figure out where and why it stops.

Be aware that you will probably spend more time debugging than rewriting
the code to use snmp(bulk)walk :-)
/hjj

o***@LEFerguson.com

2016-06-03 21:31:14 UTC


Hans, thank you for the answer. My comments are embedded below, marked LEF>

-----Original Message-----
From: Hans Jørgen Jakobsen [mailto:***@wheel.dk]
Sent: Friday, June 3, 2016 2:39 PM
To: ***@LEFerguson.com
Cc: net-snmp-***@lists.sourceforge.net
Subject: RE: SNMP Perl question: Limits on gettable?

Post by o***@LEFerguson.com
No one? Wrong place to ask? Ill formed question? Any hints?

I will try.
I have not used gettable in perl.
I wasn't aware that it was possible to specify selected columns.

Have you tried the CLI snmptable? Does it give all values?
Does snmpwalk throw any out-of-sequence warnings?

LEF> Yes, the CLI works fine, though what it returns is just the indexes and not the hash structure with decoded values that this table gives in Perl.

Post by o***@LEFerguson.com
Generally it is working, but I have one routine that attempts to query a cisco table as below, and in most cases it works, but for one large ASA with a lot of tunnels, it fails to return all rows in the table, returning what appears to be a random subset of about half of them. Using the snmpwalk command works fine.
my %snmpparms;
$snmpparms{Community} = $community;
$snmpparms{DestHost} = inet_ntoa(inet_aton($IP));
$snmpparms{Version} = "2";
$snmpparms{UseSprintValues} = '1';
$snmpparms{UseEnums} = '0';
$snmpparms{UseNumeric} = '0';
$snmpparms{NonIncreasing} = '1';
$snmpparms{Timeout}=10000000; # need long timeout for large tables over WAN
$sess = new SNMP::Session(%snmpparms);
# Now pull in the correlation table so we know which are real tunnels
my $RtnCorrHash = (
$sess->gettable('CISCO-IPSEC-FLOW-MONITOR-MIB::cikePeerCorrTable') );
I've tried various combinations of the options in the session parameters without any change. I've also tried placing a column list on the gettable without any change.
I had this same problem with another large table and just converted it to specific calls to getbulk, but I rather liked using gettable in this case.
There is no error returned from the gettable call. The data returned is correct just incomplete.
'1.13.51.56.46.49.52.48.46.49.51.52.46.49.56.1.11.xx.xx.xx.xx.x.xx.xx.
.xx.46.xx.55.500.25126' => {

You might wish you had put some xx in above line :-)

LEF> Yeah, me too. Incidentally, all snmptable gives is the 25126 above.

There might be errors in the implementation of gettable... Problems I would look for:

1) This MIB uses two text values as index. This might be a challenge.
Does it handle text of different lengths?

2) I think the table is dynamic. What happens when a row (dis)appears while walking?

LEF> That definitely can happen but is not happening in this case; the problem is 100% reproducible, but the table changes infrequently.

Try making tcpdump to figure out where and why it stops.

LEF> Good idea, though I do not know what to make of it. It captured exactly two packets, the request and the (incomplete) answer.

LEF> The (not working) gettable request looks like this: GetBulk(33) R=1924371077 N=0 M=200 .1.3.6.1.4.1.9.9.171.1.2.4 }

LEF> The (working) snmptable request starts like this: GetBulk(34) R=957254588 N=0 M=10 .1.3.6.1.4.1.9.9.171.1.2.4.1.0

LEF> Notice it is returning 10 at a time, which appear to fit in one packet, then it does another separate getbulk using the last returned index as a starting point. Taking this as a guess I adjusted the "repeat" option to 10, and this returns the correct answer. The documentation says:

##This calculation is fairly safe, hopefully, but you can either raise or lower the
##number using this option if desired. In lossy networks, you want to make
##sure that the packets don't get fragmented and lowering this value
##is one way to help that.
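
The behaviour seen in the capture can be mimicked without a device. The following Python sketch is a simulation under stated assumptions, not the Net-SNMP implementation: `FakeAgent` is a hypothetical agent that never puts more than a fixed number of varbinds in one response (SNMPv2 agents are allowed to return fewer repetitions than requested when the response would otherwise be too large), and `walk_table` is a hypothetical client loop that keeps issuing GETBULK-style requests from the last OID returned until a result leaves the table's subtree. The row count and cap are chosen to mirror the numbers in this thread.

```python
from bisect import bisect_right

TABLE = (1, 3, 6, 1, 4, 1, 9, 9, 171, 1, 2, 4)  # subtree root seen in the capture


class FakeAgent:
    """Hypothetical agent: answers GETBULK but caps each response size."""

    def __init__(self, oids, cap):
        self.oids = sorted(oids)  # tuples sort in lexicographic OID order
        self.cap = cap            # most varbinds the agent puts in one response

    def getbulk(self, start, max_repetitions):
        first = bisect_right(self.oids, start)  # first OID strictly after start
        count = min(max_repetitions, self.cap)
        return self.oids[first : first + count]


def in_subtree(oid, root):
    return oid[: len(root)] == root


def walk_table(agent, root, max_repetitions):
    """Repeat GETBULK from the last OID until the results leave the subtree."""
    rows, cursor = [], root
    while True:
        batch = agent.getbulk(cursor, max_repetitions)
        inside = [oid for oid in batch if in_subtree(oid, root)]
        rows.extend(inside)
        if not batch or len(inside) < len(batch):
            return rows  # end of the MIB view, or walked past the table
        cursor = batch[-1]


# 54 single-column rows inside the table, plus one OID that follows it.
table_oids = [TABLE + (1, 7, i) for i in range(1, 55)]
agent = FakeAgent(table_oids + [TABLE[:-1] + (5, 1, 1)], cap=25)

single_shot = [o for o in agent.getbulk(TABLE, 200) if in_subtree(o, TABLE)]
full_walk = walk_table(agent, TABLE, 200)
print(len(single_shot), len(full_walk))
```

With these made-up numbers a single request yields 25 rows while the loop collects all 54. A short response alone does not prove the table ended; only an OID outside the subtree (or an end-of-MIB result) does.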

LEF> The network is not lossy, though I guess it is possible the target ASA has some kind of limit on its response packet size. Regardless, it would appear the API (vs. CLI) calculation is optimistic about how much data it can get in 1000 bytes (mentioned elsewhere), at least for this table. Frankly 100 seems large as an arbitrary guess regardless; the responses would need to be very tiny. The actual response packet was complete (as in well structured, not truncated), and the checksum matched.

Be aware that you will probably spend more time debugging than rewriting the code to use snmp(bulk)walk :-) /hjj

LEF> Actually I did that shortly after writing this; at this point it is more curiosity.

LEF> So this discussion does give me a fix, though at this point not sure I will go back and change the (working) code. Perhaps the maintainers will find some use in the discussion and example, and consider if the defaults are appropriate. I'm not sure if there is some way to know from the data received if the initial response was complete -- maybe it presumes since less than the requested number came back, it is complete (that would seem to make sense), but it definitely is not in this case.

LEF> Thank you for the response. I had assumed TCPDUMP would not help, really, since it was so repeatable I presumed it was not a network issue (and indeed it is not, at least in terms of loss). But the explicit request difference from the CLI made it obvious.

Linwood

Jeroen van Ingen

2016-06-08 11:12:08 UTC


Post by o***@LEFerguson.com
LEF> Good idea, though I do not know what to make of it. It captured exactly two packets, the request and the (incomplete) answer.
LEF> The (not working) gettable request looks like this: GetBulk(33) R=1924371077 N=0 M=200 .1.3.6.1.4.1.9.9.171.1.2.4 }
LEF> The (working) snmptable request starts like this: GetBulk(34) R=957254588 N=0 M=10 .1.3.6.1.4.1.9.9.171.1.2.4.1.0
##This calculation is fairly safe, hopefully, but you can either raise or lower the
##number using this option if desired. In lossy networks, you want to make
##sure that the packets don't get fragmented and lowering this value
##is one way to help that.
LEF> The network is not lossy, though I guess it is possible the target ASA has some kind of limitation on its return packet sizes. Regardless, it would appear the API (vs. CLI) calculation is optimistic about how much data it can get in 1000 bytes (mentioned elsewhere), at least for this table. Frankly 100 seems large regardless as an arbitrary guess, the responses would need to be very tiny. The actual packet responded was complete (as in well structured, not truncated), and checksum matched.

Recently I had to debug a similar issue. Of course there are a few
factors that determine how large a response PDU is; I'd like to point
out that there are at least two ways in which a large response may not
make it back to the requester.

1) IP MTU mismatch: when the SNMP agent (the ASA in this case) is
configured to support jumbo frames (Ethernet) and has a larger IP MTU (>
1500), a large response PDU will be put in a large IP packet. It's UDP
transport; sender and receiver do not negotiate about payload size. A
large response may be dropped at various points in the path.

2) No MTU mismatch, but firewall rules that don't allow fragmented IP:
when the SNMP agent builds a response PDU that is larger than the
payload that will fit with the IP MTU, the result will be that the
response is split over an initial IP packet (that includes the UDP
header and first part of the payload), followed by one or more IP
fragments with the rest of the UDP payload. Some networks filter IP
fragments for security reasons.
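
To make the second case concrete, here is a small Python sketch that estimates how many IPv4 packets one UDP datagram is split into for a given MTU. It assumes a 20-byte IPv4 header without options and an 8-byte UDP header; the function name is made up for illustration.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    datagram = UDP_HEADER + udp_payload  # what IP has to carry
    per_packet = mtu - IPV4_HEADER       # IP payload room in one packet
    if datagram <= per_packet:
        return 1
    # Every fragment except the last must carry a multiple of 8 bytes;
    # the last one may use the full room.
    per_fragment = per_packet - (per_packet % 8)
    return 1 + math.ceil((datagram - per_packet) / per_fragment)


for size in (1000, 1472, 1473, 4000):
    print(size, ipv4_fragments(size))
```

Any result above 1 means the response depends on IP fragments getting through every hop between agent and requester.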

SNMP GETBULK responses in particular can get pretty big, depending on both
the request parameters (like MaxRepetitions) and table design: both OID
length (e.g. tables with text-based indexes or indexes based on large
address fields) and value size play a role.
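
For a rough feel of the numbers, the Python sketch below estimates the BER-encoded size of one varbind from its OID and value length, then how many such rows fit in a byte budget. The 50-byte allowance for message and PDU headers, the example OID (column 7 with a made-up 28-sub-identifier index), and the 13-byte value are assumptions chosen for illustration, not measurements from any device.

```python
def base128_len(n: int) -> int:
    """Bytes needed for one OID sub-identifier in BER base-128 form."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def tlv_len(content_len: int) -> int:
    """Size of a BER tag-length-value with a definite length field."""
    if content_len < 128:
        length_field = 1
    else:
        length_field = 1 + (content_len.bit_length() + 7) // 8
    return 1 + length_field + content_len


def varbind_len(oid: str, value_len: int) -> int:
    """Encoded size of SEQUENCE { OID, value } for a value of value_len bytes."""
    arcs = [int(a) for a in oid.strip(".").split(".")]
    # The first two arcs share one sub-identifier: 40 * first + second.
    oid_content = base128_len(40 * arcs[0] + arcs[1])
    oid_content += sum(base128_len(a) for a in arcs[2:])
    return tlv_len(tlv_len(oid_content) + tlv_len(value_len))


def rows_that_fit(budget: int, per_varbind: int, columns: int, overhead: int = 50) -> int:
    """Whole rows (one varbind per column) that fit in budget bytes."""
    return max(0, (budget - overhead) // (per_varbind * columns))


# Hypothetical instance: two text parts of 10 and 12 characters in the index.
index = "1.10." + ".".join(["49"] * 10) + ".1.12." + ".".join(["49"] * 12) + ".500.25126"
oid = ".1.3.6.1.4.1.9.9.171.1.2.4.1.7." + index
size = varbind_len(oid, value_len=13)
print(size, rows_that_fit(1000, size, columns=1), rows_that_fit(1472, size, columns=1))
```

Under these assumptions only a couple of dozen single-column rows fit in an unfragmented datagram, far fewer than a MaxRepetitions of 200 asks for.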

So when you get partial results or no results, always consider whether
response size could somehow be an issue.

Regards,
Jeroen