[pfSense] System stats: HUGE SPIKE, then failed.
Karl Fife
2018-04-03 21:50:39 UTC
There was just now a sudden spike in states, ~100x the normal number,
maxing out the system max in just an hour, and causing the system to fail.

With a maxed out state table, of course the system fails to process
traffic.  Has anyone seen something like this before, or have any ideas
what kinds of things would look like this?

Monitoring PNG attached.

For us, on a normal day, the system hovers around 7-15K states. Just
before noon today, the system suddenly started adding states at a rate
of about 9K per minute until the system maxed out (at 800K states in
just under an hour and fifteen minutes).

Failure mode analysis was difficult because we couldn't access the WebUI
or SSH: with the table full, the LAN interface (of course) couldn't
allocate a state for the connection. We had to restart, hoping to find
something in the logs. The logs were not helpful because the circular
logs were too small (subsequently "embiggened", of course). More to the
point, the offending states wouldn't be logged anyway, so the logs can't
tell us which IP or IPs the offending states belong to.

Going forward:

The ~1 hour window in which to do forensics (when/if this happens again)
is quite small, so I wonder if there is a way to have Growl generate a
notification when, say, states exceed a certain threshold, so we can at
least pay attention while it's happening.  Any tips on notifications?
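
One possible approach to the threshold check asked about above, as a
minimal sketch rather than a tested pfSense recipe: poll `pfctl -si`
(which prints a "current entries" line for the state table) and flag
when the count crosses a limit. The threshold value and the function
names are hypothetical, the script assumes Python 3.7+ is available on
the box, and the print() call stands in for whatever notifier (Growl,
email, etc.) is actually used.

```python
import re
import subprocess

# Hypothetical alert level: well above the normal 7-15K, far below the 800K max.
STATE_THRESHOLD = 100_000


def parse_current_states(pfctl_info: str) -> int:
    """Extract the state count from the text printed by `pfctl -si`."""
    match = re.search(r"^\s*current entries\s+(\d+)", pfctl_info, re.MULTILINE)
    if match is None:
        raise ValueError("no 'current entries' line found in pfctl output")
    return int(match.group(1))


def over_threshold(count: int, threshold: int = STATE_THRESHOLD) -> bool:
    """Return True when the state count has reached the alert level."""
    return count >= threshold


def check_once() -> int:
    """Run `pfctl -si` once and print an alert if the table is filling up."""
    result = subprocess.run(
        ["pfctl", "-si"], capture_output=True, text=True, check=True
    )
    count = parse_current_states(result.stdout)
    if over_threshold(count):
        # Placeholder: replace with the notification mechanism of your choice.
        print(f"ALERT: {count} states (threshold {STATE_THRESHOLD})")
    return count
```

Calling check_once() from cron every minute or so would give a warning
early in the ramp; at roughly 9K new states per minute, a 100K threshold
would trip about ten minutes in.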

Probably irrelevant, but this is: pfSense 2.4.2R p1 AMD64 on a
Supermicro Rangely/Atom ECC, ZFS

Thanks!
-Karl
Nishant Sharma
2018-04-04 01:45:41 UTC
Hi Karl,
Post by Karl Fife
There was just now a sudden spike in states, ~100x the normal number,
maxing out the system max in just an hour, and causing the system to fail.
With a maxed out state table, of course the system fails to process
traffic.  Has anyone seen something like this before, or have any ideas
what kinds of things would look like this?
Monitoring PNG attached.

This looks like it could be malware or a trojan on an internal machine trying to open connections to many hosts on the Internet.

tcpdump might show what's happening.

I have recently seen an increase in machines infected with the WannaCry or Feodo trojans, which incessantly try to connect to a large number of hosts on the Internet on port 21 or 445. That increases the state count by at least 10x.
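
If the firewall is still reachable while the spike is in progress, the state table itself can show which host is responsible. The following is a rough sketch, not a verified tool: it tallies the text output of `pfctl -ss` by source address and by destination port. It assumes lines of the form `iface proto src:port -> dst:port STATE`, optionally with the pre-NAT address in parentheses before the arrow, and it only looks at outbound (`->`) entries. The function names are made up for illustration.

```python
from collections import Counter


def host_and_port(endpoint: str):
    """Split 'a.b.c.d:port' (IPv4) or 'addr[port]' (IPv6) into (host, port)."""
    if endpoint.endswith("]") and "[" in endpoint:
        host, port = endpoint[:-1].split("[", 1)
        return host, port
    if endpoint.count(":") == 1:
        host, port = endpoint.split(":")
        return host, port
    return endpoint, ""  # no recognisable port


def tally_states(pfctl_states: str):
    """Count states per source host and per destination port."""
    by_source = Counter()
    by_dest_port = Counter()
    for line in pfctl_states.splitlines():
        fields = line.split()
        if "->" not in fields:
            continue  # skip inbound or unrecognised lines
        arrow = fields.index("->")
        if arrow < 3 or arrow + 1 >= len(fields):
            continue
        # The field just before the arrow is the source; with NAT it is the
        # original (pre-NAT) address wrapped in parentheses.
        src_host, _ = host_and_port(fields[arrow - 1].strip("()"))
        _, dst_port = host_and_port(fields[arrow + 1])
        by_source[src_host] += 1
        by_dest_port[dst_port] += 1
    return by_source, by_dest_port
```

Feeding it the output of `pfctl -ss` and printing `by_source.most_common(10)` should make a single infected machine stand out, and a pile-up on port 21 or 445 in `by_dest_port` would match the pattern described above.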

Regards,
Nishant