Build Security You Can Trust
I woke up and couldn’t get back to sleep, so I just read a good post by Richard Bejtlich on his site TaoSecurity.blogspot.com.
In his post, Richard talks about the investigative process he follows when he sees an alert on his system. While it is a thorough and very successful process, he’s right to assume that there will be someone who says ‘that takes too long’ because it does. I see the following problems with such a manual and in-depth investigation, at least on a relatively insignificant alert (two syn packets to a non-critical port that nothing is listening on):
- With such an intense process, it’s human nature that it won’t be followed exactly the same way every time. All people are, at some point and to some degree, lazy. The easier and quicker the task, the more likely we are to do it.
- Time spent on insignificant incidents such as this takes valuable time away from more relevant incidents. Richard did say it was a trivial example, but surely he could have used a more relevant attack. As it stands, his example does not support his argument.
When you do security right, you build a platform that you can trust to a high degree. For example, create a predictable, reliable system for building and maintaining your firewalls. Build and maintain all of them using the same process. Document the process and then routinely audit the firewall policies, the firewall software and/or firmware, and the firewall maintenance process itself. Then, when you see an attack on a port that is insignificant to your environment, you can trust your system and save valuable cycles for another, more relevant attack, say on port 80 against a web server.
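A minimal sketch of what the routine policy audit could look like, assuming each firewall's allowed inbound rules can be exported as a set of (protocol, port) pairs. The baseline, the firewall names, and the audit function are all hypothetical and only illustrate the idea of checking every firewall against one documented standard:

```python
# Hypothetical documented baseline: the only inbound rule every firewall should carry.
BASELINE = {("tcp", 80)}


def audit(firewalls):
    """Return, per firewall, the rules that deviate from the documented baseline.

    `firewalls` maps a firewall name to its set of (protocol, port) allow rules.
    Firewalls that match the baseline exactly are left out of the result.
    """
    drift = {}
    for name, rules in firewalls.items():
        extra = rules - BASELINE      # allowed on the device but not documented
        missing = BASELINE - rules    # documented but not allowed on the device
        if extra or missing:
            drift[name] = {"extra": sorted(extra), "missing": sorted(missing)}
    return drift


# Assumed example data: edge-2 has picked up an undocumented rule.
firewalls = {
    "edge-1": {("tcp", 80)},
    "edge-2": {("tcp", 80), ("tcp", 4899)},
}
report = audit(firewalls)
print(report)
```

An empty report is what lets you trust the platform; anything else is a reason to revisit the build process before the next alert arrives.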
Where Richard and I differ in our security ideology is that Richard is reactive and I am proactive. Even the name of his ideology implies reaction: Network Security Monitoring. Now take a breath and don’t let your mind start formulating your flame against me for dismissing logging and monitoring. That’s not what I’m saying. I’m saying that highly effective security practices focus the majority of your time and energy on proactively preventing or mitigating attacks, while of course monitoring for those you can’t prevent, all of which is the very heart of risk management.
In my case, I trust my process of firewall and IPS building and tuning. I trust my process of end-system hardening and maintenance. I trust my process of containment. Therefore I greatly reduce the amount of energy and effort required to investigate alerts such as this. I know that I have already done my due diligence and addressed the risks my assets face, and what I have left to worry about are the risks I must accept.
For example, I have a web server available to the public. Of all the risks that system faces, I have explicitly chosen to accept the risk of certain web-based attacks, simply by allowing port 80 access to that system. However, before I ever put the system into production I did and documented my due diligence: I patched and hardened the OS, installed HIPS software that I actively tune, and so on. That process has eliminated a large portion of attack opportunities and has left me with a small and focused area of concern: the web service itself. I know with certainty (because I trust my system process) that for that system to be compromised, the web service itself must first be compromised.
Following my security ideology, the investigative process I’d follow if I received the exact same alert on my system would be as follows:
Here’s the alert taken from Richard’s post:
Count:1 Event#1.78890 2007-01-15 03:17:36 BLEEDING-EDGE DROP Dshield Block Listed Source 203.113.188.203 -> 69.143.202.28 IPVer=4 hlen=5 tos=32 dlen=48 ID=39892 flags=2 offset=0 ttl=104 chksum=57066 Protocol: 6 sport=1272 -> dport=4899
My process:
- I know nothing can respond to a packet on port 4899 because my firewall policies implicitly block access to that port. This specific alert is quickly becoming irrelevant and my concern level is actively reduced.
- I check to see what all my firewalls logged for this source IP. Again from Richard’s example, a total of two inbound syn packets and two outbound resets were logged (I chopped off the time info to fit my blog):
  1 203.113.188.203 -> 69.143.202.28 TCP 1272 > 4899 [SYN] Seq=0 Len=0 MSS=1460
  2 69.143.202.28 -> 203.113.188.203 TCP 4899 > 1272 [RST, ACK] Seq=0 Ack=1 Win=0 Len=0
  3 203.113.188.203 -> 69.143.202.28 TCP 1272 > 4899 [SYN] Seq=0 Len=0 MSS=1460
  4 69.143.202.28 -> 203.113.188.203 TCP 4899 > 1272 [RST, ACK] Seq=0 Ack=1 Win=0 Len=0
- Lastly I check the web server and HIPS logs on the target system. This will show if there were any web service attacks that made it past my border protection. Since Richard’s example only stated the two packets, the logs on the end-system will be clean.
At this point my investigation has shown me that my firewalls did their job and blocked whatever this was and I’m now able to move on to the next incident.
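The three checks above can be sketched as a small triage function. This is only an illustration under stated assumptions: the Packet record, the ALLOWED_INBOUND_PORTS policy, and the host_log_hits count are hypothetical stand-ins for whatever your firewall and HIPS logs actually provide. The packet data is the four lines from Richard’s example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    flags: str


# Hypothetical policy: the only inbound port the firewalls allow.
ALLOWED_INBOUND_PORTS = {80}


def triage(alert_src, alert_dport, firewall_log, host_log_hits):
    """Return 'investigate' or 'close' for a single alert."""
    # Step 1: can anything respond on the targeted port at all?
    if alert_dport in ALLOWED_INBOUND_PORTS:
        return "investigate"
    # Step 2: did the same source also touch a port that is allowed in?
    from_source = [p for p in firewall_log if p.src == alert_src]
    if any(p.dport in ALLOWED_INBOUND_PORTS for p in from_source):
        return "investigate"
    # Step 3: did the web server or HIPS logs on the target record anything?
    if host_log_hits > 0:
        return "investigate"
    return "close"


# The four packets from Richard's example (timestamps omitted).
FIREWALL_LOG = [
    Packet("203.113.188.203", "69.143.202.28", 1272, 4899, "SYN"),
    Packet("69.143.202.28", "203.113.188.203", 4899, 1272, "RST, ACK"),
    Packet("203.113.188.203", "69.143.202.28", 1272, 4899, "SYN"),
    Packet("69.143.202.28", "203.113.188.203", 4899, 1272, "RST, ACK"),
]

decision = triage("203.113.188.203", 4899, FIREWALL_LOG, host_log_hits=0)
print(decision)
```

The ordering is deliberate: the cheapest check (policy lookup) comes first, and the alert is only closed once every check has come back clean.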
I don’t need to do a ‘whois’ on the source IP because I never reached a point where I was concerned with who was poking at me. That isn’t actionable information at this stage (and it has been my experience that these Snort rules that block entire subnets based on Dshield and/or other collaborative data mining are overkill and counterproductive for a global enterprise that expects to do business with, say, Chinese clientele). I could at this point file this incident in a ‘look deeper later’ file so that I can be Curious George if I have spare time. However, with the sheer volume of attacks public-facing systems are exposed to day in and day out, compounded by the fact that few shops have the staffing necessary to address every single incident, it’s as important to know what to investigate as it is to know how to investigate it.
Triage is one of the most important skills an incident handler can have. Without it you risk wasting time on irrelevant attacks over here while your web server gets pwned over there.
I find it interesting that Richard’s whole process was initiated by an alert, but Richard is arguing against alert-based systems. Even more interesting is Joe’s comment that states:
“Relying on alerts only would not allow you to notice that this guy is trying out some freshly compiled 0-days against your box.”
- This example didn’t show some guy trying zero-day exploits. It showed two syn packets destined for port 4899. In fact, it doesn’t even indicate an attack. It could be a misconfigured system on the source end.
- Even more to the point, I argue that this specific alert in this particular example is a waste of time because a) there are no systems listening on port 4899 and b) the Snort alert amounts to a ‘guilty by association’ alert that in many cases could be counterproductive.
Yes, it’s great to be inquisitive and explorative. It’s better to be efficient and effective.

Hey Michael,
I saw your test post and have some questions, but I can’t find out how to contact you?!
Also, if you took this:
http://www.flickr.com/photo_zoom.gne?id=371144567&size=o
It’s awesome. I’m adding it into my rotation of backgrounds on my Macbook.
By Alex on 01.28.07 4:15 pm
Michael, you’ve completely missed the point. Of course it’s better to be proactive than reactive. Of course it’s better to prevent than simply detect compromise. The point you’ve missed is that 100% prevention is impossible and you need a system to identify the incidents you’ve failed to prevent. That requires having the right data available. NSM presents one way to do that. Your criticism also misses the point that I selected a trivial example knowing it was trivial.
By Richard Bejtlich on 01.28.07 9:14 pm
By the way, this is another problem:
“I know with certainty (because I trust my system) that for that system to be compromised, the web service itself must first be compromised.”
This is soccer-goal security and it will end up biting you.
By Richard Bejtlich on 01.28.07 9:32 pm
I had a long reply typed up but decided against it. I think your ego got a bit bruised, which was not my intention.
Below is what I also posted on your blog:
I think you’ve gotten a bit defensive. You said yourself you expect people to challenge the real world usefulness of NSM.
You provided an incident and clearly stated it was an example of how NSM is to be done. Granted, you said it was a trivial example, but it’s still a poor example to make your point because it doesn’t indicate an attack (at any level) and you spent too much time in your investigative process making a decision about the incident.
It would have been better to provide fictional data or even a staged attack to better demonstrate the process.
Your opening paragraph indicated you were interested in how others find the answer to the problem ‘how do I know what happened…’
I thought you seriously wanted to know.
By Michael on 01.28.07 11:27 pm
I think you’re both correct, honestly. It is one of the unfortunate failings of security and IT in general that no two companies are the same, both in infrastructure and personnel. I think both your approaches are just fine.
By LonerVamp on 01.29.07 10:13 am
I think what set the whole thing down the wrong path was where I said he was reactive and I was proactive. What I should have said was he evangelizes reaction and I evangelize proaction. =)
By Michael on 01.29.07 11:05 am
I doubt Richard was implying that he goes through this process for every alert he receives. I think his goal was to explain the steps that can be taken when the data is made available via NSM. It may also be important to note that I expect Rich was able to get all the information he used to do his analysis in under a minute. Probably under thirty seconds.
The process you followed works fine for you, but there are a number of conditions that can make those steps tough or nearly impossible for many organizations. In your post you mention how you harden and manage “your firewalls” and “your webserver”. What about organizations where the responsibility for server administration, FWs, and information security isn’t a one-man shop? What if the organization has business units spread across the country and is trying to take advantage of scale by using a shared services model for specialized skills like information security, while the individual business units still maintain their own PCs, servers, and network?
I am not even sure how to respond to that. Alerts are one of the four data types that Rich requires when he defines NSM (alert, stats, session, and full content). With Sguil we are trying to create a work flow that supports event-driven analysis. The majority of the time, an alert is the driver.
Finally, I don’t think anyone is getting defensive or feeling they are being attacked; I just think your post included a bit of misinformation.
By Bamm Visscher on 01.29.07 11:10 am
I understand now that he didn’t intend to imply NSM works for the entire range of incidents. My main argument at this point is the example he used. It doesn’t help his explanation, at least it doesn’t for me. For me it took away from the effectiveness of his explanation, pretty thoroughly.
Operational standards, documented procedures, and routine internal audits neatly address that situation.
Again, operational standards, documented procedures, and routine internal audits neatly address that situation as well. I happen to work for a company that fits your description and it works swimmingly for us.
Yeah, I could have done better with that one.
By Michael on 01.29.07 11:39 am
:::SIGH:::
It’s not soccer goal security, it’s risk management, and you glossed over everything preceding that line and took it out of context.
Here’s the bit you glossed over:
I’ve not only put extra goalies in front of the goal (HIPS, emphasis on the “P”), but I’ve made the goal smaller (OS hardening), and also installed TV cameras, seismographs, and bad odor detectors (all in the HIPS).
By Michael on 01.29.07 11:59 am