A Day in the Trenches: My life in the SOC before, and after, automated investigation
Greetings from the SOC – the Security Operations Center. Not every company has an official “SOC” but I think it’s fair to say that by now, most medium and large organizations have deployed multiple security products. And there’s a team of IT or security people who manage them. I’m one of those people. I’ve worked in IT security for over ten years and today I’m a senior cyber security analyst working in a SOC at a large global company. (I can’t tell you which one, because it’s company policy. Sorry!)
Every new product that we install adds a layer of security, but it also generates a lot of work for the team. If you don’t work in security yourself, you might think that you install products that are designed to block or detect threats and that’s it. Unfortunately, that’s not how it goes. Most of the time, the best security products can do is tell you that something suspicious has happened. Then it’s up to you to decide what that means. So the biggest challenge for us is not deploying the products and setting them up, but fielding the alerts that they send every day. Some of the products in our company send as many as 1,000 alerts each day. (I’m not exaggerating!)
SIEM – Great When it Works
That’s one of the reasons why most organizations over a certain size try to aggregate all of their alerts into a SIEM (security information and event management) system. In addition to saving all of the logs for compliance purposes, the SIEM also allows you to create rules that correlate groups of related alerts. This helps reduce the information overload and focuses our attention on probable incidents.
In my last job, we used a SIEM system. In addition to the standard alerts it provided, we created rules that were quite effective at pinpointing certain types of incidents. For example, we defined a correlation between the IPS and the firewall so that an alert was triggered when an event was generated by the IPS and the same source IP was “allowed” by the firewall. Another rule alerted when a DLP event was generated by an endpoint that also had an anti-virus alert. We even checked whether an employee was physically present (using data from the physical security systems) to detect suspicious login attempts when an employee was off site.
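To give a feel for what the first of those rules boils down to, here is a minimal Python sketch. It is not the syntax of any real SIEM product. The `Event` record, its field names and the `"ips"` / `"firewall"` / `"allow"` labels are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized log record; field names are assumptions."""
    source: str   # which product produced the event: "ips" or "firewall"
    src_ip: str   # source IP address seen in the event
    action: str   # firewall verdict ("allow" / "deny"), or "alert" for the IPS


def correlate_ips_with_allowed(events):
    """Return source IPs that raised an IPS event AND were allowed by the firewall."""
    ips_sources = {e.src_ip for e in events if e.source == "ips"}
    allowed = {
        e.src_ip
        for e in events
        if e.source == "firewall" and e.action == "allow"
    }
    # The rule fires only for IPs present in both sets.
    return sorted(ips_sources & allowed)


events = [
    Event("ips", "203.0.113.7", "alert"),
    Event("firewall", "203.0.113.7", "allow"),
    Event("ips", "198.51.100.4", "alert"),
    Event("firewall", "198.51.100.4", "deny"),
]
print(correlate_ips_with_allowed(events))  # ['203.0.113.7']
```

Only the first address triggers the rule, because the firewall blocked the second one. A real rule would also need a time window and would have to cope with the different log formats of each product.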
But there were problems with this approach. The biggest challenge was figuring out which rules we even needed. Basically we had to use our own experience and observations and generalize that into patterns that the SIEM could identify.
It was pretty complicated to define a rule so not everybody on the team was able to do it. But the really hard part was keeping the rules up to date. Every time somebody on the IT team changed a device policy or moved a server, it broke a SIEM rule. It often took us time to realize that the rule was even broken, and even more time to fix it. We spent a lot of time and aggravation on this.
And at the end of the day, we still had way too many alerts to investigate. Even after the SIEM correlated groups of alerts into cases, we still had hundreds of cases to investigate.
But What About Investigation?
Even if the SIEM rules work, they only tell you that an incident is likely. You still need to look at all of the alerts in the case and figure out what they mean. That involves finding the common denominators and then going to hunt down and collect forensic data from your network and your endpoints. The alerts only give you one part of the picture. Without the forensics, you can’t understand what really happened.
Then, you need to put the puzzle together. Since none of these products “talk” to each other, it’s up to us to find the patterns and create a timeline of the incident.
All of this digging takes hours – for something minor – and days if we’re talking about a real breach. On most teams, there aren’t many people who understand the data (especially the network forensics). So the investigations pile up and wait for the more experienced analysts – bad news if there is a breach in progress.
Another problem is that many of these cases turn out to be false positives. Hours and hours of work down the drain.
A New Job and Another Way
A year ago I switched companies and joined a new SOC that works differently. We have a SIEM-type solution that we use for log collection and to meet our regulatory compliance obligations. But we don’t rely on the SIEM to detect incidents.
Instead we use the Verint Threat Protection System. It automates the whole process for us. It includes detectors spread out across the attack chain (different attack vectors, command and control, lateral movement) and different places in the organization (network, files, endpoints). It also collects forensics data from the network and endpoints. It consolidates all of the sensors and collectors into a unified workflow, with all parts of the system working together to create a clear picture of what’s actually going on.
No More Alerts!
But it doesn’t send us alerts. That’s right – no alerts! Verint TPS sends us incidents. How does it do that? It automates a big part of what we analysts do.
Verint TPS automates the investigation process. When it receives an alert from a detection sensor, it goes and collects the forensics needed to verify it. When another alert comes in, it checks whether it’s related to the first one, or if it’s something else. Without stopping for a coffee break, Verint TPS gradually builds incidents and hands them over to us in the SOC for a final analysis. It’s the ideal balance between automation and human investigation. Verint TPS does the legwork for me and lays out all of the information, but lets me make the final decision.
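I can’t show you how the product does this internally, but the general idea of folding related alerts into one incident can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Alert` and `Incident` records, the one-hour window, and the choice to treat "same host, close in time" as "related". A real system would use much richer evidence.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    """Hypothetical alert record; field names are assumptions."""
    host: str
    sensor: str
    time: datetime


@dataclass
class Incident:
    """A group of alerts judged to belong together."""
    host: str
    alerts: list = field(default_factory=list)


def build_incidents(alerts, window=timedelta(hours=1)):
    """Attach each alert to an open incident on the same host, or start a new one."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.time):
        for incident in incidents:
            last_seen = incident.alerts[-1].time
            if incident.host == alert.host and alert.time - last_seen <= window:
                incident.alerts.append(alert)
                break
        else:
            # No related incident found, so this alert opens a new one.
            incidents.append(Incident(alert.host, [alert]))
    return incidents


alerts = [
    Alert("host-a", "network", datetime(2024, 1, 1, 9, 0)),
    Alert("host-b", "endpoint", datetime(2024, 1, 1, 9, 5)),
    Alert("host-a", "endpoint", datetime(2024, 1, 1, 9, 20)),
]
incidents = build_incidents(alerts)
for incident in incidents:
    print(incident.host, [a.sensor for a in incident.alerts])
```

In this toy example the two alerts on `host-a` end up in a single incident, and the alert on `host-b` becomes its own incident.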
So now, instead of getting thousands of alerts and groups of alerts, I get between 10 and 20 incidents each day. An incident is a complete timeline of the attack chain, with all of the alerts and forensic evidence documented clearly, in one place. It’s like an operational description of the suspicious activity broken down by files, network and endpoints. So it’s really easy to understand what is going on and to make a confident decision about how to deal with it.
My experience as an analyst is just as important as before, but I’m not wasting my time on legwork anymore. It’s like having a whole team of junior analysts working for me around the clock!
Speaking of junior analysts… one of the great things about Verint TPS is the way it makes it easier to share information. All of the relevant data for an incident is in one place and the user interface is really clear, so any member of the team can understand what it means. The forensic data is attached to the relevant alert so you don’t need to be a network expert to find it.
Because it’s so easy to use, we’ve been able to train students to work as junior analysts in our SOC and cover those night and weekend shifts that were always a problem to staff. They can handle most of what needs to be done and when they do need to escalate, it’s very easy to hand off the incident because it’s fully documented. It’s also become much easier to hand over an incident to somebody else after a shift, and to visualize risks for senior management.
Since I’ve stopped chasing alerts, now I have time to research, plan and implement better security for my company. I also enjoy training the next generation of security analysts. Are you looking for a job?!
*The writer is a SOC Manager in a large, global organization.