Execution
The ability to collect, quantify, evaluate and enrich your data.
Introduction
A SOC's ability to develop good detections for different techniques and tactics relies heavily on its ability to execute them. In short, logging is the dark side of detection.
Evaluating your data quality and visibility to measure your detection execution capabilities is not an easy task. In this part we will cover the two main drivers of this second dimension:
Event Visibility: Data prioritization, collection and processing.
Identify critical data sources.
Define your collection strategy.
Define your log storage approach.
Data Observability
Event Traceability: Data source quality and richness
Evaluate the quality of your data
Event Visibility
Data is the air detection breathes; without it, it is dead. Ensuring you have good visibility over the data sources that matter to you is crucial. There are four main things to keep in mind, derived from four questions, when you evaluate your visibility from a SOC perspective:
What do you need to collect?
How do you want to collect it?
How are you planning on storing it?
How will you know when you're not collecting it?
Identify Critical Data Sources
Create a Collection Strategy
Your collection strategy can be driven by security operations, like compliance requirements, or by threat detection, like detecting post-exploitation techniques used in Windows environments. Your collection strategy is also helpful for tuning purposes: reducing alert fatigue, false positives, and data storage costs, and optimizing EPS licensing costs. Here are some concepts to take into consideration.
Volume vs Relevance: Depending on what drives your log collection strategy (SecOps, threat detection, or a hybrid of both), if you completed the previous step, "Identify Critical Data Sources", you now know what you need to collect and can build a balanced list of where you need to collect everything and where you will only keep the events most relevant to your use cases (a minimal filtering sketch follows this list).
Log Retention Policy: Your log retention duration can be influenced by regulatory requirements and by your detection needs, such as how far back your analysts look when investigating alerts or hunting threats. The use of historical correlations can also be impacted by your log retention policy. We will come back to this in Log Storage.
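To illustrate the volume vs relevance trade-off, here is a minimal Python sketch that keeps only an allowlist of Windows Event IDs before forwarding events to the SIEM. The Event ID list and the event dictionary shape are illustrative assumptions, not a recommended baseline.

```python
# Minimal volume-vs-relevance filter: forward only the Windows Event IDs
# that matter for your use cases. The allowlist and event shape are assumptions.
from typing import Iterable, Iterator

# Hypothetical allowlist derived from your "Identify Critical Data Sources" step.
RELEVANT_EVENT_IDS = {4624, 4625, 4688, 4698, 7045, 4104}

def filter_events(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only events whose EventID is in the allowlist."""
    for event in events:
        if event.get("EventID") in RELEVANT_EVENT_IDS:
            yield event

# Usage example with a couple of fake events.
sample = [
    {"EventID": 4688, "CommandLine": "powershell.exe -enc ..."},
    {"EventID": 5156, "Application": "chrome.exe"},  # dropped: not relevant here
]
for kept in filter_events(sample):
    print(kept["EventID"])
```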
If you're looking for how to configure Windows Event Forwarding, here is a great blog from Elastic on how to configure WEF/WEC:
Log Storage
Your SIEM's database type and log storage approach matter and affect your execution capabilities if you care about query speed and data availability.
Database Types
There are two main types of SIEM databases for a logging use case:
Schema-on-Write
Schema-on-Read
Schema-on-Write databases define the schema of your data (fields, structure, and mappings) at ingestion time, while Schema-on-Read applies the schema at search time. If you run a query on a Schema-on-Write database, search time decreases but ingestion time increases, because your data is already indexed and mapped: the heavy work was done at ingestion. Schema-on-Read, on the other hand, prioritizes ingesting data to avoid data loss, at the cost of slower queries.
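To make the difference concrete, here is a minimal sketch using the Elasticsearch Python client (8.x assumed): an explicit mapping created at ingestion time (Schema-on-Write) versus a runtime field computed at search time (Schema-on-Read). The index names, field names, and script are illustrative assumptions.

```python
# Sketch of the two approaches with the Elasticsearch Python client (8.x assumed).
# Index names, field names, and the runtime script are illustrative assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")

# Schema-on-Write: fields are typed and mapped at ingestion time,
# so searches on event.code are fast but ingestion does the heavy lifting.
es.indices.create(
    index="logs-write",
    mappings={
        "properties": {
            "@timestamp": {"type": "date"},
            "event": {"properties": {"code": {"type": "keyword"}}},
            "process": {"properties": {"command_line": {"type": "text"}}},
        }
    },
)

# Schema-on-Read: raw documents are ingested as-is and a runtime field
# is computed at search time, shifting the cost from ingestion to queries.
es.search(
    index="logs-raw",
    runtime_mappings={
        "event_code": {
            "type": "keyword",
            "script": {"source": "emit(params._source['EventID'].toString())"},
        }
    },
    query={"term": {"event_code": "4688"}},
)
```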
Several SIEMs claim they can do both; trust but verify. Going for a hybrid approach is recommended in a security monitoring use case. Here are some pros and cons of the Schema-on-Write approach:
Pros:
Faster search
Fewer computing resources are needed at search time
Easier correlations.
Cons:
Write-to-disk speed can be affected, and data loss is a risk (which may impact the admissibility of forensic evidence in a court case).
Requires knowledge of your data models
Difficult to handle unstructured data.
Storage Architecture
Each SIEM vendor has its own method for sizing a security monitoring solution's storage capacity, but most of them adopt the same approach internally: a Hot, Warm, and Cold architecture (a lifecycle policy sketch follows this list):
Hot: The most recent and active logs to monitor. These nodes are known for fast disk write capabilities (SSDs) and low storage capacity. Most analysts or threat hunters query a window from the last 15 minutes to the last 7 days. Depending on your query rate and look-back time, log retention on this type of node should preferably be set to 30 days or more.
Warm: Once past the time frame of most use, logs can be moved from SSDs to slower but larger media like hard disks or tape. These are typically stored for at least 90 days.
Cold: Beyond the first 90 days, the chances of needing a particular log file are slim, but not zero. Cold storage is a cheap long-term solution, but it will take a long time to spool back up for use if needed.
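As a sketch of how this tiering can be expressed in practice, here is a hot/warm/cold index lifecycle policy using Elasticsearch ILM through the Python client (8.x assumed). The policy name, rollover size, and phase ages are illustrative assumptions, not sizing guidance.

```python
# Hot/warm/cold lifecycle sketch with Elasticsearch ILM (Python client 8.x assumed).
# Policy name, rollover size, and phase ages are illustrative assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")

es.ilm.put_lifecycle(
    name="soc-logs-policy",
    policy={
        "phases": {
            "hot": {
                # Fast SSD-backed nodes; roll over the write index as it grows.
                "actions": {"rollover": {"max_primary_shard_size": "50gb", "max_age": "7d"}}
            },
            "warm": {
                # After 30 days, move to slower but larger nodes.
                "min_age": "30d",
                "actions": {"allocate": {"require": {"data": "warm"}}, "set_priority": {"priority": 50}},
            },
            "cold": {
                # After 90 days, keep data on cheap storage for the rare look-back.
                "min_age": "90d",
                "actions": {"allocate": {"require": {"data": "cold"}}, "set_priority": {"priority": 0}},
            },
        }
    },
)
```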
Data Observability
Do you get notified when a data source is down or when a new one is integrated? Can you tell when last week's data volume is much lower than the week before, or the month before? Do you know when an important event is no longer collected from a specific asset? Do you notice when an event field is no longer populated?
Observability means maintaining a data pipeline with minimal downtime and maximum reliability by running regular health checks. Data observability is important for your detection engineering and can be applied at many levels (a minimal monitoring sketch follows this list):
Index level: When an index stores much less data than usual.
Log source type level: When a data source type, like a firewall cluster or web servers, stops sending events.
Asset level: When a log source stops sending data.
Event level: When a given Event ID, for example, stops being recorded from a data source.
Field level: When the Process Command Line field, for example, stops being populated.
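Here is a minimal index-level observability check, assuming an Elasticsearch back end (Python client 8.x): it compares this week's event count with last week's and flags a sharp drop. The index pattern, threshold, and print-based alert are illustrative assumptions.

```python
# Minimal data observability check (index level): compare this week's event count
# with last week's and flag a sharp drop. Index pattern and threshold are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")
INDEX_PATTERN = "logs-windows-*"   # hypothetical index pattern
DROP_THRESHOLD = 0.5               # alert if volume drops by more than 50%

def weekly_count(offset_weeks: int) -> int:
    """Count documents in a one-week window, offset_weeks back from now."""
    gte = f"now-{(offset_weeks + 1) * 7}d/d"
    lt = f"now-{offset_weeks * 7}d/d"
    resp = es.count(
        index=INDEX_PATTERN,
        query={"range": {"@timestamp": {"gte": gte, "lt": lt}}},
    )
    return resp["count"]

this_week, last_week = weekly_count(0), weekly_count(1)
if last_week > 0 and this_week < last_week * (1 - DROP_THRESHOLD):
    print(f"[ALERT] {INDEX_PATTERN}: volume dropped from {last_week} to {this_week}")
```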
Event Traceability
After covering Event Visibility and defining relevant data sources, Event Traceability helps you estimate the reliability of your data at a much deeper level, so that your detection implementations stay simple and trustworthy. The following are some use cases for evaluating the event traceability of a few data sources.
Successful event traceability requires a data model in place to parse and normalize your events. I won't go through these aspects since this article is already long enough, but I will suggest great references to check out.
Example 1: Antivirus Data Traceability
I started by defining the event types that I need for my security operations and threat detection development, regardless of the vendor. For example, I need to be informed when a virus is detected, when an asset's license expires, or when a virus deletion fails, etc. After that, I listed the event fields that must be present for each event type and evaluated each one based on the following color scale (a small automation sketch follows this list):
GREEN: Collected, Parsed and Normalized
YELLOW: Collected, Parsed but Not Normalized
ORANGE: Only collected
RED: Not Collected
GREY: Not Applicable
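If you want to track this evaluation programmatically, here is a toy Python sketch of the color scale applied to a field inventory. The field names and their collected/parsed/normalized flags are illustrative assumptions, not the actual assessment.

```python
# Toy sketch of the color-scale evaluation for one event type.
# The field inventory and its collected/parsed/normalized flags are assumptions.
from dataclasses import dataclass

@dataclass
class FieldStatus:
    name: str
    applicable: bool = True
    collected: bool = False
    parsed: bool = False
    normalized: bool = False

def color(f: FieldStatus) -> str:
    if not f.applicable:
        return "GREY"     # Not applicable
    if not f.collected:
        return "RED"      # Not collected
    if not f.parsed:
        return "ORANGE"   # Only collected
    if not f.normalized:
        return "YELLOW"   # Collected and parsed but not normalized
    return "GREEN"        # Collected, parsed, and normalized

# Example: "virus detected" event fields from a hypothetical antivirus source.
fields = [
    FieldStatus("threat.name", collected=True, parsed=True, normalized=True),
    FieldStatus("host.name", collected=True, parsed=True),
    FieldStatus("file.hash.sha256", collected=True),
    FieldStatus("user.name"),
]
for f in fields:
    print(f"{f.name}: {color(f)}")
```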
Example 2: Windows Authentication Event Traceability
Example 3: Windows Process Creation and DNS Event Traceability
As you can see in the following figures, doing such an exercise lets you know which events you can rely on, with relevant data, during your detection engineering process. For example, Process Command Line is not audited by default in EID 4688, and process information is very helpful for correlations, as it can be used by SIEM platforms like Elastic to build a process tree context of execution.
This example highlights the differences between Windows DNS Debug logs, which are written to a file on the Windows DNS server, and Sysmon EID 22, which is generated on the client side and recorded in the Windows Event Log. It is important to know these differences, since your agent can collect Windows Server DNS logs only if it can read and parse them from the dnslog.txt file written to disk. If you're using WEF/WEC to collect logs, you won't be able to collect them from a file, and an agent like Winlogbeat won't do it either unless you also deploy Filebeat (yet another agent). QRadar's WinCollect, for example, can collect both Windows Event Logs and DNS Debug logs, but custom filtering can be more complex.
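As a rough illustration of why a file-reading agent is required, here is a hedged Python sketch that reads DNS Debug log lines from disk and decodes the length-prefixed query names (e.g. (3)www(6)domain(3)com(0)). The file path and line layout are assumptions about a typical DNS debug log; adjust them to your environment.

```python
# Minimal sketch: read Windows DNS Debug log lines from disk and decode query names.
# The file path and the length-prefixed name layout, e.g. (3)www(6)domain(3)com(0),
# are assumptions based on common DNS debug logging output; adjust to your format.
import re

DNS_LOG_PATH = r"C:\dnslogs\dnslog.txt"   # hypothetical path matching the article's example
NAME_RE = re.compile(r"((?:\(\d+\)[\w-]+)+)\(0\)")

def decode_name(encoded: str) -> str:
    """Turn (3)www(6)domain(3)com into www.domain.com."""
    return re.sub(r"\(\d+\)", ".", encoded).strip(".")

with open(DNS_LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = NAME_RE.search(line)
        if match:
            print(decode_name(match.group(1)))
```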