Dealing with an incident requires not only prompt notification but also the ability to determine its cause, perform forensic analysis, identify which other systems, users, devices and applications have been compromised or affected, and establish the extent of the impact, the duration of the activity that led to the incident, and many other factors.
In other words, notification of an incident is only the first step in a complex journey that could lead to the discovery of a major cyber breach or, conversely, to the conclusion that the incident was a totally benign non-breach.
While security orchestration, automation and response (SOAR) solutions help automate and structure these activities, the activities themselves require telemetry data that provides the breadcrumbs needed to scope, identify and potentially remedy the situation. This is becoming increasingly important in the cloud for several reasons:
- The shared security model of the public cloud can lead to gaps in telemetry (for example, a lack of telemetry from the underlying infrastructure that could help correlate breadcrumbs at the infrastructure level with those at the application level).
- Lack of consistency in telemetry information, as applications increasingly segment into microservices, containers, and Platform-as-a-Service, and as various modules come from different sources such as internal development, open source, commercial modules and outsourced development.
- Misconfigurations and misunderstandings as control shifts between DevOps, CloudOps and SecOps.
- All this coupled with a significant expansion of the attack surface with the decomposition of monolithic applications into microservices.
When incidents do occur, the ability to quickly assess the scope, impact, and root cause of the incident is directly proportional to the availability of quality data and its ability to be easily queried, analyzed, and dissected. As businesses migrate to the cloud, logs have become the de facto standard for telemetry data collection.
The challenges of relying almost exclusively on logs for telemetry
The first problem is that many attackers turn off logging on the compromised system to hide their activity and footprint. This creates gaps in telemetry that can significantly delay incident response and recovery initiatives. On occasion, DevOps teams may also reduce logging on end systems and applications to lower CPU usage (and the associated costs in the cloud), resulting in additional gaps in telemetry data.
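As a rough illustration of what such a gap looks like in practice, the sketch below scans a list of event timestamps from one source and flags any silence longer than a chosen threshold. The function name, the sample timestamps and the five-minute threshold are all assumptions made for this example, not part of any particular product.

```python
from datetime import datetime, timedelta


def find_telemetry_gaps(timestamps, max_silence):
    """Return (start, end) pairs where consecutive events are further apart than max_silence."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps


# Hypothetical event times from one host; logging goes quiet between 10:02 and 10:47.
events = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 1),
    datetime(2024, 1, 1, 10, 2),
    datetime(2024, 1, 1, 10, 47),
    datetime(2024, 1, 1, 10, 48),
]

gaps = find_telemetry_gaps(events, max_silence=timedelta(minutes=5))
for start, end in gaps:
    print(f"no log events between {start} and {end}")
```

A check like this can only say that the logs went quiet. It cannot say whether the cause was an attacker, a cost-saving configuration change, or a genuinely idle system.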
A second problem is that logs tend to be large and, in many cases, written by developers for developers, which leads to too much and possibly irrelevant telemetry data. This increases the cost of storing and indexing the data, lengthens query times, and demands more effort from whoever has to sift through it.
Finally, log levels can be raised or lowered, but ultimately the logs themselves are predefined because they are built into the code. Changing the information published in the logs is not something that can be done in real or near real time in response to an incident; it may require code changes, resulting in significant delays and impaired incident response capability.
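The distinction can be seen in a small sketch using Python's standard `logging` module. The logger name `payments_demo` and the `process_payment` function are hypothetical, chosen only for illustration: the level can be changed while the program runs, but the fields in each message are fixed where the log call is written.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("payments_demo")  # hypothetical application logger
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.WARNING)


def process_payment(user, amount):
    # The content of each message is decided here, in the code.
    logger.debug("processing payment for user=%s", user)
    logger.warning("payment above threshold amount=%s", amount)


process_payment("alice", 5000)   # only the WARNING line is emitted
logger.setLevel(logging.DEBUG)   # the level can be changed at runtime
process_payment("alice", 5000)   # now the DEBUG line is emitted as well

lines = stream.getvalue().splitlines()
print("\n".join(lines))
```

Neither message records, say, the client's source address. Raising the log level does not add it; someone has to edit `process_payment` and redeploy.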
The 3 Rs of telemetry
This brings us to the 3 Rs of telemetry – Reliable, Relevant and Real-time.
To meet rapid response needs, telemetry data must be reliable, i.e. available when needed and without gaps introduced by malicious actors or, inadvertently, by various operators through misconfiguration or poor communication. It must be relevant, that is, it must provide meaningful, actionable information without dramatically increasing costs or query times through excessive, duplicate, and irrelevant information. And finally, it must be real-time, that is, the telemetry data stream can be changed, and new or additional telemetry data derived, with a single click.
A great way to supplement logs in the cloud and address the three Rs is to use telemetry data derived from observing network traffic. After all, command and control activity, lateral movement of malware, and data exfiltration all happen over the network. If end systems or applications are compromised and logging is disabled at the server or application level, network activity continues, and observing it can still capture the breadcrumbs that identify malicious activity.
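One simple way to picture this is to compare, per host, how many log events and how many network flows were seen in the same time window. The sketch below assumes those counts have already been collected; the host names, the numbers and the function name are invented for the example.

```python
def silent_but_active(log_counts, flow_counts):
    """Return hosts that produced network flows in a window but no log events."""
    return sorted(
        host
        for host, flows in flow_counts.items()
        if flows > 0 and log_counts.get(host, 0) == 0
    )


# Hypothetical per-host counts for the same time window.
log_counts = {"web-1": 120, "db-1": 0}
flow_counts = {"web-1": 300, "db-1": 45, "cache-1": 0}

suspects = silent_but_active(log_counts, flow_counts)
print(suspects)
```

A host that is still talking on the network while its logs are silent is not proof of compromise, but it is a reasonable place to start looking.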
Network-based telemetry can provide a reliable flow of information even when endpoints or end systems are compromised or affected. Metadata generated from network traffic can be surgically tuned to provide a highly relevant and targeted telemetry flow.
Security operations teams can select from thousands of metadata elements specific to their use case, such as focusing on DNS metadata or metadata associated with remote desktop activity, and remove any other network metadata that may not be relevant, thus reducing costs and (equally important) making it possible to write targeted queries. And, if it becomes necessary to extend or modify the telemetry data being acquired, this can easily be done at the network level without requiring any modification to the application. A simple API call can change which network metadata is captured, in near real time.
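To make the idea concrete, here is a minimal sketch of a metadata selection profile. The `MetadataProfile` class, its `update` method and the field names are hypothetical stand-ins. They do not correspond to any vendor's API; they only model selecting a subset of metadata elements and changing that selection without touching the application.

```python
class MetadataProfile:
    """Hypothetical in-memory stand-in for a network metadata selection profile."""

    def __init__(self, fields):
        self.fields = set(fields)

    def update(self, add=(), remove=()):
        # Models the API call that changes which metadata elements are captured.
        self.fields |= set(add)
        self.fields -= set(remove)

    def project(self, record):
        # Keep only the selected elements of one metadata record.
        return {key: value for key, value in record.items() if key in self.fields}


# One illustrative metadata record derived from observed traffic.
record = {
    "src_ip": "10.0.0.5",
    "dst_ip": "10.0.0.9",
    "dns_query": "example.com",
    "dns_response_code": "NOERROR",
    "http_user_agent": "curl/8.0",
    "rdp_cookie": "user1",
}

# Start with a DNS-focused selection.
profile = MetadataProfile(["src_ip", "dst_ip", "dns_query", "dns_response_code"])
dns_only = profile.project(record)

# During an investigation, add remote desktop metadata and drop a field no longer needed.
profile.update(add=["rdp_cookie"], remove=["dns_response_code"])
after_update = profile.project(record)

print(sorted(dns_only))
print(sorted(after_update))
```

The application that generated the traffic is never modified. Only the selection applied at the network level changes.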
As businesses look to migrate to the cloud, supplementing their log sources with network-based telemetry will prove invaluable in strengthening their security and compliance. In this sense, network-based telemetry is an essential element in securing the move to the cloud.