Episode 24 — Decide what to log and why: events that power detection and investigations
Logging is one of those security topics that feels simple until you are staring at an incident timeline with missing pieces. The forcing function is always the same: you cannot detect what you do not observe, and you cannot investigate what you did not record. Good logging is not about collecting everything your tools can emit, because that approach usually collapses under cost, noise, and operational fatigue. Instead, strong logging starts with priorities that match how attackers move and how defenders answer questions under pressure. When you decide logging priorities well, your detections become more reliable, your investigations become faster, and your post-incident learning becomes grounded in evidence rather than guesswork. The goal is to capture what matters most, in a way that stays sustainable month after month.
Before we continue, a quick note: this audio course is a companion to our course companion books. The first book focuses on the exam and provides detailed guidance on how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
Before you decide what to collect, you need to be clear about why you are collecting it, because different goals require different data and different fidelity. Detection is about catching suspicious behavior early enough to reduce impact, which usually requires event timeliness and enough context to avoid false positives. Investigation is about reconstructing what happened, which demands coverage, consistent timestamps, and detail that supports attribution of actions to identities and hosts. Compliance is often about demonstrating control operation over time, which tends to emphasize retention, integrity, and the ability to produce specific records on request. Troubleshooting is about operational stability, and it benefits from logs that explain failures, resource constraints, and service interactions even when nothing malicious is happening. When you make these goals explicit, you stop treating logs as a generic pile of data and start treating them as an instrument panel designed for specific questions.
Once the goals are clear, the next step is deciding which sources deserve your attention first, because not all log sources contribute equally to security outcomes. High-value sources tend to be the ones that represent control points and choke points where many actions pass through. Identity systems are high value because attackers cannot do much at scale without authenticating, stealing sessions, or abusing privileges. Endpoints are high value because user devices are frequent initial access points and often show the earliest signs of execution and persistence. Servers are high value because they host business logic, data stores, and administrative interfaces that attackers target for impact. Network telemetry is high value because it shows connections and flows across boundaries, especially when host-level visibility is incomplete. Starting with these sources is not a claim that everything else is irrelevant, but it is a recognition that early wins come from instrumenting the places where attacker movement becomes observable.
Identity logging deserves special emphasis because identity is where intent often becomes visible. Authentication events let you see who logged in, from where, with what method, and whether the attempt succeeded or failed. Authorization events and privilege changes reveal whether an identity gained capabilities it did not previously have, which is often a precursor to lateral movement and data access. Session events help you understand persistence and token abuse, especially when attackers avoid repeated password use by reusing valid sessions. Administrative actions in identity platforms, such as creating accounts, changing group membership, or modifying conditional access, are particularly sensitive because they can represent stealthy control over future access. When identity logs are high quality, they also allow you to tie actions across endpoints, servers, and cloud services back to a user or service identity. That linkage is the backbone of credible investigations.
Endpoint and server logging is where you capture what actually ran, what changed, and what new footholds were created. Process start events are foundational because they reveal execution, and execution is a necessary step for most attacks beyond simple credential abuse. File events and registry or configuration events help you detect persistence mechanisms and unauthorized modifications, especially when attackers try to survive reboots or blend into normal system behavior. Service start and stop events, scheduled task changes, and installation events often reveal attempts to embed malicious tooling in a durable way. On servers, you also care about application-specific security events, especially around authentication, authorization failures, and administrative operations that change configuration or access. If you only log network connections, you may see that something talked to something, but you will often miss what the process was and what it did locally. Host logs provide that local truth, which is essential both for detection and for investigative confidence.
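If you want a concrete picture of why process lineage matters, here is a minimal Python sketch that flags an office application spawning a command interpreter, one common example of an abnormal parent-child relationship. The field names and process lists are illustrative assumptions, not a prescribed rule set.

```python
# Illustrative sketch only: field names (parent_image, image, host, user) are
# hypothetical and will differ across endpoint telemetry products.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"}
INTERPRETERS = {"powershell.exe", "cmd.exe", "wscript.exe", "mshta.exe"}

def flag_suspicious_lineage(process_events):
    """Return process start events whose parent/child pairing looks abnormal."""
    hits = []
    for event in process_events:
        parent = event.get("parent_image", "").lower()
        child = event.get("image", "").lower()
        if parent in OFFICE_PARENTS and child in INTERPRETERS:
            hits.append(event)
    return hits

# Example: a document macro launching PowerShell on a workstation
sample = [{"host": "wkstn-042", "user": "alice",
           "parent_image": "WINWORD.EXE", "image": "powershell.exe"}]
print(flag_suspicious_lineage(sample))
```

The point of the sketch is not the specific lists, which you would tune for your environment, but that this kind of detection is only possible if process start events with parent information are collected in the first place.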
Network logging adds value because it captures boundary crossings and relationships between systems that individual hosts cannot fully describe. Connections to unusual destinations, changes in typical traffic patterns, and unexpected lateral movement across segments can all be early signals of compromise. Network visibility also helps when endpoints are unmanaged, when agents fail, or when attackers tamper with host logging. That said, network data is easiest to misinterpret without context, because a single connection might be benign or malicious depending on the process, user, and timing. This is why network logs are most powerful when they can be correlated with identity and host events, rather than treated as a standalone truth source. When you decide what to log from the network, focus on the points that matter most, such as internet egress, ingress to sensitive segments, remote administration pathways, and traffic involving high-value systems. The goal is to make attacker movement expensive and visible, not to capture every packet as a default.
After you decide the sources, you define the key events that actually power detection and investigations. Authentication events matter because they establish access, and they help you spot password spraying, impossible travel patterns, unusual device sign-ins, and repeated failures followed by a success. Privilege use events matter because they show when an identity crosses from ordinary user behavior into administrative power, which is a common pivot point in real incidents. Process start events matter because they reveal the moment code executes, which is where you can spot suspicious tooling, unusual parent-child process relationships, and execution from odd locations. Change events matter because attackers make changes to persist, evade, and expand access, whether that means disabling security controls, altering configurations, or modifying access policies. The discipline is to treat these events as the skeleton of your logging strategy, then add supporting detail where it improves your ability to answer specific questions. If you cannot say which event types your detections depend on, you are likely collecting data that looks impressive but is not operationally useful.
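To make the "repeated failures followed by a success" pattern concrete, here is a minimal Python sketch of that detection. The field names, threshold, and time window are illustrative assumptions rather than a recommended configuration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical field names: user, outcome, timestamp, source_ip.
FAILURE_THRESHOLD = 5            # failures that make a later success suspicious
WINDOW = timedelta(minutes=10)   # how far back to look before each success

def suspicious_successes(auth_events):
    """Flag successful logins preceded by a burst of failures for the same user."""
    failures = defaultdict(list)  # user -> timestamps of failed attempts
    flagged = []
    for event in sorted(auth_events, key=lambda e: e["timestamp"]):
        ts, user = event["timestamp"], event["user"]
        if event["outcome"] == "failure":
            failures[user].append(ts)
        elif event["outcome"] == "success":
            recent = [t for t in failures[user] if ts - t <= WINDOW]
            if len(recent) >= FAILURE_THRESHOLD:
                flagged.append({"user": user, "time": ts,
                                "recent_failures": len(recent),
                                "source_ip": event.get("source_ip")})
    return flagged

# Example usage with synthetic events: six failures, then a success
events = (
    [{"user": "alice", "outcome": "failure",
      "timestamp": datetime(2024, 1, 1, 9, 0) + timedelta(minutes=i)} for i in range(6)]
    + [{"user": "alice", "outcome": "success", "source_ip": "203.0.113.7",
        "timestamp": datetime(2024, 1, 1, 9, 7)}]
)
print(suspicious_successes(events))
```

Notice that the detection depends entirely on two things the episode calls out: authentication events with outcome codes, and timestamps you can trust.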
A practical way to test your choices is to walk through a phishing-to-compromise scenario and ask what logs you would need at each step. You start with the initial lure, where email telemetry and endpoint events help you see whether a malicious attachment was opened or a link led to a suspicious download. You then move to execution, where process start logs and script interpreter activity help you determine whether code ran and what it attempted to do. Next comes credential access and lateral movement, where identity logs show abnormal authentication attempts and privilege use, and where network telemetry shows unexpected internal connections. Finally, you reach impact, which might include data access, encryption, or service disruption, where server and application logs reveal what data was touched and what operations were performed. When you do this exercise honestly, you will usually find at least one gap that would slow your investigation or force you to guess. That is the kind of gap you want to fix before the real incident, not during it.
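One way to run this exercise on paper is to write the scenario down as data and check it against what you actually collect. The stage and source names in the following Python sketch are hypothetical groupings for illustration, not a standard taxonomy.

```python
# Hypothetical mapping of attack stages to the log sources that answer them.
PHISHING_SCENARIO_COVERAGE = {
    "initial lure": ["email gateway telemetry", "endpoint download and file events"],
    "execution": ["process start logs", "script interpreter activity"],
    "credential access and lateral movement": [
        "identity authentication and privilege events",
        "internal network flow telemetry",
    ],
    "impact": ["server and application access logs", "data store audit events"],
}

def find_gaps(available_sources):
    """Return the stages where none of the expected sources are collected."""
    return [stage for stage, sources in PHISHING_SCENARIO_COVERAGE.items()
            if not any(s in available_sources for s in sources)]

# Example: an environment with only process and identity telemetry
print(find_gaps({"process start logs",
                 "identity authentication and privilege events"}))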
The most common logging failure is attempting to log everything until storage collapses and teams drown in noise. Unlimited logging sounds safe, but it often creates the opposite outcome, because critical signals become harder to find and the operational cost becomes unsustainable. When storage costs spike, organizations tend to respond by shutting off large categories of logs abruptly, which creates blind spots and breaks detections. Even when storage is available, excessive verbosity can overwhelm ingestion pipelines, introduce latency, and degrade the usability of your detection platform. The right goal is not maximum volume; it is maximum value per unit of cost and attention. This is why prioritization matters, and it is why you should expect to make tradeoffs based on risk. A mature approach treats logging as a product you operate, with performance constraints and quality expectations, rather than as a one-time configuration task.
A useful quick win is defining a minimum viable log set for each asset class, because it turns logging from an abstract ideal into a concrete baseline. For identity systems, that minimum set typically includes authentication outcomes, privilege changes, and administrative actions that affect access. For endpoints, it often includes process start telemetry, security control status, and key persistence-related changes that indicate tampering or footholds. For servers, it generally includes authentication and authorization events, administrative changes, and application security-relevant events for the services hosted there. For network layers, it often includes boundary telemetry and key flows that indicate ingress, egress, and lateral movement around sensitive areas. The value of a minimum set is that it creates a floor you can defend, even when budgets are tight or environments are in flux. Once the minimum set is stable, you can iterate upward based on observed gaps, detection needs, and incident lessons.
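Expressing the minimum set as data makes it reviewable and testable against what each system actually emits. The event type names in the sketch below are hypothetical placeholders, not a canonical baseline.

```python
# A hypothetical minimum viable log set per asset class, expressed as data so it
# can be versioned, reviewed, and checked against what each system provides.
MINIMUM_LOG_SET = {
    "identity": ["authentication_outcome", "privilege_change", "admin_action"],
    "endpoint": ["process_start", "security_control_status", "persistence_change"],
    "server":   ["authentication_outcome", "authorization_failure",
                 "admin_change", "application_security_event"],
    "network":  ["internet_egress_flow", "sensitive_segment_ingress",
                 "lateral_movement_flow"],
}

def missing_events(asset_class, observed_event_types):
    """List baseline event types an asset class is not yet providing."""
    required = MINIMUM_LOG_SET.get(asset_class, [])
    return [e for e in required if e not in observed_event_types]

# Example: an endpoint fleet that currently only ships process start telemetry
print(missing_events("endpoint", {"process_start"}))
```

A simple check like this turns "do we have the minimum set" from an opinion into something you can measure per system and report over time.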
To make logs useful, you also need context fields that allow you to correlate events reliably across sources. User identity is central, and it should be consistent enough that you can tie an endpoint action back to a login event and then to a server access event. Host identity matters too, including stable identifiers for machines and services, so that renames, scaling events, and ephemeral instances do not break your ability to follow a timeline. Time synchronization is often underestimated, but inconsistent clocks can turn an investigation into a confusing puzzle where cause and effect appear reversed. Network Time Protocol (N T P) is a common foundation for aligning timestamps across systems, but the principle is broader than one technology, because the outcome you need is consistent, trustworthy time. Outcome codes and status fields also matter because they allow you to distinguish success from failure and expected behavior from anomalous behavior. Without these context fields, you may have logs, but you will not have answers.
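Here is a minimal Python sketch of why those context fields matter, normalizing timestamps to UTC and tying an endpoint action back to the login that preceded it for the same user. The field names, offsets, and correlation window are illustrative assumptions.

```python
from datetime import datetime, timezone, timedelta

def to_utc(ts, offset_hours=0):
    """Normalize a naive local timestamp to UTC; real pipelines rely on synced clocks."""
    return (ts - timedelta(hours=offset_hours)).replace(tzinfo=timezone.utc)

def correlate(identity_events, endpoint_events, window=timedelta(minutes=30)):
    """Pair each endpoint event with the most recent login by the same user."""
    pairs = []
    for ep in endpoint_events:
        logins = [ide for ide in identity_events
                  if ide["user"] == ep["user"]
                  and timedelta(0) <= ep["timestamp"] - ide["timestamp"] <= window]
        if logins:
            latest = max(logins, key=lambda e: e["timestamp"])
            pairs.append((latest, ep))
    return pairs

# Example: a VPN login recorded in UTC, a process start recorded in local time
identity = [{"user": "alice", "source": "vpn",
             "timestamp": to_utc(datetime(2024, 1, 1, 10, 0))}]
endpoint = [{"user": "alice", "host": "wkstn-042", "action": "process_start",
             "timestamp": to_utc(datetime(2024, 1, 1, 12, 5), offset_hours=2)}]
print(correlate(identity, endpoint))
```

If the endpoint clock were two hours off and never normalized, the same join would silently fail, which is the investigative dead end the paragraph above describes. In practice the join key would also include host and session identifiers, which is exactly why stable identifiers matter.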
It can be sobering, but useful, to mentally rehearse investigating an incident without the right logs and feel where the investigation stalls. Imagine you suspect an account was compromised, but you cannot tell where it authenticated from, whether multi-factor was satisfied, or whether privileges were elevated. Imagine you see suspicious network connections, but you cannot tie them to a process on the host, so you cannot tell whether it was a browser, a management tool, or malware. Imagine you suspect persistence, but you cannot see configuration changes or scheduled task creation events, so you do not know what to remove. In each case, the absence of logs does not just slow you down, it forces you into risky decisions, such as isolating large parts of the environment or assuming compromise scope is broader than it is. The fix is not to collect everything, but to target the specific missing questions and add the event types and context fields that answer them. This kind of rehearsal is one of the fastest ways to improve logging because it ties the work directly to real investigative pain.
A good memory anchor is to log what answers your top questions, because questions are what drive both detection and investigation. Your top questions are typically about who did what, where they did it, when they did it, and whether it succeeded. You also need to answer how it happened, which often requires process lineage, session context, or the relationship between a change and the actor who initiated it. When you frame logging around questions, you avoid collecting data that has no operational purpose, and you build a library of evidence that supports decisions. This anchor also helps you explain logging priorities to non-security stakeholders, because you can connect a log source to a concrete outcome, such as confirming whether an administrative change occurred or proving whether sensitive data was accessed. Over time, your top questions will change as attackers adapt, your environment evolves, and systems change, and your logging strategy should evolve with them rather than remaining static.
Retention is where logging meets reality, because even the best events are useless if they are gone when you need them. Retention should be set based on investigation needs, threat dwell time expectations, and any legal or regulatory requirements that apply to your organization. Short retention windows may be acceptable for very verbose operational logs, but security-relevant logs often need longer horizons to support investigations that begin weeks after initial compromise. Longer retention also supports trend analysis and threat hunting, but it must be balanced against cost, privacy considerations, and data minimization principles. Integrity and access control also matter for retained logs, because logs are only trustworthy if they are protected from tampering and unauthorized viewing. A common operational pattern is tiered retention, where high-fidelity searchable data is kept for a shorter window and longer-term storage is kept in a cheaper, less interactive form, but the important point is to decide intentionally rather than defaulting. Retention is a policy decision with real security implications, not merely a storage configuration.
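If you want to make retention decisions explicit rather than leaving them to storage defaults, one simple option is to capture them as data per log type. The day counts and log type names below are illustrative assumptions, not recommendations.

```python
# Hypothetical tiered retention policy: a shorter "hot" searchable window plus a
# longer archive window, decided per log type instead of inherited from defaults.
RETENTION_POLICY = {
    "identity_security_events":   {"hot_days": 90, "archive_days": 365},
    "endpoint_process_telemetry": {"hot_days": 30, "archive_days": 180},
    "network_flow_records":       {"hot_days": 14, "archive_days": 90},
    "verbose_debug_logs":         {"hot_days": 7,  "archive_days": 0},
}

def retention_for(log_type):
    """Return the tiered retention for a log type, or None if no decision exists."""
    return RETENTION_POLICY.get(log_type)

# Example: confirm the searchable and archive windows for identity events
print(retention_for("identity_security_events"))
```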
At this stage, it is helpful to name three events you always want logged, because it forces clarity about what your program considers non-negotiable. Authentication success and failure events are almost always on that short list, because they are the front door to most systems. Privilege changes or administrative actions are usually next, because they represent a shift in power that often precedes larger compromise. Process start events, especially on endpoints and key servers, are often the third, because they reveal execution and can help you catch malicious tooling and suspicious behavior patterns. These are not the only important events, but they are a practical core that supports many detection and investigation paths. If your environment cannot reliably provide these events, you should treat that as a strategic gap, not a minor inconvenience. Clarity here also helps you prioritize work, because you can measure whether your minimum set is actually present and usable across the systems that matter.
To turn this into action, pick one critical system and improve logging coverage there, because targeted improvements build momentum and teach the organization what good looks like. Choose a system that has high exposure, high privilege, or high business impact, such as a remote access gateway, an identity platform component, a key application server, or a sensitive data store. Validate what logs you currently receive from it, whether the timestamps align with the rest of your environment, and whether the events include enough context to answer your top questions. Then close the most painful gap first, which might be adding authentication events, enabling process telemetry, improving change auditing, or ensuring outcome codes are captured consistently. After the improvement, test it by walking through a realistic scenario and confirming you can reconstruct what happened with confidence. One well-instrumented critical system can serve as a template for scaling the same approach to other systems, because it gives teams a concrete example of sustainable, high-value logging.
To conclude, deciding what to log and why is a design problem, not a storage problem, and strong designs start with clear goals and high-value sources. You identify whether you are optimizing for detection, investigation, compliance, troubleshooting, or a deliberate balance, and you accept that different goals require different event types and context. You start with identity, endpoints, servers, and network telemetry because they form the core visibility needed to understand access, execution, and movement. You define key events around authentication, privilege use, process starts, and changes, and you pressure-test your choices with realistic attack paths such as phishing-to-compromise. You avoid the trap of logging everything by committing to a minimum viable set per asset class, enriching it with context fields like user identity, host identity, synchronized timestamps, and outcome codes. You set retention based on investigative and legal needs, and you keep the strategy grounded in the questions you must be able to answer under pressure. Then you document that minimum set so it can be implemented consistently and reviewed over time, because durable logging is built through disciplined priorities, not through accidental accumulation.