Episode 33 — Design network visibility that matters: telemetry selection and baseline behavior modeling

In this episode, we design network visibility with a very practical goal in mind: detect threats before damage spreads, not after the business is already paying the price. Visibility is not the same thing as data volume, and the organizations that struggle most often have plenty of telemetry but cannot answer simple questions quickly under pressure. Effective visibility is engineered, which means you select the right sources, place sensors where they matter, enrich events with context, and model baseline behavior so abnormal patterns stand out. The network is a powerful observation layer because it sees boundary crossings and relationships between systems, especially when host telemetry is incomplete or delayed. At the same time, network telemetry can be noisy and ambiguous if it is collected without purpose and without baselining. The objective is to build a visibility design that supports detection, investigation, and fast containment decisions in a way that stays sustainable. When you get this right, your defenders stop guessing and start proving what happened with evidence that is easy to pivot on.

Before we continue, a quick note: this audio course is a companion to our two course books. The first book covers the exam in detail and explains how best to pass it. The second book is a Kindle-only eBook containing 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

The first design decision is selecting telemetry sources that provide complementary views rather than redundant noise. Network flow data is valuable because it shows who talked to whom, how often, and how much, which is often enough to see lateral movement patterns and suspicious egress behavior. Domain Name System (D N S) telemetry is valuable because it reveals what hosts are trying to resolve, which often exposes command and control infrastructure, phishing follow-on activity, and malware staging domains. Proxy telemetry is valuable because it captures web access behavior with richer context, including URLs and user-agent details, and it can show suspicious downloads and access to known bad destinations. Firewall telemetry is valuable because it shows allowed and blocked boundary crossings, which can reveal scanning, policy violations, and unexpected exposure changes. Endpoint Detection and Response (E D R) telemetry is valuable because it ties network activity to processes and users on the host, turning ambiguous connections into explainable behavior. The important principle is that each source answers different kinds of questions, and good visibility designs combine them so you can correlate quickly rather than chasing one noisy signal.
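
To make the idea of complementary sources concrete, here is a minimal Python sketch, not from the episode, that maps each telemetry source to the investigative questions it can answer and reports which core questions remain uncovered; the source names, question phrasings, and the coverage_gaps helper are illustrative assumptions rather than a standard taxonomy.

```python
# A coverage check: map each telemetry source to the investigative
# questions it can answer, then verify the core questions are covered.
# All source and question names here are illustrative, not a standard.

SOURCE_ANSWERS = {
    "flow":     {"who talked to whom", "how much data moved"},
    "dns":      {"what names were resolved"},
    "proxy":    {"what URLs were accessed", "what was downloaded"},
    "firewall": {"what boundary crossings were allowed or blocked"},
    "edr":      {"which process and user initiated the connection"},
}

CORE_QUESTIONS = {
    "who talked to whom",
    "what names were resolved",
    "which process and user initiated the connection",
    "what boundary crossings were allowed or blocked",
}

def coverage_gaps(deployed_sources):
    """Return the core questions that no deployed source can answer."""
    answered = set()
    for source in deployed_sources:
        answered |= SOURCE_ANSWERS.get(source, set())
    return CORE_QUESTIONS - answered

# Example: an environment with only flow and firewall telemetry
# still cannot answer the DNS or process-attribution questions.
print(coverage_gaps(["flow", "firewall"]))
```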

Each telemetry source also has strengths and weaknesses that influence where you invest first. Flow data scales well and is relatively stable, but it can be limited in attribution because it does not inherently tell you which user or process initiated a connection. D N S data is often a high-signal indicator of compromise, but it can be confounded by resolvers, split-horizon configurations, and encrypted name resolution patterns depending on your environment. Proxy data can be very informative for user-driven activity, but it may not cover server-to-server communications or traffic that bypasses the proxy. Firewall logs are critical for boundary decisions, but they can be voluminous and can hide important anomalies if you do not focus on the right zones and rules. E D R provides deep context, but it requires consistent deployment, stable policies, and tuning to avoid drowning in endpoint events. Recognizing these tradeoffs helps you build a layered visibility design where one source compensates for another’s blind spots. The goal is not to pick favorites; the goal is to ensure your primary incident questions can be answered even when one sensor degrades.

Visibility should be prioritized along critical paths and internet-facing services because those are where time-to-detect matters most and where attacker opportunity is highest. Internet-facing services are exposed to broad scanning and exploitation, so you want strong observability around ingress patterns, authentication behavior, and egress from those service environments. Critical paths include administrative access pathways, identity systems, remote access gateways, and key application dependencies such as database access and third-party integrations. If an attacker compromises a low-value workstation, the true risk comes from whether they can pivot into these critical paths. By concentrating visibility on critical paths, you increase the chance you detect lateral movement early, when containment is easier and the blast radius is smaller. This prioritization also prevents you from spending most of your telemetry budget on low-impact segments while leaving high-impact paths under-instrumented. A mature visibility design starts with the places where one mistake becomes many, and it expands outward as confidence and capacity grow.

Prioritization also helps you place sensors where they will observe meaningful boundaries rather than just producing more internal noise. An egress boundary is a classic high-value sensor location because most malware needs to communicate out to infrastructure, download additional payloads, or exfiltrate data. Administrative networks and jump paths are high-value locations because legitimate traffic there is relatively constrained, making anomalies easier to spot. Segmentation boundaries between user networks and server networks are high-value because lateral movement across those boundaries is often a significant incident stage. Identity infrastructure paths are high-value because authentication and privilege events correlate strongly with real compromise. By contrast, placing sensors in overly broad internal segments without defined boundaries can create high volumes of telemetry with weak interpretability. The aim is to watch the transitions that matter and the choke points that define your security architecture. When you treat sensor placement as an architectural decision, you get more detection value per unit of data.

Baselining is what converts raw telemetry into a model of normal, and normal is what you need to identify abnormal behavior reliably. A baseline should be built by segment and time, because behavior differs between business hours and off hours, between weekdays and weekends, and between production segments and office networks. Baselines also differ by role, such as servers that talk to a narrow set of dependencies versus workstations that browse widely. If you build one global baseline for the entire network, it will be too broad to be useful and will generate either constant false positives or constant silence. A good baseline captures typical communication pairs, typical volumes, typical destination categories, and typical authentication and administrative patterns within each segment. It should also account for planned periodic events like backups, patch cycles, and batch jobs so those patterns do not appear as recurring anomalies. The goal of baselining is not to eliminate all surprises; it is to define what should not be surprising so that real surprises stand out. When baselines are credible, anomaly detection becomes practical rather than performative.
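
As a rough illustration of baselining by segment and time, the following Python sketch summarizes typical communication pairs, destinations, and volumes per segment and coarse time bucket; the flow record fields (segment, src, dst, bytes_out, timestamp) and the bucket boundaries are assumptions chosen for the example, not a prescribed schema.

```python
from collections import defaultdict
from datetime import datetime

# Build a simple per-segment, per-time-bucket baseline from flow records.
# A flow record here is a dict with hypothetical keys: segment, src, dst,
# bytes_out, timestamp. Field names are illustrative, not a standard schema.

def time_bucket(ts: datetime) -> str:
    """Coarse bucket: weekday/weekend plus business/off hours."""
    day = "weekend" if ts.weekday() >= 5 else "weekday"
    hours = "business" if 8 <= ts.hour < 18 else "offhours"
    return f"{day}-{hours}"

def build_baseline(flows):
    """Summarize typical peers, destinations, and volume per (segment, bucket)."""
    baseline = defaultdict(lambda: {"pairs": set(), "destinations": set(),
                                    "total_bytes": 0, "flow_count": 0})
    for f in flows:
        key = (f["segment"], time_bucket(f["timestamp"]))
        entry = baseline[key]
        entry["pairs"].add((f["src"], f["dst"]))
        entry["destinations"].add(f["dst"])
        entry["total_bytes"] += f["bytes_out"]
        entry["flow_count"] += 1
    return baseline
```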

A helpful exercise is identifying abnormal behavior from a simple baseline shift, because not every useful detection requires complex models. For example, if a server segment normally makes few outbound connections and suddenly starts making frequent external connections to new destinations, that shift is worth investigating. If an administrative network normally accesses a small set of management interfaces and suddenly reaches a wider set of internal hosts, that shift may indicate credential misuse or a compromised admin station. If a workstation segment normally resolves a stable set of domains and suddenly begins resolving many newly registered or high-entropy domains, that shift can indicate malware command and control behavior. These simple shifts are often more actionable than elaborate analytics because they map to intuitive investigative steps. They also tend to be resilient against minor environmental change because they focus on directional deviation rather than precise thresholds. The key is to tie baseline shifts to meaningful hypotheses about what could be happening, so the response is not simply curiosity but a structured investigation path.
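
The simple shifts described above can be checked with equally simple logic. The sketch below, under assumed inputs, flags new external destinations, an outbound volume spike relative to baseline, and high-entropy domain labels; the thresholds are placeholders to tune per segment, and the entropy heuristic is only a rough hint of generated domains.

```python
import math
from collections import Counter

# Flag simple baseline shifts: new external destinations, an outbound
# volume spike, and resolution of high-entropy domain names.
# Thresholds are placeholders to tune per segment, not recommended values.

def shannon_entropy(label: str) -> float:
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def new_destinations(current_dsts, baseline_dsts):
    """Destinations seen now that were never seen in the baseline window."""
    return set(current_dsts) - set(baseline_dsts)

def volume_spike(current_bytes, baseline_bytes, factor=5.0):
    """True when outbound volume is well above the baseline for this bucket."""
    return baseline_bytes > 0 and current_bytes > factor * baseline_bytes

def suspicious_domains(resolved, entropy_threshold=3.5):
    """High-entropy leftmost labels are a rough hint of generated domains."""
    return [d for d in resolved
            if shannon_entropy(d.split(".")[0]) > entropy_threshold]
```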

A major pitfall is collecting telemetry without clear investigative questions, because that leads to expensive data lakes that still leave responders guessing during incidents. Visibility should be driven by questions you must answer quickly, such as what accessed this system, where did this identity authenticate from, what did this host connect to, and what changed right before the incident escalated. If you cannot name the questions, you cannot justify the telemetry, and you cannot tune or prioritize effectively. This pitfall also leads to wasted effort in parsing and indexing fields that are rarely used while missing key context fields that would have made investigations faster. Another pitfall is mistaking compliance logging for detection logging, because compliance often focuses on retention and completeness while detection requires timeliness, correlation, and high-signal context. You need both, but you should not confuse their purposes. When telemetry is question-driven, it becomes easier to evaluate value, reduce noise, and defend investments. The system becomes designed rather than accumulated.

A practical quick win is focusing first on identity, egress, and administrative paths, because these areas often yield the highest detection value quickly. Identity provides the actor view and shows how access is gained and expanded. Egress provides the communication view and often reveals command and control, staging, and exfiltration attempts. Administrative paths provide the high-impact control plane view, where misuse can create broad compromise and where legitimate activity is constrained enough to baseline well. By concentrating on these three areas, you can build a detection posture that catches many common incident patterns even before you instrument every internal segment. This approach also creates a natural correlation triangle, where an identity event can be linked to an endpoint process and then to egress behavior, building confidence quickly. It is also operationally manageable, because you can focus tuning and validation on a smaller set of high-value sensors. This is how you avoid boiling the ocean while still improving outcomes meaningfully.

Enriching telemetry with asset context and ownership fields is what turns network data from ambiguous signals into actionable cases. Asset context includes what the system is, what environment it belongs to, what criticality tier it has, and what role it plays in the architecture. Ownership fields include which team is responsible for the asset and who should be contacted during incidents or for remediation. With enrichment, an unusual egress pattern from a production identity service is immediately more urgent than the same pattern from a development workstation. With enrichment, you can route alerts to the right team quickly rather than spending precious time figuring out who owns the system. Enrichment also improves baselining because different asset classes have different normal behaviors, and baselines should reflect those differences. The goal is to make the telemetry self-describing enough that responders can act with confidence and speed. Without enrichment, investigations turn into detective work about basic facts instead of focusing on the incident.
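
Here is a minimal sketch of enrichment, assuming a small in-memory asset inventory standing in for a real CMDB: the alert is joined with role, environment, criticality, and owner so that severity and routing reflect the asset. The inventory entries, field names, and the severity rule are all illustrative.

```python
# Enrich a raw network alert with asset context and ownership so severity
# and routing reflect what the asset is. The inventory and field names
# are hypothetical; a real source would be a CMDB or asset database.

ASSET_INVENTORY = {
    "10.20.5.11": {"role": "identity-service", "environment": "production",
                   "criticality": "tier-1", "owner": "identity-team"},
    "10.44.9.73": {"role": "developer-workstation", "environment": "dev",
                   "criticality": "tier-3", "owner": "it-support"},
}

def enrich_alert(alert: dict) -> dict:
    asset = ASSET_INVENTORY.get(alert["src_ip"], {})
    enriched = {**alert, **asset}
    # Simple severity rule: the same behavior matters more on tier-1 assets.
    enriched["severity"] = "high" if asset.get("criticality") == "tier-1" else "medium"
    enriched["route_to"] = asset.get("owner", "triage-queue")
    return enriched

# The same unusual egress pattern becomes high severity and routes to the
# identity team when it comes from the production identity service.
print(enrich_alert({"src_ip": "10.20.5.11", "behavior": "unusual egress"}))
```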

A useful mental rehearsal is tracing an incident path using visibility data, because it shows whether your telemetry design supports real investigative flow. Imagine a suspicious authentication occurs, and you want to know which host it came from and what that host did next. You pivot from identity logs to endpoint telemetry to identify the process responsible, then you pivot to network flows to see internal movement and external communication. You check D N S and proxy logs to see what domains were resolved and which URLs were accessed, and you confirm whether firewalls allowed or blocked unusual connections. You also consult baseline models to determine whether this pattern is new for the segment and time of day, and whether it resembles known benign periodic jobs. If your telemetry design supports this pivoting smoothly, you will be able to scope and contain incidents faster. If you find that pivots break due to missing fields, inconsistent timestamps, or gaps in sensor coverage, you have found the next design improvement to prioritize. Rehearsal turns your design into a tested workflow rather than an abstract architecture.
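
The pivot sequence can be sketched as a single function whose steps mirror that rehearsal. Everything below is hypothetical: the log-store objects and their methods stand in for whatever query interfaces your own tooling exposes, and only the order of pivots is the point.

```python
# A sketch of the pivot flow described above, with each step as a
# hypothetical query against your own log stores. Method names and
# stores are placeholders; the pivot order is what matters.

def trace_incident(auth_event, identity_logs, edr, flows, dns, proxy, baseline):
    host = identity_logs.source_host(auth_event)        # identity -> source host
    process = edr.process_for_logon(host, auth_event)   # host -> responsible process
    peers = flows.peers(host, window="1h")              # internal movement and egress
    resolved = dns.queries(host, window="1h")           # domains resolved
    urls = proxy.requests(host, window="1h")            # URLs accessed
    is_new = baseline.is_novel(host, peers, resolved)   # new for this segment and time?
    return {"host": host, "process": process, "peers": peers,
            "resolved": resolved, "urls": urls, "novel": is_new}
```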

A memory anchor that keeps visibility design honest is that questions drive telemetry, not the other way around. It is tempting to let tool capabilities dictate what you collect, but tools will always offer more data than you can use well. When you start with questions, you can justify sources, you can prioritize sensor placement, and you can define what fields and enrichment are required. Questions also help you tune baselines because you know what patterns matter and what deviations are meaningful. This anchor is especially helpful during budget and storage discussions, because it reframes the conversation away from how much data you can store and toward what decisions you need to make quickly during incidents. It also helps during operations, because when someone asks why a source is being collected, you can answer with the question it supports. Over time, question-driven design produces a smaller, higher-quality telemetry set that improves detection outcomes more than raw volume ever will. The most effective visibility programs are those that can explain their design choices clearly.

Sensors and telemetry pipelines must be validated regularly so gaps do not silently grow, because drift and failure are inevitable in complex environments. A sensor can stop sending due to credential expiration, network path changes, certificate issues, or rate limits, and these failures often look like quiet absence rather than a loud error. Parsing changes and schema drift can also degrade usefulness, where events still arrive but key fields are missing or mis-mapped. Validation should confirm that critical sources are still present, that key event types are still being captured, and that normalized fields remain populated correctly. It should also confirm timeliness, because delayed telemetry may be acceptable for audits but not for rapid detection. Regular validation turns visibility into a reliable capability rather than a hopeful assumption. When validation is routine, you find problems early and fix them before an incident forces you to discover missing evidence.
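
One way to make validation routine is a small check per source: is it still sending, is it arriving on time, and are key fields still populated? The sketch below assumes events arrive as dicts with a timestamp field; the source names, required-field lists, lag threshold, and population ratio are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Routine validation of telemetry pipelines: is each critical source still
# sending, are key fields still populated, and is data arriving on time?
# Source names, field lists, and thresholds are illustrative assumptions.

REQUIRED_FIELDS = {
    "dns":      ["query", "client_ip", "timestamp"],
    "firewall": ["src_ip", "dst_ip", "action", "timestamp"],
}
MAX_LAG = timedelta(minutes=15)

def validate_source(name, recent_events, now=None):
    """Return a list of problems found for one telemetry source."""
    now = now or datetime.utcnow()
    problems = []
    if not recent_events:
        problems.append(f"{name}: no events received")
        return problems
    newest = max(e["timestamp"] for e in recent_events)
    if now - newest > MAX_LAG:
        problems.append(f"{name}: newest event is {now - newest} old")
    for field in REQUIRED_FIELDS.get(name, []):
        populated = sum(1 for e in recent_events if e.get(field) not in (None, ""))
        if populated / len(recent_events) < 0.95:   # tolerate a little noise
            problems.append(f"{name}: field '{field}' under-populated")
    return problems
```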

At this point, you should be able to name three visibility questions you always ask, because these questions define the minimum viability of your network observability. You always want to know who initiated the activity, which means you need identity attribution and role context. You always want to know where the activity went, which means you need destination context such as internal segment, external address, and domain resolution patterns. You always want to know what changed around the time of the activity, which means you need administrative and configuration change telemetry and a baseline model that tells you whether the pattern is new for that segment and time. These questions are broadly applicable across most incidents and support rapid scoping. When you can state them clearly, you can also evaluate whether your telemetry design answers them reliably. If any question cannot be answered quickly, that is a design gap you should treat as high priority.

To turn design into progress, pick one segment to baseline this week and treat it as a focused project rather than as an indefinite aspiration. Choose a segment that matters, such as an administrative network, an internet-facing service zone, or a production application segment with clear boundaries. Gather flow, D N S, firewall, proxy, and E D R perspectives for that segment where available, and build a simple baseline of typical communication pairs, typical destinations, and typical time-based patterns. Then define what you would consider abnormal in that segment, such as new external destinations, new high-volume egress, new cross-segment access, or unusual authentication from that segment. Validate the baseline by reviewing a small sample of recent data and confirming that expected periodic jobs are accounted for and that the baseline does not flag normal activity constantly. The goal is a baseline that is stable enough to support detection and triage, not a perfect model. Once one segment is baselined successfully, you have a pattern you can repeat for the next segment.

To conclude, designing network visibility that matters is about selecting the right telemetry sources, prioritizing the right places, and modeling normal behavior so abnormal patterns become detectable and actionable. You choose complementary sources such as flows, D N S, proxy, firewall, and E D R so you can correlate actor, host behavior, and network movement quickly. You prioritize critical paths and internet-facing services because that is where time-to-detect matters most and where attacker opportunity is highest. You build baselines by segment and time so you can recognize meaningful shifts, and you practice spotting abnormal behavior from simple deviations that map to clear investigative hypotheses. You avoid collecting telemetry without investigative questions, and you use the quick win focus on identity, egress, and administrative paths to gain high-value visibility early. You enrich telemetry with asset context and ownership so incidents route quickly and severity reflects business reality. Then you validate sensor coverage today, because the best visibility design fails if sensors drift or gaps silently grow, and reliable detection depends on evidence that is consistently present when you need it most.
