Firewalls and Intrusion Detection Systems
David Brumley
Carnegie Mellon University

IDS and Firewall Goals
Expressiveness: What kinds of policies can we write?
Effectiveness: How well does it detect attacks while avoiding false positives?
Efficiency: How many resources does it take, and how quickly does it decide?
Ease of use: How much training is necessary? Can a non-security expert use it?
Security: Can the system itself be attacked?
Transparency: How intrusive is it to use?

Firewalls
Dimensions:
1. Host vs. Network
2. Stateless vs. Stateful
3. Network Layer

Firewall Goals
Provide defense in depth by:
1. Blocking attacks against hosts and services
2. Controlling traffic between zones of trust

Logical Viewpoint
[Figure: a firewall sits between Inside and Outside and decides what to do with each message m]
For each message m, either:
Allow, with or without modification
Block, by dropping or by sending a rejection notice
Queue

Placement
Host-based firewall
[Figure: Host, Firewall, Outside]
Features:
Faithful to local configuration
Travels with you

Network-based firewall
[Figure: Host A, Host B, Host C, Firewall, Outside]
Features:
Protects the whole network
Can make decisions on all of the traffic (traffic-based anomaly)

Parameters
Types of firewalls:
1. Packet filtering
2. Stateful inspection
3. Application proxy
Policies:
1. Default allow
2. Default deny

Recall: Protocol Stack
Application (e.g., SSL)
Transport (e.g., TCP, UDP)
Network (e.g., IP)
Link layer (e.g., Ethernet)
Physical
[Figure: encapsulation. The application message becomes TCP data behind a TCP header; the network layer prepends an IP header; the link layer adds an Ethernet header and an Ethernet trailer.]

Stateless Firewall
e.g., ipchains in Linux 2.2
[Figure: the firewall sits between Outside and Inside and works at the network and transport layers, above the link layer and below the application layer]
Filter by packet header fields:
1. IP fields (e.g., src, dst)
2. Protocol (e.g., TCP, UDP, ...)
3. Flags (e.g., SYN, ACK)
Fail-safe is good practice.
Example: only allow incoming DNS packets to nameserver A.A.A.A.
Allow UDP port 53 to A.A.A.A
Deny UDP port 53 all

Need to keep state
Example: TCP handshake
Desired policy: every SYN/ACK must have been preceded by a SYN.
[Figure: three-way handshake crossing the firewall between Inside and Outside]
SYN: SNc = randC, ANc = 0
SYN/ACK: SNs = randS, ANs = SNc
ACK: SN = SNc + 1, AN = SNs
Endpoint states: Listening, Store SNc and SNs, Wait, Established

Stateful Inspection Firewall
e.g., iptables in Linux 2.4
[Figure: the same layered view as the stateless firewall, with a State table attached to the firewall]
Added state (plus the obligation to manage it):
Timeouts
Size of table

Stateful: More Expressive
Example: TCP handshake
On the SYN: record SNc in the table.
On the SYN/ACK: verify ANs against the table.
[Figure: the same three-way handshake, with the firewall recording SNc and checking ANs]
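
The two policies above can be sketched in a few lines of Python. This is a minimal illustration, not how ipchains or iptables are implemented: the `Packet` type, the `RULES` tuple layout, the `HandshakeTracker` class, and the `max_entries` bound are all hypothetical names chosen for this sketch, and the SYN/ACK check follows the slide's convention that ANs equals the recorded SNc.

```python
from dataclasses import dataclass

NAMESERVER = "A.A.A.A"  # placeholder address taken from the DNS example


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str                      # "tcp" or "udp"
    dport: int
    flags: frozenset = frozenset()  # e.g. frozenset({"SYN", "ACK"})
    seq: int = 0
    ack: int = 0


# Stateless policy: first matching rule wins; None matches any destination.
RULES = [
    ("allow", "udp", 53, NAMESERVER),
    ("deny", "udp", 53, None),
]


def stateless_decision(pkt, rules=RULES):
    """Decide from header fields alone; unmatched packets hit the default deny."""
    for action, proto, dport, dst in rules:
        if pkt.proto == proto and pkt.dport == dport and dst in (None, pkt.dst):
            return action
    return "deny"


class HandshakeTracker:
    """Stateful policy: an inbound SYN/ACK must answer a recorded outbound SYN."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries  # hypothetical bound on the table size
        self.pending = {}               # (client, server) -> SNc

    def outbound(self, pkt):
        if pkt.flags == {"SYN"}:
            if len(self.pending) >= self.max_entries:
                return "deny"           # table full: refuse to hold more state
            self.pending[(pkt.src, pkt.dst)] = pkt.seq  # record SNc
        return "allow"

    def inbound(self, pkt):
        if pkt.flags == {"SYN", "ACK"}:
            key = (pkt.dst, pkt.src)
            # Slide convention: ANs must equal the recorded SNc.
            if key not in self.pending or self.pending[key] != pkt.ack:
                return "deny"
            del self.pending[key]
        return "allow"
```

The sketch omits timeouts, so entries for handshakes that never complete stay in the table until it fills. That is the management obligation the slide refers to.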

State Holding Attack
Assume a stateful TCP policy.
[Figure: Inside, Firewall, Attacker; the attacker sends SYN after SYN at the firewall]
1. SYN flood
2. Exhaust resources
3. Sneak packet

Fragmentation
[Figure: a datagram's data split into Frag 1, Frag 2, and Frag 3, each behind its own IP header, next to the IP header layout (Ver, IHL, TOS, Total Length; ID, the flag bits 0, DF, MF, and Frag ID)]
Say each fragment carries n bytes:
Frag 1: DF=0, MF=1, ID=0
Frag 2: DF=0, MF=1, ID=n
Frag 3: DF=1, MF=0, ID=2n
DF: Don't fragment (0 = May, 1 = Don't)
MF: More fragments (0 = Last, 1 = More)
Frag ID = octet number (the real IP fragment offset field counts in units of 8 octets)

Reassembly
[Figure: the receiver places Frag 1, Frag 2, and Frag 3 at byte offsets 0, n, and 2n to rebuild the data]
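
How the length, offset, and MF fields are filled in when a datagram is split can be sketched as follows. This is an illustrative calculation only: the function name `fragment` and its tuple layout are assumptions made for this sketch, it assumes a fixed IP header with no options, and it ignores the DF bit.

```python
def fragment(total_len, mtu, ip_hdr=20):
    """Split an IPv4 datagram of total_len bytes for a link with the given MTU.

    Returns a list of (fragment_length, offset_in_8_byte_units, more_fragments).
    """
    payload = total_len - ip_hdr
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_payload = (mtu - ip_hdr) // 8 * 8
    if payload <= 0 or max_payload <= 0:
        raise ValueError("datagram or MTU too small")
    frags = []
    sent = 0
    while sent < payload:
        size = min(max_payload, payload - sent)
        more = sent + size < payload        # MF = 1 on all but the last fragment
        frags.append((ip_hdr + size, sent // 8, more))
        sent += size
    return frags


# A hypothetical 4,000-byte datagram crossing a link with a 1,500-byte MTU.
for length, offset, more in fragment(4000, 1500):
    print(f"length={length} offset={offset} MF={int(more)}")
```

The offset counts the whole IP payload, so any transport header carried by the first fragment is included in the offset of the second.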

Example
A 2,366-byte packet enters an Ethernet network with a default MTU of 1500 bytes.
Packet 1: 1500 bytes
20 bytes for the IP header
24 bytes for the TCP header
1456 bytes of data
DF = 0 (may fragment), MF = 1 (more fragments)
Fragment offset = 0
Packet 2: 886 bytes
20 bytes for the IP header
866 bytes of data (the TCP header travels only in the first fragment)
DF = 0 (may fragment), MF = 0 (last fragment)
Fragment offset = 185 (1480 bytes / 8, counting the TCP header and data carried in Packet 1)

Overlapping Fragment Attack
Assume firewall policy:
Incoming port 80 (HTTP): allow
Incoming port 22 (SSH): deny
Packet 1: DF=1, MF=1; its TCP header shows port 80, a sequence number, ...
[Figure: a later fragment overlaps Packet 1's TCP header, so the reassembled packet bypasses the policy]

Stateful Firewalls
Pros:
More expressive
Cons:
State-holding attack
Mismatch between the firewall's understanding of the protocol and the protected hosts'

Application Firewall
[Figure: the firewall spans the link, network, transport, and application layers between Outside and Inside, and keeps State]
Check protocol messages directly
Examples:
SMTP virus scanner
Proxies
Application-level callbacks

Firewall Placement

Demilitarized Zone (DMZ)
[Figure: a firewall separates Inside, Outside, and a DMZ that hosts the WWW, DNS, NNTP, and SMTP servers]

Dual Firewall
[Figure label: Inside]

Elizabeth D. Zwicky, Simon Cooper, D. Brent Chapman
William R. Cheswick, Steven M. Bellovin, Aviel D. Rubin

Intrusion Detection and Prevention Systems

Logical Viewpoint
[Figure: an IDS/IPS sits between Inside and Outside and inspects each message m]
For each message m, either:
Report m (IPS: drop or log)
Allow m
Queue

Overview
Approach: Policy vs. Anomaly
Location: Network vs. Host
Action: Detect vs. Prevent

Policy-Based IDS
Use pre-determined rules to detect attacks.
Examples: regular expressions (Snort), cryptographic hashes (Tripwire, Snort)

Detect any fragments less than 256 bytes:
alert tcp any any -> any any (minfrag: 256; msg: "Tiny fragments detected, possible hostile activity";)

Detect IMAP buffer overflow:
alert tcp any any -> 192.168.1.0/24 143 ( content: "|90C8 C0FF FFFF|/bin/sh";

[Figure: automaton over system calls such as getuid(), geteuid(), and exit(), with transitions labeled Exit(g) and Exit(f)]
Execution inconsistent with the automaton indicates an attack.

Anomaly Detection
[Figure: distribution of normal events; the IDS treats a new event inside the distribution as safe and one outside it as an attack]

Example: Working Sets
[Figure: over days 1 to 300, Alice's working set of hosts is fark, reddit, xkcd, and slashdot; on day 300 Alice contacts a host labeled 18487, which is outside her working set]

Anomaly Detection
Pros:
Does not require predetermining policy (can flag an unknown threat)
Cons:
Requires that attacks are not strongly related to known traffic
Learning distributions is hard
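
The working-set idea can be sketched as a simple set-membership test. This is a toy illustration under strong assumptions: the class name `WorkingSetDetector` and its methods are hypothetical, training is just "every host seen so far", and there is no aging or thresholding as a real detector would need.

```python
class WorkingSetDetector:
    """Flag a contact as anomalous when the host is outside the user's working set."""

    def __init__(self):
        self.working_sets = {}  # user -> set of hosts seen during training

    def train(self, user, host):
        self.working_sets.setdefault(user, set()).add(host)

    def is_anomalous(self, user, host):
        # A user with no training data has an empty working set.
        return host not in self.working_sets.get(user, set())


detector = WorkingSetDetector()
for host in ["fark", "reddit", "xkcd", "slashdot"]:
    detector.train("alice", host)

print(detector.is_anomalous("alice", "xkcd"))   # inside the working set
print(detector.is_anomalous("alice", "18487"))  # outside the working set
```

The sketch also shows both cons from the slide: an attack that only touches hosts already in the working set is invisible, and every legitimate new site raises an alert until the set is retrained.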

Automatically Inferring the Evolution of Malicious Activity on the Internet
Shobha Venkataraman (AT&T Research), David Brumley (Carnegie Mellon University), Subhabrata Sen (AT&T Research), Oliver Spatscheck (AT&T Research)

[Figure: Internet topology with Tier 1 and Tier 2 providers and networks labeled A, ..., E, ..., K, including a spam haven. Input: labeled IPs from SpamAssassin, IDS logs, etc.]
Goal: characterize regions changing from bad to good (Δ-good) or from good to bad (Δ-bad).

Research Questions
Given a sequence of labeled IPs:
1. Can we identify the specific regions on the Internet that have changed in malice?
2. Are there regions on the Internet that change their malicious activity more frequently than others?

Challenges
1. Infer the right granularity
[Figure, repeated across four slides: the same topology (networks A, B, C, D, E, ..., K, W, X; Tier 1 and Tier 2 providers; DSL, CORP, and spam-haven regions) shaded at different granularities]
Per-IP granularity: often not interesting (e.g., Spamcop)
BGP granularity (e.g., network-aware clusters [KW00])
Previous work: fixed granularity
Idea: infer the granularity. A well-managed network gets fine granularity; other regions get medium or coarse granularity.
[Figure: the same topology with a fixed-memory device on a high-speed link]
Challenges
1. Infer the right granularity
2. We need online algorithms

Research Questions
Given a sequence of labeled IPs:
1. Can we identify the specific regions on the Internet that have changed in malice? We present: Δ-Change
2. Are there regions on the Internet that change their malicious activity more frequently than others? We present: Δ-Motion

Background
1. IP prefix trees
2. TrackIPTree algorithm

IP Prefixes
i/d denotes all IP addresses covered by the first d bits of i.
A /32 prefix covers 1 host (all bits).
A /16 prefix covers a contiguous range of addresses.
[Figure: the topology annotated with example /32 and /16 prefixes]

IP Prefix Tree
[Figure: binary tree rooted at 0.0.0.0/0 (the whole net), splitting into /1, /2, /3, and /4 prefixes such as 0.0.0.0/1, 0.0.0.0/2, and 192.0.0.0/2, and continuing down through 0.0.0.0/31 to single hosts such as 0.0.0.0/32 and 0.0.0.1/32]
An IP prefix tree is formed by masking each bit of an IP address.
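
The path an address takes through the prefix tree can be computed with Python's standard `ipaddress` module. This is a small sketch for intuition only: the helper name `prefix_chain` is hypothetical, and the address used is from the 192.0.2.0/24 documentation range rather than from the slides.

```python
import ipaddress


def prefix_chain(addr, depths=(0, 1, 2, 16, 32)):
    """Return the prefixes covering addr at the given depths, root first."""
    # strict=False masks off the host bits, keeping only the first d bits.
    return [str(ipaddress.ip_network(f"{addr}/{d}", strict=False)) for d in depths]


print(prefix_chain("192.0.2.1"))
# ['0.0.0.0/0', '128.0.0.0/1', '192.0.0.0/2', '192.0.0.0/16', '192.0.2.1/32']

# Membership test: is an address covered by a prefix?
print(ipaddress.ip_address("192.0.2.1") in ipaddress.ip_network("192.0.0.0/2"))
```

Each step down the tree fixes one more leading bit, so the root covers every address and a /32 leaf covers exactly one host.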

k-IPTree
[Figure: IP prefix tree rooted at 0.0.0.0/0 whose leaves are labeled + or -; one example address falls under a + leaf and is good, another falls under a - leaf and is bad]
A k-IPTree [VBSSS09] is an IP tree with at most k leaves, each leaf labeled good (+) or bad (-).

TrackIPTree Algorithm [VBSSS09]
In: stream of labeled IPs
Out: k-IPTree
[Figure: TrackIPTree consumes the stream and produces a tree with nodes at /1, ..., /16, /17, /18 and leaves labeled - and +]

Δ-Change Algorithm
1. Approach
2. What doesn't work
3. Intuition
4. Our algorithm

Goal: identify online the specific regions on the Internet that have changed in malice.
[Figure: tree T1 for epoch 1 (IP stream s1) and tree T2 for epoch 2 (IP stream s2), each with nodes /0, /1, /16, /17, /18 and leaves labeled + or -]
Δ-Good: a change from bad to good
Δ-Bad: a change from good to bad
False positive: misreporting that a change occurred
False negative: missing a real change

Idea: divide time into epochs and diff
Use TrackIPTree on labeled IP stream s1 to learn T1
Use TrackIPTree on labeled IP stream s2 to learn T2
Diff T1 and T2 to find Δ-Good and Δ-Bad
Problem: T1 and T2 can have different granularities!

Δ-Change Algorithm
Main idea: use classification errors between T_{i-1} and T_{i} to infer Δ-Good and Δ-Bad.
[Figure: pipeline for epoch i]
TrackIPTree updates T_{i-2} with stream S_{i-1} to produce T_{i-1}, and updates T_{i-1} with stream S_{i} to produce T_{i}.
T_{i-1} is held fixed and annotated with its classification error on S_{i-1}, giving T_{old,i-1}, and with its classification error on S_{i}, giving T_{old,i}.
Compare the (weighted) classification error of T_{old,i-1} and T_{old,i} (note both are based on the same tree) to find Δ-Good and Δ-Bad.

Comparing (Weighted) Classification Error
[Figure: T_{old,i-1} and T_{old,i}, both rooted at /16, with nodes annotated by IP count and accuracy. Annotations shown: IPs: 200, Acc: 40%, IPs: 50, Acc: 30%, Acc: 10%, IPs: 80, Acc: 5%]
Δ-Change localized

Evaluation
1. What are the performance characteristics?
2. Are we better than previous work?
3. Do we find cool things?

Performance
In our experiments, we:
let k = 100,000 (k-IPTree size)
processed 30-35 million IPs (one day's traffic)
used a 2.4 GHz processor
Identified Δ-Good and Δ-Bad in <22 min using <3 MB memory

How do we compare to network-aware clusters? (By Prefix)
[Figure legend: Δ-Change vs. network-aware]
[Figure: (a) number of Δ-change prefixes and (b) IPs in Δ-change prefixes, each plotted against the interval in days]
2.5x as many changes on average!

Spam
[Figure annotation: Grum botnet takedown]

Botnets
[Figure annotations: 22.1 and 28.6 thousand new DNSChanger bots appeared; 38.6 thousand new Conficker and Sality bots]

Caveats and Future Work
"For any distribution on which an ML algorithm works well, there is another on which it works poorly." (The "No Free Lunch" theorem)
Our algorithm is efficient and works well in practice, but a very powerful adversary could fool it into having many false negatives. A formal characterization is future work.

Detection Theory
Base rate, fallacies, and detection systems

Let Ω be the set of all possible events. For example:
Audit records produced on a host
Network packets seen

Example: an IDS received 1,000,000 packets. 20 of them corresponded to an intrusion.
The intrusion rate Pr[I] is:
Pr[I] = 20/1,000,000 = .00002

I: the set of intrusion events
Intrusion rate: Pr[I] = |I| / |Ω|
A: the set of alerts
Alert rate: Pr[A] = |A| / |Ω|
Defn: Sound: every alert corresponds to an intrusion (A ⊆ I)
Defn: Complete: every intrusion raises an alert (I ⊆ A)

Defn: False negative: I ∩ not A (an intrusion that raises no alert)
Defn: False positive: A ∩ not I (an alert with no intrusion)
Defn: True positive: I ∩ A
Defn: True negative: not I ∩ not A

Think of the detection rate as the set of intrusions raising an alert normalized by the set of all intrusions.
Defn: Detection rate: Pr[A | I] = Pr[I ∩ A] / Pr[I]
[Figure: Venn diagram of I and A with region counts 18, 4, 2]

Think of the Bayesian detection rate as the set of intrusions raising an alert normalized by the set of all alerts (vs. the detection rate, which normalizes on intrusions).
Defn: Bayesian detection rate: Pr[I | A] = Pr[I ∩ A] / Pr[A]
This is the crux of IDS usefulness!
[Figure: Venn diagram of I and A with region counts 4, 2, 18]
About 18% of all alerts are false positives!

Challenge
We're often given the detection rate and know the intrusion rate, and want to calculate the Bayesian detection rate:
99% accurate medical test
99% accurate IDS
99% accurate test for deception
...

Fact: Pr[I | A] = Pr[A | I] Pr[I] / Pr[A]
Proof: by the definition of conditional probability, Pr[I | A] Pr[A] = Pr[I ∩ A] = Pr[A | I] Pr[I].

Calculating Bayesian Detection Rate
So to calculate the Bayesian detection rate, one way is to compute:
Pr[I | A] = Pr[A | I] Pr[I] / (Pr[A | I] Pr[I] + Pr[A | not I] Pr[not I])

Example
1,000 people in the city. 1 is a terrorist, and we have their picture. Thus the base rate of terrorists is 1/1000.
Suppose we have a new terrorist facial recognition system that is 99% accurate:
99/100 times when someone is a terrorist there is an alarm.
For every 100 good guys, the alarm only goes off once.
An alarm went off. Is the suspect really a terrorist?
[Figure: City (this times 10)]

Answer: The facial recognition system is 99% accurate. That means there is only a 1% chance the guy is not the terrorist.
Wrong!

Formalization
1 is a terrorist, and we have their picture. Thus the base rate of terrorists is 1/1000. P[T] = 0.001
99/100 times when someone is a terrorist there is an alarm. P[A | T] = .99
For every 100 good guys, the alarm only goes off once. P[A | not T] = .01
Want to know P[T | A]

Intuition: given 999 good guys, we have 999 * .01 ≈ 9-10 false alarms.
[Figure: City with the false alarms highlighted]

P[T | A] = P[A ∩ T] / P[A], where both P[A ∩ T] and P[A] are unknown.

Recall, to get Pr[A]:
Fact: Pr[A] = Pr[A | I] Pr[I] + Pr[A | not I] Pr[not I]
Proof: A is the disjoint union of A ∩ I and A ∩ not I, so Pr[A] = Pr[A ∩ I] + Pr[A ∩ not I].

...and to get Pr[A ∩ I]:
Fact: Pr[A ∩ I] = Pr[A | I] Pr[I]
Proof: rearrange the definition Pr[A | I] = Pr[A ∩ I] / Pr[I].

For the example:
P[T | A] = (.99 * .001) / (.99 * .001 + .01 * .999) = .00099 / .01098 ≈ .09
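
The calculation above is easy to check numerically. The following is a minimal sketch; the function name `bayesian_detection_rate` and its parameter names are chosen for this illustration and do not come from the slides.

```python
def bayesian_detection_rate(p_alarm_given_bad, p_alarm_given_good, base_rate):
    """Pr[bad | alarm] via Bayes' rule and the law of total probability."""
    # Pr[A] = Pr[A | T] Pr[T] + Pr[A | not T] Pr[not T]
    p_alarm = p_alarm_given_bad * base_rate + p_alarm_given_good * (1 - base_rate)
    # Pr[T | A] = Pr[A | T] Pr[T] / Pr[A]
    return p_alarm_given_bad * base_rate / p_alarm


# Facial recognition example: P[A|T] = .99, P[A|not T] = .01, P[T] = .001
p = bayesian_detection_rate(0.99, 0.01, 0.001)
print(f"P[T|A] = {p:.3f}")  # P[T|A] = 0.090
```

Even with a "99% accurate" system, roughly 9 in 10 alarms point at an innocent person, because the 999 good guys generate about ten false alarms for every true one.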

Visualization: ROC (Receiver Operating Characteristic) Curve
Plot the true positive rate vs. the false positive rate for a binary classifier at various threshold settings.

[Figure 2 from Axelsson, RAID 99: assumed ROC curve, plotting the detection rate P(A|I) (true positives) against the false alarm rate (false positives), both on axes from 0 to 1. Slide annotations: 70% detection requires FP < 1/100,000; 80% detection generates 40% FP.]

For IDS
Let I be an intrusion, A an alert from the IDS.
1,000,000 msgs per day processed
2 attacks per day
10 msgs per attack

Why is anomaly detection hard?
Think in terms of ROC curves and the base rate fallacy.
Are real things rare? If so, hard to learn.
Are real things common? If so, probably ok.
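
To see why the base rate dominates, the two operating points annotated on the ROC figure can be plugged into Bayes' rule with the IDS numbers above (20 intrusion messages out of 1,000,000 per day). This is a sketch for intuition; the function names `bayesian_rate` and `max_false_alarm_rate` and the 50% target are assumptions made for this example.

```python
BASE_RATE = 2 * 10 / 1_000_000  # 2 attacks/day * 10 msgs/attack over 1,000,000 msgs


def bayesian_rate(detection_rate, false_alarm_rate, base_rate=BASE_RATE):
    """Pr[I | A] for one operating point on the ROC curve."""
    true_alerts = detection_rate * base_rate
    false_alerts = false_alarm_rate * (1 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


def max_false_alarm_rate(detection_rate, target, base_rate=BASE_RATE):
    """Largest false alarm rate for which Pr[I | A] still reaches the target."""
    return detection_rate * base_rate * (1 - target) / (target * (1 - base_rate))


for dr, fp in [(0.7, 1e-5), (0.8, 0.4)]:
    print(f"detection={dr:.0%} false alarms={fp:g} -> Pr[I|A]={bayesian_rate(dr, fp):.6f}")

print(f"FP needed for Pr[I|A] >= 50% at 70% detection: {max_false_alarm_rate(0.7, 0.5):.2e}")
```

With intrusions this rare, a false alarm rate on the order of 1 in 100,000 is needed before even half of the alerts are real, which matches the annotation on the ROC figure.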

Conclusion
Firewalls
  3 types: packet filtering, stateful, and application
  Placement and DMZ
IDS
  Anomaly vs. policy-based detection
Detection theory
  Base rate fallacy