Network Traffic Tracing at SIGCOMM 2008
Update: The traces are now available.
This page is a modified version of the OSDI 2006 Tracing Handout.
We are a team of researchers who are tracing network activity at SIGCOMM
2008 with a view to making the data available to the research community. We
are only recording information that is pertinent to networking research, in
a suitably anonymized form. We are not recording sensitive information such
as the user or client identities or the content of user communication. This
page details what we are tracing and why, how the traces are being
processed to protect sensitive information, and whom to contact if you have
further questions. Thank you.
Contents:
Tracing Overview
Frequently Asked Questions
Technical Details
Tracing Overview
Joining the trace
Tracing is optional. If you would like to participate in the trace, associate
with the SIGCOMM-ONLY-Traced SSID. Otherwise, join SIGCOMM-ONLY-Untraced.
We will not be tracing any data on SIGCOMM-ONLY-Untraced.
During the trace
We are not recording payloads (packet bodies), except for DHCP and DNS
payloads. We are collecting headers (802.11, IP, TCP, UDP, and ICMP) and
physical layer information for all packets on the Traced network.
After the trace
The trace will be anonymized after collection. Until then, the
non-anonymized data will be encrypted, and only the researchers listed in
Q2 below will have access to it. Once the traces are anonymized, the
non-anonymized data will be destroyed.
Getting the trace data
The anonymized trace data will be available within six months after the
conference. Check back at the
UMD Wifidelity project site.
Frequently Asked Questions
Q1. What are the goals of this tracing project?
Our goal is to gather a detailed trace of network activity at SIGCOMM 2008
to improve 802.11 tracing techniques as part of the
Wifidelity project
and enable analysis of the behavior of a wireless LAN that is (presumably)
heavily used. Besides using this data for our research, we also plan to
make the traces available to the research community.
Q2. Who is gathering the traces?
The traces are being gathered by a team of researchers from the University
of Maryland, College Park:
Aaron Schulman,
Dave Levin, and
Neil Spring;
in coordination with the local arrangements chair,
Ratul Mahajan
from Microsoft.
Q3. Who has approved this tracing project?
The tracing plan has been approved by the SIGCOMM 2008 Executive Committee.
Q4. What is being traced?
We are recording network protocol information from all wired and wireless
packets sent on the SIGCOMM-ONLY-Traced wireless network. The
information being recorded for each packet includes physical layer
information such as the wireless signal strength as well as the 802.11, IP,
TCP, UDP, and ICMP headers, depending on the packet type. We are not
recording packet payloads above the transport layer except for DHCP and DNS
payloads. However, we are anonymizing or deleting potentially sensitive
information such as MAC and IP addresses, and DNS names.
Q5. How is the trace being anonymized?
MAC addresses and IP addresses will be anonymized using
AnonTool.
Q6. Will the packet payload be captured or stored?
Packet payload will be recorded for DHCP and DNS requests and responses.
However, information such as DNS names and IP addresses contained in the
payload will be anonymized after being stored.
Q7. Will my activities be identifiable?
Given that the traces are being anonymized after collection and the
non-anonymized traces will be encrypted during transport and destroyed after
anonymization, we believe that it would be difficult for anyone to identify
users or learn which Internet services or hosts they have communicated
with. That said, we are not in a position to prove that no such information
can be gleaned from the anonymized traces.
Q8. What will be done with the anonymized data? Who will have access?
The anonymized traces will be made available to the research community, for
example, through a repository such as CRAWDAD.
We plan to make the data available within 6 months after SIGCOMM 2008.
Q9. Will any non-anonymized data be stored?
Yes, we will be anonymizing the trace offline after collection. However,
after the traces have been anonymized, the non-anonymized data will be
destroyed.
Q10. Who will have access to the non-anonymized data, and for how long?
As noted in Q9, the anonymization will be done offline, so the University
of Maryland researchers listed in Q2 will have access to the non-anonymized
data during the time it takes to perform the offline anonymization (no more
than a few days after the trace collection is concluded). In the meantime,
the trace data will be stored in an encrypted form. After the trace is
anonymized, the non-anonymized data will be destroyed.
Q11. What identifiable information could still be extracted from the final anonymized trace?
It may be possible to identify users using a side-channel attack, for
instance, by exploiting information such as packet sizes and packet timing; we
do not plan to protect the data against such attacks. Also, we would like
to permit the identification of the manufacturer of a wireless NIC (which
could be useful when analyzing the traces), so the first 3 bytes of the MAC
address will be left non-anonymized. However, this could violate the
principle of k-anonymity, i.e., that it should not be possible to identify
any user as being a member of a group with fewer than k members. If a
group size is smaller than 10, our offline anonymization will replace this
MAC-address prefix with another value so as to create a group of at least
10 nodes (i.e., we set k to 10). So it would be possible to identify the
3-byte prefix of a node's MAC address provided that there are at least 10
nodes that share the same prefix.
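As a rough illustration of the prefix rule described in Q11, the following Python sketch keeps a 3-byte prefix only when at least k nodes share it. It is not the project's anonymization code: the function names, the placeholder prefix 00:00:00, and the input format (one MAC address string per node) are assumptions made for this example. The sketch also does not handle the case where the merged group of rare prefixes itself contains fewer than k nodes.

```python
from collections import Counter

K = 10  # minimum group size (k), as stated in Q11
PLACEHOLDER_PREFIX = "00:00:00"  # hypothetical replacement value


def oui(mac: str) -> str:
    """Return the first 3 bytes (manufacturer prefix) of a MAC address."""
    return mac.lower()[:8]


def generalize_prefixes(node_macs, k=K, placeholder=PLACEHOLDER_PREFIX):
    """Map each node's MAC address to the prefix that may be published.

    Prefixes shared by at least k distinct nodes are kept; every other
    prefix is replaced with a common placeholder.
    """
    nodes = {mac.lower() for mac in node_macs}
    counts = Counter(oui(mac) for mac in nodes)
    return {
        mac: (oui(mac) if counts[oui(mac)] >= k else placeholder)
        for mac in nodes
    }


# Example: 12 nodes share one prefix, 2 nodes have rare prefixes.
common = [f"00:1b:63:00:00:{i:02x}" for i in range(12)]
rare = ["00:0d:93:aa:bb:cc", "00:16:cb:11:22:33"]
published = generalize_prefixes(common + rare)
```

In this example, the 12 nodes keep their shared prefix, while the two nodes with rare prefixes are both mapped to the placeholder.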
Q12. How should I protect my data and identifiable activities if I use the wireless network?
As noted above, we are taking every care to obscure sensitive information
while still leaving the traces useful for research. However, we have no
control over who else might be sniffing the network traffic, even though
such sniffing is against the terms of use for the SIGCOMM wireless network.
Since this is an ever-present danger, especially in wireless networks, we
strongly recommend that you use secure protocols and procedures for
communication (e.g., SSL, SSH, VPN). That said, we are not in a position to
provide definitive advice on how best to protect yourself when using a
wireless network. You would have to consult your IT staff regarding this.
Q13. Whom should I contact if I have further questions about this tracing project?
Please contact Aaron Schulman (schulman@cs.umd.edu) or Dave Levin
(dml@cs.umd.edu).
Technical Details
We are gathering traces of wireless traffic belonging to the traced network
at several monitoring nodes distributed across the conference floor. In
addition, we are gathering traces on the wired switch to which the wireless
access points connect.
Here is a description of the traces we are gathering and the anonymization
that is being performed. Our description here focuses on tracing on
the wireless LAN. A subset of this (viz., everything above the PHY layer)
also applies to the tracing on the wired LAN.
What traffic is being monitored?
Each monitor will capture all of the 802.11 frames it sees, including:
- Data frames
- Management frames (e.g., association, authentication)
- Control frames (e.g., RTS, CTS, ACK)
What information is being logged?
For each wireless frame captured at a monitor, we record up to 250 bytes of
the following information:
1. Per-frame PHY information, including:
   - Channel frequency
   - RSSI
   - Modulation rate
2. The entire MAC header, with only the source and destination MAC addresses
   anonymized, as follows:
   a. Online, we store all MAC addresses.
   b. Offline, we anonymize the MAC addresses but keep the 3-byte prefix,
      except that prefixes shared by fewer than 10 nodes are replaced with
      a common prefix. This ensures k-anonymity for k=10.
3. The entire IPv4 and TCP/UDP header, with the source and destination
   IPv4 addresses anonymized as follows:
   a. The IP address is replaced with a one-way hash.
   b. In addition, we record whether the IP address belongs to one of the
      following categories:
      - Autoconfiguration (169.254/16).
      - Private address space (10/8, 172.16/12, 192.168/16).
4. The entire DHCP payload, with the following anonymization:
   a. Client IP address (ciaddr) is anonymized as in 3.a.
   b. Client hardware address (chaddr) is anonymized as in 2.
   c. Your IP address (yiaddr) is anonymized as in 3.a.
   d. The "client identifier" option, if present, is replaced with a
      one-way hash.
5. The DNS request/response payload, with the following
   anonymization/deletion:
   a. The domain name in each RR is replaced with a one-way hash.
   b. The resource data contained in each RR is deleted.
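The address and name handling described above (a one-way hash plus category flags) can be sketched in Python as follows. This is an illustration only and not the AnonTool-based processing used for the trace: the choice of HMAC-SHA256 as the keyed hash, the function names, and the output layout are all assumptions made for this example.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Assumption: a random key generated once per trace and discarded afterwards.
SECRET_KEY = secrets.token_bytes(32)

AUTOCONF = ipaddress.ip_network("169.254.0.0/16")
PRIVATE = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def keyed_hash(value: bytes, key: bytes = SECRET_KEY) -> str:
    """Keyed one-way hash (HMAC-SHA256 is an assumption, not the source's choice)."""
    return hmac.new(key, value, hashlib.sha256).hexdigest()


def anonymize_ipv4(addr: str, key: bytes = SECRET_KEY) -> dict:
    """Replace an IPv4 address with its hash and record its category flags."""
    ip = ipaddress.IPv4Address(addr)
    return {
        "hash": keyed_hash(ip.packed, key),
        "autoconf": ip in AUTOCONF,
        "private": any(ip in net for net in PRIVATE),
    }


def anonymize_dns_name(name: str, key: bytes = SECRET_KEY) -> str:
    """Hash a domain name; case and a trailing dot are normalized first."""
    return keyed_hash(name.rstrip(".").lower().encode("utf-8"), key)


record = anonymize_ipv4("192.168.1.5")
```

Because the hash is keyed, the same address always maps to the same value within one trace, while the mapping cannot be recomputed once the key has been discarded.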
Security and privacy issues:
- We have taken reasonable measures to secure the
machines used for tracing: kept them up-to-date on patches,
turned off unnecessary services, protected access with a
strong password, etc.
- We will throw away the secret key used for the keyed
one-way hash once the trace anonymization is concluded to
make it difficult to perform a dictionary attack on the
one-way hash.
- Despite the anonymization, it may be possible for some
information to leak. For example, it may be possible to
infer which website was visited based on the size of the
response received. We are unable to obfuscate such
information without damaging the data significantly.