The dartmouth/campus dataset (v. 2009-09-09)  >  the tcpdump traceset

There are 3 traces in this traceset
last modified
2006-11-14
reason for most recent change
TCPDUMP traceset is resanitized.
short description

Packet headers from every wireless packet sniffed in 27 buildings on Dartmouth College campus.

description

This traceset contains the packet headers from every wireless packet sniffed in four (Fall01), five (Spring02), or 18 (Fall03) buildings on campus. The Fall 2001 data was used for [MobiCom 2002 paper].
The Fall 2001 and Spring 2002 data was used for [WiNet 2005 paper]. The 2003/4 data was used for [MobiCom 2004 paper]. The Fall03 data also contains a list of device types, as determined using the OS fingerprinting tool p0f. Note that this list contains MAC addresses only for devices that we saw associate with an AP (i.e., that appeared in the syslog or SNMP data). Thus it does not include MAC addresses that do not belong to wireless clients, such as those of routers, or spoofed MACs that do not appear in syslog.

The compressed datasets total over 200 GB, so they are too large to post as tarballs. The best option is to use an HTTP tool such as curl or wget to download the whole Fall01, Spring02, or Fall03 directory from the web site. Alternatively, you can arrange to send us a USB or FireWire drive (>250 GB) and we can ship it back to you with all of our data on it. A small (660 KB) tgz file is also available; it contains the README, some of my analysis software, and the output of my own analysis programs, which lists the amount of traffic seen at each sniffer for each port number for each day. Note that this file covers only the 2001/2 data.

release date
2004-11-09
date/time of measurement start
2001-09-25
date/time of measurement end
2004-02-28
methodology

We used network "sniffers" to obtain detailed network-level traces. Due to the volume of traffic on the wireless network, it was impractical to capture all the traffic. Moreover, the structure of our WLAN, with several subnets, meant that there was no convenient central point for capturing wireless traffic. Instead, we installed 18 sniffers in 14 different buildings; in some large buildings, we needed multiple sniffers to monitor all of the building's APs. The buildings were among the most popular wireless locations in 2001, and included libraries, dormitories, academic departments and social areas. In total, our 18 sniffers covered 121 APs.

Each sniffer was a Linux box with two Ethernet interfaces. One interface was used for remote access, to maintain the sniffer and to obtain the data for analysis. The other interface was used for collecting ("sniffing") data. In each of the 18 switchrooms we attached the APs to a switch, and set another port on the switch to "mirror" mode, so that all the traffic on that switch would be sent to this port. The sniffer's second interface was attached to this mirrored port. We used tcpdump to capture any wireless traffic that came through these APs and their wired interfaces.

sanitization

To sanitize the MAC addresses, we randomized the bottom six hex digits. We collected every MAC address from all of our syslog, SNMP, and tcpdump traces, and built a huge table mapping real MACs to randomized MACs, ensuring that all mappings are unique.
[We did not change the MAC addresses 000000000000 and FFFFFFFFFFFF; they remain as they were.]
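The table-building step can be sketched as follows. This is a minimal illustration and not the script actually used for the traceset: the function name `build_mac_table`, the 12-hex-digit string representation, and the choice to keep the top six hex digits unchanged are assumptions drawn from the description above.

```python
import random

# The two addresses the description says were left untouched.
UNCHANGED = {"000000000000", "FFFFFFFFFFFF"}

def build_mac_table(real_macs, rng=None):
    """Map each real MAC (12 hex digits, no separators) to a sanitized MAC.

    Assumption: the top six hex digits are kept and only the bottom six
    are randomized. Every sanitized value is handed out at most once.
    """
    rng = rng or random.SystemRandom()
    table = {}
    used = set()
    for mac in sorted({m.upper() for m in real_macs}):
        if mac in UNCHANGED:
            table[mac] = mac
            continue
        while True:
            candidate = mac[:6] + f"{rng.randrange(1 << 24):06X}"
            # Retry on collision so that the mapping stays one-to-one.
            if candidate not in used and candidate not in UNCHANGED:
                used.add(candidate)
                table[mac] = candidate
                break
    return table

# Made-up example addresses, not taken from the traces.
example = build_mac_table(
    ["0040961A2B3C", "0040965D6E7F", "FFFFFFFFFFFF", "000000000000"],
    rng=random.Random(1),
)
```

Building one table from the union of all traces, and only then rewriting the files, is what makes a sanitized MAC comparable across syslog, SNMP, and tcpdump data.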
We applied this mapping consistently across all data files of all types, so if you see a MAC address in the tcpdump files, and see it again in the SNMP trace, you can be sure it's the same client.
We used a prefix-preserving IP address sanitizer; see Xu, J., Fan, J., Ammar, M., and Moon, S., "On the Design and Performance of Prefix-Preserving IP Traffic Trace Anonymization", Proc. of 10th IEEE International Conference on Network Protocols (ICNP 2002), Paris, France, November 2002. This means that you can compare the prefixes of the sanitized IP addresses: if two real IP addresses share the same k-bit prefix, the sanitized addresses will also share the same k-bit prefix.
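In practice, the prefix-preserving property means that prefix comparisons on the sanitized IP addresses remain meaningful. A small helper for such comparisons might look like this; the function name `common_prefix_len` and the example addresses are illustrative and do not come from the traces.

```python
import ipaddress

def common_prefix_len(a, b):
    """Return the number of leading bits that two IPv4 addresses share."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first position that differs.
    return 32 - diff.bit_length()

# Two made-up addresses that agree on their first 16 bits.
shared = common_prefix_len("10.1.2.3", "10.1.200.7")
```

Two sanitized addresses for which this helper returns at least k can then be treated as sharing a k-bit prefix, for example when grouping clients by subnet.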

disruptions to data collection
There were unfortunate gaps in the data collection, generally caused by power failures.
limitation

In both the Fall01 and Spring02 datasets we lost a little data each day. We restarted tcpdump once a day, to cause it to begin a new log file. We killed the tcpdump process, then started a new one; as a result, some tcpdump data files end with partial packets, and no doubt we lost a few packets in the transition. We also missed any traffic between two clients associated with the same AP, as this would not be sent via the AP's wired interface, but we believe this occurred rarely.
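Because some files end with a partial packet, a reader should stop cleanly at a truncated tail rather than fail. The sketch below assumes the files are in the classic libpcap format (24-byte global header, 16-byte per-record header), which the description does not state explicitly; the function name `read_packets` is hypothetical, and the sample bytes are synthetic.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # magic number of the classic pcap format

def read_packets(stream):
    """Yield (ts_sec, ts_usec, data) for each complete record in a pcap stream.

    Stops quietly when the final record header or packet body is cut short.
    """
    header = stream.read(24)
    if len(header) < 24:
        return
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        rec = stream.read(16)
        if len(rec) < 16:
            return  # end of file, or a truncated record header
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # partial packet at the end of the file
        yield ts_sec, ts_usec, data

# Synthetic stream: one complete record followed by a truncated one.
sample = (
    struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    + struct.pack("<IIII", 100, 5, 4, 4) + b"abcd"
    + struct.pack("<IIII", 101, 0, 10, 10) + b"xy"
)
packets = list(read_packets(io.BytesIO(sample)))
```

For a real trace file the stream would be opened in binary mode, after decompression if the file is compressed.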

error
The Fall01 data in particular suffers from considerable corruption. It appears that older versions of tcpdump have a serious bug that causes them to record the MAC addresses of many frames incorrectly. For Spring02 we used a newer tcpdump that did not have this problem.
 the dartmouth/campus/tcpdump/fall01 trace
 the dartmouth/campus/tcpdump/spring02 trace
 the dartmouth/campus/tcpdump/fall03 trace