CRAWDAD metadata

CRAWDAD metadata describes CRAWDAD data and tools, together with the related authors and papers. A sample is shown below, followed by an explanation of how to read and navigate it.


Metadata structure (example)

  • [Data]
    • [Dataset] ucsd/sigcomm2001 (v. 2002-04-23) [what's new] [version history]
      • [Traceset] ucsd/sigcomm2001/snmp (v. 2002-04-05)
        • [Trace] ucsd/sigcomm2001/snmp/Stations (v. 2002-04-05) [download 4 MB zip]
        • [Trace] ucsd/sigcomm2001/snmp/AP_Mibtree (v. 2002-04-05) [download 59 MB zip]
      • [Traceset] ucsd/sigcomm2001/tcpdump (v. 2004-11-09)
        • [Trace] ucsd/sigcomm2001/tcpdump/08292005 (v. 2002-04-23) [what's new][download 267 MB gz]
  • [Tools]
    • [Tool] ucsd/sigcomm2001/tool/snmputil.exe (v. 2002-04-05) [download 73 KB exe]
    • [Tool] ucsd/sigcomm2001/tool/extract.pl (v. 2002-04-05) [download 3 KB pl]
  • [Authors]
    • [Author] Anand Balachandran
    • [Author] Geoffrey M. Voelker
    • [Author] Paramvir Bahl
    • [Author] P. Venkat Rangan
  • [Papers]
    • [Paper] meng-flows
    • [Paper] balachandran-behavior

CRAWDAD metadata has four categories: data, tools, authors, and papers. As the example above shows, each category is organized as a hierarchy. The data category has three levels, in this order: dataset, traceset, and trace. The other categories (tools, authors, and papers) have only one level.

  • Hierarchy in data category: A dataset is a set of wireless network data collected by the same organization on the same type of network with some temporal locality (e.g., without a long time gap). For example, the dataset in the example above was collected by the University of California, San Diego on the 802.11 network of a conference held on campus over three days. A traceset is a set of traces that were collected using the same measurement technique, e.g., snmp, tcpdump, or syslog. A dataset can contain multiple tracesets, and a traceset can contain multiple traces.
  • Hierarchical naming: Names in the data category follow the hierarchy of dataset, traceset, and trace, with the levels joined by "/". For example, the dataset "ucsd/sigcomm2001" has two tracesets, "ucsd/sigcomm2001/snmp" and "ucsd/sigcomm2001/tcpdump", which contain the traces collected using snmp and tcpdump, respectively. Likewise, the traceset "ucsd/sigcomm2001/snmp" contains two traces, "ucsd/sigcomm2001/snmp/Stations" and "ucsd/sigcomm2001/snmp/AP_Mibtree", each of which can be downloaded by clicking its [download] link. More information on each entity (dataset, traceset, or trace) can be obtained by clicking its name.
  • Other categories: These list the tools, authors, and papers related to the entities shown in the data category.
  • Versions: Only entities in the data and tools categories have versions; entities in the other categories have none. The release date serves as the version number. For example, the version number "v. 2002-04-23" of the dataset "ucsd/sigcomm2001" indicates that the dataset was released on April 23, 2002. To browse all versions, click the "[version history]" link. To see the changes since the previous version, click the "[what's new]" link.
  • Fields: When you click an entity (dataset, traceset, trace, or tool), its metadata appears as a series of fields.

    The following fields are common to all entities (dataset, traceset, trace, and tool):

    • version: metadata version (see above)
    • changes: changes since the last version (release)
    • bibtex: bibtex entry used for reference in papers
    • metadata last modified: date when the metadata "description" was last modified (note: this date may be different from the release date)
    • summary: executive summary
    • release date: date when the entity was released
    • download url: file size, type, and download location
    • related data/tools: other entities related to this entity

    The following fields are common to all data entities (dataset, traceset, and trace):

    • measurement start: date when measurement started
    • measurement end: date when measurement ended
    • measurement purpose: e.g., Usage Characterization, Network Performance Analysis, etc.
    • sanitization: how the data was sanitized, especially to protect privacy
    • hole: data missing due to system failures or configuration mistakes
    • error: incorrectly measured data
    • limitation: what the methodology used cannot collect or accurately measure
    • note: other description

    The following fields are specific to dataset:

    • keyword: keyword list used for "Browse" page
    • authors: author list
    • web site: an original web site or a CRAWDAD web site for the dataset
    • wiki: wiki address for the dataset
    • network type: e.g., 802.11 infrastructure, bluetooth, etc.
    • environment: non-technical description (e.g., on the user population) of the dataset
    • network: network configuration
    • collection: collection methodology

    The following fields are specific to traceset:

    • methodology: detailed description of collection methodology

    The following fields are specific to trace:

    • derived: false if the trace is an original raw trace; true if it was derived from another trace
    • format: trace format
    • configuration: experimental setup for collecting the trace
    • tools used: tools used for the trace

    The following fields are specific to tool:

    • keyword: keyword list used for "Browse" page
    • authors: author list
    • web site: an original web site or a CRAWDAD web site for the tool
    • wiki: wiki address for the tool
    • license: terms of copyright, usage, change, or distribution of the tool
    • support: how to get support for the tool
    • build: how to build the tool
    • input: input for the tool
    • output: output for the tool
    • parameters: parameters for the tool
    • usage: detailed usage
    • example: usage example
    • algorithm: algorithm used for the tool
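
The hierarchical naming and date-based versions described above can be handled mechanically. The following is a minimal Python sketch, not part of CRAWDAD itself: the names parse_data_name, parse_version, and DataName are hypothetical, and the rule that a dataset name has two "/"-separated parts, a traceset three, and a trace four is an assumption inferred only from the sample metadata. It is meant for names in the data category; tool names such as "ucsd/sigcomm2001/tool/snmputil.exe" are not covered.

```python
from datetime import date
from typing import NamedTuple, Optional


class DataName(NamedTuple):
    """Hypothetical container for a parsed data-category name."""
    level: str               # "dataset", "traceset", or "trace"
    dataset: str             # enclosing dataset, e.g. "ucsd/sigcomm2001"
    traceset: Optional[str]  # enclosing traceset, or None for a dataset


def parse_data_name(name: str) -> DataName:
    """Classify a data-category name by its number of "/"-separated parts.

    Assumption (from the sample only): a dataset name has two parts,
    a traceset name three, and a trace name four.
    """
    parts = name.split("/")
    levels = {2: "dataset", 3: "traceset", 4: "trace"}
    if len(parts) not in levels or not all(parts):
        raise ValueError(f"not a data-category name: {name!r}")
    dataset = "/".join(parts[:2])
    traceset = "/".join(parts[:3]) if len(parts) >= 3 else None
    return DataName(levels[len(parts)], dataset, traceset)


def parse_version(label: str) -> date:
    """Turn a version label such as "v. 2002-04-05" into its release date."""
    text = label.strip()
    if text.startswith("v."):
        text = text[2:].strip()
    return date.fromisoformat(text)


trace = parse_data_name("ucsd/sigcomm2001/snmp/Stations")
print(trace.level, trace.traceset, trace.dataset)
print(parse_version("v. 2002-04-05"))
```

Because a version is simply a release date, the parsed values compare directly, so the newer of two versions of an entity is the one with the later date.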