The mit/reality dataset (v. 2005-07-01)  >  the blueaware traceset

There are 5 traces in this traceset
Download the realitymining.tar.gz file (39 MB, tar.gz) from a CRAWDAD mirror: US, UK, AU
last modified
short description

Traceset of communication, proximity, location, and activity information.


The authors have captured communication, proximity, location, and activity information from 100 subjects at MIT over the course of the 2004-2005 academic year. This data represents over 350,000 hours (~40 years) of continuous data on human behavior.

reason for most recent change
the initial version
release date
date/time of measurement start
date/time of measurement end

Every Bluetooth device is capable of device discovery, which allows it to collect information on other Bluetooth devices within 5-10 meters. This information includes the Bluetooth MAC address (BTID), device name, and device type. The BTID is a hex number unique to the particular device. The device name can be set at the user's discretion; e.g., Tony's Nokia. Finally, the device type is a set of three integers that describe the discovered device; e.g., Nokia mobile phone, or IBM laptop.

To log BTIDs we designed a software application, BlueAware, that runs passively in the background on MIDP2-enabled mobile phones. Bluetooth was primarily designed to enable wireless headsets or laptops to connect to phones, but as a byproduct, devices are becoming aware of other Bluetooth devices carried by people nearby. Our application records and timestamps the BTIDs encountered in a proximity log and makes them available to other applications. BlueAware is automatically run in the background when the phone is turned on, making it essentially invisible to the user.
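The discovery fields and the timestamped proximity log described above can be pictured with a small data structure. This is only an illustrative sketch: the class names, field names, the BTID value, and the device-type integers are assumptions made for the example, not the format BlueAware actually writes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Discovery:
    """One device seen in a Bluetooth scan (field names are hypothetical)."""
    btid: str                          # Bluetooth MAC address as a hex string
    name: str                          # user-settable device name
    device_type: Tuple[int, int, int]  # three integers describing the device


@dataclass
class ProximityLog:
    """Append-only list of (timestamp, discovery) entries."""
    entries: List[Tuple[datetime, Discovery]] = field(default_factory=list)

    def record_scan(self, when: datetime, found: List[Discovery]) -> None:
        # Every device seen in one scan shares that scan's timestamp.
        for device in found:
            self.entries.append((when, device))

    def btids_seen(self) -> Set[str]:
        # Distinct BTIDs encountered so far, for use by other applications.
        return {device.btid for _, device in self.entries}


log = ProximityLog()
scan_time = datetime(2004, 10, 1, 12, 0, tzinfo=timezone.utc)  # made-up time
# The BTID and device-type values below are placeholders, not real data.
log.record_scan(scan_time, [Discovery("000e6d2a1f3b", "Tony's Nokia", (1, 2, 3))])
print(log.btids_seen())
```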

BlueDar was developed to be placed in a social setting and continuously scan for visible devices, wirelessly transmitting detected BTIDs to a server over an 802.11b network. The heart of the device is a Bluetooth beacon designed by Mat Laibowitz incorporating a class 2 Bluetooth chipset that can be controlled by an XPort web server. We integrated this beacon with an 802.11b wireless bridge and packaged them in an unobtrusive box. An application was written to continuously telnet into multiple BlueDar systems, repeatedly scan for Bluetooth devices, and transmit the discovered proximate BTIDs to our server. Because the Bluetooth chipset is a class 2 device, it is able to detect any visible Bluetooth device within a working range of up to twenty-five meters.

disruptions to data collection
  1. All the data from a phone are stored on a flash memory card, which has a finite number of read-write cycles. Initial versions of our application wrote over the same cells of the memory card. This led to failure of a new card after about a month of data collection, resulting in the complete loss of data. When the application was changed to store the incremental logs in RAM and subsequently write each complete log to the flash memory, our data corruption issues virtually vanished. However, ten cards were lost before this problem was identified, destroying portions of the data collected during the months of September and October for six Sloan students and four Media Lab students.

  2. Another source of missing data is due to powered-off devices. On average we have logs accounting for approximately 85.3% of the time since the phones have been deployed. Less than 5% of this is due to data corruption, while the majority of the missing 14.7% is due to almost one fifth of the subjects turning off their phones at night.

  3. There is a small probability (between 1-3% depending on the phone) that a proximate, visible device will not be discovered during a scan. Typically this is due to either a low level Symbian crash of an application called the "BTServer", or a lapse in the device discovery protocol. The BT server crashes and restarts approximately once every three days (at a 5 minute scanning interval) and accounts for a small fraction of the total error. However, to detect other subjects, we can leverage the redundancy implicit in the system. Because both of the subjects' phones are actually scanning, the probability of a simultaneous crash or device discovery error is less than 1 in 1000 scans.
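The redundancy argument in item 3 is simple arithmetic and can be checked directly. The sketch below assumes that the two phones fail independently, which the text implies but does not state; the function name is made up for the example.

```python
def joint_miss_probability(p_a: float, p_b: float) -> float:
    """Probability that both phones miss each other in the same scan,
    assuming (hypothetically) that the two failures are independent."""
    return p_a * p_b


# Per-phone miss rates quoted in the text: between 1% and 3%.
best_case = joint_miss_probability(0.01, 0.01)
worst_case = joint_miss_probability(0.03, 0.03)
print(f"joint miss probability: {best_case:.4%} to {worst_case:.4%}")

# One BTServer crash roughly every three days at one scan per 5 minutes:
scans_between_crashes = 3 * 24 * 60 // 5
print(f"about one crash per {scans_between_crashes} scans")
```

Even at the 3% end of the range, the joint miss rate stays below the 1 in 1000 figure quoted above.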
  1. Continually scanning and logging BTIDs can expend an older mobile phone battery in about 18 hours. While continuous scans provide a rich depiction of a user's dynamic environment, most individuals expect phones to have standby times exceeding 48 hours. Therefore BlueAware was modified to only scan the environment once every five minutes, providing at least 36 hours of standby time.

  2. While the custom logging application on the phone crashes occasionally (approximately once every week), these crashes fortunately do not result in significant data loss. An additional small application was written to start on boot and continually review the running processes on the phone, verifying that our logging application is always running. Should there be a time where this is not the case, the application is immediately restarted. This functionality also ensures that logging begins immediately once the phone is turned on. However, while this logging application is now fairly robust and can be assumed to be running anytime the phone is on, the dataset generated is certainly not without noise.

  3. By scanning only periodically every five minutes, shorter proximity events may be missed.
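How likely a short encounter is to be missed (item 3) can be estimated under a simplifying assumption that is not taken from the source: scans are instantaneous, occur exactly once per interval, and start at a random phase relative to the encounter. Under that assumption, an encounter lasting d minutes is caught with probability min(1, d / T) for a scan interval of T minutes. The function name is hypothetical, and the per-scan miss rate discussed earlier is ignored.

```python
def detection_probability(event_minutes: float,
                          scan_interval_minutes: float = 5.0) -> float:
    """Chance that at least one periodic scan falls inside a proximity event.

    Assumes instantaneous scans at a uniformly random phase; this is an
    illustrative model, not a measured property of BlueAware.
    """
    if event_minutes < 0 or scan_interval_minutes <= 0:
        raise ValueError("durations must be non-negative and the interval positive")
    return min(1.0, event_minutes / scan_interval_minutes)


for minutes in (0.5, 1, 2.5, 5, 10):
    print(f"{minutes:>4} min encounter: {detection_probability(minutes):.0%} detected")
```

In this model any encounter at least as long as the scan interval is always seen, while shorter encounters are caught in proportion to their length.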

  1. The ten-meter range of Bluetooth, along with the fact that it can penetrate some types of walls, means that people who are not physically proximate may incorrectly be logged as such.

  2. A second source of error is the phone being either explicitly turned off by the user or exhausting its batteries. According to our collected survey data, users report exhausting the batteries approximately 2.5 times each month. One fifth of our subjects manually turn the phone off on a regular basis during specific contexts such as classes, movies, and (most frequently) when sleeping. Immediately before the phone powers down, the event is timestamped and the most recent log is closed. A new log is created when the phone is restarted and again a timestamp is associated with the event.

  3. A more critical source of error occurs when the phone is left on, but not carried by the user. From surveys, we have found that 30% of our subjects claim to never forget their phones, while 40% report forgetting it about once each month, and the remaining 30% state that they forget the phone approximately once each week. Identifying the times where the phone is on, but left at home or in the office, presents a significant challenge when working with the dataset. To grapple with the problem, we have created a 'forgotten phone' classifier. Features included staying in the same location for an extended period of time, charging, and remaining idle through missed phone calls, text messages and alarms. When applied to a subsection of the dataset which had corresponding diary text labels, the classifier was able to identify the day where the phone was forgotten, but also mislabeled a day when the user stayed home sick. By ignoring both days, we risk throwing out data on outlying days, but have greater certainty that the phone is actually with the user. A significantly harder problem is to determine whether the user has temporarily moved beyond ten meters of his or her office without taking the phone. Empirically, this appears to happen with many subjects on a regular basis, and there do not seem to be enough unique features of the event to accurately classify it. However, this phenomenon does not diminish the extremely strong correlation between detected proximity and self-reported interactions. Lastly, while frequency of proximity within the workplace can be useful, the most salient data comes from detecting a proximity event outside MIT, where temporarily forgetting the phone is less likely to repeatedly occur.
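To make the three features of the 'forgotten phone' classifier concrete, here is a toy rule-based stand-in. It is not the authors' classifier: the source does not say what model or thresholds were used, so the class name, field names, thresholds, and the two-of-three voting rule are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DayFeatures:
    """Per-day summary of one phone (all field names are hypothetical)."""
    stationary_hours: float  # longest stretch with no change of location
    charging_hours: float    # time spent on the charger
    ignored_events: int      # missed calls, texts and alarms left unanswered


def looks_forgotten(day: DayFeatures,
                    min_stationary_hours: float = 8.0,
                    min_charging_hours: float = 4.0,
                    min_ignored_events: int = 3) -> bool:
    """Flag a day as 'phone probably forgotten' when at least two of the
    three features exceed their (made-up) thresholds."""
    votes = [
        day.stationary_hours >= min_stationary_hours,
        day.charging_hours >= min_charging_hours,
        day.ignored_events >= min_ignored_events,
    ]
    return sum(votes) >= 2


print(looks_forgotten(DayFeatures(12.0, 10.0, 5)))  # idle phone on a charger
print(looks_forgotten(DayFeatures(1.0, 0.5, 0)))    # phone carried all day
```

A rule this coarse would plausibly share the weakness reported above, since a day spent at home can show the same stationary and charging pattern as a forgotten phone.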

In return for the use of the Nokia 6600 phones, students have been asked to fill out web-based surveys regarding their social activities and the people they interact with throughout the day. Comparison of the logs with survey data has given us insight into our dataset's ability to accurately map social network dynamics. Through surveys of approximately forty senior students, we have validated that the reported frequency of (self-report) interaction is strongly correlated with the number of logged BTIDs (R=.78, p=.003), and that the dyadic self-report data has a similar correlation with the dyadic proximity data (R=.74, p~=.0001). Additionally, a subset of subjects kept detailed activity diaries over several months. Comparisons revealed no systematic errors with respect to proximity and location, except for omissions due to the phone being turned off.
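The validation above reports correlation coefficients between self-reported interaction and logged proximity. The source does not say which coefficient was used; the sketch below assumes a Pearson correlation and runs on made-up numbers purely to show the computation, not to reproduce the reported values.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Toy values only: self-reported interaction frequency per subject versus
# the number of logged BTIDs for the same subject.
self_report = [1, 2, 2, 4, 5, 7]
logged_btids = [10, 25, 18, 40, 38, 70]
print(f"R = {pearson_r(self_report, logged_btids):.2f}")
```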

 the mit/reality/blueaware/activityspan trace
 the mit/reality/blueaware/callspan trace
 the mit/reality/blueaware/cellspan trace
 the mit/reality/blueaware/coverspan trace
 the mit/reality/blueaware/devicespan trace