Sunday, December 23, 2007

Managing Multi-GBs in Wireshark

This summer I wrote a short paper on a process I developed to support the analysis of 54 GB of pcap data captured using Wireshark.

Prior to the capture (lesson learned), no one investigated whether Wireshark could support analysis of this quantity of data. The short answer is NO. Depending on your computer’s resources, when you attempt to open any significantly sized pcap file (on the order of 500 MB), you will likely run into the dreaded out-of-memory error.
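A common workaround is to split a large capture into smaller files that Wireshark can open one at a time. Wireshark's editcap utility can do this by packet count with its -c option. The sketch below shows the same idea in plain Python so the mechanics of the libpcap format are visible. It is an illustration, not the process from the paper. The function names (read_global_header, iter_records, split_pcap) and the chunk naming scheme are hypothetical. It assumes classic libpcap files, not pcapng. It streams one packet at a time, so memory use does not grow with the size of the input file.

```python
import struct
from typing import BinaryIO, Iterator, List, Tuple

GLOBAL_HEADER_LEN = 24  # libpcap file header
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_frac, incl_len, orig_len

# On-disk magic numbers mapped to a struct byte-order prefix. This covers
# microsecond (a1b2c3d4) and nanosecond (a1b23c4d) libpcap files only;
# pcapng is not handled by this sketch.
MAGIC_TO_ORDER = {
    b"\xd4\xc3\xb2\xa1": "<",
    b"\x4d\x3c\xb2\xa1": "<",
    b"\xa1\xb2\xc3\xd4": ">",
    b"\xa1\xb2\x3c\x4d": ">",
}


def read_global_header(f: BinaryIO) -> Tuple[bytes, str]:
    """Return the raw global header and the byte order it declares."""
    header = f.read(GLOBAL_HEADER_LEN)
    if len(header) != GLOBAL_HEADER_LEN or header[:4] not in MAGIC_TO_ORDER:
        raise ValueError("not a libpcap file")
    return header, MAGIC_TO_ORDER[header[:4]]


def iter_records(f: BinaryIO, order: str) -> Iterator[bytes]:
    """Yield one packet record (16-byte header + captured bytes) at a time."""
    while True:
        rec_hdr = f.read(RECORD_HEADER_LEN)
        if not rec_hdr:
            return  # clean end of file
        if len(rec_hdr) != RECORD_HEADER_LEN:
            raise ValueError("truncated record header")
        incl_len = struct.unpack(order + "IIII", rec_hdr)[2]
        data = f.read(incl_len)
        if len(data) != incl_len:
            raise ValueError("truncated packet data")
        yield rec_hdr + data


def split_pcap(src_path: str, out_prefix: str, packets_per_chunk: int) -> List[str]:
    """Stream src_path into numbered chunk files holding at most
    packets_per_chunk packets each, and return the chunk paths."""
    if packets_per_chunk <= 0:
        raise ValueError("packets_per_chunk must be positive")
    outputs: List[str] = []
    out = None
    count = 0
    with open(src_path, "rb") as src:
        header, order = read_global_header(src)
        try:
            for record in iter_records(src, order):
                if out is None or count == packets_per_chunk:
                    if out is not None:
                        out.close()
                    path = f"{out_prefix}_{len(outputs):05d}.pcap"
                    out = open(path, "wb")
                    out.write(header)  # each chunk needs its own global header
                    outputs.append(path)
                    count = 0
                out.write(record)
                count += 1
        finally:
            if out is not None:
                out.close()
    return outputs
```

For example, split_pcap("capture.pcap", "capture_part", 1_000_000) would write capture_part_00000.pcap, capture_part_00001.pcap, and so on. Splitting by packet count rather than by bytes keeps the logic simple. With a fixed 50-byte snaplen, each record is at most 66 bytes, so chunk sizes are predictable anyway.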

This obstacle created an opportunity. I have documented what I learned about using Wireshark to process these large pcap files here. If you find this process remotely useful, or if you have any questions, feel free to contact me.

"A Network-Usage Baseline, within this paper, is defined as a profile of the characteristics of a communication network within a particular window of usage. For example, these characteristics can include utilization, network applications, number of users, etc. These baselines can be used for forecasting and planning, as well as for optimization and troubleshooting. This paper documents a methodology for creating network-usage baselines using Wireshark, its packaged tools (e.g., dumpcap.exe, tshark.exe), an advanced text editor and a common spreadsheet application. Notwithstanding Wireshark’s many capabilities, it has limitations when analyzing extremely large capture files (GB+). Creating a baseline on modern, multi-user networks normally requires the manipulation and analysis of many very large, multi-GB data captures. This methodology was developed based on roughly 72 hours of captured data, at three distinct intersections in the network architecture. The resulting data equated to 52 GB of libpcap capture files. All of these files were captured using dumpcap.exe, which is included with the Wireshark installation. During the capture, each packet was intentionally “snapped” at 50 bytes to minimize the size of the resulting capture files. To the author’s current knowledge, this “snapping” at 50 bytes does not impact the statistics being generated. The example charts at the end of this paper support this understanding..."
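One reason snapping need not distort volume statistics is in the libpcap format itself. Every packet record stores two lengths: incl_len, the number of bytes actually saved (at most 50 here), and orig_len, the length of the packet on the wire. Byte and utilization totals computed from orig_len are therefore the same with or without a snaplen. Statistics that need data beyond the first 50 bytes, such as application payloads, would still be affected. The sketch below computes per-interval packet and byte counts, the raw material for a utilization baseline, from orig_len. It is a hypothetical illustration and not the tshark and spreadsheet process described in the paper. The names traffic_per_interval and utilization_pct, the 60-second default interval, and the link speed you pass in are all assumptions.

```python
import struct
from collections import defaultdict
from typing import BinaryIO, Dict, Tuple

# On-disk libpcap magic numbers mapped to a struct byte-order prefix
# (microsecond and nanosecond variants; pcapng is not handled).
MAGIC_TO_ORDER = {
    b"\xd4\xc3\xb2\xa1": "<",
    b"\x4d\x3c\xb2\xa1": "<",
    b"\xa1\xb2\xc3\xd4": ">",
    b"\xa1\xb2\x3c\x4d": ">",
}


def traffic_per_interval(f: BinaryIO, interval_s: int = 60) -> Dict[int, Tuple[int, int]]:
    """Return {interval_start_seconds: (packet_count, bytes_on_wire)}.

    Byte totals use orig_len (length on the wire), not incl_len (bytes kept
    after snapping), so a small snaplen does not change them.
    """
    header = f.read(24)
    order = MAGIC_TO_ORDER.get(header[:4]) if len(header) == 24 else None
    if order is None:
        raise ValueError("not a libpcap file")
    packets: Dict[int, int] = defaultdict(int)
    volume: Dict[int, int] = defaultdict(int)
    while True:
        rec = f.read(16)
        if not rec:
            break
        if len(rec) != 16:
            raise ValueError("truncated record header")
        ts_sec, _ts_frac, incl_len, orig_len = struct.unpack(order + "IIII", rec)
        f.seek(incl_len, 1)  # skip captured bytes; only the record header is needed
        bucket = ts_sec - ts_sec % interval_s
        packets[bucket] += 1
        volume[bucket] += orig_len
    return {b: (packets[b], volume[b]) for b in sorted(packets)}


def utilization_pct(byte_count: int, interval_s: int, link_bps: float) -> float:
    """Average utilization of a link of link_bps bits/s over one interval, in percent."""
    return 100.0 * byte_count * 8 / (interval_s * link_bps)
```

As a usage example, opening a (hypothetical) capture.pcap in binary mode and calling traffic_per_interval(f, 60) yields one entry per minute. Passing each byte total to utilization_pct with an assumed link speed, such as 100_000_000 for a 100 Mb/s segment, gives a per-minute utilization series that could be charted in a spreadsheet. Only the 16-byte record headers are read, so a 50-byte snaplen mainly reduces disk and I/O cost.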
