                        Windows Implementation
                   Sleuth Kit Implementation Notes
                       http://www.sleuthkit.org

                            Brian Carrier
                      Last Updated: Sept 2008


INTRODUCTION
=======================================================================
Version 2.06 of The Sleuth Kit included support for Microsoft Windows.
Several design changes were needed so that TSK could run on both
Windows and Unix systems. The biggest change, and the focus of this
document, was how Unicode and non-English characters are handled.


PROBLEM
=======================================================================
Unicode characters can be stored in multiple formats. Unix systems use
UTF-8, which stores each character in 1, 2, 3, or 4 bytes. Windows uses
UTF-16, which stores each character in 2 or 4 bytes. Because of this
difference, the input to and output of TSK differ between Windows and
Unix.


SOLUTION
=======================================================================
The solution to this problem was to create many C #defines that map a
general name to the specific function or type that is used on each
platform. Internally, all code uses the UTF-8 encoding. This means that
the input and output may need to be converted on Windows.

The input data consists of image file names, image and file system
types, and addresses. There is no need to convert the file names
because the native system calls need the same format as the input. For
the image, volume, and file system types, I assume that they will
always be in English and therefore they are easily converted to ASCII
on Windows. Lastly, addresses in string form are easy to convert to an
integer, and this is done using either UTF-8 or UTF-16 atoi-type
functions.

For output, the printf and fprintf functions were wrapped with
TSK-specific versions. The wrappers convert the UTF-8 text to UTF-16,
if needed, and then print the resulting data. Therefore, few changes
occurred in the volume and file system code except that the printf
wrappers were used.
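
The #define mapping and the UTF-8 to UTF-16 conversion described above
can be sketched as follows. This is a minimal illustration under stated
assumptions, not TSK's actual code: the names demo_tchar, DEMO_T,
demo_tstrlen, demo_tstrtoul, and demo_utf8_to_utf16 are hypothetical,
and the converter is simplified (for example, it does not reject
overlong encodings).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic names; each maps to a platform-specific type
 * or function, in the spirit of the #defines described in the text. */
#ifdef _WIN32
#include <wchar.h>
typedef wchar_t demo_tchar;          /* UTF-16 code units on Windows */
#define DEMO_T(s)     L##s
#define demo_tstrlen  wcslen
#define demo_tstrtoul wcstoul
#else
typedef char demo_tchar;             /* UTF-8 bytes on Unix */
#define DEMO_T(s)     s
#define demo_tstrlen  strlen
#define demo_tstrtoul strtoul
#endif

/* Convert a NUL-terminated UTF-8 string to UTF-16 code units.
 * out_cap is the capacity of out in code units, including room for
 * the terminating 0. Returns the number of units written (excluding
 * the terminator), or -1 on malformed input or insufficient space. */
long demo_utf8_to_utf16(const char *in, uint16_t *out, size_t out_cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*p != 0) {
        uint32_t cp;                 /* decoded code point */
        int extra;                   /* continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;              /* invalid lead byte */
        p++;

        for (; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;           /* truncated or bad continuation */
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }

        /* Surrogate values and anything past U+10FFFF are not valid. */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;

        if (cp < 0x10000) {          /* one 16-bit unit (2 bytes) */
            if (n + 1 >= out_cap) return -1;
            out[n++] = (uint16_t)cp;
        } else {                     /* surrogate pair (4 bytes) */
            if (n + 2 >= out_cap) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (n >= out_cap) return -1;     /* no room for the terminator */
    out[n] = 0;
    return (long)n;
}
```

With a mapping like this, an address argument could be parsed with
demo_tstrtoul(arg, NULL, 10) on either platform. A Windows print
wrapper could pass its UTF-8 string through a converter such as
demo_utf8_to_utf16 before calling a wide-character output function,
while on Unix the string would be printed unchanged.
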
The command line tools needed to be changed to handle the 2-byte TCHAR
values as input and to use the T* functions, which map to either UTF-8
or UTF-16 functions.

Update: When support was added for the mingw cross-compiler, some
things had to change. The biggest change was that the command line
arguments in the tools have to be obtained via GetCommandLineW()
instead of wmain(), because mingw does not support wmain().

-----------------------------------------------------------------------
Send documentation updates to: <doc-updates at sleuthkit dot org>
Copyright (c) 2006-2008 by Brian Carrier. All Rights Reserved