                        Windows Implementation
                     Sleuth Kit Implementation Notes
                        http://www.sleuthkit.org

                            Brian Carrier
                     Last Updated: Sept 2008


INTRODUCTION
=======================================================================
Version 2.06 of The Sleuth Kit included support for Microsoft Windows.  
There were several design changes that needed to occur so that TSK could
run on both Windows and Unix systems.  The biggest change, and the focus 
of this document, was how Unicode and non-English characters were dealt 
with.


PROBLEM 
=======================================================================
Unicode characters can be stored in multiple formats.  Unix systems
use UTF-8, which stores each character in 1, 2, 3, or 4 bytes.  Windows
uses UTF-16, which stores each character in 2 or 4 bytes.  Because of
this difference, the input to and output of TSK differ between Windows
and Unix.
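
As a rough illustration of the size difference, the following sketch
computes how many bytes a single Unicode code point occupies in each
encoding.  The function names are hypothetical and are not part of TSK.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store one code point in UTF-8 (0 if not encodable). */
size_t ex_utf8_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogate values are not characters */
    if (cp <= 0x7F)
        return 1;
    if (cp <= 0x7FF)
        return 2;
    if (cp <= 0xFFFF)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;                   /* beyond the Unicode range */
}

/* Bytes needed to store one code point in UTF-16 (0 if not encodable). */
size_t ex_utf16_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp <= 0xFFFF)
        return 2;               /* one 16-bit code unit */
    if (cp <= 0x10FFFF)
        return 4;               /* surrogate pair: two 16-bit code units */
    return 0;
}
```

For example, U+0041 ('A') takes 1 byte in UTF-8 and 2 bytes in UTF-16,
while U+20AC (the euro sign) takes 3 bytes in UTF-8 and 2 in UTF-16.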


SOLUTION
=======================================================================
The solution to this problem was to create many C #defines that map
a general name to the specific function or type that is used on each
platform.  Internally, all code uses the UTF-8 encoding.  This means
that the input and output may need to be converted on Windows.
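
The #define approach might look roughly like the sketch below.  All
of the EX_ names are made up for illustration; the names that TSK
actually uses are not given in this document.

```c
#include <string.h>

#ifdef _WIN32
#include <wchar.h>
typedef wchar_t EX_TCHAR;           /* UTF-16 code unit on Windows */
#define EX_T(x)     L##x            /* wide string literal */
#define EX_TSTRLEN  wcslen
#define EX_TSTRCMP  wcscmp
#else
typedef char EX_TCHAR;              /* UTF-8 byte on Unix */
#define EX_T(x)     x               /* ordinary string literal */
#define EX_TSTRLEN  strlen
#define EX_TSTRCMP  strcmp
#endif

/* Caller code is written once against the general names and compiles
 * to the char or wchar_t variant depending on the platform. */
int ex_is_help_flag(const EX_TCHAR *arg)
{
    return EX_TSTRCMP(arg, EX_T("-h")) == 0;
}
```

The benefit of this design is that platform checks live in one header
instead of being repeated at every call site.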

The input data consists of image file names, image and file system types,
and addresses.  There is no need to convert the file names because the
native system calls need the same format as the input.  For the image,
volume, and file system types, I assume that they will always be in
English and therefore they are easily converted to ASCII on Windows.
Lastly, addresses in a string form are easy to convert to an integer
and this is done using either UTF-8 or UTF-16 atoi-type functions.
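
The two input conversions described above could be sketched as
follows.  The names are hypothetical, and the choice of strtoull() and
wcstoull() is an assumption; this document does not say which
atoi-type functions TSK calls.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef _WIN32
#include <wchar.h>
typedef wchar_t ex_tchar;
#define EX_T(x)       L##x
#define EX_TSTRTOULL  wcstoull
#else
typedef char ex_tchar;
#define EX_T(x)       x
#define EX_TSTRTOULL  strtoull
#endif

/* Copy an English-only type name (for example "ntfs") into a char
 * buffer.  Returns 0 on success, or -1 if a character is outside
 * ASCII or the buffer is too small. */
int ex_type_to_ascii(const ex_tchar *in, char *out, size_t out_len)
{
    size_t i;

    if (out_len == 0)
        return -1;
    for (i = 0; in[i] != 0; i++) {
        /* A negative char (UTF-8 byte >= 0x80) becomes a large value. */
        unsigned long c = (unsigned long) in[i];
        if (c > 0x7F || i + 1 >= out_len)
            return -1;
        out[i] = (char) c;
    }
    out[i] = '\0';
    return 0;
}

/* Parse a decimal address string.  Returns 0 on success, or -1 if the
 * string is empty, has trailing junk, or overflows. */
int ex_parse_addr(const ex_tchar *in, unsigned long long *addr)
{
    ex_tchar *end = NULL;
    unsigned long long v;

    errno = 0;
    v = EX_TSTRTOULL(in, &end, 10);
    if (end == in || *end != 0 || errno == ERANGE)
        return -1;
    *addr = v;
    return 0;
}
```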

For output, the printf and fprintf functions were wrapped with
TSK-specific versions.  The wrappers convert the UTF-8 text to
UTF-16, if needed, and then print the resulting data.
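
A minimal sketch of such a wrapper is shown below.  It is not the TSK
implementation: the names, the fixed 1024-character buffers, and the
use of fputws() on Windows are all assumptions made for illustration.
The converter rejects truncated sequences and surrogates but does not
reject overlong UTF-8 forms.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#ifdef _WIN32
#include <wchar.h>
#endif

/* Convert NUL-terminated UTF-8 to UTF-16 code units.  Returns the
 * number of units written (terminator excluded), or -1 on malformed
 * input or if dst is too small. */
long ex_utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    const unsigned char *s = (const unsigned char *) src;
    size_t n = 0;

    while (*s != 0) {
        uint32_t cp;
        int extra, i;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* invalid lead byte */
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)  /* also catches an early NUL */
                return -1;
            cp = (cp << 6) | (uint32_t) (*s & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000) {
            if (n + 1 >= dst_len)
                return -1;
            dst[n++] = (uint16_t) cp;
        } else {                      /* encode as a surrogate pair */
            if (n + 2 >= dst_len)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t) (0xD800 + (cp >> 10));
            dst[n++] = (uint16_t) (0xDC00 + (cp & 0x3FF));
        }
    }
    if (n >= dst_len)
        return -1;
    dst[n] = 0;
    return (long) n;
}

/* printf-style wrapper: format as UTF-8, then convert only on Windows.
 * Returns the number of UTF-8 bytes formatted, or -1 on error. */
int ex_printf(const char *fmt, ...)
{
    char utf8[1024];
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(utf8, sizeof(utf8), fmt, ap);
    va_end(ap);
    if (len < 0 || (size_t) len >= sizeof(utf8))
        return -1;
#ifdef _WIN32
    {
        uint16_t wide[1024];
        if (ex_utf8_to_utf16(utf8, wide, 1024) < 0)
            return -1;
        /* wchar_t is a 16-bit type on Windows. */
        return fputws((const wchar_t *) wide, stdout) < 0 ? -1 : len;
    }
#else
    return fputs(utf8, stdout) < 0 ? -1 : len;
#endif
}
```

Keeping the conversion inside the wrapper means the volume and file
system code can call it like printf and stay unaware of the platform.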

Therefore, few changes were needed in the volume and file system code
except that the printf wrappers were used.  The command line tools needed
to be changed to handle the 2-byte TCHAR values as input and to use the T*
functions, which map to either UTF-8 or UTF-16 functions.

Update: When support for the mingw cross-compiler was added, some of
this design had to change.  The biggest change was that the tools must
obtain their command line arguments via GetCommandLineW() instead of
wmain(), because mingw does not support wmain().
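
One way to obtain wide arguments without wmain() is sketched below.
GetCommandLineW() comes from the text above; splitting its result with
CommandLineToArgvW() is an assumption, as are the ex_ names.  On Unix
the sketch simply passes the normal argv through.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>               /* CommandLineToArgvW */
typedef wchar_t ex_argchar;
#else
typedef char ex_argchar;
#endif

/* Return the argument vector the tool should parse and store its
 * length in *argc_out.  Returns NULL on failure.
 *
 * On Windows the narrow argc/argv from main() are ignored and the
 * UTF-16 command line is fetched and split instead; the caller should
 * release the result with LocalFree().  On other systems the original
 * argv is returned unchanged and must not be freed. */
ex_argchar **ex_get_args(int argc, char **argv, int *argc_out)
{
#ifdef _WIN32
    (void) argc;
    (void) argv;
    return CommandLineToArgvW(GetCommandLineW(), argc_out);
#else
    *argc_out = argc;
    return argv;
#endif
}
```

A tool would call ex_get_args(argc, argv, &n) at the top of main() and
then parse the returned vector with the T* functions.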

-----------------------------------------------------------------------
Send documentation updates to: <doc-updates at sleuthkit dot org>

Copyright (c) 2006-2008 by Brian Carrier.  All Rights Reserved