
Thursday, May 19, 2016

PowerShell v3: View File Types in a Folder Grouped by Binary Signature

One of my recent projects involved loading some files into SQL. After getting the files into the system, we quickly realized that the file extensions did not always reflect the actual file type; in some cases the extensions were simply wrong. Since this project was already past due, I decided to work with a new scripter on my team and provide a quick script to summarize what we were really looking at. In this case, I decided to read the first four bytes of each file and group the files by that file signature. The script I developed looks like this, with the real paths changed to nonsensical ninja-based names to be a little less uber-serious and mega-geeky.
# Clear console in GUI
Clear-Host;

# Set root path
$dir = 'C:\MickeyMouseNinjaOfCheese';

# Get list of child folders (-Directory and -File need PowerShell v3 or later)
Get-ChildItem -Path $dir -Directory |

# Iterate over result set
ForEach-Object {

       # Keep a reference to the current folder; $_ is reused by the inner pipeline
       $folder = $_

       # Iterate over the files (not sub folders) in each folder
       Get-ChildItem -Path $folder.FullName -File |

       # Exclude specific files irrelevant to the discussion
       Where-Object {$_.Name -ne 'kungfu.txt'} |

       # Process each file
       ForEach-Object -Begin {
              # Instantiate new hashtable
              $fileheaders = @{};

              # Report status for the current folder
              Write-Output "Processing $($folder.FullName)"
       } `
       -Process {
              # Keep a reference to the current file; the inner ForEach-Object reuses $_
              $file = $_

              # Clear string buffer from previous iteration
              $string = $null

              # Read the first four bytes of the file as binary data
              # (Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead of -Encoding Byte)
              Get-Content -LiteralPath $file.FullName -Encoding Byte -TotalCount 4 | ForEach-Object {
                     # Convert each byte to a two-digit hex string ('0' pads single digits) and append to buffer
                     $string += [Convert]::ToString($_,16).PadLeft(2,'0')
              }

              # Add file path (as key) and signature (as value)
              $fileheaders.Add($file.FullName, $string)
       } `
       -End {
              # Enumerate collection
              $fileheaders.GetEnumerator() |

              # Group by file signature (i.e., each hashtable member's value)
              Group-Object Value |

              # Output to table displaying the count of each grouped file signature and the signature
              Format-Table -AutoSize -Property Count, Name
       }
}
I don't want to go through this at too granular a level, so I will hit the highlights. The comments should provide a pretty good road map. When the script runs, it:

  1. Gets a list of folders and passes that collection to a ForEach-Object
  2. The ForEach-Object gets the files in each folder and passes them down the pipeline
  3. The files pass through a Where-Object to eliminate any kungfu.txt files, as I don't need to know their file type
  4. Each remaining file goes through a ForEach-Object that uses its three script blocks:
    1. Begin: sets up a new hashtable for the folder and writes a status line for that folder
    2. Process: clears the buffer ($string), reads the first four bytes of the file as binary data, and converts each byte to a two-digit hex string appended to the buffer. Finally, it adds the file's full name (as a unique key) and the signature to the hashtable.
    3. End: enumerates the hashtable, groups the entries by value (the file signature), and sends the results to Format-Table. This autosizes the columns and shows the count and name (that is, the file signature) of each group.
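The Process step relies on Get-Content -Encoding Byte, which works in Windows PowerShell but was replaced by -AsByteStream in PowerShell 6 and later. As a version-neutral alternative, here is a minimal sketch of the signature-reading step as a standalone function. Get-FileSignature is a hypothetical name of my own, not a built-in cmdlet, and it swaps Get-Content for a .NET FileStream that reads only the first few bytes. Byte.ToString('x2') plays the same role as the [Convert]::ToString(...) padding in the script, so a byte such as 0x0d becomes 0d rather than d.

```powershell
# Hypothetical helper, not part of the original script: returns the first
# $Count bytes of a file as a lowercase hex string such as '49492a00'.
# Uses a .NET FileStream, so it behaves the same in Windows PowerShell and PowerShell 6+.
function Get-FileSignature {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Path,

        [int] $Count = 4
    )

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so resolve to a full file system path first
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $buffer = New-Object byte[] $Count
    $total = 0

    # Open read-only and read until $Count bytes are buffered or the file ends
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        while ($total -lt $Count) {
            $n = $stream.Read($buffer, $total, $Count - $total)
            if ($n -eq 0) { break }   # end of file: shorter files yield fewer bytes
            $total += $n
        }
    }
    finally {
        $stream.Dispose()
    }

    # Two hex digits per byte; 'x2' keeps the leading zero (0x0d -> '0d')
    $hex = ''
    for ($i = 0; $i -lt $total; $i++) {
        $hex += $buffer[$i].ToString('x2')
    }
    return $hex
}

# Example usage (folder path from this post; adjust to your own):
# Get-ChildItem -Path 'C:\MickeyMouseNinjaOfCheese\001' -File |
#     Group-Object -Property { Get-FileSignature -Path $_.FullName } |
#     Format-Table -AutoSize -Property Count, Name
```

Grouping on a script block, as in the commented usage, skips the intermediate hashtable; the hashtable in the original script has the advantage of keeping each file path alongside its signature for later inspection.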
When I run this script on a few folders, I see this output:
Processing C:\MickeyMouseNinjaOfCheese\001

Count Name   
----- ----   
  210 49492a00
    1 5b496465
    2 ffd8ffe0


Processing C:\MickeyMouseNinjaOfCheese\002

Count Name   
----- ----   
  212 49492a00
    2 224d3a5c
    3 0d0a4d3a
    2 6364202e
    1 5b496465


Processing C:\MickeyMouseNinjaOfCheese\003

Count Name   
----- ----   
  209 49492a00
    1 5b496465
    1 424d6e0e
    1 424d4e13
    1 424d0e10
With a little more work, I can see the breakdown of file types by matching these signatures against my signature database, but that is another story of sorts.
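For readers without a signature database at hand, here is a minimal sketch of the matching step, assuming a small hand-built table. The table and the Get-FileTypeFromSignature function are hypothetical stand-ins, not the database mentioned above; the four entries are well-known magic numbers (TIFF starts with II*\0 or MM\0*, JPEG with FF D8 FF, BMP with the ASCII letters BM).

```powershell
# Hypothetical mini signature table standing in for a real signature database.
# Keys are lowercase hex prefixes; values are descriptions. Extend as needed.
$knownSignatures = [ordered]@{
    '49492a00' = 'TIFF (little-endian)'
    '4d4d002a' = 'TIFF (big-endian)'
    'ffd8ff'   = 'JPEG'
    '424d'     = 'BMP'
}

# Hypothetical helper: returns the description of the first table entry whose
# key is a prefix of the signature, or 'Unknown' when nothing matches.
# Entries are checked in table order, so list longer, more specific prefixes first.
function Get-FileTypeFromSignature {
    param(
        [string] $Signature,
        $Table
    )

    foreach ($entry in $Table.GetEnumerator()) {
        if ($Signature.StartsWith($entry.Key, [System.StringComparison]::OrdinalIgnoreCase)) {
            return $entry.Value
        }
    }
    return 'Unknown'
}
```

Against the output above, 49492a00 maps to TIFF (little-endian), ffd8ffe0 to JPEG, the three signatures starting with 424d to BMP, and the rest (for example 5b496465) to Unknown until the table grows. To show this in the script itself, the End block could insert Select-Object Count, Name, @{ Name = 'Type'; Expression = { Get-FileTypeFromSignature -Signature $_.Name -Table $knownSignatures } } before Format-Table and add Type to its -Property list.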
