Friday, April 10, 2009

De-Dupe Files Script Tool for Windows 32bit

If you are not interested in the background story about why, when and how I decided to write this little file de-duping script, you can skip it and jump right to the section about how the script works, what it does, where to download it, how to install and uninstall it, and its source code. The tool is freeware, but any donation (money or goods) and/or simply a “Thank You” (e.g. via the comments section at the end of this post) is appreciated nevertheless.

Important Note: I updated the script and this post because of some bugs that I found and issues with the 3rd party tool “touch.exe”. I had to remove it and come up with another solution for the problem that it solved. I also added some nice stuff, so it was more than just bug fixing :).

The Background Story

Who has not had the problem of having hundreds of files of any type (text files, documents, spreadsheets, images, videos and others) in one directory, with a high chance that some of them are duplicates: identical files that only have a different file name? I often have to deal with duplicate images that I downloaded from the internet.

I cannot always remember whether I already downloaded a particular image, so I go by the slogan "archive / save / back up first, sort later". There will be nothing to sort if you don’t save a copy and then go back later to the place where you found it, only to learn that the stuff isn’t there anymore for whatever reason. Maybe even the whole web site has gone the way of the Dodo.

The original file names are often useless: either too generic, like image1.jpg, logo.gif or simply 1.jpg, 2.jpg etc., or long and cryptic without any meaning, like 3104458219_cc0dfd3980_o_d.jpg. So you end up giving the files your own names, which makes it highly likely that you download the same file again at a different time and give it a different file name than you did the first time around. Voilà, a duplicate.

I looked at a number of tools and options and found some that were decent, but each of them had something (or lacked something) that turned what should be a semi-automatic task into a manual, time-consuming ordeal. What I wanted was something that finds duplicates but does not delete them outright or make me go over each dupe right then to decide what to do with it. I also wanted to be able to verify that the dupes found are really identical, so it must be easy to see quickly which file is a duplicate of which other file.

None of the solutions that I tried delivered on all those aspects, so de-duping files was painstaking and time consuming if I did it, or I simply did not do it at all and carried a bunch of dupe garbage around.

The idea for this simple script of mine came when I did the sorting and inventory of the SAC art pack releases back in December last year and January this year. I uploaded files, like MODs that I converted to MPEG-1 Audio Layer 3 (MP3 for short), and other files to my file sharing account at Mediafire.com. I learned that Mediafire.com has a de-duper function built into their system that prevents users from uploading the same file twice to their account. You can upload it a second time, but you will get the message that the file already exists and that your new upload was deleted. I got that message and first thought that it was an error on Mediafire’s part.

What dupes? There were no dupes, I thought. Yeah, there were some SID Adlib music files that had the same size and all, but there were about a dozen of them, all with a different name and even a different prefix, indicating that the files were created by different musicians. The dates were different too. I was close to sending an email to Mediafire customer support when I decided to listen to the songs that could be potential duplicates of the file that was rejected (Mediafire didn’t tell me the file name or shared URL of the original file). And there it was: the same tune by the same artist, with a different file name and different file dates.

I was curious about how Mediafire noticed that the files were identical and did some research. They are using the MD5 Check Sum value of a file. The chances that two files that are not byte by byte identical end up with the same MD5 check sum value by accident are astronomically small. That’s a smart, fast and easy way to find dupes, and the idea of writing a de-dupe script that does exactly what I want it to do was born.

This is enough of a background story, I believe. Let’s get busy with the script itself.

How the Script Works

The script detects duplicate files within a directory. Duplicate files are files that have the same MD5 Check Sum value.  Two DIFFERENT/NON IDENTICAL files having the same MD5 Check Sum is not impossible, but highly unlikely. This allows the script to detect duplicate files regardless of their file name or other characteristics, such as "date created" or "date modified".

The tool scans all files within a directory. It does not include files in sub directories of the processed folder.
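
The detection idea described above can be sketched in a few lines of Python. This is only an illustration and not the script itself, which is written in VBScript and calls md5sum.exe; the sketch uses Python's hashlib instead, and the function names md5_of_file and find_duplicates are made up for this example. It hashes only the files directly inside one folder, sorts them by check sum and lower-cased file name, and treats the first file of each run of equal check sums as the original.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Map each 'original' file name to the list of its duplicates.

    Only files directly inside `folder` are examined; sub directories
    are ignored.
    """
    entries = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            entries.append((md5_of_file(path), name.lower(), name))

    # Sort by check sum, then by lower-cased name, so identical files
    # end up next to each other and the first by name leads each run.
    entries.sort()

    duplicates = {}
    previous_sum, original = None, None
    for checksum, _, name in entries:
        if checksum == previous_sum:
            duplicates.setdefault(original, []).append(name)
        else:
            previous_sum, original = checksum, name
    return duplicates
```

Sorting first means a single pass over neighbouring entries is enough to find all dupes, and no pairwise comparison of every file with every other file is needed.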

If a duplicate file is found, it is renamed by adding the original file name as a prefix, with an _ as separator. An _ also replaces the "." in front of the file extension of the original file name (any other "." in the file name itself remains). Finally, the string [DEDUPED] is added at the end of the file name, right before the extension.

Example

For example, aFile1.EXT and bFile2.EXT are identical. After the script has run, one of the two files remains as it is and the other one is renamed. Which file is considered the "original" is determined by which file was found first. The script sorts the files by name first, before it de-dupes them.

In this example aFile1.EXT would be considered the original and bFile2.EXT would be renamed to aFile1_EXT_bFile2[DEDUPED].EXT. This makes dupes appear right after the original if you sort the directory by file name. To filter the dupes in order to copy or move them away or to delete them, use the copy, move or del command in MS DOS. For example, "DEL *[DEDUPED].*" would delete all duplicate files found and renamed by the script.
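
The naming scheme can be written down as a small Python function. It is a sketch of the rule as described in this post, not a translation of the VBScript code, and the name deduped_name is invented for the illustration.

```python
import os


def deduped_name(original_name, dupe_name):
    """Build the new name for a duplicate, following the scheme above."""
    org_base, org_ext = os.path.splitext(original_name)
    dupe_base, dupe_ext = os.path.splitext(dupe_name)
    # splitext keeps the leading dot on the extension; drop it for the
    # "_EXT_" part so only the last "." of the original is replaced.
    return f"{org_base}_{org_ext.lstrip('.')}_{dupe_base}[DEDUPED]{dupe_ext}"


print(deduped_name("aFile1.EXT", "bFile2.EXT"))
# prints: aFile1_EXT_bFile2[DEDUPED].EXT
```

File names without any extension are an edge case that this sketch handles in its own way (the extension part is simply empty); how the real script behaves for such files is not covered here.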

More New Stuff

The script creates two files by default in the processed directory:

  • "!DeDupe-FileList.txt" - a list of all files in the directory and their MD5 Check Sum Values (tab separated)
  • "!DeDupeLog.txt" - a processing log file where you can find the list of dupes that were detected, their old & new file name and the corresponding original file

If you do not want any of the files to be created, change the options for "WriteFileList" and "WriteDeDupeLog" to "0" at the beginning of the code of "DedupeFilesInFolder.vbs". Alternatively, use the command line options “/log:[0/1]” and “/list:[0/1]” to turn the creation of the log and/or list on or off.

You can also suppress all dialogs via the command line option “/quiet:[0/1]”.
“/quiet:1” would disable the progress dialog, results message and all error messages.

Note, the script returns error levels for batch processing regardless of the "quiet" settings.
The Error Level codes are:

0 = Script Ran Successfully
1 = Script Ran, but there were no files to process
2 = The script was aborted (only relevant if progress dialog is on)
4 = Script Error (md5sum.exe or processing path not found)
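
As an illustration of how a calling program could use these error levels, here is a small Python sketch. The helper names describe_exit_code and run_dedupe are hypothetical, and run_dedupe assumes a Windows system where wscript.exe is on the PATH and the script can be found under the given name; I have not tested it against every setup.

```python
import subprocess

# Error levels as listed above.
EXIT_CODES = {
    0: "Script ran successfully",
    1: "Script ran, but there were no files to process",
    2: "The script was aborted",
    4: "Script error (md5sum.exe or processing path not found)",
}


def describe_exit_code(code):
    """Translate an error level into a readable message."""
    return EXIT_CODES.get(code, f"Unknown error level {code}")


def run_dedupe(folder, script="DedupeFilesInFolder.vbs"):
    """Run the script quietly on `folder` and return its error level.

    Assumption: Windows, wscript.exe on the PATH, and `script` is either
    in the current directory or given as a full path.
    """
    command = ["wscript.exe", script, "/quiet:1", folder]
    return subprocess.run(command).returncode
```

A caller would then do something like describe_exit_code(run_dedupe(r"C:\Pictures")) and decide from the message or the number whether to continue with the next folder.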

Also new, a nice progress dialog using MS Internet Explorer and an extended results message box. Here are some screen shots of the new and updated windows.

[Screenshots: DeDupeScript-Progress, DeDupeScript-Results]

Installation/De-Installation - Download

Download this small 36 KB ZIP file with the name Roy-DeDupeScript11b.zip and extract the archive to a folder on your local hard drive. The ZIP archive contains the following EIGHT files:

  • DedupeFilesInFolder.vbs
  • DeDupeInstall.bat
  • DedupeInstall.reg
  • DeDupeUnInstall.bat
  • DedupeUnInstall.reg
  • file_id.diz
  • md5sum.exe
  • Readme.txt

    File_ID.diz and Readme.txt are simply text files. md5sum.exe is a 3rd party command line utility that was developed by somebody else and is needed by my script. The two .BAT files and the two .REG files are only needed for the installation and de-installation of the tool. The .VBS file is the main tool script, written in Visual Basic Script (VBScript).

    Use the provided Batch Scripts "DeDupeInstall.bat" and "DeDupeUnInstall.bat" to install or un-install the De-Dupe Shell Extension.

    Installation

    Double click on the Batch Script File "DeDupeInstall.bat"
    That's it.

    Notes:
    The install batch file copies md5sum.exe and DedupeFilesInFolder.vbs into the System32 directory under your Windows installation directory and imports the registry file "DedupeInstall.reg" into your system's registry database. It creates entries under the Registry Key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Directory\shell\

    None of the files in the installation directory are needed anymore to run the script itself. You will need them only to uninstall the tool or to re-install it, if necessary.

    De-installation

    Double click on the Batch Script File "DeDupeUnInstall.bat"
    The uninstall batch file deletes the installed files from your System32 directory and uses the registry file DedupeUnInstall.reg to remove the entries for the script from your system's registry database. If you want to continue to use md5sum.exe and only want to disable the shell extension, either simply double click on the file DedupeUnInstall.reg without executing the uninstall batch file (the script DedupeFilesInFolder.vbs will remain in your System32 folder though), or copy the tool back into your system folder manually after you have run the uninstall batch file.

    About the Software

    The De-Dupe Windows Explorer Shell Extension Script Tool is written in VBScript and is executed by the system tool WScript.exe. The De-Dupe script (DedupeFilesInFolder.vbs) uses a small support tool that it requires to work properly.

    "md5sum.exe" is a small command line tool that returns the MD5 Check Sum value for a file.
    It can also validate MD5 check sums, which is a feature that is not used by the De-Dupe script.
    You can find out more information about it at http://etree.org/md5com.html
    Md5Sum was written by bruce@gridpoint.com
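
    The script's GetMd5Sum function (see the source code further down) expects the md5sum.exe output to contain the check sum, followed by " *" and the file name, and takes everything in front of that separator. The same parsing step looks roughly like this in Python; parse_md5sum_line is a name made up for this sketch, and the output format is assumed to be the one the script relies on.

```python
def parse_md5sum_line(line):
    """Extract the check sum from a line such as '<md5> *<file name>'.

    Returns an empty string if the " *" separator is missing, which the
    caller can count as a check sum error.
    """
    line = line.strip()
    marker = line.find(" *")
    if not line or marker <= 0:
        return ""
    return line[:marker]


print(parse_md5sum_line("43a52d14577de0299146aa9f8f0c062f *file1.ext"))
# prints: 43a52d14577de0299146aa9f8f0c062f
```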

    Legal Stuff/Copyright and Disclaimer

    The 3rd party tool that comes with the De-Dupe script is freeware and can be used and copied by anybody without the need for a license or to pay a fee. Since I did not write that tool, I cannot take any responsibility for any issues that it might cause, via my script or without it.

    This De-Dupe script is also freeware and can be used, copied and modified for free.

    Important Disclaimer!

    The author of this software accepts no responsibility for damages resulting from the use of this product and
    makes no warranty or representation, either express or implied, including but not limited to, any implied warranty of merchantability or fitness for a particular purpose.

    This software is provided "AS IS", and you, its user, assume all risks when using it.

    Depending on whether I have the time and the urge to extend the script, new features might get added to this tool in the future. I could envision additional configuration options and alternative options for what to do with duplicate files that were found by the script. Since you are free to modify the script yourself and improve on it, I would appreciate it if you let me know and sent me your enhanced version, should you decide to take matters into your own hands. :)

    Change Log

    V1.1

  • MD5Sum Determination Issue Resolved for file names with spaces in it
  • Sorting by File Name Issue Resolved, now the "original" is really the first one sorted by name
  • Progress dialog implemented to show status
  • Quiet option implemented to suppress all dialogs
  • Return of Error Levels implemented for batch scripts that call the script
  • Rename logic changed, [DEDUPED] added to the renamed file in addition to existing logic
  • touch.exe tool removed. It did not work reliably, period
  • File List output with file names and their MD5 checksums implemented
  • Log File output implemented
  • command line parameters introduced to suppress file list and log file creation as well as to enable/disable "quiet" mode
  • general code clean up

    Source Code

    Here is the source code of the script. It is followed by the code of the install and uninstall batch files and the registry key values and settings.

    Again, you can download the whole code and the 3rd party command line tool md5sum.exe in a single ZIP archive called Roy-DeDupeScript11b. You may need a ZIP extracting utility, although Windows XP and later should be able to open the file without the need to install additional software. However, if this does not work for any reason, either download the commercial program WinZip at this web site or download and install the open source archive tool PeaZip.

    DedupeFilesInFolder.vbs Source Code

    '=======================================================================
    ' Parameters you might want to change
    '=======================================================================
    ' Specifies the action to take if dupes are found, used by DupeHandling
    ' Values other than 1 are not supported/implemented yet
    Dim DupeAction: DupeAction = 1

    ' If you want to suppress the progress dialog, the results message popup
    ' and error messages, set bQuiet = true
    ' Note, the script returns error levels for batch processing regardless
    ' of the bQuiet setting. The ErrorLevel codes are:
    ' 0 = Script ran successfully
    ' 1 = Script ran, but there were no files to process
    ' 2 = The script was aborted (only relevant if progress dialog is on)
    ' 4 = Script error (md5sum.exe or processing path not found)
    Dim bQuiet: bQuiet = false

    ' LOGFILES
    ' Set to 1 to generate file list with name and md5 sum, set to 0 to disable
    Dim WriteFileList: WriteFileList = 1
    ' File name for the file list. File is saved in processing path folder
    Dim FileListFName: FileListFName = "!DeDupe-FileList.txt"

    ' Write a log file with all dupes that were processed
    Dim WriteDeDupeLog: WriteDeDupeLog = 1
    ' File name for the log file. File is saved in processing path folder
    Dim DeDupeLogFName: DeDupeLogFName = "!DeDupeLog.txt"

    '=======================================================================
    ' Don't touch the stuff below this line, unless you know what
    ' you are doing.
    '=======================================================================
    Dim oFso: Set oFso = WScript.CreateObject("Scripting.FileSystemObject")
    Dim oFolder, oFiles, oFile, oLogFile
    Dim iCounter: iCounter = 0       ' File Counter
    Dim sFolderPath                  ' Work Folder Path (Current folder)
    Dim arguments: Set arguments = WScript.Arguments
    Dim md5sumPath, sErr, sMsg, sMD5CS
    Dim MyArray()                    ' Array with MD5 Checksum and File Names
    Dim iDupCnt: iDupCnt = 0         ' Counter for Dupes
    Dim iDupErr: iDupErr = 0         ' Error Count for Dupe Action
    Dim iMD5SumErr: iMD5SumErr = 0   ' Error Count for MD5 Check Sum Calc
    Dim iFilesCnt: iFilesCnt = 0     ' Number of files found
    Dim iFilesProc: iFilesProc = 0   ' Number of files processed (iFilesCnt - iMD5SumErr)
    Dim iDupProc                     ' Dupe Processed Count (iDupCnt - iDupErr)

    ' Global constants and variables for the status bar
    Const conBarSpeed = 80
    Const conForcedTimeOut = 900000
    Dim objIE
    Dim objProgressBar
    Dim objTextLine1
    Dim objTextLine2
    Dim objQuitFlag

    Dim bAbort: bAbort = false

    ' System Constants
    Const SYSTEM_FOLDER = 1, TEMP_FOLDER = 2
    Const ForAppending = 8
    Const ForReading = 1
    Const ForWriting = 2

    '========================================================================
    ' Initialization of Work Environment

    if arguments.Named.Exists("quiet") then

      if arguments.Named.Item("quiet") = 1 then
        bQuiet = true
      end if

      if arguments.Named.Item("quiet") = 0 then
        bQuiet = false
      end if

    end if

    if arguments.Named.Exists("list") then

      if arguments.Named.Item("list") = 0 or arguments.Named.Item("list") = 1 then
        WriteFileList = arguments.Named.Item("list")
      end if

    end if

    if arguments.Named.Exists("log") then

      if arguments.Named.Item("log") = 0 or arguments.Named.Item("log") = 1 then
        WriteDeDupeLog = arguments.Named.Item("log")
      end if

    end if

    ' Check for command line parameter passed

    if arguments.Unnamed.Count = 0 then
      ' Set Path to Current Path
      sFolderPath = oFso.GetAbsolutePathName(".")
    else
      ' Set Path to folder that was passed as argument for the script call
      sFolderPath = arguments.Unnamed(0)
    end if

    ' Make sure that the 3rd party tool md5sum.exe is either in the
    ' System32 directory or the current path (I don't check the whole Path Env)
    md5sumPath = oFso.BuildPath(oFso.GetSpecialFolder(SYSTEM_FOLDER), "md5sum.exe")

    if not oFso.FileExists(md5sumPath) then
      md5sumPath = oFso.BuildPath(oFso.GetAbsolutePathName("."), "md5sum.exe")

      if not oFso.FileExists(md5sumPath) then
        sErr = sErr & "Md5sum.exe not found in " & oFso.GetSpecialFolder(SYSTEM_FOLDER) & _
               " nor " & oFso.GetAbsolutePathName(".") & vbcrlf & vbcrlf
      end if

    end if

    ' Make sure that the folder (especially the ones passed as Param) exists

    if not oFso.FolderExists(sFolderPath) then
      sErr = sErr & "Processing Folder: " & sFolderPath & _
             " does not exist." & vbcrlf & vbcrlf
    end if

    ' If something is not right, show error and abort the script

    if sErr <> "" then
      if bQuiet = false then WScript.Echo sErr
      CleanUpAndQuit 4
    end if

    Dim sLogOutput: sLogOutput = oFso.BuildPath(sFolderPath,DeDupeLogFName)

    ' Okay.. Let's get started
    '------------------------------------------------------------------------

    Set oFolder = oFso.GetFolder(sFolderPath)
    Set oFiles  = oFolder.Files
    iFilesCnt   = oFiles.Count

    if iFilesCnt > 0 then
      ReDim MyArray(oFiles.Count,3)
      ' Build 2 Dimensional Array with the MD5 check sum of each file
      ' & the file name itself for all files in the
      ' current directory. Looking like this
      ' (x = dimension 2 and y = dimension 1);
      ' the 3rd column is MD5 + [] + lower case file name for sorting purposes
      ' 43a52d14577de0299146aa9f8f0c062f, file1.ext, 43a52d14577de0299146aa9f8f0c062f[]file1.ext
      ' 0052d12577de56567546aa9f8f0c0af3, file2.ext, 0052d12577de56567546aa9f8f0c0af3[]file2.ext

      if bQuiet = false then
        ' Launch Status Bar
        StartIE "De-Dupeing Files in " & sFolderPath
        SetLine1 "Step 1/4: Reading Files and MD5 Check Sums. Path:" & sFolderPath
      end if

      For each oFile in oFiles
        iCounter = iCounter + 1
        sMD5CS   = GetMd5Sum(oFile.Name)
        MyArray(iCounter - 1,0) = sMD5CS
        MyArray(iCounter - 1,1) = oFile.Name
        MyArray(iCounter - 1,2) = sMD5CS & "[]" & lcase(oFile.Name)

        if bQuiet = false then
          ' Check if Abort Button was pressed

          If IsQuit() = True Then
            bAbort = true
            Exit For
          End If

          ' Set Status Bar Value
          SetLine2 "Files Processed: " & CStr(iCounter) & " of " & CStr(iFilesCnt)
        end if

      Next

    end if

    iCounter = iCounter - 1

    if bAbort = true and bQuiet = false then
      ' Close Status Bar
      CloseIE
    end if

    if iCounter >= 0 and bAbort = false then

      if bAbort = false then

        if bQuiet = false then
          ' Set Status Bar Value (iCounter is zero based at this point)
          SetLine1 "Step 2/4: Sort Files"
          SetLine2 "Processing " & CStr(iCounter + 1) & " Files"
        end if

        ' Sort the array on column 2 (MD5 check sum + [] + lower case file name)
        Call QuickSort(MyArray,0,ubound(MyArray,1),2)

        if bQuiet = false then
          ' Check if Abort Button was pressed

          If IsQuit() = True Then
            bAbort = true
          End If

        end if

      end if

      if WriteFileList = 1 then
        ' Write File List out into Text File

        if bQuiet = false then
          ' Set Status Bar Value
          SetLine1 "Step 3/4: Writing File List"
          SetLine2 oFso.BuildPath(sFolderPath,FileListFName)
        end if

        Call WriteFile(MyArray)

        if bQuiet = false then
          ' Check if Abort Button was pressed

          If IsQuit() = True Then
            bAbort = true
          End If

        end if

      end if

      if bAbort = false then

        if bQuiet = false then
          ' Set Status Bar Value
          SetLine1 "Step 4/4: Detect and Process Duplicates"
          SetLine2 ""
        end if

        ' Detect Duplicates
        Call FindDupes(MyArray)

        ' Wrapping up
        iDupProc = iDupCnt - iDupErr

        if bQuiet = false then
          ' Close Status Bar
          CloseIE
        end if

        sMsg = "Number of Files Found: " & iFilesCnt & vbcrlf & _
               "Number of MD5 Sum Errors: " & iMD5SumErr & vbcrlf & _
               "Number of Files Processed: " & iFilesProc & vbcrlf & _
               "------------------------------------" & vbcrlf & _
               "Number of Dupes Found: " & iDupCnt & vbcrlf & _
               "Number of Dupe Processing Errors: " & iDupErr & vbcrlf & _
               "Number of Dupes Processed: " & iDupProc & vbcrlf

        ErrorLogWrite "Number of Files Found: " & iFilesCnt
        ErrorLogWrite "Number of MD5 Sum Errors: " & iMD5SumErr
        ErrorLogWrite "Number of Files Processed: " & iFilesProc
        ErrorLogWrite "Number of Dupes Found: " & iDupCnt
        ErrorLogWrite "Number of Dupe Processing Errors: " & iDupErr
        ErrorLogWrite "Number of Dupes Processed: " & iDupProc

        if WriteFileList = 1 then
          sMsg = sMsg & vbcrlf & "List of Files Generated at:" & vbcrlf & _
                 oFso.BuildPath(sFolderPath,FileListFName) & vbcrlf
          ErrorLogWrite "List of Files Generated at: " & _
                        oFso.BuildPath(sFolderPath,FileListFName)
        end if

        if WriteDeDupeLog = 1 then
          sMsg = sMsg & vbcrlf & "Log File Generated at: " & vbcrlf & sLogOutput
        end if

        if bQuiet = false then
          WScript.Echo sMsg
        end if

      else

        if bQuiet = false then
          ' Close Status Bar
          CloseIE
        end if

      end if

    else

      if bAbort = false then
        ' No Files Found to dedupe

        if bQuiet = false then
          WScript.Echo "No Files to de-dupe found in " & sFolderPath
        end if

        CleanUpAndQuit 1
      end if

    end if

    if bAbort = true then
      ' Aborted Message

      if bQuiet = false then
        WScript.Echo "The De-Dupe Script was aborted."
      end if

      CleanUpAndQuit 2
    end if

    CleanUpAndQuit 0

    '==============================================================================
    Function GetMd5Sum(ByVal strFile)
      ' Declare the variables.
      Dim objTS, strTempFile, strCmdLine, s

      With oFso
        ' Construct a temporary filename.
        Do
          strTempFile = .BuildPath(.GetSpecialFolder(TEMP_FOLDER), "!" & .GetTempName)
        Loop While .FileExists(strTempFile)

        ' Use cmd.exe to construct a command that will execute md5sum.exe
        strCmdLine = .BuildPath(.GetSpecialFolder(SYSTEM_FOLDER), "cmd.exe") _
          & " /c " & md5sumPath & "  """ & strFile & """>" & strTempFile
      End With

      ' Execute the command in a hidden window. Wait for the command
      ' to complete before continuing.
      CreateObject("WScript.Shell").Run strCmdLine, 0, True

      ' Open the temporary file and read the md5sum output.
      ' Errors (missing or empty file) simply leave s empty.
      s = ""
      On Error Resume Next
      Set objTS = oFso.OpenTextFile(strTempFile, ForReading)
      s         = objTS.ReadAll
      objTS.Close
      On Error Goto 0

      ' Check that it didn't fail and has the checksum

      if trim(s) <> "" and instr(s," *") > 0 then
        GetMd5Sum  = left(s,instr(s," *") - 1)
        iFilesProc = iFilesProc + 1
      else
        ' Error... not good
        iMD5SumErr = iMD5SumErr + 1
        GetMd5Sum  = ""
      end if

      ' Remove the temporary file, if it was created
      if oFso.FileExists(strTempFile) then oFso.DeleteFile strTempFile
    End Function

    '==================================================================================
    ' Array Sort Functions
    Sub SwapRows(ary,row1,row2)
      '== This proc swaps two rows of an array
      Dim x,tempvar

      For x = 0 to Ubound(ary,2)
        tempvar     = ary(row1,x)
        ary(row1,x) = ary(row2,x)
        ary(row2,x) = tempvar
      Next

    End Sub  'SwapRows

    Sub QuickSort(vec,loBound,hiBound,SortField)
      '==--------------------------------------------------------==
      '== Sort a 2 dimensional array on SortField
      '==
      '== This procedure is adapted from the algorithm given in:
      '==  ~ Data Abstractions & Structures using C++ by ~
      '==  ~ Mark Headington and David Riley, pg. 586  ~
      '== Quicksort is the fastest array sorting routine for
      '== unordered arrays.  Its big O is  n log n
      '==
      '== Parameters:
      '== vec       - array to be sorted
      '== SortField - The field to sort on (2nd dimension value)
      '== loBound and hiBound are simply the lower and upper
      '==   bounds of the array's 1st dimension.  It's probably
      '==   easiest to use the LBound and UBound functions to
      '==   set these.
      '==--------------------------------------------------------==
      Dim pivot(),loSwap,hiSwap,temp,counter
      Redim pivot (Ubound(vec,2))

      '== Two items to sort

      if hiBound - loBound = 1 then
        if vec(loBound,SortField) > vec(hiBound,SortField) then Call SwapRows(vec,hiBound,loBound)
      End If

      '== Three or more items to sort

      For counter = 0 to Ubound(vec,2)
        pivot(counter)       = vec(int((loBound + hiBound) / 2),counter)
        vec(int((loBound + hiBound) / 2),counter) = vec(loBound,counter)
        vec(loBound,counter) = pivot(counter)
      Next

      loSwap = loBound + 1
      hiSwap = hiBound

      do
        '== Find the right loSwap
        while loSwap < hiSwap and vec(loSwap,SortField) <= pivot(SortField)
          loSwap = loSwap + 1
        wend
        '== Find the right hiSwap
        while vec(hiSwap,SortField) > pivot(SortField)
          hiSwap = hiSwap - 1
        wend
        '== Swap values if loSwap is less than hiSwap
        if loSwap < hiSwap then Call SwapRows(vec,loSwap,hiSwap)

      loop while loSwap < hiSwap

      For counter = 0 to Ubound(vec,2)
        vec(loBound,counter) = vec(hiSwap,counter)
        vec(hiSwap,counter)  = pivot(counter)
      Next

      '== Recursively call function .. the beauty of Quicksort
      '== 2 or more items in first section
      if loBound < (hiSwap - 1) then Call QuickSort(vec,loBound,hiSwap - 1,SortField)
      '== 2 or more items in second section
      if hiSwap + 1 < hiBound then Call QuickSort(vec,hiSwap + 1,hiBound,SortField)

    End Sub  'QuickSort

    Sub PrintArray(vec,lo,hi,mark)
      '==-----------------------------------------==
      '== Print out an array from the lo bound
      '==  to the hi bound.  Highlight the column
      '==  whose number matches parm mark
      '==-----------------------------------------==
      Dim i,j,sRes
      sRes = ""

      For i = lo to hi

        For j = 0 to Ubound(vec,2)
          sRes = sRes & vec(i,j) & vbTab & vbTab
        Next

        sRes = sRes & vbcrlf
      Next

      WScript.Echo sRes
    End Sub

    '===================================================================================
    ' Actual De-Duper Functions
    Sub FindDupes(Arr)
      Dim a, b, s, iCnt, sOrg, sKey
      sKey = ""
      iCnt = Ubound(Arr,1)

      For a = 0 to iCnt

        s = trim(Arr(a,0))

        if s <> "" then

          if sKey = "" then
            ' First CheckSum Value in Array, Set Key, don't check further
            sKey = s
            sOrg = Arr(a,1)
          else
            ' CheckSum from previous file set, check if identical

            if s = sKey then
              ' Dupe
              DupeHandling s, sKey, Arr(a,1), sOrg
            else
              ' Set key to Checksum of new file, because it is different
              sKey = s
              sOrg = Arr(a,1)
            end if

          end if

        end if

        if bQuiet = false then
          ' Set Status Bar Value
          SetLine2 "Files Processed: " & CStr(a + 1) & " of " & CStr(iCnt + 1)

          ' Check if Abort Button was pressed

          If IsQuit() = True Then
            ' Set the flag before leaving the loop, otherwise it is never set
            bAbort = true
            Exit For
          End If

        end if

      Next

    End Sub

     Sub DupeHandling(MD5dupe, MD5Org, FNameDupe, FNameOrg)
     ' Here is where You decide what to do with the found duplicate
     ' You could for example perform additional checks
     ' beyond the MD5 Checksum also
     Dim sSrc, sOrg, sOrgExt, sOrgBase, sDest, sDestName, sDestExt, sDestBase
     ' Increase Dupe Counter
     iDupCnt = iDupCnt + 1

     ' Determine the action to take

     Select Case DupeAction
       Case 1
         ' Rename Dupe by adding the Original File name as prefix with
         ' an _ as separator. Also the extension of the original file
         ' Full Path of Dupe File
         sSrc    = oFso.BuildPath(sFolderPath,FNameDupe)
         ' Full Path of Org File
         sOrg    = oFso.BuildPath(sFolderPath,FNameOrg)
         ' Get Extension of Org File
         sOrgExt = oFso.GetExtensionName(sOrg)

         ' Get Base File Name of Org File without Extension
         sOrgBase = left(FNameOrg, InStrRev(FNameOrg, "." & sOrgExt, -1, 1) - 1)

         ' Build New File name/path for Dupe:
         ' Path\OrgBase_OrgExt_DupeBase[DEDUPED].DupeExt
         sDestExt  = oFso.GetExtensionName(FNameDupe)
         sDestBase = left(FNameDupe, InStrRev(FNameDupe, "." & sDestExt, -1, 1) - 1)
         sDestName = sOrgBase & "_" & sOrgExt & "_" & sDestBase & "[DEDUPED]" & "." & sDestExt
         sDest     = oFso.BuildPath(sFolderPath, sDestName)

         ' Move

         if oFso.FileExists(sDest) then
           ' New File already exists, cannot rename dupe, Increase Dupe Processing Error Count
           iDupErr = iDupErr + 1
           ErrorLogWrite "Rename Failed! Org: " & FNameOrg & ", Dupe Src: " & _
           FNameDupe & ", Dest: " & sDestName
         Else
           oFso.MoveFile sSrc, sDest
           ErrorLogWrite "Dupe Processed! Org: " & FNameOrg & ", Dupe Src: " & _
           FNameDupe & ", Dest: " & sDestName
         End if

       Case Else
         ' Not implemented yet
     End Select

     End Sub

     '==============================================================
     ' Support Functions
     Sub WriteFile(arr)
     ' Write List of Files with their MD5 Sums to a Text file
     Dim a ' loop count

     Dim f: f = oFso.BuildPath(sFolderPath,FileListFName)
     ' Check if an old Listings File Already Exists and Delete it

     if oFso.FileExists(f) then
       oFso.DeleteFile f, true
     end if

     Dim oF: Set oF = oFso.OpenTextFile(f, ForAppending, true, -2)

     ' File Name + TAB + MD5 Sum of File

     For a = 0 to Ubound(arr,1)
       oF.writeline trim(arr(a,1)) & vbtab & trim(arr(a,0))
     Next

     oF.Close
     Set oF = Nothing
     End Sub

     Function ErrorLogWrite(sErrLogMsg)

     Dim bOpenLog: bOpenLog = false
     Dim sFullErrMsg

     if WriteDeDupeLog = 1 then

       ' Make sure oLogfile holds an object reference before testing it
       if not isObject(oLogfile) then
         set oLogfile = nothing
       end if

       ' The log file is opened on first use
       if oLogfile is nothing then
         bOpenLog = true
       end if

       if bOpenLog = true then
         Set oLogfile = oFSO.OpenTextFile(sLogOutput, ForWriting, True, -2)
         ' Write the batch header; these calls re-enter this function
         ' with the log file already open
         ErrorLogWrite("----------------------------------------------")
         ErrorLogWrite("New DeDupe Batch Started")
         ErrorLogWrite("Work Path: " & sFolderPath)
         ErrorLogWrite("-----------------------------------------------")
       end if

       ' Prefix every entry with a time stamp and a TAB
       sFullErrMsg = LogDateFormat(now) & chr(9) & sErrLogMsg

       oLogFile.Writeline sFullErrMsg

     end if

     end function

     function LogDateFormat(dSourceDate)
     ' Returns the date as YYYY-MM-DD HH:NN:SS with zero padded parts
     Const sLogDtNumbers = "0000"
     Dim sLgDtYYYY, sLgDtMM, sLgDtDD, sLgDtHH, sLgDtNN, sLgDtSS

     sLgDtYYYY     = right(sLogDtNumbers & year(dSourceDate),4)
     sLgDtMM       = right(sLogDtNumbers & month(dSourceDate),2)
     sLgDtDD       = right(sLogDtNumbers & day(dSourceDate),2)
     sLgDtHH       = right(sLogDtNumbers & hour(dSourceDate),2)
     sLgDtNN       = right(sLogDtNumbers & minute(dSourceDate),2)
     sLgDtSS       = right(sLogDtNumbers & second(dSourceDate),2)
     LogDateFormat = sLgDtYYYY & "-" & sLgDtMM & "-" & sLgDtDD & _
     " " & sLgDtHH & ":" & sLgDtNN & ":" & sLgDtSS
     End Function

     '=================================================================
     ' Progress Bar Code

     '--------------------------------------------------------
     ' Function   StartIE
     ' Abstract   Launch IE Dialog Box and Progress bar
     ' Parameters Title of the box
     '--------------------------------------------------------

     Private Sub StartIE(strTitel)
     Dim objDocument
     Dim objWshShell

     Set objIE        = CreateObject("InternetExplorer.Application")
     objIE.height     = 230
     objIE.width      = 400
     objIE.menubar    = False
     objIE.toolbar    = false
     objIE.statusbar  = false
     objIE.addressbar = false
     objIE.resizable  = False
     objIE.navigate ("about:blank")

     ' wait till ie is loaded (sleep briefly instead of spinning the CPU)
     While (objIE.busy)
       WScript.Sleep 100
     wend

     set objDocument = objIE.document
     ' setup the dialog box
     WriteHtmlToDialog objDocument, strTitel

     ' with ie/html loaded, define assorted objects...
     set objTextLine1       = objIE.document.all("txtMilestone")
     set objTextLine2       = objIE.document.all("txtRemarks")
     Set objProgressBar     = objIE.document.all("pbText")
     set objQuitFlag        = objIE.document.Secret.pubFlag

     objTextLine1.innerTEXT = ""
     objTextLine2.innerTEXT = ""

     ' objIE.document.body.innerHTML = "Building Document..."
     ' + "<br>load time= " + n
     objIE.visible = True

     ' set focus to ie
     Set objWSHShell = WScript.CreateObject("WScript.Shell")
     objWshShell.AppActivate("Microsoft Internet Explorer")
     End Sub

     '--------------------------------------------------------
     ' Function CloseIE
     ' Abstract Close the IE Browser Window
     '--------------------------------------------------------

     Private Function CloseIE()
     On Error Resume Next
     objIE.quit
     End Function

     '--------------------------------------------------------
     ' Function   SetLine1
     ' Abstract   Set Text Line 1 in the Progress Bar Dialog Box
     ' Parameters Progress Text
     '--------------------------------------------------------

     Private sub SetLine1(sNewText)
     On Error Resume Next
     objTextLine1.innerTEXT = sNewText
     End Sub

     '--------------------------------------------------------
     ' Function   SetLine2
     ' Abstract   Set Text Line 2 in the Progress Bar Dialog Box
     ' Parameters Progress Text
     '--------------------------------------------------------

     Private sub SetLine2(sNewText)
     On Error Resume Next
     objTextLine2.innerTEXT = sNewText
     End Sub

     '--------------------------------------------------------
     ' Function   IsQuit
     ' Abstract   Checks if the quit button was pressed
     ' Parameters None
     '--------------------------------------------------------

     Private function IsQuit()
     On Error Resume Next
     IsQuit   = False

     If objQuitFlag.Value = "quit" Then
       IsQuit = True
     End If

     End function

     '--------------------------------------------------------
     ' Function   WriteHtmlToDialog
     ' Abstract   Set HTML Text for the IE Dialog box
     ' Parameters IE Document Object, Title Text
     '--------------------------------------------------------

     Private Sub WriteHtmlToDialog(objDocument, strTitel)
     objDocument.Open
     objDocument.Writeln "<title>" & strTitel & "</title> "
     objDocument.Writeln "<style>"
     objDocument.Writeln " BODY {background: Silver} BODY { overflow:hidden }"
     objDocument.Writeln " P.txtStyle {color: Black; font-family: Arial; " _
      & " font-size: 10pt; font-weight: normal; margin-left: 10px; " _
      & " width: 340px } "
     objDocument.Writeln " input.pbStyle {color: Navy; font-family: Wingdings; " _
      & " font-size: 10pt; background: Silver; height: 20px; " _
      & " width: 340px } "
     objDocument.Writeln "</style>"
     objDocument.Writeln "<div id=""objProgress"" class=""Outer""></div>"
     ' write out text lines...
     objDocument.Writeln "<P id=txtMilestone class=txtStyle style=margin-left: 10px> </P>"
     objDocument.Writeln "<P id=txtRemarks class=txtStyle style=margin-left: 10px ></P>"
     objDocument.Writeln "<CENTER>"
     ' write progbar
     objDocument.Writeln "<input type=text id=pbText class=pbStyle value= >"
     objDocument.Writeln "<br><br>"  ' space down a little
     ' write cancel button...
     objDocument.Writeln "<input type=button value=Cancel " _
      & " onclick=SetReturnFlag(""quit"") >"
     objDocument.Writeln "</CENTER>"
     ' write hidden object...
     objDocument.Writeln "<form name=secret >" _
      & " <input type=hidden name=pubFlag value=run >" _
      & "</form>"
     objDocument.Writeln "<SCRIPT LANGUAGE=VBScript >"
     ' write "local script" to handle cmdCancel_Click event...
     objDocument.Writeln "Sub SetReturnFlag(sFlag)"
     objDocument.Writeln " secret.pubFlag.Value = sFlag"
     objDocument.Writeln " txtMileStone.style.color = ""Red"" "
     objDocument.Writeln " txtRemarks.style.color = ""Red"" "
     objDocument.Writeln "End Sub"
     ' progress bar
     objDocument.Writeln "Function PctComplete(nPct)"
     objDocument.Writeln "pbText.Value = String(nPct,"" "") & String(4,""n"")"
     objDocument.Writeln "End Function"
     ' calc progress bar and direction
     objDocument.Writeln "Sub UpdateProgress()"
     objDocument.Writeln "Dim intStep"
     objDocument.Writeln "Dim intDirection"
     objDocument.Writeln "If (IsNull(objProgress.getAttribute(""Step"")) = True) Then"
     objDocument.Writeln "intStep = 0"
     objDocument.Writeln "Else"
     objDocument.Writeln "intStep = objProgress.Step"
     objDocument.Writeln "End If"
     objDocument.Writeln "if (IsNull(objProgress.GetAttribute(""Direction""))=True) Then"
     objDocument.Writeln "intDirection = 0"
     objDocument.Writeln "Else"
     objDocument.Writeln "intDirection = objProgress.Direction"
     objDocument.Writeln "End If"
     objDocument.Writeln "if intDirection=0 then"
     objDocument.Writeln "intStep = intStep + 1"
     objDocument.Writeln "else"
     objDocument.Writeln "intStep = intStep - 1"
     objDocument.Writeln "end if"
     objDocument.Writeln "Call PctComplete(intStep)"
     objDocument.Writeln "if intStep>=23 then"
     objDocument.Writeln "intDirection=1"
     objDocument.Writeln "end if"
     objDocument.Writeln "if intStep<=0 then"
     objDocument.Writeln "intDirection=0"
     objDocument.Writeln "end if"
     objDocument.Writeln "objProgress.SetAttribute ""Step"", intStep"
     objDocument.Writeln "objProgress.SetAttribute ""Direction"", intDirection"
     objDocument.Writeln "Window.setTimeout GetRef(""UpdateProgress""), " & conBarSpeed
     objDocument.Writeln "End Sub"
     ' timeout function
     objDocument.Writeln "Sub DialogHardTimeout()"
     objDocument.Writeln "SetReturnFlag(""quit"")"
     objDocument.Writeln "End sub"
     objDocument.Writeln "Sub Window_OnLoad()"
     objDocument.Writeln "theleft = (screen.availWidth - document.body.clientWidth) / 2"
     objDocument.Writeln "thetop = (screen.availHeight - document.body.clientHeight) / 2"
     objDocument.Writeln "window.moveTo theleft,thetop"
     objDocument.Writeln "Window.setTimeout GetRef(""UpdateProgress""), " & conBarSpeed
     objDocument.Writeln "Window.setTimeout GetRef(""DialogHardTimeout""), " & conForcedTimeOut
     objDocument.Writeln "End Sub"
     objDocument.Writeln "</SCRIPT>"
     objDocument.Close
     End Sub

     Sub CleanUpAndQuit(RetCode)
     ' House Cleaning

     if not isObject(oLogfile) then
       set oLogfile = nothing
     end if

     ' Close the log file only if it was actually opened
     if not (oLogfile is nothing) then
       oLogFile.Close
       set oLogfile = nothing
     end if

     Set oFso = Nothing
     WScript.Quit(RetCode)

     End Sub
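
The comment at the top of DupeHandling mentions that additional checks beyond the MD5 checksum could be performed before a file is treated as a duplicate. One cheap check is to compare the file sizes. The following is only a minimal sketch of that idea and not part of the script above; the function name SameFileSize and its parameters are made up for this illustration.

```vbscript
Option Explicit

' Hypothetical helper (not part of the original script): returns True only
' if both files exist and have exactly the same size in bytes.
Function SameFileSize(oFileSys, sPathA, sPathB)
  SameFileSize = False
  If oFileSys.FileExists(sPathA) And oFileSys.FileExists(sPathB) Then
    SameFileSize = (oFileSys.GetFile(sPathA).Size = oFileSys.GetFile(sPathB).Size)
  End If
End Function

' Usage example: a file always has the same size as itself
Dim oFs: Set oFs = CreateObject("Scripting.FileSystemObject")
If SameFileSize(oFs, WScript.ScriptFullName, WScript.ScriptFullName) Then
  WScript.Echo "Sizes match"
End If
```

Inside DupeHandling, such a check could be called with the full paths of the original and the duplicate before the rename, and the rename skipped (and logged) if the sizes differ.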

     

    DeDupeInstall.bat

       @echo off
       Cls
       echo.
       echo  Installing De-dupe Shell Extension and Its Support Tools
       echo =============================================================
       echo.

       echo   1. Copy md5sum.exe to %SystemRoot%\system32\
       copy md5sum.exe "%SystemRoot%\system32\md5sum.exe"
       echo.

       echo   2. Copy DedupeFilesInFolder.vbs to %SystemRoot%\system32\
       copy DedupeFilesInFolder.vbs "%SystemRoot%\system32\DedupeFilesInFolder.vbs"
       echo.

       echo   3. Register "DeDupe" Directory Shell Extension
       regedit /s DedupeInstall.reg
       echo.

       echo   4. done... DeDupe Shell Extension Installed Successfully
       echo.
       echo.
       pause

    DeDupeInstall.reg

    Note that the default registry value is given in hex. This was necessary to be able to add the value
    “wscript.exe %SystemRoot%\system32\DedupeFilesInFolder.vbs” to the registry. Notice the %SystemRoot% environment variable? It needs to be expanded by Windows when you activate the shell extension. If you created the value as a plain REG_SZ string, it would not work.

    It needs to be an expandable string, REG_EXPAND_SZ, because it is expanded at runtime. The registry import can only create expandable string values if you provide the string in hexadecimal format with the hex(2) prefix. This tells Windows to create a REG_EXPAND_SZ value instead of the basic REG_SZ. Each character is followed by the hex value 00, and two more 00 bytes are added at the end. The reason is that the data is the string in 16-bit Unicode (UTF-16, low byte first): every character takes two bytes, and the final 00,00 is the terminating null character.
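
If you want to build such a hex(2) value for a different command line, the byte list can be generated with a few lines of script. The following is a minimal sketch and not part of the tool; the function names ToRegExpandSz and HexByte are made up for this illustration.

```vbscript
Option Explicit

' Hypothetical helper: converts a string into the comma separated byte list
' that a .reg file expects after "hex(2):". Every character becomes two
' bytes (low byte first) and a two byte null terminator closes the string.
Function ToRegExpandSz(sText)
  Dim i, nCode, sOut
  sOut = ""
  For i = 1 To Len(sText)
    nCode = AscW(Mid(sText, i, 1))
    ' AscW returns a signed 16-bit value, map it to the range 0..65535
    If nCode < 0 Then nCode = nCode + 65536
    sOut = sOut & HexByte(nCode And &HFF) & "," & HexByte(nCode \ 256) & ","
  Next
  ToRegExpandSz = sOut & "00,00"
End Function

' Formats a value from 0 to 255 as two upper case hex digits
Function HexByte(n)
  HexByte = Right("0" & Hex(n), 2)
End Function

WScript.Echo "@=hex(2):" & _
  ToRegExpandSz("wscript.exe %SystemRoot%\system32\DedupeFilesInFolder.vbs")
```

The sketch writes everything on one line. Registry exports wrap long values with a trailing backslash, as in the DeDupeInstall.reg file, but the bytes themselves should be the same.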

       Windows Registry Editor Version 5.00

       [HKEY_CLASSES_ROOT\Directory\shell\DeDupe]
       @="DeDupe"

       [HKEY_CLASSES_ROOT\Directory\shell\DeDupe\command]
       @=hex(2):77,00,73,00,63,00,72,00,69,00,70,00,74,00,2E,00,65,00,78,00,65,00,20,\
         00,25,00,53,00,79,00,73,00,74,00,65,00,6D,00,52,00,6F,00,6F,00,74,00,25,00,\
         5C,00,73,00,79,00,73,00,74,00,65,00,6D,00,33,00,32,00,5C,00,44,00,65,00,64,\
         00,75,00,70,00,65,00,46,00,69,00,6C,00,65,00,73,00,49,00,6E,00,46,00,6F,00,\
         6C,00,64,00,65,00,72,00,2E,00,76,00,62,00,73,00,00,00

    DeDupeUnInstall.bat

       @echo off
       CLS
       echo.
       echo  Un-Installing De-dupe Shell Extension and Its Support Tools
       echo =============================================================
       echo.

       echo   1. Delete md5sum.exe from %SystemRoot%\system32\
       if EXIST "%SystemRoot%\system32\md5sum.exe" del /Q /F "%SystemRoot%\system32\md5sum.exe"
       echo.

       echo   2. Delete DedupeFilesInFolder.vbs from %SystemRoot%\system32\
       if EXIST "%SystemRoot%\system32\DedupeFilesInFolder.vbs" del /Q /F "%SystemRoot%\system32\DedupeFilesInFolder.vbs"
       echo.

       echo   3. Unregister "DeDupe" Directory Shell Extension
       regedit /s DedupeUnInstall.reg
       echo.

       echo   4. done... DeDupe was Uninstalled Successfully
       echo.
       echo.
       pause

    DedupeUnInstall.reg

       Windows Registry Editor Version 5.00

       [-HKEY_CLASSES_ROOT\Directory\shell\DeDupe]

    This is another tool created by Carsten Cumbrowski aka Roy/SAC and I hope that you will find this one helpful as well. For comments, praise or complaints, please use the comments section of the blog below. Thanks.

    Cheers!

    Carsten aka Roy/SAC
