You can skip the background story about why, when and how I decided to write this little file de-duping script if you are not interested in it, and jump right to the section about how the script works, what it does, where to download it, how to install and de-install it, and its source code. The tool is freeware, but any donation (money or goods) or simply a “Thank You” (e.g. via the comments section at the end of this post) is appreciated nevertheless.
Important Note: I updated the script and this post because of some bugs that I found and issues with the 3rd party tool “touch.exe”. I had to remove it and come up with another solution for the problem that it solved. I also added some nice stuff, so it was more than just bug fixing :).
The Background Story
Who has not yet had the problem of having hundreds of files of all types (text files, documents, spreadsheets, images, videos and others) in one directory, with a high chance that some of them are duplicates, identical files that only differ in their file name? I often have to deal with duplicate images that I downloaded from the internet.
I cannot always remember whether I already downloaded a particular image, and I go by the slogan "archive / save / back up first, sort later". There will be nothing to sort if you don’t save a copy and go back to the place where you found it at a later time, only to learn that the stuff isn’t there anymore for whatever reason. Maybe even the whole web site has gone the way of the Dodo.
The original file names are often useless: either too generic, like image1.jpg, logo.gif or simply 1.jpg, 2.jpg etc., or long and cryptic without any meaning, like 3104458219_cc0dfd3980_o_d.jpg. So you end up giving the files your own names, which makes it highly likely that you download the same file again at a different time and give it a different file name than you did the first time around. Voila, duplicate.
I looked at a number of tools and options and found some that were decent, but all of them had something (or lacked something) that turned what should be a semi-automatic job into a manual, time-consuming ordeal. What I wanted is something that finds duplicates but does not delete them outright or make me go over each dupe right then and decide what to do with it. I also wanted to be able to verify that the dupes found are really identical, so it must be easy to see quickly which file is a duplicate of which other file.
None of the solutions that I tried delivered on all those aspects, so de-duping files was painstaking and time consuming if I did it, or I simply would not do it at all and carried a bunch of dupe garbage around.
The idea for this simple script of mine came when I did the sorting and inventory of the SAC art pack releases back in December last year and January this year. I uploaded files, like MODs that I converted to MPEG-1 Audio Layer 3 (MP3 for short), and other files to my file sharing account at Mediafire.com. I learned that Mediafire.com has a de-duper function built into their system that prevents users from uploading the same file twice to their account. You can upload it a second time, but you will get the message that the file already exists and that your new upload was deleted. I got that message and first thought that it was an error on Mediafire’s part.
What dupes? There were no dupes, I thought. Yeah, there were some SID Adlib music files that had the same size and all, but there were like a dozen of them, all with a different name and even a different prefix, indicating that the files were created by different musicians. The dates were different as well. Being close to sending an email to Mediafire customer support, I decided to listen to the songs that could be potential duplicates of the file that was rejected (Mediafire didn’t tell me the file name or shared URL of the original file). And there it was, the same tune with a different file name and different file dates by the same artist.
I was curious about how Mediafire noticed that the files are identical and did some research. They are using the MD5 Check Sum value of a file. The chances that two files that are not byte by byte identical have the same MD5 check sum value are astronomically small. That’s a smart, fast and easy way to find dupes, and the idea of writing a de-dupe script that does exactly what I want it to do was born.
This is enough of a background story I believe. Let’s get busy with the Script itself.
How the Script Works
The script detects duplicate files within a directory. Duplicate files are files that have the same MD5 Check Sum value. Two DIFFERENT/NON IDENTICAL files having the same MD5 Check Sum is not impossible, but highly unlikely. This allows the script to detect duplicate files regardless of their file name or other characteristics, such as "date created" or "date modified".
The tool scans all files within a directory. It does not include files in sub directories of the processed folder.
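To illustrate the idea, here is a minimal sketch in Python (not the VBScript that the tool itself is written in). It assumes Python's standard hashlib module in place of md5sum.exe, and the function names md5_of_file and find_duplicates are made up for this example. It hashes every file directly inside one folder, sorts by check sum and lower-case name, and reports each file whose check sum equals that of the file before it.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Return (original_name, duplicate_name) pairs for one folder.

    Only files directly inside `folder` are checked; sub directories
    are skipped.
    """
    entries = [
        (md5_of_file(item), item.name)
        for item in Path(folder).iterdir()
        if item.is_file()
    ]
    # Sort by check sum, then by lower-case name, so identical files end up
    # next to each other and the first name of each group is the "original".
    entries.sort(key=lambda entry: (entry[0], entry[1].lower()))

    pairs = []
    previous_sum, original = None, None
    for checksum, name in entries:
        if checksum == previous_sum:
            pairs.append((original, name))
        else:
            previous_sum, original = checksum, name
    return pairs
```

Calling find_duplicates on a folder returns a list of name pairs and does not rename or delete anything, so it is safe to try on a copy of a real directory.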
If a duplicate file is found, it will be renamed by adding the original file name as a prefix, with an _ as separator. An _ also replaces the "." in front of the file extension of the original file name (other "." in the file name itself remain). The string [DEDUPED] is added at the end of the file name, right before the extension of the duplicate.
Example
For example, aFile1.EXT and bFile2.EXT are identical. After the script has run, one of the two files remains as it is and the other one is renamed. Which file is considered the "original" is determined by which file is found first. The script sorts the files by name first, before it de-dupes them.
In this example aFile1.EXT would be considered the original and bFile2.EXT would be renamed to aFile1_EXT_bFile2[DEDUPED].EXT. This makes dupes appear right after the original if you sort the directory by file name. To filter the dupes in order to copy/move them away or to delete them, use the copy, move or del command on the command line. For example "DEL *[DEDUPED].*" would delete all duplicate files found and renamed by the script.
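The naming scheme can be sketched in a few lines of Python (again only an illustration, not the script's own VBScript code). The function name deduped_name is made up for this example, and files without an extension are not treated specially here.

```python
import os


def deduped_name(original_name, duplicate_name):
    """Build the new file name for a duplicate, following the scheme above."""
    org_base, org_ext = os.path.splitext(original_name)
    dup_base, dup_ext = os.path.splitext(duplicate_name)
    # splitext keeps the leading dot on the extension; drop it for the
    # part that gets embedded in the middle of the new name.
    return f"{org_base}_{org_ext[1:]}_{dup_base}[DEDUPED]{dup_ext}"


print(deduped_name("aFile1.EXT", "bFile2.EXT"))
# prints aFile1_EXT_bFile2[DEDUPED].EXT
```

Because the new name starts with the name of the original, the duplicate sorts right behind it in a directory listing.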
More New Stuff
The script creates two files by default in the processed directory:
- "!DeDupe-FileList.txt" - a list of all files in the directory and their MD5 Check Sum Values (tab separated)
- "!DeDupeLog.txt" - a processing log file where you can find the list of dupes that were detected, their old & new file name and the corresponding original file
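Since the file list is tab separated, it is easy to read back into other tools. Below is a small Python sketch under the assumption that each line holds the file name first and the MD5 check sum second (that is the order the WriteFile routine in the source code uses). The function name read_file_list is made up for this example, and the file is opened with the system's default text encoding.

```python
def read_file_list(path):
    """Return {file_name: md5_check_sum} from a tab separated file list."""
    result = {}
    with open(path) as handle:
        for line in handle:
            parts = line.rstrip("\r\n").split("\t")
            # Skip empty or incomplete lines.
            if len(parts) >= 2 and parts[0]:
                result[parts[0]] = parts[1]
    return result
```

A call like read_file_list("!DeDupe-FileList.txt") then gives a dictionary that can be compared against the list of another folder.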
If you do not want one or both of these files to be created, change the options "WriteFileList" and "WriteDeDupeLog" to "0" at the beginning of the code of "DedupeFilesInFolder.vbs". Alternatively use the command line options:
“/log:[0/1]” and “/list:[0/1]” to turn the creation of the log and/or list on/off.
You can also suppress all dialogs via the command line option “/quiet:[0/1]”.
“/quiet:1” disables the progress dialog, the results message and all error messages.
Note, the script returns error levels for batch processing regardless of the "quiet" settings.
The Error Level codes are:
0 = Script Ran Successfully
1 = Script Ran, but there were no files to process
2 = The script was aborted (only relevant if progress dialog is on)
4 = Script Error (md5sum.exe or processing path not found)
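For batch use, the error level can be picked up by whatever starts the script. The following Python sketch shows one way to do that; it is an illustration only. It assumes that wscript.exe can be found via the PATH and that the script path you pass in is correct for your installation. The names build_command, run_dedupe and EXIT_MESSAGES are made up for this example; the option name "quiet" is the one the source code checks for.

```python
import subprocess

# Error levels as listed above.
EXIT_MESSAGES = {
    0: "script ran successfully",
    1: "script ran, but there were no files to process",
    2: "script was aborted",
    4: "script error (md5sum.exe or processing path not found)",
}


def build_command(folder, script_path="DedupeFilesInFolder.vbs",
                  quiet=True, log=True, write_list=True):
    """Build the wscript.exe command line for one folder."""
    return [
        "wscript.exe",
        script_path,
        f"/quiet:{int(quiet)}",
        f"/log:{int(log)}",
        f"/list:{int(write_list)}",
        folder,
    ]


def run_dedupe(folder, script_path="DedupeFilesInFolder.vbs"):
    """Run the script on a folder and return (error_level, description)."""
    completed = subprocess.run(build_command(folder, script_path))
    code = completed.returncode
    return code, EXIT_MESSAGES.get(code, "unknown error level")
```

With quiet switched on, no dialogs pop up, so the returned error level is the only feedback, which is what you want in an unattended batch job.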
Also new, a nice progress dialog using MS Internet Explorer and an extended results message box. Here are some screen shots of the new and updated windows.
Installation/De-Installation - Download
Download this small 36 KB ZIP file with the name Roy-DeDupeScript11b.zip and extract the archive to a folder on your local hard drive. The ZIP archive contains the following EIGHT files:
File_ID.diz and Readme.txt are simply text files. md5sum.exe is a 3rd party command line utility that was developed by somebody else and is needed by my script. The two .BAT files and the two .REG files are only needed for the installation and de-installation of the tool. The .VBS file is the main tool script written in Visual Basic Script (VBScript).
Use the provided Batch Scripts "DeDupeInstall.bat" and "DeDupeUnInstall.bat" to install or un-install the De-Dupe Shell Extension.
Installation
Double click on the Batch Script File "DeDupeInstall.bat"
That's it.
Notes:
The install batch file copies md5sum.exe and DedupeFilesInFolder.vbs into the System32 directory under your Windows installation directory and imports the registry file "DedupeInstall.reg" into your system's registry database. It creates entries under the Registry Key:
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Directory\shell\
None of the files in the installation directory are needed anymore to run the script itself. You will need them only to uninstall the tool or to re-install it, if necessary.
De-installation
Double click on the Batch Script File "DeDupeUnInstall.bat"
The Un-Install batch file deletes the installed files from your System32 directory and uses the registry file DedupeUnInstall.reg to remove the entries for the script from your system's registry database. If you want to continue to use md5sum.exe and only want to disable the shell extension, you can either double click on the file DedupeUnInstall.reg without executing the uninstall batch file (the script DedupeFilesInFolder.vbs will remain in your System32 folder though), or copy the tool back into your system folder manually after you ran the uninstall batch file.
About the Software
The De-Dupe Windows Explorer Shell Extension Script Tool is written in VBScript and is executed by the system tool WScript.exe. The De-Dupe script (DedupeFilesInFolder.vbs) requires one small support tool to work properly.
"md5sum.exe" is a small command line tool that returns the MD5 Check Sum value for a file.
It can also validate MD5 check sums, which is a feature that is not used by the De-Dupe script.
You can find out more information about it at http://etree.org/md5com.html
Md5Sum was written by bruce@gridpoint.com
Legal Stuff/Copyright and Disclaimer
The 3rd party tool that comes with the De-Dupe script is freeware and can be used and copied by anybody without the need for a license or to pay a fee. Since I did not write that tool, I cannot take any responsibility for any issues that it might cause, whether used via my script or without it.
This De-Dupe script is also freeware and can be used, copied and modified for free.
Important Disclaimer!
The author of this software accepts no responsibility for damages resulting from the use of this product and makes no warranty or representation, either express or implied, including but not limited to, any implied warranty of merchantability or fitness for a particular purpose.
This software is provided "AS IS", and you, its user, assume all risks when using it.
Depending on whether I have the time and the urge to extend the script, new features might get added to this tool in the future. I could envision additional configuration options and alternative options for what to do with duplicate files that were found by the script. Since you are free to modify the script yourself and improve on it, I would appreciate it if you would let me know and send me your enhanced version, if you decide to take matters into your own hands. :)
Change Log
V1.1
Source Code
Here is the source code of the script. After that comes the code of the install and uninstall batch files and the registry key values and settings.
Again, you can download the whole code and the 3rd party command line tool md5sum.exe in a single ZIP archive called Roy-DeDupeScript11b. You may need a ZIP extracting utility, although Windows XP and later should be able to open the file without the need to install additional software. However, if this does not work for any reason, either download the commercial program WinZip or download and install the open source archive processing tool PeaZip.
DedupeFilesInFolder.vbs Source Code
'=======================================================================
' Parameters you might want to change
'=======================================================================
' Specifies the action to take if dupes are found, used by DupeHandling
' Values other than 1 are not supported/implemented yet
Dim DupeAction: DupeAction = 1

' If you want to suppress the progress dialog, the results message popup
' and Error Messages set bQuiet = true
' Note, the script returns error levels for batch processing regardless
' of the bQuiet settings. The ErrorLevel codes are:
' 0 = Script Ran Successful
' 1 = Script Ran, but there were no files to process
' 2 = The script was aborted (only relevant if progress dialog is on)
' 4 = Script Error (md5sum.exe or processing path not found)
Dim bQuiet: bQuiet = false

' LOGFILES
' Set to 1 to generate file list with name and md5 sum, set to 0 to disable
Dim WriteFileList: WriteFileList = 1
' File name for the file list. File is saved in processing path folder
Dim FileListFName: FileListFName = "!DeDupe-FileList.txt"

' Write a log file with all dupes that were processed
Dim WriteDeDupeLog: WriteDeDupeLog = 1
' File name for the log file. File is saved in processing path folder
Dim DeDupeLogFName: DeDupeLogFName = "!DeDupeLog.txt"

'=======================================================================
' Don't touch the stuff below this line, unless you know what
' you are doing.
'=======================================================================
Dim oFso: set oFso = Wscript.createobject("scripting.fileSystemObject")
Dim oFolder, oFiles, oFile, oLogFile
Dim iCounter: iCounter = 0        ' File Counter
Dim sFolderPath                   ' Work Folder Path (Current folder)
Dim arguments: Set arguments = Wscript.arguments
Dim md5sumPath, sErr, sMsg, sMD5CS
Dim MyArray()                     ' Array with MD5 Checksum and File Names
Dim iDupCnt: iDupCnt = 0          ' Counter for Dupes
Dim iDupErr: iDupErr = 0          ' Error Count for Dupe Action
Dim iMD5SumErr: iMD5SumErr = 0    ' Error Count for MD5 Check Sum Calc
Dim iFilesCnt: iFilesCnt = 0      ' Number of files found
Dim iFilesProc: iFilesProc = 0    ' Number of files processed (iFilesCnt - iMD5SumErr)
Dim iDupProc                      ' Dupe Processed Count (iDupCnt - iDupErr)

' global const and vars for Statusbar
Const conBarSpeed = 80
Const conForcedTimeOut = 900000
Dim objIE
Dim objProgressBar
Dim objTextLine1
Dim objTextLine2
Dim objQuitFlag

Dim bAbort: bAbort = false

' System Constants
Const SYSTEM_FOLDER = 1, TEMP_FOLDER = 2
Const ForAppending = 8
Const ForReading = 1
Const ForWriting = 2

'========================================================================
' Initialization of Work Environment

if arguments.Named.Exists("quiet") then

    if arguments.Named.Item("quiet") = 1 then
        bQuiet = true
    end if

    if arguments.Named.Item("quiet") = 0 then
        bQuiet = false
    end if

end if

if arguments.Named.Exists("list") then

    if arguments.Named.Item("list") = 0 or arguments.Named.Item("list") = 1 then
        WriteFileList = arguments.Named.Item("list")
    end if

end if

if arguments.Named.Exists("log") then

    if arguments.Named.Item("log") = 0 or arguments.Named.Item("log") = 1 then
        WriteDeDupeLog = arguments.Named.Item("log")
    end if

end if

' Check for command line parameter passed

if arguments.unnamed.count = 0 then
    ' Set Path to Current Path
    sFolderPath = ofso.GetAbsolutePathName(".")
else
    ' Set Path to folder that was passed as argument for the script call
    sFolderPath = arguments.unnamed(0)
end if

' Make sure that the 3rd party tool md5sum.exe is either in
' the System32 directory or the current path (I don't check the whole Path Env)
md5sumPath = oFso.BuildPath(oFso.GetSpecialFolder(SYSTEM_FOLDER), "md5sum.exe")

if not oFso.FileExists(md5sumPath) then
    md5sumPath = oFso.BuildPath(oFso.GetAbsolutePathName("."), "md5sum.exe")

    if not oFso.FileExists(md5sumPath) then
        sErr = sErr & "Md5sum.exe not found in " & oFso.GetSpecialFolder(SYSTEM_FOLDER) & _
               " nor " & oFso.GetAbsolutePathName(".") & vbcrlf & vbcrlf
    end if

end if

' Make sure that the folder (especially the ones passed as Param) exists

if not oFso.FolderExists(sFolderPath) then
    sErr = sErr & "Processing Folder: " & sFolderPath & _
           " does not exist." & vbcrlf & vbcrlf
end if

' If something is not right, show error and abort the script

if sErr <> "" then
    if bQuiet = false then Wscript.echo sErr
    CleanUpAndQuit 4
end if

Dim sLogOutput: sLogOutput = oFso.BuildPath(sFolderPath,DeDupeLogFName)

' Okay.. Let's get started
'------------------------------------------------------------------------

Set oFolder = oFso.GetFolder(sFolderPath)
Set oFiles = oFolder.Files
iFilesCnt = oFiles.count

if iFilesCnt > 0 then
    ReDim MyArray(oFiles.count,3)
    ' Build 2 Dimensional Array with CheckSum of
    ' Filename & File Name itself for all files in
    ' current directory. Looking like this
    ' (x = dimension 2 and y = dimension 1)
    ' the 3rd column is MD5 +[]+ lower case file name for sorting purposes
    ' 43a52d14577de0299146aa9f8f0c062f, file1.ext, 43a52d14577de0299146aa9f8f0c062f[]file1.ext
    ' 0052d12577de56567546aa9f8f0c0af3, file2.ext, 0052d12577de56567546aa9f8f0c0af3[]file2.ext

    if bQuiet = false then
        ' Launch Status Bar
        StartIE "De-Dupeing Files in " & sFolderPath
        SetLine1 "Step 1/4: Reading Files and MD5 Check Sums. Path:" & sFolderPath
    end if

    For each oFile in oFiles
        iCounter = iCounter + 1
        sMD5CS = GetMd5Sum(oFile.name)
        MyArray(iCounter - 1,0) = sMD5CS
        MyArray(iCounter - 1,1) = oFile.name
        MyArray(iCounter - 1,2) = sMD5CS & "[]" & lcase(oFile.name)
        ' Check if Abort Button was pressed

        if bQuiet = false then

            If IsQuit() = True Then
                bAbort = true
                Exit For
            End If

            ' Set Status Bar Value
            SetLine2 "Files Processed: " & CStr(iCounter) & " of " & cstr(iFilesCnt)
        end if

    Next

end if

' iCounter now holds the index of the last array row that was filled
iCounter = iCounter - 1

if bAbort = true and bQuiet = false then
    ' Close Status Bar
    CloseIE
end if

if iCounter >= 0 and bAbort = false then

    if bAbort = false then

        if bQuiet = false then
            ' Set Status Bar Value
            SetLine1 "Step 2/4: Sort Files"
            SetLine2 "Processing " & cstr(iCounter + 1) & " Files"
        end if

        ' Sort the Array by File Name
        Call QuickSort(MyArray,0,ubound(MyArray,1),2)

        if bQuiet = false then
            ' Check if Abort Button was pressed

            If IsQuit() = True Then
                bAbort = true
            End If

        end if

    end if

    if WriteFileList = 1 then
        ' Write File List out into Text File

        if bQuiet = false then
            ' Set Status Bar Value
            SetLine1 "Step 3/4: Writing File List"
            SetLine2 oFso.BuildPath(sFolderPath,FileListFName)
        end if

        Call WriteFile(MyArray)
        ' Check if Abort Button was pressed

        if bQuiet = false then

            If IsQuit() = True Then
                bAbort = true
            End If

        end if

    end if

    if bAbort = false then

        if bQuiet = false then
            ' Set Status Bar Value
            SetLine1 "Step 4/4: Detect and Process Duplicates"
            SetLine2 ""
        end if

        ' Detect Duplicates
        Call FindDupes(MyArray)

        ' Wrapping up
        iDupProc = iDupCnt - iDupErr

        if bQuiet = false then
            ' Close Status Bar
            CloseIE
        end if

        sMsg = "Number of Files Found: " & iFilesCnt & vbcrlf & _
               "Number of MD5 Sum Errors: " & iMD5SumErr & vbcrlf & _
               "Number of Files Processed: " & iFilesProc & vbcrlf & _
               "------------------------------------" & vbcrlf & _
               "Number of Dupes Found: " & iDupCnt & vbcrlf & _
               "Number of Dupe Processing Errors: " & iDupErr & vbcrlf & _
               "Number of Dupes Processed: " & iDupProc & vbcrlf

        ErrorLogWrite "Number of Files Found: " & iFilesCnt
        ErrorLogWrite "Number of MD5 Sum Errors: " & iMD5SumErr
        ErrorLogWrite "Number of Files Processed: " & iFilesProc
        ErrorLogWrite "Number of Dupes Found: " & iDupCnt
        ErrorLogWrite "Number of Dupe Processing Errors: " & iDupErr
        ErrorLogWrite "Number of Dupes Processed: " & iDupProc

        if WriteFileList = 1 then
            sMsg = sMsg & vbcrlf & "List of Files Generated at:" & vbcrlf & _
                   oFso.BuildPath(sFolderPath,FileListFName) & vbcrlf
            ErrorLogWrite "List of Files Generated at: " & _
                          oFso.BuildPath(sFolderPath,FileListFName)
        end if

        if WriteDeDupeLog = 1 then
            sMsg = sMsg & vbcrlf & "Log File Generated at: " & vbcrlf & sLogOutput
        end if

        if bQuiet = false then
            WScript.echo sMSg
        end if

    else

        if bQuiet = false then
            ' Close Status Bar
            CloseIE
        end if

    end if

else

    if bAbort = false then
        ' No Files Found to dedupe

        if bQuiet = false then
            Wscript.echo "No Files to de-dupe found in " & sFolderPath
        end if

        CleanUpAndQuit 1
    end if

end if

if bAbort = true then
    ' Aborted Message

    if bQuiet = false then
        Wscript.echo "The De-Dupe Script was aborted."
    end if

    CleanUpAndQuit 2
end if

CleanUpAndQuit 0

'==============================================================================
Function GetMd5Sum(ByVal strFile)
    ' Declare the FileSystemObject object constants and variables.
    Dim objTS, strTempFile, strCmdLine, objRE

    With oFso
        ' Construct a temporary filename.
        Do
            strTempFile = .BuildPath(.GetSpecialFolder(TEMP_FOLDER), "!" & .GetTempName)
        Loop While .FileExists(strTempFile)

        ' Use cmd.exe to construct a command that will execute md5sum.exe
        strCmdLine = .BuildPath(.GetSpecialFolder(SYSTEM_FOLDER), "cmd.exe") _
                     & " /c " & md5sumPath & " """ & strFile & """>" & strTempFile

    End With

    ' Execute the command in a hidden window. Wait for the command
    ' to complete before continuing.
    CreateObject("WScript.Shell").Run strCmdLine, 0, True

    ' Open the temporary file.
    s = ""
    On Error Resume Next
    Set objTS = oFso.OpenTextFile(strTempFile, 1)
    s = objTS.ReadAll
    On Error Goto 0

    ' check that it didn't fail and has the checksum

    if trim(s) <> "" and instr(s," *") > 0 then
        GetMD5Sum = left(s,instr(s," *") - 1)
        iFilesProc = iFilesProc + 1
    else
        ' Error... not good
        iMD5SumErr = iMD5SumErr + 1
        GetMD5Sum = ""
    end if

    objTS.Close
    oFso.DeleteFile strTempFile
End Function

'==================================================================================
' Array Sort Functions
Sub SwapRows(ary,row1,row2)
    '== This proc swaps two rows of an array
    Dim x,tempvar

    For x = 0 to Ubound(ary,2)
        tempvar = ary(row1,x)
        ary(row1,x) = ary(row2,x)
        ary(row2,x) = tempvar
    Next

End Sub 'SwapRows

Sub QuickSort(vec,loBound,hiBound,SortField)
    '==--------------------------------------------------------==
    '== Sort a 2 dimensional array on SortField                ==
    '==                                                        ==
    '== This procedure is adapted from the algorithm given in: ==
    '== ~ Data Abstractions & Structures using C++ by ~        ==
    '== ~ Mark Headington and David Riley, pg. 586 ~           ==
    '== Quicksort is the fastest array sorting routine for     ==
    '== unordered arrays. Its big O is n log n                 ==
    '==                                                        ==
    '== Parameters:                                            ==
    '== vec - array to be sorted                               ==
    '== SortField - The field to sort on (2nd dimension value) ==
    '== loBound and hiBound are simply the upper and lower     ==
    '== bounds of the array's 1st dimension. It's probably     ==
    '== easiest to use the LBound and UBound functions to      ==
    '== set these.                                             ==
    '==--------------------------------------------------------==
    Dim pivot(),loSwap,hiSwap,temp,counter
    Redim pivot (Ubound(vec,2))

    '== Two items to sort

    if hiBound - loBound = 1 then

        if vec(loBound,SortField) > vec(hiBound,SortField) _
            then Call SwapRows(vec,hiBound,loBound)
    End If

    '== Three or more items to sort

    For counter = 0 to Ubound(vec,2)
        pivot(counter) = vec(int((loBound + hiBound) / 2),counter)
        vec(int((loBound + hiBound) / 2),counter) = vec(loBound,counter)
        vec(loBound,counter) = pivot(counter)
    Next

    loSwap = loBound + 1
    hiSwap = hiBound

    do
        '== Find the right loSwap
        while loSwap < hiSwap and vec(loSwap,SortField) <= pivot(SortField)
            loSwap = loSwap + 1
        wend
        '== Find the right hiSwap
        while vec(hiSwap,SortField) > pivot(SortField)
            hiSwap = hiSwap - 1
        wend
        '== Swap values if loSwap is less than hiSwap
        if loSwap < hiSwap then Call SwapRows(vec,loSwap,hiSwap)

    loop while loSwap < hiSwap

    For counter = 0 to Ubound(vec,2)
        vec(loBound,counter) = vec(hiSwap,counter)
        vec(hiSwap,counter) = pivot(counter)
    Next

    '== Recursively call function .. the beauty of Quicksort
    '== 2 or more items in first section
    if loBound < (hiSwap - 1) then Call QuickSort(vec,loBound,hiSwap - 1,SortField)
    '== 2 or more items in second section
    if hiSwap + 1 < hibound then Call QuickSort(vec,hiSwap + 1,hiBound,SortField)

End Sub 'QuickSort

Sub PrintArray(vec,lo,hi,mark)
    '==-----------------------------------------==
    '== Print out an array from the lo bound    ==
    '== to the hi bound. Highlight the column   ==
    '== whose number matches parm mark          ==
    '==-----------------------------------------==
    Dim i,j
    sRes = ""

    For i = lo to hi

        For j = 0 to Ubound(vec,2)
            sRes = sRes & vec(i,j) & vbTab & vbTab
        Next

        sRes = sRes & vbcrlf
    Next

    wscript.echo sRes
End Sub

'===================================================================================
' Actual De-Duper Functions
Sub FindDupes(Arr)
    Dim a, b, s, iCnt, sOrg
    sKey = ""
    iCnt = Ubound(Arr,1)

    For a = 0 to iCnt

        s = trim(Arr(a,0))

        if s <> "" then

            if sKey = "" then
                ' First CheckSum Value in Array, Set Key, don't check further
                sKey = s
                sOrg = Arr(a,1)
            else
                ' CheckSum from previous file set, check if identical

                if s = sKey then
                    ' Dupe
                    DupeHandling s, sKey, Arr(a,1), sOrg
                else
                    ' Set key to Checksum of new file, because it is different
                    sKey = s
                    sOrg = Arr(a,1)
                end if

            end if

        end if

        if bQuiet = false then
            ' Set Status Bar Value
            SetLine2 "Files Processed: " & CStr(a + 1) & " of " & cstr(iCnt + 1)

            ' Check if Abort Button was pressed

            If IsQuit() = True Then
                ' Set the flag before leaving the loop, otherwise it is never set
                bAbort = true
                Exit For
            End If

        end if

    Next

End Sub

Sub DupeHandling(MD5dupe, MD5Org, FNameDupe, FNameOrg)
    ' Here is where You decide what to do with the found duplicate
    ' You could for example perform additional checks
    ' beyond the MD5 Checksum also
    Dim sSrc, sOrg, sOrgExt, sOrgBase, sDest, sDestName, sDestExt, sDestBase
    ' Increase Dupe Counter
    iDupCnt = iDupCnt + 1

    ' Determine the action to take

    Select Case DupeAction
        Case 1
            ' Rename Dupe by appending Original File name as prefix with
            ' an _ as separator. Also the extension of the original file
            ' Full Path of Dupe File
            sSrc = oFso.BuildPath(sFolderPath,FNameDupe)
            ' Full Path of Org File
            sOrg = oFso.BuildPath(sFolderPath,FNameOrg)
            ' Get Extension of Org File
            sOrgExt = oFso.GetExtensionName(sOrg)

            ' Get Base File Name of Org File without Extension
            sOrgBase = left(FNameOrg, InStrRev(FNameOrg, "." & sOrgExt, - 1,1) - 1)

            ' Build New File name/path for Dupe Path\OrgBase_OrgExt_DupeBase.DupeExt
            sDestExt = oFso.GetExtensionName(FNameDupe)
            sDestBase = left(FNameDupe, InStrRev(FNameDupe, "." & sDestExt, - 1,1) - 1)
            sDestName = sOrgBase & "_" & sOrgExt & "_" & sDestBase & "[DEDUPED]" & "." & sDestExt
            sDest = oFso.BuildPath(sFolderPath, sDestName)

            ' Move

            if oFso.FileExists(sDest) then
                ' New File already exists, cannot rename dupe, Increase Dupe Processing Error Count
                iDupErr = iDupErr + 1
                ErrorLogWrite "Rename Failed! Org: " & FNameOrg & ", Dupe Src: , " & _
                              FNameDupe & ", Dest: " & sDestName
            Else
                oFso.MoveFile sSrc, sDest
                ErrorLogWrite "Dupe Processed! Org: " & FNameOrg & ", Dupe Src: , " & _
                              FNameDupe & ", Dest: " & sDestName
            End if

        Case Else
            ' Not implemented yet
    End Select

End Sub

'==============================================================
' Support Functions
Sub WriteFile(arr)
    ' Write List of Files with their MD5 Sums to a Text file
    Dim a ' loop count

    Dim f: f = oFso.BuildPath(sFolderPath,FileListFName)
    ' Check if an old Listings File Already Exists and Delete it

    if oFso.FileExists(f) then
        oFso.DeleteFile f, true
    end if

    Dim oF: Set oF = oFso.OpenTextFile(f, ForAppending, true, - 2)

    ' File Name + TAB + MD5 Sum of File

    For a = 0 to Ubound(arr,1)
        oF.writeline trim(arr(a,1)) & vbtab & trim(arr(a,0))
    Next

    oF.Close
    Set oF = Nothing
End Sub

Function ErrorLogWrite(sErrLogMsg)

    Dim bOpenLog: bOpenLog = false
    Dim sFullErrMsg

    if WriteDeDupeLog = 1 then

        if not isObject(oLogfile) then
            set oLogfile = nothing
        end if

        if not (oLogfile is nothing) then
        else
            bOpenLog = true
        end if

        if bOpenLog = true then
            ' First call: open the log and write a header
            Set oLogfile = oFSO.OpenTextFile(sLogOutput, ForWriting, True, - 2)
            ErrorLogWrite("----------------------------------------------")
            ErrorLogWrite("New DeDupe Batch Started")
            ErrorLogWrite("Work Path: " & sFolderPath)
            ErrorLogWrite("-----------------------------------------------")
        end if

        sFullErrMsg = LogDateFormat(now) & chr(9) & sErrLogMsg

        oLogFile.Writeline sFullErrMsg

    end if

end function

function LogDateFormat(dSourceDate)
    Const sLogDtNumbers = "0000"
    Dim sLgDtYYYY, sLgDtMM, sLgDtDD, sLgDtHH, sLgDtNN, sLgDtSS

    sLgDtYYYY = right(sLogDtNumbers & year(dSourceDate),4)
    sLgDtMM = right(sLogDtNumbers & month(dSourceDate),2)
    sLgDtDD = right(sLogDtNumbers & day(dSourceDate),2)
    sLgDtHH = right(sLogDtNumbers & hour(dSourceDate),2)
    sLgDtNN = right(sLogDtNumbers & minute(dSourceDate),2)
    sLgDtSS = right(sLogDtNumbers & second(dSourceDate),2)
    LogDateFormat = sLgDtYYYY & "-" & sLgDtMM & "-" & sLgDtDD & _
                    " " & sLgDtHH & ":" & sLgDtNN & ":" & sLgDtSS
End Function

'=================================================================
' Progress Bar Code

' --------------------------------------------------------
' Function    StartIE
' Abstract    Launch IE Dialog Box and Progress bar
' Parameters  Title of the box
' --------------------------------------------------------

Private Sub StartIE(strTitel)
  Dim objDocument
  Dim objWshShell

  Set objIE = CreateObject("InternetExplorer.Application")
  objIE.height = 230
  objIE.width = 400
  objIE.menubar = False
  objIE.toolbar = false
  objIE.statusbar = false
  objIE.addressbar = false
  objIE.resizable = False
  objIE.navigate ("about:blank")

  ' wait till ie is loaded
  While (objIE.busy)
  wend

  set objDocument = objIE.document
  ' setup the dialog box
  WriteHtmlToDialog objDocument, strTitel

  ' with ie/html loaded, define assorted objects...
  set objTextLine1 = objIE.document.all("txtMilestone")
  set objTextLine2 = objIE.document.all("txtRemarks")
  Set objProgressBar = objIE.document.all("pbText")
  set objQuitFlag = objIE.document.Secret.pubFlag

  objTextLine1.innerTEXT = ""
  objTextLine2.innerTEXT = ""

  ' objIE.document.body.innerHTML = "Building Document..."
  '    + "<br>load time= " + n
  objIE.visible = True

  ' set focus to ie
  Set objWSHShell = WScript.CreateObject("WScript.Shell")
  objWshShell.AppActivate("Microsoft Internet Explorer")
End Sub

' --------------------------------------------------------
' Function    CloseIE
' Abstract    Close the IE Browser Window
' --------------------------------------------------------

Private Function CloseIE()
  On Error Resume Next
  objIE.quit
End Function

' --------------------------------------------------------
' Function    SetLine1
' Abstract    Set Text Line in the Progress Bar Dialog Box
' Parameters  Progress Text
' --------------------------------------------------------

Private sub SetLine1(sNewText)
  On Error Resume Next
  objTextLine1.innerTEXT = sNewText
End Sub

' --------------------------------------------------------
' Function    SetLine2
' Abstract    Set Text Line in the Progress Bar Dialog Box
' Parameters  Progress Text
' --------------------------------------------------------

Private sub SetLine2(sNewText)
  On Error Resume Next
  objTextLine2.innerTEXT = sNewText
End Sub

' --------------------------------------------------------
' Function    IsQuit
' Abstract    Checks if the quit button was pressed
' Parameters  None
' --------------------------------------------------------

Private function IsQuit()
  On Error Resume Next
  IsQuit = False

  If objQuitFlag.Value = "quit" Then
    IsQuit = True
  End If

End function

' --------------------------------------------------------
' Function    WriteHtmlToDialog
' Abstract    Set HTML Text for the IE Dialog box
' Parameters  IE Document Object, Title Text
' --------------------------------------------------------

Private Sub WriteHtmlToDialog(objDocument, strTitel)
  objDocument.Open
  objDocument.Writeln "<title>" & strTitel & "</title> "
  objDocument.Writeln "<style>"
  objDocument.Writeln " BODY {background: Silver} BODY { overflow:hidden }"
  objDocument.Writeln " P.txtStyle {color: Black; font-family: Arial; " _
    & " font-size: 10pt; font-weight: normal; margin-left: 10px; " _
    & " width: 340px } "
  objDocument.Writeln " input.pbStyle {color: Navy; font-family: Wingdings; " _
    & " font-size: 10pt; background: Silver; height: 20px; " _
    & " width: 340px } "
  objDocument.Writeln "</style>"
  objDocument.Writeln "<div id=""objProgress"" class=""Outer""></div>"
  ' write out text lines...
  objDocument.Writeln "<P id=txtMilestone class=txtStyle style=margin-left: 10px> </P>"
  objDocument.Writeln "<P id=txtRemarks class=txtStyle style=margin-left: 10px ></P>"
  objDocument.Writeln "<CENTER>"
  ' write progbar
  objDocument.Writeln "<input type=text id=pbText class=pbStyle value= >"
  objDocument.Writeln "<br><br>" ' space down a little
  ' write cancel button...
  objDocument.Writeln "<input type=button value=Cancel " _
    & " onclick=SetReturnFlag(""quit"") >"
  objDocument.Writeln "</CENTER>"
  ' write hidden object...
  objDocument.Writeln "<form name=secret >" _
    & " <input type=hidden name=pubFlag value=run >" _
    & "</form>"
  objDocument.Writeln "<SCRIPT LANGUAGE=VBScript >"
  ' write "local script" to handle cmdCancel_Click event...
  objDocument.Writeln "Sub SetReturnFlag(sFlag)"
  objDocument.Writeln " secret.pubFlag.Value = sFlag"
  objDocument.Writeln " txtMileStone.style.color = ""Red"" "
  objDocument.Writeln " txtRemarks.style.color = ""Red"" "
  objDocument.Writeln "End Sub"
  ' progress bar
  objDocument.Writeln "Function PctComplete(nPct)"
  objDocument.Writeln "pbText.Value = String(nPct,"" "") & String(4,""n"")"
  objDocument.Writeln "End Function"
  ' calc progress bar and direction
  objDocument.Writeln "Sub UpdateProgress()"
  objDocument.Writeln "Dim intStep"
  objDocument.Writeln "Dim intDirection"
  objDocument.Writeln "If (IsNull(objProgress.getAttribute(""Step"")) = True) Then"
  objDocument.Writeln "intStep = 0"
  objDocument.Writeln "Else"
  objDocument.Writeln "intStep = objProgress.Step"
  objDocument.Writeln "End If"
  objDocument.Writeln "if (IsNull(objProgress.GetAttribute(""Direction""))=True) Then"
  objDocument.Writeln "intDirection = 0"
  objDocument.Writeln "Else"
  objDocument.Writeln "intDirection = objProgress.Direction"
  objDocument.Writeln "End If"
  objDocument.Writeln "if intDirection=0 then"
  objDocument.Writeln "intStep = intStep + 1"
  objDocument.Writeln "else"
  objDocument.Writeln "intStep = intStep - 1"
  objDocument.Writeln "end if"
  objDocument.Writeln "Call PctComplete(intStep)"
  objDocument.Writeln "if intStep>=23 then"
  objDocument.Writeln "intDirection=1"
  objDocument.Writeln "end if"
  objDocument.Writeln "if intStep<=0 then"
  objDocument.Writeln "intDirection=0"
  objDocument.Writeln "end if"
  objDocument.Writeln "objProgress.SetAttribute ""Step"", intStep"
  objDocument.Writeln "objProgress.SetAttribute ""Direction"", intDirection"
  objDocument.Writeln "Window.setTimeout GetRef(""UpdateProgress""), " & conBarSpeed
  objDocument.Writeln "End Sub"
  ' timeout function
  objDocument.Writeln "Sub DialogHardTimeout()"
  objDocument.Writeln "SetReturnFlag(""quit"")"
  objDocument.Writeln "End sub"
  objDocument.Writeln "Sub Window_OnLoad()"
  objDocument.Writeln "theleft = (screen.availWidth - document.body.clientWidth) / 2"
  objDocument.Writeln "thetop = (screen.availHeight - document.body.clientHeight) / 2"
  objDocument.Writeln "window.moveTo theleft,thetop"
  objDocument.Writeln "Window.setTimeout GetRef(""UpdateProgress""), " & conBarSpeed
  objDocument.Writeln "Window.setTimeout GetRef(""DialogHardTimeout""), " & conForcedTimeOut
  objDocument.Writeln "End Sub"
  objDocument.Writeln "</SCRIPT>"
  objDocument.Close
End Sub

Sub CleanUpAndQuit(RetCode)
  ' House Cleaning

  if not isObject(oLogfile) then
    set oLogfile = nothing
  end if

  ' close the log file only if it was actually opened
  if not (oLogfile is nothing) then
    oLogFile.Close
    set oLogfile = nothing
  end if

  Set oFso = Nothing
  WScript.Quit(RetCode)

End Sub

DeDupeInstall.bat
@echo off
Cls
echo.
echo Installing De-dupe Shell Extension and Its Support Tools
echo =============================================================
echo.

echo 1. Copy md5sum.exe to %SystemRoot%\system32\
copy md5sum.exe %SystemRoot%\system32\md5sum.exe
echo.

echo 2. Copy DedupeFilesInFolder.vbs to %SystemRoot%\system32\
copy DedupeFilesInFolder.vbs %SystemRoot%\system32\DedupeFilesInFolder.vbs
echo.

echo 3. Register "DeDupe" Directory Shell Extension
regedit /s DedupeInstall.reg
echo.

echo 4. done... DeDupe Shell Extension Installed Successfully
echo.
echo.
pause

DeDupeInstall.reg
Note that the default registry value is given in hex. This was necessary to be able to add the value
“wscript.exe %SystemRoot%\system32\DedupeFilesInFolder.vbs” to the registry. Notice the %SystemRoot% environment variable? Windows has to expand it when you activate the shell extension. If you created the value as a REG_SZ string, it would not work.
It needs to be an expandable string, REG_EXPAND_SZ, because it is expanded at runtime. A registry import can only create an expandable string value if you provide the string in hexadecimal format with the hex(2): prefix. This tells Windows to create a REG_EXPAND_SZ value instead of the basic REG_SZ. Each character is followed by the hex value 00, and two more 00 bytes are added at the end. That is because the data is stored as UTF-16LE text: every character here takes two bytes, and the string ends with a two-byte null terminator.
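If you ever need to produce such a hex string for a different command line, the encoding described above can be sketched in a few lines of Python. This is only an illustration and not part of the DeDupe tool; the function name `reg_expand_sz_hex` is made up for this example, and the output comes as one long line, so you have to wrap it yourself with trailing backslashes if you want the same layout as a regedit export.

```python
def reg_expand_sz_hex(text):
    """Return text as comma-separated hex bytes, as used after hex(2): in a .reg file."""
    # REG_EXPAND_SZ data is UTF-16LE text. For plain ASCII characters that
    # means each character byte is followed by a 00 byte.
    data = text.encode("utf-16-le")
    # The string is closed with a two-byte null terminator (00,00).
    data += b"\x00\x00"
    return ",".join(f"{byte:02X}" for byte in data)


# The command line used by the DeDupe shell extension (taken from the post).
command = r"wscript.exe %SystemRoot%\system32\DedupeFilesInFolder.vbs"
print("@=hex(2):" + reg_expand_sz_hex(command))
```

Apart from the line wrapping, the printed value should match the hex data in DeDupeInstall.reg.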
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\Directory\shell\DeDupe]
@="DeDupe"

[HKEY_CLASSES_ROOT\Directory\shell\DeDupe\command]
@=hex(2):77,00,73,00,63,00,72,00,69,00,70,00,74,00,2E,00,65,00,78,00,65,00,20,\
  00,25,00,53,00,79,00,73,00,74,00,65,00,6D,00,52,00,6F,00,6F,00,74,00,25,00,\
  5C,00,73,00,79,00,73,00,74,00,65,00,6D,00,33,00,32,00,5C,00,44,00,65,00,64,\
  00,75,00,70,00,65,00,46,00,69,00,6C,00,65,00,73,00,49,00,6E,00,46,00,6F,00,\
  6C,00,64,00,65,00,72,00,2E,00,76,00,62,00,73,00,00,00

DeDupeUnInstall.bat
@echo off
CLS
echo.
echo Un-Installing De-dupe Shell Extension and Its Support Tools
echo =============================================================
echo.

echo 1. Delete md5sum.exe from %SystemRoot%\system32\
if EXIST %SystemRoot%\system32\md5sum.exe del /Q /F "%SystemRoot%\system32\md5sum.exe"
echo.

echo 2. Delete DedupeFilesInFolder.vbs from %SystemRoot%\system32\
if EXIST %SystemRoot%\system32\DedupeFilesInFolder.vbs del /Q /F "%SystemRoot%\system32\DedupeFilesInFolder.vbs"
echo.

echo 3. Unregister "DeDupe" Directory Shell Extension
regedit /s DedupeUnInstall.reg
echo.

echo 4. done... DeDupe was Uninstalled Successfully
echo.
echo.
pause

DedupeUnInstall.reg
Windows Registry Editor Version 5.00

[-HKEY_CLASSES_ROOT\Directory\shell\DeDupe]
@="DeDupe"

This is another tool created by Carsten Cumbrowski aka Roy/SAC and I hope that you will find this one helpful as well. For comments, praise or complaints, please use the comments section of the blog below. Thanks.
Cheers!
Carsten aka Roy/SAC
