You can skip the background story about the why, when and how I decided to write this little File De-Duping script, if you are not interested in it and jump right the section about how the script works, what it does, where to download it, how to install and de-install it and the source code of it as well. The tool is Freeware, but any donation (money or goods) and/or simply a “Thank You” (e.g. via the comments section at the end of this post) are appreciated nevertheless.
Important Node: I updated the script and this post, because of some bugs that I found and issues with the 3rd party tool “touch.exe”. I had to remove it and come up with another solution for the problem that it solved. I also added some nice stuff, so it was more than just bug fixing :).
The Background Story
Who did not have the problem yet to have hundreds of any type of files, text files, documents, spread sheets, images, videos and others in one directory with the a high chance that you have duplicates, identical files that only have a different file name. I often have to deal with duplicate images that I downloaded from the internet.
I cannot always remember, if I downloaded a particular image already or not and go by the slogan, archive / save / back up first, sort later, because there will be nothing to sort, if you don’t save a copy and go back to the place where you found them at a later time, to learn that the stuff isn’t there anymore for any reason. Maybe even the whole web site is gone the way of the Dodo.
The original file names are often useless, either to generic, like image1.jpg, logo.gif or simply 1.jpg, 2.jpg etc. or they are long and cryptic without any meaning, like: 3104458219_cc0dfd3980_o_d.jpg or something like that. So you end up giving the files your own name and thus make it highly likely that you download the same file at a different time again and give it a different file name than you did the first time around. Voila, duplicate.
I looked at a number of tools and options and found some that were decent, but all of them had always something (or didn’t have something) to make things that should be semi automatic to a manual time consuming ordeal. What I wanted is something that finds duplicates, does not delete them straight out or have me right then go over each dupe found to make a decision about it right away. I also wanted to be able to verify that the dupes found are really identical so it must be easy to know quickly which file is a duplicate of another file.
Non of the solutions that I tried delivered on all those aspects so de-duping of files was painstaking and time consuming, if I did it or I simply would not do it at all and carry a bunch of dupe garbage around .
The idea for this simple script of mine came when I did the sorting and inventory of the SAC art pack releases back in December last year and January this year. I uploaded files, like MODs that I converted to Mpeg-1 Audio Layer 3, short MP3, and other files to my file sharing account at Mediafire.com. I learned that Mediafire.com has a de-duper function build into their system that prevents user from uploading the same file twice to their account. You can upload it a second time, but you will get the message that the file already exists and that your new upload was being deleted. I got that message and first thought that it was an error on Mediafire’s part.
What dupes? There were no dupes, I thought. Yeah, there were some SID Adlib music files that had the same size and all, but there where like a dozen of them, all with a different name and even different prefix, indicating that the files were created by different musicians. Also the dates were different. Being close to send an email to Mediafire customer support, I decided to listen to the songs that could be potential duplicates to the file that was rejected (Mediafire didn’t tell me the file name or shared URL of the original file). And there it was, the same tune with a different file name and different file dates by the same artist.
I was curious about how Mediafire noticed that the files are identical and did some research. They are using the MD5 Check Sum value of a file. Chances are astronomical that two files that are not byte by byte identical will have the same MD5 check sum value. That’s a smart, fast and easy way to find dupes and the idea of writing a de-dupe script that does exactly what I want it to do was born.
This is enough of a background story I believe. Let’s get busy with the Script itself.
How the Script Works
The script detects duplicate files within a directory. Duplicate files are files that have the same MD5 Check Sum value. Two DIFFERENT/NON IDENTICAL files having the same MD5 Check Sum is not impossible, but highly unlikely. This allows the script to detect duplicate files regardless of their file name or other characteristics, such as "date created" or "date modified".
The tool scans all files within a directory. It does not include files in sub directories of the processed folder.
If a duplicate file is found, it will be renamed by by appending the original file name as prefix with an _ as separator, which is also used to replace the "." that indicates the file extension of the original file name (other "." in the file name itself remain). At the end of the file name is the string [DEDUPED] added.
Example
For Example aFile1.EXT and bFile2.EXT are identical. After the script was executed, one of the two files will remain as it is and the other one is being renamed. Which file will be considered the "original" is determined by which file was found first. The script sorts the files by name first, before it de-dupes them.
In this example bFile1.EXT would be considered the original and bFile2.EXT will be renamed to aFile1_EXT_bFile2[DEDUPED].EXT. This makes dupes appear right after the original, if you sort the directory by file name. To be able to filter the dupes to copy/move them away or to delete them, use the copy, move or del command in MS DOS. For example "DEL *[DEDUPED].*" would delete all duplicate files found and renamed by the script.
More New Stuff
The script creates two files by default in the processed directory:
- "!DeDupe-FileList.txt" - a list of all files in the directory and their MD5 Check Sum Values (tab separated)
- "!DeDupeLog.txt" - a processing log file where you can find the list of dupes that were detected, their old & new file name and the corresponding original file
If you do not want any of the files to be created, change the options for "WriteFileList" and "WriteDeDupeLog" to "0" in the beginning of the code of "DedupeFilesInFolder.vbs". Alternatively use the command line options:
”/log:[0/1]“ and “/list:[0/1]“ to turn the creation of the list and/or log on/off.
You can also suppress all dialogs via the command line option “/quite:[0/1]”.
“/quite:1” would disable the progress dialog, results message and all error messages.
Note, the script returns error levels for batch processing regardless of the "quiet" settings.
The Error Level codes are:
0 = Script Ran Successful
1 = Script Ran, but there were no files to process
2 = The script was aborted (only relevant if progress dialog is on)
4 = Script Error (md5sum.exe not or processing path not found)
Also new, a nice progress dialog using MS Internet Explorer and an extended results message box. Here are some screen shots of the new and updated windows.
Installation/De-Installation - Download
Download this small 36 KB ZIP file with the name Roy-DeDupeScript11b.zip and extract the archive to a folder on your local hard drive. The ZIP archive contains the following EIGHT files:
File_ID.diz and Readme.txt are simply text files, md5sum.exe is a 3rd party command line utility that was developed by somebody else and is being needed for my script. The two .BAT files and two .REG are only needed for the installation and de-installation of the tool. The .VBS file is the main tool script written in Visual Basic Script (VBScript).
Use the provided Batch Scripts "DeDupeInstall.bat" and "DeDupeUnInstall.bat" to install or un-install the De-Dupe Shell Extension.
Installation
Double click on the Batch Script File "DeDupeInstall.bat"
Thats it.
Notes:
The install batch file copies md5sum.exe and DedupeFilesInFolder.vbs into your System32 directory under your windows installation directory and Imports the registry file "DedupeInstall.reg" into your systems registry database. It creates entries under the Registry Key:
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Directory\shell\
Non of the files in the installation directory will be needed anymore to run the script itself. You will need them only to uninstall the tool or to re-install it again, if necessary.
De-installation
Double click on the Batch Script File "DeDupeUnInstall.bat"
The Un-Install batch file deletes the three files from your System32 directory and utilizes the registry file DedupeUnInstall.reg to remove the entries for the script from your systems registry database. If you want to continue to use md5sum.exe and only want to disable the shell extension, either simply double click on the file DedupeUnInstall.reg without executing the uninstall batch file (the script DedupeFilesInFolder.vbs will remain in your System32 folder though) or you can copy the tool back into your system folder manually after you ran the uninstall batch file.
About the Software
The De-Dupe Windows Explorer Shell Extension Script Tool is written in VBScript and is executed by the system tool WScript.exe. The De-Dupe script (DedupeFilesInFolder.vbs) uses a small support tool that it requires to work properly.
"md5sum.exe" is a small command line tool that return the MD5 Check Sum value for a file.
It can also validate MD5 check sums, which is a feature that is not used by the De-Dupe script.
You can find out more information about it at http://etree.org/md5com.html
Md5Sum was written by bruce@gridpoint.com
Legal Stuff/Copyright and Disclaimer
The 3rd party tool that come with the De-Dupe script is freeware and can be used and copied by anybody without the need of a license or to pay a fee. Since I did not write that tool, I cannot take any responsibility for any issues that they might cause by it, via my script or without out.
This De-Dupe script is also freeware and can be used, copied and modified for free,
Important Disclaimer!
The author, of this software accepts no responsibility for damages resulting from the use of this product and
makes no warranty or representation, either express or implied, including but not limited to, any implied warranty of merchantability or fitness for a particular purpose.
This software is provided "AS IS", and you, its user, assume all risks when using it.
Depending if I have the time and the urge to extend on the script, new features might get added to this tool in the future. I could envision additional configuration options and alternative options for what to do with duplicate files that were found by the script. Since you are free to do modifications to the script yourself and improve on it, I would appreciate, if you would let me know and send me the enhanced version of yours, if you decide to take matters into your own hands. :)
Change Log
V1.1
Source Code
Here is the Source Code of the script. After that is also the code of the install and uninstall batch files and registry key values and settings.
Again, you can download the whole code and the 3rd part command line tool md5sum.exe in a single ZIP archive called Roy-DeDupeScript11b. You may need a ZIP extracting utility, although Windows XP and later should be able to open the file without the need to install additional software. However, if this does not work for any reasons, either download the commercial program WinZip at this web site or download and install the open source Zip and other archiver’s processing tool called PeaZip.
DedupeFilesInFolder.vbs Source Code
1: =======================================================================
2: Parameters you might want to change
3: =======================================================================
4: Specifies the action to take if dupes are found, used by DupeHandling
5: Values other than 1 are not supported/implemented yet
6: Dim DupeAction: DupeAction = 1
7:
8: If you want to suppress the progress dialog, the results message popup
9: and Error Messages set bQuiet = true
10: Note, the script returns error levels for batch processing regardless
11: of the bQuiet settings. The ErrorLevel codes are:
12: 0 = Script Ran Successful
13: 1 = Script Ran, but there were no files to process
14: 2 = The script was aborted (only relevant if progress dialog is on)
15: 4 = Script Error (md5sum.exe not or processing path not found)
16: Dim bQuiet: bQuiet = false
17:
18: LOGFILES
19: Set to 1 to generate file list with name and md5 sum, set to 0 to disable
20: Dim WriteFileList: WriteFileList = 1
21: File name for the file list. File is saved in processing path folder
22: Dim FileListFName: FileListFName = "!DeDupe-FileList.txt"
23:
24: Write a log file with all dupes that were processed
25: Dim WriteDeDupeLog: WriteDeDupeLog = 1
26: File name for the file list. File is saved in processing path folder
27: Dim DeDupeLogFName: DeDupeLogFName = "!DeDupeLog.txt"
28:
29: =======================================================================
30: Dont touch the stuff below this line, unless you know what
31: you are doing.
32: =======================================================================
33: Dim oFso: set oFso = Wscript.createobject("scripting.fileSystemObject")
34: Dim oFolder, oFiles, oFile, oLogFile
35: Dim iCounter: iCounter = 0 File Counter
36: Dim sFolderPath Work Folder Path (Current folder)
37: Dim arguments: Set arguments = Wscript.arguments
38: Dim md5sumPath,sErr, sMsg, sMD5CS
39: Dim MyArray() Array with MD5 Checksum and File Names
40: Dim iDupCnt: iDupCnt = 0 Counter for Dupes
41: Dim iDupErr: iDupErr = 0 Error Count for Dupe Action
42: Dim iMD5SumErr: iMD5SumErr = 0 Error Count for MD5 Check Sum Calc
43: Dim iFilesCnt: iFilesCnt = 0 Number of files proc
44: Dim iFilesProc:iFilesProc = 0 Number of files processed (iFilesCnt - iMD5SumErr)
45: Dim iDupProc Dupe Processed Count (iDupCnt - iDupErr)
46:
47: global const and vars for Statusbar
48: Const conBarSpeed = 80
49: Const conForcedTimeOut = 900000
50: Dim objIE
51: Dim objProgressBar
52: Dim objTextLine1
53: Dim objTextLine2
54: Dim objQuitFlag
55:
56: Dim bAbort: bAbort = false
57:
58: System Constants
59: Const SYSTEM_FOLDER = 1, TEMP_FOLDER = 2
60: Const ForAppending = 8
61: Const ForReading = 1
62: Const ForWriting = 2
63:
64: ========================================================================
65: Initialization of Work Environment
66:
67: if arguments.Named.Exists("quiet") then
68:
69: if arguments.Named.Item("quiet") = 1 then
70: bQuiet = true
71: end if
72:
73: if arguments.Named.Item("quiet") = 0 then
74: bQuiet = false
75: end if
76:
77: end if
78:
79: if arguments.Named.Exists("list") then
80:
81: if arguments.Named.Item("list") = 0 or arguments.Named.Item("list") = 1 then
82: WriteFileList = arguments.Named.Item("list")
83: end if
84:
85: end if
86:
87: if arguments.Named.Exists("log") then
88:
89: if arguments.Named.Item("log") = 0 or arguments.Named.Item("log") = 1 then
90: WriteDeDupeLog = arguments.Named.Item("log")
91: end if
92:
93: end if
94:
95: Check for command line paramater passed
96:
97: if arguments.unnamed.count = 0 then
98: Set Path to Current Path
99: sFolderPath = ofso.GetAbsolutePathName(".")
100: else
101: Set Path to folder that was passed as argument for the script call
102: sFolderPath = arguments.unnamed(0)
103: end if
104:
105: Make sure that 3rd party tools md5sum.exe and touch.exe are either in
106: the System32 directory or the current path (I dont check the whole Path Env)
107: md5sumPath = oFso.BuildPath(oFso.GetSpecialFolder(SYSTEM_FOLDER), "md5sum.exe")
108:
109: if not oFso.FileExists(md5sumPath) then
110: md5sumPath = oFso.BuildPath(oFso.GetAbsolutePathName("."), "md5sum.exe")
111:
112: if not oFso.FileExists(md5sumPath) then
113: sErr = sErr & "Md5sum.exe not found in " & oFso.GetSpecialFolder(SYSTEM_FOLDER) & _
114: " nor " & oFso.GetAbsolutePathName(".") & vbcrlf & vbcrlf
115: end if
116:
117: end if
118:
119: Make sure that the folder (especially the ones passed as Param) exists
120:
121: if not oFso.FolderExists(sFolderPath) then
122: sErr = sErr & "Processing Folder: " & sFolderPath & _
123: " does not exist." & vbcrlf & vbcrlf
124: end if
125:
126: If something is not right, show error and abort the script
127:
128: if sErr <> "" then
129: if bQuiet = false then Wscript.echo sErr
130: CleanUpAndQuit 4
131: end if
132:
133: Dim sLogOutput: sLogOutput = oFso.BuildPath(sFolderPath,DeDupeLogFName)
134:
135: Okay.. Lets get started
136: ------------------------------------------------------------------------
137:
138: Set oFolder = oFso.GetFolder(sFolderPath)
139: Set oFiles = oFolder.Files
140: iFilesCnt = oFiles.count
141:
142: if iFilesCnt > 0 then
143: ReDim MyArray(oFiles.count,3)
144: Build 2 Dimensional Array with CheckSum of
145: Filename & File Name itself for all files in
146: current directory. Looking like this
147: (x = dimention 2 and y = dimention 1)
148: the 3rd column is MD5 +[]+ lower case file name for sorting purposes
149: 43a52d14577de0299146aa9f8f0c062f, file1.ext, 43a52d14577de0299146aa9f8f0c062f[]file1.ext
150: 0052d12577de56567546aa9f8f0c0af3, file2.ext, 0052d12577de56567546aa9f8f0c0af3[]file2.ext
151:
152: if bQuiet = false then
153: Launch Status Bar
154: StartIE "De-Dupeing Files in " & sFolderPath
155: SetLine1 "Step 1/4: Reading Files and MD5 Check Sums. Path:" & sFolderPath
156: end if
157:
158: For each oFile in oFiles
159: iCounter = iCounter + 1
160: sMD5CS = GetMd5Sum(oFile.name)
161: MyArray(iCounter - 1,0) = sMD5CS
162: MyArray(iCounter - 1,1) = oFile.name
163: MyArray(iCounter - 1,2) = sMD5CS & "[]" & lcase(oFile.name)
164: Check if Abort Button was pressed
165:
166: if bQuiet = false then
167:
168: If IsQuit() = True Then
169: bAbort = true
170: Exit For
171: End If
172:
173: Set Status Bar Value
174: SetLine2 "Files Processed: " & CStr(iCounter) & " of " & cstr(iFilesCnt)
175: end if
176:
177: Next
178:
179: end if
180:
181: iCounter = iCounter - 1
182:
183: if bAbort = true and bQuiet = false then
184: Close Status Bar
185: CloseIE
186: end if
187:
188: if iCounter >= 0 and bAbort = false then
189:
190: if bAbort = false then
191:
192: if bQuiet = false then
193: Set Status Bar Value
194: SetLine1 "Step 2/4: Sort Files"
195: SetLine2 "Processing " & cstr(iCounter - 1) & " Files"
196: end if
197:
198: Sort the Array by File Name
199: Call QuickSort(MyArray,0,ubound(MyArray,1),2)
200:
201: if bQuiet = false then
202: Check if Abort Button was pressed
203:
204: If IsQuit() = True Then
205: bAbort = true
206: End If
207:
208: end if
209:
210: end if
211:
212: if WriteFileList = 1 then
213: Write File List out into Text File
214:
215: if bQuiet = false then
216: Set Status Bar Value
217: SetLine1 "Step 3/4: Writing File List"
218: SetLine2 oFso.BuildPath(sFolderPath,FileListFName)
219: end if
220:
221: Call WriteFile(MyArray)
222: Check if Abort Button was pressed
223:
224: if bQuiet = false then
225:
226: If IsQuit() = True Then
227: bAbort = true
228: End If
229:
230: end if
231:
232: end if
233:
234: if bAbort = false then
235:
236: if bQuiet = false then
237: Set Status Bar Value
238: SetLine1 "Step 4/4: Detect and Process Duplicates"
239: SetLine2 ""
240: end if
241:
242: Detect Duplicates
243: Call FindDupes(MyArray)
244:
245: Wrapping up
246: iDupProc = iDupCnt - iDupErr
247:
248: if bQuiet = false then
249: Close Status Bar
250: CloseIE
251: end if
252:
253: sMsg = "Number of Files Found: " & iFilesCnt & vbcrlf & _
254: "Number of MD5 Sum Errors: " & iMD5SumErr & vbcrlf & _
255: "Number of Files Processed: " & iFilesProc & vbcrlf & _
256: "------------------------------------" & vbcrlf & _
257: "Number of Dupes Found: " & iDupCnt & vbcrlf & _
258: "Number of Dupe Processing Errors: " & iDupErr & vbcrlf & _
259: "Number of Dupes Processed: " & iDupProc & vbcrlf
260:
261: ErrorLogWrite "Number of Files Found: " & iFilesCnt
262: ErrorLogWrite "Number of MD5 Sum Errors: " & iMD5SumErr
263: ErrorLogWrite "Number of Files Processed: " & iFilesProc
264: ErrorLogWrite "Number of Dupes Found: " & iDupCnt
265: ErrorLogWrite "Number of Dupe Processing Errors: " & iDupErr
266: ErrorLogWrite "Number of Dupes Processed: " & iDupProc
267:
268: if WriteFileList = 1 then
269: sMsg = sMsg & vbcrlf & "List of Files Generated at:" & vbcrlf & _
270: oFso.BuildPath(sFolderPath,FileListFName) & vbcrlf
271: ErrorLogWrite "List of Files Generated at: " & _
272: oFso.BuildPath(sFolderPath,FileListFName)
273: end if
274:
275: if WriteDeDupeLog = 1 then
276: sMsg = sMsg & vbcrlf & "Log File Generated at: " & vbcrlf & sLogOutput
277: end if
278:
279: if bQuiet = false then
280: WScript.echo sMSg
281: end if
282:
283: else
284:
285: if bQuiet = false then
286: Close Status Bar
287: CloseIE
288: end if
289:
290: end if
291:
292: else
293:
294: if bAbort = false then
295: No Files Found to dedupe
296:
297: if bQuiet = false then
298: Wscript.echo "No Files to de-dupe found in " & sFolderPath
299: end if
300:
301: CleanUpAndQuit 1
302: end if
303:
304: end if
305:
306: if bAbort = true then
307: Aborted Message
308:
309: if bQuiet = false then
310: Wscript.echo "The De-Dupe Script Was abborted."
311: end if
312:
313: CleanUpAndQuit 2
314: end if
315:
316: CleanUpAndQuit 0
317:
318: ==============================================================================
319: Function GetMd5Sum(ByVal strFile)
320: Declare the FileSystemObject object constants and variables.
321: Dim objTS, strTempFile, strCmdLine, objRE
322:
323: With oFso
324: Construct a temporary filename.
325: Do
326: strTempFile = .BuildPath(.GetSpecialFolder(TEMP_FOLDER), "!" & .GetTempName)
327: Loop While .FileExists(strTempFile)
328:
329: Use cmd.exe to construct a command that will execute md5sum.exe
330: strCmdLine = .BuildPath(.GetSpecialFolder(SYSTEM_FOLDER), "cmd.exe") _
331: & " /c " & md5sumPath & " """ & strFile & """>" & strTempFile
332:
333: End With
334:
335: Execute the command in a hidden window. Wait for the command
336: to complete before continuing.
337: CreateObject("WScript.Shell").Run strCmdLine, 0, True
338:
339: Open the temporary file.
340: s = ""
341: On Error Resume Next
342: Set objTS = oFso.OpenTextFile(strTempFile, 1)
343: s = objTS.ReadAll
344: On Error Goto 0
345:
346: check that it didnt fail and has the checksum
347:
348: if trim(s) <> "" and instr(s," *") > 0 then
349: GetMD5Sum = left(s,instr(s," *") - 1)
350: iFilesProc = iFilesProc + 1
351: else
352: Error... not good
353: iMD5SumErr = iMD5SumErr + 1
354: GetMD5Sum = ""
355: end if
356:
357: objTS.Close
358: oFso.DeleteFile strTempFile
359: End Function
360:
361: ==================================================================================
362: Array Sort Functions
363: Sub SwapRows(ary,row1,row2)
364: == This proc swaps two rows of an array
365: Dim x,tempvar
366:
367: For x = 0 to Ubound(ary,2)
368: tempvar = ary(row1,x)
369: ary(row1,x) = ary(row2,x)
370: ary(row2,x) = tempvar
371: Next
372:
373: End Sub SwapRows
374: Sub QuickSort(vec,loBound,hiBound,SortField)
375: ==--------------------------------------------------------==
376: == Sort a 2 dimensional array on SortField ==
377: == ==
378: == This procedure is adapted from the algorithm given in: ==
379: == ~ Data Abstractions & Structures using C++ by ~ ==
380: == ~ Mark Headington and David Riley, pg. 586 ~ ==
381: == Quicksort is the fastest array sorting routine for ==
382: == unordered arrays. Its big O is n log n ==
383: == ==
384: == Parameters: ==
385: == vec - array to be sorted ==
386: == SortField - The field to sort on (2nd dimension value) ==
387: == loBound and hiBound are simply the upper and lower ==
388: == bounds of the arrays 1st dimension. Its probably ==
389: == easiest to use the LBound and UBound functions to ==
390: == set these. ==
391: ==--------------------------------------------------------==
392: Dim pivot(),loSwap,hiSwap,temp,counter
393: Redim pivot (Ubound(vec,2))
394:
395: == Two items to sort
396:
397: if hiBound - loBound = 1 then
398:
399: if vec(loBound,SortField) > vec(hiBound,SortField) _
400: then Call SwapRows(vec,hiBound,loBound)
401: End If
402:
403: == Three or more items to sort
404:
405: For counter = 0 to Ubound(vec,2)
406: pivot(counter) = vec(int((loBound + hiBound) / 2),counter)
407: vec(int((loBound + hiBound) / 2),counter) = vec(loBound,counter)
408: vec(loBound,counter) = pivot(counter)
409: Next
410:
411: loSwap = loBound + 1
412: hiSwap = hiBound
413:
414: do
415: == Find the right loSwap
416: while loSwap < hiSwap and vec(loSwap,SortField) <= pivot(SortField)
417: loSwap = loSwap + 1
418: wend
419: == Find the right hiSwap
420: while vec(hiSwap,SortField) > pivot(SortField)
421: hiSwap = hiSwap - 1
422: wend
423: == Swap values if loSwap is less then hiSwap
424: if loSwap < hiSwap then Call SwapRows(vec,loSwap,hiSwap)
425:
426: loop while loSwap < hiSwap
427:
428: For counter = 0 to Ubound(vec,2)
429: vec(loBound,counter) = vec(hiSwap,counter)
430: vec(hiSwap,counter) = pivot(counter)
431: Next
432:
433: == Recursively call function .. the beauty of Quicksort
434: == 2 or more items in first section
435: if loBound < (hiSwap - 1) then Call QuickSort(vec,loBound,hiSwap - 1,SortField)
436: == 2 or more items in second section
437: if hiSwap + 1 < hibound then Call QuickSort(vec,hiSwap + 1,hiBound,SortField)
438:
439: End Sub QuickSort
440: Sub PrintArray(vec,lo,hi,mark)
441: ==-----------------------------------------==
442: == Print out an array from the lo bound ==
443: == to the hi bound. Highlight the column ==
444: == whose number matches parm mark ==
445: ==-----------------------------------------==
446: Dim i,j
447: sRes = ""
448:
449: For i = lo to hi
450:
451: For j = 0 to Ubound(vec,2)
452: sRes = sRes & vec(i,j) & vbTab & vbTab
453: Next
454:
455: sRes = sRes & vbcrlf
456: Next
457:
458: wscript.echo sRes
459: End Sub
460:
461: ===================================================================================
462: Actual De-Duper Functions
463: Sub FindDupes(Arr)
464: Dim a, b, s, iCnt, sOrg
465: sKey = ""
466: iCnt = Ubound(Arr,1)
467:
468: For a = 0 to iCnt
469:
470: s = trim(Arr(a,0))
471:
472: if s <> "" then
473:
474: if sKey = "" then
475: First CheckSum Value in Array, Set Key, dont check further
476: sKey = s
477: sOrg = Arr(a,1)
478: else
479: CheckSum from previous file set, check if identical
480:
481: if s = sKey then
482: Dupe
483: DupeHandling s,Key, Arr(a,1), sOrg
484: else
485: Set key to Checksum of new file, because it is different
486: sKey = s
487: sOrg = Arr(a,1)
488: end if
489:
490: end if
491:
492: end if
493:
494: if bQuiet = false then
495: Set Status Bar Value
496: SetLine2 "Files Processed: " & CStr(a + 1) & " of " & cstr(iCnt + 1)
497:
498: Check if Abort Button was pressed
499:
500: If IsQuit() = True Then
501: Exit For
502: bAbort = true
503: End If
504:
505: end if
506:
507: Next
508:
509: End Sub
510:
511: Sub DupeHandling(MD5dupe, MD5Org, FNameDupe, FNameOrg)
512: Here is where You decide what to do with the found duplicate
513: You could for example perform additional checks
514: beyond the MD5 Checksum also
515: Dim sSrc, sOrg, sOrgExt, sOrgBase, sDest, sDestName, sDestExt, sDestBase
516: Increase Dupe Counter
517: iDupCnt = iDupCnt + 1
518:
519: Determine the action to take
520:
521: Select Case DupeAction
522: Case 1
523: Rename Dupe by appending Original File name as prefix with
524: an _ as separator. Also the extension of the original file
525: Full Path of Dupe File
526: sSrc = oFso.BuildPath(sFolderPath,FNameDupe)
527: FUll Path of Org File
528: sOrg = oFso.BuildPath(sFolderPath,FNameOrg)
529: Get Extension of Org File
530: sOrgExt = oFso.GetExtensionName(sOrg)
531:
532: Get Base File Name of Org File without Extension
533: sOrgBase = left(FNameOrg, InStrRev(FNameOrg, "." & sOrgExt, - 1,1) - 1)
534:
535: Build New File name/path for Dupe Path\OrgBase_OrgExt_DupeBase.DupeExt
536: sDestExt = oFso.GetExtensionName(FNameDupe)
537: sDestBase = left(FNameDupe, InStrRev(FNameDupe, "." & sDestExt, - 1,1) - 1)
538: sDestName = sOrgBase & "_" & sOrgExt & "_" & sDestBase & "[DEDUPED]" & "." & sDestExt
539: sDest = oFso.BuildPath(sFolderPath, sDestName)
540:
541: Move
542:
543: if oFso.FileExists(sDest) then
544: New File already exist, cannot rename dupe, Increase Dupe Processing Error Count
545: iDupErr = iDupErr + 1
546: ErrorLogWrite "Rename Failed! Org: " & FNameOrg & ", Dupe Src: , " & _
547: FNameDupe & ", Dest: " & sDestName
548: Else
549: oFso.MoveFile sSrc, sDest
550: ErrorLogWrite "Dupe Processed! Org: " & FNameOrg & ", Dupe Src: , " & _
551: FNameDupe & ", Dest: " & sDestName
552: End if
553:
554: Case Else
555: Not implemented yet
556: End Select
557:
558: End Sub
559:
560: ==============================================================
561: Support Funtions
562: Sub WriteFile(arr)
563: Write List of Files with their MD5 Sums to a Text file
564: Dim a loop count
565:
566: Dim f: f = oFso.BuildPath(sFolderPath,FileListFName)
567: Check if an old Listings File Already Exists and Delete it
568:
569: if oFso.FileExists(f) then
570: oFso.DeleteFile f, true
571: end if
572:
573: Dim oF: Set oF = oFso.OpenTextFile(f, ForAppending, true, - 2)
574:
575: File Name + TAB + MD5 Sum of File
576:
577: For a = 0 to Ubound(arr,1)
578: oF.writeline trim(arr(a,1)) & vbtab & trim(arr(a,0))
579: Next
580:
581: oF.Close
582: Set oF = Nothing
583: End Sub
584:
585: Function ErrorLogWrite(sErrLogMsg)
586:
587: Dim bOpenLog: bOpenLog = false
588: Dim sFullErrMsg
589:
590: if WriteDeDupeLog = 1 then
591:
592: if not isObject(oLogfile) then
593: set oLogfile = nothing
594: end if
595:
596: if not (oLogfile is nothing) then
597: else
598: bOpenLog = true
599: end if
600:
601: if bOpenLog = true then
602: Set oLogfile = oFSO.OpenTextFile(sLogOutput, ForWriting, True, - 2)
603: ErrorLogWrite("----------------------------------------------")
604: ErrorLogWrite("New DeDupe Batch Started")
605: ErrorLogWrite("Work Path: " & sFolderPath)
606: ErrorLogWrite("-----------------------------------------------")
607: end if
608:
609: sFullErrMsg = LogDateFormat(now) & chr(9) & sErrLogMsg
610:
611: oLogFile.Writeline sFullErrMsg
612:
613: end if
614:
615: end function
616:
617: function LogDateFormat(dSourceDate)
618: Const sLogDtNumbers = "0000"
619: Dim sLgDtYYYY, sLgDtMM, sLgDtDD, sLgDtHH, sLgDtNN, sLgDtSS
620:
621: sLgDtYYYY = right(sLogDtNumbers & year(dSourceDate),4)
622: sLgDtMM = right(sLogDtNumbers & month(dSourceDate),2)
623: sLgDtDD = right(sLogDtNumbers & day(dSourceDate),2)
624: sLgDtHH = right(sLogDtNumbers & hour(dSourceDate),2)
625: sLgDtNN = right(sLogDtNumbers & minute(dSourceDate),2)
626: sLgDtSS = right(sLogDtNumbers & second(dSourceDate),2)
627: LogDateFormat = sLgDtYYYY & "-" & sLgDtMM & "-" & sLgDtDD & _
628: " " & sLgDtHH & ":" & sLgDtNN & ":" & sLgDtSS
629: End Function
630:
631: =================================================================
632: Progress Bar Code
633:
634: --------------------------------------------------------
635: Function StartIE
636: Abstract Launch IE Dialog Box and Progress bar
637: Parameters Titel of the box
638: --------------------------------------------------------
639:
640: Private Sub StartIE(strTitel)
641: Dim objDocument
642: Dim objWshShell
643:
644: Set objIE = CreateObject("InternetExplorer.Application")
645: objIE.height = 230
646: objIE.width = 400
647: objIE.menubar = False
648: objIE.toolbar = false
649: objIE.statusbar = false
650: objIE.addressbar = false
651: objIE.resizable = False
652: objIE.navigate ("about:blank")
653:
654: wait till ie is loaded
655: While (objIE.busy)
656: wend
657:
658: set objDocument = objIE.document
659: setup the dialog box
660: WriteHtmlToDialog objDocument, strTitel
661:
662: with ie/html loaded, define assorted objects...
663: set objTextLine1 = objIE.document.all("txtMilestone")
664: set objTextLine2 = objIE.document.all("txtRemarks")
665: Set objProgressBar = objIE.document.all("pbText")
666: set objQuitFlag = objIE.document.Secret.pubFlag
667:
668: objTextLine1.innerTEXT = ""
669: objTextLine2.innerTEXT = ""
670:
671: objIE.document.body.innerHTML = "Building Document..."
672: + "<br>load time= " + n
673: objIE.visible = True
674:
675: set focus to ie
676: Set objWSHShell = WScript.CreateObject("WScript.Shell")
677: objWshShell.AppActivate("Microsoft Internet Explorer")
678: End Sub
679:
680: --------------------------------------------------------
681: Function CloseIE
682: Abstract Close the IE Browser Windows
683: --------------------------------------------------------
684:
685: Private Function CloseIE()
686: On Error Resume Next
687: objIE.quit
688: End Function
689:
690: --------------------------------------------------------
691: Function SetLine1
692: Abstract Set Text Line in the Progress Bar Dialog Box
693: Parameters Progress Text
694: --------------------------------------------------------
695:
696: Private sub SetLine1(sNewText)
697: On Error Resume Next
698: objTextLine1.innerTEXT = sNewText
699: End Sub
700:
701: --------------------------------------------------------
702: Function SetLine2
703: Abstract Set Text Line in the Progress Bar Dialog Box
704: Parameters Progress Text
705: --------------------------------------------------------
706:
707: Private sub SetLine2(sNewText)
708: On Error Resume Next
709: objTextLine2.innerTEXT = sNewText
710: End Sub
711:
712: --------------------------------------------------------
713: Function IsQuit
714: Abstract Checks if the quit button was pressed
715: Parameters Progress Text
716: --------------------------------------------------------
717:
718: Private function IsQuit()
719: On Error Resume Next
720: IsQuit = False
721:
722: If objQuitFlag.Value = "quit" Then
723: IsQuit = True
724: End If
725:
726: End function
727:
728: --------------------------------------------------------
729: Function WriteHtmlToDialog
730: Abstract Set HTML Text for the IE Dialog box
731: Parameters IE Document Object, Title Text
732: --------------------------------------------------------
733:
734: Private Sub WriteHtmlToDialog(objDocument, strTitel)
735: objDocument.Open
736: objDocument.Writeln "<title>" & strTitel & "</title> "
737: objDocument.Writeln "<style>"
738: objDocument.Writeln " BODY {background: Silver} BODY { overflow:hidden }"
739: objDocument.Writeln " P.txtStyle {color: Black; font-family: Arial; " _
740: & " font-size: 10pt; font-weight: normal; margin-left: 10px; " _
741: & " width: 340px } "
742: objDocument.Writeln " input.pbStyle {color: Navy; font-family: Wingdings; " _
743: & " font-size: 10pt; background: Silver; height: 20px; " _
744: & " width: 340px } "
745: objDocument.Writeln "</style>"
746: objDocument.Writeln "<div id=""objProgress"" class=""Outer""></div>"
747: write out text lines...
748: objDocument.Writeln "<P id=txtMilestone class=txtStyle style=margin-left: 10px> </P>"
749: objDocument.Writeln "<P id=txtRemarks class=txtStyle style=margin-left: 10px ></P>"
750: objDocument.Writeln "<CENTER>"
751: write progbar
752: objDocument.Writeln "<input type=text id=pbText class=pbStyle value= >"
753: objDocument.Writeln "<br><br>" space down a little
754: write cancel button...
755: objDocument.Writeln "<input type=button value=Cancel " _
756: & " onclick=SetReturnFlag(""quit"") >"
757: objDocument.Writeln "</CENTER>"
758: write hidden object...
759: objDocument.Writeln "<form name=secret >" _
760: & " <input type=hidden name=pubFlag value=run >" _
761: & "</form>"
762: objDocument.Writeln "<SCRIPT LANGUAGE=VBScript >"
763: write "local script" to handle cmdCancel_Click event...
764: objDocument.Writeln "Sub SetReturnFlag(sFlag)"
765: objDocument.Writeln " secret.pubFlag.Value = sFlag"
766: objDocument.Writeln " txtMileStone.style.color = ""Red"" "
767: objDocument.Writeln " txtRemarks.style.color = ""Red"" "
768: objDocument.Writeln "End Sub"
769: progress bar
770: objDocument.Writeln "Function PctComplete(nPct)"
771: objDocument.Writeln "pbText.Value = String(nPct,"" "") & String(4,""n"")"
772: objDocument.Writeln "End Function"
773: calc progress bar and direction
774: objDocument.Writeln "Sub UpdateProgress()"
775: objDocument.Writeln "Dim intStep"
776: objDocument.Writeln "Dim intDirection"
777: objDocument.Writeln "If (IsNull(objProgress.getAttribute(""Step"")) = True) Then"
778: objDocument.Writeln "intStep = 0"
779: objDocument.Writeln "Else"
780: objDocument.Writeln "intStep = objProgress.Step"
781: objDocument.Writeln "End If"
782: objDocument.Writeln "if (IsNull(objProgress.GetAttribute(""Direction""))=True) Then"
783: objDocument.Writeln "intDirection = 0"
784: objDocument.Writeln "Else"
785: objDocument.Writeln "intDirection = objProgress.Direction"
786: objDocument.Writeln "End If"
787: objDocument.Writeln "if intDirection=0 then"
788: objDocument.Writeln "intStep = intStep + 1"
789: objDocument.Writeln "else"
790: objDocument.Writeln "intStep = intStep - 1"
791: objDocument.Writeln "end if"
792: objDocument.Writeln "Call PctComplete(intStep)"
793: objDocument.Writeln "if intStep>=23 then"
794: objDocument.Writeln "intDirection=1"
795: objDocument.Writeln "end if"
796: objDocument.Writeln "if intStep<=0 then"
797: objDocument.Writeln "intDirection=0"
798: objDocument.Writeln "end if"
799: objDocument.Writeln "objProgress.SetAttribute ""Step"", intStep"
800: objDocument.Writeln "objProgress.SetAttribute ""Direction"", intDirection"
801: objDocument.Writeln "Window.setTimeout GetRef(""UpdateProgress""), " & conBarSpeed
802: objDocument.Writeln "End Sub"
803: timeout function
804: objDocument.Writeln "Sub DialogHardTimeout()"
805: objDocument.Writeln "SetReturnFlag(""quit"")"
806: objDocument.Writeln "End sub"
807: objDocument.Writeln "Sub Window_OnLoad()"
808: objDocument.Writeln "theleft = (screen.availWidth - document.body.clientWidth) / 2"
809: objDocument.Writeln "thetop = (screen.availHeight - document.body.clientHeight) / 2"
810: objDocument.Writeln "window.moveTo theleft,thetop"
811: objDocument.Writeln "Window.setTimeout GetRef(""UpdateProgress""), " & conBarSpeed
812: objDocument.Writeln "Window.setTimeout GetRef(""DialogHardTimeout""), " & conForcedTimeOut
813: objDocument.Writeln "End Sub"
814: objDocument.Writeln "</SCRIPT>"
815: objDocument.Close
816: End Sub
817:
818: Sub CleanUpAndQuit(RetCode)
819: House Cleaning
820:
821: if not isObject(oLogfile) then
822: set oLogfile = nothing
823: end if
824:
825: if not (oLogfile is nothing) then
826: else
827: oLogFile.Close
828: set oLogfile = nothing
829: end if
830:
831: Set oFso = Nothing
832: WScript.Quit(RetCode)
833:
834: End Sub
DeDupeInstall.bat
1: @echo off
2: Cls
3: echo.
4: echo Installing De-dupe Shell Extension and Its Support Tools
5: echo =============================================================
6: echo.
7:
8: echo 1. Copy md5sum.exe to %SystemRoot%\system32\
9: copy md5sum.exe %SystemRoot%\system32\md5sum.exe
10: echo.
11:
12: echo 2. Copy DedupeFilesInFolder.vbs to %SystemRoot%\system32\
13: copy DedupeFilesInFolder.vbs %SystemRoot%\system32\DedupeFilesInFolder.vbs
14: echo.
15:
16: echo 3.Register "DeDupe" Directory Shell Extension
17: regedit /s DedupeInstall.reg
18: echo.
19:
20: echo 4. done... DeDupe Shell Extension Installed Successfully
21: echo.
22: echo.
23: pause
DeDupeInstall.reg
Note the default registry value in HEX. This was necessary to be able to add the value:
“wscript.exe %SystemRoot%\system32\DedupeFilesInFolder.vbs” to the registry. Notice the %SystemRoot% environment variable? That needs to be expanded by windows, when you activate the shell extension. If you would create the value as Reg_SZ string, it would not work.
It needs to be an expandable string (because it will be expanded at runtime), REG_Expand_SZ. Expandable string values can only be created by the registry import, if you provide the value of a string in hexadecimal format. This tells Windows to create REG_EXPAND_SZ value instead of the basic REG_SZ.After each character follows the hex value 00 and at the end two more 00 are added as well. Why this is necessary, I have no idea, but this is how you have to do it.
1: Windows Registry Editor Version 5.00
2:
3: [HKEY_CLASSES_ROOT\Directory\shell\DeDupe]
4: @="DeDupe"
5:
6: [HKEY_CLASSES_ROOT\Directory\shell\DeDupe\command]
7: @=hex(2):77,00,73,00,63,00,72,00,69,00,70,00,74,00,2E,00,65,00,78,00,65,00,20,\
8: 00,25,00,53,00,79,00,73,00,74,00,65,00,6D,00,52,00,6F,00,6F,00,74,00,25,00,\
9: 5C,00,73,00,79,00,73,00,74,00,65,00,6D,00,33,00,32,00,5C,00,44,00,65,00,64,\
10: 00,75,00,70,00,65,00,46,00,69,00,6C,00,65,00,73,00,49,00,6E,00,46,00,6F,00,\
11: 6C,00,64,00,65,00,72,00,2E,00,76,00,62,00,73,00,00,00
DeDupeUnInstall.bat
1: @echo off
2: CLS
3: echo.
4: echo Un-Installing De-dupe Shell Extension and Its Support Tools
5: echo =============================================================
6: echo.
7:
8: echo 1. Delete md5sum.exe from %SystemRoot%\system32\
9: if EXIST %SystemRoot%\system32\md5sum.exe del /Q /F "%SystemRoot%\system32\md5sum.exe"
10: echo.
11:
12: echo 2. Delete DedupeFilesInFolder.vbs from %SystemRoot%\system32\
13: if EXIST %SystemRoot%\system32\DedupeFilesInFolder.vbs del /Q /F
14: "%SystemRoot%\system32\DedupeFilesInFolder.vbs"
15: echo.
16:
17: echo 3. Unregister "DeDupe" Directory Shell Extension
18: regedit /s DedupeUnInstall.reg
19: echo.
20:
21: echo 4. done... DeDupe was Uninstalled Successfully
22: echo.
23: echo.
24: pause
DedupeUnInstall.reg
1: Windows Registry Editor Version 5.00
2:
3: [-HKEY_CLASSES_ROOT\Directory\shell\DeDupe]
4: @="DeDupe"
This is another tool created by Carsten Cumbrowski aka Roy/SAC and I hope that you will find this one helpful as well. For comments, praise or complaints, please use the comments section of the blog below. Thanks.
Cheers!
Carsten aka Roy/SAC
No comments:
Post a Comment
Hi, thanks for taking the time to comment at my blog.
Due to spam issues comments are not immediately posted on the site and require my manual approval first, before they become visible.
I try to approve comments as quickly as possible and usually within 24 hours.
To be notified about follow up comments that are made after yours, use the subscribe option with your email address and you will receive an email alert, if somebody else comments at this post in the future.
Also check out the rest of the website beyond this blog, visit RoySAC.com. Also see my YouTube channels, SACReleases for intros and demos.
Cheers!
Carsten aka Roy/SAC
Note: Only a member of this blog may post a comment.