资源说明:Searches for similar images.
imgfind - Searches for similar image files. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Copyright (C) 2011 Joseph Woodruff imgfind is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INSTALL: Build and installation instructions included in INSTALL file. USAGE: Usage: imgfind [path...] search_criteria path is a path to a directory or image file. Default is the current working directory. search_criteria is a file of gif, jpeg, png or bmp format HOW DOES IT WORK? (DESIGN): imgfind uses the pHash perceptual hashing library to generate non-avalanching discrete cosine transform-based hash values of images, for which hamming distances are computed to determine similarity. The hashing algorithms in the pHash library are designed to be robust against rotation, skew and contrast adjustments and should also be resistant to scaling to a limited extent. The default settings for the hamming distance similarity criteria will tend to produce some false positives against a non-trivial image collection; this is by design, to reduce the possibility of false negatives. Thus, I wouldn't incorporate imgfind into an automated file deduplication script, unless it enforces manual/interactive verification of the results ;) imgfind is written in C++, and makes considerable use the STL, boost, and C++ standard classes such as string. Unfortunately, pHash is written predominantly in standard C, without any of the helpful features provided by C++, like classes or templates or automated memory management. This has, unsurprisingly, presented some challenges with regards to integrating the two. It is my hope that I will be able to write a class-based wrapper around the pHash API to make it easier to incorporate into object-oriented C++ applications. For now though, I'll make due with what I have. WHY THIS?: To be perfectly honest, I've been using imageboards since high school, and now have what constitutes an unmanageably large collection of image macros, funny cat pictures and other assorted mental detritus. As a result, there are duplicate files taking up disk space and cluttering up my image directories. Unfortunately, most image files such as these don't contain any relevant searchable metadata describing their contents, which leaves manual inspection the directory or directories as the only option for trying to track down duplicates or lower-resolution copies. This is where imgfind comes in. I've written it as a nice little search utility modeled after the classic UNIX find, that can be used to match image files by their actual contents. It can detect duplicates and even images which are merely 'similar' (using pHash's suggested values for hamming distance it can reliably pick out image macros using the same image with different texts, for instance). I imagine this could also eventually be useful for digital forensics and copyright enforcement applications, if one were so inclined. In any event, happy image hunting. Comments are welcome; patches / pull requests are even more welcome. - Joseph Woodruff(aka sectoidman)
本源码包内暂不包含可直接显示的源代码文件,请下载源码包。