Learning OpenCV, 2nd Edition (Early Release)

What Is OpenCV?
OpenCV [OpenCV] is an open source (see http://opensource.org) computer vision library
available from http://opencv.org. The library is written in C and C++1 and runs under
Linux, Windows, Mac OS X, iOS, and Android. Interfaces are available for Python, Java,
Ruby, Matlab, and other languages.
OpenCV was designed for computational efficiency with a strong focus on real-time
applications: optimizations were made at all levels, from algorithms to multicore and
CPU instructions. For example, OpenCV supports optimizations for SSE, MMX, AVX,
NEON, OpenMP, and TBB. If you desire further optimization on Intel architectures
[Intel] for basic image processing, you can buy Intel’s Integrated Performance Primitives
(IPP) libraries [IPP], which consist of low-level optimized routines in many different
algorithmic areas. OpenCV automatically uses the appropriate instructions from IPP at
runtime. The GPU module also provides CUDA-accelerated versions of many routines
(for Nvidia GPUs) and OpenCL-optimized ones (for generic GPUs).
One of OpenCV’s goals is to provide a simple-to-use computer vision infrastructure that
helps people build fairly sophisticated vision applications quickly. The OpenCV library
contains over 500 functions that span many areas, including factory product inspection,
medical imaging, security, user interface, camera calibration, stereo vision, and robotics.
Because computer vision and machine learning often go hand-in-hand, OpenCV also
contains a full, general-purpose Machine Learning Library (MLL). This sub-library is
focused on statistical pattern recognition and clustering. The MLL is highly useful for the
vision tasks that are at the core of OpenCV’s mission, but it is general enough to be used
for any machine learning problem.
1. The legacy C interface is still supported and will remain so for the foreseeable future.
Who Uses OpenCV?
Most computer scientists and practical programmers are aware of some facet of the role
that computer vision plays. But few people are aware of all the ways in which computer
vision is used. For example, most people are somewhat aware of its use in surveillance,
and many also know that it is increasingly being used for images and video on the Web.
A few have seen some use of computer vision in game interfaces. Yet few people realize
that most aerial and street-map images (such as in Google’s Street View) make heavy use
of camera calibration and image stitching techniques. Some are aware of niche
applications in safety monitoring, unmanned aerial vehicles, or biomedical analysis. But
few are aware how pervasive machine vision has become in manufacturing: virtually
everything that is mass-produced has been automatically inspected at some point using
computer vision.
The BSD [BSD] open source license for OpenCV has been structured such that you can
build a commercial product using all or part of OpenCV. You are under no obligation to
open-source your product or to return improvements to the public domain, though we
hope you will. In part because of these liberal licensing terms, there is a large user
community that includes people from major companies (Google, IBM, Intel, Microsoft,
Nvidia, SONY, and Siemens, to name only a few) and research centers (such as Stanford,
MIT, CMU, Cambridge, Georgia Tech and INRIA). OpenCV is also present on the web
for users at http://opencv.org, a website that hosts documentation, developer information,
and other community resources including links to compiled binaries for various
platforms. For vision developers, code, development notes and links to GitHub are at
http://code.opencv.org. User questions are answered at
http://answers.opencv.org/questions/, but the original Yahoo Groups user forum at
http://groups.yahoo.com/group/OpenCV still exists; it has almost 50,000 members.
OpenCV is popular around the world, with large user communities in China, Japan,
Russia, Europe, and Israel. OpenCV has a Facebook page at
https://www.facebook.com/opencvlibrary.
Since its alpha release in January 1999, OpenCV has been used in many applications,
products, and research efforts. These applications include stitching images together in
satellite and web maps, image scan alignment, medical image noise reduction, object
analysis, security and intrusion detection systems, automatic monitoring and safety
systems, manufacturing inspection systems, camera calibration, military applications, and
unmanned aerial, ground, and underwater vehicles. It has even been used in sound and
music recognition, where vision recognition techniques are applied to sound spectrogram
images. OpenCV was a key part of the vision system in Stanford’s robot, “Stanley”,
which won the $2M DARPA Grand Challenge desert robot race [Thrun06], and it
continues to play an important role in many other robotics challenges.
What Is Computer Vision?
Computer vision2 is the transformation of data from 2D/3D stills or videos into either a
decision or a new representation. All such transformations are done to achieve some
particular goal. The input data may include some contextual information such as “the
camera is mounted in a car” or “laser range finder indicates an object is 1 meter away”.
The decision might be “there is a person in this scene” or “there are 14 tumor cells on this
slide”. A new representation might mean turning a color image into a grayscale image or
removing camera motion from an image sequence.
Because we are such visual creatures, it is easy to be fooled into thinking that computer
vision tasks are easy. How hard can it be to find, say, a car when you are staring at it in
an image? Your initial intuitions can be quite misleading. The human brain divides the
vision signal into many channels that stream different pieces of information into your
brain. Your brain has an attention system that identifies, in a task-dependent way,
important parts of an image to examine while suppressing examination of other areas.
There is massive feedback in the visual stream that is, as yet, little understood. There are
widespread associative inputs from muscle control sensors and all of the other senses that
allow the brain to draw on cross-associations made from years of living in the world. The
feedback loops in the brain go back to all stages of processing including the hardware
sensors themselves (the eyes), which mechanically control lighting via the iris and tune
the reception on the surface of the retina.
In a machine vision system, however, a computer receives a grid of numbers from the
camera or from disk, and, in most cases, that’s it. For the most part, there’s no built-in
pattern recognition, no automatic control of focus and aperture, no cross-associations
with years of experience. For the most part, vision systems are still fairly naïve. Figure
1-1 shows a picture of an automobile. In that picture we see a side mirror on the driver’s
side of the car. What the computer “sees” is just a grid of numbers. Any given number
within that grid has a rather large noise component and so by itself gives us little
information, but this grid of numbers is all the computer “sees”. Our task then becomes to
turn this noisy grid of numbers into the perception: “side mirror”. Figure 1-2 gives some
more insight into why computer vision is so hard.
2. Computer vision is a vast field. This book will give you a basic grounding in it, but we also recommend the text by Szeliski [Szeliski2011] for a good overview of practical computer vision algorithms, and Hartley [Hartley06] for how 3D vision really works.
Figure 1-1. To a computer, the car’s side mirror is just a grid of numbers
In fact, the problem, as we have posed it thus far, is worse than hard; it is formally
impossible to solve. Given a two-dimensional (2D) view of a 3D world, there is no
unique way to reconstruct the 3D signal. Formally, such an ill-posed problem has no
unique or definitive solution. The same 2D image could represent any of an infinite
combination of 3D scenes, even if the data were perfect. However, as already mentioned,
the data is corrupted by noise and distortions. Such corruption stems from variations in
the world (weather, lighting, reflections, movements), imperfections in the lens and
mechanical setup, finite integration time on the sensor (motion blur), electrical noise and
compression artifacts after image capture. Given these daunting challenges, how can we
make any progress?
Figure 1-2: The ill-posed nature of vision: the 2D appearance of objects
can change radically with viewpoints
In the design of a practical system, additional contextual knowledge can often be used to
work around the limitations imposed on us by visual sensors. Consider the example of a
mobile robot that must find and pick up staplers in a building. The robot might use the
facts that a desk is an object found inside offices and that staplers are mostly found on
desks. This gives an implicit size reference; staplers must be able to fit on desks. It also
helps to eliminate falsely “recognizing” staplers in impossible places (e.g., on the ceiling
or a window). The robot can safely ignore a 200-foot advertising blimp shaped like a
stapler because the blimp lacks the prerequisite wood-grained background of a desk. In
contrast, with tasks such as image retrieval, all stapler images in a database may be of
real staplers and so large sizes and other unusual configurations may have been implicitly
precluded by the assumptions of those who took the photographs. That is, the
photographer perhaps took pictures only of real, normal-sized staplers. Also, when taking
pictures, people tend to center objects and put them in characteristic orientations. Thus,
there is often quite a bit of unintentional implicit information within photos taken by
people.
Contextual information can also be modeled explicitly with machine learning techniques.
Hidden variables such as size, orientation to gravity, and so on can then be correlated
with their values in a labeled training set. Alternatively, one may attempt to measure
hidden bias variables by using additional sensors. The use of a laser range finder to
measure depth allows us to accurately infer the size of an object.
The next problem facing computer vision is noise. We typically deal with noise by using
statistical methods. For example, it may be impossible to detect an edge in an image
merely by comparing a point to its immediate neighbors. But if we look at the statistics
over a local region, edge detection becomes much easier. A real edge should appear as a
string of such immediate neighbor responses over a local region, each of whose
orientation is consistent with its neighbors. It is also possible to compensate for noise by
taking statistics over time. Still other techniques account for noise or distortions by
building explicit models learned directly from the available data. For example, because
lens distortions are well understood, one need only learn the parameters for a simple
polynomial model in order to describe—and thus correct almost completely—such
distortions.
The actions or decisions that computer vision attempts to make based on camera data are
performed in the context of a specific purpose or task. We may want to remove noise or
damage from an image so that our security system can issue an alert when someone tries
to climb a fence, or so that a monitoring system can count how many people
cross through an area in an amusement park. Vision software for robots that wander
through office buildings will employ different strategies than vision software for
stationary security cameras because the two systems have significantly different contexts
and objectives. As a general rule: the more constrained a computer vision context is, the
more we can rely on those constraints to simplify the problem and the more reliable our
final solution will be.
OpenCV is aimed at providing the basic tools needed to solve computer vision problems.
In some cases, high-level functionalities in the library will be sufficient to solve the more
complex problems in computer vision. Even when this is not the case, the basic
components in the library are complete enough to enable creation of a complete solution
of your own to almost any computer vision problem. In the latter case, there are some
tried-and-true methods of using the library; all of them start with solving the problem
using as many available library components as possible. Typically, after you’ve
developed this first-draft solution, you can see where the solution has weaknesses and
then fix those weaknesses using your own code and cleverness (better known as “solve
the problem you actually have, not the one you imagine”). You can then use your draft
solution as a benchmark to assess the improvements you have made. From that point,
whatever weaknesses remain can be tackled by exploiting the context of the larger system
in which your problem solution is embedded, or by setting out to improve some
component of the system with your own novel contributions.
The Origin of OpenCV
OpenCV grew out of an Intel Research initiative to advance CPU-intensive applications.
Toward this end, Intel launched many projects including real-time ray tracing and 3D
display walls. One of the authors (Gary) working for Intel at that time was visiting
universities and noticed that some top university groups, such as the MIT Media Lab, had
well-developed and internally open computer vision infrastructures—code that was
passed from student to student and that gave each new student a valuable head start in
developing his or her own vision application. Instead of reinventing the basic functions
from scratch, a new student could begin by building on top of what came before.
Thus, OpenCV was conceived as a way to make computer vision infrastructure
universally available. With the aid of Intel’s Performance Library Team,3 OpenCV
started with a core of implemented code and algorithmic specifications being sent to
members of Intel’s Russian library team. This is the “where” of OpenCV: it started in
Intel’s research lab with collaboration from the Software Performance Libraries group
together with implementation and optimization expertise in Russia.
Chief among the Russian team members was Vadim Pisarevsky, who managed, coded,
and optimized much of OpenCV and who is still at the center of much of the OpenCV
effort. Along with him, Victor Eruhimov helped develop the early infrastructure, and
Valery Kuriakin managed the Russian lab and greatly supported the effort. There were
several goals for OpenCV at the outset:
• Advance vision research by providing not only open but also optimized code for basic vision
infrastructure. No more reinventing the wheel.
• Disseminate vision knowledge by providing a common infrastructure that developers could build on,
so that code would be more readily readable and transferable.
• Advance vision-based commercial applications by making portable, performance-optimized code
available for free—with a license that did not require commercial applications to be open or free
themselves.
Those goals constitute the “why” of OpenCV. Enabling computer vision applications
would increase the need for fast processors. Driving upgrades to faster processors would
generate more income for Intel than selling some extra software. Perhaps that is why this
open and free code arose from a hardware vendor rather than a software company.
Sometimes there is more room to innovate in software within a hardware company.
In any open source effort, it is important to reach a critical mass at which the project
becomes self-sustaining. There have now been around seven million downloads of
OpenCV, and this number is growing by hundreds of thousands every month.4 The user
group now approaches 50,000 members. OpenCV receives many user contributions, and
central development has long since moved outside of Intel.5 OpenCV’s past timeline is
shown in Figure 1-3. Along the way, OpenCV was affected by the dot-com boom and
bust and also by numerous changes of management and direction. During these
fluctuations, there were times when OpenCV had no one at Intel working on it at all.
However, with the advent of multicore processors and the many new applications of
computer vision, OpenCV’s value began to rise. Similarly, rapid growth in the field of
robotics has driven much use and development of the library. After becoming an open
source library, OpenCV spent several years under active development at Willow Garage
and Itseez, and is now supported by the OpenCV Foundation at http://opencv.org. Today,
OpenCV is actively developed by that foundation, Google supports on the order of 15
interns a year through the Google Summer of Code program,6 and Intel is once again
actively supporting development. For more information on the future of OpenCV, see
Chapter 14.
Figure 1-3: OpenCV timeline
3. Shinn Lee was of key help, as was Stewart Taylor.
4. It is noteworthy that at the time of the publication of “Learning OpenCV” in 2006, this rate was 26,000 per month. Seven years later, the download rate has grown to over 160,000 downloads per month.
5. As of this writing, Itseez (http://itseez.com/) is the primary maintainer of OpenCV.
6. Google Summer of Code: https://developers.google.com/open-source/soc/
Who Owns OpenCV?
Although Intel started OpenCV, the library is and always was intended to promote
commercial and research use. It is therefore open and free, and the code itself may be
used or embedded (in whole or in part) in other applications, whether commercial or
research. It does not force your application code to be open or free. It does not require
that you return improvements back to the library—but we hope that you will.
Downloading and Installing OpenCV
The main OpenCV site is at http://opencv.org, from which you can download the
complete source code for the latest release, as well as many recent releases. The
downloads themselves are found at the downloads page:
http://opencv.org/downloads.html. However, if you want the most up-to-date
version, it is always found on GitHub at https://github.com/Itseez/opencv, where the
active development branch is stored. The computer vision developer’s site (with links to
the above) is at http://code.opencv.org/.
Installation
In modern times, OpenCV uses Git as its development version control system, and
CMake to build7. In many cases, you will not need to worry about building, as compiled
libraries exist for supported environments. However, as you become a more advanced
user, you will inevitably want to be able to recompile the libraries with specific options
tailored to your application and environment. On the tutorial pages at
http://docs.opencv.org/doc/tutorials/tutorials.html under “introduction to OpenCV”, there
are descriptions of how to set up OpenCV to work with a number of combinations of
operating systems and development tools.
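As a rough sketch of what such a build looks like on a Unix-like system (the install prefix and job count are illustrative assumptions; consult the tutorial pages for the options appropriate to your platform):

```shell
# Fetch the sources (the active development branch lives on GitHub).
git clone https://github.com/Itseez/opencv.git
cd opencv

# Configure an out-of-source build, then compile and install.
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
make -j4
sudo make install
```

Keeping the build directory separate from the source tree lets you maintain several configurations (debug, release, cross-compiled) side by side.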
Windows
At the page: http://opencv.org/downloads.html, you will see a link to download the latest
version of OpenCV for Windows. This link will download an executable file which you
can run, and which will install OpenCV, register DirectShow filters, and perform various
post-installation procedures. You are now almost ready to start using OpenCV.8
The one additional detail is that you will want to add an OPENCV_DIR environment
variable to make it easier to tell your compiler where to find the OpenCV binaries. You
can set this by going to a command prompt and typing9:
setx -m OPENCV_DIR D:\OpenCV\Build\x86\vc10
If you built the library to link statically, this is all you will need. If you built the library
to link dynamically, then you will also need to tell your system where to find the library
binary. To do this, simply add %OPENCV_DIR%\bin to your library path. (For example,
in Windows 7, right-click on your Computer icon, select Properties, and then click on
Advanced System Settings. Finally, select Environment Variables and add the OpenCV
binary path to the Path variable.)
7. In olden times, OpenCV developers used Subversion for version control and automake to build. Those days, however, are long gone.
8. It is important to know that, although the Windows distribution contains binary libraries for release builds, it does not contain the debug builds of these libraries. It is therefore likely that, before developing with OpenCV, you will want to open the solution file and build these libraries for yourself.
9. Of course, the exact path will vary depending on your installation; for example, if you are installing on an ia64 machine, the path will contain “ia64” rather than “x86”.
To add the commercial IPP performance optimizations to Windows, obtain and install
IPP from the Intel site (http://www.intel.com/software/products/ipp/index.htm); use
version 5.1 or later. Make sure the appropriate binary folder (e.g., c:/program
files/intel/ipp/5.1/ia64/bin) is in the system path. IPP should now be automatically
detected by OpenCV and loaded at runtime (more on this in Chapter 3).
Linux
Prebuilt binaries for Linux are not included with the Linux version of OpenCV owing to
the large variety of versions of GCC and GLIBC in different distributions (SuSE, Debian,
Ubuntu, etc.). In many cases, however, your distribution will include OpenCV. If your
distribution doesn’t offer OpenCV, you will have to build it from sources. As with the
Windows installation, you can start at the http://opencv.org/downloads.html page, but in
this case the link will send you to Sourceforge10, where you can select the tarball for the
current OpenCV source code bundle.
To build the libraries and demos, you’ll need GTK+ 2.x or higher, including headers.
You’ll also need pkg-config, libpng, libjpeg, libtiff, and libjasper with development files
(i.e., the versions with -dev at the end of their package names). You’ll need Python 2.6 or
later with headers installed (developer package). You will also need libavcodec and the
other libav* libraries (including headers) from ffmpeg 1.0 or later.
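On a Debian- or Ubuntu-style system, the development packages described above can be pulled in roughly as follows; package names vary by distribution and release, so treat this as a starting point rather than an exact recipe:

```shell
sudo apt-get install build-essential cmake pkg-config \
    libgtk2.0-dev libpng-dev libjpeg-dev libtiff-dev libjasper-dev \
    python-dev libavcodec-dev libavformat-dev libswscale-dev
```

On other distributions, search your package manager for the corresponding -devel (or similar) packages.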
Download ffmpeg from http://ffmpeg.mplayerhq.hu/download.html.11 The ffmpeg
program is distributed under the Lesser General Public License (LGPL). To use it with
non-GPL software (such as OpenCV), build and use a shared ffmpeg library:
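The classic configure-make sequence for a shared build looks roughly like the following (the install prefix is an assumption; adjust it to your system):

```shell
# Inside the unpacked ffmpeg source tree:
./configure --enable-shared --prefix=/usr/local
make
sudo make install
```

Building ffmpeg as a shared library keeps the LGPL boundary at the dynamic-link interface, which is what allows non-GPL software such as OpenCV to use it.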