polilist - 源码 - 源码 - 免费下载

polilist

文件大小： unknow

源码售价： 5 个金币积分规则积分充值

资源说明：A tool for parsing Australian government websites into a contact database

This project is an effort to parse the contents of Australian parliament
websites into a database suitable for mail merging, filtering, etc.

Information for users
---------------------
To use this tool, you will need to the following components installed:
* python 2.6
* python-httplib2
* python-beautifulsoup (for some parsers)
While this tool is designed to work on any operating system, it has only been
tested on Linux. If you run into compatibility problems, please file a bug
report as described below.

To check the command line arguments for this tool run:
./polilist --help

More comprehensive information will be added to this README file as the tool
matures. If you come across a bug or would like a feature added please request
this at https://github.com/Smattr/polilist/issues.

Information for developers
--------------------------
For now, while this project is in its early stages this README file will
contain a basic overview of the proposed design of this system.

Please follow these guidelines during development:
- Unless you have a very good reason, always use the hardcache provider during
development. If not, you will quickly upset website admins.
- Don't commit uncommented code. I don't want to have to have to divine the
meaning of your code.
- If you are making a major architectural change, please speak to me about it
before committing.
- If you can see problems in the design that will cause issues down the road,
please flag them with me as soon as possible.
- If you come across an issue that you can't (or don't want to) fix, please
raise it at https://github.com/Smattr/polilist/issues.
- If possible, make sure your changes don't break any of the tests (ensure
tests/runtests executes correctly).
Along with these, follow general DVCS-project politeness (informative commit
messages, <80 characters in first line of commit message, don't create remote
heads, don't commit code that doesn't compile, etc.). If you're about to do
something that you think might upset people, ask them before pushing.

Design
------
Apologies for the confusing manner in which this is described. It's not very
clear in my head right now.
+-----------+
++-|Base Parser|
+----+ || +-----------+
+-------------+<-------|Main|------+ +--------+
|Base Provider| +----+ +---->|Parser 1|
+-------------+ | | +--------+
/ \ | | |
/ \ | | +--------+
/ \ | +---->|Parser 2|
+----------+ +----------+ | +--------+
|Provider 1| |Provider 2| |
+----------+ +----------+ |
|
\ /
+-----------+
|Base Output|
+-----------+
/ \
/ \
/ \
+----------+ +-------------+
|CSV Output| |SQLite Output|
+----------+ +-------------+

The main logic of this tool has a provider (an object with the ability to
retrieve documents over HTTP). The reason for having more than one type of
provider is to allow different providers to implement different caching
policies; in particular to allow a provider with a very lazy cache coherence
protocol for use in testing so as not to overload websites with unnecessary
HTTP requests.

Main has a set of parsers, each specific to a given webpage. Each parser
encapsulates the format of a page by providing a function to retrieve the
contacts on that page (or pages linked from there). In this way, Main should
just be able to iterate through its known parsers and call each one.

Main has a single Output that it uses when it comes time to write out the
contact information it has discovered. Each Output defines a format for the
file(s) it writes.

That should all be as clear as mud now.

Testing
-------
The directory tests contains all the unit and regression tests for this project.
To run them, execute the script runtests in this directory. Note that running
the tests may cause some data to be downloaded and you should expect some tests
to fail if you do not have an available connection to the internet.

If you look at runtests, you will see that it just executes every executable
file in the subdirectories. To create a new test just create an executable file
(can be any interpreted language) in one of the subdirectories. To decide where
your new test should live, refer to the following list:
- tests/integration Tests that only interact with the project through its
external interfaces. These are tests designed to exercise
the project functionality that is exposed to end users.
- tests/regression Tests that prove the absence of some previously observed
bug. The purpose of these tests is to ensure a bug that has
been fixed in the past is not reintroduced by new changes.
When closing a bug you should also commit a regression test
that validates the fix.
- tests/unit Tests that probe a feature of a specific module in
isolation. These do not have to run through the same setup
as the main program logic.

Links
-----
Directory of NSW local councils: http://www.dlg.nsw.gov.au/dlg/dlghome/dlg_localgovdirectory.asp

部分文件列表（点击文件名可查看文件内容）

					
									本源码包内暂不包含可直接显示的源代码文件，请下载源码包。