# plagg, an RSS aggregator

## 0. What is this?

plagg is a weblog/news aggregator that works in conjunction with [Rael Dornfest's](http://www.raelity.org) [blosxom](http://www.blosxom.com). It can easily be extended to support other blogging tools.

plagg reads an OPML file containing a list of RSS or Atom feeds and generates blosxom blog entries from these feeds. The items of each feed are written to their own directory (blosxom category), so you can read the news either all at once or feed by feed.

You can see examples of plagg's output [on my news page](http://drbeat.li/news).

## 1. Installation

1. Download [plagg](http://drbeat.li/py/plagg/plagg.tar.gz).
2. Untar the distribution file to a directory of your choice.
3. Run `python setup.py install` as root.
4. Set up an [OPML](#opml) file containing the feeds you'd like to read.
5. Run `plagg -d` _newsdir_ _opmlfile_ from a cron job as often as you like, where _newsdir_ is somewhere within your blosxom data directory.
6. Enjoy your personalized news feed!

## 2. Usage

### 2.1. Synopsis

    plagg -fFnovVh [-d newsdir] [opmlfile [nickname ...]]

### 2.2. Options

* -f: Don't write the entry footers. Use this option if your blosxom template includes a footer.
* -F: Run `plagg` for a single feed whose URL is _opmlfile_. One _nickname_ is mandatory; it names the folder within _newsdir_ where the entries are written.
* -n: Write a file _newsdir_/`Latest.txt` that contains the new entries.
* -o: Also generate entries older than one week. These are normally suppressed.
* -v: Be verbose. May be repeated for more output.
* -V: Display version information and exit.
* -h: Display usage information and exit.
* -d _newsdir_: The destination directory; the news items are stored in subdirectories of it. This should be inside your blosxom data directory so that blosxom can find and display the items.

### 2.3. Arguments

* _opmlfile_: The OPML file containing the feeds to read and generate news items from, or the feed URL if the `-F` option was given.
* _nickname_: If given, only the feeds with these nicknames are updated (their `hours` attribute is ignored); otherwise all feeds are updated. If `-F` was given, the nickname is the name of the folder within _newsdir_ for that single feed.

The default arguments for _opmlfile_ and _newsdir_ can be set in the `plagg` script.

## 3. The OPML file

The distribution contains my OPML file as an example. The basic OPML syntax is defined in the [OPML specification](http://www.opml.org/spec).

### 3.1. RSS/Atom feeds

Set the `type` attribute to `"rss"`. This is the default feed type. plagg reads the feed given by the `xmlUrl` attribute and generates news items from its content.

Example:

The `htmlUrl` attribute is not used by `plagg` itself, but by `opml.xsl`, which I use to generate my [blogroll](http://drbeat.li/news/news.opml).

### 3.2. HTML scraping

Set the `type` attribute to `"x-plagg-html"`. In this case, plagg reads the HTML page whose URL is in the `htmlUrl` attribute. There are two ways to specify how to scrape: using a regex or using [XPath][XPATH] expressions. The result of the scraping is either an image link or an `