== Repub

by Invisible Llama (dg at invisiblellama dot net)

{RubyForge Project}[http://rubyforge.org/projects/repub/] | {Github}[http://github.com/invisiblellama/repub/tree/master]

== DESCRIPTION:

Repub is a simple HTML to ePub converter.

It lacks imagination and won't try to guess the source document structure; you have to describe where to look
for the title and table of contents. In return, it gives you greater control over the generated
ePub documents.

== FEATURES:

Repub accepts the following parameters:

* Source document URL
* List of XPath expressions for locating source document title, table of contents, TOC items and TOC sub-sections
* List of XPath expressions for describing elements that will be removed from the converted document
* List of regular expressions for editing the source document
* Publication information metadata tags

All parameters except the document URL are optional. Without them, the resulting ePub will probably be readable
(provided the original HTML isn't too badly broken) but will lack metadata and a TOC.

A few examples:

* Project Gutenberg's The Adventures Of Sherlock Holmes (with proper table of contents)

    repub -x 'title:div[@class="book"]//h1' \
      -x 'toc://table' \
      -x 'toc_item://tr' \
      http://www.gutenberg.org/dirs/etext99/advsh12h.htm

This tells Repub to look for the title in the first H1 found inside the DIV with class "book", that the table of
contents is located in the first TABLE, and that TOC items can be found inside TR elements.
The above will produce a readable ePub, which can be further enhanced by removing some "noise" content:

    repub -x 'title:div[@class="book"]//h1' \
      -x 'toc://table' \
      -x 'toc_item://tr' \
      -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
      http://www.gutenberg.org/dirs/etext99/advsh12h.htm

In addition to parsing, the above command removes all PRE and HR elements, as well as the first H1 and H2
elements in the body, from the final version of the document.
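
The selector and removal behaviour described above can be tried outside Repub. The following Ruby sketch uses
the REXML library on a small, made-up page. It is not Repub's implementation: `SAMPLE_PAGE`, `outline` and the
page content are assumptions for illustration, and the title selector is written with a leading `//` so that
REXML evaluates it from the document root.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for the Gutenberg page.
SAMPLE_PAGE = <<~XML
  <html><body>
    <h1>Site banner</h1>
    <div class="book"><h1>The Adventures of Sherlock Holmes</h1></div>
    <table>
      <tr><td>A Scandal in Bohemia</td></tr>
      <tr><td>The Red-Headed League</td></tr>
    </table>
    <pre>license text</pre>
    <hr/>
  </body></html>
XML

# Apply title/toc/toc_item style selectors, then drop every element matched
# by the removal expressions (the role -X plays on the command line).
def outline(xml, remove: [])
  doc   = REXML::Document.new(xml)
  title = REXML::XPath.first(doc, '//div[@class="book"]//h1').text
  toc   = REXML::XPath.first(doc, '//table')
  # In this sample the TR elements are direct children of TABLE.
  items = REXML::XPath.match(toc, 'tr').map { |tr| tr.elements['td'].text }
  remove.each { |xpath| REXML::XPath.match(doc, xpath).each(&:remove) }
  remaining = REXML::XPath.match(doc, '//body/*').map(&:name)
  { title: title, toc_items: items, remaining: remaining }
end

result = outline(SAMPLE_PAGE, remove: ['//pre', '//hr', '//body/h1'])
puts result[:title]
puts result[:toc_items].inspect
puts result[:remaining].inspect
```

In this sketch `//body/h1` only matches H1 elements that are direct children of BODY, so the H1 inside the
"book" DIV survives the removal.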

A slightly more complicated example:

* Git User's Manual

    repub -x 'title://h1' \
      -x 'toc://div[@class="toc"]/dl' \
      -x 'toc_item:dt' \
      -x 'toc_section:following-sibling::*[1]/dl' \
      -w git-manual \
      http://www.kernel.org/pub/software/scm/git/docs/user-manual.html

This tells Repub to look for the title in the first H1 found, for the TOC in the DL element of the DIV with class
"toc", and for TOC items inside DT elements. Additionally, a TOC item can have a child TOC section: a DL inside the
element that immediately follows the DT.
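
To show how `toc_item` and `toc_section` combine into a nested table of contents, here is a minimal Ruby sketch
using REXML. It is an illustration under assumptions, not Repub's code: `SAMPLE_TOC`, `toc_tree` and `parse_toc`
are hypothetical names, the fragment is made up, and the `following-sibling::*[1]/dl` step is written out by
hand with `next_element` rather than evaluated as XPath.

```ruby
require 'rexml/document'
require 'pp'

# Hypothetical fragment shaped like the manual's TOC: each DT is an item,
# and the DD right after it may wrap a nested DL.
SAMPLE_TOC = <<~XML
  <div class="toc"><dl>
    <dt>1. Repositories and Branches</dt>
    <dd><dl>
      <dt>How to get a Git repository</dt>
      <dt>How to check out a different version</dt>
    </dl></dd>
    <dt>2. Exploring Git history</dt>
  </dl></div>
XML

# Walk a DL: every DT child is an item (toc_item:dt); its sub-section is the
# DL child of the element right after it (following-sibling::*[1]/dl).
def toc_tree(dl)
  REXML::XPath.match(dl, 'dt').map do |dt|
    sibling = dt.next_element                    # following-sibling::*[1]
    nested  = sibling && sibling.elements['dl']  # .../dl, nil when absent
    { title: dt.text.strip, children: nested ? toc_tree(nested) : [] }
  end
end

def parse_toc(xml)
  doc = REXML::Document.new(xml)
  toc_tree(REXML::XPath.first(doc, '//div[@class="toc"]/dl'))
end

pp parse_toc(SAMPLE_TOC)
```

Because the section selector is evaluated relative to each item, the same pair of expressions handles any
nesting depth.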

The above command also saves all XPath expressions as the "git-manual" profile, which can be reused later to save
keystrokes. For example, if you later decide to regenerate the Git Manual ePub without the TOC at the beginning of
the document, you can run:

    repub -l git-manual -X '//div[@class="toc"]' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html

A few more examples:

* Open Packaging Format (OPF) 2.0 (one of the ePub standards, in ePub)

    repub -x 'title://p[@class="Title"]' \
      -x 'toc://div[@class="TOC"]' \
      -x 'toc_item:.//p' \
      -x 'toc_section:.//div[@class="TOCSection"]' \
      http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html

* GNU Wget Manual

    repub -m 'creator:gnu.org' \
      -x 'title://h1' -x 'toc://div[@class="contents"]/ul' -x 'toc_item:li' -x 'toc_section:ul' \
      -X '//div[@class="contents"]' \
      http://www.gnu.org/software/wget/manual/wget.html

* And finally, the "Hello World" of e-books, Alice's Adventures In Wonderland

    repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' \
      http://www.gutenberg.org/files/11/11-h/11-h.htm
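
All of the `-x` and `-m` arguments in these examples follow a NAME:VALUE shape, and the values themselves may
contain colons (for example `following-sibling::*[1]/dl`). The Ruby sketch below shows one way such an argument
could be split and checked. The name lists are copied from the help text; `parse_pair` and its error messages
are hypothetical and do not describe how Repub actually parses its options.

```ruby
# Names copied from the help text; the parsing logic itself is an assumption.
SELECTOR_NAMES = %w[title toc toc_item toc_section].freeze
META_NAMES = %w[creator date description language publisher
                relation rights subject title].freeze

# Split on the first colon only, so XPath axes such as
# "following-sibling::" stay intact inside the value.
def parse_pair(arg, allowed)
  name, value = arg.split(':', 2)
  if value.nil? || value.empty?
    raise ArgumentError, "expected NAME:VALUE, got #{arg.inspect}"
  end
  unless allowed.include?(name)
    raise ArgumentError, "unknown name #{name.inspect}"
  end
  [name, value]
end

p parse_pair('toc_section:following-sibling::*[1]/dl', SELECTOR_NAMES)
p parse_pair('creator:gnu.org', META_NAMES)
```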

== SYNOPSIS:

Repub is a simple HTML to ePub converter.

Usage: repub [options] url

General options:
  -D, --downloader NAME            Which downloader to use to get files (wget or httrack).
                                   Default is wget.
  -o, --output PATH                Output path for generated ePub file.
                                   Default is /Users/dg/Projects/repub/.epub
  -w, --write-profile NAME         Save given options for later reuse as profile NAME.
  -l, --load-profile NAME          Load options from saved profile NAME.
  -W, --write-default              Save given options for later reuse as default profile.
  -L, --list-profiles              List saved profiles.
  -C, --cleanup                    Clean up download cache.
  -v, --verbose                    Turn on verbose output.
  -q, --quiet                      Turn off any output except errors.
  -V, --version                    Show version.
  -h, --help                       Show this help message.

Parser options:
  -x, --selector NAME:VALUE        Set parser XPath selector NAME to VALUE.
                                   Recognized selectors are: [title toc toc_item toc_section]
  -m, --meta NAME:VALUE            Set publication information metadata NAME to VALUE.
                                   Valid metadata names are: [creator date description
                                   language publisher relation rights subject title]
  -e, --encoding NAME              Set source document encoding. Default is to autodetect.

Post-processing options:
  -s, --stylesheet PATH            Use custom stylesheet at PATH. Use -s- to remove
                                   all links to stylesheets and