repub - 源码 - 源码 - 免费下载

开通博客赚积分

发布资源赚积分

repub

文件大小： unknow

源码售价： 5 个金币积分规则积分充值

充值1元得10金币

资源说明：Simple HTML to ePub converter.

== Repub

by Invisible Llama (dg at invisiblellama dot net)

{RubyForge Project}[http://rubyforge.org/projects/repub/] | {Github}[http://github.com/invisiblellama/repub/tree/master]

== DESCRIPTION:

Repub is a simple HTML to ePub converter.

It lacks imagination and won't try to guess the source document structure, you will have to describe where to look
for title and table of contents. In return, it provides you with greater control over generated
ePub documents.

== FEATURES:

Repub accepts the following parameters:

* Source document URL
* List of XPath expressions for locating source document title, table of contents, TOC items and TOC sub-sections
* List of XPath expressions for describing elements that will be removed from the converted document
* List of regular expressions for editing the source document
* Publication information metadata tags

All parameters except document URL are optional; the resulting ePub will (probably, if original HTML isn't
broken too bad) be readable but will be lacking any metadata or TOC.

Few examples:

* Project Gutenberg's The Adventures Of Sherlock Holmes (with proper table of contents)

repub -x 'title:div[@class="book"]//h1' \
-x 'toc://table' \
-x 'toc_item://tr' \
http://www.gutenberg.org/dirs/etext99/advsh12h.htm

This tells Repub to look for title in the first found H1 in the DIV of class "book"; that table of contents is
located in the first TABLE and TOC item can be found inside TR.
The above will produce readable ePub which can be further enhanced by removing some "noise" content:

repub -x 'title:div[@class="book"]//h1' \
-x 'toc://table' \
-x 'toc_item://tr' \
-X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
http://www.gutenberg.org/dirs/etext99/advsh12h.htm

In addition to parsing, the above command also removes from the final version of document all PREs, HRs and
first H1 and H2 elements from the body.

A bit more complicated example:

* Git User's Manual

repub -x 'title://h1' \
-x 'toc://div[@class="toc"]/dl' \
-x 'toc_item:dt' \
-x 'toc_section:following-sibling::*[1]/dl' \
-w git-manual \
http://www.kernel.org/pub/software/scm/git/docs/user-manual.html

This tells Repub to look for title in the first found H1, for TOC in the DL element of the DIV with class "toc" and
that TOC items can be found inside DT elements. Additionally, TOC item can have a child TOC section inside DL when
DL element immediately follows DT.

The above command also saves all XPath expressions as "git-manual" profile, which can be later reused to save keystrokes.
For example, if you later decide to regenerate Git Manual ePub without TOC at the beginning of document, you can do

repub -l git-manual -X '//div[@class="toc"]' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html

Few more examples:

* Open Packaging Format (OPF) 2.0 (one of the ePub standards, in ePub)

repub -x 'title://p[@class="Title"]' \
-x 'toc://div[@class="TOC"]' \
-x 'toc_item:.//p' \
-x 'toc_section:.//div[@class="TOCSection"]' \
http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html

* GNU Wget Manual

repub -m 'creator:gnu.org' \
-x 'title://h1' -x 'toc://div[@class="contents"]/ul' -x 'toc_item:li' -x 'toc_section:ul' \
-X '//div[@class="contents"]' \
http://www.gnu.org/software/wget/manual/wget.html

* And finally, the "Hello World" of e-books, Alice's Adventures In Wonderland

repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' \
http://www.gutenberg.org/files/11/11-h/11-h.htm

== SYNOPSIS:

Repub is a simple HTML to ePub converter.

Usage: repub [options] url

General options:
-D, --downloader NAME Which downloader to use to get files (wget or httrack).
Default is wget.
-o, --output PATH Output path for generated ePub file.
Default is /Users/dg/Projects/repub/.epub
-w, --write-profile NAME Save given options for later reuse as profile NAME.
-l, --load-profile NAME Load options from saved profile NAME.
-W, --write-default Save given options for later reuse as default profile.
-L, --list-profiles List saved profiles.
-C, --cleanup Clean up download cache.
-v, --verbose Turn on verbose output.
-q, --quiet Turn off any output except errors.
-V, --version Show version.
-h, --help Show this help message.

Parser options:
-x, --selector NAME:VALUE Set parser XPath selector NAME to VALUE.
Recognized selectors are: [title toc toc_item toc_section]
-m, --meta NAME:VALUE Set publication information metadata NAME to VALUE.
Valid metadata names are: [creator date description
language publisher relation rights subject title]
-e, --encoding NAME Set source document encoding. Default is to autodetect.

Post-processing options:
-s, --stylesheet PATH Use custom stylesheet at PATH. Use -s- to remove
all links to stylesheets and