Resource description: Simple HTML to ePub converter.
== Repub

by Invisible Llama (dg at invisiblellama dot net)

{RubyForge Project}[http://rubyforge.org/projects/repub/] | {Github}[http://github.com/invisiblellama/repub/tree/master]

== DESCRIPTION:

Repub is a simple HTML to ePub converter.

It makes no attempt to guess the structure of the source document; you have to describe where to look for the title and the table of contents. In return, it gives you greater control over the generated ePub documents.

== FEATURES:

Repub accepts the following parameters:

* Source document URL
* List of XPath expressions for locating the source document title, table of contents, TOC items and TOC sub-sections
* List of XPath expressions describing elements that will be removed from the converted document
* List of regular expressions for editing the source document
* Publication information metadata tags

All parameters except the document URL are optional. Without them, the resulting ePub will probably be readable (provided the original HTML is not too badly broken) but will lack any metadata or TOC.

A few examples:

* Project Gutenberg's The Adventures Of Sherlock Holmes (with proper table of contents)

    repub -x 'title:div[@class="book"]//h1' \
          -x 'toc://table' \
          -x 'toc_item://tr' \
          http://www.gutenberg.org/dirs/etext99/advsh12h.htm

  This tells Repub to look for the title in the first H1 found in the DIV of class "book", that the table of contents is located in the first TABLE, and that TOC items can be found inside TR elements.

  The above will produce a readable ePub, which can be further enhanced by removing some "noise" content:

    repub -x 'title:div[@class="book"]//h1' \
          -x 'toc://table' \
          -x 'toc_item://tr' \
          -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
          http://www.gutenberg.org/dirs/etext99/advsh12h.htm

  In addition to parsing, the above command also removes all PRE and HR elements and the first H1 and H2 elements from the body of the final document.
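The selector and removal steps described above can be illustrated with a small standalone sketch. This is not Repub's own code: it uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, so the expressions are rewritten in relative form (`.//table` instead of `//table`). The `SAMPLE` markup, the helper names (`text_of`, `extract`, `remove_all`), and the choice to delete every match of a removal path are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a source page (not the real Gutenberg HTML).
SAMPLE = """<html><body>
<h1>Site banner</h1>
<pre>license boilerplate</pre>
<div class="book">
  <h1>The Adventures of Sherlock Holmes</h1>
  <table>
    <tr><td><a href="#ch1">A Scandal in Bohemia</a></td></tr>
    <tr><td><a href="#ch2">The Red-Headed League</a></td></tr>
  </table>
  <hr/>
</div>
</body></html>"""

# Selector names mirror the -x options; the paths are rewritten in the
# relative form that ElementTree's limited XPath subset accepts.
SELECTORS = {
    "title": ".//div[@class='book']//h1",
    "toc": ".//table",
    "toc_item": ".//tr",
}
# Paths mirroring -X '//pre' -X '//hr' -X '//body/h1'.
REMOVE = [".//pre", ".//hr", "./body/h1"]


def text_of(node):
    """Return the node's text content with whitespace collapsed."""
    return " ".join("".join(node.itertext()).split())


def extract(root, selectors):
    """Return (title, toc_items) using the first match for title and toc."""
    title_node = root.find(selectors["title"])
    toc_node = root.find(selectors["toc"])
    title = text_of(title_node) if title_node is not None else None
    items = []
    if toc_node is not None:
        # TOC items are looked up relative to the TOC element, not the whole page.
        items = [text_of(item) for item in toc_node.findall(selectors["toc_item"])]
    return title, items


def remove_all(root, path):
    """Delete every element matching path; return how many were removed."""
    # ElementTree nodes have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = root.findall(path)
    for node in matches:
        parent_of[node].remove(node)  # also drops the node's tail text
    return len(matches)


root = ET.fromstring(SAMPLE)
# Parse first, then strip the "noise" elements, as in the second command above.
title, toc_items = extract(root, SELECTORS)
removed = {path: remove_all(root, path) for path in REMOVE}

print(title)
print(toc_items)
print(removed)
```

Note that the book title survives the removal step in this sketch because `./body/h1` only matches H1 elements that are direct children of the body, while the title sits inside the DIV.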
A slightly more complicated example:

* Git User's Manual

    repub -x 'title://h1' \
          -x 'toc://div[@class="toc"]/dl' \
          -x 'toc_item:dt' \
          -x 'toc_section:following-sibling::*[1]/dl' \
          -w git-manual \
          http://www.kernel.org/pub/software/scm/git/docs/user-manual.html

  This tells Repub to look for the title in the first H1 found, for the TOC in the DL element of the DIV with class "toc", and for TOC items inside DT elements. Additionally, a TOC item can have a child TOC section inside a DL when that DL element immediately follows the DT.

  The above command also saves all XPath expressions as the "git-manual" profile, which can be reused later to save keystrokes. For example, if you later decide to regenerate the Git Manual ePub without the TOC at the beginning of the document, you can run

    repub -l git-manual -X '//div[@class="toc"]' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html

A few more examples:

* Open Packaging Format (OPF) 2.0 (one of the ePub standards, in ePub)

    repub -x 'title://p[@class="Title"]' \
          -x 'toc://div[@class="TOC"]' \
          -x 'toc_item:.//p' \
          -x 'toc_section:.//div[@class="TOCSection"]' \
          http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html

* GNU Wget Manual

    repub -m 'creator:gnu.org' \
          -x 'title://h1' -x 'toc://div[@class="contents"]/ul' -x 'toc_item:li' -x 'toc_section:ul' \
          -X '//div[@class="contents"]' \
          http://www.gnu.org/software/wget/manual/wget.html

* And finally, the "Hello World" of e-books, Alice's Adventures In Wonderland

    repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' \
          http://www.gutenberg.org/files/11/11-h/11-h.htm

== SYNOPSIS:

  Repub is a simple HTML to ePub converter.

  Usage: repub [options] url

  General options:
      -D, --downloader NAME            Which downloader to use to get files (wget or httrack).
                                       Default is wget.
      -o, --output PATH                Output path for generated ePub file.
                                       Default is /Users/dg/Projects/repub/.epub
      -w, --write-profile NAME         Save given options for later reuse as profile NAME.
      -l, --load-profile NAME          Load options from saved profile NAME.
      -W, --write-default              Save given options for later reuse as default profile.
      -L, --list-profiles              List saved profiles.
      -C, --cleanup                    Clean up download cache.
      -v, --verbose                    Turn on verbose output.
      -q, --quiet                      Turn off any output except errors.
      -V, --version                    Show version.
      -h, --help                       Show this help message.

  Parser options:
      -x, --selector NAME:VALUE        Set parser XPath selector NAME to VALUE.
                                       Recognized selectors are: [title toc toc_item toc_section]
      -m, --meta NAME:VALUE            Set publication information metadata NAME to VALUE.
                                       Valid metadata names are: [creator date description language publisher relation rights subject title]
      -e, --encoding NAME              Set source document encoding. Default is to autodetect.

  Post-processing options:
      -s, --stylesheet PATH            Use custom stylesheet at PATH.
                                       Use -s- to remove all links to stylesheets and