eol
文件大小: unknow
源码售价: 5 个金币 积分规则     积分充值
资源说明:Encyclopedia of Life
= Encyclopedia of Life
= www.eol.org

== INTRODUCTION

Welcome to the Encyclopedia of Life project.  The bulk of the code needed to run http://www.eol.org is written
in Ruby on Rails and is made available to anyone for re-use, repurposing or for improvement.  This is both an
ambitious project and an ambitious codebase and we are excited to share it with the open source community.  The
code has been under development since approximately September 2007, but has undergone many revisions and
updates from June 2008-December 2008.  There is much work to be done, both in adding new features, and in the
ongoing process of code refactoring and performance improvements.  If you see something you like, share it with
your colleagues and friends and reuse it in your own projects.  If you see something you don't like, help us
fix it or join the discussion at the "Developers" forum at http://forum.eol.org, or on GitHub or Google Code.

== LICENSE

The full code base is released under the MIT License.  Details are available in the "MIT-LICENSE.txt" file at
the root of the code folder.

== GETTING STARTED

This is a big Rails project.  Clearly, some of the installation steps below are big and require multiple steps,
but if you are currently a Rails developer, some are already done (like installing Ruby).  If you are not a
Rails developer, we suggest you first visit www.rubyonrails.org for more information on getting started with
Rails and then return to us when you have your feet wet.  The www.eol.org codebase probably shouldn't be the
first Rails project you've ever seen.

For seasoned Rails developers, you'll also notice the codebase does some mix and matching --- both restful
controllers and regular controllers, both ERB and HAML.  We like the restful way of doing things and plan to
move in that direction.  HAML vs ERB is still an open topic for discussion, although we are sticking with ERB
until it's settled.

Hint: start with the TaxonConcept model.  Everything stems from there.

=== INSTALLATION


To get things up and running, these are the steps you need to take.  If you actually run 
through this process, please update this list with any changes you notice being necessary!

Note that many of these steps require root access on your machine.  You have been warned and may need to run
them as "sudo" on a Mac/Linux or as an administrator on Windows (there, I acknowledged the existence of
Windows).

=== FIRST THINGS FIRST

1. Install Ruby and the Rails framework on your local machine (www.ruby-lang.org and www.rubyonrails.org).
2. Install MySQL on your local machine (www.mysql.com).
3. Get the code from the public repository (http://repository.eol.org/eol) or GitHub.

Next, continue with the following sections:  "INSTALLING REQUIRED GEMS", "SETTING UP THE DATABASES",
"INSTALLING AND RUNNING MEMCACHED", and "WATCHING IT COME TO LIFE"

=== SOLR

EOL requires Solr (a fast indexing engine) to run properly.  Solr is run using Java, and the require JAR
file is included with the EOL project's source code.  (TODO - we need to check the legal situtaion with
this--we may need to include a LICENCSE for it.)  There is a rake task for starting (solr:start) and
stopping (solr:stop) the Solr server.  Searches will not function without Solr, and at some point, the
entire codebase (rendering a taxa page) may require Solr as well.

To re-build your indexes for Solr searching (up to 100 TaxonConcepts), run the command:

  rake solr:build

...Note that this command first deletes all existing entries, then adds entires (max: 100) for each
TaxonConcept in the development.  If you want to build indexes based on the data in your test (or
integration, or...) environments, specify the RAILS_ENV:

  rake solr:build RAILS_ENV=test

=== INSTALLING REQUIRED GEMS

The following gems are required on your local machine for development but NOT deployment (deployment gems
should always be frozen into the project):

1. ZenTest
sudo gem install ruby-debug
3. piston

If you are on a Mac, you will also need ruby-growl. ... well, if you want to run autotest, you will.

Plugins required to run the site in production should be frozen into the vendor folder as is the version of
Rails the app depends on.



To install required gems:
  sudo gem install nokogiri

  rake gems         # This could give you some errors...
  rake gems:build   # if there are gems that aren't installed or you got that big error...
  rake gems:install # if there were gems that are STILL missing...
  # please manually install gems that are still missing or failed to install.

==== INSTALLING NEW GEMS

If you want to add a new gem to the project, please:

1. Add the config.gem line to the config/environment.rb file.
2. Install the gem using "rake gems:install".
3. Unpack the gem into the vendor directory with "rake gems:unpack".
4. Be sure to "svn add" the new directories that were created.


=== SETTING UP THE DATABASES

1. Setup the config/database.yml

   a. Copy "config/database.sample.yml" to "config/database.yml"
   b. Create/update the appropriate entries for your development and test environments
   c. For development purposes, the +demo+, +integration+, and +production+ environments may be removed
   d. For production purposes, set the "master_database" to the master database for the core rails database and the "master_data_database" to the master database for the data database (see more info below in the "READ/WRITE SPLITTING" topic)

2. Installing and running memcached

This project doesn't run without memcached.  The reason for this is that no other caching mechanism seems to properly
persist Ruby objects.  If you *really* need to run some other caching system, modify your
config/environments/development.rb to use what you prefer, but beware of "Stack level too deep" errors, which will
occur if object serialization failed. You may need to find all of the "Rails.cache" calls in the code and use YAML to
serialize and restore the objects being stored.

To install memcached on OSX with MacPorts already on the system is quite easy:

  a. Run "sudo port install memcached".  This will install the packages required.
  b. At the end of the output from this, it should recommend you run another command, probably "sudo launchctl load \
     -w /Library/LaunchDaemons/org.macports.memcached.plist", but you should just copy/paste and run whatever it
     suggests, as the path may be different.

That's it!  Memcached is now running and will work for development.  Be mindful of this--it may require periodic
restarting or flushing (from a Rails console, you can just type "Rails.cache.clear").

If you don't have MacPorts (or don't trust it), you will need to find another way to install and launch memcached,
and that is beyond the scope of this document (but plenty of tutorials exist online).

Note that you do NOT need to install the "memcache" Ruby gem.


3. Run the following rake commands:

  rake eol:db:create:all           # Note the "EOL".  This keeps bad things from happening.  Please use it.
  rake eol:db:create:all RAILS_ENV=test
  rake db:migrate
  rake db:migrate RAILS_ENV=test
  rake truncate
  rake scenarios:load NAME=bootstrap #Be sure you installed memcached according to the step 2

  # Your "URL" (or an alternative) will be provided to you privately if needed.  Most developers who will not run code in production can skip this one.
  rake eol:site_specific repo=URL

=== WATCHING IT COME TO LIFE

Run the following two commands:
  
  rake solr:start
  script/server

Go to http://localhost:3000 and see stuff.

== TESTING

We're using RSpec for our testing (see the spec/ directory).  Run 'rake spec' (or 'rake specdoc' for specdoc output)

Note that Solr must be running in the test RAILS_ENV before tests will pass:

  rake solr:stop RAILS_ENV=test
  rake solr:start RAILS_ENV=test

Also note that the solr:stop command works even if the server was started in the development environment.

=== Test Logins

admin user
username: admin
password: admin

content partner
username: test_cp
password: test password

user
username: test_user2
password: password

curator of Animalia (page #11 ATM)
username: test_curator
password: password

=== Additional tests

You can log in as any of the users or content partners with the password "test password". There are no other
consistent users, so either create your own, log in with OpenID, or use the console to find a username.

There are several consistent content partners, and the password is also "test password".  There will be IUCN,
catalogueoflife, and test_cp. 

Once logged in, every taxon concept should have various images that will only show up based on
Vetted/Visibility values: each combination is represented.

Finally, pages/6 (TaxonConcept ID 6) has one additional image which is in "preview" mode for the test_cp
Content Partner, and should be visibile only to her.

=== Browser acceptance testing with Selenium

To setup Selenium IDE point your Firefox browser at:

  https://addons.mozilla.org/en-US/firefox/addon/2079

Once installed, it can be opened via the Tools menu of Firefox. All Selenium test suites and test cases are located
under the spec/selenium directory. Before running the development test suite, the following should be run:

  rake truncate && rake scenarios:load NAME=bootstrap && script/server

Acceptance testing with the browser is accomplished with Selenium RC and the selenium-client Rubygem. To run the
Selenium acceptance tests, open a console:

  1. Tab 1: ./script/selenium-server
  2. Tab 2: rake test:acceptance:web 

The Selenium RC and Selenium-Client should be installed in the `vendor' directory.

=== Measuring Code Quality

The EOL code base uses metric_fu-1.0.2 to evaluate the overall code quality.

==== Details

In order to run the various metrics gems, an environment variable METRICS is required. This variable is observed if
and only if the Rails environment RAILS_ENV=test.

It is the team's opinion that metric-fu, and it's various dependencies, are system level tools (like ruby-debug) and
should not be checked into the code base. A user or a CI server will require the installation of the
jscruggs-metric_fu gem which properly install various dependencies such as flay, flog, reek, and roodi.

==== Conclusion

The following Rake tasks are available when METRICS=true and RAILS_ENV=test:

  rake eol:quality:all # Runs metrics on EOL
  rake eol:quality:flay # Analyze for code duplication
  rake eol:quality:flog # Analyze for code complexity
  rake eol:quality:roodi # Check for design issues in: app/*/.rb, lib/*/.rb

An example of how to run them:

  RAILS_ENV=test METRICS=true rake eol:quality:all

== MULTI-DATABASE AND MASTER/SLAVE DATABASES SETUP

The site is built to allow for master/slave database read/write splitting for the core rails database and the
core "data" database.  There are two plugins involved in the use of multiple databases and read/write
splitting:

  *use_db*:: used to direct some models to a different database
             (http://rails.elctech.com/blog/using-and-testing-rails-with-multiple-databases)
  *masochism*:: used to split read/writes when using ActiveRecord
                (http://www.planetrubyonrails.org/tags/view/masochism)

=== MULTIPLE DATABASES

New abstract class models are created which make connections to the other databases required, and then any
models which need to connect to the other databases are subclassed from the new abstract class.  In our case,
we have two abstract classes representing connections to the data database and the logging database:

  - SpeciesSchemaModel
  - LoggingModel

These extra two databases are referenced in the database.yml in the following way:

  - environment_data (e.g. development_data)
  - environment_logging (e.g. development_logging)

=== READ/WRITE SPLITTING

Read/write splitting is accomplished with the masochism plugin by adding two new database connections to the
config/database.yml file:

  - master_database (the master database connection for the core rails database)
  - master_data_database (the master database connection for the "data" database)
    
In addition there are new abstract classes representing a connection to each master database that can be
used to run direct SQL queries against the masters:

  - MasterDatabase   (for the core rails database)
  - SpeciesSchemaWriter  (for the species data database)
      
The logging database does not require read/write splitting since there is only a single server for this purpose.

To enable read/write splitting via ActiveRecord, include the following in the approriate environment.rb file
(e.g. config/environments/production.rb):

  config.after_initialize do 
    ActiveReload::ConnectionProxy.setup!  
    ActiveReload::ConnectionProxy.setup_for SpeciesSchemaWriter, SpeciesSchemaModel          
  end

Note that you *must* also enable class caching for this to work (this is the default in production, but not in
development, which is important to note if you wish to test this functionality in development mode):

  config.cache_classes = true

Manually crafted SQL queries with SELECT statement will be redirected to slave while all other queries like in
the following example will be redirected to master:

  SpeciesSchemaModel.connection.execute("DELETE FROM data_objects WHERE id in (#{data_objects})")

You don't have to worry about master/slave databases in development mode unless you want to test your code
against splitting queries.  When in development, you could make the master_database and master_data_database
must point to the same place as development and development_data respectively.  Things should work even if
these entries are left out (since the master databases are only connected in a configuration entry in the
production environment) but it doesn't hurt if they are there.

== FINDING THINGS TODO

Spots in the code requiring some attention for refactoring, cleanup or further work are marked with a "TODO"
comment and sometimes with a level of priority.  You can quickly locate all these comments with your IDE, an
app like TextMate, or with a rake command:

  rake notes:todo

== TESTING

Here are our rake stats as of May 7th:

      +----------------------+-------+-------+---------+---------+-----+-------+
      | Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
      +----------------------+-------+-------+---------+---------+-----+-------+
      | Controllers          |  3848 |  2665 |      34 |     262 |   7 |     8 |
      | Helpers              |   713 |   503 |       2 |      49 |  24 |     8 |
      | Models               |  7120 |  3981 |     104 |     473 |   4 |     6 |
      | Libraries            |   980 |   676 |       9 |      69 |   7 |     7 |
      | Model specs          |  1644 |   637 |       1 |       5 |   5 |   125 |
      | View specs           |     0 |     0 |       0 |       0 |   0 |     0 |
      | Controller specs     |     0 |     0 |       0 |       0 |   0 |     0 |
      | Helper specs         |    10 |     7 |       0 |       0 |   0 |     0 |
      | Library specs        |     0 |     0 |       0 |       0 |   0 |     0 |
      +----------------------+-------+-------+---------+---------+-----+-------+
      | Total                | 14315 |  8469 |     150 |     858 |   5 |     7 |
      +----------------------+-------+-------+---------+---------+-----+-------+
        Code LOC: 7825     Test LOC: 644     Code to Test Ratio: 1:0.1

We really need to start improving our tests.  To that end, I would like to start using: 

  rails stats (we need to tweak this to add blackbox tests)
  reek
  flay
  flog
  roodi
  reek
  saikuro
  rcov
  churn

== CONFIGURATION SETTINGS LOAD ORDER

There are lots of configuration settings, and they load in the following order:

  1) config/environment.rb
  2) config/environments/[RAILS_ENV].rb
  3) config/environments/[RAILS_ENV]_eol_org.rb
  4) config/environment_eol_org.rb

== USERS/ROLES/RIGHTS

The site is using the acl_system plugin (http://brainspl.at/articles/2006/02/20/new-plugin-acl_system).

Basically:

1. Users can be assigned to zero or more roles.
2. Controllers are restricted by role when needed.

So you can protect controllers and actions in the following ways:

1. To indicate a user must be logged in to view particular actions or controllers, just put this at the top of
   the controller (restricted to certain actions if you wish):

   before_filter :check_authentication

   This does not perform any checks for roles, just does a simple logged in check and redirects to the login page if
   needed.

2. To indicate a user must have the rights to a particular controller, just put this at the top of the controller
   (restricted to certain actions if you wish):

     access_control :DEFAULT => 'ROLE NAME GOES HERE'

   This checks to be sure the user has access rights to that controller (based on the role you specified)
   before letting them use actions in the controller.  For more advanced control over only certain methods, see
   the acl_system2 plugin README.

3. A convenience controller/helper method is provided to check if a user is in a particular role called
   "is_user_in_role?"

   For example: 

     do_this_method_only_for_admins if is_user_in_role?('Administrator')

4. You can also use the built in acl plugin helpers to only show links or other snippets of code in your view
   based on roles membership.

   For example: 

     <% restrict_to "(Administrator) & !blacklist" do %> admin stuff here <% end %>

== ADMINISTRATIVE SECTION

The admin section uses the roles setup described above.  To create a new admin function:

1. Create a new controller in the "administrator" folder under controllers, e.g. Administrator::NewContollerName
2. Derive that controller from the Admin controller, e.g. class Administrator::NewContollerName < AdminController
3. Create your views in the correct subfolder under "views/administrator", e.g.
   "views/administrator/new_controller_name"
4. Decide if you need a new role to access this controller or if an existing role will suffice.  Add the new role
   into the "roles" database if required (you can use the admin interface or a migration).
5. At the top of your controller, restrict access to the role by adding the following line to the top:

     access_control :DEFAULT => 'ROLE NAME GOES HERE'

   Note that by default, users must also have the generic "administrator" role to access any of the controllers
   derived from the main admin controller.
6. Add your new controller to the admin navigation menu by editing "views/admin/_navigation.html.erb"
   See the existing links for restricting the display of menu items to the specific role you have identified in
   step 5.
7. Make sure you've got at least one admin user with access to the correct role, log in and test.

== LOGGING

The logging model is intended to be thought of as a data mining system. A separate database is used to store
all log data, which must be defined in your config/database.yml file. (See sample file for naming.) Models and
operations tend to fall into two categories: dimensions and facts.  In short, dimensions represent collected
(primary) data. Facts are derived (secondary) caches of information which is more meaningful to the user.  Fact
table design is highly dependent on the user interface design, because we only need to generate facts if the
information will actually be shown.  For performance reasons regarding the expected database size, fact tables
are also intended to be highly denormalized, non-authoritive sources of information.

Location-based facts require the primary data to go through a geocoding process which requires an external web
service.  This process is thus performed asynchronously from the main site. Results of IP location lookups are
cached and reused whenever possible.  While IP location lookups are non-authoritative "best guesses", they
nevertheless provide meaningful information.

In production mode it is CRITICALLY important to understand the automated logging tasks before invoking them to
avoid deletion of precious data.  To develop logging features, run the following tasks in the given order to
populate your logging database with mock data... 

  rake logging:clear						           # Deletes all logging-related records. (WARNING: NEVER run in production.)
  rake logging:dimension:mock THOUSANDS=2	 # Creates 2,000 psedo-random mock log entries (a.k.a. primary data).
  rake logging:geocode:all					       # Performs geocoding on the primary data, using caches where possible.
  rake logging:fact:all						         # Derives secondary data from primary data.

...at this point you should see data in the graph pages of the web application. Alternatively, run the
following which does all of the above in one step....

  script/runner script/logging_mock

For cron jobs, you'll likely want to log all facts for a particular date range:

   rake logging:fact:today
   rake logging:fact:yesterday
   rake logging:fact:range FROM='01/15/2007' TO='12/19/2008'

== EXTERNAL LINK TRACKING

Any links to external sites that need to be tracked should use the following two helpers:

  external_link_to(text, url)
  external_link_to(image_tag(image), url)

Both will generate a link (with either the supplied text or the supplied image url) to the supplied URL.  The
link will be logged in the database, and if the $USE_EXTERNAL_LINK_POPUPS parameter is set to TRUE in the
environment.rb file, a javascript pop-up warning window is shown prior to following the link.  The following
additional parameters can be passed after the URL for both methods:

  +:new_window        => true or false+::
    determines if link appears in new browser window (defaults to true)
  +:show_only_if_link => true or false+::
    determines if image or text is shown if no URL was supplied (defaults to false)
  +:show_link_icon    => true or false+::
    determines if the external icon image is shown after the link (defaults to true for text links and false for
    image links)

For images, the following parameters can also be passed:

  +:alt   => 'value'+:: alt tag is set with the value passed
  +:title => 'value'+:: title tag is set with the value passed

Currently no reports are provided for external link tracking, all links are stored in the "external_link_logs"
in the logging database for later reporting.

== FRAGMENT CACHING

Fragment caching is enabled in the specific environment file (e.g. config/production.rb) and the storage
mechanism (i.e. memcached) must be set as well.

For memcached:   

  config.cache_store = :mem_cache_store, '10.0.0.1:11211', '10.0.0.2:11211'

To enable caching:

  config.action_controller.perform_caching = true

All "static" pages coming out of the CMS are fragment cached and the home page cache is cleared each hour (or
as set in the $CACHE_CLEAR_IN_HOURS value set in the config/environment.rb file), using language as key to
enable multiple fragments.  The header and footer navigation of each page is also fragment cached on cleared at
the same time interval.  When changes are made in the admin interface, these caches are automatically cleared.

Names searches are cached by query type, language and vetted/non-vetted status.

Species pages are cached using the following attributes as keys (since each will cause a different species page
to be created).  Note that when logged in as an administrator or content partner, the pages are not cached and
are generated dynamically each time.

Variables for naming species page fragment caches:

  - taxon_id
  - language
  - expertise level
  - vetted or all information
  - default taxonomic browser
  - curator for page

Species page caches can be cleared by taxon ID by a CMS Site Administrator by logging into the admin console,
and going to "General Site Admin".  Clearing a species page cache automatically clears all of its ancestors as
well.

The following URLs can be used to trigger page expiration either manually in the browser or via a web service
call.  They only work if called from "allowed" IPs (i.e. as specified in configuration) as defined in the
application level method "allowed_request" (which returns TRUE or FALSE).

  localhost:3000/expire_all    # expire all non-species pages
  localhost:3000/expire_taxon/TAXON_ID  # expire specified taxon ID (and it's ancestors)
  localhost:3000/expire_taxa/?taxa_ids=ID1,ID2,ID3 # will expire a list of taxon IDs (and their unique ancestors) specified in the querystring (or post) parameter "taxa_ids" (separate by commas)
  localhost:3000/clear_caches # expire all fragment caches (if supported by data store mechanism)

From within the Rails applications, use the following application level methods:

  expire_all   # expire all non-species pages
  expire_taxon(taxon_ID)  # expire specified taxon id and ancestors (unless :expire_ancestors=>false is set)
  expire_taxa(taxon_ID_array)# expire specified array of taxon ID and unique ancestors (unless :expire_ancestors=>false is set)
  clear_all_caches # expire all fragment caches (everything!)

For testing purposes, to install memcached on a Mac see
http://readystate4.com/2008/08/19/installing-memcached-on-os-x-1054-leopard/ If you have a local memcached
server installed (Mac/Linux), start it with: "memcached -d -m 24 -p 11211" and stop it with "killall memcached"

== ASSET PACKAGER (CSS and JS)

This is now using the asset_packager plugin, see details at http://synthesis.sbecker.net/pages/asset_packager

If you add a javascript include files and you want them included in the page, you must edit the
"config/asset_packager.yml" file and place them in the order you wish them to be loaded.  When running in
development mode, the JS and CSS are included separately each time. When running in production mode, assets are
included from packaged entities.  A rake task is used to combine and minify CSS and JS  referenced.  Note that
the order the JS files are listed in the config file is the order they are merged together and this order can
matter.

You must run this rake task each time the JS/CSS is updated to ensure the latest version is present when
running in production mode.  The minification process is very sensitive to missing semicolons, extra commas and
what not that are dealt with by modern browsers (not IE though...).  You have been warned - minification can
and will break your JS if you are not careful (watch those semicolons)!  

To update/create the combined versions:

  rake asset:packager:build_all

In production, this rake command is run as part of the capistrano deploy script.

For testing purposes, you can force the minified/combined version to be referenced in your pages for a
particular environment by adding the following line to your "config/environments/development.rb" or
"config/environment.rb" file:

  Synthesis::AssetPackage.merge_environments = ["development", "production"]

== TAXONCONCEPT ATTRIBUTION NOTES

To get attribution for a given taxon concept ID:

1. Get TaxonConcept
   e.g.
     t=TaxonConcept.find(101)
2. Look at hierarchy_entry for that taxon (could be many) 
   e.g.
     he_all=t.hierarchy_entries  OR  he=t.entry (for the default)
3. Look at the associated hierarchy (could be one of many if you get them all)
   e.g.
      h=he_all[0].hierarchy #   OR  h=he.hierarcy
      h.label
      h.agent.full_name
      h.agent.hompage
      h.agent.logo_cache_url
4. Look at the associated agents for the hierarchy_entry e.g.

     agents=he[0].agents  # OR  agents=he.agents
     agents.each {|agent| puts agent.full_name + " " + agent.homepage + " " + agent.logo_cache_url}

== CREATING A GOOGLE SITEMAP

To create Google SiteMap files in the correct format, run the following rake task for your requested environment:

  rake sitemap:create RAILS_ENV=production[,BASEURL="http://www.eol.org/pages/",BASEURL_SITEMAP="http://www.eol.org/sitemaps/",MAXPERFILE="50000",OUTPUTPREFIX="eol_sitemap",PRIORITY="1",CHANGEFREQ="monthly",LASTMOD="2009-03-01"]

All of the parameters in brackets are optional and have the default shown (except for 'lastmod' which defaults
to today).

The URLS placed into the site map file are based on 'BASEURL/XXX' where XXX is a valid published, trusted taxon
concept ID pulled from the specified environment.

== DOCUMENTATION

We're not keeping the docs in the svn repo, so you'll have to generate them on your own.  We recommend
installing the allison gem, after which you can run this command:

  template=`allison --path` rake doc:reapp

...Just remove the template bit, if you don't care to have the pretty formatting.

本源码包内暂不包含可直接显示的源代码文件,请下载源码包。