synaptome
文件大小: unknow
源码售价: 5 个金币 积分规则     积分充值
资源说明:Toolkit for Synaptome Visualization, Smith Lab Autumn Rotation
Toolkit for Synaptome Visualization
===================================

Smith Lab @ Stanford University  


Initial Author: Dan O'Shea [ dan {at} djoshea.com ]  
Source URL: 


Overview
--------

This code is a collection of related utilities for operating on array 
tomography images of synaptic protein expression in synaptogram format.
A synaptogram is a 3d image stack with multiple named immunohistochemistry
fluorescence channels. Typically these images are displayed in grids where
each channel is a row, and each column represents a serial section, i.e.
one plane lying orthogonal to the z-direction. This code provides means to:

* Load in extracted synaptogram data cubes
* Filter and perform morphological computations on this data
* Calculate features and utilize machine learning to classify synapses
* Visualize feature vectors among different classes in an interactive scatter
  plot
* Display imaging channels and filtered, transformed, transposed, or annotated
  channels in the synaptogram format

The general purpose of this software is to ease the process of working with
data in synaptogram format. It facilitates filtering or manipulating this data
so as to remove noise and locate or quantify morphological structure and using
the results of this filtering and morphological image processing to compute
useful features. These features are easily plugged into supervised learning
algorithms that predict synapse "class" or "type" from their features, which
are ultimately derived from their protein expression characteristics. These
synapses may then be plotted on a scatter plot according to 2 feature
dimensions (although extending this to 3 should be trivial). Each synapse is
a dot colored according to its human provided training label, and optionally
marked with an X if it was misclassified by the last used supervised learning
classifier. Clicking on a synapse dot brings up the synaptogram representation
of that synapse, which is a completely customizable, color annotated version
of the original imaging channels and of filtered channels computed from the
imaging data, from which the feature vectors are ultimately extracted. This
workflow allows for new feature extraction schemes to be implemented and
evaluated quickly and for error cases and outliers to be studied in detail.
This process also allows for dataset exploration, in which filtered and
annotated channels highlight the structure of the data and provide more
helpful information than raw synaptograms.

Setting Up the Path and the 'ds' Structure
------------------------------------------

Most of the code lies in the root synaptome directory, whereas code specific
to a particular dataset, such as a particular loading script or filtering
script, is generally located in a directory used only for that dataset. For
this reason, you should *add the synaptome directory with subdirectories (or
at least its filters subdirectory) to your matlab path* and then operate out 
of the local dataset directory. 

Synaptome makes heavy use of Matlab structures to store and organize data. In
particular, a variety of information related to a given dataset is stored
within a structure referred to as ds within the code. Most functions take 
ds as their first argument, and will often write data into the ds
structure and pass along the augmented ds structure as the return value, allowing 
these functions to be called as:

    ds = somefunction(ds, otherargs, ...);

Some key fields of the ds structure, organized by category, are described
below.

Example Usage
-------------

Add the synaptome directory and all subdirectories to the Matlab path.

Load in a preloaded dataset as a .mat file.

	cd /path/to/synaptome/kdm200
	load trainds.mat

Run the example filtering and feature extraction script.

	features_kdm200watershed

Click on a few points in the scatter plots to view the annotated synaptograms.

Run a simple supervised learning algorithm to evaluate the utility of the
features computed for classification.

	testft

Imaging Data
------------

Within imaging data arrays, the dimensions are ordered as [Z Y X], which
makes, image view orientation of pixels identical to the standard matrix
orientation (i.e. Z addresses pages, Y addresses rows, X address columnds)

### Fields

dsname : name of dataset or imaging session

chlist : names of original imaging channels present in the dataset

nimch : number of original imaging channels

sgdim : [Z Y X] dimensions of synaptogram

sgaspect: [Z Y X] aspect ratio of each image voxel

sg : struct with individual synapse structures beneath

sg.im : image channel data (4d: nimch x sgdim)

sg.str : descriptive string for this synapse (coordinatess, label, etc.)

sg.im : image channel data (4d: nimch x sgdim)

sg.str : descriptive string for this synapse (coordinatess, label, etc.)

### Functions

#### chdat = getimchannel(ds, chname)

Returns data from a specific imaging channel by name for all synapses
chname is the name of the original imaging channel to search in chlist.

Filtered and Color Annotated Channels
-------------------------------------

Filtered imaging data resides in ds.trd (as in "training data") instead of
ds.sg(:).im. This is grayscale data of the same size (ds.sgdim) and aspect ratio 
(ds.sgaspect) as the original imaging channels. Typically, the original imaging
data are copied verbatim to the filtered channel array (ds.trd). Filtered
channels are used for both feature computation and for visualization.

Color annotation channels are essentially red green blue versions of filtered
channels. They are used solely for visualization as rows in the synaptograms.
For example, once distinct puncta of a certain protein are isolated in a
specific channel, they can be colored so as to distinguish each puncta from
all others according to a colormap. The result of this coloring would be saved
as a color annotation channel and referred to by name when building the
synaptogram.

Both types of channels are typically generated by using the filtdat function
and added to ds using the addchannel or addcolorchannel function, which are
described in detail below. There are a variety of filters within filtdat.m. To
get a sense of how to effectively utilize and chain filters in filtdat.m, see
one of the sample features_kdm200.m or features_gw300.m scripts 
along with the detailed documentation within filtdat.m adjacent to each filter.

### Fields

trd : number of data channels x ntrain x sgdim (5d array)

trdlist : names of each data channel (same as first dimension of trd)

colorch : number of color channels x ntrain x sgdim x 3 (6d array last is RGB)

colorchname : names of color channels (same as first dimension of colorch)

### Functions

#### ds = addchannel(ds, dat, dname, normalize)

Adds a filtered image channel to ds.trd with name dname.

dat : ntrain x sgdim data to store as channel

dname : name string for new channel

normalize : boolean, indicates whether all data should be as a whole linearly shifted to fit the [0, 1] range. defaults to 1.

#### ds = addcolorchannel(ds, dat, dname)

Adds a color channel to ds.colorch with name dname. 

dat : ntrain x sgdim x 3 RGB color values, all values in range [0, 1].

#### [filt info] = filtdat(ds, dat, params)
	
Performs some kind of filtering or computational operation on some
imaging data and returns the results and optionally some metadata.
filtdat merely returns the filtered output and metadata, it is not 
designed to make any changes directly to ds. This enables filters to be
chained easily before adding the output to the trd list via addchannel.
The filtered output may be easily added back to ds.trd via:

        filtparams = struct('type', 'typename', 'option1', value1);
        ds = addchannel(ds, filtdat(ds, 'InputChannelName', filtparams), ...
                'OutputChannelName');

Individual filters are typically implemented as typename_filt.m files in
the filters directory (but may be placed anywhere on the matlab path).
They may also be implemented within the filtdat.m where indicated.
	
See the comments for each individual filter function within filtdat.m for
more specific documentation on usage.

dat : the primary input to the operation, specified either directly
as an array ntrain x sgdim or by name (to be retrieved using getchannel

params : a struct specifying the details of the operation to perform
params.type specifies which operation to perform and must be 
specified. Other values under params depend on the operation being
performed. These values can include parameters used for all synapses
globally, arrays of parameters used for individual synapses, or even
entire additional imaging data arrays.

filt : the primary output of the operation as an array ntrain x sgdim

info : a structure containing metadata returned by some operations.
These fields may contain arrays of values generated for each synapse
individually or secondary output channel arrays.

#### ds = getchannel(ds, dname, syn)

Retrieves data for a particular channel for all or a particular
synapse. Searches trdlist first (filtered channels), then original
imaging channels (using getimchannel), and then colorchannels. Error
if channel still not found.

dname : name of the channel for which to search.

syn : optional id number in the interval [1 ds.ntrain]. If used,
the returned data will be for the specified synapse only, instead of
an array for all synapses.

Visualization
-------------

Individual synapses may be visualized as a synaptogram, a 2D grid format where
each row is a channel and each column is a serial section. The rows may
represent either filtered channels (typically image channels are copied over
into filtered channels, essentially transferring them from ds.sg.im to ds.trd
for convenience) or color annotated channels. 

The rows used to build a synaptogram for any of the synapses are specified in
the ds.vis cell array, where each element is a channel name or an array of up
to 3 channel names which will be merged as RGB to make a color composite row on
the fly. Channels which are already color annotated may only be used by
themselves and not included in an RGB composite row. Typically visualizations 
are constructed by clearing the ds.vis and ds.visname lists and using addvisrow
to add each row. For example, to view Synapsin in row 1 as grayscale, VGlut1 as 
red and VGlut2 as blue in row 2, and a color annotation channel Synapsin_puncta 
in row 3, use:

    ds.vis = {};
    ds.visname = {};
    ds = addvisrow(ds, 'Synapsin');
    ds = addvisrow(ds, {'VGlut1', 'VGlut2'});
    ds = addvisrow(ds, 'Synapsin_puncta');

### Fields

vis : array of channel names (or arrays of 3 channel names for RGB composites)
used for each row of the synaptogram visualization

visname : array of names printed next to each row of the synaptogram.
Typically these are automatically filled out when addvisrow() is used.

### Functions

#### ds = addvisrow(ds, visdat, name)

Used to add a row to the ds.vis list.
	
visdat : either the name of a channel (filtered or color annotation)
or an array of 3 channel names. If one of the colors is not
specified (i.e. array of length 2) or is given by '', the color will
be set to 0 in the compositing.

name : an optional string used to override the default name printed
next to each row on the synaptogram. The default is the original
name or each of the channel names joined by ' / ' in between.

#### syntable(ds, i)

Displays a data table containing the value of each feature for a
particular synapse, typically evoked within viewsg to be displayed
alongside the synaptogram.

i : synapse index in the interval [1 ds.ntrain]

#### viewsg(ds, i, fignum)

Displays the synaptogram for synapse i in figure(fignum). This is the
method called from a synplot's on click callback function to bringup
the synaptogram for the clicked point. Uses ds.vis and ds.visname to
build up the synaptogram image row by row using viewsgrow and then to
display the full synaptogram on a dark background with text labels to
either side.

i : synapse index in the interval [1 ds.ntrain]

fignum : number of figure to display in. default is gcf.

#### rowdat = viewsgrow(dat, bordercol, vspacing, hspacing)
	
dat : either a 3d intensity array in [z y x] coords, or a
struct where dat.R, dat.G, dat.B are 3d intensity arrays in [z y x]
coords (for RGB color)

bordercol : 3x1 RGB value used to fill the margins around each XY tile

vspacing : padding above and below each tile

hspacing : padding left of each tile and to the right of the last tile

Supervised Learning
-------------------

The assignment of synapses into classes is complicated because of variability
in whether synapses are allowed to belong to multiple classes. The distinction
is delineated in the code as "classes" versus "labels". A neuron can be
assigned to multiple classes, e.g. glut1 and glut2, but it belongs to one and
only one label, e.g. "glutBoth". Within the kdm200/loadds.m sample loading
script, there are 3 classes and 5 labels, and a special labelmatch field
contains a 3-dim vector for each of the labels indicating which combination of
classes each label represents. This approach is confusing, but resulted from
the different ways in which human voting was performed. The classes list
represents human names for different known types of synapses, and some
synapses ambiguously seemed to express markers indicative of multiple classes.
To make the supervised learning problem straightforward, labels were
introduced such that each neuron belonged to exactly 1 label. However, this
system must be taken into account when switching from an n-class machine
learning problem to a 2 class problem such as glut1 vs. NOT glut1, which must
properly assign glut1 membership to neurons marked as glutBoth.

See testft.m for an example of training a simple classifier on the training
set and evaluating it's training and cross-validation error.

### Fields

classes : names for each class

labelnames : names for each label

labelcolors : RGB values used to represent each label

ntrain : number of synapses in training set

votes : full voting table by classes (synapses x classes x voters)

nvotes : number of voters

votesbylabel : voting table by labels (synapses x labels x voters)

trainlabel : actual label id assigned to each synapse (indexes labelnames)

trainlabelrunnerup : label id with the second most votes

trainlabelconf : confidence measure determined by (votes to
winner - votes to runner up) / nvotes

trainactive : boolean indicator of whether to include this synapse in
supervised learning training and testing, typically this is a thresholded
version of trainlabelconf to take only examples upon which humans agree

### Functions

#### testft

Sample script for performing cross-validation error tests using a simple linear classifier.


Feature Computation
-------------------

For each synapse in the training set, a feature vector may be computed for use
in supervised learning. Each feature is a scalar value typically computed
based on that channels filtered image data. One example is integrated
brightness, which is simply the sum of all voxel values in a particular
channel for that synapse. Each feature has a name, which is used throughout
the code to refer to and retrieve the value of that feature for particular
synapses.

### Fields

ft : ntrain x number of features, rowwise matrix of feature vectors

ftname : list of feature names

### Functions

#### ds = addfeature(ds, ftdat, name)

Adds a new feature columnn to ds

ftdat : array of scalar values for the feature column for each
of the ntrain synapses

name : unique string used to refer to that feature

#### [ftmat idx] = getfeature(ds, names)

Retrieves a vector or matrix of values for each synapse for one or
more features by name.

names : either a string or an array of strings

ftmat : either a vector or matrix of feature values for all synapses

idx : column index into the feature matrix ds.ft at which each
named feature was found

Feature Scatter Plots
---------------------

Once features vectors have been computed for all training synapses, the
training set may be visualized as a 2D scatter plot where each synapse is
represented as a dot colored according to its human provided trainlabel. The
two axes are specified by named features within ds.ft. 3D plots are
not implemented but should be trivial if the need arises. Optionally, synapses
which were misclassified by the most recently trained supervised learning
classifier may be marked with an X. Clicking on a synapse dot in this plot
will load a synaptogram visualization for that particular synapse.

### Functions

#### synplot(ds, features, errors, twoclass)

Displays the interactive scatter plot of feature values
features is an array of strings representing feature names to be used
for the X and Y axes. 
	
errors : binary vector indicating which synapses to mark with an X, 
defaulting to zeros(ds.ntrain,1)

twoclass : array of label names used when the scatter plot should 
color synapses according to only 2 categories, those whose label is 
in twoclass (as red) and all others (as black)


本源码包内暂不包含可直接显示的源代码文件,请下载源码包。