A roundup of Oligo/BAC CGH array analysis packages

This is a work in progress. I'm publishing it as-is because posts I save as drafts tend never to get published.
I plan to revise it next week.

Purpose and motivation

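I want to analyze CGH microarrays from platforms I don't normally use and compare them with my own experimental data, so I'm collecting here the tools (packages) for analyzing other vendors' platforms *1
The targets are BAC arrays and Agilent CGH arrays. # That was the plan, but some of the tools also work with Affy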

Background

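In my current research I analyze chromosomal structural abnormalities using the Affymetrix GenomeWideSNP 6.0 array.
Recently, a large number of chromosomal structural abnormalities have been reported in various psychiatric disorders *2. These data are published in papers and are also available from NCBI GEO (in some cases).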

The packages

cghMCR (R/BioC package)

Depends on DNAcopy, marray, arrayQuality

Find chromosome regions showing common gains/losses.
Based on the algorithm proposed by Dr. Lynda Chin's lab, this package provides functions that identify chromosome regions that show gains/losses commonly observed across different samples profiled using arrayCGH platform.

http://www.bioconductor.org/packages/bioc/html/cghMCR.html
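
To make "regions showing common gains/losses" concrete, here is a rough base R sketch of the general idea: count how many samples are gained at each probe and keep the stretches that pass a recurrence cutoff. This is NOT the cghMCR algorithm or its API. The gain matrix, the 0.75 cutoff, and all variable names are made up for illustration.

```r
# Hypothetical gain calls: rows are probes in genomic order, columns are samples.
# TRUE means the probe lies in a gained segment in that sample.
gain <- matrix(FALSE, nrow = 20, ncol = 4,
               dimnames = list(NULL, paste0("sample", 1:4)))
gain[5:12, 1] <- TRUE
gain[7:14, 2] <- TRUE
gain[8:10, 3] <- TRUE
# sample4 has no gains at all

# Fraction of samples gained at each probe, and probes passing the cutoff
# (0.75 is an arbitrary illustrative threshold).
recurrence <- rowMeans(gain)
common <- recurrence >= 0.75

# Collapse consecutive passing probes into regions (probe index ranges).
runs    <- rle(common)
ends    <- cumsum(runs$lengths)
starts  <- ends - runs$lengths + 1
regions <- data.frame(start = starts[runs$values], end = ends[runs$values])
print(regions)
```

With these toy calls, the only stretch gained in at least three of the four samples is probes 8 to 10.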

CGHcall (R/BioC package)

Depends on CGHbase, impute, DNAcopy
It seems to be aimed at tumor profiles, so it may not be well suited to chromosomal structural abnormalities (in terms of its parameters)

Calling aberrations for array CGH tumor profiles.
Calls aberrations for array CGH data using a six state mixture model as well as several biological concepts that are ignored by existing algorithms. Visualization of profiles is also provided.

http://www.bioconductor.org/packages/bioc/html/CGHcall.html

DNAcopy (R/BioC package)

DNA copy number data analysis
Segments DNA copy number data using circular binary segmentation to detect regions with abnormal copy number

Looking at the sample data

> library(DNAcopy) 
> data(coriell) 
> is(coriell)
[1] "data.frame" "list"       "oldClass"   "vector"    
> head(coriell)
        Clone Chromosome Position Coriell.05296 Coriell.13330
1  GS1-232B23          1        0            NA      0.207470
2  RP11-82d16          1      468      0.008824      0.063076
3  RP11-62m23          1     2241     -0.000890      0.123881
4  RP11-60j11          1     4504      0.075875      0.154343
5 RP11-111O05          1     5440      0.017303     -0.043890
6  RP11-51b04          1     7000     -0.006770      0.094144
> 

Coriell.05296 and Coriell.13330 are sample names. It looks like the data need to be arranged as a data.frame in this layout.
http://www.bioconductor.org/packages/bioc/html/DNAcopy.html
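
Given that layout, running circular binary segmentation on one of the Coriell samples should look roughly like the following. This is a minimal sketch that assumes DNAcopy is installed from Bioconductor. The sample ID "c05296" and the variable names are my own choices, and I leave all segmentation parameters at their defaults.

```r
library(DNAcopy)   # Bioconductor package, assumed to be installed
data(coriell)

# Build a CNA object from one sample column plus chromosome and position.
cna <- CNA(genomdat  = cbind(coriell$Coriell.05296),
           chrom     = coriell$Chromosome,
           maploc    = coriell$Position,
           data.type = "logratio",
           sampleid  = "c05296")   # label chosen here for illustration

# Smooth single-point outliers, then run circular binary segmentation.
cna.smoothed <- smooth.CNA(cna)
seg <- segment(cna.smoothed, verbose = 0)

# One row per segment: sample ID, chromosome, start, end,
# number of markers, and mean log ratio.
head(seg$output)
```

To use your own array, it should be enough to build a data.frame with the same kind of columns (chromosome, position, one log ratio column per sample) and pass those columns to CNA() in the same way.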

aCGH (R/BioC package)

Depends on cluster, survival, multtest, sma

Classes and functions for Array Comparative Genomic Hybridization data.
Functions for reading aCGH data from image analysis output files and clone information files, creation of aCGH S3 objects for storing these data. Basic methods for accessing/replacing, subsetting, printing and plotting aCGH objects.

http://www.bioconductor.org/packages/bioc/html/aCGH.html

snapCGH (R/BioC package)

Depends on limma, tilingArray, DNAcopy, GLAD, cluster, methods, aCGH

Segmentation, normalisation and processing of aCGH data.
Methods for segmenting, normalising and processing aCGH data; including plotting functions for visualising raw and segmented data for individual and multiple arrays.

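This is the one that was recommended on the BioC mailing list (n=1). It integrates several other R/BioC packages.
It is also nice that read.maimages() can read files containing the spot fluorescence intensities.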

> library(snapCGH)
> datadir <- system.file("testdata", package = "snapCGH")
> targets <- readTargets("targets.txt", path = datadir)
> RG1 <- read.maimages(targets$FileName, path = datadir, source = "genepix")

http://www.bioconductor.org/packages/bioc/html/snapCGH.html

SMAP (R/BioC package)

A Segmental Maximum A Posteriori Approach to Array-CGH Copy Number Profiling

Functions and classes for DNA copy number profiling of array-CGH data

http://www.bioconductor.org/packages/bioc/html/SMAP.html

MANOR (R/BioC package)

CGH Micro-Array NORmalization

We propose importation, normalization, visualization, and quality control functions to correct identified sources of variability in array-CGH experiments.

GLAD (R/BioC package)

Gain and Loss Analysis of DNA

Analysis of array CGH data : detection of breakpoints in genomic profiles and assignment of a status (gain, normal or lost) to each chromosomal regions identified.

http://www.bioconductor.org/packages/bioc/html/GLAD.html

ADaCGH (R/BioC package)

Analysis of data from aCGH experiments

Analysis and plotting of array CGH data. Allows usage of Circular Binary Segmentation, wavelet-based smoothing, ACE method (CGH Explorer), HMM, BioHMM, GLAD, CGHseg, and Price's modification of Smith & Waterman's algorithm. Most computations are parallelized. Figures are imagemaps with links to IDClight (http://idclight.bioinfo.cnio.es).

http://cran.r-project.org/web/packages/ADaCGH/index.html
WebApp version
http://adacgh2.bioinfo.cnio.es/

cgh (R)

Microarray CGH analysis using the Smith-Waterman algorithm
Functions to analyze microarray comparative genome hybridization data using the Smith-Waterman algorithm
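
As far as I understand it, applying Smith-Waterman to CGH data means subtracting a threshold from the log ratios so that normal probes score negative, then looking for the contiguous run of probes ("island") with the highest summed score. Below is a base R sketch of that idea only. It does not use the cgh package's own functions. best_island(), the threshold of 0.3, and the toy data are all made up for illustration.

```r
# Find the contiguous run of probes with the highest summed score,
# after subtracting a threshold so that "normal" probes score below zero.
best_island <- function(logratio, threshold) {
  score <- logratio - threshold
  running   <- 0   # score of the best run ending at the current probe
  run_start <- 1   # index where that run begins
  best <- list(start = NA_integer_, end = NA_integer_, score = 0)
  for (i in seq_along(score)) {
    if (running <= 0) {
      # Restart the run here, like the reset to zero in Smith-Waterman.
      running   <- score[i]
      run_start <- i
    } else {
      running <- running + score[i]
    }
    if (running > best$score) {
      best <- list(start = run_start, end = i, score = running)
    }
  }
  best
}

# Deterministic toy profile: 100 probes near zero, with probes 41-55 gained.
x <- rep(c(0.05, -0.05), 50)
x[41:55] <- x[41:55] + 0.6
island <- best_island(x, threshold = 0.3)
island$start   # 41
island$end     # 55
```

The threshold matters a lot: set it too low and the island swallows neighboring normal probes, too high and short gains get split or missed.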

CNVFinder (Perl)

The CNVFinder algorithm has been designed to detect copy number variants (CNVs) in human population from large-insert clone DNA microarray covering the entire human genome in tiling path resolution (WGTP platform).

http://www.sanger.ac.uk/humgen/cnv/software/

CAPWeb

Doesn't look like it can be installed locally.

CAPweb is a web tool devoted to the analysis of copy number microarray data. CAPweb starts from image analysis results (a gpr file for example) and goes up to biological results. Different formats are supported by CAPweb:
MAIA - GENEPIX - SPOT - IMAGENE - AGILENT - Affymetrix SNP 100k/500k.

http://bioinfo-out.curie.fr/CAPweb/

CGHScan (Java Swing)

CGHScan analyzes comparative genomic hybridization data to delineate the boundaries of deleted or divergent regions in a particular genome as compared to a reference genome. For more information on how the program works, consult the documentation.

http://gel.ahabs.wisc.edu/cghscan/

CGH-Plotter (MATLAB)

MATLAB Toolbox for CGH-data Analysis

The CGH-Plotter is a MATLAB toolbox with a graphical user interface for comparative genomic hybridization (CGH) data analysis. The CGH-Plotter identifies putative groups of genes whose copy-number is deleted or amplified using k-means clustering and dynamic programming. The CGH-Plotter allows also representative illustrations of CGH-data. The CGH-Plotter is platform independent and requires MATLAB 6.1 or higher in order to operate.

http://bioinformatics.oxfordjournals.org/cgi/screenpdf/19/13/1714
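
CGH-Plotter itself is MATLAB, but the k-means part of the idea is easy to try in R, which I'm using for everything else here. The sketch below clusters log ratios into three groups and labels them loss / normal / gain by the order of the cluster centers. It leaves out the dynamic programming step entirely, and the toy data, the starting centers, and the variable names are all assumptions for illustration.

```r
# Hypothetical log2 ratios for 100 probes along one chromosome:
# a gained stretch (probes 31-40) and a lost stretch (probes 71-80).
logratio <- 0.03 * sin(seq_len(100))   # small deterministic wobble around zero
logratio[31:40] <- logratio[31:40] + 0.6
logratio[71:80] <- logratio[71:80] - 0.7

# Cluster into three groups, starting from the minimum, median and maximum
# so that the result does not depend on random initialization.
init <- c(min(logratio), median(logratio), max(logratio))
km <- kmeans(logratio, centers = init)

# Name clusters by the order of their centers: lowest = loss, highest = gain.
labels <- c("loss", "normal", "gain")
state <- labels[rank(km$centers[, 1])][km$cluster]
table(state)
```

On real data the three clusters will overlap, which is presumably why the original tool follows the clustering with a dynamic programming step rather than trusting per-probe labels.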

CGHweb (R/BioC package)

Related packages include waveslim, quantreg, snapCGH, cghFLasso, FASeg, GLAD, GDD, gplots
http://compbio.med.harvard.edu/CGHweb/

CNIT

Affy GeneChip

What is CNIT program?
Copy number inferring tool (CNIT) is designed for Affymetrix GeneChip to analyze copy number of each SNP allele. CNIT can be applicable in chromosome-abnormal disease, cancer and copy number variation studies, and can provide accurate CN estimations with low false-positive rate.

http://140.109.41.16/EAG/Program%20list/CNIT.htm

CNVtools (R/BioC package)

A package for case-control association tests.

CNVtools is an R package for performing robust case control and quantitative trait association analyses of Copy Number Variants.
The package implements a robust association framework by unifying genotyping and association testing into a single model. This is done by incorporating a disease model, which is either a logistic regression disease model for a dichotomous disease variable or a standard regression for a quantitative trait, into the mixture model for the signal. Association is assessed via a likelihood ratio test. The procedure is assay/platform independent and can be applied whenever there is a univariate diploid copy number eg SNP genotyping assays (R coordinate), Array-CGH or quantitative PCR.

http://cnv-tools.sourceforge.net/CNVtools.html
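
For reference, this is what a plain likelihood ratio test with a logistic regression disease model looks like in base R. Note that this is a deliberately simplified sketch: it takes copy numbers that have already been called, whereas the whole point of CNVtools is to fit the calling and the association in a single mixture model. It does not use the CNVtools API, and the data below are made up.

```r
# Hypothetical data: integer copy number calls and case/control status.
cn     <- c(rep(1, 20), rep(2, 60), rep(3, 20),   # 100 controls
            rep(1, 8),  rep(2, 62), rep(3, 30))   # 100 cases
status <- c(rep(0, 100), rep(1, 100))             # 0 = control, 1 = case

# Null model (no copy number effect) versus a model with a linear
# copy number term on the log-odds scale.
null_fit <- glm(status ~ 1,  family = binomial)
alt_fit  <- glm(status ~ cn, family = binomial)

# Likelihood ratio statistic: twice the gain in log-likelihood,
# compared against a chi-squared distribution with 1 degree of freedom.
lrt_stat <- as.numeric(2 * (logLik(alt_fit) - logLik(null_fit)))
p_value  <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
c(statistic = lrt_stat, p.value = p_value)
```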

PennCNV

Illumina and Affymetrix arrays (though other arrays can also be processed with a suitably prepared file format)

PennCNV is a free software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays. Currently it can handle signal intensity data from Illumina and Affymetrix arrays. With appropriate preparation of file format, it can also handle other types of SNP arrays and oligonucleotide arrays.

http://www.openbioinformatics.org/penncnv/
# Preparing signal intensity files from other types of arrays (Agilent, Nimblegen, Affy, etc)
http://www.openbioinformatics.org/penncnv/penncnv_input.html#_Toc214852007

QuantiSNP

As the name suggests it is SNP-based, so Illumina and Affy

QuantiSNP is an analytical tool for the analysis of copy number variation using whole genome SNP genotyping data. In its first implementation it was developed for data arising from Illumina platforms and this is fully described in Colella and Yau et al., 2007. At present we are in the process of further developing QuantiSNP with a particular interest in adapting the algorithm for cancer sample analysis.

http://www.well.ox.ac.uk/QuantiSNP/

ISACGH

WebApp

Merging DNA copy number and gene expression to the analysis of Array CGH

http://gepas3.bioinfo.cipf.es/cgi-bin/tutoXX?c=/isacgh/isacgh.config

InSilicoArray CGH

WebApp
http://isacgh.bioinfo.cipf.es/

Vendor and third-party software (commercial)

Copy Number Analysis Module (CNAM)

http://goldenhelix.com/SNP_Variation/CNAM/index.html

NimbleScan

DNACopy and segMNT algorithms are available in NimbleScan software to analyze CGH data.

Partek GS for Copy Number Data

Illumina, GeneChip compatible.
http://www.partek.com/partekgs_copynumber

CGH Analytics Software

The Agilent CGH Analytics 3.4 software provides an intuitive user interface for visually exploring, detecting and analyzing aberration patterns from multiple Comparative Genomic Hybridization (CGH) microarray profiles. It accepts data output from Agilent Feature Extraction software and displays chromosomal deletions and amplifications at multiple zoom levels simultaneously. Take advantage of the new joint analysis module to detect changes in both copy number and gene expression from experiments that have aCGH and gene expression data available.

http://www.chem.agilent.com/en-US/products/instruments/dnamicroarrays/cghanalyticssoftware/pages/default.aspx


It turns out there are kind people out there...

I found a similar list...
http://www.nslij-genetics.org/cnv/programs.html

*1: It might be useful to someone, after all

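*2: I still don't really know whether this is because psychiatric disorders are especially tied to structural abnormalities (i.e. genomic disorders and mental disorders are closely related), or simply because GWAS has been done and CNV is the next thing to look at. Another way to see it: by the nature of the arrays only common SNPs can be detected, but by targeting CNVs researchers have succeeded in capturing rare variants.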