plink自分用メモ
ちょっと使って見ているけど、機能多すぎ。なので自分用メモ。本家のドキュメントを引用しつつ自分に優しくいじっているので要注意。
とりあえず adaptive permutation(ダメっぽいものはさっさと見切りをつけてしまう)が速くて良いなぁ。
ツールとしての概要が知りたい方は、id:ryamada22氏のコンテンツがあるのでそちらを参照して下さい。
http://pngu.mgh.harvard.edu/~purcell/plink/
- コマンドライン引数
http://pngu.mgh.harvard.edu/~purcell/plink/reference.shtml#options
- PEDファイルフォーマト
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped
Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype (-9 missing; 0 missing; 1 unaffected; 2 affected)
#で始めるとコメント扱いできるらしい。
- MAPファイルフォーマット
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#map
chromosome (1-22, X, Y or 0 if unplaced)
rs# or snp identifier
Genetic distance (morgans)
Base-pair position (bp units)
コマンドライン引数に--map3をつけるとMAPファイルはカラム3つとして認識されることになる。
plink --file mydata --map3
この場合は、以下のカラムと受け取られるので注意。
In this case, the three columns are expected to be
chromosome (1-22, X, Y or 0 if unplaced)
rs# or snp identifier
Base-pair position (bp units)
2008/11/01の時点で X/Y/MTはそれぞれ23/24/25と結果には出力されるようになったようだ (GWS6に対応のためかな)。
PED ファイルから適当に生成するスクリプトでとりあえずラン出来るようにしちゃった。
- (使ってないから分からない)covariate ファイル
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#covar
logistic とかも出来るようだけど、covariate はどう与えるのか?試したくなったら調べる。
- (使ってないけど便利そう)Remove a subset of individuals
http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml#remove
plink --file data --remove mylist.txt
- (便利)HWE のまとめ算出
http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#hardy
plink --file data --hardy
- (便利)MAF とかのまとめ算出
http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#freq
plink --file data --freq
- (便利)サンプル当たりの call rate でフィルター
http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#miss2
plink --file mydata --mind 0.1
ちなみにデフォルトは「Less than 0.10 Missing rate per individual --mind」だそうな。
ちなみにデフォルトは「Less than 0.10 Missing rate per SNP --geno」だそうな。
- (便利)MAF でフィルター
http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#maf
plink --file mydata --maf 0.05
ちなみにデフォルトは「Greater than 0.01 Minor allele frequency --maf」だそうな。
- (便利)SNP 当たりの call rate でフィルター
http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#miss1
plink --file mydata --geno 1
- (便利)HWE でフィルター
http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#hwd
plink --file mydata --hwe 0.001
- (便利)特定の SNP をフィルターする
plink --file plink_test_input03 --assoc --noweb --map3 --exclude chr.X.list.txt > removed.hoge
chr.X.list.txt には除きたい SNP の identifier がリストされている。
genotypeallele count と phenotype のクロステーブルでカイ二乗検定
http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#cc
plink --file mydata --assoc
which generates a file
plink.assoc
which contains the fields:
CHR Chromosome
SNP SNP ID
BP Physical position (base-pair)
A1 Minor allele name (based on whole sample)
F_A Frequency of this allele in cases
F_U Frequency of this allele in controls
A2 Major allele name
CHISQ Basic allelic test chi-square (1df)
P Asymptotic p-value for this test
OR Estimated odds ratio (for A1)
- Alternate / full model association tests
Cochran-Armitage trend test
Genotypic (2 df) test
Dominant gene action (1df) test
Recessive gene action (1df) test
http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#model
plink --file mydata --model
- Linear and logistic models
http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#glm
plink --bfile mydaya --logistic
depending on the phenotype/command used. The basic format is:
CHR Chromosome
SNP SNP identifier
TEST Code for the test (see below)
NMISS Number of non-missing individuals included in analysis
BETA Regression coefficient (--linear) or odds ratio (--logistic)
STAT Coefficient t-statistic
P Asymptotic p-value for t-statistic
The TEST column is by default ADD meaning the additive effects of allele dosage. Adding the option
--genotypic
will generate file which will have two extra tests per SNP, corresponding to two extra rows: DOMDEV and GENO_2DF which represent a separate test of the dominance component or a 2 df joint test of both additive and dominance (i.e. corresponding the the general, genotypic model in the --model command). Unlike the dominance model is the --model, DOMDEV refers to a variable coded 0,1,0 for the three genotypes AA,Aa,aa, i.e. representing the dominance deviation from additivity, rather specifying that a particular allele is dominant or recessive. That is, the DOMDEV term is fitted jointly with the ADD term in a single model.
NOTE! The coding PLINK uses with the 2 df --genotypic model involves two variables representing an additive effect and a dominance deviation;
A D
AA 0 0
AB 1 1
BB 2 0Although the 2df test will be identical, you would not expect to see similar p-values, etc for the two individual terms if instead you used a different version of "genotypic" coding, e.g. in another analysis package, such as using dummy variables to represent genotypes:
G1 G2
AA 0 0
AB 1 0
BB 0 1That is, although fundamentally the same, in terms of the 2df test, the interpretation of the two individual terms is different in these two cases. To achieve this coding in PLINK (v1.02 onwards), add the --hethom flag as well as --genotypic.
In a related note, you would not always expect the ADD p-value to be the same when entering in the dominance term as it is without it; if in doubt, you are advised to stick to just interpreting the 2 df test if using the --genotypic option.
To specify a model assuming full dominance (or recessive) for the minor allele (i.e. rather than the 2 df model mentioned above), you can specify with either
--dominant
or
--recessive
後ろの方に --covar の例がある
- Adjustment for multiple testing: Bonferroni, Sidak, FDR, etc
http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#adjust
plink --file mydata --assoc --adjust
- Permutation いろいろ
http://pngu.mgh.harvard.edu/~purcell/plink/perm.shtml
- SNP annotation を取ってくる
http://pngu.mgh.harvard.edu/~purcell/plink/psnp.shtml
plink --lookup rs6622
-
- SNP simulation routine
http://pngu.mgh.harvard.edu/~purcell/plink/simulate.shtml
http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml
- テストデータの作成
plink --simulate wgas.sim --recode --out sim1 --simulate-ncases 200 --simulate-ncontrols 250 --noweb plink --file sim1.recode --assoc --noweb --adjust