For now, adaptive permutation (which quickly gives up on SNPs that look unpromising) is nice and fast.
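To make the "give up early" idea concrete, here is a minimal Python sketch. It is not PLINK's actual algorithm: the test statistic (absolute difference in group means), the stopping rule (a rough normal-approximation interval on the running p-value) and all parameter names are assumptions made for illustration.

```python
import random

def mean_diff(values, labels):
    """Absolute difference between the case (1) and control (0) means."""
    cases = [v for v, l in zip(values, labels) if l == 1]
    controls = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def adaptive_permutation(values, labels, min_perm=20, max_perm=2000,
                         alpha=0.05, z=3.0, seed=0):
    """Permute labels, stopping early once the p-value is clearly above alpha.

    Returns (empirical p-value, number of permutations actually run).
    All defaults here are hypothetical, not PLINK's.
    """
    rng = random.Random(seed)
    observed = mean_diff(values, labels)
    shuffled = list(labels)
    hits = 0
    n = 0
    for n in range(1, max_perm + 1):
        rng.shuffle(shuffled)
        if mean_diff(values, shuffled) >= observed:
            hits += 1
        if n >= min_perm:
            p_hat = (hits + 1) / (n + 1)
            half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
            # Stop when even the lower bound of the interval exceeds alpha.
            if p_hat - half_width > alpha:
                break
    return (hits + 1) / (n + 1), n

# A SNP with no signal is dropped after the minimum number of permutations.
null_values = [0, 1] * 20
null_labels = [0, 0, 1, 1] * 10
print(adaptive_permutation(null_values, null_labels))

# A perfectly separated SNP keeps being permuted up to max_perm.
strong_values = [0] * 20 + [1] * 20
strong_labels = [0] * 20 + [1] * 20
print(adaptive_permutation(strong_values, strong_labels))
```

The saving comes from the first case: most SNPs in a scan look like that, so most of them cost only a handful of permutations.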

  • PED file format

Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype (-9 missing; 0 missing; 1 unaffected; 2 affected)
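As a quick check of the six leading columns listed above, a small Python sketch that parses one PED line. The function and class names are made up for illustration, and the sketch treats any phenotype other than 1 or 2 as missing, which ignores quantitative traits.

```python
from typing import NamedTuple

SEX_LABELS = {"1": "male", "2": "female"}            # anything else: unknown
PHENO_LABELS = {"1": "unaffected", "2": "affected"}  # -9 and 0: missing

class PedHeader(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str

def parse_ped_header(line: str) -> PedHeader:
    """Parse the six leading columns of a whitespace-delimited PED line."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a PED line needs at least six columns")
    fid, iid, pat, mat, sex, pheno = fields[:6]
    return PedHeader(fid, iid, pat, mat,
                     SEX_LABELS.get(sex, "unknown"),
                     PHENO_LABELS.get(pheno, "missing"))

# Hypothetical example line: six header columns, then two genotypes.
print(parse_ped_header("FAM1 IND1 0 0 1 2 A A G T"))
```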


  • MAP file format

chromosome (1-22, X, Y or 0 if unplaced)
rs# or snp identifier
Genetic distance (morgans)
Base-pair position (bp units)


plink --file mydata --map3


With --map3, the genetic distance column is omitted, so the three columns are expected to be
chromosome (1-22, X, Y or 0 if unplaced)
rs# or snp identifier
Base-pair position (bp units)
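A minimal Python sketch that reads a MAP line in either layout, assuming whitespace-delimited columns. The function name and the returned dictionary keys are hypothetical.

```python
def parse_map_line(line, map3=False):
    """Parse one MAP line; with map3=True the genetic distance column is absent."""
    fields = line.split()
    expected = 3 if map3 else 4
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, got {len(fields)}")
    if map3:
        chrom, snp, bp = fields
        morgans = None  # not present in the 3-column layout
    else:
        chrom, snp, morgans, bp = fields
        morgans = float(morgans)
    return {"chrom": chrom, "snp": snp, "morgans": morgans, "bp": int(bp)}

print(parse_map_line("1 rs123 0 1000"))
print(parse_map_line("X rs456 2000", map3=True))
```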

As of 2008/11/01, the results seem to report X/Y/XY/MT as 23/24/25/26 respectively (perhaps to support GWS6).
For now I got things running with a quick script that roughly generates it from the PED file.

  • (haven't used it, so not sure) covariate file
Logistic regression and the like seem to be possible, but how are covariates supplied? I'll look into it when I want to try it.

  • (haven't used it, but looks handy) Remove a subset of individuals

plink --file data --remove mylist.txt
  • (handy) Summary HWE statistics

plink --file data --hardy
  • (handy) Summary of MAF and the like

plink --file data --freq
  • (handy) Filter by per-sample call rate

plink --file mydata --mind 0.1

Incidentally, the default is apparently "Less than 0.10 Missing rate per individual --mind".

Likewise, the default for the per-SNP filter is apparently "Less than 0.10 Missing rate per SNP --geno".
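To see what the two missing-rate thresholds measure, a small Python sketch on toy data. It assumes missing genotypes are written as "0 0"; the data layout and function name are made up for illustration.

```python
MISSING = "0 0"  # assumption: a missing genotype is two 0 alleles

def missing_rates(genotypes):
    """genotypes: dict of sample ID -> list of genotype strings, one per SNP.

    Returns (per-sample missing rate dict, per-SNP missing rate list).
    """
    n_samples = len(genotypes)
    n_snps = len(next(iter(genotypes.values())))
    per_sample = {s: sum(g == MISSING for g in calls) / n_snps
                  for s, calls in genotypes.items()}
    per_snp = [sum(calls[j] == MISSING for calls in genotypes.values()) / n_samples
               for j in range(n_snps)]
    return per_sample, per_snp

toy = {
    "IND1": ["A A", "G T", "0 0", "C C"],
    "IND2": ["A G", "0 0", "0 0", "C T"],
    "IND3": ["A A", "G G", "T T", "C C"],
}
per_sample, per_snp = missing_rates(toy)

# Keep samples / SNPs whose missing rate does not exceed 0.1.
keep_samples = [s for s, r in per_sample.items() if r <= 0.1]
keep_snps = [j for j, r in enumerate(per_snp) if r <= 0.1]
print(keep_samples, keep_snps)
```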

  • (handy) Filter by MAF

plink --file mydata --maf 0.05

Incidentally, the default is apparently "Greater than 0.01 Minor allele frequency --maf".
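For reference, a minimal Python sketch of how a minor allele frequency can be computed from genotype calls. It assumes biallelic SNPs and "0" as the missing allele code; the function name is hypothetical.

```python
from collections import Counter

def minor_allele_frequency(genotypes, missing="0"):
    """Return the MAF for one SNP, or None if every call is missing."""
    counts = Counter(a for g in genotypes for a in g.split() if a != missing)
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) == 1:
        return 0.0  # monomorphic: no minor allele observed
    return min(counts.values()) / total

calls = ["A A", "A G", "G G", "A A", "0 0"]
maf = minor_allele_frequency(calls)
print(maf, "kept" if maf >= 0.05 else "filtered out")
```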

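  • (handy) Filter by per-SNP call rate

plink --file mydata --geno 1

Note that --geno takes the maximum missing rate allowed per SNP, so a value of 1 keeps every SNP; a value such as 0.1 is needed to actually drop anything.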
  • (handy) Filter by HWE

plink --file mydata --hwe 0.001
  • (handy) Filter out specific SNPs

plink --file plink_test_input03 --assoc --noweb --map3 --exclude chr.X.list.txt > removed.hoge

chr.X.list.txt lists the identifiers of the SNPs to exclude.

  • Basic case/control association test

plink --file mydata --assoc

which generates an output file containing the fields:

CHR Chromosome
SNP SNP identifier
BP Physical position (base-pair)
A1 Minor allele name (based on whole sample)
F_A Frequency of this allele in cases
F_U Frequency of this allele in controls
A2 Major allele name
CHISQ Basic allelic test chi-square (1df)
P Asymptotic p-value for this test
OR Estimated odds ratio (for A1)
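To connect the fields above to the underlying arithmetic, a minimal Python sketch of a 1 df allelic chi-square test on a 2x2 table of allele counts. This is the textbook Pearson test without continuity correction, written under the assumption that it matches what the CHISQ, P and OR columns report; the function name and the counts are made up.

```python
import math

def allelic_test(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Allele-count 2x2 test: rows are cases/controls, columns are A1/A2.

    Assumes all four margins are non-zero (otherwise this divides by zero).
    """
    n = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    n_case = case_a1 + case_a2
    n_ctrl = ctrl_a1 + ctrl_a2
    n_a1 = case_a1 + ctrl_a1
    n_a2 = case_a2 + ctrl_a2
    chisq = (n * (case_a1 * ctrl_a2 - case_a2 * ctrl_a1) ** 2
             / (n_case * n_ctrl * n_a1 * n_a2))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chisq / 2))
    return {
        "F_A": case_a1 / n_case,
        "F_U": ctrl_a1 / n_ctrl,
        "CHISQ": chisq,
        "P": p,
        "OR": (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1),
    }

# Hypothetical counts: 50 cases and 50 controls, i.e. 100 alleles per group.
print(allelic_test(30, 70, 20, 80))
```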

  • Alternate / full model association tests

Cochran-Armitage trend test
Genotypic (2 df) test
Dominant gene action (1df) test
Recessive gene action (1df) test

plink --file mydata --model
  • Linear and logistic models

plink --bfile mydata --logistic

The output file that is generated depends on the phenotype/command used. The basic format is:

CHR Chromosome
SNP SNP identifier
TEST Code for the test (see below)
NMISS Number of non-missing individuals included in analysis
BETA Regression coefficient (--linear) or odds ratio (--logistic)
STAT Coefficient t-statistic
P Asymptotic p-value for t-statistic

The TEST column is by default ADD, meaning the additive effects of allele dosage. Adding the option --genotypic will generate a file with two extra tests per SNP, corresponding to two extra rows: DOMDEV and GENO_2DF, which represent a separate test of the dominance component and a 2 df joint test of both additive and dominance (i.e. corresponding to the general, genotypic model in the --model command). Unlike the dominant model in --model, DOMDEV refers to a variable coded 0,1,0 for the three genotypes AA,Aa,aa, i.e. representing the dominance deviation from additivity, rather than specifying that a particular allele is dominant or recessive. That is, the DOMDEV term is fitted jointly with the ADD term in a single model.

NOTE! The coding PLINK uses with the 2 df --genotypic model involves two variables representing an additive effect and a dominance deviation:

ADD DOMDEV
AA 0 0
AB 1 1
BB 2 0

Although the 2df test will be identical, you would not expect to see similar p-values, etc for the two individual terms if instead you used a different version of "genotypic" coding, e.g. in another analysis package, such as using dummy variables to represent genotypes:

G1 G2
AA 0 0
AB 1 0
BB 0 1

That is, although fundamentally the same, in terms of the 2df test, the interpretation of the two individual terms is different in these two cases. To achieve this coding in PLINK (v1.02 onwards), add the --hethom flag as well as --genotypic.
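The two codings are easy to mix up, so here is a small Python sketch that reproduces both tables side by side. The function names are hypothetical, and B is taken as the counted allele, as in the tables above.

```python
def additive_dominance_coding(genotype):
    """Return (ADD, DOMDEV): B-allele count and a heterozygote indicator."""
    n_b = genotype.count("B")
    return (n_b, 1 if n_b == 1 else 0)

def hethom_coding(genotype):
    """Return (G1, G2): dummy variables for heterozygote and BB homozygote."""
    n_b = genotype.count("B")
    return (1 if n_b == 1 else 0, 1 if n_b == 2 else 0)

for g in ("AA", "AB", "BB"):
    print(g, additive_dominance_coding(g), hethom_coding(g))
```

Both codings span the same three genotype groups, which is why the 2 df test agrees while the individual terms do not mean the same thing.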

In a related note, you would not always expect the ADD p-value to be the same when entering in the dominance term as it is without it; if in doubt, you are advised to stick to just interpreting the 2 df test if using the --genotypic option.

PLINK also provides separate options for a model assuming full dominance (or recessiveness) for the minor allele (i.e. rather than the 2 df model mentioned above).

There is a --covar example further down.

  • Adjustment for multiple testing: Bonferroni, Sidak, FDR, etc

plink --file mydata --assoc --adjust
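As a reminder of what these corrections do, a minimal Python sketch of three of them: Bonferroni, single-step Sidak, and Benjamini-Hochberg FDR. These are the textbook formulas, not a reimplementation of the --adjust output; the function names and p-values are made up.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def sidak(pvals):
    """Single-step Sidak: 1 - (1 - p)^m."""
    m = len(pvals)
    return [1 - (1 - p) ** m for p in pvals]

def benjamini_hochberg(pvals):
    """Step-up FDR adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(pvals))
print(sidak(pvals))
print(benjamini_hochberg(pvals))
```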
  • Various permutation procedures

  • Fetch SNP annotation

plink --lookup rs6622
    • SNP simulation routine

  • Creating test data
plink --simulate wgas.sim --recode --out sim1 --simulate-ncases 200 --simulate-ncontrols 250 --noweb
plink --file sim1.recode --assoc --noweb --adjust