6 April 2007 - New UCSC Gene Prediction Set Released
We are pleased to announce the release of a new gene prediction set, UCSC Genes, on the latest human Genome Browser (hg18, NCBI Build 36). This annotation, which includes putative noncoding genes as well as protein-coding genes and 99.9% of RefSeq genes, is the next generation of the Known Genes set that UCSC has been providing for the past several years. It supersedes the existing Known Genes annotation on the hg18 assembly.
UCSC Genes is a moderately conservative prediction set based on data from RefSeq, GenBank, and UniProt. Each entry requires the support of one GenBank RNA sequence plus at least one additional line of evidence, with the exception of RefSeq RNAs, which require no additional evidence. Some of the noncoding transcripts in the set may actually code for protein, but the evidence for the associated protein is weak at best. Compared to RefSeq, this gene set has about 10% more protein-coding genes, approximately five times as many putative noncoding genes, and about twice as many splice variants.
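The evidence requirement described above can be written as a small predicate. This is only an illustrative sketch of the stated rule, not the actual pipeline code: the `Candidate` record, its field names, and the function `passes_evidence_rule` are hypothetical and do not correspond to anything in the UCSC source tree.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record describing the support for one candidate transcript."""
    is_refseq_rna: bool         # the transcript is a RefSeq RNA
    genbank_rna_count: int      # supporting GenBank RNA sequences
    extra_evidence_count: int   # additional independent lines of evidence


def passes_evidence_rule(c: Candidate) -> bool:
    """Apply the inclusion rule as stated in the announcement."""
    # RefSeq RNAs are accepted without any additional evidence.
    if c.is_refseq_rna:
        return True
    # Everything else needs a GenBank RNA plus at least one more line of evidence.
    return c.genbank_rna_count >= 1 and c.extra_evidence_count >= 1
```

Under this reading, a transcript supported by a single GenBank RNA and nothing else would be rejected, while the same transcript with one further line of evidence would be kept.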
The UCSC Genes set is produced using a computational pipeline developed at UCSC by Jim Kent, Chuck Sugnet and Mark Diekhans. For detailed information about the process used to construct the gene set, see the track description page. In the coming months, we plan to release UCSC Genes sets on several organisms in addition to human.
As part of this change, we are now using our own UCSC Genes accession numbers as the primary key into the underlying knownGene table, rather than the GenBank mRNA accessions we used in the previous Known Genes prediction set. Note that this may affect external sites with URLs that link into our genes track using the older-style accessions.
We will continue to provide the older Known Genes track on hg18 under the name "Old Known Genes". You may find the following tables useful in referencing the older gene set and converting between the two sets:
* knownGeneOld2: new name for table underlying the old Known Genes (previously called knownGene)
* kgXrefOld2: new name for table that contains data for converting old Known Genes IDs to other IDs (previously called kgXref)
* kg2ToKg3: data for converting old Known Genes IDs to the newer UCSC Genes IDs
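For sites that still hold older-style accessions, the conversion could be scripted roughly as follows. This is a minimal sketch, assuming the kg2ToKg3 table has been exported to a tab-separated file. The positions of the old-ID and new-ID columns are not stated in this announcement, so they are passed in as parameters; the function names `load_id_map` and `convert_ids` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def load_id_map(tsv: TextIO, old_col: int, new_col: int) -> Dict[str, List[str]]:
    """Build an old-ID -> new-ID(s) lookup from a tab-separated table dump.

    old_col and new_col are zero-based column indexes supplied by the caller
    (an assumption: check the table schema before choosing them).
    """
    mapping: Dict[str, List[str]] = {}
    for row in csv.reader(tsv, delimiter="\t"):
        # Skip blank lines, comment lines, and rows too short to hold both columns.
        if not row or row[0].startswith("#") or len(row) <= max(old_col, new_col):
            continue
        old_id, new_id = row[old_col], row[new_col]
        # An old ID may have no counterpart in the new set; leave it unmapped.
        if new_id:
            mapping.setdefault(old_id, []).append(new_id)
    return mapping


def convert_ids(old_ids: Iterable[str], mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the new IDs for each old ID; an empty list means no match was found."""
    return {old_id: mapping.get(old_id, []) for old_id in old_ids}
```

Returning a list per old ID, rather than a single value, leaves room for old entries that map to several new ones or to none, so that a caller updating stored URLs can decide how to handle each case.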
We'd like to acknowledge the many people affiliated with the UCSC Genome Bioinformatics group who worked hard to release this new annotation: developers Jim Kent, Mark Diekhans, and Fan Hsu (with technical support from several other engineers in the group); David Haussler; our splendid QA team -- Archana Thakkapallayil, Ann Zweig, Robert Kuhn, Kayla Smith, and Brooke Rhead; our build engineer -- Andy Pohl; and our sysadmin group. We'd also like to thank Chuck Sugnet for his input, the people and organizations maintaining the RefSeq, UniProt, and GenBank databases, and the scientists worldwide who have contributed to them.
If you have any questions about this new release, feel free to contact us at firstname.lastname@example.org (general questions) or email@example.com (mirror-specific questions).