Title: | Quick Access to Homologene and Gene Annotation Updates |
---|---|
Description: | A wrapper for the homologene database by the National Center for Biotechnology Information ('NCBI'). It allows searching for gene homologs across species. Data in this package can be found at <ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build68/>. The package also includes an updated version of the homologene database where gene identifiers and symbols are replaced with their latest (at the time of submission) version and functions to fetch latest annotation data to keep updated. |
Authors: | Ogan Mancarci [aut, cre], Leon French [ctb] |
Maintainer: | Ogan Mancarci <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.7.68.23.10.31 |
Built: | 2024-12-26 03:12:02 UTC |
Source: | https://github.com/oganm/homologene |
Given a query gene list and a target gene list, the function
tries to find the homology pairing that matches the query list to the target list. The query list
is a short list of genes, while the target list is supposed to represent a large number of genes from the target
species. By default the output is the largest possible list. If returnAllPossible = TRUE,
all possible pairings with any matches are returned. It is possible to limit the
search by setting possibleOrigins and possibleTargets. Note that the gene symbols of some species
are more similar to each other than others. Using this function with small gene lists and without providing any
possibleOrigins or possibleTargets might return multiple hits, or if returnAllPossible = TRUE
a wrong match can be returned.
autoTranslate( genes, targetGenes, possibleOrigins = NULL, possibleTargets = NULL, returnAllPossible = FALSE, db = homologene::homologeneData )
genes | A list of genes to match the target. Symbols or NCBI ids |
targetGenes | The target list. This list is supposed to represent a large number of genes from the target species. |
possibleOrigins | Taxonomic identifiers of possible origin species |
possibleTargets | Taxonomic identifiers of possible target species |
returnAllPossible | if TRUE returns all possible pairings with non zero gene matches. If FALSE (default) returns the best match |
db | Homologene database to use. |
A data frame if returnAllPossible = FALSE and a list of data frames if TRUE.
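The selection idea described for autoTranslate can be sketched in base R on a toy table. This is an illustration under stated assumptions, not the package's implementation: the column names (HID, Gene.Symbol, Gene.ID, Taxonomy) are assumed to mirror homologeneData, the helper translate() is hypothetical, and the table rows are made up.

```r
# Toy database in the assumed homologeneData layout (made-up rows, not real HomoloGene content)
db <- data.frame(
  HID         = c(1, 1, 1, 2, 2, 2),
  Gene.Symbol = c("Eno2", "ENO2", "eno2", "Mog", "MOG", "mog"),
  Gene.ID     = c(101, 201, 301, 102, 202, 302),
  Taxonomy    = c(10090, 9606, 7955, 10090, 9606, 7955),
  stringsAsFactors = FALSE
)

query   <- c("Eno2", "Mog")            # short query list
targets <- c("ENO2", "MOG", "GAPDH")   # larger list from the target species

# translate(): hypothetical helper that pairs genes sharing a homology group (HID)
translate <- function(genes, inTax, outTax, db) {
  origin <- db[db$Taxonomy == inTax & db$Gene.Symbol %in% genes, ]
  target <- db[db$Taxonomy == outTax, ]
  merge(origin, target, by = "HID", suffixes = c(".in", ".out"))
}

# Try every ordered species pair and keep only hits present in the target list
taxa <- unique(db$Taxonomy)
speciesPairs <- expand.grid(inTax = taxa, outTax = taxa)
speciesPairs <- speciesPairs[speciesPairs$inTax != speciesPairs$outTax, ]
pairings <- lapply(seq_len(nrow(speciesPairs)), function(i) {
  hits <- translate(query, speciesPairs$inTax[i], speciesPairs$outTax[i], db)
  hits[hits$Gene.Symbol.out %in% targets, ]
})

# The pairing with the most matches plays the role of the default output
nMatches <- vapply(pairings, nrow, integer(1))
best     <- pairings[[which.max(nMatches)]]
bestPair <- speciesPairs[which.max(nMatches), ]
```

In this toy case only the 10090 to 9606 pairing produces matches. Keeping every element of pairings with a non-zero row count would correspond to the returnAllPossible = TRUE behaviour described above.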
Query the DIOPT database (https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl) for orthologues.
The DIOPT database uses multiple tools to find gene orthologues. It does not offer an
API, so this function queries it by visiting the site and filling in the form. By default
each query takes a minimum of 10 seconds because of the delay parameter. This value
is taken from the site's robots.txt at the time this function was written.
Note that DIOPT is not necessarily in sync with the homologene database as provided in this package.
diopt(genes, inTax, outTax, delay = 10)
genes | A vector of gene identifiers. Anything that DIOPT accepts |
inTax | taxid of the species that the input genes are coming from |
outTax | taxid of the species in which you are seeking homologues. 0 to query all species. |
delay | How many seconds of delay there should be between queries. Default is 10, based on the robots.txt at the time this function was written. |
DIOPT does not support all species available in the homologene database. The supported species are:
Schizosaccharomyces pombe
Saccharomyces cerevisiae
Caenorhabditis elegans
Drosophila melanogaster
Danio rerio
Xenopus (Silurana) tropicalis
Homo sapiens
Mus musculus
Rattus norvegicus
Arabidopsis thaliana
A data frame
Downloads and reads the gene history file from the NCBI website. This file is needed by other functions.
getGeneHistory(destfile = NULL, justRead = FALSE)
destfile | Path of the output file. If NULL a temp file will be used |
justRead | If TRUE and destfile exists, it reads the file instead of downloading the latest one from NCBI |
A data frame with latest gene history information
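The interplay between destfile and justRead can be pictured with a small base R sketch. fetchOrRead() and its url argument are hypothetical names introduced for illustration. The sketch mirrors the documented semantics only and is not the package's actual code.

```r
# fetchOrRead(): hypothetical helper mimicking the documented destfile/justRead semantics
fetchOrRead <- function(url, destfile = NULL, justRead = FALSE) {
  # With no destfile, fall back to a temporary file
  if (is.null(destfile)) destfile <- tempfile()
  # Download unless the caller asked to reuse a copy that is already on disk
  if (!(justRead && file.exists(destfile))) {
    utils::download.file(url, destfile, quiet = TRUE)
  }
  utils::read.delim(destfile, stringsAsFactors = FALSE)
}

# Demonstration with a local stand-in file, so nothing is downloaded
cached <- tempfile(fileext = ".tsv")
write.table(data.frame(GeneID = "30", Discontinued_GeneID = "20"),
            cached, sep = "\t", quote = FALSE, row.names = FALSE)
cachedHistory <- fetchOrRead("https://example.invalid/gene_history",
                             destfile = cached, justRead = TRUE)
```

With the package itself, the analogous pattern would be to pass the same destfile on every call and set justRead = TRUE once a copy exists, which avoids repeating a large download.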
This function downloads the gene_info file from the NCBI website and returns the gene symbols for current IDs.
getGeneInfo(destfile = NULL, justRead = FALSE, chunk_size = 1e+06)
destfile | Path of the output file. If NULL a temp file will be used |
justRead | If TRUE and destfile exists, it reads the file instead of downloading the latest one from NCBI |
chunk_size | Chunk size to be used with |
A data frame with gene symbols for each current gene id
This function downloads the latest homologene file from NCBI. Note that Homologene
has not been updated since 2014 so the output will be identical to homologeneData
included in this package. This function is here for futureproofing purposes.
getHomologene(destfile = NULL, justRead = FALSE)
destfile | Path of the output file. If NULL a temp file will be used |
justRead | If TRUE and destfile exists, it reads the file instead of downloading the latest one from NCBI |
A data frame with homology groups, gene ids and gene symbols
Given a list of genes and a taxid, returns a data frame including the genes and their corresponding homologues.
homologene(genes, inTax, outTax, db = homologene::homologeneData)
genes | A vector of gene symbols or NCBI ids |
inTax | taxid of the species that the input genes are coming from |
outTax | taxid of the species in which you are seeking homologues |
db | Homologene database to use. |
homologene(c('Eno2','17441'), inTax = 10090, outTax = 9606)
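To illustrate how a query mixing symbols and NCBI ids can resolve to homologues, here is a minimal base R sketch on a toy table. It is a conceptual illustration, not the package's implementation: the column names (HID, Gene.Symbol, Gene.ID, Taxonomy) are assumed to mirror homologeneData, the HID values are made up, and lookup() is a hypothetical helper.

```r
# Toy table in the assumed homologeneData layout; HID values are made up
db <- data.frame(
  HID         = c(1, 1, 2, 2),
  Gene.Symbol = c("Eno2", "ENO2", "Mog", "MOG"),
  Gene.ID     = c(13807, 2026, 17441, 4340),
  Taxonomy    = c(10090, 9606, 10090, 9606),
  stringsAsFactors = FALSE
)

# lookup(): hypothetical helper; a query gene may be given as a symbol or as an id
lookup <- function(genes, inTax, outTax, db) {
  isQuery <- db$Taxonomy == inTax &
    (db$Gene.Symbol %in% genes | as.character(db$Gene.ID) %in% genes)
  origin <- db[isQuery, c("HID", "Gene.Symbol", "Gene.ID")]
  target <- db[db$Taxonomy == outTax, c("HID", "Gene.Symbol", "Gene.ID")]
  # Genes are paired when they belong to the same homology group
  merge(origin, target, by = "HID", suffixes = c(".in", ".out"))
}

res <- lookup(c("Eno2", "17441"), inTax = 10090, outTax = 9606, db = db)
```

Here the symbol Eno2 and the id 17441 both resolve through their homology group, so res pairs them with ENO2 and MOG respectively.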
List of gene homologues used by homologene functions
homologeneData
An object of class data.frame with 275237 rows and 4 columns.
A modified copy of the homologene database. Homologene was last updated in 2014 and many of its gene IDs and symbols are out of date. Here the IDs and symbols are replaced with their most current versions. Last update: Tue Oct 31 18:41:52 2023
homologeneData2
An object of class data.frame with 266573 rows and 4 columns.
Version of homologene used
homologeneVersion
An object of class integer of length 1.
Human/mouse wrapper for homologene
human2mouse(genes, db = homologene::homologeneData)
genes | A vector of gene symbols or NCBI ids |
db | Homologene database to use. |
human2mouse(c('ENO2','4340'))
Mouse/human wrapper for homologene
mouse2human(genes, db = homologene::homologeneData)
genes | A vector of gene symbols or NCBI ids |
db | Homologene database to use. |
mouse2human(c('Eno2','17441'))
Names and ids of included species
taxData
An object of class data.frame with 21 rows and 2 columns.
Creates an updated version of the homologene database. This is done by downloading
the latest gene annotation information and tracing changes in gene symbols and
identifiers over history. homologeneData2 was created using
this function over the original homologeneData. This function
requires downloading large amounts of data from the NCBI ftp servers.
updateHomologene( destfile = NULL, baseline = homologene::homologeneData2, gene_history = NULL, gene_info = NULL )
destfile | Optional. Path of the output file. |
baseline | The baseline homologene file to be used. By default uses the |
gene_history | A gene history data frame, possibly returned by |
gene_info | A gene info data frame that contains ID-symbol matches, possibly returned by |
Homologene database in a data frame with updated gene IDs and symbols
Given a list of gene ids and gene history information, traces changes in the gene's name to get the latest valid ID
updateIDs(ids, gene_history)
ids | Gene ids |
gene_history | Gene history information, probably returned by |
A character vector: new ids for genes whose ids changed, "-" for discontinued genes, and otherwise the input itself.
## Not run:
gene_history = getGeneHistory()
updateIDs(c("4340964", "4349034", "4332470", "4334151", "4323831"), gene_history)
## End(Not run)
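The tracing step can be illustrated without any download by a minimal base R sketch over a toy history table. The assumptions are that the table uses the GeneID and Discontinued_GeneID columns of NCBI's gene_history file, with "-" in GeneID marking a gene that was discontinued without a replacement. traceID() is a hypothetical helper and the ids are made up, so this is not the package's implementation.

```r
# Toy history: id 10 was replaced by 20, 20 by 30, and 40 was discontinued outright
gene_history <- data.frame(
  GeneID              = c("20", "30", "-"),
  Discontinued_GeneID = c("10", "20", "40"),
  stringsAsFactors = FALSE
)

# traceID(): hypothetical helper that follows replacements until none is left
traceID <- function(id, history) {
  repeat {
    hit <- match(id, history$Discontinued_GeneID)
    if (is.na(hit)) return(id)      # id was never discontinued: keep it
    id <- history$GeneID[hit]
    if (id == "-") return("-")      # discontinued without a replacement
  }
}

ids <- c("10", "40", "99")
updated <- vapply(ids, traceID, character(1),
                  history = gene_history, USE.NAMES = FALSE)
```

Following the chain, "10" ends at "30", "40" becomes "-", and "99" is returned unchanged because it never appears as a discontinued id.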