
Promoter prediction
Table: a comparison of publicly available (non-commercial) promoter prediction resources
Surya Devarajan (*Correspondence: Acharya KK, kshitish@ibab.ac.in)
Columns: Tool name; Input parameters; Output parameters.
Input parameters: Gene ID / Promoter seq.; TSS; Alternate promoters; Sequence parts of interest; Overlaps / assembly gaps; Promoter cut-off; Exon cut-off; Splice donor cut-off; TFs; Motifs: Name / ID; Motifs: Occurrence per seq.; Motifs: Width; Motifs: Maximum no.; Motifs: Analysis type; Motifs: Distance between motifs within a cluster; Motifs: Distance between clusters; Motifs: Motif probability; Motifs: Half width of sliding window for local base composition; Motifs: Cluster score threshold; Motifs: Motif score threshold; Motifs: Residue abundance range; Motifs: Clustering options; Motifs: TFBS finding program; Motifs: Transcript selection; Negative seq.; Background model; Palindromes; GC content; CpG Islands predictions; Threshold for homology; Length of flanks; DNA stability; Repeats.
Output parameters: Promoter: Name / ID; Promoter: Type; Promoter: Sequence; Promoter: TSS position; Promoter: Strand; Promoter: Position / length; Promoter: Prediction; Promoter: P(promoter) / score; Chromosome position; Enhancer / Insulator; TFs; Motifs: TFBS; Motifs: Overrepresented / Underrepresented; Motifs: E-value; Motifs: Sequence logo; Motifs: Width; Motifs: Number of clusters containing motifs; Motifs: Cluster position; Motifs: Cluster score; Motifs: Motif score; Motifs: TFBS ID / Matrix ID; Motifs: Strand; Motifs: Position; Motifs: P-value; Motifs: Probability; Motifs: MAP score; UTR; Tandem repeats; Regular expression; Block diagram; ECR: Type; ECR: Features; Exon/Intron: Position / Length; Exon/Intron: P(exon) / P(intron); Exon/Intron: Rank; P(donor); CpG window: Number of CpG Islands; CpG window: Length; CpG window: G+C frequency; CpG window: CpG o/e ratio; CpG window: Start-p; CpG window: AT skew; CpG window: GC skew; CpG window: Strand-p; Peptides; Homology level.
GENSCAN: Accepts sequences of up to 1 million bp for a number of organisms; a local copy of the program can also be obtained to submit larger sequences. A suboptimal exon cut-off can optionally be chosen. Gives start and end positions and lengths of introns and exons. Gives the percentage of G+C. Predicted peptide sequences are displayed.
MEME - Multiple EM for Motif Elicitation: Multiple sequences can be given, up to 60,000 characters in total; protein sequences are also accepted. Gives the distribution of motif occurrences among the sequences; either one occurrence per sequence or any number of repetitions can be selected. The width of each motif can be set between 2 and 300. The maximum number of motifs and the number of sites for each motif can be specified; the number of sites is optional and must lie between 2 and 600. Negative sequences can optionally be submitted. A background Markov model can optionally be submitted. Palindromes are considered for DNA only and ignored for proteins. The number of binding sites is shown for each motif along with their sequences, together with the 10 sequence positions preceding and following each site. An E-value is given for each motif to indicate its significance. A sequence logo is given for each motif. The width of each motif is displayed, within the limits given. Indicates whether the binding sites of each motif are on the positive or negative strand. Gives the start position of every binding site of each motif. Lists all binding sites of each motif in order of increasing statistical significance. Gives the regular expression of each motif, e.g. AAA[AT]AAA[AT]AA; letters with a frequency greater than 0.2 are included and less frequent letters are omitted. Gives a combined block diagram showing the occurrence of all motifs in the given sequence set.
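The regular-expression rule described for MEME (keep letters whose frequency exceeds 0.2 at each position) can be sketched as follows. This is a minimal illustration, not MEME's own code: the input format (one dict of letter frequencies per motif position), the alphabetical ordering of letters inside brackets, and the fallback for positions where no letter passes the threshold are all assumptions.

```python
def motif_regex(columns, min_freq=0.2):
    """Build a MEME-style regular expression from per-position letter frequencies.

    columns: list of dicts, one per motif position, mapping letter -> frequency.
    Letters with frequency strictly greater than min_freq are kept.
    """
    parts = []
    for freqs in columns:
        # Keep letters above the threshold, sorted alphabetically (an assumed ordering)
        kept = sorted(letter for letter, f in freqs.items() if f > min_freq)
        if not kept:
            # Assumed fallback: keep the single most frequent letter
            kept = [max(freqs, key=freqs.get)]
        # One letter -> plain letter; several letters -> character class
        parts.append(kept[0] if len(kept) == 1 else "[" + "".join(kept) + "]")
    return "".join(parts)


columns = [
    {"A": 0.9, "C": 0.1},
    {"A": 1.0},
    {"A": 0.8, "G": 0.2},   # G is exactly 0.2, so it is not kept
    {"A": 0.55, "T": 0.45},
    {"A": 0.95, "T": 0.05},
    {"A": 0.7, "G": 0.15, "T": 0.15},
]
print(motif_regex(columns))  # AAA[AT]AA
```

For DNA the fallback never triggers, because with four letters summing to 1 at least one must reach 0.25; it only matters for larger alphabets such as proteins.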
TRANSCompel: Registration is required for the public version; personal e-mail IDs are not accepted.
AlignACE: The maximum file size for DNA sequences is 50 KB. The number of expected sites can be specified. The fractional background GC content can be specified; a default value is provided. Gives the number of binding sites and their sequences for each motif, along with a consensus (GC) for each motif. Gives the position of every binding site of each motif in the given sequence set. Gives a MAP score for each motif.
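A single fractional GC content, like the one AlignACE accepts, is enough to define a simple zero-order background model: G and C each get half of the GC fraction, and A and T each get half of the remainder. The sketch below shows that standard construction. It is not AlignACE's internal code, and the function names are hypothetical.

```python
import math


def background_from_gc(gc_fraction):
    """Zero-order background base probabilities from a fractional GC content (assumed model)."""
    if not 0.0 < gc_fraction < 1.0:
        raise ValueError("gc_fraction must be strictly between 0 and 1")
    half_gc = gc_fraction / 2          # P(G) = P(C)
    half_at = (1.0 - gc_fraction) / 2  # P(A) = P(T)
    return {"A": half_at, "T": half_at, "G": half_gc, "C": half_gc}


def background_log_prob(seq, model):
    """Natural-log probability of a sequence under the zero-order background model."""
    return sum(math.log(model[base]) for base in seq.upper())


bg = background_from_gc(0.38)
print(bg)
print(background_log_prob("ACGT", bg))
```

Motif finders typically compare a candidate site's probability under the motif model with its probability under this background. A GC-poor genome therefore makes G/C-rich sites look more surprising.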
rVISTA: Promoter sequences from 2 different species have to be given.
ConSite: Only a single sequence can be given. TFs can optionally be selected from the given list. Output error.
Proscan, Promoter Scan: The READSEQ program requires a minimum sequence length of 111 bp, so sequences shorter than this are better submitted in FASTA format. Pol II promoter regions are predicted on both the forward and reverse strands. Gives significant signals with their ID, strand, weight and position within the predicted promoter region, for both the direct and reverse strands.
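Several of these tools are sensitive to input formatting. Proscan and Signal Scan rely on READSEQ, which needs at least 111 bp unless the input is FASTA, and CpGProd recommends lines shorter than 80 characters. A small helper can prepare FASTA records that satisfy both. It is a sketch: the 60-character line width and the helper names are assumptions, not requirements stated by any of the tools.

```python
READSEQ_MIN_LENGTH = 111  # minimum length quoted above for Proscan / Signal Scan


def prefer_fasta(seq):
    """True when a sequence is too short for READSEQ auto-detection, so FASTA is safer."""
    return len(seq) < READSEQ_MIN_LENGTH


def to_fasta(header, seq, width=60):
    """Format one sequence as a FASTA record with lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be positive")
    # Remove whitespace and newlines but keep case, so soft-masked repeats survive
    seq = "".join(seq.split())
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


record = to_fasta("my_promoter", "ACGT" * 30)
print(record)
print(prefer_fasta("ACGT" * 27))  # 108 bp, below the 111 bp limit
```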
ECRbase - database of Evolutionary Conserved Regions: Data can be chosen by browsing ECRbase; a single gene, feature or chromosome position can also be submitted through the ECR Browser available on the same web page. A multi-genome comparison of promoters for RefSeq and UCSC known genes is provided, which has to be downloaded. The DNA sequence of a specified gene is available through the ECR Browser. TFBSs present in promoter regions, ECRs and core ECRs of multiple genomes can be downloaded. Both standard ECRs and core ECRs are available as multi-genome comparisons; the files can be browsed and downloaded for the desired species. Features such as relative position, genomic position, length and percentage identity are given for multiple genomes of different species through the ECR Browser.
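The percentage identity reported for conserved regions is, at its simplest, the share of aligned columns in which two sequences carry the same base. The sketch below uses one common convention: columns with a gap in either sequence are excluded. This is an assumption, and ECRbase / the ECR Browser may count gaps differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage identity of two pre-aligned sequences, with gaps written as '-'.

    Assumed convention: columns with a gap in either sequence are not counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


print(percent_identity("ACGT-A", "ACGTTT"))  # 4 matches out of 5 gap-free columns
```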
Signal Scan: The READSEQ program requires a minimum sequence length of 111 bp, so sequences shorter than this are better submitted in FASTA format. Uses the TRANSFAC and TFD databases. Shows enhancer regions; binding sites within enhancers are also displayed. Gives the factors that bind to sites in the predicted promoter region. Gives the number of signal sequences in the predicted promoter region. TFBS IDs are also given. Indicates whether the binding sites are on the positive or negative strand. Gives the position of every binding site in the predicted promoter region.
FirstEF - First Exon Finder: The maximum sequence file length is 100 kb. The default promoter cut-off is 0.4. The default first-exon cut-off is 0.5. The default splice-donor cut-off is 0.4. The length of the promoter region is given for each sequence on the direct and complementary strands. Gives the probability of a promoter for the given window on the direct and complementary strands of each sequence. Exon length is given for both the direct and complementary strands of each sequence. The probability of an exon for a given GT and promoter region is given for both strands. The rank of the first exon within a cluster is shown. The probability of a donor site for a given GT is displayed for both strands. The length of any CpG island present is given for both strands.
Promoter 2.0 Prediction Server: Accepts at most 50 sequences and 1,500,000 nucleotides at a time, as single or multiple FASTA sequences. Predicts Pol II promoter regions. Gives the positions of Pol II promoter regions in each sequence. Indicates whether each promoter prediction is highly likely or marginal. Gives the scores of the Pol II promoter regions in each sequence.
EPD - Eukaryotic Promoter Database: Promoters can be searched by Ensembl ID, RefSeq ID or HGNC gene symbol, from EPD or Ensembl, for human or mouse. Multiple promoter sequences can be downloaded. Only 3 motifs (TATA box, CCAAT box, GC box) are offered as options, which the user can select. Promoter IDs are given. Promoter sequences can be downloaded in FPS, SGA or FASTA format for the given upstream and downstream limits. The TSS position is given for each ID. The strand is displayed for each ID. The sequence position or length is given for each ID. The chromosome position in the genome is given for each ID. The positions of the specified motifs are given for each ID. An image with details of the promoter elements is displayed.
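EPD (and GPMiner, further down) lets the user search for the three classic core-promoter motifs: the TATA box, the CCAAT box and the GC box. A rough way to locate them is a regular-expression scan over simplified consensus sequences. The patterns below (TATA[AT]A[AT], CCAAT, GGGCGG) are textbook approximations chosen for illustration, not EPD's own motif definitions, which are more nuanced. Only the forward strand is scanned.

```python
import re

# Simplified consensus patterns: illustrative assumptions, not EPD's definitions
CORE_MOTIFS = {
    "TATA box": "TATA[AT]A[AT]",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}


def find_core_motifs(seq):
    """Return (motif name, 0-based start, matched text) for forward-strand hits."""
    seq = seq.upper()
    hits = []
    for name, pattern in CORE_MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


for hit in find_core_motifs("GGGCGGTTCCAATGGTATAAATGC"):
    print(hit)
```

A fuller implementation would also scan the reverse complement and score matches against a position weight matrix instead of requiring an exact consensus.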
The MAPPER2 Database: Although registration was completed, no confirmation e-mail was received, so the database could not be accessed (as on May 10, 2013).
Cister - Cis element Cluster Finder: The maximum sequence length is 100 KB. A gene ID can be entered. The Cister database has to be downloaded to analyze longer sequences. The search can be limited to a subsequence by entering start and end coordinates. Motifs can be selected from the list provided (around 16) or user-defined motifs can be specified. The average number of motifs expected in a cluster can be specified. The distance between motifs within a cluster can be specified or left at its default. The distance between clusters can be specified or left at its default. Motif predictions are displayed if their posterior probability exceeds the threshold, which is 0.1 by default. Base abundances are counted around each point in the query sequence. The sequence of each motif is given. Indicates the strand on which each specified motif lies. Gives motif positions in the given sequence set. Motif occurrences above the given threshold are displayed. Cis-elements on both strands of the sequence are represented.
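Cister counts base abundances around each point of the query using a sliding window defined by its half width. The sketch below shows plain local base composition: for each position, the fraction of A, C, G and T within ±`half_width` bases, with the window truncated at the sequence ends. This is an assumed simplification. Cister feeds these abundances into its own probabilistic model, which is not reproduced here.

```python
def local_composition(seq, half_width):
    """Fractions of A, C, G, T within +/- half_width bases of every position.

    Windows are truncated at the sequence ends; symbols other than ACGT (e.g. N)
    are left out of the denominator.
    """
    seq = seq.upper()
    n = len(seq)
    result = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = seq[lo:hi]
        counts = {b: window.count(b) for b in "ACGT"}
        total = sum(counts.values()) or 1  # avoid dividing by zero for all-N windows
        result.append({b: counts[b] / total for b in "ACGT"})
    return result


for pos, comp in enumerate(local_composition("AACCGGTT", 1)):
    print(pos, comp)
```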
Cluster-Buster: The maximum sequence length is 100 KB. A gene ID can be entered. The Cister database has to be downloaded to analyze longer sequences. The search can be limited to a subsequence by entering start and end coordinates. Motifs can be selected from the list provided (around 16) or user-defined motifs can be specified. The distance between motifs within a cluster can be specified or left at its default. Motif clusters scoring above the given threshold are reported; a default threshold is provided. Motifs within each cluster scoring above the given threshold are reported; a default threshold is provided. The abundances of A, C, G and T can be estimated at each point of the sequence; a default range is provided. The sequences of the specified motifs are given for each cluster. The number of clusters is displayed. The position of each cluster is given. The score of each cluster is displayed. The score of each motif in the cluster is displayed. Indicates the strand on which each specified motif lies. Gives motif positions in the given sequence set.
CpGProd - CpG Island Promoter Detection: Recommends that all lines of input text be shorter than 80 characters; sequences also have to be masked with RepeatMasker. The total number of CpG islands is given for the input gene set. Gives the start and end positions of the CpG islands and the length of each. Gives the G+C frequency of each. Gives the CpG o/e ratio of each. Gives the probability that each CpG island lies over a TSS. Reports CpG islands on the + strand, with an excess of T over A and of G over C, and CpG islands on the - strand, with a depletion of T relative to A and of G relative to C. Gives the probability of lying on the + or - strand.
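The per-window statistics CpGProd reports can be computed with the usual textbook formulas:

- G+C frequency = (G + C) / length
- CpG o/e = (number of CG dinucleotides × length) / (C × G)
- AT skew = (A − T) / (A + T)
- GC skew = (G − C) / (G + C)

The sketch below applies them to a single window. The skew sign conventions are the common ones and are an assumption. CpGProd's description is phrased in terms of an excess of T over A, so its own AT-skew sign may be reversed.

```python
def cpg_window_stats(seq):
    """G+C fraction, CpG observed/expected ratio, AT skew and GC skew for one window.

    Skews use the common conventions (A-T)/(A+T) and (G-C)/(G+C); these are
    assumptions and may differ in sign from CpGProd's own definitions.
    """
    s = seq.upper()
    n = len(s)
    a, c, g, t = (s.count(b) for b in "ACGT")
    cpg = sum(1 for i in range(n - 1) if s[i:i + 2] == "CG")
    gc_fraction = (g + c) / n if n else 0.0
    # Observed/expected CpG = CpG count * window length / (C count * G count)
    cpg_oe = cpg * n / (c * g) if c and g else 0.0
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return {"gc": gc_fraction, "cpg_oe": cpg_oe, "at_skew": at_skew, "gc_skew": gc_skew}


print(cpg_window_stats("ACGCGT"))  # two CpGs in 6 bp with 2 C and 2 G -> o/e = 3.0
```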
PAINT - Promoter Analysis and Interaction Network Toolset: A gene ID or promoter sequence can be given; the expected file format is plain text. Extracts up to 5000 bp of upstream sequence. Clusters TREs based on the promoters they occur in, or clusters genes based on the TREs present in their promoters. Uses TRANSFAC (public or commercial). A prebuilt reference set can be selected, or a custom reference set can be submitted. Promoter sequences can be downloaded. The number of regulatory elements is given. Gives the significance of the occurrence of regulatory elements for 3 comparisons: input list vs. reference, individual clusters vs. list, and clusters vs. reference.
SCOPE - Suite for Computational identification Of Promoter Elements: A gene list or promoter sequences can be given. Selects either the intergenic region (i.e. the upstream sequence of a gene up to the previous gene) or a fixed limit. Motifs: either the specified motifs are searched or other motifs can also be displayed; the genome is also examined for other genes containing the motifs found. Output error.
CONREAL - CONserved Regulatory Elements anchored Alignment: Sequences can be submitted in multi-FASTA format, with a maximum of 100 kb per sequence. The threshold for PWMs can be chosen between 70 and 90%. Uses JASPAR and TRANSFAC v8.2 (vertebrates). The homology threshold can be set to 10%, 50% or 75%. The length of the flanks used to calculate homology can be chosen between 0 and 15 bp. Output error.
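A PWM threshold given as a percentage, like CONREAL's 70 to 90%, is commonly read as a relative score: (score − minimum possible score) / (maximum possible score − minimum possible score). The sketch below builds a log-odds matrix from position counts and scans the forward strand with that relative-score rule. The pseudocount, uniform background and function names are assumptions, and CONREAL's exact scoring may differ.

```python
import math


def log_odds_pwm(counts, background=None, pseudocount=0.5):
    """Convert a position count matrix (list of {base: count}) into log2-odds scores."""
    background = background or {b: 0.25 for b in "ACGT"}
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((col.get(b, 0) + pseudocount) / total / background[b])
            for b in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold=0.8):
    """Forward-strand (start, relative score) hits with relative score >= threshold."""
    w = len(pwm)
    min_s = sum(min(col.values()) for col in pwm)
    max_s = sum(max(col.values()) for col in pwm)
    if max_s == min_s:
        raise ValueError("matrix has no information; relative score is undefined")
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other symbols
        score = sum(pwm[j][b] for j, b in enumerate(window))
        rel = (score - min_s) / (max_s - min_s)
        if rel >= threshold:
            hits.append((i, rel))
    return hits


pwm = log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}])
print(scan("CCTGACC", pwm, threshold=0.8))  # only the exact TGA site passes
```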
ConTra v2: A gene name or symbol, Ensembl ID, Entrez ID, or Ensembl or RefSeq transcript ID can be given, or the sequence file can be in multi-FASTA, MAF or CLUSTAL format. Selects the upstream core promoter region for the specified limit, the 5' UTR and the 3' UTR region; this is possible only if a gene name or symbol is given as input, not a MAF file. TFs can be selected from the given list, up to a maximum of 25, depending on the analysis type chosen. A PWM list can optionally be submitted; the maximum sequence length should be 50 nt. Matrix IDs can also be selected from the given list, up to a maximum of 25 PWMs, depending on the analysis type chosen. Either visualization or exploration can be chosen for TFBS analysis: visualization identifies TFBSs involved in regulating the gene, and a list of TFs is offered for selection; exploration is used when it is not known which TFs are involved in regulation, and no TF list is offered. PWMs are obtained from TRANSFAC, JASPAR, PhyloFACTS or PBM homeodomains. The core score and matrix score are chosen from the given options. Since a gene can have multiple transcripts or isoforms, the user can select one; this is possible only if a gene name or symbol is given as input, not a MAF file. The specified upstream sequence is given. The multiz alignments are divided into alignment blocks. The chromosome position is given for each block. The specified TFBSs are shown in colour within the sequence. Gives the width of each binding site. Gives the 5' UTR and 3' UTR as specified in the input.
CpG_promoter - Human promoter mapping using CpG islands: Only a single FASTA sequence can be submitted. Indicates whether each CpG island is promoter-related or not.
PromH(W): 2 orthologous sequences have to be submitted in FASTA format. Gives promoter positions separately for the 2 orthologous sequences, along with the length of each input sequence. Enhancer positions are shown separately for the 2 orthologous sequences. Gives the number of TFBSs, with their sequences, in each promoter region of the 2 orthologous sequences. The position of the TATA box is given separately for each promoter region. Gives an ID for each binding site. Indicates whether each binding site is on the positive or negative strand. Gives the position of each TFBS in the promoter region. Gives the homology level of the TATA box, regulatory elements and aligned sequences in each promoter region.
PromoSer: Accepts a multi-FASTA list of sequences, GenBank IDs or a list of genomic loci; for lists of genomic loci there is a limit of 2000 requests and 2 Mb of total sequence. Selects the best-supported ones, the one starting 5'-most or 3'-most, or the one closest upstream of the 5' sequence, or ignores them, according to the option selected. Extracts upstream and downstream sequence around the TSS. If the promoter region overlaps the transcribed range of another region, extraction can stop at the boundary of the next upstream region on the same strand or on either strand, or the overlap can be ignored, according to the option selected. If the promoter region contains assembly gaps, they can be ignored or extraction can stop at the boundary of the nearest upstream gap. Repeats can be masked in lower case or as N, or left unmasked. None of the sequences could be aligned to the selected genomes.
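PromoSer's repeat options (lower case, N, or no masking) correspond to the usual soft-masked versus hard-masked sequence conventions. The sketch below converts a soft-masked sequence, where repeats are already in lower case (for example RepeatMasker-style output), into each of the three forms. The function name and mode strings are hypothetical.

```python
def mask_repeats(seq, mode="N"):
    """Handle soft-masked (lower-case) repeats in a sequence.

    Assumes repeats are already in lower case. Hypothetical modes:
      'N'      -> hard-mask: replace every lower-case base with N
      'lower'  -> keep the soft-masking unchanged
      'ignore' -> unmask: convert the whole sequence to upper case
    """
    if mode == "N":
        return "".join("N" if ch.islower() else ch for ch in seq)
    if mode == "lower":
        return seq
    if mode == "ignore":
        return seq.upper()
    raise ValueError(f"unknown mode: {mode!r}")


print(mask_repeats("ACgtaT"))            # ACNNNT
print(mask_repeats("ACgtaT", "ignore"))  # ACGTAT
```

Hard-masking is the safer choice before motif scanning, because most scanners ignore case and would otherwise report sites inside repeats.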
PSPA - Position Specific Propensity Analysis: Only a single FASTA sequence, of no more than 50 kb, can be submitted. Either a CpG-rich or a CpG-poor model can be selected. There are 2 options, default and user-specified. The default option returns only one prediction per cluster, where predictions within 10,000 bp of each other are clustered. The user-specified option sets the minimum distance between two clusters, the number of top hits wanted (as ranked by PSPA score) and the minimum distance between two hits; this option helps to find alternative TSSs but may be less reliable. Repeats in lower case are masked. The number of CpG islands is displayed for both strands. Gives the start and end positions of predicted CpG islands on both strands. Gives the GC percentage of predicted CpG islands on both strands. Gives the CpG o/e ratio of predicted CpG islands on both strands.
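PSPA's default mode reports only one prediction per cluster, with predictions lying within 10,000 bp grouped together. The sketch below applies that rule under an assumed single-linkage reading: a prediction joins the current cluster if it is at most 10,000 bp from the previous one, and the highest-scoring prediction of each cluster is kept. PSPA's own clustering may define "within 10,000 bp" differently.

```python
def cluster_predictions(predictions, max_gap=10_000):
    """Group (position, score) predictions and keep the best-scoring one per cluster.

    Assumed single-linkage rule: a prediction joins the current cluster if it lies
    at most max_gap bp after the previous prediction (after sorting by position).
    """
    best = []
    cluster = []
    for pos, score in sorted(predictions):
        if cluster and pos - cluster[-1][0] > max_gap:
            # Gap too large: close the current cluster and keep its top hit
            best.append(max(cluster, key=lambda p: p[1]))
            cluster = []
        cluster.append((pos, score))
    if cluster:
        best.append(max(cluster, key=lambda p: p[1]))
    return best


preds = [(100, 0.5), (5000, 0.9), (30000, 0.7), (32000, 0.6)]
print(cluster_predictions(preds))  # one prediction per cluster
```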
Tfsitescan: Sequences are limited to 500 nt. Searches IFTI Tf sites or IFTI Tf matrices according to the species selected. Output error.
GPMiner: Only a single FASTA sequence can be submitted. The score threshold of Eponine (a TSS prediction tool) can be set between 0 and 1; the default of 0.8 gives high prediction accuracy. The TATA box, CCAAT box and GC box can be chosen specifically, in addition to TFBSs. TFBSs are obtained from TRANSFAC using the MATCH program; the core score and matrix score are set to defaults. The sliding window size defaults to 15 nt. The TSS position in the sequence is represented graphically. Gives the number of TFs and represents their positions in the sequence graphically. Gives a graphical representation of the 3 motifs TATA box, CCAAT box and GC box. Displays tandem repeats, if any. The results are displayed in a graphical view. CpG island regions are shown coloured in the sequence. The GC ratio is represented graphically.
SiTaR - Site Tracking and Recognition: The site returns an 'untrusted connection' warning.
TrFAST - Transcription Factor Search and Analysis Tool: The query page is not displayed (as on May 15, 2013).
 

copyright © BdataA; all rights reserved