DPCfam-UHGP50 version 1.0
Explore UHGP-50 clustering results generated by the DPCfam algorithm!
DPCfam automatically classifies homologous protein regions into putative protein families, called Metaclusters or MCs.
Each MC is a collection of protein sequence regions, called seed sequence regions; these regions are then used as seeds to subsequently build profile-HMMs.
What you can search using this website:
- DPCfam: search a specific MC, e.g. MC81355
- Pfam: search a Pfam family, e.g. PF01226, to see which MCs match it (Pfam v. 33.0)
- Protein: search for a protein in UHGP-50 (v. 1.0) using its short name, e.g. GUT_GENOME000001_00003
Note: currently, only the MCs' seed sequence regions and the annotations of the corresponding proteins are available.
We are working to make the full UHGP-50, annotated using profile-HMMs, available on the website.
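The three kinds of query listed above can be told apart by the shape of the identifier. The following is a minimal Python sketch of that idea, not the website's actual code: the function name `classify_query` and the regular expressions are assumptions inferred only from the example identifiers given above (MC81355, PF01226, GUT_GENOME000001_00003).

```python
import re

# Patterns inferred from the example identifiers on this page (assumptions,
# not an official specification of the accepted formats).
QUERY_PATTERNS = {
    "DPCfam": re.compile(r"MC\d+"),               # e.g. MC81355
    "Pfam": re.compile(r"PF\d{5}"),               # e.g. PF01226
    "Protein": re.compile(r"GUT_GENOME\d+_\d+"),  # e.g. GUT_GENOME000001_00003
}


def classify_query(query):
    """Return the search type a query string looks like, or None if unknown."""
    text = query.strip()
    for kind, pattern in QUERY_PATTERNS.items():
        # fullmatch requires the whole string to fit the pattern
        if pattern.fullmatch(text):
            return kind
    return None


if __name__ == "__main__":
    for example in ("MC81355", "PF01226", "GUT_GENOME000001_00003", "foo"):
        print(example, "->", classify_query(example))
```

A check like this can be useful before submitting a batch of identifiers, so that malformed entries are caught early.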
The complete DPCfam annotation of UHGP-50 (v. 1.0) by profile-HMMs is available in XML format at the following Zenodo repository:
In the Download section of the website you can also find the MSAs and profile-HMMs of the DPCfam MCs.
Publications:
- Russo ET, Barone F, Bateman A, Cozzini S, Punta M, Laio A. DPCfam: Unsupervised protein family classification by density peak clustering of large sequence datasets. PLOS Comput. Biol. 18, 1–29, 2022 - DOI
- Russo ET, Laio A, Punta M. Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation. BMC Bioinformatics. 2021 Mar 12;22(1):121. doi: 10.1186/s12859-021-04013-x. PMID: 33711918; PMCID: PMC7955657 - DOI - PubMed
- Rodriguez A, Laio A. Clustering by fast search and find of density peaks. Science. 2014;344(6191):1492–1496. doi: 10.1126/science.1242072 - DOI - PubMed