DPCfam-UHGP50 version 1.0

Explore UHGP-50 clustering results generated by the DPCfam algorithm!
DPCfam automatically classifies homologous protein regions into putative protein families, called Metaclusters or MCs.
Each MC is a collection of protein sequence regions, called seed sequence regions; these regions are then used as seeds to build profile-HMMs.
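
As an illustration of the structure described above, here is a minimal Python sketch of an MC as a collection of seed sequence regions. All class names, field names, identifiers and the `protein/start-end` FASTA header convention are assumptions made for this example; they are not the DPCfam data format. A FASTA file written this way could, for instance, be aligned and then passed to a profile-HMM builder such as HMMER's hmmbuild.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SeedRegion:
    """One seed sequence region: a slice of a protein (1-based, inclusive)."""
    protein_id: str  # hypothetical identifier, not a real UHGP-50 accession
    start: int
    end: int
    sequence: str    # residues of the region only, not the full protein

    def __post_init__(self) -> None:
        # Reject coordinates that cannot describe a region.
        if self.start < 1 or self.end < self.start:
            raise ValueError("expected 1 <= start <= end")
        # The stored residues must match the stated coordinates.
        if len(self.sequence) != self.end - self.start + 1:
            raise ValueError("sequence length does not match coordinates")


@dataclass
class Metacluster:
    """A putative protein family (MC): a collection of seed regions."""
    mc_id: str
    regions: List[SeedRegion] = field(default_factory=list)

    def to_fasta(self) -> str:
        """Serialize the seed regions as FASTA, one record per region."""
        return "".join(
            f">{r.protein_id}/{r.start}-{r.end}\n{r.sequence}\n"
            for r in self.regions
        )


# Toy data, invented for illustration only.
mc = Metacluster(
    "MC_example",
    [
        SeedRegion("protA", 5, 9, "MKTAY"),
        SeedRegion("protB", 12, 15, "MKSA"),
    ],
)
fasta_text = mc.to_fasta()
```

Storing the coordinates alongside the residues keeps each region traceable to its parent protein, which matters because an MC groups regions rather than whole proteins.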

What you can search using this website:

Note: currently, only the MCs' seed sequence regions and the annotation of the respective proteins are available.
     We are working to make the full UHGP-50, annotated using profile-HMMs, available on the website.

The complete DPCfam annotation of UHGP-50 (v. 1.0) by profile-HMMs is available in XML format at the following Zenodo repository: DOI

In the Download section of the website you can also find the MSAs and profile-HMMs of the DPCfam MCs.

Publications:

  1. Russo ET, Barone F, Bateman A, Cozzini S, Punta M, Laio A. DPCfam: Unsupervised protein family classification by density peak clustering of large sequence datasets. PLOS Comput. Biol. 18, 1–29, 2022 - DOI
  2. Russo ET, Laio A, Punta M. Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation. BMC Bioinformatics. 2021 Mar 12;22(1):121. doi: 10.1186/s12859-021-04013-x. PMID: 33711918; PMCID: PMC7955657 - DOI - PubMed
  3. Rodriguez A, Laio A. Clustering by fast search and find of density peaks. Science. 2014;344(6191):1492–1496. doi: 10.1126/science.1242072 - DOI - PubMed