I’ll try to list here every repository that might be useful for semantic applications. There is a lot of data in the biomedical field, including SNOMED, MeSH and the Gene Ontology, but it can take a while to find data for other fields. For example, if you want to build a recommender system, it would be much sexier to propose a movie recommender system than a biomedical paper recommender system… So far in my PhD, I have spent quite a lot of time exploring the web to find knowledge bases. This post is meant as a memo, but I’m sure it could help people with similar needs.
- DBPedia: that’s one of the first you’ll find. It relies on the Wikipedia structure (categories), which was never meant to be a rigorous taxonomy.
- Yago: a bit better than DBPedia. It is actually inspired by DBPedia, but it also includes knowledge from WordNet and Geonames. This can be useful for really wide-ranging applications (I mean, if you’re not focusing on a very specific topic).
- WordNet: it’s an English lexical database. It contains synsets – sets of synonyms – that are organised a bit like an ontology, so it can be used in semantic applications. I’ve never actually used it, so I won’t comment on how useful it can be. I’ll just point you to some papers that use it, here and here.
- Freebase: this contains a lot of things. Movies, songs, people, facts… There is no “ontology” structure but we can find our way with it. For example, movies belong to several genres and genres are structured as a hierarchy. We can thus consider the genre graph as an ontology and the movies as a dataset annotated by this ontology.
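To make the Freebase idea a bit more concrete, here is a minimal sketch of "genre graph as ontology, movies as annotated dataset". Everything in it is made up for illustration: the genre names, the movie names and the helper functions (`ancestors`, `expanded_annotations`, `similarity`) are assumptions of mine, not Freebase data or a Freebase API. The similarity used is a plain Jaccard index over the expanded genre sets, which is just one simple choice among many.

```python
from collections import deque

# Hypothetical genre hierarchy (child -> parent genres); not real Freebase data.
GENRE_PARENTS = {
    "romantic comedy": {"comedy", "romance"},
    "slapstick": {"comedy"},
    "comedy": {"fiction"},
    "romance": {"fiction"},
    "space opera": {"science fiction"},
    "science fiction": {"fiction"},
    "fiction": set(),
}

# Hypothetical movies, each annotated with its most specific genres.
MOVIE_GENRES = {
    "Movie A": {"romantic comedy"},
    "Movie B": {"slapstick"},
    "Movie C": {"space opera"},
}


def ancestors(genre):
    """Return the genre itself plus every genre above it in the hierarchy."""
    seen = {genre}
    queue = deque([genre])
    while queue:
        current = queue.popleft()
        for parent in GENRE_PARENTS.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def expanded_annotations(movie):
    """Propagate a movie's genre annotations up the genre graph."""
    expanded = set()
    for genre in MOVIE_GENRES[movie]:
        expanded |= ancestors(genre)
    return expanded


def similarity(movie_a, movie_b):
    """Jaccard index between the expanded genre sets of two movies."""
    a = expanded_annotations(movie_a)
    b = expanded_annotations(movie_b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(sorted(expanded_annotations("Movie A")))
    print(similarity("Movie A", "Movie B"))
    print(similarity("Movie A", "Movie C"))
```

With this toy data, the two comedies share more of the hierarchy than the comedy and the space opera do, which is the kind of signal a genre-based movie recommender could build on.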
Data structures, taxonomies, ontologies…
- ODP: the Open Directory Project provides a hierarchy of various categories in which websites are referenced.
- BBC ontologies: I’ve never used them, but they seem to have ontologies for several domains including politics, sport, food and so on.
- DataHub: more generally, there is a lot of data there – not only ontologies. Some datasets are published with a paper; others may not be very serious, so you need to select carefully.