
OpenSearch@MPDL – Crawler FAQ



The OpenSearch experiment was initiated by a group of universities and research organizations, including the Max Planck Digital Library. The aim of the project is to create the next generation of Internet search.

The crawler that visited your website collects data for this experiment. The crawled data is used to build an index of Open Access scientific repositories and related websites.
The data is deleted within 60 days of being crawled.

We use the Heritrix crawler (version 3.4.0); see this link for details. Heritrix was developed by the Internet Archive and is used by several institutions.

The common way to exclude crawlers from your website is the robots.txt file of your domain. Most crawlers, including ours, respect the rules you specify there. To exclude the OpenSearch@MPDL crawler from your domain, add the following lines to your robots.txt. Please allow one day for changes to take effect.

User-agent: OpenSearch@MPDL
Disallow: /
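If you want to sanity-check such rules before publishing them, the following is a minimal sketch using the robots.txt parser in Python's standard library (urllib.robotparser). It assumes the crawler identifies itself by the product token OpenSearch@MPDL; the helper name is_allowed, the other bot name and the example.org URL are hypothetical. Python's parser only approximates how Heritrix reads robots.txt, so treat the result as a quick check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The two lines from this FAQ, exactly as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: OpenSearch@MPDL
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration; no network request is made.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/some/page.html"
    # The OpenSearch@MPDL group disallows every path on the domain.
    print(is_allowed(ROBOTS_TXT, "OpenSearch@MPDL", page))
    # Agents without a matching group (and no "*" group) are not restricted.
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Because the file contains only a group for OpenSearch@MPDL, other crawlers are unaffected; the first call returns False and the second returns True.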

Alternatively, we can add your domain to our blacklist of websites that are not crawled. In that case, please send us an e-mail.