How to protect an instance
==========================

Searx depends on external search services. To avoid abusing these services, it is advisable to limit the number of requests that searx processes.

An application firewall, ``filtron``, solves exactly this problem. Installation instructions can be found on the `project page of filtron <https://github.com/asciimoo/filtron>`__.


Sample configuration of filtron
-------------------------------

An example configuration can be found below. This configuration limits requests from:

* scripts or applications (roboagent limit)
* web crawlers (botlimit)
* IPs that send too many requests (IP limit)
* clients that request too many csv, json, or rss results (rss/json limit)
* any single User-Agent that sends too many requests (useragent limit)


.. code:: json

    [
        {
            "name": "search request",
            "filters": ["Param:q", "Path=^(/|/search)$"],
            "interval": <time-interval-in-sec>,
            "limit": <max-request-number-in-interval>,
            "subrules": [
                {
                    "name": "roboagent limit",
                    "interval": <time-interval-in-sec>,
                    "limit": <max-request-number-in-interval>,
                    "filters": ["Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"],
                    "actions": [
                        {"name": "block",
                         "params": {"message": "Rate limit exceeded"}}
                    ]
                },
                {
                    "name": "botlimit",
                    "limit": 0,
                    "stop": true,
                    "filters": ["Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"],
                    "actions": [
                        {"name": "block",
                         "params": {"message": "Rate limit exceeded"}}
                    ]
                },
                {
                    "name": "IP limit",
                    "interval": <time-interval-in-sec>,
                    "limit": <max-request-number-in-interval>,
                    "stop": true,
                    "aggregations": ["Header:X-Forwarded-For"],
                    "actions": [
                        {"name": "block",
                         "params": {"message": "Rate limit exceeded"}}
                    ]
                },
                {
                    "name": "rss/json limit",
                    "interval": <time-interval-in-sec>,
                    "limit": <max-request-number-in-interval>,
                    "stop": true,
                    "filters": ["Param:format=(csv|json|rss)"],
                    "actions": [
                        {"name": "block",
                         "params": {"message": "Rate limit exceeded"}}
                    ]
                },
                {
                    "name": "useragent limit",
                    "interval": <time-interval-in-sec>,
                    "limit": <max-request-number-in-interval>,
                    "aggregations": ["Header:User-Agent"],
                    "actions": [
                        {"name": "block",
                         "params": {"message": "Rate limit exceeded"}}
                    ]
                }
            ]
        }
    ]

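
The angle-bracket placeholders in the sample configuration are not valid JSON, so they have to be replaced with numbers before the file can be used. The following is a minimal sketch in Python of one way to do that. The helper ``fill_placeholders``, the shortened template, and the values (a 60-second interval and a limit of 30 requests) are assumptions made for illustration, not recommendations from the filtron project.

.. code:: python

    import json

    # Hypothetical values for illustration only; tune them for your instance.
    INTERVAL_SEC = 60
    MAX_REQUESTS = 30

    # Shortened template with a single subrule, following the sample configuration.
    TEMPLATE = """
    [
        {
            "name": "search request",
            "filters": ["Param:q", "Path=^(/|/search)$"],
            "interval": <time-interval-in-sec>,
            "limit": <max-request-number-in-interval>,
            "subrules": [
                {
                    "name": "IP limit",
                    "interval": <time-interval-in-sec>,
                    "limit": <max-request-number-in-interval>,
                    "stop": true,
                    "aggregations": ["Header:X-Forwarded-For"],
                    "actions": [
                        {"name": "block",
                         "params": {"message": "Rate limit exceeded"}}
                    ]
                }
            ]
        }
    ]
    """


    def fill_placeholders(template: str, interval: int, limit: int) -> list:
        """Replace the angle-bracket placeholders and parse the result as JSON."""
        filled = (template
                  .replace("<time-interval-in-sec>", str(interval))
                  .replace("<max-request-number-in-interval>", str(limit)))
        # json.loads raises ValueError if a placeholder or syntax error remains.
        return json.loads(filled)


    rules = fill_placeholders(TEMPLATE, INTERVAL_SEC, MAX_REQUESTS)
    print(json.dumps(rules, indent=4))

In practice each rule would probably get its own interval and limit; a single pair is used here only to keep the sketch short. The parsed result could then be written to ``rules.json`` with ``json.dump``.
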

Route requests through filtron
------------------------------

Filtron can be started using the following command:

.. code:: bash

    $ filtron -rules rules.json

It listens on ``127.0.0.1:4004`` and forwards filtered requests to ``127.0.0.1:8888`` by default.

Use it together with ``nginx``, for example with the following configuration.


.. code:: nginx

    location / {
        proxy_set_header   Host             $http_host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header   X-Scheme         $scheme;
        proxy_pass         http://127.0.0.1:4004/;
    }

With this setup, nginx passes incoming requests to filtron on port 4004; filtron filters them and forwards the remaining requests to port 8888, where searx is running.
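
To check that requests really pass through filtron, one option is to send a few search requests with a User-Agent that the roboagent filter matches and watch the returned status codes. The sketch below uses only the Python standard library. The helper names ``build_search_request`` and ``status_of`` and the User-Agent string ``curl/8.0`` are hypothetical, and the sketch makes no assumption about which status code filtron returns for a blocked request.

.. code:: python

    import urllib.error
    import urllib.parse
    import urllib.request

    # Address filtron listens on by default, as stated above.
    FILTRON_URL = "http://127.0.0.1:4004"


    def build_search_request(base_url: str, query: str,
                             user_agent: str) -> urllib.request.Request:
        """Build a GET request for /search with a q parameter and a custom User-Agent."""
        url = base_url.rstrip("/") + "/search?" + urllib.parse.urlencode({"q": query})
        return urllib.request.Request(url, headers={"User-Agent": user_agent})


    def status_of(request: urllib.request.Request) -> int:
        """Send the request and return the HTTP status code, including error statuses."""
        try:
            with urllib.request.urlopen(request, timeout=5) as response:
                return response.status
        except urllib.error.HTTPError as err:
            # Responses with an error status (for example a blocked request) end up here.
            return err.code


    # "curl/8.0" is a made-up User-Agent chosen to match the roboagent filter.
    req = build_search_request(FILTRON_URL, "test", "curl/8.0")
    print(req.full_url)

With filtron and searx running, calling ``status_of(req)`` repeatedly within one interval should eventually return a different status code once the configured limit is exceeded.
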