Analysis of Python's new string format vulnerability

Preface

This article analyzes in depth the security risks of the new string formatting syntax introduced in Python, and provides a corresponding safer alternative.

Calling str.format on an untrusted format string is a security risk. I have known about this problem for a long time, but I didn't realize how serious it was until today: attackers can use it to bypass the Jinja2 sandbox, which can lead to serious information leakage. At the end of this article I also provide a safer version of str.format.

To be clear, this is a fairly serious security risk. I am writing about it here because most people probably don't realize how easily it can be exploited.

Core Issues

Starting with Python 2.6, Python introduced a new string formatting syntax inspired by .NET. Besides Python, Rust and some other programming languages also support this syntax. Through the .format() method, the syntax can be applied to both bytes and unicode strings (in Python 3, only to unicode strings). It is also exposed through the more customizable string.Formatter API.
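A quick sketch of the two entry points mentioned above, assuming Python 3: for an ordinary template, str.format and string.Formatter().format should give the same result, and bytes has no .format method at all. The template and values are made up for illustration.

```python
from string import Formatter

# An ordinary template with one positional and one keyword field (illustrative values).
template = "{0} has {count} items"

via_method = template.format("cart", count=3)
via_formatter = Formatter().format(template, "cart", count=3)

# Both spellings of the same formatting call produce the same string.
print(via_method)     # cart has 3 items
print(via_formatter)  # cart has 3 items

# On Python 3 only str has .format(); bytes does not.
print(hasattr(b"", "format"))  # False
```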

One feature of this syntax is that the format string can refer to positional and keyword arguments explicitly and reorder them freely. It can even access object attributes and items, and that is the root cause of the security issue here.

In general, people can use it like this:

    >>> 'class of {0} is {0.__class__}'.format(42)
    "class of 42 is <class 'int'>"

Essentially, anyone who can control the format string can potentially reach various internal attributes of the objects passed to it.
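To make the features described above concrete, here is a small Python 3 sketch with made-up values: reordering by index, lookup by keyword, attribute access with a dot, and item access with square brackets. The last two are what an attacker relies on.

```python
# Positional arguments can be referenced by index and reordered.
reordered = "{1} before {0}".format("a", "b")          # 'b before a'

# Keyword arguments are looked up by name.
named = "{greeting}, {name}!".format(greeting="Hello", name="World")  # 'Hello, World!'

# A dot performs attribute access on the argument.
attr = "{0.real}".format(3 + 4j)                        # '3.0'

# Square brackets perform item access; an unquoted non-numeric key is used as a string.
item = "{0[key]}".format({"key": "value"})              # 'value'

print(reordered, named, attr, item)
```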

What's the problem?

The first question is how an attacker could control a format string in the first place. Here are some possible entry points:

1. Untrusted translators working on string files. This is quite plausible: many applications translated into multiple languages use the new-style Python string formatting, but not everyone thoroughly reviews every incoming string.

2. User-exposed configuration. Some systems let users configure certain behaviors, and those settings may be exposed as format strings. I have personally seen web applications where users configure notification emails, log message formats, or other simple templates this way.
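When format strings come from translators or user settings, one cheap review aid is to list every field that goes beyond a plain name or index. The sketch below relies on string.Formatter().parse; the helper name suspicious_fields is hypothetical, and flagging fields only helps a human reviewer. It is not a substitute for sandboxing.

```python
from string import Formatter

def suspicious_fields(template):
    """Hypothetical review helper: return field names that use attribute
    ('.') or item ('[') access, including fields nested in format specs."""
    flagged = []
    for _literal, field_name, format_spec, _conversion in Formatter().parse(template):
        if field_name is not None and ("." in field_name or "[" in field_name):
            flagged.append(field_name)
        if format_spec:
            # Replacement fields may be nested inside a format spec, e.g. "{0:{1.x}}".
            flagged.extend(suspicious_fields(format_spec))
    return flagged

print(suspicious_fields("Hello {name}"))  # []
print(suspicious_fields("{event.__init__.__globals__[CONFIG][SECRET_KEY]}"))
```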

Hazard Level

If you only pass objects implemented in C (built-in types such as integers) to the format string, the danger is limited, because the most you can expose is internals such as the integer class.

However, once Python objects are passed to the format string, things get tricky, because the amount of information reachable from a Python function is staggering. Here is a scenario in a hypothetical web application that could leak a secret key:

    CONFIG = {
        'SECRET_KEY': 'super secret key'
    }

    class Event(object):
        def __init__(self, id, level, message):
            self.id = id
            self.level = level
            self.message = message

    def format_event(format_string, event):
        # format_string is attacker-controlled in this scenario
        return format_string.format(event=event)

If a user could inject format_string here, they could extract the secret string like this:

    {event.__init__.__globals__[CONFIG][SECRET_KEY]}
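The lookup chain above needs nothing special about Event. As a minimal sketch (Python 3, with a hypothetical helper function standing in for any ordinary module-level function), every Python function carries a reference to its module's global namespace in __globals__, and a format string can walk straight to it:

```python
CONFIG = {"SECRET_KEY": "super secret key"}

def helper():
    """Hypothetical stand-in: any ordinary module-level function would do."""
    return None

# Attribute access reaches the function's module namespace; the unquoted
# keys in brackets are looked up as strings.
leaked = "{0.__globals__[CONFIG][SECRET_KEY]}".format(helper)

# The same chain written as plain Python:
assert leaked == helper.__globals__["CONFIG"]["SECRET_KEY"]
print(leaked)  # super secret key
```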

Sandboxing Formatting

So what should you do if you need to accept format strings from someone else? You can use some undocumented internals to change how string formatting behaves:

    from string import Formatter

    try:
        from collections.abc import Mapping  # Python 3.3+
    except ImportError:
        from collections import Mapping  # Python 2

    class MagicFormatMapping(Mapping):
        """This class implements a dummy wrapper to fix a bug in the Python
        standard library for string formatting.

        See http://bugs.python.org/issue13598 for information about why
        this is necessary.
        """

        def __init__(self, args, kwargs):
            self._args = args
            self._kwargs = kwargs
            self._last_index = 0

        def __getitem__(self, key):
            # An empty field name ("{}") means "next positional argument".
            if key == '':
                idx = self._last_index
                self._last_index += 1
                try:
                    return self._args[idx]
                except LookupError:
                    pass
                key = str(idx)
            return self._kwargs[key]

        def __iter__(self):
            return iter(self._kwargs)

        def __len__(self):
            return len(self._kwargs)

    # This is a necessary API but it's undocumented and moved around
    # between Python releases
    try:
        from _string import formatter_field_name_split
    except ImportError:
        formatter_field_name_split = lambda x: x._formatter_field_name_split()

    class SafeFormatter(Formatter):

        def get_field(self, field_name, args, kwargs):
            # Split "a.b[c]" into the first name and the chain of lookups after it.
            first, rest = formatter_field_name_split(field_name)
            obj = self.get_value(first, args, kwargs)
            for is_attr, i in rest:
                if is_attr:
                    # Attribute access goes through the filtering helper.
                    obj = safe_getattr(obj, i)
                else:
                    obj = obj[i]
            return obj, first

    def safe_getattr(obj, attr):
        # Expand the logic here. For instance on 2.x you will also need
        # to disallow func_globals, on 3.x you will also need to hide
        # things like cr_frame and others. So ideally have a list of
        # objects that are entirely unsafe to access.
        if attr[:1] == '_':
            raise AttributeError(attr)
        return getattr(obj, attr)

    def safe_format(_string, *args, **kwargs):
        formatter = SafeFormatter()
        kwargs = MagicFormatMapping(args, kwargs)
        return formatter.vformat(_string, args, kwargs)

Now we can use safe_format instead of str.format:

    >>> '{0.__class__}'.format(42)
    "<type 'int'>"
    >>> safe_format('{0.__class__}', 42)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: __class__
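The underscore check in safe_getattr is a blocklist, and its own comment notes that it needs extending for each Python version. If your templates only need plain placeholders, a stricter alternative is an allow-list that rejects attribute and item access entirely. The sketch below assumes Python 3.4 or later, where string.Formatter numbers empty {} fields itself; PlainFieldFormatter and plain_format are hypothetical names, not part of any library.

```python
from string import Formatter

class PlainFieldFormatter(Formatter):
    """Hypothetical allow-list formatter: "{name}" and "{0}" work,
    but "{0.attr}" and "{0[key]}" are rejected."""

    def get_field(self, field_name, args, kwargs):
        # Attribute access uses '.', item access uses '['; refuse both outright.
        if "." in field_name or "[" in field_name:
            raise ValueError("attribute or item access is not allowed: %r" % field_name)
        return super().get_field(field_name, args, kwargs)

def plain_format(template, *args, **kwargs):
    return PlainFieldFormatter().vformat(template, args, kwargs)

print(plain_format("{} logged in from {ip}", "alice", ip="10.0.0.1"))
# alice logged in from 10.0.0.1
```

The trade-off is that templates can no longer use even harmless lookups such as {event.level}; callers have to pass every value they want to expose as a separate argument.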

Summary

In this article, we analyzed the security risks of the new string formatting syntax introduced in Python and presented a safer alternative. I hope it is helpful to readers.
