---
slug: "remove-4bytes-char-on-python"
title: "Remove 4-byte Characters from Strings in Python"
description: "A Python function that removes characters encoded as 4 bytes in UTF-8, such as most emoji, from a string."
url: "https://www.ytyng.com/en/blog/remove-4bytes-char-on-python"
publish_date: "2023-03-06T07:01:43Z"
created: "2023-03-06T07:01:43Z"
updated: "2026-05-11T13:21:31.815Z"
categories: ["Python"]
keywords: ""
featured_image_url: "https://media.ytyng.com/resize/20250708/fbc7a05b18e44c2bb9ed93439cfd88c4.png.webp?width=768"
has_video: true
has_music: true
video_urls: ["https://media.ytyng.net/ytyng-blog/273/featured-video-1.mp4", "https://media.ytyng.net/ytyng-blog/273/featured-video-2.mp4", "https://media.ytyng.net/ytyng-blog/273/featured-video-3.mp4"]
music_urls: ["https://media.ytyng.net/ytyng-blog/273/featured-music-273-3.mp3", "https://media.ytyng.net/ytyng-blog/273/featured-music-273-4.mp3"]
lang: "en"
---

# Remove 4-byte Characters from Strings in Python

In this blog post, I will introduce a Python function that removes 4-byte characters from a string, that is, characters that take 4 bytes in UTF-8 (code points U+10000 and above). This is useful when dealing with emoji and other special characters that are encoded as 4 bytes in UTF-8. Here's the function:

```python
def remove_4bytes_char(text):
    """
    Remove 4-byte UTF-8 characters (code points U+10000 and above) from a string
    """
    # Convert the string to a mutable bytearray
    byte_string = bytearray(text.encode('utf-8'))

    # In UTF-8, every 4-byte sequence starts with a lead byte in the range
    # 0xF0-0xF4 and is followed by exactly three continuation bytes.
    index = 0
    while index < len(byte_string):
        if 0xF0 <= byte_string[index] <= 0xF4:
            # Delete the lead byte and its three continuation bytes in one step
            del byte_string[index:index + 4]
        else:
            index += 1

    # Convert the bytearray back to a string
    return byte_string.decode('utf-8')
```

First, `remove_4bytes_char` takes a string `text` as input and encodes it as UTF-8 into a `bytearray`. Working with the bytes makes 4-byte characters easy to identify, because the first byte of every UTF-8 sequence indicates how long that sequence is. Using a mutable `bytearray` rather than `bytes` also allows bytes to be deleted in place.
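To see why the byte level helps, the short sketch below reports how many bytes a few sample characters take in UTF-8 and what their first byte is. The helper name `describe_utf8` and the sample characters are illustrative choices, not part of the original function.

```python
def describe_utf8(ch):
    """Return (code point, UTF-8 byte count, first byte) for a single character."""
    encoded = ch.encode('utf-8')
    return ord(ch), len(encoded), encoded[0]


# One sample each of a 1-, 2-, 3- and 4-byte character: a, é, あ, 😀
for ch in ['a', '\u00e9', '\u3042', '\U0001F600']:
    code_point, size, first = describe_utf8(ch)
    print(f'U+{code_point:04X} {ch}: {size} byte(s), first byte 0x{first:02X}')
```

This should report 1, 2, 3 and 4 bytes respectively, with only the emoji's sequence beginning at 0xF0.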

Next, the function loops over the byte array looking for the lead byte of a 4-byte sequence. When it finds one, it deletes that byte together with the three continuation bytes that follow it, removing the whole character from the `bytearray`.
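In UTF-8, a 4-byte sequence can start with any byte from 0xF0 to 0xF4, not only 0xF0. The lead byte 0xF0 covers U+10000 to U+3FFFF, where most emoji live, while 0xF1 to 0xF4 cover U+40000 to U+10FFFF, which includes, for example, the tag characters U+E0020 to U+E007F used in some flag emoji sequences. The sketch below checks that a lead-byte test over that range agrees with the code-point rule (U+10000 and above); `is_4byte_lead` is a hypothetical helper name.

```python
def is_4byte_lead(byte):
    """Return True if `byte` (an int from 0 to 255) starts a 4-byte UTF-8 sequence."""
    # 0xF0-0xF4 are the only lead bytes of 4-byte sequences in valid UTF-8;
    # 0xF5-0xFF never appear in valid UTF-8 at all.
    return 0xF0 <= byte <= 0xF4


# A 3-byte code point, then 4-byte code points with different lead bytes
for ch in ['\uFFFD', '\U00010000', '\U000E0067', '\U0010FFFF']:
    lead = ch.encode('utf-8')[0]
    print(f'U+{ord(ch):04X}: lead byte 0x{lead:02X}, 4-byte: {is_4byte_lead(lead)}')
```

Because continuation bytes are always in the range 0x80 to 0xBF, a byte in this range can only mark the start of a character, never the middle of one.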

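Finally, the function decodes the modified `bytearray` back into a string using UTF-8 and returns the result. Decoding succeeds because only complete sequences were removed, so the remaining bytes are still valid UTF-8.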
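Removing whole sequences is what keeps the final decode safe. The sketch below uses an illustrative string `'a😀b'` to contrast deleting all four bytes of the emoji with deleting only its lead byte, which leaves orphaned continuation bytes that UTF-8 decoding rejects.

```python
# Encode a sample string containing one 4-byte character (U+1F600)
encoded = bytearray('a\U0001F600b'.encode('utf-8'))  # a, F0 9F 98 80, b

# Deleting all four bytes of the emoji leaves valid UTF-8
complete = encoded.copy()
del complete[1:5]
print(complete.decode('utf-8'))  # ab

# Deleting only the lead byte leaves three orphaned continuation bytes
partial = encoded.copy()
del partial[1:2]
try:
    partial.decode('utf-8')
    partial_decodes = True
except UnicodeDecodeError as exc:
    partial_decodes = False
    print('decode failed:', exc.reason)
```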

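This function can help sanitize text input by removing 4-byte characters, which is useful when text must not contain characters outside the Basic Multilingual Plane (code points above U+FFFF), such as most emoji. Note that it does not remove every emoji: some, such as ❤ (U+2764), are encoded in 3 bytes, and invisible helpers such as the variation selector U+FE0F or the zero-width joiner U+200D, also 3 bytes, stay in place when the 4-byte parts of an emoji sequence are removed.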
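Because a character takes 4 bytes in UTF-8 exactly when its code point is U+10000 or above, the same filtering can also be done directly on the string without encoding it. The following is an alternative sketch, not the function above; the name `remove_4bytes_char_by_codepoint` is hypothetical.

```python
def remove_4bytes_char_by_codepoint(text):
    """Remove characters whose code point is U+10000 or above."""
    # Code points up to U+FFFF fit in at most 3 UTF-8 bytes; anything higher needs 4
    return ''.join(ch for ch in text if ord(ch) <= 0xFFFF)


print(remove_4bytes_char_by_codepoint('Hello \U0001F600 world'))  # Hello  world
print(remove_4bytes_char_by_codepoint('\u2764\ufe0f'))  # kept: both code points are 3 bytes
```

For any string that can be encoded as UTF-8, this should give the same result as a byte-level version that removes every sequence whose lead byte is 0xF0 to 0xF4, while skipping the encode/decode round trip.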
