Atlas

How to export Moments data and store it permanently on the blockchain

This is really two questions: how to export your Moments data, and how to store that data on the blockchain. Let's start with the result. In the end, I successfully exported my Moments data from WeChat version 8.0.32 on iOS and stored it on the Crossbell blockchain: https://xfeed.app/u/wxd6bb23a9

I researched this because my WeChat account was banned on February 4, 2023 (https://atlas.xlog.app/after-wx-banned), and I wanted to export and showcase my Moments content. I searched through a lot of material along the way, but most of the solutions I found were outdated. So I decided to document my own exploration and the pitfalls I encountered.

Before I continue, I want to make a few disclaimers:

  1. I did not export the comments data because I didn't think it was necessary. The purpose of storing data on the blockchain is to confirm ownership, and it doesn't make much sense to help others store their content on the blockchain. However, if there is a demand for it, it shouldn't be too difficult to implement. I noticed that there is a table called "MyWC_Message01" that stores comments in plain text, but I'm not sure if it's complete. If there is a demand, you can refer to this tutorial and continue your own research.
  2. I did not export other people's Moments data, for the same reason. However, it's not difficult to guess which table to start with if you need to export it.
  3. I did not export WeChat friends/chat records. I assume this is a common requirement, but I personally didn't have a need for it, so I didn't research it. However, it's likely that exporting this data would involve exporting data from another table, so it shouldn't be too difficult.
  4. I did not parse shared video posts. Regular shared links can be parsed, but for video posts it is too complex to recover the actual video link. I also rarely shared video posts, so I had little need to parse them.

Clarify the Objective: Export Moments Data

Assuming the current requirement is to export Moments data, there are actually several different scenarios, each requiring a different method:

  • If you are a WeChat user (the international version, not Weixin, the mainland China version), the official WeChat API provides a way to export data. You can refer to this blog post for more information. If you also want to showcase the data on the blockchain, you can refer to this blog post as well.
  • If you are a Weixin (mainland China WeChat) user:
    • If your account has not been banned, you can try searching Taobao for "WeChat Moments" to find services that export your Moments as an e-book or a similar format (I'm curious how Taobao sellers do this; I'm not sure whether it relies on the cache).
    • If your account has been banned, or it hasn't but you're interested in how to export the data yourself, you can try the method I focus on next: recovering the data from the phone's cache. This works because even when your WeChat account is banned, you can still access your own Moments (fortunately).

Cache Recovery Method

As the name suggests, this method has three parts: make sure WeChat has cached your Moments locally, export your phone's data, then find the relevant files in the export and extract useful information, such as the posting time and content of each Moment, to piece together the complete Moments data.

1. Local Cache

Open WeChat and clear the cache (this step is optional, but it reduces the waiting time for backup and copying). Then open your Moments and scroll down to the earliest post so that all of your Moments are cached locally. Make sure to open each image as well; otherwise only the thumbnails are cached. To check that everything was cached, disconnect from the internet after scrolling to the end and confirm that you can still see the Moments. If you can, they have been cached successfully.

2. Export Cache Files

I was logged into WeChat on an iOS device, and I'm not sure whether I could still log in on other devices after my account was banned (I was worried that too many attempts would leave me unable to log in on any iOS device). Therefore, my only option was to export the cache through a complete backup of the phone. Android devices should be able to export the cache files directly, but on iOS you can only access an app's cache files through a complete backup of the phone.

I used a tool called iMazing; the free version is enough, since it allows 10 exports. First, back up your phone's data, then find WeChat's Documents folder and export it. The steps are shown in the following image.
image

In the Documents folder, there is at least one folder named with a hash string, like this:

eb8a6093b56e2f1c27fbf471ee97c7f9

This folder contains the personal data of the WeChat user. If you have logged into multiple WeChat accounts on this phone, there may be multiple folders with hash names. If you're not sure which one you want to export, you can export all of them and check.

Find the files ./Documents/{hash}/wc/wc005_008.db and ./Documents/{hash}/DB/WCDB_Contact.sqlite. The former is the database that holds the Moments data, and the latter is the database that holds friend data. We only need the latter to extract our own account's profile picture.

(Pitfall: newer versions of macOS no longer include iTunes, so you cannot back up through it.)
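As a rough illustration of this step, the sketch below scans an exported Documents folder for account folders that contain both files. The function name `find_wechat_dbs` is hypothetical, and the check for a 32-character lowercase hex folder name is an assumption based only on the example hash shown above.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def find_wechat_dbs(documents_dir):
    """Return {hash: (moments_db, contact_db)} for each account folder found."""
    found = {}
    for account_dir in Path(documents_dir).iterdir():
        name = account_dir.name
        # Assumption: account folders are named with a 32-character hex string.
        looks_like_hash = len(name) == 32 and set(name) <= HEX_DIGITS
        if not (account_dir.is_dir() and looks_like_hash):
            continue
        moments_db = account_dir / "wc" / "wc005_008.db"
        contact_db = account_dir / "DB" / "WCDB_Contact.sqlite"
        # Only report accounts where both database files are present.
        if moments_db.exists() and contact_db.exists():
            found[name] = (moments_db, contact_db)
    return found
```

Calling `find_wechat_dbs("./Documents")` returns one entry per account, which helps when several WeChat accounts have been used on the same phone.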

3. Parse the Cache

TL;DR: Download this repo, copy the wc005_008.db and WCDB_Contact.sqlite files into its root directory, change "hash" in main.py to your own hash, and then run python3 main.py to export a moments.json file.

Regarding the script, there are a few things worth mentioning:

  1. In the script, I set a parameter called "dl_img". If set to True, it will download all the images locally. Since my WeChat account has been banned, I don't know how long the images in Moments will be hosted, and if there are frequent external requests for Moments images, who knows what might happen. I recommend downloading the images locally while you still can for safety reasons.

  2. For moments that contain shared links, I not only parse the shared link itself, but also parse the images/titles/descriptions cached by WeChat for that link. This fully restores how a shared link is rendered in Moments. I did this because many shared links have already become 404... If we only parse the link, it wouldn't be very meaningful. I think it's necessary to parse all the cached data at that time, at least to restore the "cover".
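To illustrate the idea behind item 1, here is a minimal sketch of saving one image locally. It is not the script's actual implementation: the function name `download_image`, the hashed file name, and the ".jpg" extension are all assumptions made for this example.

```python
import hashlib
import urllib.request
from pathlib import Path


def download_image(url, out_dir):
    """Save `url` into `out_dir` once and return the local path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the URL carries no usable file name, so derive one from a
    # hash of the URL. The ".jpg" extension is a guess, not read from the data.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    target = out_dir / (digest + ".jpg")
    if not target.exists():  # skip images that were already downloaded
        with urllib.request.urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
    return target
```

Because existing files are skipped, an interrupted run can be restarted without sending the same requests again.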

Of course, there is a lot of analysis behind the script in this repo. I will briefly explain it, and you can decide whether to skip this section based on your interest.

WeChat uses SQLite for caching, and to analyze the wc005_008.db database, you can use this open-source SQLite browser. After a simple analysis, I found that there are many tables starting with "MyWC01_" in the db, and the Moments data for your own account is stored in the table "MyWC01_{$hash}", where $hash is the same as the hash in the directory mentioned earlier. Other "MyWC01_..." tables likely contain Moments data for your friends.
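If you prefer a script to a GUI browser, the same inspection can be sketched with Python's built-in sqlite3 module. The function name `list_moments_tables` is hypothetical; the "MyWC01_" prefix is the one described above.

```python
import sqlite3


def list_moments_tables(db_path, prefix="MyWC01_"):
    """Return the names of all tables in the database that start with `prefix`."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Filter in Python because "_" is a wildcard in SQL LIKE patterns.
    return [name for (name,) in rows if name.startswith(prefix)]
```

Your own table should be the one whose suffix matches the hash of the account folder.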

When you open the table that stores your own Moments data, you will find two very important fields: "Buffer" and "id". If you decode the Buffer field as UTF-8, you will see many plaintext fields: some are image URLs, some are the names of friends, and some are the content of earlier Moments. From the Buffer field, we can recover the Moments data.
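A quick way to eyeball what a Buffer contains is to decode it leniently and keep only the readable runs. The sketch below assumes the table and column names described above; the helper names `iter_own_moments` and `readable_strings` are hypothetical, and splitting on control characters is a crude heuristic rather than a real parser.

```python
import re
import sqlite3


def iter_own_moments(db_path, account_hash):
    """Yield (id, Buffer) pairs from the table "MyWC01_{hash}"."""
    table = "MyWC01_" + account_hash
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        query = 'SELECT id, Buffer FROM "%s"' % table.replace('"', '""')
        for row_id, buffer in conn.execute(query):
            yield row_id, buffer
    finally:
        conn.close()


def readable_strings(buffer, min_chars=4):
    """Decode as UTF-8 and return the runs of printable text in the buffer."""
    text = buffer.decode("utf-8", errors="replace")
    # Split on control characters and on U+FFFD, which marks undecodable bytes.
    runs = re.split(r"[\x00-\x1f\x7f\ufffd]+", text)
    return [run for run in runs if len(run) >= min_chars]
```

Printing `readable_strings(buffer)` for a few rows is usually enough to spot URLs and post text before looking at the raw bytes in detail.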

Let's take the content of a Moment as an example. The following image shows the raw bytes of two Moments posts. Comparing them, it is easy to see that b'\xba\x01' is the identifier for the content.
flag

Different fields, such as image URLs, content, and shared links, have their own flags. Each field is stored in the Buffer as its flag, followed by one or two bytes giving the length of the message, followed by the message itself.

Taking the content as an example, the typical payload looks like this:

payload

Although it is difficult to fully understand the format of this data, we can still identify the content based on certain fixed markers. We can find a typical payload like this:

payload
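The flag, length, message layout can be sketched as follows. This is an illustration under stated assumptions, not the repo's actual parser: the length is assumed to be a protobuf-style varint (which would explain "one or two bytes", and b'\xba\x01' happens to be a valid protobuf tag for a length-delimited field), and the helper names `read_varint` and `extract_field` are hypothetical.

```python
CONTENT_FLAG = b"\xba\x01"  # content identifier observed in the Buffer


def read_varint(data, pos):
    """Decode a base-128 varint starting at `pos`; return (value, next_pos)."""
    value, shift = 0, 0
    while pos < len(data):
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: this was the last length byte
            return value, pos
        shift += 7
    raise ValueError("buffer ended inside a length prefix")


def extract_field(buffer, flag=CONTENT_FLAG):
    """Return the text after the first occurrence of `flag`, or None."""
    start = buffer.find(flag)
    if start == -1:
        return None
    try:
        # Assumption: the length prefix is a varint (one byte below 128).
        length, body_start = read_varint(buffer, start + len(flag))
    except ValueError:
        return None
    body = buffer[body_start:body_start + length]
    if len(body) < length:
        return None  # the message is truncated
    return body.decode("utf-8", errors="replace")
```

Taking the first match is naive, because the same two bytes could also occur inside other data; a real parser would need to walk the fields in order.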

This is the basic idea. Of course, there are many more details. If you're interested, you can refer to the code implementation.

(Pitfall: I initially referred to the code in this repo for many things, but the code in this repo parses the Buffer as a plist format, which is not the actual format of the cache in the current version)

4. Data Display and Permanent Storage on the Blockchain

Now that the data has been exported, you can do whatever you want with it. I think storing the data on the blockchain is a romantic way to keep a record, so I chose to back up my Moments on the Crossbell blockchain and showcase them on xFeed.

To implement the blockchain functionality and facilitate debugging to ensure that my data was exported correctly, I also created a simple display page in the repository. The final result is roughly as shown in the image:

image

If the data export went smoothly, you can simply click "Store on Blockchain" on the page and follow the steps to store the data. However, to interact with the blockchain smoothly, there are some preparations to be made:

  1. Install the MetaMask wallet browser extension.
  2. Claim gas from the faucet for the interaction. If you have a large amount of data, you may need a significant amount of gas. If you need more gas, you can contact me.

Once these two preparations are done, you can simply click "Store on Blockchain" on the page.

Conclusion

WeChat is constantly updating, and the structure of the cache is also changing. This article may not be completely applicable or cover all scenarios, but I hope it provides some reference and inspiration. If you have made other discoveries, I welcome the opportunity to discuss them.

Finally, here are the two repositories mentioned in this article:

In addition to exporting Moments, I also wrote a Tampermonkey script to export QQ Zone posts (yes, my QQ account was also banned). Exporting QQ Zone posts is much simpler. The content has already been exported here, but I haven't finished organizing the code yet; I plan to write a simple tutorial later.

Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.