String Obfuscation Algorithm in Dotfuscator

February 14, 2008

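Code obfuscators such as Dotfuscator and Xenocode Postbuild all offer string obfuscation (string encryption) as a key feature. It sounds simple enough, but what exactly is it, and how does it work?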

This article uses Dotfuscator 4.x as an example and builds a simple ConsoleApplication as a guinea pig, to get a glimpse of how string obfuscation works.

First, the code of the simple ConsoleApplication:

using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("This is the unencrypted string.");
        }
    }
}

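Compile it, then run the result through Dotfuscator. I used Dotfuscator 4.x Pro. On the Option tab, set Disable String Encryption to No. On the Input tab, add the build output of the project above. On the String Encryption tab, check every item, or add two rules, one with type * and one with method *. Then build. When it finishes, the obfuscated ConsoleApplication1.exe is in the output directory. Opening it in Reflector gives the following decompiled code: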

private static void a(string[] A_0)
{
    int num = 2;
    Console.WriteLine(a("軙듛럝鏟싡跣闥죧黩蓫语탯蟱髳鏵雷駹軻蟽烿瘁愃戅⠇礉砋簍礏簑猓㠕", num));
}

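Ha, a string of gibberish. We can also see that a new method named a has been added (Main itself has been renamed to a as well). So what is this a? Reflector reports: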

/* private scope */ static string a(string A_0, int A_1)
{
    // This item is obfuscated and can not be translated.
}

Clearly this method has been through control-flow obfuscation, so the only option is to work from the IL:

.method privatescope hidebysig static string a(string A_0, int32 A_1) cil managed
{
    .maxstack 8
    .locals init (
        [0] char[] chArray,
        [1] int32 num,
        [2] int32 num2,
        [3] uint8 num3,
        [4] uint8 num4)
    L_0000: ldarg.0 
    L_0001: callvirt instance char[] [mscorlib]System.String::ToCharArray()
    L_0006: stloc.0 
    L_0007: ldc.i4 0xe74d6d7
    L_000c: ldarg.1 
    L_000d: add 
    L_000e: stloc.1 
    L_000f: ldc.i4.0 
    L_0010: dup 
    L_0011: ldc.i4.1 
    L_0012: blt.s L_0047   // 0 < 1 is always true, so this always jumps to the loop test at L_0047
    L_0014: dup 
    L_0015: stloc.2 
    L_0016: ldloc.0 
    L_0017: ldloc.2 
    L_0018: ldloc.0 
    L_0019: ldloc.2 
    L_001a: ldelem.i2 
    L_001b: dup 
    L_001c: ldc.i4 0xff
    L_0021: and 
    L_0022: ldloc.1 
    L_0023: dup 
    L_0024: ldc.i4.1 
    L_0025: add 
    L_0026: stloc.1 
    L_0027: xor 
    L_0028: conv.u1 
    L_0029: stloc.3        // num3 = (byte)((ch & 0xff) ^ num++)
    L_002a: dup 
    L_002b: ldc.i4.8 
    L_002c: shr 
    L_002d: ldloc.1 
    L_002e: dup 
    L_002f: ldc.i4.1 
    L_0030: add 
    L_0031: stloc.1 
    L_0032: xor 
    L_0033: conv.u1 
    L_0034: stloc.s num4   // num4 = (byte)((ch >> 8) ^ num++)
    L_0036: pop 
    L_0037: ldloc.s num4
    L_0039: ldloc.3 
    L_003a: stloc.s num4
    L_003c: stloc.3        // L_0037..L_003c swap num3 and num4
    L_003d: ldloc.s num4
    L_003f: ldc.i4.8 
    L_0040: shl 
    L_0041: ldloc.3 
    L_0042: or 
    L_0043: conv.u2 
    L_0044: stelem.i2      // chArray[num2] = (char)((num4 << 8) | num3), using the swapped values
    L_0045: ldc.i4.1 
    L_0046: add 
    L_0047: dup 
    L_0048: ldloc.0 
    L_0049: ldlen 
    L_004a: conv.i4 
    L_004b: blt.s L_0014   // loop while index < chArray.Length
    L_004d: pop 
    L_004e: ldloc.0 
    L_004f: newobj instance void [mscorlib]System.String::.ctor(char[])
    L_0054: call string [mscorlib]System.String::Intern(string)
    L_0059: ret 
}

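I won't go through the IL in detail, since this is not an MSIL tutorial; if you are interested, see MSDN, a book on the subject, or search online. Reading the IL, the obfuscated method first uses a condition that is always true (equivalent to if (0 < 1)) to jump to the loop test, which is where the real loop starts. The loop visits every char of the string. The char's low byte and high byte are each XORed with a running key, which starts at 0xe74d6d7 + A_1 and is incremented after every use; the two results are then swapped and combined with a shift and an OR. The output is the original string, which is finally interned with String.Intern. Cleaned up, the algorithm is: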

static string GetString(string source, int salt)
{
    int index = 0;
    char[] data = source.ToCharArray();
    salt += 0xe74d6d7; // constant emitted by Dotfuscator (the ldc.i4 at L_0007)
    while (index < data.Length)
    {
        char ch = data[index];
        // XOR each byte with the running key; the key advances after every byte
        byte low = (byte)((ch & 0xff) ^ salt++);
        byte high = (byte)((ch >> 8) ^ salt++);
        // swap the bytes back: the decoded low byte becomes the high byte
        data[index] = (char)((low << 8) | high);
        index++;
    }
    return string.Intern(new string(data));
}
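The decoder only makes sense together with an encoder that ran when the assembly was obfuscated. Dotfuscator's own encoder does not appear in the output, but because the decoder is just a byte swap plus XOR, its inverse can be derived from GetString. The sketch below is our own reconstruction, not Dotfuscator code: the class name StringCipher, the methods Encode and Decode, and the constant name DotfuscatorConstant are hypothetical, and 0xe74d6d7 is simply the value observed in this one assembly.

```csharp
// Hypothetical reconstruction: only the arithmetic comes from the IL above.
static class StringCipher
{
    // Value loaded by "ldc.i4 0xe74d6d7" in this particular obfuscated assembly.
    const int DotfuscatorConstant = 0xe74d6d7;

    // Same arithmetic as GetString above.
    public static string Decode(string source, int salt)
    {
        char[] data = source.ToCharArray();
        int key = salt + DotfuscatorConstant;
        for (int i = 0; i < data.Length; i++)
        {
            char ch = data[i];
            byte low = (byte)((ch & 0xFF) ^ key++);
            byte high = (byte)((ch >> 8) ^ key++);
            data[i] = (char)((low << 8) | high);
        }
        return string.Intern(new string(data));
    }

    // Inverse of Decode: pick the encoded bytes so that Decode swaps them back.
    public static string Encode(string plain, int salt)
    {
        char[] data = plain.ToCharArray();
        int key = salt + DotfuscatorConstant;
        for (int i = 0; i < data.Length; i++)
        {
            char ch = data[i];
            // Decode turns the encoded low byte into the plain high byte...
            byte encLow = (byte)((ch >> 8) ^ key++);
            // ...and the encoded high byte into the plain low byte.
            byte encHigh = (byte)((ch & 0xFF) ^ key++);
            data[i] = (char)((encHigh << 8) | encLow);
        }
        return new string(data);
    }
}
```

For example, Encode("Th", 2) produces "\u8ED9\uB4DB" (軙듛), which are the first two characters of the literal in the decompiled Main above, and Decode("\u8ED9\uB4DB", 2) gives back "Th". Because the key advances once per byte, repeated letters in the plain text, such as the two i's in "This is", encode to different characters.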

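So string obfuscation comes at a considerable cost: every time the code uses the literal, it calls the decoder, which copies the string into a char array, runs the XOR loop, builds a new string and interns it. For commercial applications it is better not to rely on it, which in practice means not keeping sensitive information in hard-coded strings at all. Moreover, this kind of string obfuscation only hinders static reverse engineering. In .NET, every string is fully visible to the CLR runtime host once it has been decoded, so an attacker using a debugger, or a tool such as Process Explorer, can easily uncover the secrets the strings contain.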
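The scheme is also thin from a static point of view. When a character is below U+0100 (all ASCII text, for instance), its high byte is 0, so for the decoder's first XOR to produce 0 the encoded low byte must equal the key byte itself, and the next key byte is that value plus one. The hypothetical helper below uses this observation, which is our own derivation from the algorithm above, to read such strings straight from the encoded literal without knowing the salt or the Dotfuscator constant. It assumes every original character is in U+0000..U+00FF; any other character comes out wrong.

```csharp
// Hypothetical helper, derived from the decoding arithmetic above.
// Assumes every original character is in U+0000..U+00FF (high byte 0).
static class KeyLeak
{
    public static string RecoverLatin1(string encoded)
    {
        char[] data = encoded.ToCharArray();
        for (int i = 0; i < data.Length; i++)
        {
            // With a zero plain high byte, the encoded low byte equals the key byte k0.
            byte keyByte = (byte)(data[i] & 0xFF);
            // The decoder uses k0 + 1 for the other byte (wraps from 0xFF to 0x00).
            byte nextKey = (byte)(keyByte + 1);
            byte encHigh = (byte)(data[i] >> 8);
            // The encoded high byte XOR k1 is the plain low byte, i.e. the character.
            data[i] = (char)(encHigh ^ nextKey);
        }
        return new string(data);
    }
}
```

Applied to "\u8ED9\uB4DB", the first two characters of the sample literal, it returns "Th".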

张文清