Compare commits

...

4 Commits

Author SHA1 Message Date
325d674114 add build-programming-languages 2023-12-14 22:58:38 +07:00
3842202c21 add abusing-service-worker-to-protect-your-website 2023-12-14 22:58:06 +07:00
135cf0df6b update index page 2023-12-14 22:57:51 +07:00
729cec9211 fix katex settings 2023-12-14 22:57:26 +07:00
4 changed files with 331 additions and 24 deletions


@ -140,9 +140,7 @@ projects =
div []
[ h1 [] [text "My Projects"]
, div []
[ text "2023"
, text " "
, h2 [] [text "TSShock"]
[ h2 [] [text "(2023) TSShock"]
, withSpacing (p [])
[ text "At Verichains, our team discovered multiple weaknesses in most implementations of Threshold ECDSA Signature Scheme following the works of"
, Link.link (Link.external "https://eprint.iacr.org/2019/114") [] [text "Gennaro and Goldfeder."]
@ -154,9 +152,7 @@ projects =
]
]
, div []
[ text "2023"
, text " "
, h2 [] [text "Audited Vietnam Citizen Card"]
[ h2 [] [text "(2023) Audited Vietnam Citizen Card"]
, withSpacing (p [])
[ text "Performed auditing of the protocol and the chip-based Citizen Card of Vietnam."
, text "Simulation of NFC protocols conforming to ICAO 9303."
@ -167,9 +163,7 @@ projects =
]
]
, div []
[ text "2020 - 2023"
, text " "
, h2 [] [text "Mach-O binary format analysis and obfuscation"]
[ h2 [] [text "(2019 - 2023) Mach-O binary format analysis and obfuscation"]
, withSpacing (p [])
[ text "Research into Mach-O binary format, which is used in Apple devices."
, text "Proposed obfuscation for the Mach-O binary."
@ -177,9 +171,7 @@ projects =
]
]
, div []
[ text "2021"
, text " "
, h2 [] [text "LLVM based Obfuscation"]
[ h2 [] [text "(2021-2022) LLVM based Obfuscation"]
, withSpacing (p [])
[ text "Build a LLVM based obfuscation compiler."
, text "Extend"
@ -196,9 +188,7 @@ projects =
]
]
, div []
[ text "2019-2020, 2022-2023"
, text " "
, h2 [] [text "Windows Live Memory Forensics"]
[ h2 [] [text "(2019-2023) Windows Live Memory Forensics"]
, withSpacing (p [])
[ text "Research into Windows Forensics."
, text "Learned techniques used in Memory Forensics and familiar with tools like Volatility."
@ -235,23 +225,23 @@ publications =
]
, br [] []
, withSpacing (div [])
[ text "(Draft) Obfuscate API calls in Mach-O Binary."
[ text "Obfuscate API calls in Mach-O Binary."
, text "Anh Khoa Nguyen."
, text "Expecting 2024."
, br [] []
, Link.link (Link.external "macho-obfuscation.pdf")
[Attributes.target "_blank"]
[text "[pdf]"]
[text "[preprint]"]
]
, br [] []
, withSpacing (div [])
[ text "(Draft) Live Memory Forensics Without RAM Extraction."
[ text "Live Memory Forensics Without RAM Extraction."
, text "Anh Khoa Nguyen, Dung Vo Van Tien."
, text "Expecting 2024."
, br [] []
, Link.link (Link.external "live-memory-forensics.pdf")
[Attributes.target "_blank"]
[text "[pdf]"]
[text "[preprint]"]
]
, br [] []
, h2 [] [text "Dissertations"]


@ -0,0 +1,192 @@
---
title: "Abusing Service Worker for web protection"
subtitle: ""
summary: ""
tags: []
categories: []
published: "2023-12-14"
featured: false
draft: false
---
# Intro
Recently, I have been experimenting with Service Workers, and I thought of a way to use them to cover the traces of web protection. This is a draft idea that has not been validated with a PoC yet, but in theory it should work.
# What is a Worker
I have been working on the web for many years, yet Web Workers are something I have never worked with, mainly because plain JavaScript on the main thread is powerful enough for most web applications. But the browser also has a feature that lets us run JavaScript on another thread.
We all know that JavaScript is single-threaded. But the browser itself is multi-threaded (obviously), and it can control multiple JavaScript contexts. This is why the browser can run multiple tabs at a time, using a multi-threaded architecture. So someone thought that each webpage could have background threads as well, and these are defined as Web Workers.
There are multiple types of "worker". A dedicated worker can be accessed only by the script that created it. A shared worker can be accessed by any script from the same origin, even across tabs.
Communication between a worker and the main thread can get quite complicated. The simplest method is the `postMessage` function paired with the `onmessage` event. Yes, these are event-based APIs; JavaScript is event-driven, after all.
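As a rough sketch of this message passing, the snippet below separates the worker-side logic into a plain function so it can be read (and run) on its own. The worker path `/worker.js`, the message shape `{ op, values }`, and the function name `handleMessage` are all assumptions made for illustration.

```javascript
// Hypothetical worker-side handler: takes a message event and computes a reply.
// Inside a real worker file it would be wired up as:
//   self.onmessage = (event) => self.postMessage(handleMessage(event));
function handleMessage(event) {
  const { op, values } = event.data;
  if (op === 'sum') {
    return { op, result: values.reduce((acc, v) => acc + v, 0) };
  }
  return { op, error: 'unknown operation' };
}

// Main-thread side (browser only); skipped where Worker or document is missing.
if (typeof Worker !== 'undefined' && typeof document !== 'undefined') {
  const worker = new Worker('/worker.js'); // assumed path
  worker.onmessage = (event) => console.log('reply from worker:', event.data);
  worker.postMessage({ op: 'sum', values: [1, 2, 3] });
}
```

Both directions use the same pair: one side calls `postMessage`, the other receives the payload in `event.data` of its `onmessage` handler.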
## What about Service Worker
Service Worker is a special kind of worker: once it has been *installed*, it can monitor every page within its scope. A Service Worker only needs to be registered once, and it will then stay installed in the context of the website.
The most common use of a Service Worker is caching resources. Usually the resource-fetching logic is embedded in the application, and each fetch call connects to the server to download the resource. The responses can be cached instead, and this is the use case Service Workers are mostly used for.
This use case works because the Service Worker can intercept **fetch** invocations and has its own separate storage.
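The caching use case can be sketched as a cache-first lookup. This is only an illustration under some assumptions: the helper name `cacheFirst` and the cache name `demo-cache-v1` are made up, and the cache and fetch function are passed in as parameters so the logic does not depend on the browser.

```javascript
// Cache-first lookup. `cache` needs match(request) and put(request, response),
// like the browser Cache object; `fetchFn` is fetch or a stand-in for it.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached !== undefined) {
    return cached; // served from the worker's own storage
  }
  const response = await fetchFn(request);
  // a Response body can be read only once, so a copy is stored
  await cache.put(request, response.clone());
  return response;
}

// Wiring inside a service worker (skipped in any other environment).
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    if (event.request.method !== 'GET') return; // the Cache API only stores GET requests
    event.respondWith(
      caches.open('demo-cache-v1') // assumed cache name
        .then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```

The first request for a resource goes to the network and fills the cache; later requests for the same resource are answered without contacting the server.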
## Service Worker tutorial
To continue, we should know how to set up a Service Worker. I should stress that a worker cannot be loaded from a local **file** (a `file://` page), which means we must host the files through a web server. So let's build a simple web server with Flask, because I do not like `node_modules` piling up in storage for a simple application.
```py
from flask import Flask, make_response

app = Flask("dummy test server")

index = """
<head>
    <!-- starting script -->
    <script type="text/javascript" src="/index.js"></script>
</head>
<body>
</body>
"""


@app.route("/")
def main():
    return index


@app.route("/index.js")
def index_js():
    # serve the main script with a JavaScript MIME type
    with open("index.js") as f:
        resp = make_response(f.read(), 200)
    resp.content_type = "text/javascript"
    return resp


# service worker to hook request
# this uses worker.wasm for hooking logic
# mainly to hook fetch() requests or resources fetch on html load
@app.route("/sw.js")
def service_worker():
    with open("sw.js") as f:
        resp = make_response(f.read(), 200)
    resp.headers['Service-Worker-Allowed'] = '/'
    resp.content_type = "text/javascript"
    return resp


if __name__ == '__main__':
    app.run(host="localhost", port=3000, debug=True)
```
Our server exposes 3 endpoints: `/` serves the index HTML page, `/index.js` serves the main script `index.js`, and `/sw.js` serves the service worker code.
> The basic way to create a worker is through files. Blobs can also be used.
The service worker script must be served with a JavaScript MIME type such as `text/javascript`. The `Service-Worker-Allowed` header set to `/` tells the browser the widest scope the worker may claim; it is strictly required only when the script lives in a subdirectory, but setting it here does no harm. The scope determines which pages the service worker controls; `/` means it can run on any page of our website.
To "start" or "create" a service worker, we put the following code in `index.js`.
```javascript
navigator.serviceWorker.register(
    '/sw.js',
    { scope: '/' }
).then(reg => {
    if (reg.installing) {
        const sw = reg.installing || reg.waiting;
        // reload once the worker is installed so it can capture the page
        sw.onstatechange = function() {
            if (sw.state === 'installed') {
                setTimeout(function() {
                    window.location.reload();
                }, 0);
            }
        };
    }
})
.catch(error => console.log(error))
```
The code should be straightforward to understand. We "register" a service worker whose logic is at `/sw.js` and whose scope is `/`. Then we register a state change callback that refreshes the page once the service worker is **installed**. The refresh is required because the first time the service worker is "created", it cannot capture the current page.
To intercept the fetch invocations, we put the following code in the `sw.js`.
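```javascript
// Placeholder predicate: decide which requests to leave untouched.
const shouldBypass = (request) => request.method !== 'GET';

self.onfetch = (event) => {
    if (shouldBypass(event.request)) {
        // not calling respondWith lets the browser handle the request normally
        return;
    }
    // return another response object; here the request is simply forwarded,
    // but any Response (or a promise of one) can be passed instead
    event.respondWith(fetch(event.request));
}
```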
Now the service worker can intercept fetch invocations, and we can monitor or replace the response. The response can be replaced with another fetch (a redirect) or with static content.
Interception is not limited to requests made from scripts. The service worker can also intercept resources downloaded at the HTML parsing stage. This means that all resources in the HTML can be monitored by the service worker, including those loaded by `<script>`, `<img>`, and `<link>` tags.
> I still have not tested the `<link>` tag
This feature will be used later for web protection.
# Website protection
Current web protection relies on many strategies. Most commonly, methods such as source code obfuscation and protection code insertion are used. Obfuscation is not related to our context, so I will explain how protection code insertion can protect the website.
Obviously, protection should prevent tampering with the source code as well as debugging. Because JavaScript is a dynamic environment, it is easy to tamper with the runtime. The goal of protection should be to limit the possibility of such tampering. A straightforward method is to insert checkers, preventers, and traps and let them load together with the website as it renders. The code also runs periodically to check for abnormal behaviour.
I will not go into how the prevention is performed, but I can at least say that this code is often included as a separate script from the "normal operation" code. As a demonstration, consider HTML that loads like below:
```html
<head>
<!-- protection script -->
<script type="text/javascript" src="/protect.js"></script>
<!-- starting script -->
<script type="text/javascript" src="/index.js"></script>
</head>
<body>
</body>
```
In the code above, `index.js` is the application code, and the `protect.js` is the protection script.
# Abusing Service Worker for Website protection
## Simple case
Let's explore how we can incorporate a Service Worker into the protection of websites. I started by having `protect.js` load a Service Worker. The logic in `protect.js` should **only load the Service Worker** instead of performing protection logic.
Let's put all protection logic into a different script called `protect-logic.js`. We now define the Service Worker as below:
```javascript
self.onfetch = (event) => {
    if (event.request.url === "https://root/protect.js") {
        event.respondWith(fetch("/protect-logic.js"));
        return;
    }
}
```
With the Service Worker installed, when the HTML requests `protect.js`, the request is intercepted by the Service Worker, which returns a different file, `protect-logic.js`. `protect-logic.js` also contains the code that loads the Service Worker, for Service Worker management, but the rest of its code can be the protection code.
## Advanced case
`event.respondWith` accepts any data as long as it is a `Response` object (or a promise that resolves to one). Instead of redirecting to another request, we can build a custom response that returns the file content directly. It becomes:
```javascript
// body, status and headers are supplied by the surrounding fetch handler
const res = new Response(body, { status, headers });
event.respondWith(res);
return;
```
Let `body` be a binary container (`Blob`, `ArrayBuffer`) and we can keep the script in memory. With the script encrypted, our Service Worker decrypts the content when the resource is requested and returns it.
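A minimal sketch of this idea follows. The XOR routine is only a placeholder for a real cipher and offers no actual security; the key `demo-key`, the function names, and the sample script content are all assumptions made for illustration.

```javascript
// Placeholder "cipher": XOR with a repeating key. NOT real encryption,
// it only stands in for whatever decryption routine is actually used.
function xorBytes(bytes, key) {
  return bytes.map((b, i) => b ^ key[i % key.length]);
}

const encoder = new TextEncoder();
const decoder = new TextDecoder();
const KEY = encoder.encode('demo-key'); // hypothetical key

// In practice the encrypted bytes would be produced ahead of time;
// they are computed here only to keep the sketch self-contained.
const encryptedScript = xorBytes(encoder.encode('console.log("protected");'), KEY);

// Decrypt in memory and wrap the plain bytes in a Response.
function buildScriptResponse() {
  const plain = xorBytes(encryptedScript, KEY);
  return new Response(plain, {
    status: 200,
    headers: { 'Content-Type': 'text/javascript' },
  });
}

// Wiring inside a service worker (skipped in any other environment).
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    if (new URL(event.request.url).pathname === '/protect.js') {
      event.respondWith(buildScriptResponse());
    }
  });
}
```

Since the key ships inside the worker in this sketch, it only hides the script from casual inspection of the network traffic and static files.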
The same logic applies to the application script, e.g., `index.js`, and its resources (CSS and images). The Service Worker is then responsible for resolving all resources.
Following the redirection logic, we can also load files under fake names. Instead of specifying `index.js` and `protect.js`, we can use random names like `a.js` and `b.js`, keep a mapping from these names to the real files, and have the Service Worker resolve the correct file.
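The name mapping could look roughly like the sketch below. The table contents and the helper name `resolveAlias` are hypothetical; only `a.js`, `b.js`, `index.js`, and `protect-logic.js` come from the text above.

```javascript
// Hypothetical mapping from public (fake) names to the real files.
const ALIASES = new Map([
  ['/a.js', '/protect-logic.js'],
  ['/b.js', '/index.js'],
]);

// Returns the real path for an aliased URL, or null when there is no alias.
function resolveAlias(url, base) {
  const { pathname } = new URL(url, base);
  return ALIASES.get(pathname) ?? null;
}

// Wiring inside a service worker (skipped in any other environment).
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    const real = resolveAlias(event.request.url, self.location.origin);
    if (real !== null) {
      event.respondWith(fetch(real));
    }
    // requests without an alias fall through to the browser's default handling
  });
}
```

The HTML then only ever references `a.js` and `b.js`, while the mapping lives in the Service Worker.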
## Extreme case
The Service Worker code is currently written in JavaScript, so its logic is easy to read. I will not disclose how, but the Service Worker logic can be run through WebAssembly. WebAssembly is hard to read, so it adds another layer of protection.
Combined, these techniques let the Service Worker act as a custom file resolver for all required resources. The user cannot simply disable the Service Worker (through the DevTools Application panel -> Bypass for network) because the page depends on it for its file requests.
I should note that there is existing [research](https://repositorio-aberto.up.pt/bitstream/10216/136432/2/499388.pdf) that uses a Service Worker to intercept and filter requests, preventing third-party libraries from making "malicious requests" to "known server" (supply chain attack).


@ -0,0 +1,112 @@
---
title: "Building Programming Languages"
subtitle: ""
summary: ""
tags: []
categories: []
published: "2023-12-10"
featured: false
draft: false
---
# Building Programming Languages
Let's look into how to build a programming language. We start by building a language informally, then move on to a formal representation.
Programming languages are sophisticated. Building even a simple programming language is a hard task without sufficient knowledge in multiple areas of computer science. However, the process can be simplified, although the result will be informal and might be unsound. In this post, I want to introduce developers to what happens behind the scenes of programming languages, guide you through the steps of building one informally, and, as the last step, lay them out in a formal representation.
## Informal representation of programming languages
Developers and programmers look at programming languages differently from theorists. This view is not wrong, but it might not convey the full meaning or context. If you are a developer or a programmer, you probably already have a sense of programming languages, albeit an informal one. In this section, I introduce the informal representation of programming languages and how to build a simple programming language.
### Syntax
The first and foremost part of a programming language is the syntax. The syntax is the aspect of programming languages that is most commonly criticized by programmers. For many people, a programming language's signature is the syntax itself, and they are not wrong in some sense. The syntax describes how a series of characters can be laid out to represent parts of programs, including functions, statements, and expressions.
Deciding on the syntax of your own programming language can be a hard task. Fortunately, we have templates from existing (famous) programming languages. One “easy” syntax that is often used as an example in compiler classes is the curly-brace style with function declarations and statements, as found in popular languages like C or Golang.
After choosing a syntax for a programming language, a set of special keywords should be assigned as well. These give the programming language meaning for human readers. Typical keywords are “var”, “let”, “int”, “const”, “if”, “else”, “for”, “while”. Suppose we want the following programming language syntax:
```js
function name(a, b, c) {
    let x = 1;
    if (x = 1) {
        x += 1;
    } else {
        x += 2;
    }
    for (i = 0; i < 10; i += 1) {
        x += 1;
    }
}
```
Then we should define the keywords to be: “function”, “let”, “if”, “else”, “for”. After the general syntax and all keywords are defined, we begin to write a parser. The parser simply builds a compact representation of a program from its source text. From the above example, we would want a parser to generate objects like below:
```js
// not syntactically correct
let program = new Function(new Name("name"), [new Arg("a"), new Arg("b"), new Arg("c")], [
    new Let(new Var("x"), 1),
    new If(new Var("x") = 1, [new Var("x") += 1,], [new Var("x") += 2,]),
    new For(new Var("i") = 0, new Var("i") < 10, new Var("i") += 1, [
        new Var("x") += 1,
    ])
])
```
It might not look “compact” if we write everything out, but internally the program object is information-dense, and we can use it instead of scanning through a list of characters or words looking for information. This compact representation of programs is called an **Abstract Syntax Tree**, commonly abbreviated AST. It represents every part of a program abstractly.
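Before a parser can build such a tree, the source text is usually split into tokens, and this is where the keyword list comes in. The following is a minimal tokenizer sketch for the toy syntax above; the token shapes and the function name `tokenize` are assumptions, and characters outside this small grammar are simply ignored.

```javascript
// Keywords of the toy language, as listed in the text.
const KEYWORDS = new Set(['function', 'let', 'if', 'else', 'for']);

// Splits source text into tokens: words, numbers, "+=", and single symbols.
// Anything the pattern does not cover (e.g. whitespace) is skipped.
function tokenize(source) {
  const pattern = /[A-Za-z_]\w*|\d+|\+=|[=<(){};,]/g;
  return Array.from(source.matchAll(pattern), ([text]) => {
    if (KEYWORDS.has(text)) return { type: 'keyword', text };
    if (/^\d/.test(text)) return { type: 'number', text };
    if (/^[A-Za-z_]/.test(text)) return { type: 'identifier', text };
    return { type: 'symbol', text };
  });
}

const tokens = tokenize('let x = 1; x += 2;');
console.log(tokens.map((t) => `${t.type}:${t.text}`).join(' '));
```

A parser would then walk this token list and group the tokens into the nested objects shown earlier.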
After successfully building the AST for a program, the program can be run given a machine (another program) capable of stepping through each statement. This is where things get complicated, and I will go through these complications later.
We have been talking about functions, statements, and expressions, yet their meanings have not been discussed thus far. These terms are important for understanding programming languages and should be thoroughly explained.
Functions are familiar to programmers. They were initially named sub-routines, denoting parts of a program that will be reused multiple times. Today, in the context of programming languages, functions serve multiple purposes: (1) a sub-routine that can be reused across programs; (2) a calculation performed on inputs, the so-called arguments. Statements are the most basic parts of a program that can be executed. Example statements are atomic calculations (x += 1), assignments (x = 1), if statements, and for statements. Usually, statements are designed so that they can be nested; this will be discussed formally in later sections. Expressions are the smallest parts that can perform calculations. These are usually mathematical operations and must return a value.
### Checking programs
Not all written programs can be run successfully. Many factors can prevent a program from running or cause it to yield errors. To eliminate or at least reduce these errors, checks are performed at many different stages. We will examine one common and widely used check: variable availability.
Variables are everywhere in a program. With our limited vocabulary, programmers often reuse the same variable name across a program in different contexts. The task of a checker is to verify that a variable is available in the current context. The context we are referring to is usually called the “scope”. In “curly braces” programming languages, the scope can be thought of as the region between a pair of curly braces {}. Scopes are usually defined to allow inheritance, where a variable in an outer scope is accessible from an inner scope. The checker also determines which variable (that is, which memory location) a statement or expression uses.
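Scope inheritance can be sketched as a chain of scopes, each pointing at its enclosing one. The class name `Scope` and its methods are hypothetical and only illustrate the lookup rule described above.

```javascript
// A scope knows the names declared in it and its enclosing (parent) scope.
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.names = new Set();
  }

  declare(name) {
    this.names.add(name);
  }

  // A name is available if this scope or any enclosing scope declares it.
  isAvailable(name) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.names.has(name)) return true;
    }
    return false;
  }
}

const functionScope = new Scope();        // { ... } of the function body
functionScope.declare('x');
const ifScope = new Scope(functionScope); // { ... } of the if branch
ifScope.declare('y');

console.log(ifScope.isAvailable('x'));       // inner scope sees the outer variable
console.log(functionScope.isAvailable('y')); // outer scope does not see the inner one
```

The lookup only ever walks outward, which is why a variable declared inside a pair of braces is not visible after the closing brace.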
### Optimizing programs
Optimizations are frequently applied to a program to remove surplus information that usually does not affect the program logic. They are also used to remove parts of the code that do not affect the result (of a function, expression, ...). Another kind of optimization focuses on pre-computation, where some expressions are calculated ahead of time and replaced with a simpler (cheaper) operation or a constant. A special kind of optimization focuses on how the CPU works and rewrites the program so that it performs better.
All these optimizations are applied with a simple goal in mind: making programs faster and smaller.
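The pre-computation kind of optimization is often called constant folding. Below is a small sketch over a made-up expression tree; the node shapes (`num`, `var`, `add`, `mul`) and the function name `fold` are assumptions for illustration.

```javascript
// Recursively replaces operations on two constants with the computed constant.
function fold(node) {
  if (node.type !== 'add' && node.type !== 'mul') {
    return node; // numbers and variables are left untouched
  }
  const left = fold(node.left);
  const right = fold(node.right);
  if (left.type === 'num' && right.type === 'num') {
    const value = node.type === 'add'
      ? left.value + right.value
      : left.value * right.value;
    return { type: 'num', value };
  }
  return { ...node, left, right };
}

// x + (2 * 3): the multiplication can be computed ahead of time.
const expression = {
  type: 'add',
  left: { type: 'var', name: 'x' },
  right: {
    type: 'mul',
    left: { type: 'num', value: 2 },
    right: { type: 'num', value: 3 },
  },
};

const folded = fold(expression);
console.log(JSON.stringify(folded));
```

The addition itself stays in the tree because the value of `x` is not known until the program runs.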
### Runtime
When talking about a programming language's runtime, programmers often assume that the term only applies to languages like Java, because of the Java Virtual Machine. However, a programming language's runtime is a much broader concept that involves everything that a program needs to run (this would also include the Operating System). It is worth going through all of the components, at least at the surface level, to understand how programs are executed.
There are several ways to run a program written in a specific programming language, often determined by the design of the language and its runtime. In my observation, a programming language is designed to run in one of three ways.
1. Interpreted
2. Virtual Machine
3. Native code
Even with 3 types of design, not all programming language runtimes are designed alike. Often each language has its own runtime design. Although not commonly seen, a runtime can be designed as a general one that supports multiple languages, as is the case for the JVM (and the more modern GraalVM).
#### Interpreted
The program is run in a controlled manner by another program called the interpreter. The interpreter reads the program, understands the statements, and performs them through the defined logic. In a sense, the interpreter acts as a coordinator between the program and its logic.
Languages designed with this kind of runtime are suitable for quick development. Just write the program and the interpreter will run it immediately, with little time in between. Famous languages that use this design are Python and JavaScript.
Both **Python** and **JavaScript** are interpreted languages by design (officially). In practice, though, Python compiles everything into a custom bytecode before running it, and JavaScript compiles frequently used functions into machine code. Nevertheless, they are interpreted languages suitable for fast development.
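To make the idea of an interpreter concrete, here is a tiny tree-walking sketch that steps through statements one by one. The node shapes are invented for this example, and the `if` condition uses an explicit equality node (`eq`), which is an assumption rather than something the toy syntax above defines.

```javascript
// Evaluates an expression node to a value, using `env` for variables.
function evalExpr(expr, env) {
  switch (expr.type) {
    case 'num': return expr.value;
    case 'var': return env.get(expr.name);
    case 'eq': return evalExpr(expr.left, env) === evalExpr(expr.right, env);
    default: throw new Error(`unknown expression: ${expr.type}`);
  }
}

// Performs each statement in order, updating the variables in `env`.
function execute(statements, env) {
  for (const stmt of statements) {
    switch (stmt.type) {
      case 'let':
        env.set(stmt.name, evalExpr(stmt.value, env));
        break;
      case 'addAssign':
        env.set(stmt.name, env.get(stmt.name) + evalExpr(stmt.value, env));
        break;
      case 'if':
        execute(evalExpr(stmt.test, env) ? stmt.then : stmt.else, env);
        break;
      default:
        throw new Error(`unknown statement: ${stmt.type}`);
    }
  }
  return env;
}

// let x = 1; if (x equals 1) { x += 1; } else { x += 2; }
const program = [
  { type: 'let', name: 'x', value: { type: 'num', value: 1 } },
  {
    type: 'if',
    test: { type: 'eq', left: { type: 'var', name: 'x' }, right: { type: 'num', value: 1 } },
    then: [{ type: 'addAssign', name: 'x', value: { type: 'num', value: 1 } }],
    else: [{ type: 'addAssign', name: 'x', value: { type: 'num', value: 2 } }],
  },
];

const env = execute(program, new Map());
console.log(env.get('x'));
```

The interpreter never produces another program; it reads each node and immediately carries out the matching logic.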
#### Virtual Machine
Instead of directly reading a program and running it, some languages are accompanied by a Virtual Machine. A Virtual Machine is a program that reads a series of (simple?) instructions and performs them. For languages designed to be used with a Virtual Machine, the compiler transforms the source program into a series of instructions. Later on, to execute the program, these instructions are fed to the Virtual Machine, which runs them.
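As a sketch of what "a series of simple instructions" can mean, the following is a minimal stack-based machine. The instruction names (`push`, `add`, `mul`) and the function name `run` are made up for this example and do not correspond to any real Virtual Machine.

```javascript
// Runs a list of [opcode, argument] instructions on a value stack
// and returns whatever is left on top at the end.
function run(instructions) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case 'push':
        stack.push(arg);
        break;
      case 'add': {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case 'mul': {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a * b);
        break;
      }
      default:
        throw new Error(`unknown instruction: ${op}`);
    }
  }
  return stack.pop();
}

// What a compiler might emit for the expression 1 + 2 * 3.
const bytecode = [['push', 1], ['push', 2], ['push', 3], ['mul'], ['add']];
const result = run(bytecode);
console.log(result);
```

The machine knows nothing about the source syntax; it only sees the flat instruction list produced by the compiler.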
The Virtual Machine manages the running program in all aspects, including memory usage, variables, classes, etc. Making a Virtual Machine is hard, but having a common Virtual Machine that runs on multiple platforms allows a program to be written once in one language and used across all of them.
Such is the idea of the JVM. When multiple platforms emerged, people would write the same program for each platform and had to deal manually with the differences between them. Sun Microsystems designed and implemented a Virtual Machine that is compatible with multiple platforms and allows people to write programs in one universal language, Java.
A language can also be designed to target an existing Virtual Machine, which is the idea behind Kotlin and Clojure. These two languages "informally" proposed a new syntax for Java and work well with the Java Virtual Machine.
#### Native code
## Why informal is not enough
## Formalize programming language


@ -20,21 +20,34 @@ export default {
onload="renderMathInElement(document.body);"></script>
<script defer type="text/javascript">
// delay until the whole page is rendered to run Katex
setTimeout(() => {
const katexRender = () => {
renderMathInElement(document.body, {
// customised options
// • auto-render specific keys, e.g.:
delimiters: [
{left: '$$', right: '$$', display: true},
{left: '$', right: '$', display: false},
{left: '\\(', right: '\\)', display: false},
{left: '\\[', right: '\\]', display: true}
// {left: '\\(', right: '\\)', display: false},
// {left: '\\[', right: '\\]', display: true}
],
// • rendering keys, e.g.:
throwOnError : true
});
}, 1000);
};
// delay until the whole page is rendered to run Katex
setTimeout(() => {
katexRender();
}, 500);
// set katex to render everytime the page url changes
// single-page problem
var pushState = history.pushState;
history.pushState = function() {
console.log("bruh, url changes")
pushState.apply(history, arguments);
setTimeout(() => {katexRender();}, 500);
};
</script>
<style>