# A static site generator in 90 lines of Python

Created: April 16, 2025

[In English](/en/blog/2025/01-sitegen)

A long time ago I made a small business-card website: it consisted of three
simple HTML pages, one CSS file (which I generated from SCSS), a few fonts,
and some images. That was more than enough for a link to my site to show up
on a résumé or a social media profile.

![Image of the old site](/images/01-oldsite.png "Image of the old site")

Recently I decided to resume work <a href="https://github.com/blankhex/bhlib" target="_blank">on my pet project</a>
and wanted to publish notes and articles about it on my site.
I didn't want to fiddle with HTML files by hand, so I decided to look for
an alternative in the form of a static site generator. Ideally, it
would:

- Be small and reasonably simple
- Work with Markdown
- Highlight syntax in code blocks

Unfortunately, I couldn't find a solution that suited me, so I decided
to throw together my own using Python, the Markdown parser <a href="https://mistune.lepture.com/en/latest/" target="_blank">mistune</a>,
the <a href="https://jinja.palletsprojects.com/en/stable/" target="_blank">Jinja2</a> template engine,
and <a href="https://pygments.org" target="_blank">Pygments</a>. The whole
generation process boils down to the following:

1. For each file in the input directory, check whether it is a Markdown file
   - If it is, convert it to HTML (with syntax highlighting) and write it
     to the output directory
   - If it is not, copy the file to the output directory as is
2. Compress the contents of the output directory

This generation process has one fairly significant drawback: since the HTML
is not post-processed, any link to another Markdown page must end with the
`.html` extension[^1].

[^1]: This problem can be solved with a web server setting that rewrites
  the `.md` extension to `.html`, or by automatically appending the
  `.html` extension (for example: `try_files $uri $uri.html`).
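
The missing post-processing step could also live in the generator itself. Below is a minimal sketch, not part of the generator in this post: `MD_LINK` and `rewrite_md_links` are hypothetical names, and the regular expression assumes links are rendered as double-quoted `href` attributes. It rewrites local links that end in `.md` and leaves links with a URL scheme untouched.

```python
import re

# Hypothetical helper, not part of the original generator.
# Matches href="..." values that point at a local Markdown file, with an
# optional ?query or #fragment part. URLs with a scheme (https:, mailto:)
# and protocol-relative URLs (//host/...) are skipped by the lookahead.
MD_LINK = re.compile(
    r'href="(?![a-zA-Z][a-zA-Z0-9+.-]*:|//)([^"#?]+)\.md([#?][^"]*)?"'
)


def rewrite_md_links(html: str) -> str:
    """Replace the trailing .md extension with .html in local links."""
    def replace(match: re.Match) -> str:
        target = match.group(1)
        suffix = match.group(2) or ''
        return 'href="%s.html%s"' % (target, suffix)

    return MD_LINK.sub(replace, html)
```

If something like this were used, it would be applied to the rendered HTML before it is passed to the template. A regular expression is a shortcut here; a real HTML parser would be more robust against unusual attribute quoting.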

The generator code itself:

```python
import re, jinja2, mistune, shutil, os, pathlib, tarfile
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter
from pygments import highlight


class PygmentsHTMLRenderer(mistune.HTMLRenderer):
    def block_code(self, code: str, info = None):
        if not info:
            return '\n<pre><code>%s</code></pre>\n' % mistune.escape(code)
        lexer = get_lexer_by_name(info, stripall=True)
        formatter = HtmlFormatter(lineseparator='<br>')
        return highlight(code, lexer, formatter)


def convert_markdown(page: str):
    plugins = ['footnotes', 'table', 'strikethrough', 'url']
    renderer = PygmentsHTMLRenderer(escape=False)
    return mistune.create_markdown(plugins=plugins, renderer=renderer)(page)


def extract_title(page: str):
    # search, not match: the heading may not be the very first thing in the page
    matches = re.search(r'<h1>(.*?)</h1>', page)
    if matches:
        return matches.group(1)
    return 'BlankHex'


def handle_file(path: str, input_dir: str, output_dir: str, template_name: str):
    # Calculate input and output paths
    relpath = os.path.relpath(path, input_dir)
    input_path = path
    output_path = os.path.join(output_dir, relpath)
    if input_path.endswith('.md'):
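        # Swap only the trailing extension; str.replace would also change
        # '.md' occurring elsewhere in the path
        output_path = os.path.splitext(output_path)[0] + '.html'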

    # Don't convert if output path exists
    if os.path.exists(output_path):
        return

    # Run conversion
    pathlib.Path(os.path.dirname(output_path)).mkdir(parents=True, exist_ok=True)
    if input_path.endswith('.md'):
        # Read Markdown document
        with open(input_path, 'r', encoding='utf-8') as handle:
            markdown_page = handle.read()

        # Get Pygments styles for light and dark themes
        light_style = HtmlFormatter(style='default').get_style_defs()
        dark_style = HtmlFormatter(style='monokai').get_style_defs()

        # Convert Markdown document to HTML document
        html_page = convert_markdown(markdown_page)
        html_header = extract_title(html_page)
        environment = jinja2.Environment(loader=jinja2.FileSystemLoader('template/'))
        template = environment.get_template(template_name)
        output_page = template.render(title=html_header,
                                      body=html_page,
                                      light_style=light_style,
                                      dark_style=dark_style)

        # Write HTML document
        with open(output_path, 'w', encoding='utf-8') as handle:
            handle.write(output_page)
    else:
        # Copy file as is
        shutil.copy(path, output_path)


def convert_dir(input_dir: str, output_dir: str, template_name: str):
    # Convert or copy every file from the input directory to the output directory
    for subdir, dirs, files in os.walk(input_dir):
        for file in files:
            handle_file(os.path.join(subdir, file), input_dir, output_dir, template_name)


# Remove output from previous run
if os.path.isdir('public'):
    shutil.rmtree('public')
if os.path.isfile('public.tgz'):
    os.remove('public.tgz')

# Run conversion
convert_dir('content', 'public', 'template.html')
with tarfile.open('public.tgz', 'w:gz') as tar:
    for file in os.listdir('public'):
        tar.add(os.path.join('public', file), file)
```