Table of Contents

Content Migration

Migrating existing documentation to a Documentation as Code approach requires careful planning and systematic execution. This guide provides strategies and tools for successfully transitioning from traditional documentation formats to DocFX-based Markdown documentation.

Migration Assessment

Document Inventory

Before beginning migration, create a comprehensive inventory of existing documentation:

# migration-inventory.yml
documentation_sources:
  - type: "Word Documents"
    location: "SharePoint/Teams"
    count: 45
    estimated_pages: 230
    priority: "high"
  
  - type: "Wiki Pages"
    location: "Azure DevOps Wiki"
    count: 120
    estimated_pages: 180
    priority: "medium"
  
  - type: "PDF Files"
    location: "File Shares"
    count: 15
    estimated_pages: 95
    priority: "low"

Content Assessment Matrix

Content Type Volume Complexity Priority Migration Method
API Documentation High Complex Critical Automated
User Guides Medium Moderate High Semi-automated
Release Notes High Simple High Automated
Procedures Medium Moderate Medium Manual
Legacy Archives Low Variable Low Selective

Migration Strategies

1. Big Bang Migration

Complete migration of all documentation at once.

Pros:

  • Single cutover event
  • Immediate consistency
  • No dual maintenance

Cons:

  • High risk
  • Significant downtime
  • Resource intensive

Best for: Small documentation sets, greenfield projects

2. Phased Migration

Gradual migration of documentation sections over time.

Pros:

  • Lower risk
  • Manageable workload
  • Learning opportunities

Cons:

  • Dual maintenance period
  • Potential inconsistencies
  • Longer timeline

Best for: Large documentation sets, active development

3. On-Demand Migration

Migrate documentation when it requires updates.

Pros:

  • Natural prioritization
  • Minimal upfront effort
  • Continuous improvement

Cons:

  • Indefinite timeline
  • Potential orphaned content
  • Inconsistent experience

Best for: Legacy documentation, resource-constrained teams

Automated Migration Tools

Pandoc Conversion

Pandoc provides excellent conversion capabilities for various formats:

# Convert Word documents to Markdown
pandoc -f docx -t markdown_strict --wrap=preserve \
  --extract-media=./images \
  input.docx -o output.md

# Convert HTML to Markdown
pandoc -f html -t markdown_strict \
  --wrap=preserve \
  input.html -o output.md

# Batch conversion script
for file in *.docx; do
  pandoc -f docx -t markdown_strict \
    --wrap=preserve \
    --extract-media=./images \
    "$file" -o "${file%.docx}.md"
done

Custom Migration Scripts

PowerShell script for processing multiple files:

# migration-script.ps1
param(
    [string]$SourcePath,
    [string]$OutputPath,
    [string]$Format = "docx"
)

# Create output directory
if (!(Test-Path $OutputPath)) {
    New-Item -ItemType Directory -Path $OutputPath -Force
}

# Process files
Get-ChildItem -Path $SourcePath -Filter "*.$Format" | ForEach-Object {
    $inputFile = $_.FullName
    $outputFile = Join-Path $OutputPath ($_.BaseName + ".md")
    
    Write-Host "Converting: $($_.Name)"
    
    # Run pandoc conversion
    & pandoc -f $Format -t markdown_strict `
        --wrap=preserve `
        --extract-media="$OutputPath/images" `
        $inputFile -o $outputFile
    
    # Post-process the file
    $content = Get-Content $outputFile -Raw
    $content = $content -replace '\\', '/'  # Fix image paths
    $content = $content -replace '\r\n', "`n"  # Normalize line endings
    Set-Content $outputFile $content -NoNewline
}

Content Transformation

Front Matter Addition

Add DocFX front matter to migrated files:

---
title: "Migrated Document Title"
description: "Brief description of the document content"
tags: ["tag1", "tag2", "category"]
category: "documentation"
difficulty: "beginner"
last_updated: "2025-07-06"
author: "Original Author"
migrated_from: "SharePoint/OriginalDocument.docx"
migration_date: "2025-07-06"
---

# Document Content

Convert internal links to Markdown format:

# link-converter.py
import re
import os

def convert_links(content, base_path):
    # Convert Word cross-references
    content = re.sub(
        r'See section "([^"]+)"',
        r'See [section \1](#\1)',
        content
    )
    
    # Convert file references
    content = re.sub(
        r'Reference: ([^\.]+\.docx)',
        r'Reference: [\1](\1.md)',
        content
    )
    
    # Convert image references
    content = re.sub(
        r'!\[([^\]]*)\]\(([^)]+)\)',
        lambda m: f'![{m.group(1)}](images/{os.path.basename(m.group(2))})',
        content
    )
    
    return content

Table Conversion

Ensure proper Markdown table formatting:

// table-formatter.js
function formatMarkdownTable(tableText) {
    const rows = tableText.split('\n').filter(row => row.trim());
    const formattedRows = rows.map(row => {
        const cells = row.split('|').map(cell => cell.trim());
        return '| ' + cells.join(' | ') + ' |';
    });
    
    // Add header separator
    if (formattedRows.length > 1) {
        const headerSeparator = '| ' + 
            formattedRows[0].split('|').slice(1, -1)
                .map(() => '---').join(' | ') + ' |';
        formattedRows.splice(1, 0, headerSeparator);
    }
    
    return formattedRows.join('\n');
}

Quality Assurance

Validation Checklist

Create a comprehensive validation process:

# validation-checklist.yml
content_validation:
  structure:
    - front_matter_present
    - headings_properly_formatted
    - toc_updated
    - file_naming_conventions
  
  content:
    - links_functional
    - images_accessible
    - tables_formatted
    - code_blocks_syntax_highlighted
  
  metadata:
    - tags_appropriate
    - category_assigned
    - author_attributed
    - dates_updated

Automated Testing

PowerShell script for validation:

# validate-migration.ps1
function Test-MarkdownFile {
    param([string]$FilePath)
    
    $content = Get-Content $FilePath -Raw
    $issues = @()
    
    # Check front matter
    if ($content -notmatch '^---\s*\n.*?\n---\s*\n') {
        $issues += "Missing front matter"
    }
    
    # Check for broken links
    $links = [regex]::Matches($content, '\[([^\]]+)\]\(([^)]+)\)')
    foreach ($link in $links) {
        $target = $link.Groups[2].Value
        if ($target -match '^[^#].*\.md$') {
            $targetPath = Join-Path (Split-Path $FilePath) $target
            if (!(Test-Path $targetPath)) {
                $issues += "Broken link: $target"
            }
        }
    }
    
    # Check for images
    $images = [regex]::Matches($content, '!\[([^\]]*)\]\(([^)]+)\)')
    foreach ($image in $images) {
        $imagePath = $image.Groups[2].Value
        $fullImagePath = Join-Path (Split-Path $FilePath) $imagePath
        if (!(Test-Path $fullImagePath)) {
            $issues += "Missing image: $imagePath"
        }
    }
    
    return $issues
}

# Validate all migrated files
Get-ChildItem -Path "docs" -Filter "*.md" -Recurse | ForEach-Object {
    $issues = Test-MarkdownFile $_.FullName
    if ($issues) {
        Write-Host "Issues in $($_.Name):" -ForegroundColor Yellow
        $issues | ForEach-Object { Write-Host "  - $_" }
    }
}

Migration Timeline Template

Phase 1: Planning (Week 1-2)

  • [ ] Complete content inventory
  • [ ] Define migration strategy
  • [ ] Set up tooling and scripts
  • [ ] Create validation procedures
  • [ ] Establish team responsibilities

Phase 2: Preparation (Week 3-4)

  • [ ] Convert high-priority content
  • [ ] Validate automation scripts
  • [ ] Create style guide updates
  • [ ] Train team on new processes
  • [ ] Set up staging environment

Phase 3: Migration (Week 5-8)

  • [ ] Execute migration plan
  • [ ] Validate converted content
  • [ ] Update links and references
  • [ ] Implement quality checks
  • [ ] Gather team feedback

Phase 4: Cleanup (Week 9-10)

  • [ ] Address validation issues
  • [ ] Optimize content structure
  • [ ] Update navigation and TOCs
  • [ ] Finalize style consistency
  • [ ] Document lessons learned

Common Challenges and Solutions

Challenge: Complex Formatting

Problem: Rich formatting doesn't translate well to Markdown

Solution:

  • Simplify formatting to essential elements
  • Use custom CSS for special formatting needs
  • Document formatting standards for future content

Challenge: Large Media Files

Problem: Images and videos increase repository size

Solution:

  • Optimize images before migration
  • Use external storage for large media
  • Implement Git LFS for binary files

Challenge: Cross-References

Problem: Internal document references break during migration

Solution:

  • Create a reference mapping document
  • Update links systematically
  • Use automated link checking

Challenge: Version History

Problem: Losing document history during migration

Solution:

  • Archive original documents
  • Document migration metadata
  • Preserve key revision information

Post-Migration Tasks

Content Optimization

  1. Structure Review: Reorganize content for better navigation
  2. SEO Enhancement: Add meta descriptions and optimize titles
  3. Cross-Linking: Create connections between related topics
  4. Search Optimization: Implement proper tagging and categories

Process Integration

  1. Workflow Documentation: Update content creation procedures
  2. Training Materials: Create guides for the new system
  3. Quality Standards: Establish review and approval processes
  4. Maintenance Schedule: Plan regular content reviews

Success Metrics

Track migration success with these metrics:

  • Content Completeness: Percentage of content successfully migrated
  • Quality Score: Validation test pass rate
  • User Adoption: Team usage of new documentation system
  • Time to Publish: Speed of content updates post-migration

This migration guide ensures a systematic approach to transitioning existing documentation to Documentation as Code, minimizing disruption while maximizing the benefits of the new system.