Today, while writing a unit test for a web crawler, I ran into an encoding error. I had downloaded a webpage with curl and tried to make my mock Client return its contents. The same approach had worked with another page, but not with this one. Let's fix it.

My code looked like this:

import 'dart:io';

import 'package:gpu_benchmarks/game_crawler.dart';
import 'package:http/http.dart';
import 'package:mockito/annotations.dart';
import 'package:mockito/mockito.dart';
import 'package:test/test.dart';

import 'videocardbenchmark_crawler_test.mocks.dart';

@GenerateMocks([Client])
void main() {
  test('Crawls a games page', () async {
    const url = '[some url]';
    final client = MockClient();
    final uri = Uri.parse(url);
    final mockResponse =
        Response(File('./test/games.html').readAsStringSync(), 200);
    when(client.get(any)).thenAnswer((_) => Future.value(mockResponse));
    final r = await GameCrawler.crawl(uri, client);
    expect(r.length, equals(0));
  });
}

The error came from the line where I initialize Response:

Invalid argument (string): Contains invalid characters.: "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n\n...
...
...
dart:convert                                     Latin1Codec.encode
new Response
package:http/src/response.dart:37
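
The Latin1Codec.encode frame in the trace hints at the cause: Latin-1 only covers code points up to 0xFF, so any character beyond that range cannot be encoded. Here is a minimal sketch of that failure using only dart:convert, outside the http package. The helper fitsInLatin1 and the sample strings are made up for illustration; they are not part of any library.

```dart
import 'dart:convert';

/// Hypothetical helper: true if every UTF-16 code unit of [text] is at most
/// 0xFF, which is the range Latin-1 can represent.
bool fitsInLatin1(String text) => text.codeUnits.every((u) => u <= 0xFF);

void main() {
  const fits = 'café'; // 'é' is U+00E9, inside Latin-1
  const tooWide = '10 €'; // '€' is U+20AC, outside Latin-1

  print(fitsInLatin1(fits));
  print(fitsInLatin1(tooWide));

  // Encoding a string that fits works without complaint.
  latin1.encode(fits);

  try {
    // This is the call that fails inside the Response constructor.
    latin1.encode(tooWide);
  } on ArgumentError catch (e) {
    print('latin1 failed: ${e.message}');
  }

  // UTF-8 can represent any valid Unicode string, so this succeeds.
  print(utf8.encode(tooWide));
}
```

This would also explain why one page worked and another didn't: a page that happens to contain only characters within Latin-1 encodes fine, while a single character outside that range is enough to throw.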

The constructor for Response encodes the body string into bytes. It gets the encoding for that from _encodingForHeaders, which picks an Encoding based on the headers.

Here's the code for _encodingForHeaders:

/// Returns the encoding to use for a response with the given headers.
///
/// Defaults to [latin1] if the headers don't specify a charset or if that
/// charset is unknown.
Encoding _encodingForHeaders(Map<String, String> headers) =>
    encodingForCharset(_contentTypeForHeaders(headers).parameters['charset']);

So it tries to find the charset in the headers, then hands it to encodingForCharset:

/// Returns the [Encoding] that corresponds to [charset].
///
/// Returns [fallback] if [charset] is null or if no [Encoding] was found that
/// corresponds to [charset].
Encoding encodingForCharset(String? charset, [Encoding fallback = latin1]) {
  if (charset == null) return fallback;
  return Encoding.getByName(charset) ?? fallback;
}

And if it can't find a charset, it falls back to latin1. I find it odd that it doesn't fall back to utf8, but ok. Let's try passing a charset in the headers, using one of the names that Encoding.getByName recognizes. I want to use "utf-8", so here's how I pass it:

final mockResponse = Response(
        File('./test/games.html').readAsStringSync(), 200,
        headers: {'charset': 'utf-8'});

Unfortunately, it still threw the exception. Looking at the code again, it turns out Response doesn't look for a charset key in the headers I passed. Instead, it parses the content-type header, then extracts the charset parameter from that:

/// Returns the encoding to use for a response with the given headers.
///
/// Defaults to [latin1] if the headers don't specify a charset or if that
/// charset is unknown.
Encoding _encodingForHeaders(Map<String, String> headers) =>
    encodingForCharset(_contentTypeForHeaders(headers).parameters['charset']);

/// Returns the [MediaType] object for the given headers's content-type.
///
/// Defaults to `application/octet-stream`.
MediaType _contentTypeForHeaders(Map<String, String> headers) {
  var contentType = headers['content-type'];
  if (contentType != null) return MediaType.parse(contentType);
  return MediaType('application', 'octet-stream');
}
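
To see how the three cases play out, here is a rough sketch that mirrors the lookup chain above in a single function. The name encodingFor is hypothetical and this is not the http package's own code; it assumes package:http_parser is available for MediaType.parse and uses Encoding.getByName from dart:convert.

```dart
import 'dart:convert';

import 'package:http_parser/http_parser.dart';

/// Hypothetical helper mirroring the lookup described above:
/// content-type header -> charset parameter -> Encoding, falling back to
/// latin1 whenever a step comes up empty.
Encoding encodingFor(Map<String, String> headers) {
  final contentType = headers['content-type'];
  if (contentType == null) return latin1;
  final charset = MediaType.parse(contentType).parameters['charset'];
  if (charset == null) return latin1;
  return Encoding.getByName(charset) ?? latin1;
}

void main() {
  // A bare 'charset' key is never read, so the fallback applies.
  print(encodingFor({'charset': 'utf-8'}).name);
  // A content-type without a charset parameter also falls back.
  print(encodingFor({'content-type': 'text/html'}).name);
  // Only a charset parameter inside content-type selects UTF-8.
  print(encodingFor({'content-type': 'text/html; charset=utf-8'}).name);
}
```

Note that the lookup in the source uses the lowercase key 'content-type' on a plain Map, so in a hand-built headers map like the one in my test, the key presumably has to be spelled exactly that way.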

The parsing is done here:

  /// Parses a media type.
  ///
  /// This will throw a FormatError if the media type is invalid.
  factory MediaType.parse(String mediaType) =>
      // This parsing is based on sections 3.6 and 3.7 of the HTTP spec:
      // http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html.

I was unfamiliar with content types, but Google gives a few common examples, and it looks like "text/html; charset=utf-8" should do the trick. And indeed, this code no longer throws an exception:

final mockResponse = Response(
    File('./test/games.html').readAsStringSync(), 200,
    headers: {'content-type': 'text/html; charset=utf-8'});
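
As a sanity check, here is a small sketch that wraps the fix in a helper and confirms the body survives the round trip. The function name utf8HtmlResponse and the sample HTML are my own inventions for illustration; Response, bodyBytes and body come from package:http.

```dart
import 'dart:convert';

import 'package:http/http.dart';

/// Hypothetical helper: builds a mock HTML response whose content-type
/// header declares UTF-8, so the body is encoded and decoded as UTF-8.
Response utf8HtmlResponse(String html, {int statusCode = 200}) => Response(
      html,
      statusCode,
      headers: {'content-type': 'text/html; charset=utf-8'},
    );

void main() {
  const html = '<p>price: 10 €</p>'; // contains a non-Latin-1 character
  final response = utf8HtmlResponse(html);

  // The stored bytes should match a direct UTF-8 encoding of the string...
  print(response.bodyBytes.length == utf8.encode(html).length);
  // ...and reading the body back should give the original string.
  print(response.body == html);
}
```

In the test from the beginning of this post, the mock response could then be built with utf8HtmlResponse(File('./test/games.html').readAsStringSync()).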