# String decoder

<!--introduced_in=v0.10.0-->

> Stability: 2 - Stable

<!-- source_link=lib/string_decoder.js -->

The `node:string_decoder` module provides an API for decoding `Buffer` objects
into strings in a manner that preserves encoded multi-byte UTF-8 and UTF-16
characters. It can be accessed using:

```mjs
import { StringDecoder } from 'node:string_decoder';
```

```cjs
const { StringDecoder } = require('node:string_decoder');
```

The following example shows the basic use of the `StringDecoder` class.

```mjs
import { StringDecoder } from 'node:string_decoder';
import { Buffer } from 'node:buffer';
const decoder = new StringDecoder('utf8');

const cent = Buffer.from([0xC2, 0xA2]);
console.log(decoder.write(cent)); // Prints: ¢

const euro = Buffer.from([0xE2, 0x82, 0xAC]);
console.log(decoder.write(euro)); // Prints: €
```

```cjs
const { StringDecoder } = require('node:string_decoder');
const decoder = new StringDecoder('utf8');

const cent = Buffer.from([0xC2, 0xA2]);
console.log(decoder.write(cent)); // Prints: ¢

const euro = Buffer.from([0xE2, 0x82, 0xAC]);
console.log(decoder.write(euro)); // Prints: €
```

When a `Buffer` instance is written to the `StringDecoder` instance, an
internal buffer is used to ensure that the decoded string does not contain
any incomplete multi-byte characters. Incomplete characters are held in the
internal buffer until the next call to `stringDecoder.write()` or until
`stringDecoder.end()` is called.

In the following example, the three UTF-8 encoded bytes of the Euro symbol
(`€`) are written over three separate operations:

```mjs
import { StringDecoder } from 'node:string_decoder';
import { Buffer } from 'node:buffer';
const decoder = new StringDecoder('utf8');

decoder.write(Buffer.from([0xE2]));
decoder.write(Buffer.from([0x82]));
console.log(decoder.end(Buffer.from([0xAC]))); // Prints: €
```

```cjs
const { StringDecoder } = require('node:string_decoder');
const decoder = new StringDecoder('utf8');

decoder.write(Buffer.from([0xE2]));
decoder.write(Buffer.from([0x82]));
console.log(decoder.end(Buffer.from([0xAC]))); // Prints: €
```
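
To make the buffering visible, the following minimal sketch records what each
call to `stringDecoder.write()` returns while the bytes of `€` arrive one at a
time. The variable names `euroBytes`, `parts`, and `naive` are illustrative
only, and the comparison with `buffer.toString()` is included purely for
contrast.

```js
const { StringDecoder } = require('node:string_decoder');

const decoder = new StringDecoder('utf8');
const euroBytes = [0xE2, 0x82, 0xAC];

// Feed the bytes one at a time and record what each write() returns.
const parts = euroBytes.map((byte) => decoder.write(Buffer.from([byte])));
console.log(parts); // Prints: [ '', '', '€' ]

// For contrast, decoding each byte on its own cannot reassemble the character.
const naive = euroBytes
  .map((byte) => Buffer.from([byte]).toString('utf8'))
  .join('');
console.log(naive === '€'); // Prints: false
```

The first two calls return an empty string because the character is still
incomplete; the third call returns the whole character.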

## Class: `StringDecoder`

### `new StringDecoder([encoding])`

<!-- YAML
added: v0.1.99
-->

* `encoding` {string} The character [encoding][] the `StringDecoder` will use.
  **Default:** `'utf8'`.

Creates a new `StringDecoder` instance.
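
As a rough illustration of the `encoding` argument, the sketch below creates
one decoder with the default encoding and one for `'utf16le'`. The variable
names are illustrative only; the byte values are the standard UTF-8 encoding
of `¢` and the UTF-16LE encoding of `€`.

```js
const { StringDecoder } = require('node:string_decoder');

// With no argument, the decoder uses the default 'utf8' encoding.
const utf8Decoder = new StringDecoder();
const cent = utf8Decoder.write(Buffer.from([0xC2, 0xA2]));
console.log(cent); // Prints: ¢

// In UTF-16LE, the euro sign is the two-byte sequence 0xAC 0x20.
const utf16Decoder = new StringDecoder('utf16le');
const first = utf16Decoder.write(Buffer.from([0xAC]));
const second = utf16Decoder.write(Buffer.from([0x20]));
console.log(JSON.stringify(first)); // Prints: ""
console.log(second); // Prints: €
```

Each decoder instance keeps its own internal buffer, so separate input streams
should each get their own `StringDecoder`.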

### `stringDecoder.end([buffer])`

<!-- YAML
added: v0.9.3
-->

* `buffer` {string|Buffer|TypedArray|DataView} The bytes to decode.
* Returns: {string}

Returns any remaining input stored in the internal buffer as a string. Bytes
representing incomplete UTF-8 and UTF-16 characters will be replaced with
substitution characters appropriate for the character encoding.

If the `buffer` argument is provided, one final call to `stringDecoder.write()`
is performed before returning the remaining input.
After `end()` is called, the `stringDecoder` object can be reused for new input.
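
The behavior described above can be sketched as follows, assuming a UTF-8
decoder whose input stops partway through a character. The variable names
`held`, `flushed`, and `reused` are illustrative only.

```js
const { StringDecoder } = require('node:string_decoder');

const decoder = new StringDecoder('utf8');

// Only the first two of the three bytes of '€' arrive before the input ends.
const held = decoder.write(Buffer.from([0xE2, 0x82]));
const flushed = decoder.end();
console.log(JSON.stringify(held)); // Prints: ""
console.log(flushed === '\uFFFD'); // Prints: true

// The same decoder can then be used for new input.
const reused = decoder.end(Buffer.from([0xC2, 0xA2]));
console.log(reused); // Prints: ¢
```

Here the incomplete sequence is flushed as a single U+FFFD replacement
character rather than being silently dropped.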

### `stringDecoder.write(buffer)`

<!-- YAML
added: v0.1.99
changes:
  - version: v8.0.0
    pr-url: https://github.com/nodejs/node/pull/9618
    description: Each invalid character is now replaced by a single replacement
                 character instead of one for each individual byte.
-->

* `buffer` {string|Buffer|TypedArray|DataView} The bytes to decode.
* Returns: {string}

Returns a decoded string, ensuring that any incomplete multi-byte characters at
the end of the `Buffer`, `TypedArray`, or `DataView` are omitted from the
returned string and stored in an internal buffer for the next call to
`stringDecoder.write()` or `stringDecoder.end()`.
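
A common use of `stringDecoder.write()` is decoding input that arrives in
arbitrary chunks. The following is a minimal sketch of that pattern, not part
of the module's API: `decodeChunks()` is a hypothetical helper, and the input
is deliberately split into single-byte chunks so that every multi-byte
character is divided across calls.

```js
const { StringDecoder } = require('node:string_decoder');

// Hypothetical helper: decodes a sequence of byte chunks into one string.
function decodeChunks(chunks, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    // Incomplete trailing bytes are held back until a later chunk arrives.
    text += decoder.write(chunk);
  }
  // Flush whatever is still held in the internal buffer.
  return text + decoder.end();
}

// Split the input into single-byte chunks.
const input = Buffer.from('¢ and €', 'utf8');
const chunks = Array.from(input, (byte) => Buffer.from([byte]));

console.log(decodeChunks(chunks)); // Prints: ¢ and €
```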

[encoding]: buffer.md#buffers-and-character-encodings