Just Do Swift

Working with Text on Apple Devices: Key Natural Language Tricks You Can Use

Some Approaches That Have Helped in Real Projects

Peng Ge
Aug 05, 2025

Building smarter apps is something many of us are aiming for.

Apple’s Natural Language framework gives us a surprisingly broad set of on-device tools, but real-world usage often raises more questions than the official docs answer.

This guide collects five NLP techniques you might find useful, with code samples, practical scenarios, and the sorts of caveats I wish someone had mentioned early on.


1. Language Identification with NLLanguageRecognizer

What it does:
Tries to figure out what language a piece of text is in.

When this helps:

  • Automatically choosing localization, keyboard, or spellcheck

  • Validating or tagging user-generated content in a multilingual app

  • Flagging unexpected input for further review

Code sample:

import SwiftUI
import NaturalLanguage

struct LanguageIdentifyView: View {
    @State private var text = "Natural Language can automatically identify languages."
    // Hypotheses from the most recent tap, most probable first.
    @State private var results: [(String, Double)] = []

    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
            TextEditor(text: $text).frame(height: 120).border(.secondary)
            // Run recognition on tap rather than on every view update.
            Button("Identify Language") { results = topLanguages(for: text) }
                .buttonStyle(.bordered)
            ForEach(results, id: \.0) { code, prob in
                Text("\(code)  \(String(format: "%.2f", prob))").monospaced()
            }
        }
        .padding()
    }

    private func topLanguages(for text: String) -> [(String, Double)] {
        let recognizer = NLLanguageRecognizer()
        // If your app expects certain languages more often, try languageHints.
        // recognizer.languageHints = [.english: 0.8, .simplifiedChinese: 0.2]
        recognizer.processString(text)
        let hyps = recognizer.languageHypotheses(withMaximum: 3)
        return hyps.map { ($0.key.rawValue, $0.value) }
            .sorted { $0.1 > $1.1 }
    }
}

A few observations:

  • Works best with longer sentences. For very short or ambiguous input, results may be less certain.

  • If you see low confidence scores, consider asking users to clarify.

  • Adding languageHints can improve accuracy when you have a good guess about the user’s likely language.
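
The last two observations can be combined into one small helper. This is a minimal sketch, not an official pattern: the name confidentLanguage and the default threshold of 0.6 are my own assumptions, so tune them against your real input.

```swift
import NaturalLanguage

/// Returns the most likely language only when its probability clears `threshold`.
/// `confidentLanguage` and the 0.6 default are illustrative assumptions.
func confidentLanguage(
    for text: String,
    hints: [NLLanguage: Double] = [:],
    threshold: Double = 0.6
) -> NLLanguage? {
    let recognizer = NLLanguageRecognizer()
    if !hints.isEmpty {
        // Hints act as prior probabilities; they do not exclude other languages.
        recognizer.languageHints = hints
    }
    recognizer.processString(text)

    // Ask for the single best hypothesis together with its probability.
    guard let best = recognizer.languageHypotheses(withMaximum: 1).first,
          best.value >= threshold else {
        return nil   // too uncertain: the caller can ask the user instead
    }
    return best.key
}

if let language = confidentLanguage(for: "Bonjour tout le monde",
                                    hints: [.french: 0.7, .english: 0.3]) {
    print("Detected:", language.rawValue)
} else {
    print("Not confident enough; ask the user.")
}
```

Returning nil instead of a low-probability guess keeps the "ask the user" decision in the calling code, where the UI lives.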


2. Robust Tokenization with NLTokenizer

What it does:
Breaks up text into words, sentences, or paragraphs—in a way that works even for languages without spaces, like Chinese or Japanese.

Typical use cases:

  • Counting words or sentences

  • Highlighting, search, or text selection

  • Handling user input that mixes multiple languages

Code sample:

import SwiftUI
import NaturalLanguage

struct TokenizeView: View {
    @State private var text = "I love Natural Language Processing. Swift makes it fun!"
    
    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
            TextEditor(text: $text).frame(height: 120).border(.secondary)
            Text("Tokens:").bold()
            ScrollView {
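                LazyVStack {
                    // Identify rows by position: repeated words would give duplicate IDs with \.self.
                    ForEach(Array(tokens(in: text).enumerated()), id: \.offset) { _, token in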
                        Text(token).padding(6).background(.thinMaterial).cornerRadius(8)
                    }
                }
            }
            
        }
        .padding()
    }

    private func tokens(in text: String) -> [String] {
        let tokenizer = NLTokenizer(unit: .word)
        tokenizer.string = text
        // tokenizer.setLanguage(.simplifiedChinese) // For better accuracy in CJK languages
        var result: [String] = []
        tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
            result.append(String(text[range]))
            return true
        }
        return result
    }
}

Worth noting:

  • .word, .sentence, .paragraph, and .document are all supported units.

  • Works out-of-the-box for most languages, but setting the tokenizer language can sometimes help.

  • You don’t need to worry about weird edge cases like emoji—NLTokenizer handles most of those for you.
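
To make the unit and language options concrete, here is a rough sketch of a counting helper. The name countTokens is hypothetical (it is not part of the framework); it relies on tokens(for:), which returns every token range in one call, instead of the enumerating closure used above.

```swift
import NaturalLanguage

/// Counts the tokens of the given unit in `text`.
/// `countTokens` is a hypothetical helper name, not a framework API.
func countTokens(in text: String, unit: NLTokenUnit, language: NLLanguage? = nil) -> Int {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text
    if let language {
        // Optional: tell the tokenizer which language to expect.
        tokenizer.setLanguage(language)
    }
    // tokens(for:) returns the ranges of all tokens in the given range.
    return tokenizer.tokens(for: text.startIndex..<text.endIndex).count
}

let sample = "I love Natural Language Processing. Swift makes it fun!"
print("words:", countTokens(in: sample, unit: .word))
print("sentences:", countTokens(in: sample, unit: .sentence))
```

For long documents the enumerating API may be the better fit, since it lets you stop early by returning false from the closure.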

Heads-up:
If you’re coming from regular expressions or simple .split calls, the improvement in languages like Chinese is huge. Still, results aren’t always perfect—if your app depends on “perfect” tokenization, you might want to test with real user data.
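
A quick way to see the difference for yourself is to run both approaches on the same unspaced sentence. This is only a sketch with a sample string of my own; the exact segmentation NLTokenizer returns may vary by OS version, so I have left that output out.

```swift
import NaturalLanguage

let chinese = "我喜欢自然语言处理"   // "I like natural language processing"

// Naive approach: there are no spaces, so nothing gets split.
let naive = chinese.split(separator: " ").map(String.init)

// NLTokenizer segments on word boundaries instead of whitespace.
let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = chinese
tokenizer.setLanguage(.simplifiedChinese)
let segmented = tokenizer.tokens(for: chinese.startIndex..<chinese.endIndex)
    .map { String(chinese[$0]) }

print(naive.count)   // 1: the whole sentence comes back as a single piece
print(segmented)     // inspect this on your own device and OS version
```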
