Address Validation

Tutorial: validate Philippine addresses against the official PSGC masterlist. Fuzzy matching handles misspellings, abbreviations, and unstandardized formats with validate() and validate_many().
Author

bendlikeabamboo

Validate Philippine addresses against the official PSGC masterlist using the barangay package. Fuzzy matching handles common misspellings, abbreviations, and unstandardized address formats.

Why Address Validation Matters

Philippine addresses are notoriously inconsistent. A single barangay might appear as:

  • “Brgy. 291”
  • “Barangay 291”
  • “Barangay No. 291”
  • “Brgy 291”

The barangay package solves this by matching addresses against the official Philippine Standard Geographic Code (PSGC) masterlist from the Philippine Statistics Authority (PSA), which covers all 42,010 barangays.

Installation

See Get Started — Installation. Quick one-liner:

pip install barangay

Using validate()

The validate() function provides a simple interface for address validation with a high-confidence default threshold of 95.0:

from barangay import validate

v = validate("Tongmageng, Sitangkai, Tawi-Tawi")
print(v.valid, v.matched_name, v.score)  # True Tongmageng 100.0

For addresses that use abbreviations or non-standard formats, lower the threshold:

v = validate("Brgy 291, City of Manila", threshold=80.0)
print(v.valid, v.matched_name, v.score)  # False None None
TipLower the threshold for abbreviations

The default threshold of 95.0 is strict. Abbreviations like “Brgy.” or “No.” and non-standard formats (“Pob.”, parentheticals) rarely clear it. Drop to 80.0 for lenient matching, or use sanitize_input() to normalize the text first.

ValidationResult Properties

Property Type Description
.input str Original input string
.valid bool Whether a match was found above threshold
.matched_name str \| None Name of the matched record
.matched_psgc_id str \| None PSGC ID of the matched record
.matched_record AdminDivRecord \| None Full matched record
.score float \| None Confidence score

Handling Misspellings with search_fuzzy()

When you need more than a pass/fail verdict — for example, to show candidate matches for manual review — use search_fuzzy(). It tolerates typos and alternative spellings and returns typed SearchResult objects with a similarity score:

from barangay import search_fuzzy

results = search_fuzzy("Tongmagen", threshold=60.0, limit=3)
for r in results:
    print(f"{r.name} — score: {r.score}")

Output:

Barangay Municipality Province Score
Tongmageng Sitangkai Tawi-Tawi 94.7
Tagen Sarangani Davao Occidental 71.4
Tonga Balud Masbate 71.4

Each SearchResult exposes .name, .psgc_id, .score, .match_type, and .record (see the Search reference for the full property list).

Batch Validation with validate_many()

Validate many addresses in a single call and inspect each result programmatically:

from barangay import validate_many

results = validate_many([
    "Tongmageng, Sitangkai, Tawi-Tawi",
    "Brgy 291, City of Manila",
    "Nonexistent Place",
])
for r in results:
    status = "valid" if r.valid else "invalid"
    print(f"{r.input!r} -> {status}")
# 'Tongmageng, Sitangkai, Tawi-Tawi' -> valid
# 'Brgy 291, City of Manila' -> invalid
# 'Nonexistent Place' -> invalid

Lower the threshold once for the whole batch when your data uses abbreviations or non-standard formats:

results = validate_many(
    ["Brgy 291, Manila", "Tongmageng, Sitangkai, Tawi-Tawi", "Poblacion, Davao City"],
    threshold=80.0,
)
for r in results:
    if r.valid:
        print(f"{r.input!r} -> {r.matched_name} ({r.matched_psgc_id})")
    else:
        print(f"{r.input!r} -> NOT FOUND")

Batch Validation with CLI

Validate a list of addresses from a file:

barangay batch validate addresses.txt

Each line in addresses.txt should contain one barangay name.

Custom Sanitization

sanitize_input() strips common prefixes, suffixes, and noise tokens before matching. It is a plain helper — pair it with validate() or search_fuzzy() to clean free-text input:

from barangay import sanitize_input, validate, search_fuzzy

raw = "(Pob.) San Jose City"
clean = sanitize_input(raw, exclude=["(pob.)", " city"])

# Feed the cleaned string into the recommended API
v = validate(clean, threshold=80.0)
print(v.valid, v.matched_name)

# Or inspect candidates with search_fuzzy
for r in search_fuzzy(clean, threshold=60.0, limit=5):
    print(r.name, r.score)
TipReuse across batches

sanitize_input() is cheap and stateless. Run it over every row of your dataset before calling validate_many() to normalize inconsistent casing, parentheticals, and redundant qualifiers.

Next Steps

NoteMigrating from the legacy search() API

The deprecated search() function (raw dicts, create_fuzz_base()) is covered in Migrate from the legacy API. Use validate(), validate_many(), and search_fuzzy() for all new code.