Address Validation
Validate Philippine addresses against the official PSGC masterlist using the barangay package. Fuzzy matching handles common misspellings, abbreviations, and unstandardized address formats.
Why Address Validation Matters
Philippine addresses are notoriously inconsistent. A single barangay might appear as:
- “Brgy. 291”
- “Barangay 291”
- “Barangay No. 291”
- “Brgy 291”
The barangay package solves this by matching addresses against the official Philippine Standard Geographic Code (PSGC) masterlist from the Philippine Statistics Authority (PSA), which covers all 42,010 barangays.
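To see why exact string comparison fails on these variants, the sketch below collapses the four spellings listed above into one canonical form with a regular expression. It uses only the standard library. `normalize_prefix` is a hypothetical helper for illustration, not part of the barangay package and not a description of how the package matches internally.

```python
import re

# Hypothetical helper, not part of the barangay package.
# Matches a leading "Brgy", "Brgy." or "Barangay", optionally followed by
# "No"/"No." when a digit comes next (so names like "Nonoc" are left alone).
_PREFIX = re.compile(
    r"^\s*(?:brgy\b\.?|barangay\b)\s*(?:no\.?\s*(?=\d))?",
    re.IGNORECASE,
)

def normalize_prefix(text: str) -> str:
    """Rewrite a leading barangay prefix as the canonical 'Barangay '."""
    match = _PREFIX.match(text)
    if match is None:
        return text.strip()  # no prefix found: leave the name unchanged
    return "Barangay " + text[match.end():].strip()

variants = ["Brgy. 291", "Barangay 291", "Barangay No. 291", "Brgy 291"]
normalized = {normalize_prefix(v) for v in variants}
print(normalized)  # {'Barangay 291'}
```

A rule like this only covers the prefixes it was written for, which is why fuzzy matching against a masterlist is the more general approach.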
Installation
See Get Started — Installation. Quick one-liner:
pip install barangay
Using validate()
The validate() function provides a simple interface for address validation with a high-confidence default threshold of 95.0:
from barangay import validate
v = validate("Tongmageng, Sitangkai, Tawi-Tawi")
print(v.valid, v.matched_name, v.score)  # True Tongmageng 100.0
For addresses that use abbreviations or non-standard formats, lower the threshold:
v = validate("Brgy 291, City of Manila", threshold=80.0)
print(v.valid, v.matched_name, v.score)  # False None None
The default threshold of 95.0 is strict. Abbreviations like “Brgy.” or “No.” and non-standard formats (“Pob.”, parentheticals) rarely clear it. Drop to 80.0 for lenient matching, or use sanitize_input() to normalize the text first.
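The effect of the threshold can be illustrated with a generic similarity score. The sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer on a 0 to 100 scale. The scorer the barangay package actually uses is not stated here, so real scores may differ. `similarity` and `passes` are hypothetical helpers, and treating a score equal to the threshold as a match is an assumption.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Stand-in similarity on a 0-100 scale (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def passes(query: str, candidate: str, threshold: float = 95.0) -> bool:
    """Assumption: a candidate matches when its score reaches the threshold."""
    return similarity(query, candidate) >= threshold

# An exact name scores 100 and clears the strict default.
print(passes("Tongmageng", "Tongmageng"))                 # True

# One dropped letter scores about 94.7: rejected at 95.0, accepted at 80.0.
print(round(similarity("Tongmagen", "Tongmageng"), 1))    # 94.7
print(passes("Tongmagen", "Tongmageng"))                  # False
print(passes("Tongmagen", "Tongmageng", threshold=80.0))  # True
```

A single missing character is enough to fall below 95.0 on a ten-letter name, so strict thresholds suit clean data and lenient ones suit free-text input.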
ValidationResult Properties
| Property | Type | Description |
|---|---|---|
| .input | str | Original input string |
| .valid | bool | Whether a match was found above threshold |
| .matched_name | str \| None | Name of the matched record |
| .matched_psgc_id | str \| None | PSGC ID of the matched record |
| .matched_record | AdminDivRecord \| None | Full matched record |
| .score | float \| None | Confidence score |
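These properties are enough to turn results into flat rows for logging or export. The sketch below relies only on the attribute names listed in the table. `to_row` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real ValidationResult instances (with a placeholder PSGC ID) so the snippet runs without the package installed.

```python
from types import SimpleNamespace

def to_row(result) -> dict:
    """Flatten a ValidationResult-like object into a plain dict."""
    return {
        "input": result.input,
        "valid": result.valid,
        "matched_name": result.matched_name,
        "matched_psgc_id": result.matched_psgc_id,
        "score": result.score,
    }

# Stand-ins mirroring the documented properties; the PSGC ID is a placeholder.
hit = SimpleNamespace(
    input="Tongmageng, Sitangkai, Tawi-Tawi", valid=True,
    matched_name="Tongmageng", matched_psgc_id="<psgc-id>",
    matched_record=None, score=100.0,
)
miss = SimpleNamespace(
    input="Nonexistent Place", valid=False,
    matched_name=None, matched_psgc_id=None,
    matched_record=None, score=None,
)

rows = [to_row(r) for r in (hit, miss)]
print(rows[1]["valid"], rows[1]["score"])  # False None
```

Rows in this shape can be passed straight to `csv.DictWriter` or a DataFrame constructor.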
Handling Misspellings with search_fuzzy()
When you need more than a pass/fail verdict — for example, to show candidate matches for manual review — use search_fuzzy(). It tolerates typos and alternative spellings and returns typed SearchResult objects with a similarity score:
from barangay import search_fuzzy
results = search_fuzzy("Tongmagen", threshold=60.0, limit=3)
for r in results:
    print(f"{r.name} — score: {r.score}")
Output:
| Barangay | Municipality | Province | Score |
|---|---|---|---|
| Tongmageng | Sitangkai | Tawi-Tawi | 94.7 |
| Tagen | Sarangani | Davao Occidental | 71.4 |
| Tonga | Balud | Masbate | 71.4 |
Each SearchResult exposes .name, .psgc_id, .score, .match_type, and .record (see the Search reference for the full property list).
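For manual review it can help to separate candidates that are safe to accept automatically from those a person should check. The sketch below does this with a hypothetical `triage` helper that reads only `.name` and `.score`. The 95.0 cut-off mirrors the validate() default, and the sample objects are stand-ins built from the output table above.

```python
from types import SimpleNamespace

def triage(candidates, accept_at: float = 95.0):
    """Split candidates into (auto_accept, needs_review) by score."""
    auto, review = [], []
    for c in candidates:
        (auto if c.score >= accept_at else review).append(c)
    # Highest-scoring candidates first, so reviewers see the best guess on top.
    review.sort(key=lambda c: c.score, reverse=True)
    return auto, review

# Stand-in candidates; real code would pass the list from search_fuzzy().
sample = [
    SimpleNamespace(name="Tagen", score=71.4),
    SimpleNamespace(name="Tongmageng", score=94.7),
    SimpleNamespace(name="Tonga", score=71.4),
]
auto, review = triage(sample)
print([c.name for c in auto])    # []
print([c.name for c in review])  # ['Tongmageng', 'Tagen', 'Tonga']
```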
Batch Validation with validate_many()
Validate many addresses in a single call and inspect each result programmatically:
from barangay import validate_many
results = validate_many([
    "Tongmageng, Sitangkai, Tawi-Tawi",
    "Brgy 291, City of Manila",
    "Nonexistent Place",
])
for r in results:
    status = "valid" if r.valid else "invalid"
    print(f"{r.input!r} -> {status}")
# 'Tongmageng, Sitangkai, Tawi-Tawi' -> valid
# 'Brgy 291, City of Manila' -> invalid
# 'Nonexistent Place' -> invalid
Lower the threshold once for the whole batch when your data uses abbreviations or non-standard formats:
results = validate_many(
    ["Brgy 291, Manila", "Tongmageng, Sitangkai, Tawi-Tawi", "Poblacion, Davao City"],
    threshold=80.0,
)
for r in results:
    if r.valid:
        print(f"{r.input!r} -> {r.matched_name} ({r.matched_psgc_id})")
    else:
        print(f"{r.input!r} -> NOT FOUND")
Batch Validation with CLI
Validate a list of addresses from a file:
barangay batch validate addresses.txt
Each line in addresses.txt should contain one barangay name.
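If the names currently live in a Python list or a spreadsheet export, the input file can be produced with a few lines of standard-library code. The sketch below writes one name per line, skipping blanks and exact duplicates. `write_address_file` is a hypothetical helper, and UTF-8 encoding is an assumption about what the CLI expects.

```python
import tempfile
from pathlib import Path

def write_address_file(names, path) -> int:
    """Write one non-empty, de-duplicated name per line; return the count."""
    seen = set()
    lines = []
    for name in names:
        cleaned = name.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(cleaned)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)

# Write into a temporary directory so the example leaves no stray files.
out = Path(tempfile.mkdtemp()) / "addresses.txt"
count = write_address_file(["Tongmageng", "  Poblacion ", "", "Tongmageng"], out)
print(count, out.read_text(encoding="utf-8").splitlines())
# 2 ['Tongmageng', 'Poblacion']
```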
Custom Sanitization
sanitize_input() strips common prefixes, suffixes, and noise tokens before matching. It is a plain helper — pair it with validate() or search_fuzzy() to clean free-text input:
from barangay import sanitize_input, validate, search_fuzzy
raw = "(Pob.) San Jose City"
clean = sanitize_input(raw, exclude=["(pob.)", " city"])
# Feed the cleaned string into the recommended API
v = validate(clean, threshold=80.0)
print(v.valid, v.matched_name)
# Or inspect candidates with search_fuzzy
for r in search_fuzzy(clean, threshold=60.0, limit=5):
    print(r.name, r.score)
sanitize_input() is cheap and stateless. Run it over every row of your dataset before calling validate_many() to normalize inconsistent casing, parentheticals, and redundant qualifiers.
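Because sanitization can map several raw rows to the same cleaned string, it is often worth validating each distinct cleaned value once and joining the results back to the original rows. The sketch below shows that bookkeeping with a hypothetical `group_by_cleaned` helper. The `clean` argument is any callable, for example `lambda s: sanitize_input(s, exclude=["(pob.)", " city"])`. The stand-in cleaner used here only trims and lowercases, so the snippet runs without the package.

```python
from collections import defaultdict

def group_by_cleaned(rows, clean):
    """Map each cleaned string to the list of raw rows that produced it."""
    groups = defaultdict(list)
    for raw in rows:
        groups[clean(raw)].append(raw)
    return dict(groups)

# Stand-in cleaner for illustration; swap in sanitize_input in real use.
def demo_clean(text: str) -> str:
    return " ".join(text.lower().split())

rows = ["San Jose", "  san jose ", "SAN  JOSE", "Tongmageng"]
groups = group_by_cleaned(rows, demo_clean)

# Only the distinct cleaned values need to go to validate_many().
unique_inputs = list(groups)
print(unique_inputs)            # ['san jose', 'tongmageng']
print(len(groups["san jose"]))  # 3
```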
Next Steps
- Search reference: full search_fuzzy() parameter documentation
- CLI Reference: barangay batch validate options
- Bulk Barangay Lookup: batch processing at scale
search() API
The deprecated search() function (raw dicts, create_fuzz_base()) is covered in Migrate from the legacy API. Use validate(), validate_many(), and search_fuzzy() for all new code.