Bulk Barangay Lookup
Perform bulk lookups and searches across the complete PSGC dataset of 42,010 barangays, 1,493 municipalities, 150 cities, 82 provinces, and 18 regions.
Why Bulk Lookup?
When cleaning large datasets of Philippine addresses or geographic references, you need efficient batch processing. The barangay package provides:
- Database views with to_frame() and to_dicts() for direct export
- search_fuzzy() for typed fuzzy search results
- validate_many() for batch address validation
- CLI batch commands for file-based processing
- Direct iteration over views for filtering and joins
Installation
See Get Started — Installation. Quick one-liner:
pip install barangay
Method 1: Database Views (Recommended)
Export data directly from database views for the most common use cases:
from barangay import barangays
# Export all barangays to pandas DataFrame
df = barangays.to_frame()
print(df.shape) # (42010, 15)
print(df.columns.tolist())
# ['name', 'type', 'psgc_id', 'parent_psgc_id', 'nicknames', 'extensions',
# 'region', 'province', 'highly_urbanized_city', 'independent_component_city',
# 'component_city', 'municipality', 'submunicipality',
# 'special_geographic_area', 'barangay']
# Export as list of dicts
data = barangays.to_dicts()
print(len(data)) # 42010
Fuzzy Search on a View
Search within a specific admin level:
results = barangays.search_fuzzy("Tongmageng, Tawi-Tawi", threshold=60.0, limit=5)
for r in results:
    print(f"{r.name} ({r.psgc_id}) - score: {r.score}")
Batch Validation
from barangay import validate_many
addresses = [
"Tongmageng, Tawi-Tawi",
"Barangay 291, Manila",
"Poblacion, Cebu City",
]
results = validate_many(addresses, threshold=80.0)
for r in results:
    if r.valid:
        print(f"{r.input!r} -> {r.matched_name} ({r.matched_psgc_id})")
    else:
        print(f"{r.input!r} -> NOT FOUND")
Method 2: Python Batch Search with search_fuzzy()
For repeated lookups where you want candidate matches (not just pass/fail validation), loop over search_fuzzy(). Each call returns typed SearchResult objects:
from barangay import search_fuzzy
queries = [
"Tongmageng, Tawi-Tawi",
"Barangay 291, Manila",
"Poblacion, Cebu City",
"San Roque, Quezon Province",
"Baluarte, City of San Fernando",
]
for query in queries:
    results = search_fuzzy(query, threshold=70.0, limit=3)
    if results:
        top = results[0]
        print(f"{query} → {top.name}, {top.province} (score {top.score})")
    else:
        print(f"{query} → NOT FOUND")
Writing Results to JSON
Collect the typed result attributes (.name, .psgc_id, .score, .match_type) into a serializable structure:
import json
from barangay import search_fuzzy
queries = ["Tongmageng, Tawi-Tawi", "Barangay 291, Manila"]
output = {}
for query in queries:
    results = search_fuzzy(query, threshold=70.0, limit=3)
    output[query] = [
        {
            "name": r.name,
            "psgc_id": r.psgc_id,
            "score": r.score,
            "match_type": r.match_type,
        }
        for r in results
    ]
with open("results.json", "w") as f:
    json.dump(output, f, indent=2)
Method 3: CLI Batch Search
Create a file with one query per line (queries.txt in this example), then run the batch command:
barangay batch batch-search queries.txt --limit 5 --output results.json
This reads queries.txt and writes matched results to results.json.
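If the queries already live in a Python list, the input file can be generated rather than typed by hand. The sketch below is one way to do that; write_queries is a hypothetical helper, not part of the barangay package. The page does not say how the CLI treats blank lines or duplicate queries, so the helper drops both as a precaution.

```python
from pathlib import Path


def write_queries(queries, path):
    """Write one stripped, de-duplicated query per line and return the count written."""
    seen = set()
    lines = []
    for q in queries:
        q = q.strip()
        # Skip blanks and exact repeats so each line is a distinct lookup.
        if q and q not in seen:
            seen.add(q)
            lines.append(q)
    Path(path).write_text("".join(f"{q}\n" for q in lines), encoding="utf-8")
    return len(lines)


count = write_queries(
    ["Tongmageng, Tawi-Tawi", "  Barangay 291, Manila ", "", "Tongmageng, Tawi-Tawi"],
    "queries.txt",
)
print(count)  # 2
```

The resulting queries.txt can then be passed to the batch-search command shown above.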
Method 4: Direct Data Filtering
For exact or prefix matching, iterate a Database view directly. Each record is an EnrichedRecord with resolved hierarchy fields:
from barangay import barangays
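# Substring match on the barangay name (not a "located in Manila" filter)
name_matches = [
    r for r in barangays
    if "manila" in r.name.lower()
]
print(f"Found {len(name_matches)} barangays matching 'manila'")
# Region I PSGC codes start with "01" (assuming zero-padded string IDs);
# a bare "1" would also match Regions 10 and above
region_1_barangays = [
    r for r in barangays
    if r.parent_psgc_id.startswith("01")
]
print(f"Region I (PSGC prefix '01') has {len(region_1_barangays)} barangays")
to_frame() for vectorized filters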
For anything beyond a simple list comprehension — grouping, joins, pandas queries — call barangays.to_frame() once and operate on the DataFrame. It is far faster than iterating 42,010 records in Python.
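As a rough illustration of that advice, the snippet below runs a vectorized filter and a group-by on a small stand-in DataFrame. It reuses three column names from the to_frame() output listed under Method 1 (name, psgc_id, province), but the rows are made-up placeholders, not real PSGC records. In practice df would come from barangays.to_frame().

```python
import pandas as pd

# Stand-in for barangays.to_frame(); every row here is a placeholder.
df = pd.DataFrame(
    {
        "name": ["Poblacion", "San Roque", "Poblacion", "Santo Niño"],
        "psgc_id": ["0000000001", "0000000002", "0000000003", "0000000004"],
        "province": ["Province A", "Province A", "Province B", "Province B"],
    }
)

# Vectorized, case-insensitive substring filter on the name column.
poblacion = df[df["name"].str.contains("poblacion", case=False)]

# Count barangays per province without a Python-level loop.
counts = df.groupby("province")["name"].count()

print(len(poblacion))    # 2
print(counts.to_dict())  # {'Province A': 2, 'Province B': 2}
```

The same two operations written as list comprehensions would each scan every record in Python, which is the cost the tip above is warning about.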
Method 5: Export Entire Dataset
Use the CLI to export all data for offline processing:
barangay export --model flat --format json --output all_barangays.json
Or with Python using the Database view:
import json
from barangay import barangays
data = barangays.to_dicts()
with open("all_barangays.json", "w") as f:
    json.dump(data, f, indent=2)
Performance Tips
| Approach | Speed | Use Case |
|---|---|---|
| to_frame() / to_dicts() | Fastest | Full dataset export, vectorized filtering |
| Iterate a view ([r for r in barangays ...]) | <1ms/query | Exact or prefix matching |
| search_fuzzy() | ~25-80ms/query | Fuzzy matching with typos |
| validate_many() | ~25-80ms/query | Batch address validation |
| CLI batch-search | Batch optimized | File-based processing |
| export + external tools | Fastest | Full dataset export |
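At roughly 25-80 ms per fuzzy query, repeated entries in a large address file add up. One option is to run each distinct query once and reuse the result. The sketch below assumes that queries differing only in case or spacing can share a result, which this page does not confirm. search_unique is a hypothetical helper, and the search argument stands in for a call such as lambda q: search_fuzzy(q, threshold=70.0, limit=3).

```python
def search_unique(queries, search):
    """Call `search` once per normalized query and map every input to its result."""
    cache = {}
    out = {}
    for q in queries:
        # Normalize case and runs of whitespace so near-identical inputs share a key.
        key = " ".join(q.lower().split())
        if key not in cache:
            cache[key] = search(q)
        out[q] = cache[key]
    return out


# Recording stand-in for the real search function, used only for this demo.
calls = []


def fake_search(query):
    calls.append(query)
    return [query.upper()]


results = search_unique(
    ["Poblacion, Cebu City", "poblacion,  cebu city", "Barangay 291, Manila"],
    fake_search,
)
print(len(calls))    # 2
print(len(results))  # 3
```

Each input keeps its own key in the output, so downstream code can still report results per original row.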
Historical Data
Bulk lookups also support historical PSGC data. Pass an as_of date (see available dates) to scope a search to a past masterlist:
from barangay import search_fuzzy
results = search_fuzzy("Tongmageng", threshold=70.0, as_of="2025-07-08")
for r in results:
    print(r.name, r.psgc_id, r.score)
To switch the whole process to a historical snapshot, call use_version() and remember to restore the latest when you are done:
import barangay
from barangay import search_fuzzy
barangay.use_version("2025-07-08")
results = search_fuzzy("Tongmageng", threshold=70.0)
barangay.use_version(None) # restore latest
use_version(None) restores the latest
use_version() applies globally — it is not a context manager, so it does not restore the previous version automatically. After you are done querying a historical snapshot, call use_version(None) to switch back to the latest masterlist.
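Because the restore step is easy to forget, it can be wrapped in a small context manager of your own. The sketch below is not part of the barangay package: historical_snapshot is a hypothetical helper that takes the use_version function as an argument. On exit it switches back to the latest masterlist with use_version(None), not to whatever version was active before.

```python
from contextlib import contextmanager


@contextmanager
def historical_snapshot(use_version, as_of):
    """Switch to a snapshot for the duration of the block, then restore the latest."""
    use_version(as_of)
    try:
        yield
    finally:
        # Runs even if the block raises, so the process never stays pinned.
        use_version(None)


# Demo with a recording stand-in instead of barangay.use_version.
calls = []
with historical_snapshot(calls.append, "2025-07-08"):
    calls.append("query runs here")
print(calls)  # ['2025-07-08', 'query runs here', None]
```

With the real package the call would look like with historical_snapshot(barangay.use_version, "2025-07-08"):. The switch is still global, so other code running in the same process sees the historical snapshot while the block is active.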
The CLI batch command accepts the same date through --as-of:
barangay batch batch-search queries.txt --as-of "2025-07-08" --output historical_results.json
Next Steps
- Getting Started: Database API overview
- Address Validation: validating individual addresses
- Search reference: search_fuzzy() documentation
- CLI Reference: batch command options