Bulk Barangay Lookup

Tutorial: perform bulk lookups and searches across the complete PSGC dataset of 42,010 barangays using to_frame(), to_dicts(), search_fuzzy(), validate_many(), and vectorized matching with pandas.
Author

bendlikeabamboo

Perform bulk lookups and searches across the complete PSGC dataset of 42,010 barangays, 1,493 municipalities, 150 cities, 82 provinces, and 18 regions.

Why Bulk Lookup?

When cleaning large datasets of Philippine addresses or geographic references, you need efficient batch processing. The barangay package provides:

  • Database views with to_frame() and to_dicts() for direct export
  • search_fuzzy() for typed fuzzy search results
  • validate_many() for batch address validation
  • CLI batch commands for file-based processing
  • Direct iteration over views for filtering and joins

Installation

See Get Started — Installation. Quick one-liner:

pip install barangay

Method 2: Python Batch Search with search_fuzzy()

For repeated lookups where you want candidate matches (not just pass/fail validation), loop over search_fuzzy(). Each call returns typed SearchResult objects:

from barangay import search_fuzzy

queries = [
    "Tongmageng, Tawi-Tawi",
    "Barangay 291, Manila",
    "Poblacion, Cebu City",
    "San Roque, Quezon Province",
    "Baluarte, City of San Fernando",
]

for query in queries:
    results = search_fuzzy(query, threshold=70.0, limit=3)
    if results:
        top = results[0]
        print(f"{query} → {top.name}, {top.province} (score {top.score})")
    else:
        print(f"{query} → NOT FOUND")
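
When the same address strings recur in a dataset, it can help to search each distinct string once and reuse the result. The sketch below is one way to do that. `top_matches` is a hypothetical helper (not part of the barangay package), and it takes the search function as an argument so that `search_fuzzy` can be passed in. It assumes only the `threshold` and `limit` keywords shown above, and that results come back ordered best-first, as the loop above also assumes.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def top_matches(
    queries: Iterable[str],
    search: Callable[..., list],
    threshold: float = 70.0,
    limit: int = 3,
) -> Dict[str, Optional[Any]]:
    """Map each distinct query to its best result, or None if nothing matched.

    `search` is expected to behave like
    search_fuzzy(query, threshold=..., limit=...) and return a list.
    """
    best: Dict[str, Optional[Any]] = {}
    for query in queries:
        key = query.strip()
        if key in best:  # this exact string was already searched
            continue
        results = search(key, threshold=threshold, limit=limit)
        best[key] = results[0] if results else None
    return best
```

With the package installed, `top_matches(queries, search_fuzzy)` would return a dict that can be scanned for `None` values to list the inputs that had no match.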

Writing Results to JSON

Collect the typed result attributes (.name, .psgc_id, .score, .match_type) into a serializable structure:

import json
from barangay import search_fuzzy

queries = ["Tongmageng, Tawi-Tawi", "Barangay 291, Manila"]

output = {}
for query in queries:
    results = search_fuzzy(query, threshold=70.0, limit=3)
    output[query] = [
        {
            "name": r.name,
            "psgc_id": r.psgc_id,
            "score": r.score,
            "match_type": r.match_type,
        }
        for r in results
    ]

with open("results.json", "w") as f:
    json.dump(output, f, indent=2)

Method 4: Direct Data Filtering

For exact or prefix matching, iterate a Database view directly. Each record is an EnrichedRecord with resolved hierarchy fields:

from barangay import barangays

barangays_in_manila = [
    r for r in barangays
    if "manila" in r.name.lower()
]
print(f"Found {len(barangays_in_manila)} barangays matching 'manila'")

# Prefix match on the parent PSGC ID (adjust the prefix to the area you need).
region_i_barangays = [
    r for r in barangays
    if r.parent_psgc_id.startswith("1")
]
print(f"Region I (psgc prefix '1') has {len(region_i_barangays)} barangays")
Tip: Use to_frame() for vectorized filters

For anything beyond a simple list comprehension — grouping, joins, pandas queries — call barangays.to_frame() once and operate on the DataFrame. It is far faster than iterating 42,010 records in Python.
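
As a rough illustration of what "vectorized" means here, the sketch below applies the same two filters with pandas string methods. It uses a tiny stand-in DataFrame with made-up rows so that it runs on its own. The column names `name` and `parent_psgc_id` are assumptions that mirror the record attributes used above, and the IDs are placeholders, not real PSGC codes. In practice you would start from `df = barangays.to_frame()` and check `df.columns` first.

```python
import pandas as pd

# Stand-in for barangays.to_frame(); rows and IDs are illustrative placeholders.
df = pd.DataFrame(
    {
        "name": ["Sample Manila A", "Poblacion", "San Roque", "Sample Manila B"],
        "parent_psgc_id": ["1000000001", "2000000002", "1000000003", "3000000004"],
    }
)

# Case-insensitive substring match, evaluated over the whole column at once.
name_mask = df["name"].str.contains("manila", case=False, regex=False)

# Prefix match on the parent ID column.
prefix_mask = df["parent_psgc_id"].str.startswith("1")

matching_name = df[name_mask]
matching_prefix = df[prefix_mask]
both = df[name_mask & prefix_mask]  # boolean masks combine with & and |

print(len(matching_name), len(matching_prefix), len(both))
```

Because the masks are plain boolean Series, they can be reused for grouping or joined with other conditions without looping over the records again.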

Method 5: Export Entire Dataset

Use the CLI to export all data for offline processing:

barangay export --model flat --format json --output all_barangays.json

Or with Python using the Database view:

import json
from barangay import barangays

data = barangays.to_dicts()

with open("all_barangays.json", "w") as f:
    json.dump(data, f, indent=2)
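
If the downstream tool prefers CSV to JSON, a list of dicts can be written with the standard library. The sketch assumes `to_dicts()` returns a list of flat dicts with scalar values, which the source does not spell out. `write_rows_csv` is a hypothetical helper name and the sample rows are placeholders.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_rows_csv(rows: List[Dict[str, object]], out: TextIO) -> int:
    """Write flat dicts as CSV and return the number of data rows written."""
    if not rows:
        return 0
    # Collect column names in first-seen order so rows with extra keys still fit.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Placeholder rows standing in for barangays.to_dicts().
sample = [
    {"name": "Example A", "psgc_id": "0000000001"},
    {"name": "Example B", "psgc_id": "0000000002"},
]
buffer = io.StringIO()
written = write_rows_csv(sample, buffer)
```

For a real export, open the target file with `newline=""` and `encoding="utf-8"` and pass that handle instead of the in-memory buffer.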

Performance Tips

Approach                                    | Speed            | Use Case
to_frame() / to_dicts()                     | Fastest          | Full dataset export, vectorized filtering
Iterate a view ([r for r in barangays ...]) | < 1 ms/query     | Exact or prefix matching
search_fuzzy()                              | ~25-80 ms/query  | Fuzzy matching with typos
validate_many()                             | ~25-80 ms/query  | Batch address validation
CLI batch-search                            | Batch optimized  | File-based processing
export + external tools                     | Fastest          | Full dataset export
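
The per-query figures above will vary with hardware and query mix, so it may be worth measuring on your own data. The sketch below is a generic timing helper, not something the barangay package provides. `time_per_call` is a hypothetical name, and any callable can be passed in, for example `lambda q: search_fuzzy(q, threshold=70.0)`.

```python
import time
from statistics import median
from typing import Callable, Iterable, List


def time_per_call(func: Callable[[str], object], queries: Iterable[str]) -> List[float]:
    """Return the wall-clock duration of func(query), in milliseconds, per query."""
    durations_ms: List[float] = []
    for query in queries:
        start = time.perf_counter()
        func(query)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


# Demo with a trivial stand-in function so the block runs without the package.
timings = time_per_call(str.lower, ["Tongmageng", "Poblacion", "San Roque"])
print(f"median: {median(timings):.4f} ms over {len(timings)} calls")
```

Reporting the median rather than the mean keeps a single slow first call (for example, one that loads data) from dominating the number.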

Historical Data

Bulk lookups also support historical PSGC data. Pass an as_of date (see available dates) to scope a search to a past masterlist:

from barangay import search_fuzzy

results = search_fuzzy("Tongmageng", threshold=70.0, as_of="2025-07-08")
for r in results:
    print(r.name, r.psgc_id, r.score)

To switch the whole process to a historical snapshot, call use_version() and remember to restore the latest when you are done:

import barangay
from barangay import search_fuzzy

barangay.use_version("2025-07-08")
try:
    results = search_fuzzy("Tongmageng", threshold=70.0)
finally:
    barangay.use_version(None)  # restore latest, even if the search raises
Important: use_version(None) restores the latest

use_version() applies globally — it is not a context manager, so it does not restore the previous version automatically. After you are done querying a historical snapshot, call use_version(None) to switch back to the latest masterlist.
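
Because `use_version()` is global, one way to make the restore step harder to forget is to wrap it in a small context manager. This is a sketch, not part of the barangay API: `pinned_version` is a hypothetical helper that receives `use_version` as an argument. It always restores the latest masterlist (`None`) rather than whatever version was active before, since the source does not describe a way to read the current version.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def pinned_version(
    use_version: Callable[[Optional[str]], object], as_of: str
) -> Iterator[None]:
    """Switch to a historical snapshot, then restore the latest on exit."""
    use_version(as_of)
    try:
        yield
    finally:
        # Runs even if the body raises, so the global state is not left pinned.
        use_version(None)
```

With the package installed, this would be used as `with pinned_version(barangay.use_version, "2025-07-08"):` followed by the historical queries in the indented body.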

The CLI batch command takes the same date through --as-of:

barangay batch batch-search queries.txt --as-of "2025-07-08" --output historical_results.json

Next Steps