API Reference

Main Search Function

`search()`

Search for barangays using fuzzy string matching.

from barangay import search

results = search("Tongmageng, Tawi-Tawi")

Parameters:

Parameter	Type	Default	Description
`search_string`	`str`	-	The string to search for
`match_hooks`	`List[Literal["province", "municipality", "barangay"]]`	`["province", "municipality", "barangay"]`	Administrative levels to match against. Valid options: `"province"`, `"municipality"`, `"barangay"`
`threshold`	`float`	`60.0`	Minimum similarity score (0-100)
`n`	`int`	`5`	Maximum number of results
`search_sanitizer`	`Callable`	-	Function to sanitize search string
`fuzz_base`	`FuzzBase | None`	-	Pre-computed fuzzy matching instance for performance
`as_of`	`str | None`	-	Historical date (YYYY-MM-DD) or None for latest

Returns: List[dict] with matching results

Note: The function supports multiple matching patterns:

B (barangay only): Matches against barangay name only
PB (province + barangay): Matches against province and barangay combined
MB (municipality + barangay): Matches against municipality and barangay combined
PMB (province + municipality + barangay): Matches against all three levels combined

Each result includes the following score fields:

f_000b_ratio_score: Score for barangay-only match
f_0p0b_ratio_score: Score for province + barangay match
f_00mb_ratio_score: Score for municipality + barangay match
f_0pmb_ratio_score: Score for province + municipality + barangay match

Each result also includes base fields (000b, 0p0b, 00mb, 0pmb) for the corresponding match patterns.
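
As a rough illustration of how these score fields might be consumed, the sketch below picks the highest-scoring pattern from a single result. Only the four field names come from the list above; `best_pattern` is a hypothetical helper (not part of the package) and the values in `sample_result` are invented.

```python
# Score field names as documented above.
SCORE_FIELDS = (
    "f_000b_ratio_score",
    "f_0p0b_ratio_score",
    "f_00mb_ratio_score",
    "f_0pmb_ratio_score",
)


def best_pattern(result: dict) -> tuple[str, float]:
    """Return (field_name, score) for the highest score present in a result."""
    present = {f: result[f] for f in SCORE_FIELDS if result.get(f) is not None}
    if not present:
        raise ValueError("result has no score fields")
    field = max(present, key=present.get)
    return field, present[field]


# Hypothetical result; the numbers are invented for illustration only.
sample_result = {
    "barangay": "Tongmageng",
    "f_000b_ratio_score": 62.0,
    "f_0p0b_ratio_score": 95.0,
    "f_00mb_ratio_score": 58.5,
    "f_0pmb_ratio_score": 71.0,
}

field, score = best_pattern(sample_result)
print(field, score)
```

Fields that are missing or None are skipped, which may be relevant when `match_hooks` restricts the patterns that get scored.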

Example:

results = search(
    "Tongmagen, Tawi-Tawi",
    n=4,
    match_hooks=["municipality", "barangay"],
    threshold=70.0,
    as_of="2025-07-08"
)

for result in results:
    print(f"{result['barangay']} (score: {result['f_00mb_ratio_score']})")

Data Access

Pydantic Data Models

The package provides three main data models as Pydantic models:

`barangay` (AdminDiv)

Nested administrative division data model. Organizes data hierarchically by region → municipality/city → barangay.

from barangay import barangay

# Access nested data using dict-like access
ncr_cities = list(barangay["National Capital Region (NCR)"].keys())
manila_brgys = barangay["National Capital Region (NCR)"]["City of Manila"]

# Iterate over regions
for region, municipalities in barangay.items():
    print(f"Region: {region}")

Type: AdminDiv (RootModel[dict[str, AdminDiv] | List[str]])

Note: This is a Pydantic model, not a plain dict. Use .model_dump() to convert to dict if needed.

`barangay_extended` (AdminDivExtended)

Extended recursive model with complete administrative hierarchy. Each division includes its PSGC ID, parent PSGC ID, type, and nested components.

from barangay import barangay_extended

# Access extended hierarchical data
country = barangay_extended
for region in country.components:
    print(f"Region: {region.name} (PSGC: {region.psgc_id})")
    for province in region.components:
        print(f"  Province: {province.name}")

Type: AdminDivExtended

Fields:

Field	Type	Default	Description
`name`	`str`	-	Name of the administrative division
`type`	`str`	-	Type of division (country, region, province, city, municipality, barangay, special_geographic_area, submunicipality)
`psgc_id`	`str`	-	PSGC identifier or "n/a"
`parent_psgc_id`	`str`	-	Parent PSGC identifier or "n/a"
`nicknames`	`Optional[List[str]]`	-	Optional list of alternative names
`components`	`List[AdminDivExtended]`	-	List of nested administrative divisions

`barangay_flat` (List[AdminDivFlat])

Flat list of all administrative divisions without nesting. Each entry is a standalone record.

from barangay import barangay_flat

# Access flat data
all_barangays = [item for item in barangay_flat if item.type == "barangay"]
municipalities = [item for item in barangay_flat if item.type == "municipality"]

# Example: Find a specific barangay
brgy = [loc for loc in barangay_flat if loc.name == "Marayos"]
if brgy:
    print(f"Found: {brgy[0].name} ({brgy[0].type})")

Type: List[AdminDivFlat]

Fields (AdminDivFlat):

Field	Type	Default	Description
`name`	`str`	-	Name of the administrative division
`type`	`str`	-	Type of division (country, region, province, city, municipality, barangay, special_geographic_area, submunicipality)
`psgc_id`	`str`	-	PSGC identifier or "n/a"
`parent_psgc_id`	`str`	-	Parent PSGC identifier or "n/a"
`nicknames`	`Optional[List[str]]`	-	Optional list of alternative names
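
Because each flat record carries its own `psgc_id` and `parent_psgc_id`, parent-child lookups can be built with a simple index. The sketch below uses a stand-in dataclass (`FlatRecord`, hypothetical) with the same core fields as the table above, and invented records with placeholder IDs; it does not import the package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlatRecord:
    """Stand-in with the same core fields as AdminDivFlat (hypothetical)."""
    name: str
    type: str
    psgc_id: str
    parent_psgc_id: str


def children_by_parent(records):
    """Group records by their parent's PSGC ID."""
    index = defaultdict(list)
    for rec in records:
        index[rec.parent_psgc_id].append(rec)
    return index


# Invented records with placeholder IDs, for illustration only.
records = [
    FlatRecord("Example Municipality", "municipality", "M1", "P1"),
    FlatRecord("Barangay A", "barangay", "B1", "M1"),
    FlatRecord("Barangay B", "barangay", "B2", "M1"),
]

index = children_by_parent(records)
print([r.name for r in index["M1"]])  # ['Barangay A', 'Barangay B']
```

Passing `barangay_flat` instead of `records` should work the same way, assuming its entries expose these attributes as the field table indicates.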

Backward Compatibility Dictionaries

For backward compatibility, dict versions of the data are also available:

from barangay import BARANGAY, BARANGAY_EXTENDED, BARANGAY_FLAT

# Dict versions (converted from Pydantic models at import time)
# Use these for legacy code or when you need plain dicts
BARANGAY  # dict: nested structure like barangay
BARANGAY_EXTENDED  # dict: extended nested structure
BARANGAY_FLAT  # List[dict]: list of dicts like barangay_flat

Note: These are converted from Pydantic models at module import time. For better performance, use the Pydantic models (barangay, barangay_extended, barangay_flat) directly when possible.

Example: Data Model Comparison

from barangay import barangay, barangay_extended, barangay_flat, BARANGAY_FLAT

# Using Pydantic models (recommended)
for item in barangay_flat:
    print(f"{item.name} - {item.type}")

# Using dict for compatibility
for item in BARANGAY_FLAT:
    print(f"{item['name']} - {item['type']}")

DataManager

Manages data loading, caching, and downloading. Provides access to data from the bundled package, the local cache, or GitHub.

from barangay import DataManager

dm = DataManager()

# Load latest data
data = dm.get_data(data_type="basic")

# Load historical data
data = dm.get_data(as_of="2025-07-08", data_type="flat")

Parameters:

Parameter	Type	Default	Description
`as_of`	`str | None`	-	Historical date (YYYY-MM-DD) or None for latest bundled data
`data_type`	`str`	-	Data model type. Valid options: `"basic"` (nested structure), `"flat"` (flat list), `"extended"` (extended recursive structure)

Returns: Depending on data_type:

"basic": dict
"flat": dict
"extended": dict

Data Loading Priority:

Bundled package data (for current version or latest)
Local cache (for historical dates)
Download from GitHub (if not cached)

Note: The DataManager automatically handles date resolution, caching, and downloads based on the as_of parameter and configuration.
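
The loading priority above can be sketched as a three-step fallback. This is only an illustration of the ordering, not the DataManager implementation: `load_with_priority` and the loader callables passed to it are hypothetical, and the in-memory `cache` dict stands in for the on-disk cache.

```python
from typing import Callable, Optional


def load_with_priority(
    as_of: Optional[str],
    current: str,
    load_bundled: Callable[[], dict],
    load_cached: Callable[[str], Optional[dict]],
    download: Callable[[str], dict],
) -> tuple[dict, str]:
    """Return (data, source) following bundled -> cache -> download order."""
    # 1. Bundled data serves the latest/current version.
    if as_of is None or as_of == current:
        return load_bundled(), "bundled"
    # 2. Historical dates are looked up in the local cache first.
    cached = load_cached(as_of)
    if cached is not None:
        return cached, "cache"
    # 3. Fall back to downloading when nothing is cached.
    return download(as_of), "download"


# Stand-in loaders backed by in-memory dicts (hypothetical).
cache = {"2025-07-08": {"version": "2025-07-08"}}
data, source = load_with_priority(
    as_of="2025-07-08",
    current="2026-01-13",
    load_bundled=lambda: {"version": "2026-01-13"},
    load_cached=cache.get,
    download=lambda d: {"version": d},
)
print(source)  # cache
```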

Fuzzy Matching

`create_fuzz_base()`

Factory function to create FuzzBase instances for performance optimization. Reusing a FuzzBase instance across multiple searches improves performance by avoiding redundant data loading and preprocessing.

from barangay import create_fuzz_base, search

# Create FuzzBase instance (can be reused for multiple searches)
fuzz_base = create_fuzz_base(as_of="2025-08-29")

# Use the same fuzz_base for multiple queries
results1 = search("Tongmageng", fuzz_base=fuzz_base)
results2 = search("Marayos", fuzz_base=fuzz_base)
results3 = search("San Jose", fuzz_base=fuzz_base)

Parameters:

Parameter	Type	Default	Description
`as_of`	`str | None`	-	Historical date (YYYY-MM-DD) or None for latest bundled data

Returns: FuzzBase instance

FuzzBase

Class for fuzzy matching operations with pre-computed matching functions.

from barangay import FuzzBase, create_fuzz_base

# Create FuzzBase instance using factory function
fuzz_base = create_fuzz_base(as_of="2025-08-29")

Parameters:

Parameter	Type	Default	Description
`fuzzer_base`	`pd.DataFrame`	-	DataFrame with preprocessed barangay data
`sanitizer`	`Callable`	`_basic_sanitizer`	Function to clean strings (optional)

Note: FuzzBase internally creates four matching patterns:

000b: Barangay name only
0p0b: Province + Barangay
00mb: Municipality + Barangay
0pmb: Province + Municipality + Barangay

Each pattern has a pre-computed fuzzy matching function using rapidfuzz.fuzz.token_sort_ratio.
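
To make the four patterns concrete, the sketch below builds one string per pattern and scores a query against each. It is an approximation under stated assumptions: `difflib.SequenceMatcher` from the standard library stands in for rapidfuzz so the example has no dependencies, the way the parts are joined is a guess, and the place names are illustrative values rather than data read from the package.

```python
from difflib import SequenceMatcher


def token_sort_ratio_sketch(a: str, b: str) -> float:
    """Sort whitespace-separated tokens, then compare (0-100 scale)."""
    a_sorted = " ".join(sorted(a.lower().split()))
    b_sorted = " ".join(sorted(b.lower().split()))
    return 100.0 * SequenceMatcher(None, a_sorted, b_sorted).ratio()


def build_patterns(province: str, municipality: str, barangay: str) -> dict[str, str]:
    """Build the four pattern strings named in the note above."""
    return {
        "000b": barangay,
        "0p0b": f"{province} {barangay}",
        "00mb": f"{municipality} {barangay}",
        "0pmb": f"{province} {municipality} {barangay}",
    }


# Illustrative values only.
patterns = build_patterns("Tawi-Tawi", "Sitangkai", "Tongmageng")
query = "Tongmageng, Tawi-Tawi"
scores = {key: token_sort_ratio_sketch(query, text) for key, text in patterns.items()}
best = max(scores, key=scores.get)
print(best)  # 0p0b
```

Sorting tokens before comparing is what lets a query written as "barangay, province" still score highly against a pattern stored as "province barangay".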

Utilities

`sanitize_input()`

Utility function for string sanitization. Converts strings to lowercase and removes specified items.

from barangay import sanitize_input

# Basic sanitization (lowercase only)
cleaned = sanitize_input("City of San Jose")
# Result: "city of san jose"

# Sanitize with exclusions
cleaned = sanitize_input("City of San Jose", exclude=["city of ", " city"])
# Result: "san jose"

# Using a list of exclusions
cleaned = sanitize_input("(pob.) San Jose City", exclude=["(pob.)", " city"])
# Result: "san jose"

# Using a single exclusion string
cleaned = sanitize_input("San Jose & vicinity", exclude="&")
# Result: "san jose  vicinity"

Parameters:

Parameter	Type	Default	Description
`input_str`	`str | None`	-	String to sanitize. None becomes empty string
`exclude`	`List[str] | str | None`	-	Items to remove. Can be a list, string, or None

Returns: Sanitized lowercase string with excluded items removed

Note: The function handles None input gracefully by converting it to an empty string.
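
The documented behavior can be approximated in a few lines. The sketch below is not the package's implementation; `sanitize_sketch` is a hypothetical reimplementation chosen to reproduce the four results shown above, and the trimming of surrounding whitespace is an assumption inferred from those results.

```python
from typing import Iterable, Optional, Union


def sanitize_sketch(
    input_str: Optional[str],
    exclude: Union[Iterable[str], str, None] = None,
) -> str:
    """Lowercase the input and remove each excluded substring."""
    if input_str is None:
        return ""
    text = input_str.lower()
    if exclude is None:
        items = []
    elif isinstance(exclude, str):
        items = [exclude]  # a single string is treated as one item
    else:
        items = list(exclude)
    for item in items:
        text = text.replace(item.lower(), "")
    # Assumption: surrounding whitespace is trimmed, inner whitespace is kept.
    return text.strip()


print(sanitize_sketch("(pob.) San Jose City", exclude=["(pob.)", " city"]))  # san jose
```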

`resolve_date()`

Resolve approximate dates to the closest available dataset. This is useful when working with historical data that may not have exact date matches.

from barangay import get_available_dates, resolve_date

# Resolve to closest available date
resolved_date, status = resolve_date("2025-07-01", get_available_dates(), "2026-01-13")
print(resolved_date)  # '2025-07-08' (closest available)
print(status)  # Message describing the resolution

Parameters:

Parameter	Type	Default	Description
`target_date`	`str`	-	Target date string (YYYY-MM-DD)
`available_dates`	`List[str]`	-	List of available dataset dates
`current_date`	`str`	-	Current dataset date (for reference)

Returns: Tuple of (resolved_date: str | None, status_message: str)
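
One plausible reading of "closest available" is the date with the smallest absolute distance in days from the target. The sketch below implements that reading with the standard library; `closest_date_sketch` is hypothetical, and the actual function may use different rules (for example for ties or out-of-range dates).

```python
from datetime import date
from typing import Optional


def closest_date_sketch(target: str, available: list[str]) -> Optional[str]:
    """Return the available date nearest to target, or None if none exist."""
    if not available:
        return None
    target_day = date.fromisoformat(target)
    # Ties resolve to the earlier date because the candidates are sorted first.
    return min(
        sorted(available),
        key=lambda d: abs((date.fromisoformat(d) - target_day).days),
    )


available = ["2025-04-23", "2025-07-08", "2025-08-29"]
print(closest_date_sketch("2025-07-01", available))  # 2025-07-08
```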

`get_available_dates()`

Get list of available historical dataset dates. This typically includes all historical releases available on GitHub.

from barangay import get_available_dates

dates = get_available_dates()
print(dates)
# ['2023-01-25', '2023-04-18', '2023-08-15', '2023-10-24', '2024-01-23', '2024-04-23', '2024-05-08', '2024-07-12', '2024-10-18', '2025-01-30', '2025-04-23', '2025-07-08', '2025-08-29', '2025-10-13', '2026-01-13']

Returns: List[str] of available dates in YYYY-MM-DD format

Note: The list includes the current bundled version. The same dates are also available through the barangay.available_dates attribute.

Configuration

`resolve_as_of()`

Resolve the "as of" date for data queries from multiple layers with priority.

from barangay import resolve_as_of

# Resolve with parameter
date = resolve_as_of(as_of_param="2025-08-29")
print(date)  # '2025-08-29'

# Resolve without parameter (uses module attribute or env var)
date = resolve_as_of()
print(date)  # None (if not set) or value from barangay.as_of or BARANGAY_AS_OF

Parameters:

Parameter	Type	Default	Description
`as_of_param`	`str | None`	-	Optional date string from function parameter

Returns: str | None - The resolved date string, or None for latest data

Priority order:

Function parameter (if provided)
Module attribute (barangay.as_of)
Environment variable (BARANGAY_AS_OF)
Default: None (use latest bundled data)
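
The priority order above amounts to returning the first layer that is set. The following sketch shows that idea in isolation; `resolve_as_of_sketch` is hypothetical, and passing the module attribute and environment in as arguments is a simplification made so the example is self-contained.

```python
import os
from typing import Mapping, Optional


def resolve_as_of_sketch(
    as_of_param: Optional[str] = None,
    module_as_of: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first configured date, checking layers in priority order."""
    # 1. Explicit function parameter wins.
    if as_of_param is not None:
        return as_of_param
    # 2. Then the module-level attribute (barangay.as_of in the real package).
    if module_as_of is not None:
        return module_as_of
    # 3. Then the environment; unset or empty falls through to None.
    return env.get("BARANGAY_AS_OF") or None


print(resolve_as_of_sketch(None, "2025-07-08", {"BARANGAY_AS_OF": "2024-01-23"}))  # 2025-07-08
```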

`get_verbose()`

Get verbose logging setting from environment variable.

from barangay import get_verbose

verbose = get_verbose()
print(verbose)  # True or False

Returns: bool - True if verbose logging is enabled

Environment Variable: BARANGAY_VERBOSE

Valid values (case-insensitive): "true", "1", "yes", "on"
Default: "true"
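
Parsing a flag like this usually comes down to a case-insensitive membership test. The sketch below shows one way it could be done with the documented values and default; `get_verbose_sketch` is hypothetical, and treating every other value as False is an assumption.

```python
import os
from typing import Mapping

# Accepted truthy values, as listed above.
TRUTHY = {"true", "1", "yes", "on"}


def get_verbose_sketch(env: Mapping[str, str] = os.environ) -> bool:
    """Read BARANGAY_VERBOSE, defaulting to "true" when it is unset."""
    raw = env.get("BARANGAY_VERBOSE", "true")
    return raw.strip().lower() in TRUTHY


print(get_verbose_sketch({"BARANGAY_VERBOSE": "Yes"}))  # True
print(get_verbose_sketch({"BARANGAY_VERBOSE": "off"}))  # False
```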

`get_cache_dir()`

Get the cache directory path for the application.

from barangay import get_cache_dir

cache_dir = get_cache_dir()
print(cache_dir)
# /home/user/.cache/barangay (or custom path)

Returns: Path - The cache directory path

Priority order:

Environment variable BARANGAY_CACHE_DIR (if set)
Windows: %LOCALAPPDATA%\barangay\cache
Linux/Mac with XDG_CACHE_HOME: $XDG_CACHE_HOME/barangay
Linux/Mac fallback: ~/.cache/barangay
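
The priority order above can be sketched as a chain of checks. This is an illustration, not the package's code: `cache_dir_sketch` is hypothetical, it takes the environment, platform name, and home directory as arguments so it can be exercised without touching the real system, and the fallback used when `LOCALAPPDATA` is unset on Windows is an assumption.

```python
from pathlib import Path
from typing import Mapping


def cache_dir_sketch(env: Mapping[str, str], system: str, home: Path) -> Path:
    """Resolve a cache directory following the documented priority order."""
    # 1. An explicit override always wins.
    if env.get("BARANGAY_CACHE_DIR"):
        return Path(env["BARANGAY_CACHE_DIR"])
    # 2. Windows uses the local application data directory.
    if system == "Windows":
        # Assumption: fall back to ~/AppData/Local when LOCALAPPDATA is unset.
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
        return base / "barangay" / "cache"
    # 3. Linux/Mac honor XDG_CACHE_HOME when it is set.
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "barangay"
    # 4. Otherwise use the conventional ~/.cache location.
    return home / ".cache" / "barangay"


print(cache_dir_sketch({}, "Linux", Path("/home/user")))
```

For the current machine, the arguments could be supplied as `os.environ`, `platform.system()`, and `Path.home()`.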

`load_env_config()`

Load configuration from environment variables.

from barangay import load_env_config

config = load_env_config()
print(config)
# {
#     'BARANGAY_AS_OF': '2025-07-08' (or None),
#     'BARANGAY_VERBOSE': 'true',
#     'BARANGAY_CACHE_DIR': None (or custom path)
# }

Returns: dict with keys:

Key	Type	Default	Description
`BARANGAY_AS_OF`	`str | None`	-	Target dataset date or None
`BARANGAY_VERBOSE`	`str`	`"true"`	Verbose setting string
`BARANGAY_CACHE_DIR`	`str | None`	-	Custom cache directory path or None

Module-Level Attributes

import barangay

# Current dataset date (from bundled package)
print(barangay.current)           # '2026-01-13'

# Available dataset dates (historical + current)
print(barangay.available_dates)    # List of available dates

# Set default date for session (affects search() if no as_of parameter)
barangay.as_of = "2025-07-08"

Attributes:

Attribute	Type	Default	Description
`current`	`str`	-	Current bundled dataset date (YYYY-MM-DD format). Read from `barangay/data/CURRENT_VERSION`
`as_of`	`str | None`	`None`	Default historical date to use for data queries. Can be set at runtime. None means the latest data is used
`available_dates`	`List[str]`	-	List of all available dataset dates including historical releases and current version. Populated at module import

Data Models

BarangayModel

Pydantic model for barangay data validation and serialization.

from barangay import BarangayModel

# Create a new instance
model = BarangayModel(
    barangay="Tongmageng",
    municipality_or_city="Tongmagen",
    province_or_huc="Tawi-Tawi",
    psgc_id="123456789"
)

# Access fields
print(model.barangay)              # 'Tongmageng'
print(model.municipality_or_city)  # 'Tongmagen'
print(model.province_or_huc)       # 'Tawi-Tawi'
print(model.psgc_id)               # '123456789'

# Convert to dict
model_dict = model.model_dump()
# {'barangay': 'Tongmageng', 'municipality_or_city': 'Tongmagen', ...}

# Convert to JSON
model_json = model.model_dump_json()

Fields:

Field	Type	Default	Description
`barangay`	`str`	-	Name of the barangay
`province_or_huc`	`str`	-	Province or highly urbanized city name
`municipality_or_city`	`str`	-	Municipality or city name
`psgc_id`	`str`	-	Philippine Standard Geographic Code identifier

Note: This is a simple model used for single barangay records, distinct from the hierarchical AdminDiv models.

AdminDiv

Root model for administrative division mapping. Can be either a nested dict structure or a flat list of identifiers.

from barangay import AdminDiv, barangay

# Used by the barangay module-level attribute
# Acts as a dict-like container
for region, municipalities in barangay.items():
    print(f"Region: {region}")
    for municipality, brgys in municipalities.items():
        print(f"  Municipality: {municipality}")

Type: RootModel[dict[str, AdminDiv] | List[str]]

Methods:

Method	Type	Default	Description
`keys()`	`Callable`	-	Get keys if root is a dict
`values()`	`Callable`	-	Get values if root is a dict
`items()`	`Callable`	-	Get items if root is a dict
`__contains__()`	`Callable`	-	Check membership
`__iter__()`	`Callable`	-	Iterate over the root structure

AdminDivExtended

Extended model for administrative division data with complete hierarchical structure. Each division can have nested components.

from barangay import AdminDivExtended

# Example structure
region = AdminDivExtended(
    name="National Capital Region (NCR)",
    type="region",
    psgc_id="130000000",
    parent_psgc_id="n/a",
    nicknames=["NCR", "Metro Manila"],
    components=[...]
)

# Access nested components
for component in region.components:
    print(f"{component.type}: {component.name}")

Fields:

Field	Type	Default	Description
`name`	`str`	-	Name of the administrative division
`type`	`Literal`	-	Type of division. Valid values: `"country"`, `"region"`, `"province"`, `"city"`, `"municipality"`, `"barangay"`, `"special_geographic_area"`, `"submunicipality"`
`psgc_id`	`str | Literal["n/a"]`	-	PSGC identifier
`parent_psgc_id`	`str | Literal["n/a"]`	-	Parent PSGC identifier
`nicknames`	`Optional[List[str]]`	-	List of alternative names
`components`	`List[AdminDivExtended]`	-	List of nested administrative divisions

Note: This is a recursive model where each division can contain nested child divisions, allowing for complete hierarchical representation of the Philippine administrative structure.
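
A recursive model like this is typically consumed with a recursive walk. The sketch below uses a small stand-in dataclass (`Node`, hypothetical) that mirrors only the `name`, `type`, and `components` fields, with invented names, so it runs without the package.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Stand-in mirroring name, type, and components (hypothetical)."""
    name: str
    type: str
    components: list["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[tuple[int, Node]]:
    """Yield (depth, node) for the node and all of its descendants."""
    yield depth, node
    for child in node.components:
        yield from walk(child, depth + 1)


# Invented hierarchy, for illustration only.
tree = Node("Example Region", "region", [
    Node("Example Province", "province", [
        Node("Example Barangay", "barangay"),
    ]),
])

for depth, node in walk(tree):
    print("  " * depth + f"{node.type}: {node.name}")
```

The same depth-first traversal should apply to `barangay_extended`, assuming its `components` lists nest as described above.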

AdminDivFlat

Flat model for administrative division data without nesting. Each record is self-contained with all necessary information.

from barangay import AdminDivFlat, barangay_flat

# Find barangays by type
barangays = [item for item in barangay_flat if item.type == "barangay"]

# Find by name
for item in barangay_flat:
    if item.name == "Marayos":
        print(f"Found: {item.name}")
        print(f"Type: {item.type}")
        print(f"PSGC ID: {item.psgc_id}")
        print(f"Parent: {item.parent_psgc_id}")
        break

Fields:

Field	Type	Default	Description
`name`	`str`	-	Name of the administrative division
`type`	`Literal`	-	Type of division (same as AdminDivExtended)
`psgc_id`	`str | Literal["n/a"]`	-	PSGC identifier
`parent_psgc_id`	`str | Literal["n/a"]`	-	Parent PSGC identifier
`nicknames`	`Optional[List[str]]`	-	List of alternative names

Note: Unlike AdminDivExtended, this model has no nested components, making it ideal for flat list representations and simple lookups.
