// INTERACTIVE FILTERS

// SYSTEM_BLUEPRINT_MAP

Map legend: PRT Stop (marker size = volume); POGOH Station

📊 KEY INSIGHTS

System Overview: PRT serves ~834K daily riders while POGOH contributes ~3.4K daily trips (Oct 2025 peak).

Campus Dominance: 68.1% of all bike trips originate or terminate in the CMU/Pitt corridor, showing university-centric adoption.

Integration Gap: Only 11.5% of bus stops have meaningful bike activity within 400m (straight-line Haversine distance).

🎯 Filter Active: System-Wide View

FIG 1A. POGOH DAILY RIDERSHIP (365 Days)

FIG 1B. PRT BUS RIDERSHIP (Monthly Trend)

📈 Temporal Analysis

Scale Difference: PRT serves ~834K daily riders while POGOH contributes ~3.4K daily trips (247x difference), showing that buses are the primary transit mode.

Seasonal Volatility: POGOH shows extreme seasonal fluctuation (13x from winter low to fall peak), while PRT remains stable (~611K-835K range).

Academic Calendar Impact: Sep/Oct surge (+86% from August) coincides with fall semester, confirming student-driven demand.

Winter Resilience: PRT actually grows in winter months while POGOH drops 63%, suggesting bikeshare does not serve as a winter alternative.

FIG 2. TRIP ARCHETYPES (K-Means)

FIG 3. DIRECTIONAL FLOW

FIG 4. SEASONAL HOURLY PATTERNS

FIG 5. TRIP DURATION DISTRIBUTION

🚲 Usage Patterns Analysis

Archetypes: Commuters (47.9%) dominate with short, consistent trips, followed by Last-Mile users (32.8%). Leisure rides are minimal (3.6%).

Flow: Strong North-South axis alignment suggests heavy movement between Oakland/Shadyside and East Liberty corridors.

Seasonality: Fall semester (Sep-Nov) drives peak hourly usage. Winter usage retains only ~37% of peak volume, indicating weather sensitivity.

Durations: The median trip is ~7 min. The sharp decay after 15 min suggests POGOH is used primarily for efficient A-to-B transit, not recreation.

FIG 6. TOP 10 STATIONS (Member vs Casual)


FIG 7. BUS-BIKE CORRELATION

👥 User & Network Analysis

Membership Economy: ~70% of trips at top stations are from Members (Students/Locals). Casual use is concentrated in Downtown/Strip District.

Network Effect: A moderate positive correlation (R²=0.37) indicates that high-volume bus stops tend to see more bikeshare usage, but proximity gaps remain.

FIG 8. TOP 10 PRT STOPS NEAR POGOH (Integration Potential)

🔗 Multimodal Opportunities

The "Last-Mile" Leaders: These 10 PRT stops are the most critical multimodal nodes. They have high bus volume AND are within 400m of a POGOH station.

Gap Analysis: Forbes & Morewood (Carnegie Mellon) sees massive bus traffic, yet its bike uptake is lower than at Liberty & Gateway. This suggests that infrastructure quality varies.

FIG 9. TOP NEIGHBORHOODS BY BUS-BIKE INTEGRATION

FIG 10. INTEGRATION INDEX DISTRIBUTION (Top 15 Stops)

🏆 The Integration Leaderboard

What This Measures: The Integration Index rewards bus stops that have high ridership AND high nearby bike turnover.

Winner: Central Business District leads, likely due to density. Strip District and North Shore follow, pointing to the value of flat terrain and bike lanes.

Top Stop: 7th St @ Penn Ave is the "Golden Node" of the network, suggesting that pairing bus lanes with protected bike lanes maximizes multimodal flow.

FIG 11. BEHAVIORAL HOTSPOTS (Top 3 Stations per Archetype)

🚴 COMMUTER HUBS

🔗 LAST-MILE CONNECTORS

🛒 ERRAND CENTERS

🎨 LEISURE DESTINATIONS

🎭 Station Personalities

Commuter Hubs: Schenley Dr (64.8%) and Forbes @ CMU (61.2%) are commuter-dominated stations, where peak-hour A-to-B trips make up most of the volume.

Last-Mile Connectors: Boulevard of the Allies (46.9%) shows high last-mile percentage, serving as a critical transit feeder.

Errand Centers: Wilkinsburg Park & Ride (68.4%!) is overwhelmingly errand-focused, suggesting suburban shopping/service trip patterns.

Leisure Destinations: South Side Trail (19.0%) captures recreational riders—longer durations, lower displacement (circular routes).

Policy Insight: Stations have behavioral "DNA". Tailor rebalancing schedules and pricing accordingly: peak-hour bike density at commuter hubs, leisure pricing at recreational nodes.

🎯 POLICY RECOMMENDATIONS

1. Fill the Homewood & Squirrel Hill Gaps

Issue: Two high-traffic corridors lack adequate POGOH coverage:
• Homewood: 4 high-volume bus stops (>800 boardings/day) have NO bike stations within 800m walking distance
• Squirrel Hill: Despite significant bus traffic along Forbes Ave and Murray Ave, POGOH stations are sparse, forcing residents to walk 600-1000m to access bikes
Action: Deploy 2 new stations in Homewood (Homewood-Brushton Busway, N. Homewood Ave) and 2 in Squirrel Hill (Forbes @ Murray, Murray @ Forward) to serve 18K+ daily bus riders.

EQUITY FIRST/LAST MILE

2. Winter Gear & Fleet Resilience

Issue: The 63% ridership drop in January reflects weather deterrence on top of the academic break. Riders get cold, and e-bike batteries drain faster in low temperatures.
Action: Distribute subsidized POGOH-branded winter riding kits (gloves/buffs) to students, and prioritize battery heating/swapping to keep the e-fleet reliable below 30°F.

SEASONAL RETENTION

3. Downtown Triangle Densification

Issue: 7th & Penn has the highest integration score (3,732), but the surrounding blocks are underserved.
Action: Add 3 micro-hubs (10 docks each) within 200m of Point State Park to capture tourist + commuter demand.

HIGH-ROI TOURISM

METHODOLOGY LOG

Documentation of the exploratory data analysis (EDA) process, which combines raw POGOH and PRT data (XLSX and CSV) into features and insights for the web dashboard. All processed data are stored in ./processed_data/.

📁 CSV Outputs (./processed_data/):

• archetypes.csv (4 rows)
• bike_stations_geo.csv (60 rows)
• bus_stops_geo.csv (100 rows)
• correlation.csv (2,720 rows)
• daily_timeseries.csv (365 rows)
• demographics.csv (10 rows)
• directionality.csv (8 rows)
• duration_distribution.csv (30 rows)
• heatmap_hour_day.csv (24 rows)
• heatmap_hour_season.csv (24 rows)
• monthly_trends.csv (12 rows)
• prt_historical.csv (8 rows)
• top_prt_pogoh.csv (10 rows)

📊 View Static EDA Report (Seaborn + Bokeh)

Jupyter-style notebook with publication-quality charts

1. Research Question & Data Pipeline

Core Question: How can we optimize micro-mobility integration with public transit in a student-dominated urban environment?

Pittsburgh's bikeshare system operates in a unique context: 68% of trips start or end in the Campus Corridor (CMU/Pitt bounding box). This creates extreme seasonal volatility, with ridership dropping 63% during academic breaks. Traditional transit planning assumes stable demand. This analysis instead points to the need for dynamic fleet scaling tied to the academic calendar.

Data Sources:

POGOH Bikeshare: 556,437 trips (2024 full year)
PRT Bus Stops: 1,223 stops with coordinates and annual boardings
Schema: Start/End timestamps, Station names, Duration, Rider type (Member/Casual), Geolocation

Data Quality Controls:

Removed trips >180 min (outliers/theft)
Geocoded station coordinates via fuzzy matching
Haversine distance calculation for spatial joins (400m threshold)
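The Haversine spatial join above can be sketched as follows. It assumes that stop and station coordinates are in decimal degrees and that each station has a trip count. `haversine_m` and `bike_trips_within` are hypothetical helper names, not functions taken from etl.py. The second helper shows one way a per-stop "bike trips within 400m" figure could be derived.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in decimal degrees.

    Accepts scalars or NumPy arrays (standard broadcasting applies).
    """
    lat1, lon1, lat2, lon2 = (np.radians(x) for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def bike_trips_within(stop_lat, stop_lon, stn_lat, stn_lon, stn_trips, radius_m=400):
    """For each bus stop, sum the trip counts of all bike stations within radius_m."""
    # (n_stops, n_stations) distance matrix built by broadcasting
    dist = haversine_m(np.asarray(stop_lat, dtype=float)[:, None],
                       np.asarray(stop_lon, dtype=float)[:, None],
                       np.asarray(stn_lat, dtype=float)[None, :],
                       np.asarray(stn_lon, dtype=float)[None, :])
    within = (dist <= radius_m).astype(int)   # 1 where the station is inside the radius
    return within @ np.asarray(stn_trips)     # per-stop sum of nearby station trips


# Usage: one stop, one station at the same point, one station ~1.1 km north
nearby = bike_trips_within([40.4435], [-79.9435],
                           [40.4435, 40.4535], [-79.9435, -79.9435],
                           [100, 50])
print(nearby)
```

A brute-force distance matrix is cheap at the scale of the exported files (100 stops × 60 stations). For much larger inputs, scikit-learn's `BallTree` with `metric='haversine'` is a common alternative.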

# Data Cleaning Pipeline (etl.py lines 85-110)
import pandas as pd

# Load raw data (reading .xlsx requires the openpyxl engine)
pogoh = pd.read_excel('dataset/POGOH_2024.xlsx')

# Parse timestamps
pogoh['Start Date'] = pd.to_datetime(pogoh['Start Date'])
pogoh['End Date'] = pd.to_datetime(pogoh['End Date'])

# Calculate duration in seconds
pogoh['Duration'] = (pogoh['End Date'] - pogoh['Start Date']).dt.total_seconds()

# Remove outliers (>180 min = theft/data error); .copy() avoids SettingWithCopyWarning below
trips_clean = pogoh[pogoh['Duration'] <= 10800].copy()  # 180 min

# Extract temporal features
trips_clean['hour'] = trips_clean['Start Date'].dt.hour
trips_clean['day_of_week'] = trips_clean['Start Date'].dt.day_name()
trips_clean['month'] = trips_clean['Start Date'].dt.month

✓ Data Loaded: 556,437 trips (2024 full year)
✓ Key Insight: Unlike typical bikeshare systems that serve commuters year-round, Pittsburgh's system functions as a "Campus Mobility Extension" requiring different operational strategies than traditional urban bikeshare.

Exported to: ./processed_data/daily_timeseries.csv

2. Campus Geofencing & Temporal Segmentation

Methodology: Trips flagged as "Campus Corridor" if start OR end coordinates fall within the CMU/Pitt bounding box. This spatial segmentation enables analysis of the "Student Effect" on ridership patterns.

Bounding Box:

Latitude: 40.435°N to 40.450°N
Longitude: 79.970°W to 79.940°W (-79.970 to -79.940 in signed decimal degrees)

This captures CMU, University of Pittsburgh, and Shadyside neighborhoods.

# Campus Flag Logic (etl.py lines 156-165)
CAMPUS_LAT_MIN, CAMPUS_LAT_MAX = 40.435, 40.450
CAMPUS_LON_MIN, CAMPUS_LON_MAX = -79.970, -79.940

trips_clean['is_campus'] = (
  ((trips_clean['Start Lat'] >= CAMPUS_LAT_MIN) &
   (trips_clean['Start Lat'] <= CAMPUS_LAT_MAX) &
   (trips_clean['Start Lon'] >= CAMPUS_LON_MIN) &
   (trips_clean['Start Lon'] <= CAMPUS_LON_MAX)) |
  ((trips_clean['End Lat'] >= CAMPUS_LAT_MIN) &
   (trips_clean['End Lat'] <= CAMPUS_LAT_MAX) &
   (trips_clean['End Lon'] >= CAMPUS_LON_MIN) &
   (trips_clean['End Lon'] <= CAMPUS_LON_MAX))
)

# Result: 68.1% of all trips touch the Campus Corridor
campus_pct = trips_clean['is_campus'].sum() / len(trips_clean) * 100
print(f"Campus trips: {campus_pct:.1f}%") # Output: 68.1%

✓ Campus Segmentation Results:
• Campus Corridor: 379,039 trips (68.1% of system volume)
• Winter Drop: Campus ridership drops 63% in Jan vs Sep (academic break)
• City Stability: Non-campus trips remain stable year-round
• Policy Implication: Fleet sizing must be dynamically adjusted based on academic calendar. Operating a full fleet during winter break wastes capital on underutilized bikes.

Exported to: ./processed_data/daily_timeseries.csv

3. Unsupervised Learning: Trip Archetypes

We apply K-Means clustering to trip duration, displacement, and start hour to categorize rider behavior without labeled training data. This reveals latent behavioral segments for targeted policy interventions.

Algorithm Rationale:

K-Means (k=4): Chosen for interpretability. Silhouette analysis validated 4 as the optimal cluster count (score: 0.68).
Feature Scaling: StandardScaler puts duration (seconds), displacement (meters), and hour (0-23) on comparable scales so that no single feature dominates the distance metric.
Feature Selection: Duration + Displacement capture trip purpose better than speed alone. Hour captures temporal behavior (commute vs leisure).
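The rationale above cites silhouette analysis as the basis for choosing k=4. The selection loop can be sketched as below. It runs on synthetic blobs rather than the POGOH features, so it does not reproduce the 0.68 score. `pick_k` is a hypothetical helper. Computing the silhouette on a random sample is an assumption added here, because the exact silhouette is quadratic in the number of rows, and that is costly for ~556K trips.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def pick_k(X, k_values=range(2, 8), sample_size=10_000, seed=42):
    """Return (best_k, scores) using the mean silhouette coefficient on scaled features."""
    X_scaled = StandardScaler().fit_transform(X)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X_scaled)
        # Silhouette is O(n^2); evaluate it on a random sample of rows
        n = min(sample_size, len(X_scaled))
        scores[k] = silhouette_score(X_scaled, labels, sample_size=n, random_state=seed)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for (duration, displacement, hour): 4 well-separated groups
X, _ = make_blobs(n_samples=2000, centers=4, n_features=3, cluster_std=0.5, random_state=0)
best_k, scores = pick_k(X)
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

In practice the silhouette curve is usually read alongside the elbow (inertia) curve, as the code comment in the K-Means step below mentions.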

# 3. K-MEANS CLUSTERING (etl.py lines 196-223)
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Prepare features (Duration, Displacement, Hour); drop incomplete rows first
# so the cluster labels line up one-to-one with the rows of `trips`
trips = trips.dropna(subset=['Duration', 'displacement', 'hour']).copy()
features = trips[['Duration', 'displacement', 'hour']]
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Fit K-Means (k=4 determined via elbow method + silhouette)
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
trips['archetype'] = kmeans.fit_predict(features_scaled)

# Label clusters based on centroid characteristics
cluster_stats = trips.groupby('archetype').agg({
  'Duration': 'mean',
  'displacement': 'mean',
  'hour': 'mean'
})

# Assign semantic labels (manual inspection of centroids)
archetype_map = {
  0: 'Commuter', # Short duration, peak hours (17.7h avg)
  1: 'Errand', # Medium duration, mid-day (14.8h)
  2: 'Last-Mile', # Very short, morning (9.3h)
  3: 'Leisure' # Long duration (73.2 min avg!)
}
trips['archetype_label'] = trips['archetype'].map(archetype_map)

✓ Identified 4 Behavioral Archetypes:
1. Commuter (47.9%, 266,603 trips): avg 7.7 min, 836m displacement, mean start time 5:47 PM (evening commute)
2. Last-Mile (32.8%, 182,663 trips): avg 7.0 min, 910m displacement, mean start time 9:18 AM (connects to morning transit)
3. Errand (15.7%, 87,128 trips): avg 20.1 min, 3,359m displacement, mean start time 2:48 PM (mid-day shopping/errands)
4. Leisure (3.6%, 20,043 trips): avg 73.2 min, only 737m displacement (!), mean start time 2:24 PM (weekend exploration, circular routes)

Unexpected Finding: Leisure trips (3.6%) have an average duration of 73.2 minutes with displacement of only 737m. This suggests recreational "circular routes" along the riverfront trail system—users exploring rather than commuting. These trips require different bike availability (longer rental periods, trail-adjacent stations).

Interpretation Note: "Last-Mile" trips (32.8%) have an average start time of 9:18 AM and a 7-minute duration. They are likely not standalone trips but bikeshare-to-bus connections. Cross-referencing with PRT data shows high overlap with major bus hubs (Boulevard of the Allies, S Millvale Ave).

Exported to: ./processed_data/archetypes.csv

3B. Station Behavioral Profiling

After identifying trip archetypes, we reverse the analysis: which stations generate which behaviors? This reveals "station personalities" critical for targeted operational decisions.

Methodology:

Percentage Calculation: For each station, calculate % of trips matching each archetype (Commuter/Last-Mile/Errand/Leisure)
Minimum Volume: Only stations with 50+ total trips are included (prevents noise from low-volume stations)
Top 3 Selection: Identify top 3 stations with highest percentage for each archetype

# 3B. STATION ARCHETYPE PROFILING (etl.py lines 580-617)
import pandas as pd

trips_clean['archetype_label'] = trips_clean['archetype'].map(archetype_map)

# Share (%) of each archetype per start station: rows = stations, columns = archetypes
station_pct = pd.crosstab(
  trips_clean['Start Station Name'],
  trips_clean['archetype_label'],
  normalize='index'
) * 100

# Filter: only stations with 50+ trips
station_totals = trips_clean.groupby('Start Station Name').size()
station_pct = station_pct[station_totals.reindex(station_pct.index) >= 50]

# Top 3 stations by percentage for each archetype
for archetype in ['Commuter', 'Last-Mile', 'Errand', 'Leisure']:
  top_3 = station_pct[archetype].nlargest(3)
  print(f"{archetype}:\n{top_3}")

✓ Station Behavioral Profiles Identified:

Commuter Hotspots:
• Schenley Dr & Schenley Dr Ext: 64.8% (12,721 of 19,637 trips) — Pure commuter station at CMU campus edge
• Forbes Ave @ TCS Hall (CMU): 61.2% (10,168 of 16,607 trips) — Academic commuter hub

Last-Mile Leaders:
• Boulevard of the Allies & Parkview Ave: 46.9% (13,859 of 29,538 trips) — Critical transit feeder
• S Millvale Ave & Centre Ave: 43.4% (4,045 of 9,310 trips) — East End connector

Errand Centers:
• Wilkinsburg Park & Ride: 68.4%! (444 of 649 trips) — Suburban shopping/service trips dominate
• Second Ave & Tecumseh St: 61.4% (181 of 295 trips) — South Side errand node

Leisure Destinations:
• South Side Trail & S 4th St: 19.0% (712 of 3,739 trips) — Recreational waterfront
• Liberty Ave & Stanwix St: 18.9% (1,218 of 6,438 trips) — Downtown leisure hub

Operational Insight: Schenley Dr (64.8% commuter) vs South Side Trail (19.0% leisure) require completely different operational strategies. Schenley needs predictable 8 AM bike availability for class commutes; South Side needs afternoon/weekend capacity for exploratory rides. One-size-fits-all rebalancing fails both station types.

Exported to: ./processed_data/station_archetypes.csv

4. Temporal Dynamics: Academic Calendar Dependency

The most striking feature of Pittsburgh's bikeshare is its dependency on the academic calendar. Daily ridership ranges from 412 trips (winter break nadir) to 3,800+ trips (fall semester peak), a roughly 9× spread.

Why Daily Granularity Matters:

365-Day Timeseries: Monthly averages hide spikes (orientation week, finals week). Daily data preserves these anomalies.
Peak Day: September 26, 2024 (3,800+ trips — Fall semester peak + ideal weather)
Trough Day: January 4, 2024 (412 trips — winter break nadir)
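The peak/trough comparison above can be reproduced from raw trip start timestamps. The sketch below assumes a pandas Series of `Start Date` values. `daily_extremes` is a hypothetical helper. It reindexes to the full calendar so that a day with zero trips is not silently skipped when looking for the trough. With 3,800 and 412 trips, the ratio is about 9.2, in line with the ~9× figure.

```python
import pandas as pd


def daily_extremes(trip_starts: pd.Series):
    """Return (peak_day, peak_count, trough_day, trough_count, ratio) from start timestamps."""
    counts = trip_starts.dt.floor('D').value_counts().sort_index()
    # Include every calendar day in range so zero-trip days count toward the trough
    full_range = pd.date_range(counts.index.min(), counts.index.max(), freq='D')
    counts = counts.reindex(full_range, fill_value=0)
    peak_day, trough_day = counts.idxmax(), counts.idxmin()
    ratio = counts.max() / counts.min() if counts.min() > 0 else float('inf')
    return peak_day.date(), int(counts.max()), trough_day.date(), int(counts.min()), ratio


# Usage with a toy series: 1 trip on Jan 4, 9 trips on Jan 5
starts = pd.Series(pd.to_datetime(['2024-01-04 08:00'] + ['2024-01-05 09:00'] * 9))
print(daily_extremes(starts))
```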

# 4. CAMPUS vs CITY DAILY TIMESERIES (etl.py lines 401-413)
import pandas as pd

# Day key as a 'YYYY-MM-DD' string
trips['date_str'] = trips['Start Date'].dt.strftime('%Y-%m-%d')

# Every calendar day in the data range, so days with zero trips are kept
all_dates = pd.date_range(
  trips['Start Date'].min().normalize(),
  trips['Start Date'].max().normalize(),
  freq='D'
).strftime('%Y-%m-%d')

def daily_counts(df):
  # Trips per day, reindexed to the full calendar (missing days -> 0)
  return df.groupby('date_str').size().reindex(all_dates, fill_value=0)

daily_pogoh = daily_counts(trips)
daily_campus = daily_counts(trips[trips['is_campus']])  # is_campus == True
daily_city = daily_counts(trips[~trips['is_campus']])   # is_campus == False

# Export parallel arrays, one element per day
daily_timeseries = {
  'dates': list(all_dates),
  'pogoh_trips': daily_pogoh.astype(int).tolist(),
  'pogoh_campus_trips': daily_campus.astype(int).tolist(),
  'pogoh_city_trips': daily_city.astype(int).tolist()
}

✓ Temporal Segmentation Results:
• Campus Corridor: 379,039 trips (68.1% of system volume)
• Winter Drop: Campus ridership drops 63% in Jan vs Sep (academic break)
• City Stability: Non-campus trips remain stable year-round, indicating resident commuter dependence
• Peak Day: September 26, 2025 (3,800+ trips — Fall semester peak + ideal weather)
• Trough Day: January 4, 2025 (412 trips — winter break nadir)

Policy Implication: Campus fleet should be dynamically scaled with academic calendar. City corridors need year-round minimum service.

Exported to: ./processed_data/daily_timeseries.csv (365 rows × 4 columns)

5. Season × Hour Matrix

A heatmap that shows the time windows of highest demand, which drive our rebalancing schedules.

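# 5. PIVOT TABLE ANALYSIS (assumes a 'season' column derived from the Start Date month)
import seaborn as sns

# Rows = season, columns = hour of day (0-23), cells = trip counts
matrix = trips.groupby(['season', 'hour']).size().unstack(fill_value=0)
sns.heatmap(matrix, cmap='Blues')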

Peak Load: Fall Semester, 5:00 PM (Commute + Class End).
Policy Insight: The "Winter Gap" appears as a uniform drop in intensity across all hours, not just at peaks.
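The season × hour snippet above relies on a `season` column that the log does not show being created. The self-contained sketch below builds the matrix from raw timestamps. The month-to-season mapping is an assumption (meteorological seasons, with Sep-Nov = Fall to match the fall-semester window used elsewhere). `season_hour_matrix` is a hypothetical helper.

```python
import pandas as pd

# Assumed month -> season mapping (Sep-Nov = Fall, matching the fall-semester window)
SEASON_BY_MONTH = {12: 'Winter', 1: 'Winter', 2: 'Winter',
                   3: 'Spring', 4: 'Spring', 5: 'Spring',
                   6: 'Summer', 7: 'Summer', 8: 'Summer',
                   9: 'Fall', 10: 'Fall', 11: 'Fall'}


def season_hour_matrix(starts: pd.Series) -> pd.DataFrame:
    """Rows = season, columns = hour 0-23, cells = trip counts (empty cells are 0)."""
    df = pd.DataFrame({
        'season': starts.dt.month.map(SEASON_BY_MONTH),
        'hour': starts.dt.hour,
    })
    matrix = df.groupby(['season', 'hour']).size().unstack(fill_value=0)
    # Guarantee a fixed season order and all 24 hour columns
    order = ['Winter', 'Spring', 'Summer', 'Fall']
    return matrix.reindex(index=order, columns=range(24), fill_value=0)


starts = pd.Series(pd.to_datetime(['2024-09-26 17:05', '2024-09-26 17:40', '2024-01-04 08:15']))
m = season_hour_matrix(starts)
print(m.loc['Fall', 17], m.loc['Winter', 8])
```

The busiest cell can then be located with `m.max().idxmax()` (hour) followed by `m[hour].idxmax()` (season). Reindexing to all 24 hours keeps the heatmap axes identical across seasons, even when some hours have no trips.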

6. Statistical Correlation: Bus vs Bike

Do busy bus stops actually generate bike trips? We test this hypothesis with linear regression.

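# 6. LINEAR REGRESSION (bus_vol, integration_score: equal-length per-stop arrays)
from scipy.stats import linregress

result = linregress(bus_vol, integration_score)  # slope, intercept, rvalue, pvalue, stderr
print(f"R-squared: {result.rvalue**2:.3f}, p-value: {result.pvalue:.3g}")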

Result: R² = 0.372 (p < 0.001)
This is a moderate positive correlation. Bus volume is a predictor, but not the only factor. Bike infrastructure (lanes) and topography likely play large roles that this model does not measure.
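`linregress` fits a single-predictor model with an intercept, so R² is simply the square of Pearson's r. The reported fit therefore corresponds to a correlation of about 0.61, taking the positive root because the text reports a positive correlation. Roughly 63% of the variance in the integration score is left unexplained by bus volume, which is consistent with the "not the only factor" reading:

```latex
R^2 = r^2 \quad\Longrightarrow\quad r = +\sqrt{0.372} \approx 0.61,
\qquad 1 - R^2 = 1 - 0.372 = 0.628
```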

Strategic Integration: Top 10 Multimodal Nodes

This analysis identifies high-impact integration opportunities. We filter for the top 10 bus stops (by volume) that are within 400m of a bikeshare station.

# 10. FINDING "LAST MILE LEADERS"
bus_stops_near_pogoh = bus_stops[bus_stops['bike_trips_nearby'] > 0]
top_10 = bus_stops_near_pogoh.nlargest(10, 'bus_boardings')
print(top_10[['stop_name', 'bus_boardings', 'bike_trips_nearby']])

Top 10 PRT Stops Near POGOH Stations:

Key Finding: The #1 node shows strong multimodal use. Gaps exist where high bus volume does not translate into bike usage (an infrastructure opportunity).

// ORIGIN: JAKARTA (CGK) → DESTINATION: PITTSBURGH (PIT)

Relocating from Jakarta to Pittsburgh fundamentally shifted my mobility baseline from 'Park & Ride' to 'Bike & Bus'. In Jakarta, my daily commute often involved a car trip to the MRT followed by 45+ minutes of friction on the Tol Desari.

Arriving here introduced a new variable: Micro-mobility in a Winter Climate. Thanks to integrated transit access (via my CMU ID), POGOH became my critical First/Last-mile connector. The challenge shifted from traffic volume to thermal endurance. But the result—avoiding the despair of gridlock—is transformative. This project explores that efficiency.

// DATA_SOURCES

📄 POGOH Trip Data (WPRDC)
🚌 PRT Transit Data (ArcGIS Hub)

NODE A: POGOH in Bridge City

NODE B: PRT in AAA East Liberty

ORIGIN: MRT JAKARTA

FRICTION: TOL DESARI JAM

// RIZALDY AL KAUTSAR UTOMO

rutomo@andrew.cmu.edu

(Public Policy, Analytics, AI Management @ CMU)

I am on a mission to architect responsible AI solutions for developing markets, ensuring that technology helps people thrive rather than just survive.

My professional journey began at the intersection of policy and analytics in Indonesia’s telecommunications sector. During seven years at Telkomsel, I saw how mobility data can drive critical decisions, from deploying credit-scoring models that expanded financial inclusion to designing campaign algorithms that lifted revenue.

Curiosity carried me from Bandung to Pittsburgh. A full-ride LPDP scholarship made it possible to study at Carnegie Mellon, where I bring an underrepresented international perspective to AI policy. Recognized with the Ganesha Karsa award (ITB).

Tinkering is in my DNA. EcoFlow grew from GIS into a crowd-intelligence tourism system (incubated by Singtel). Bukugambar.ai began as weekend playtime with my son and became an AI sketchbook that turns text or photos into coloring-page outlines.

Seeking Summer 2026 Internship in Analytics or AI Solutions in the US.