Data Methodology & Quality
This page explains how the datasets sold on HistoricalData.net are collected, processed, adjusted and validated, and what their limitations are. If a question is not answered here, check the FAQ or email [email protected].
1. Data Sources and Market Coverage
Stock data is built from trade data consolidated across 20+ U.S. exchanges and trading venues, including XNYS (New York Stock Exchange), XASE (NYSE American), XBOS (Nasdaq OMX BX) and XADF (FINRA Alternative Display Facility), sourced through an institutional-grade market data provider. Options data is sourced from OPRA (Options Price Reporting Authority) feeds covering all U.S. options exchanges.
Coverage includes more than 15,000 active and 50,000 delisted U.S. tickers: common stocks, ETFs, ADRs, preferred shares, warrants, units and funds. Index options (SPX, NDX, RUT, VIX, XSP and other CBOE index products) are included in the options dataset together with their index level history.
2. Time Range and Update Schedule
| Dataset | History starts | Update cadence |
|---|---|---|
| Stock daily & 1-minute bars | November 2003 | Weekly, processed over the weekend; the refreshed bundle is available before Monday's market open |
| Options EOD | May 2002 | Every trading day, in the evening (U.S. Eastern); subscription files are kept on the server for the most recent ~50 trading days |
All timestamps in all files are U.S. Eastern Time (EST/EDT according to the date). A new dataset version is published only after the complete update run has finished — a partially updated bundle is never sold (see Validation).
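The timestamps carry no UTC offset, so joining them with UTC-stamped data needs an explicit zone. The sketch below uses pandas and assumes the `Time` column has already been parsed as naive datetimes, as `parse_dates` does. `America/New_York` is the IANA zone for U.S. Eastern. The helper name `eastern_to_utc` is hypothetical.

```python
import pandas as pd

def eastern_to_utc(times: pd.Series) -> pd.Series:
    """Attach U.S. Eastern to naive timestamps (EST or EDT chosen per date), then convert to UTC."""
    return times.dt.tz_localize("America/New_York").dt.tz_convert("UTC")

# The same 09:30 open on a winter (EST) date and a summer (EDT) date
opens = pd.Series(pd.to_datetime(["2023-01-03 09:30:00", "2023-07-03 09:30:00"]))
print(eastern_to_utc(opens).tolist())
```

In this example the winter open maps to 14:30 UTC and the summer open to 13:30 UTC. Bar timestamps fall between 04:00 and 20:00, so they never land in the early-morning hour where a DST switch creates ambiguous or nonexistent local times. This is why the sketch does not pass `ambiguous=` or `nonexistent=`.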
3. File Structure and Field Definitions
All files are plain CSV (RFC 4180, comma-separated, \r\n line endings, header row first). Stock files are named {TYPE}_{TICKER}_{day|minute_regular|minute_extended}.csv, where TYPE is the security type (CS = common stock, ETF, ADRC, PFD, WARRANT, …). Delisted tickers are named delisted_{TYPE}_{TICKER}_{delisting-date}_….csv so the delisting date is visible in the filename itself.
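When scanning a download directory, you may want to split active-ticker filenames back into their parts. The sketch below uses Python's `re` module and assumes TYPE consists of uppercase letters only. It skips delisted files, because the components after the delisting date are not fully specified on this page. `parse_stock_filename` and `StockFile` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# {TYPE}_{TICKER}_{day|minute_regular|minute_extended}.csv, TYPE assumed to be uppercase letters
STOCK_FILE_RE = re.compile(
    r"^(?P<sec_type>[A-Z]+)_(?P<ticker>.+)_(?P<kind>day|minute_regular|minute_extended)\.csv$"
)

class StockFile(NamedTuple):
    sec_type: str  # e.g. CS, ETF, PFD
    ticker: str
    kind: str      # day, minute_regular or minute_extended

def parse_stock_filename(name: str) -> Optional[StockFile]:
    """Split an active-ticker filename into its parts; return None if it does not match."""
    if name.startswith("delisted_"):
        return None  # delisted names carry extra components that are not handled here
    m = STOCK_FILE_RE.match(name)
    return StockFile(m["sec_type"], m["ticker"], m["kind"]) if m else None

print(parse_stock_filename("CS_AAPL_minute_regular.csv"))
```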
Stock files (16 columns):
| Column | Meaning |
|---|---|
| Time | YYYY-MM-DD (daily) or YYYY-MM-DD HH:MM:SS (minute), U.S. Eastern |
| Open, Close, High, Low | Unadjusted prices |
| Volume | Unadjusted share volume |
| Average | Volume-weighted average price (VWAP) |
| Transactions | Number of trades in the bar |
| AdjOpen … AdjAverage | The same six fields, adjusted: prices and Average for splits and dividends, Volume for splits only |
| Dividend | Cash dividend per share, present only on the ex-dividend date |
| Split | Split ratio (e.g. 1:4), present only on the split execution date |
Options files (18 columns): contract (OCC symbol), underlying, expiration, type (call/put), strike, style (A = American, E = European/index), bid, bid_size, ask, ask_size, volume, open_interest, quote_date, delta, gamma, theta, vega, implied_volatility. Each trading day also has a companion underlying-price file: symbol, open, high, low, close, volume, adjust_close.
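The `contract` field follows the OCC symbology. Each symbol has four parts: an underlying root, a six-digit expiration (YYMMDD), C or P, and the strike times 1,000 written as eight digits. The sketch below assumes the column holds the plain OCC string with no vendor prefix. It reads the fixed-width fields from the right, so it accepts both the space-padded 21-character form and the unpadded form. It also assumes 20YY years, which covers this dataset's history. `parse_occ_symbol` and `OccContract` are hypothetical names.

```python
from datetime import date
from typing import NamedTuple

class OccContract(NamedTuple):
    root: str
    expiration: date
    option_type: str  # "call" or "put"
    strike: float

def parse_occ_symbol(symbol: str) -> OccContract:
    """Parse root + YYMMDD + C/P + 8-digit strike (x1000), reading fixed fields from the right."""
    s = symbol.strip()
    if len(s) < 16:
        raise ValueError(f"too short for an OCC symbol: {symbol!r}")
    root = s[:-15].strip()  # padded roots carry trailing spaces
    yymmdd, cp, strike_digits = s[-15:-9], s[-9], s[-8:]
    if cp not in ("C", "P") or not (yymmdd.isdigit() and strike_digits.isdigit()):
        raise ValueError(f"not an OCC symbol: {symbol!r}")
    expiration = date(2000 + int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:]))
    return OccContract(root, expiration, "call" if cp == "C" else "put", int(strike_digits) / 1000)

print(parse_occ_symbol("AAPL  230317C00150000"))
```

You can cross-check the parsed values against the expiration, type and strike columns of the same row.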
Full per-column documentation with sample rows is on the stocks page and the options page.
4. Corporate Actions: Splits, Dividends, Renames, Delistings
- Splits and cash dividends are tracked per ticker from reference data. The event itself is recorded in-line: the Dividend column carries the cash amount on the ex-dividend date, the Split column carries the ratio on the execution date.
- When a new corporate action occurs, the entire price history of that ticker is recomputed and atomically replaced — adjusted columns are always consistent across the whole file, never patched incrementally.
- Delisted companies keep their full history, frozen as of the delisting date. This makes the dataset suitable for survivorship-bias-free backtesting.
- Ticker reuse is handled explicitly. Symbols are often recycled, and some have belonged to several different issuers over the years. Each company's history is kept as a separate file, bounded by its own listing period, so price series from different companies are never spliced together.
- Renamed tickers (e.g. FB → META) end the old symbol's file at the rename date and continue under the new symbol.
5. Adjustment Versions
Every row carries both unadjusted and fully adjusted values side by side, so there are no separate "adjusted" and "unadjusted" downloads to buy or reconcile. A row's adjusted prices are its unadjusted prices multiplied by the cumulative product of the factors of all splits and dividends dated after that row. The dividend factor is (C − D) / C, where D is the cash dividend and C is the unadjusted close on the trading day before the ex-dividend date. The split factor is from/to. Adjusted volume uses split factors only, as is standard: earlier volume is divided by the cumulative split factor, so a 4-for-1 split multiplies all prior volume by 4, and dividends leave volume unchanged. If you need a different adjustment convention (e.g. split-only), the in-line Dividend and Split event columns let you rebuild it from the unadjusted columns.
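You can apply the rule above to the unadjusted Close, Dividend and Split columns to rebuild either convention. The pandas sketch below makes three assumptions:

- rows are sorted by Time ascending,
- blank Dividend and Split cells load as NaN,
- the Split text has the `from:to` form of the `1:4` example.

`adjustment_factors` and `split_factor` are hypothetical helpers, and the demo rows are made-up values. The rebuilt columns will match the file's adjusted columns only as far as the conventions coincide, so compare them before relying on the result.

```python
import numpy as np
import pandas as pd

def split_factor(ratio) -> float:
    """Price factor from a 'from:to' split string, e.g. '1:4' -> 0.25; 1.0 when blank."""
    if pd.isna(ratio) or str(ratio).strip() == "":
        return 1.0
    frm, to = (float(part) for part in str(ratio).split(":"))
    return frm / to

def adjustment_factors(df: pd.DataFrame, include_dividends: bool = True) -> pd.Series:
    """Per-row multiplier: product of all event factors dated strictly after each row."""
    event = df["Split"].map(split_factor).astype(float)
    if include_dividends:
        div = df["Dividend"].fillna(0.0)
        prev_close = df["Close"].shift(1)  # unadjusted close on the prior trading day
        event = event * np.where(div > 0, (prev_close - div) / prev_close, 1.0)
    # The reverse cumulative product gives the product over rows >= i; shifting by one
    # row excludes the event day itself, whose prices are already post-event.
    return event.iloc[::-1].cumprod().iloc[::-1].shift(-1, fill_value=1.0)

# Made-up rows: a 1.00 dividend goes ex on day 2, a '1:2' split executes on day 3
df = pd.DataFrame({
    "Time": pd.to_datetime(["2023-03-01", "2023-03-02", "2023-03-03", "2023-03-06"]),
    "Close": [100.0, 98.0, 49.0, 50.0],
    "Volume": [1000, 1000, 2000, 2000],
    "Dividend": [None, 1.0, None, None],
    "Split": [None, None, "1:2", None],
})
full = adjustment_factors(df)                             # splits and dividends
splits = adjustment_factors(df, include_dividends=False)  # split-only convention
df["RebuiltClose"] = df["Close"] * full
df["RebuiltCloseSplitOnly"] = df["Close"] * splits
df["RebuiltVolume"] = df["Volume"] / splits               # volume uses split factors only
print(df[["Time", "RebuiltClose", "RebuiltCloseSplitOnly", "RebuiltVolume"]])
```

In the demo, the first row's price factor is 0.99 × 0.5 = 0.495. The row on the split day itself is left unchanged.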
6. Trading Sessions, Holidays and Missing Bars
- Each stock ticker has three files: daily, minute regular (09:30–16:00 only) and minute extended (including pre-market from 04:00 and after-hours sessions).
- Market holidays are excluded. On half-days (early close), the regular-session file contains bars only up to the early close.
- Minute bars are built from actual trades: a minute with no trades has no row (bars are not zero-filled). Gaps inside a session therefore indicate no trading activity, not missing data.
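Minutes without trades have no row, so any gap check or densified series has to be built when reading the files. The pandas sketch below handles one session day's bars and assumes `Time` is parsed as datetimes. It only looks between the day's first and last bar, so it makes no assumption about session boundaries. Filling gaps with the previous close and zero volume is one common convention chosen here for illustration; the files themselves are not filled this way. `find_gaps` and `densify` are hypothetical names, and the demo rows are made up.

```python
import pandas as pd

def find_gaps(day_bars: pd.DataFrame) -> pd.DatetimeIndex:
    """Minutes with no row between the first and last bar of a single session day."""
    times = pd.DatetimeIndex(day_bars["Time"])
    grid = pd.date_range(times.min(), times.max(), freq="min")
    return grid.difference(times)

def densify(day_bars: pd.DataFrame) -> pd.DataFrame:
    """Reindex to a full minute grid; empty minutes get the last close and zero volume."""
    df = day_bars.set_index("Time")
    df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="min"))
    last_close = df["Close"].ffill()
    for col in ["Open", "High", "Low", "Close"]:
        df[col] = df[col].fillna(last_close)
    df["Volume"] = df["Volume"].fillna(0)
    return df  # other columns (Average, Transactions, ...) stay NaN on filled rows

bars = pd.DataFrame({
    "Time": pd.to_datetime(["2023-03-01 09:30", "2023-03-01 09:31", "2023-03-01 09:34"]),
    "Open": [10.0, 10.1, 10.3], "High": [10.2, 10.2, 10.4],
    "Low": [9.9, 10.0, 10.2], "Close": [10.1, 10.2, 10.3],
    "Volume": [500, 300, 200],
})
print(find_gaps(bars))
print(densify(bars))
```

For a file covering many days, group the rows by date first, so that overnight stretches are not reported as gaps.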
7. Validation
- Every file update is written to a temporary file first and atomically renamed only on success — an interrupted run can never leave a truncated or corrupted file in the dataset.
- The published dataset version (the date you see on the download page) advances only after every ticker and every bundle in the run has completed. If anything fails mid-run, customers keep receiving the previous complete version.
- Upstream requests are automatically retried with backoff; failed tickers are logged and re-processed on the next run.
- All bars are cross-checked against an exchange holiday and half-day calendar before being written.
- Row structure is verified on every incremental update (column count and date continuity are checked before new rows are appended).
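You can reproduce the write-then-rename pattern and the pre-append checks described above with the standard library, for example when maintaining a local copy of the files. The sketch assumes ISO-formatted Time values in the first column, so string order equals time order. Its continuity check is simplified to strict ordering after the last existing row and does not consult a trading calendar. `append_rows_atomically` is a hypothetical helper. On POSIX, `os.replace` is atomic within one filesystem, which is why the temporary file is created in the target's directory.

```python
import csv
import os
import shutil
import tempfile
from typing import List

def append_rows_atomically(path: str, new_rows: List[List[str]], expected_cols: int) -> None:
    """Validate new CSV rows against the existing file, then swap in the result atomically."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows or len(rows[0]) != expected_cols:
        raise ValueError(f"{path}: header does not have {expected_cols} columns")
    last_time = rows[-1][0] if len(rows) > 1 else ""
    for row in new_rows:
        if len(row) != expected_cols:
            raise ValueError(f"row {row!r} does not have {expected_cols} columns")
        if row[0] <= last_time:  # ISO timestamps sort correctly as strings
            raise ValueError(f"row {row!r} does not follow {last_time!r}")
        last_time = row[0]

    # Temp file in the same directory, so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.writer(f)  # the default line terminator is "\r\n"
            writer.writerows(rows + new_rows)
            f.flush()
            os.fsync(f.fileno())
        shutil.copymode(path, tmp_path)  # mkstemp creates the file as owner-only
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```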
8. Known Limitations
- Prices are consolidated across venues; they can differ slightly from single-exchange data or from vendors using a different consolidation method.
- The stock dataset contains bars (OHLCV), not tick-by-tick trades or quotes.
- The options dataset is end-of-day: one record per contract per day, no intraday option data.
- Greeks and implied volatility are missing for some illiquid or deep in-the-money contracts where no usable quote exists.
- History begins November 2003 for stocks and May 2002 for options; earlier data is not available.
- This data is provided for research and analysis; it is not a real-time feed and not suitable for live trading decisions.
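Before screening on Greeks, you can separate out the rows where they are missing. The pandas sketch below assumes missing values are stored as empty fields, which `read_csv` loads as NaN. The in-memory rows use a subset of the 18 option columns with made-up values.

```python
import io
import pandas as pd

# Made-up rows using a subset of the option columns; the second contract has no Greeks/IV
sample = io.StringIO(
    "contract,delta,gamma,theta,vega,implied_volatility\n"
    "AAPL230317C00150000,0.55,0.03,-0.08,0.12,0.28\n"
    "AAPL230317C00050000,,,,,\n"
)
chain = pd.read_csv(sample)

GREEK_COLS = ["delta", "gamma", "theta", "vega", "implied_volatility"]
has_greeks = chain[GREEK_COLS].notna().all(axis=1)  # True only if every Greek/IV is present
usable = chain[has_greeks]
print(f"{(~has_greeks).sum()} of {len(chain)} contracts lack Greeks/IV")
```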
9. Sample Data
Free samples are identical in structure to the paid files — same columns, same naming, same formats — so you can build and test your entire pipeline before buying:
- Stocks: one full month (March 2023) of daily and minute data for all symbols, plus AAPL sample files.
- Options: six full months (January–June 2013) for all symbols, plus current subscription-format samples.
Loading a file in Python:
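```python
import pandas as pd

# Daily AAPL bars; "Time" is parsed as a naive timestamp in U.S. Eastern Time
df = pd.read_csv("CS_AAPL_day.csv", parse_dates=["Time"])
print(df.tail())  # last five rows (most recent trading days)
```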
Files also open directly in Excel, R (read.csv) and any tool that reads standard CSV.
Last updated: 2026-06-11. Questions about methodology: [email protected].