Capacity estimation at bottleneck locations along urban motorways is important for traffic management. If the capacity is known, bottlenecks with low observed capacity can be identified, and improvements such as active control strategies or infrastructure modifications can be initiated to reduce the number of breakdowns. In this paper, we propose a methodology for estimating, from large data sets, the capacity distribution for clusters with similar speed-flow relations before a breakdown. Explanatory variables are identified to show that the proposed methodology can be used to categorize each cluster based on the bottleneck characteristics. The methodology consists of three steps: (1) an automated process to identify breakdowns, (2) a clustering method that groups breakdown days and locations with similar capacity levels, and (3) a statistical approach for estimating the capacity distribution for each cluster. Further, using one year of empirical speed and flow observations from an urban motorway stretch south of Stockholm, Sweden, we illustrate how the proposed methodology can be used to identify the bottleneck characteristics with the greatest impact on the capacity level. The results show that the proposed methodology has the potential to identify bottlenecks with frequently observed low capacity and to find bottleneck characteristics with a large impact on breakdown capacity.
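
To make steps (1) and (3) concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a breakdown is a drop in speed below a fixed threshold that persists for a minimum number of consecutive measurement intervals, takes the flow in the last interval before the drop as the pre-breakdown flow, and summarizes those flows with a plain empirical distribution. The function names, the threshold of 60 km/h, and the minimum duration of three intervals are hypothetical choices made for illustration only; the clustering in step (2) is omitted.

```python
from typing import List, Sequence, Tuple


def find_breakdowns(
    speeds: Sequence[float],
    flows: Sequence[float],
    speed_threshold: float = 60.0,  # assumed value, not taken from the paper
    min_duration: int = 3,          # assumed value, not taken from the paper
) -> List[Tuple[int, float]]:
    """Return (start_index, pre_breakdown_flow) for each detected breakdown.

    A breakdown starts at index i when the speed crosses from at or above
    the threshold to below it and stays below for min_duration intervals.
    """
    breakdowns = []
    i = 1
    n = len(speeds)
    while i < n:
        crosses = speeds[i - 1] >= speed_threshold and speeds[i] < speed_threshold
        if crosses:
            window = speeds[i:i + min_duration]
            sustained = len(window) == min_duration and all(
                s < speed_threshold for s in window
            )
            if sustained:
                # Flow in the last free-flow interval before the speed drop.
                breakdowns.append((i, flows[i - 1]))
                i += min_duration
                continue
        i += 1
    return breakdowns


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (value, cumulative share) pairs for the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (k + 1) / n) for k, v in enumerate(ordered)]


# Synthetic example data (speed in km/h, flow in veh/h), invented for illustration.
speeds = [90, 88, 85, 50, 45, 40, 80, 85, 55, 86, 84, 50, 48, 47]
flows = [1500, 1700, 1900, 1800, 1600, 1500, 1400, 1600, 1700, 1800, 2000, 1700, 1600, 1500]

events = find_breakdowns(speeds, flows)
cdf = empirical_cdf([flow for _, flow in events])
```

In this synthetic series the brief dip at index 8 is not counted because it does not persist. The plain empirical distribution ignores intervals in which high flow was observed without a breakdown, so it should be read as a simplification rather than as the statistical approach the methodology relies on.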