NPM in numbers

NPM.js is a package manager that is essential to the node.js community. As of September 22nd, 2014, it hosted over 99,000 packages, a number that grew by 10,000 in the preceding two months alone.

As the node.js package ecosystem grows, it is essential to understand how the system behaves: how many packages, and which ones, are crucial to the community; how the ecosystem evolves over time; and whether the success of a package can be predicted from its metrics alone.

Major Version

  • 82% of packages have a major version of 0.*
  • 14% are version 1.*
  • 3% are greater than 1.*.

    all_packages['major_version_category'].value_counts(normalize=True)
    version_major_zero      0.822502
    version_major_one       0.143654
    version_major_gt_one    0.033843
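
The `major_version_category` column is used above without showing how it is built. A minimal sketch of one way to derive it from a version string, assuming plain `major.minor.patch` versions. The helper name and its parsing rules are illustrative assumptions, not the code used for this analysis; only the three labels come from the output above.

```python
def major_version_category(version):
    """Bucket a version string such as '0.5.197' by its major component.

    Hypothetical helper: the labels match the output above, but the
    parsing rules are an assumption.
    """
    # Take the part before the first dot and drop an optional leading 'v'.
    major = int(version.strip().lstrip("v").split(".")[0])
    if major == 0:
        return "version_major_zero"
    if major == 1:
        return "version_major_one"
    return "version_major_gt_one"


print(major_version_category("0.5.197"))  # version_major_zero
print(major_version_category("1.2.0"))    # version_major_one
print(major_version_category("4.17.21"))  # version_major_gt_one
```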
    

Major Version & Releases

  • The median number of releases per package is 3

    all_packages.version_count.median()
    
  • Packages whose major version is greater than 1 have a higher median of 7.

    • 0.* - 3
    • 1.* - 3
    • > 1.* - 7

    code:

    all_packages.groupby(['major_version_category'])['version_count'].median()
    version_major_gt_one      7
    version_major_one         3
    version_major_zero        3
    
  • The higher the major version, the higher the number of releases a package has on average

    sns.barplot(x="major_version_category", y="version_count", data=all_packages);
    

  • The package with the most versions is apostrophe with 433 releases! Its current version number is 0.5.197

    all_packages[all_packages['version_count'] == all_packages.version_count.max()]
    

Major Version & Last updated

  • 71% of packages have been updated in the past year
  • 21% of packages have been updated between a year and two years ago

    all_packages['updated_category'].value_counts(normalize=True)
    within_last_year                 0.706571
    havent_been_updated_in_1_year    0.208503
    havent_been_updated_in_2_year    0.068142
    havent_been_updated_in_3_year    0.016784
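
The `updated_category` buckets above can be reproduced from the number of days since a package was last modified. A sketch, assuming 365-day years and that `havent_been_updated_in_N_year` means "last updated at least N years ago but less than N+1", with the 3-year bucket left open-ended. The function name and the boundaries are assumptions; only the labels come from the output above.

```python
def updated_category(days_since_modified):
    """Map days since the last update to one of the four labels above."""
    # Whole years elapsed, assuming 365-day years.
    years = int(days_since_modified // 365)
    if years < 1:
        return "within_last_year"
    # Assumption: everything older than three years falls into the last bucket.
    return "havent_been_updated_in_%d_year" % min(years, 3)


for days in (112, 400, 800, 2000):
    print(days, updated_category(days))
```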
    
  • Packages with a higher major version tend to have been updated more recently. Median days since last modification, by major version:

    • Version 0.* - 210 days
    • Version 1.* - 118 days
    • Version > 1.* - 112 days

    code:

    all_packages.groupby(['major_version_category'])['deltaSinceModifiedDays'].median()
    version_major_gt_one      112
    version_major_one         118
    version_major_zero        210
    
  • A larger share of packages whose major version is > 1.* has been updated within the past year (87%) than of those whose version is 0.* (68%). In absolute terms the picture is reversed: 53,180 packages at version 0.* were updated in the last year, versus only 2,771 at a version greater than 1.*.

    cats = all_packages['major_version_category'].unique()
    for c in cats:
        print("=> " + c)
        sub = all_packages[all_packages['major_version_category'] == c]
        # Share of each updated_category within this major-version bucket
        print(sub.groupby(['updated_category'])['version_count'].count() / float(len(sub)))
    
    => version_major_zero
    updated_category
    havent_been_updated_in_1_year    0.222227
    havent_been_updated_in_2_year    0.074814
    havent_been_updated_in_3_year    0.018295
    within_last_year                 0.684665
    Name: version_count, dtype: float64
    
    => version_major_one
    updated_category
    havent_been_updated_in_1_year    0.155831
    havent_been_updated_in_2_year    0.039437
    havent_been_updated_in_3_year    0.010541
    within_last_year                 0.794191
    Name: version_count, dtype: float64
    
    => version_major_gt_one
    updated_category
    havent_been_updated_in_1_year    0.098561
    havent_been_updated_in_2_year    0.027534
    havent_been_updated_in_3_year    0.006884
    within_last_year                 0.867021
    Name: version_count, dtype: float64
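
The absolute counts quoted above follow from the shares by simple arithmetic. A sketch, assuming the data set holds 94,435 packages (the observation count reported in the OLS summary of the Age & Major Version section, not a figure stated in this section) and using the rounded shares printed above:

```python
TOTAL_PACKAGES = 94435  # assumption: "No. Observations" from the OLS summary

# (share of all packages, share of that category updated within the last year)
shares = {
    "version_major_zero": (0.822502, 0.684665),
    "version_major_gt_one": (0.033843, 0.867021),
}

counts = {}
for category, (share_of_all, share_recent) in shares.items():
    in_category = round(TOTAL_PACKAGES * share_of_all)
    counts[category] = round(in_category * share_recent)
    print(category, in_category, counts[category])
```

The 0.* bucket is roughly 24 times larger than the > 1.* bucket, which is why its lower share still yields far more recently updated packages.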
    
  • 28,322 packages (~30%) have been updated in the last 3 months alone.

    len(all_packages[all_packages['deltaSinceModifiedDays'] < 365/4.0]) / float(len(all_packages))
    0.29990999099909993
    

Age

In the past year alone, 54,051 packages were added to npm. That is 57% of all packages on npm.

  len(all_packages[all_packages.age < 365]) / float(len(all_packages))
  0.5723619420765605

Age & Major Version

  • Regardless of age, most packages aren't past major version 0.*. Between 78% and 84% of packages in every age bucket are 0.*.

    cats = all_packages['age_category'].unique()
    for c in cats:
        print("=> " + c)
        sub = all_packages[all_packages['age_category'] == c]
        # Share of each major-version bucket within this age bucket
        print(sub.groupby(['major_version_category'])['package'].count() / float(len(sub)))
    
    => age_0.5_year
    major_version_category
    version_major_gt_one      0.026615
    version_major_one         0.149802
    version_major_zero        0.823583
    Name: package, dtype: float64
    => age_2_year
    major_version_category
    version_major_gt_one      0.040523
    version_major_one         0.129075
    version_major_zero        0.830364
    Name: package, dtype: float64
    => age_3_year
    major_version_category
    version_major_gt_one      0.041274
    version_major_one         0.123679
    version_major_zero        0.835047
    Name: package, dtype: float64
    => age_1_year
    major_version_category
    version_major_gt_one      0.032167
    version_major_one         0.136044
    version_major_zero        0.831789
    Name: package, dtype: float64
    => age_0.25_year
    major_version_category
    version_major_gt_one      0.025377
    version_major_one         0.192218
    version_major_zero        0.782405
    
  • The semver specification gets applied fairly liberally to package versioning. Some maintainers are very careful about bumping even minor versions, while others speed along past the infamous 1.*, never to look back. A regression of major version on package age shows no correlation between the two (R-squared 0.000, p = 0.763).

    from statsmodels.formula.api import ols
    
    # Regress major version on package age (in days)
    mod = ols(formula='version_major ~ age', data=all_packages)
    res = mod.fit()
    print(res.summary())
    
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:          version_major   R-squared:                       0.000
    Model:                            OLS   Adj. R-squared:                 -0.000
    Method:                 Least Squares   F-statistic:                   0.09118
    Date:                Wed, 24 Sep 2014   Prob (F-statistic):              0.763
    Time:                        11:48:31   Log-Likelihood:                -98690.
    No. Observations:               94435   AIC:                         1.974e+05
    Df Residuals:                   94433   BIC:                         1.974e+05
    Df Model:                           1
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    Intercept      0.2340      0.004     65.640      0.000         0.227     0.241
    age        -2.175e-06    7.2e-06     -0.302      0.763     -1.63e-05  1.19e-05
    ==============================================================================
    Omnibus:                   190799.917   Durbin-Watson:                   1.736
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):       2537154961.582
    Skew:                          16.109   Prob(JB):                         0.00
    Kurtosis:                     805.348   Cond. No.                         788.
    ==============================================================================
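
To make the R-squared figure above concrete: with a single predictor, ordinary least squares has a closed form, and R-squared is the fraction of the variance in the response that the fitted line explains. A self-contained sketch on made-up toy data; the helper `simple_ols` and the numbers are illustrative and do not come from the npm data set.

```python
def simple_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared).
    """
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, R-squared equals the squared Pearson correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return intercept, slope, r_squared


# Toy data: age in days vs. major version, constructed to be unrelated.
ages = [100, 200, 300, 400]
majors = [0, 1, 1, 0]
print(simple_ols(ages, majors))  # (0.5, 0.0, 0.0)
```

A slope of zero and an R-squared of zero mean that knowing a package's age tells you nothing about its major version beyond the overall mean, which is the situation the summary above reports for the real data.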
    

Age & Deep dependents

  • Deep dependents is the number of packages that depend on a given package, either directly or indirectly (through other packages).

  • 95% of packages on npm have 8 or fewer packages depending on them.

    q95 = all_packages['deep_dependents'].quantile(q=.95)
    8.0
    
  • That leaves 4,477 packages (~5%) with more than 8 deep dependents: the most depended-upon packages on npm.

    most_dependent_upon = all_packages[all_packages['deep_dependents'] > q95]
    len(most_dependent_upon)
    
  • The top 5% of most depended-upon packages exhibit a clear trend: the older a package is, the more likely it is to be in that top 5%. Packages created in the past year make up only ~12% of the top 5% and packages created 1-2 years ago ~19%, while the remaining ~69% are more than two years old!

    most_dependent_upon = all_packages[all_packages['deep_dependents'] > q95]
    most_dependent_upon.groupby(['age_category'])['package'].count() / len(most_dependent_upon)
    
    age_category
    age_0.25_year    0.041099
    age_0.5_year     0.082868
    age_1_year       0.186285
    age_2_year       0.342864
    age_3_year       0.346661
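
The deep dependents metric defined at the start of this section can be computed by walking the dependency graph in reverse. A minimal sketch, assuming the registry data is available as a mapping from each package to its direct dependencies. The function name `deep_dependents` and the toy package names are hypothetical, and this is not necessarily how the column in `all_packages` was produced.

```python
from collections import defaultdict, deque


def deep_dependents(dependencies):
    """Count the packages that depend on each package, directly or indirectly.

    `dependencies` maps a package name to the list of packages it depends on.
    """
    # Invert the graph: package -> packages that depend on it directly.
    direct_dependents = defaultdict(set)
    packages = set(dependencies)
    for package, deps in dependencies.items():
        for dep in deps:
            direct_dependents[dep].add(package)
            packages.add(dep)

    counts = {}
    for package in packages:
        # Breadth-first search over reverse edges; `seen` guards against cycles.
        seen = set()
        queue = deque([package])
        while queue:
            current = queue.popleft()
            for dependent in direct_dependents.get(current, ()):
                if dependent not in seen and dependent != package:
                    seen.add(dependent)
                    queue.append(dependent)
        counts[package] = len(seen)
    return counts


toy_registry = {
    "app": ["web-framework"],
    "web-framework": ["http-utils", "logger"],
    "http-utils": ["logger"],
    "logger": [],
}
print(deep_dependents(toy_registry))
```

In the toy registry, `logger` has three deep dependents even though only two packages list it directly. Running one search per package is fine for a sketch but would be slow on the full registry.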
    

Deep Dependents & Age

  • The majority of packages on npm, 70,821 (75%), have no packages depending on them.

    len(all_packages[all_packages['deep_dependents'] == 0]) / float(len(all_packages))
    0.7499444062053264
    
  • Within the top 5% of depended-upon packages, older packages have slightly more dependents, but not by much. Packages created in the past year: 1,317 (median 20 deep dependents); the year before: 1,866 (median 24); two or more years ago: 1,931 (median 33). Age does not appear to be a strong predictor of how many dependents a package has.

    thisyear = most_dependent_upon[most_dependent_upon['age'] < 365]
    lastyear = most_dependent_upon[(most_dependent_upon['age'] >= 365) & (most_dependent_upon['age'] < 2 * 365)]
    # Everything two or more years old
    yearbefore = most_dependent_upon[most_dependent_upon['age'] >= 365 * 2]
    
    print(len(thisyear), len(lastyear), len(yearbefore))
    print(thisyear['deep_dependents'].median(), lastyear['deep_dependents'].median(), yearbefore['deep_dependents'].median())
    

Dependents & Maintainers

  • NPM packages are not the most collaborative of endeavours: most packages on npm (88,132, or 93%) have only one maintainer, although contributors are not accounted for.

    single = all_packages[all_packages['maintainer_count'] == 1]
    print(len(single))
    print(len(single) / float(len(all_packages)))
    
  • Among packages with at least 100 deep dependents, there is no relationship between how many dependents a package has and how many maintainers it has (R-squared 0.000, p = 0.741). Turns out, it doesn't take a village to raise a successful npm package.

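    import matplotlib.pyplot as plt
    import seaborn as sns
    from statsmodels.formula.api import ols
    
    # Only packages with at least 100 deep dependents
    s = all_packages[all_packages['deep_dependents'] > 99]
    x = 'maintainer_count'
    y = 'deep_dependents'
    
    mod = ols(formula='deep_dependents ~ maintainer_count', data=s)
    res = mod.fit()
    print(res.summary())
    
    # Regression fit (left) and residuals (right)
    f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
    sns.regplot(x=x, y=y, data=s, ax=ax1)
    ax1.set(xlabel=x, ylabel=y);
    sns.residplot(x=x, y=y, data=s, color="seagreen", ax=ax2)
    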
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:        deep_dependents   R-squared:                       0.000
    Model:                            OLS   Adj. R-squared:                 -0.001
    Method:                 Least Squares   F-statistic:                    0.1094
    Date:                Thu, 25 Sep 2014   Prob (F-statistic):              0.741
    Time:                        13:43:43   Log-Likelihood:                -10201.
    No. Observations:                1072   AIC:                         2.041e+04
    Df Residuals:                    1070   BIC:                         2.042e+04
    Df Model:                           1
    ====================================================================================
                           coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------------
    Intercept         1731.1602    124.380     13.918      0.000      1487.104  1975.217
    maintainer_count   -12.6690     38.308     -0.331      0.741       -87.836    62.498
    ==============================================================================
    Omnibus:                      861.536   Durbin-Watson:                   1.846
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14296.422
    Skew:                           3.750   Prob(JB):                         0.00
    Kurtosis:                      19.242   Cond. No.                         4.16
    ==============================================================================
    