Stochastic Gradient Descent for Linear Regression
Predict the city miles per gallon (city-mpg) from the curb weight and engine size, using a linear model with L2 regularization trained by stochastic gradient descent (SGD). You need to code up SGD yourself, as in the referenced example, rather than call a library optimizer.
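For reference, one common way to write this objective and the per-example SGD step is given here. The symbols (weights w, intercept b, regularization strength λ, learning rate η) are notation assumed for this note, and leaving the intercept unregularized is a conventional choice rather than something the task statement prescribes.

```latex
% Ridge-regularized least-squares objective over n training examples
J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( w^\top x_i + b - y_i \right)^2
          + \frac{\lambda}{2} \lVert w \rVert_2^2

% SGD step for one randomly drawn example i, with learning rate \eta
e_i = w^\top x_i + b - y_i
w \leftarrow w - \eta \left( e_i \, x_i + \lambda w \right)
b \leftarrow b - \eta \, e_i
```

Averaged over a uniformly drawn i, the step direction equals the gradient of J, which is what makes the single-example update a valid stochastic estimate.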
import pandas as pd
# Load the dataset from its URL; replace this with a local path
# (for example 'imports-85.csv') if you have downloaded the file
dataset_url = "https://raw.githubusercontent.com/plotly/datasets/master/imports-85.csv"
# Read the CSV file into a pandas DataFrame
df = pd.read_csv(dataset_url)
# Show all columns when displaying the DataFrame
pd.set_option('display.max_columns', None)
# Preview the first 10 rows
df.head(10)
 | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | length | width | height | curb-weight | engine-type | num-of-cylinders | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | NaN | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | 4 | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111.0 | 5000.0 | 21 | 27 | 13495.0 |
1 | 3 | NaN | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | 4 | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111.0 | 5000.0 | 21 | 27 | 16500.0 |
2 | 1 | NaN | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | 6 | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154.0 | 5000.0 | 19 | 26 | 16500.0 |
3 | 2 | 164.0 | audi | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | 4 | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102.0 | 5500.0 | 24 | 30 | 13950.0 |
4 | 2 | 164.0 | audi | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | 5 | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115.0 | 5500.0 | 18 | 22 | 17450.0 |
5 | 2 | NaN | audi | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | 5 | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110.0 | 5500.0 | 19 | 25 | 15250.0 |
6 | 1 | 158.0 | audi | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | 5 | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110.0 | 5500.0 | 19 | 25 | 17710.0 |
7 | 1 | NaN | audi | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | 5 | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110.0 | 5500.0 | 19 | 25 | 18920.0 |
8 | 1 | 158.0 | audi | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | 5 | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140.0 | 5500.0 | 17 | 20 | 23875.0 |
9 | 0 | NaN | audi | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | 5 | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160.0 | 5500.0 | 16 | 22 | NaN |
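# Target: city fuel economy; features: curb weight and engine size
target_variable_column = df[['city-mpg']]
feature_columns = df[['curb-weight', 'engine-size']]
# Convert the selected columns to NumPy arrays
# (y has shape (n, 1) because the target is selected with double brackets)
y = target_variable_column.values
X = feature_columns.values
print(X)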
[[2548 130]
[2548 130]
[2823 152]
[2337 109]
[2824 136]
[2507 136]
[2844 136]
[2954 136]
[3086 131]
[3053 131]
[2395 108]
[2395 108]
[2710 164]
[2765 164]
[3055 164]
[3230 209]
[3380 209]
[3505 209]
[1488 61]
[1874 90]
[1909 90]
[1876 90]
[1876 90]
[2128 98]
[1967 90]
[1989 90]
[1989 90]
[2191 98]
[2535 122]
[2811 156]
[1713 92]
[1819 92]
[1837 79]
[1940 92]
[1956 92]
[2010 92]
[2024 92]
[2236 110]
[2289 110]
[2304 110]
[2372 110]
[2465 110]
[2293 110]
[2337 111]
[1874 90]
[1909 90]
[2734 119]
[4066 258]
[4066 258]
[3950 326]
[1890 91]
[1900 91]
[1905 91]
[1945 91]
[1950 91]
[2380 70]
[2380 70]
[2385 70]
[2500 80]
[2385 122]
[2410 122]
[2385 122]
[2410 122]
[2443 122]
[2425 122]
[2670 140]
[2700 134]
[3515 183]
[3750 183]
[3495 183]
[3770 183]
[3740 234]
[3685 234]
[3900 308]
[3715 304]
[2910 140]
[1918 92]
[1944 92]
[2004 92]
[2145 98]
[2370 110]
[2328 122]
[2833 156]
[2921 156]
[2926 156]
[2365 122]
[2405 122]
[2403 110]
[2403 110]
[1889 97]
[2017 103]
[1918 97]
[1938 97]
[2024 97]
[1951 97]
[2028 97]
[1971 97]
[2037 97]
[2008 97]
[2324 120]
[2302 120]
[3095 181]
[3296 181]
[3060 181]
[3071 181]
[3139 181]
[3139 181]
[3020 120]
[3197 152]
[3230 120]
[3430 152]
[3075 120]
[3252 152]
[3285 120]
[3485 152]
[3075 120]
[3252 152]
[3130 134]
[1918 90]
[2128 98]
[1967 90]
[1989 90]
[2191 98]
[2535 122]
[2818 156]
[2778 151]
[2756 194]
[2756 194]
[2800 194]
[3366 203]
[2579 132]
[2460 132]
[2658 121]
[2695 121]
[2707 121]
[2758 121]
[2808 121]
[2847 121]
[2050 97]
[2120 108]
[2240 108]
[2145 108]
[2190 108]
[2340 108]
[2385 108]
[2510 108]
[2290 108]
[2455 108]
[2420 108]
[2650 108]
[1985 92]
[2040 92]
[2015 92]
[2280 92]
[2290 92]
[3110 92]
[2081 98]
[2109 98]
[2275 110]
[2275 110]
[2094 98]
[2122 98]
[2140 98]
[2169 98]
[2204 98]
[2265 98]
[2300 98]
[2540 146]
[2536 146]
[2551 146]
[2679 146]
[2714 146]
[2975 146]
[2326 122]
[2480 110]
[2414 122]
[2414 122]
[2458 122]
[2976 171]
[3016 171]
[3131 171]
[3151 161]
[2261 97]
[2209 109]
[2264 97]
[2212 109]
[2275 109]
[2319 97]
[2300 109]
[2254 109]
[2221 109]
[2661 136]
[2579 97]
[2563 109]
[2912 141]
[3034 141]
[2935 141]
[3042 141]
[3045 130]
[3157 130]
[2952 141]
[3049 141]
[3012 173]
[3217 145]
[3062 141]]
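With the features and target extracted, the SGD fit asked for at the top could be approached along the following lines. This is a minimal sketch under stated assumptions, not a reference solution: the helper names `standardize` and `sgd_ridge`, and the hyperparameters `lr`, `l2`, `epochs`, and `seed` with their default values, are illustrative choices that do not come from the original notebook. The features are standardized first because raw curb weights in the thousands would make a fixed learning rate unstable. To keep the example self-contained, it uses only the first ten rows shown in the preview table instead of downloading the full file.

```python
import numpy as np


def standardize(X):
    """Return X scaled to zero mean and unit variance per column, plus the statistics."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std


def sgd_ridge(X, y, lr=0.01, l2=0.01, epochs=200, seed=0):
    """Fit y ~ X @ w + b by per-sample SGD on squared error plus an L2 penalty on w."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch
        for i in rng.permutation(n_samples):
            error = X[i] @ w + b - y[i]
            # Gradient of 0.5 * error**2 + 0.5 * l2 * ||w||^2 for this one sample
            w -= lr * (error * X[i] + l2 * w)
            b -= lr * error  # the intercept is left unregularized (an assumption)
    return w, b


# First ten rows of the preview table: [curb-weight, engine-size] and city-mpg
X_demo = np.array(
    [[2548, 130], [2548, 130], [2823, 152], [2337, 109], [2824, 136],
     [2507, 136], [2844, 136], [2954, 136], [3086, 131], [3053, 131]],
    dtype=float,
)
y_demo = np.array([21, 21, 19, 24, 18, 19, 19, 19, 17, 16], dtype=float)

X_scaled, mu, sigma = standardize(X_demo)
w, b = sgd_ridge(X_scaled, y_demo)

# Training error of the fitted model on the same ten rows
predictions = X_scaled @ w + b
mse = np.mean((predictions - y_demo) ** 2)
print("weights:", w, "intercept:", b, "training MSE:", mse)
```

To run the same sketch on the notebook's arrays, pass `standardize(X.astype(float))[0]` and `y.ravel().astype(float)`: `y` has shape (n, 1) as extracted above, while `sgd_ridge` expects a one-dimensional target. The learning rate and penalty would likely need tuning on the full dataset, ideally against a held-out split.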