project1

530 project final report Team member: Fei Liang Yilun Lyu Jianfeng Zeng

### Project Description

Basketball Database

  • Create our own criteria considering different player's statistics in all the games
  • Analyze players performed best using our own criteria in different cities as a guest team player(For every city in past five years, which guest team players can get most points/assists/rebounds/steals/blocks in average?).

Data Source and Data Preparation

  • Generally the data will be collected from website BASKETBALL REFERENCE, including the following information: (1)Game results between teams in seasons from 2011 to 2016 ( add more results from earlier seasons if necessary ) (2)Team Statistics with information like team name, team home and players in the team (3)Player Statistics with information like player names, position, teams, points/assists/rebounds/steals/blocks per game and other statistics info

  • Because the data from BASKETBALL REFERENCE is well formed and organized, we don't need to do much work on cleaning or preparing it, though we do need to do some works like choosing useful attributes from all of them

Database Attributes

In this project, we have implemented following data tables with following attributes:

  • GameInfo table(each record describes the result of a game):  Game time  Home team  Home team score  Away team  Away team score

  • Box table(each record describes one player’s statistic on one specified game):  Player name  Player team  Date  Is_away  Opponent name  Points  Rebounds  Assists  Blocks  Steals

  • Player table(each record describes the basic info of a player):  Player Name  Team Name  Position  Jersey Number

  • Team table(each record describes the basic info of a team):  Team Name  League

UML Diagram is in attachment

Data Prepossessing

Since basketballreference.com supports all data in the website be exported as .csv files, and the PostgreSQL also supports import .csv files into database as soon as we have created adequate data tables with correct data format, so in this project we do not write too many codes to import data into database.

Data Setup(Create table) Following are codes we create data tables:

creation of team table

create table team(                                       
team_name character(3) primary key not null,
league text);

creation of player table

create table player(                                          
player_name text primary key not null,          
team_name character(3) references team(team_name),
position character(2),
jersey_number smallint);

creation of gameInfo table

create table gameInfo(                                          
date text,                                      
away_team_name character(3) references team(team_name),
away_points smallint,
home_team_name character(3) references team(team_name),
home_points smallint);

creation of box table

create table box(                                               
player_name text references player(player_name),
date text,                                             
team_name character(3) references team(team_name),
is_away character(1),
opp_name character(3) references team(team_name),
rbs smallint,
ast smallint,
stl smallint,
blk smallint,
pts smallint);

Database Insert Data

After we have created tables, we can import .csv files into database. The inner commands of PostgreSQL are following:

\copy team from '/home/warehouse/yilunlyu/team_info2.csv' with delimiter ',';
\copy player from '/home/warehouse/yilunlyu/530project/player.csv' with delimiter ',';
\copy gameinfo from '/home/warehouse/yilunlyu/530project/gameinfo.csv' with delimiter ',';
\copy box from '/home/warehouse/yilunlyu/530project/box.csv' with delimiter ',';

After finishing this we have imported all data into database.

Database Query

It is hard to use one SQL query to get the final result. So we decide to use views to store the partial results.

Let us use “get best player in scoring of different arenas when they play as a guest team player” for example.

First of all, we created view1:

create view view1 as 
select player_name,avg(pts) as avg_points, opp_name 
from box 
where is_away = 'Y' and opp_name in 
        (select team_name from team) 
group by player_name,opp_name 
order by avg_points desc;

This view calculates every players average points in all arenas when they play away, and sort points from high to low.

Part of results of view1 looks like this:

Stephen Curry      |     39.3333333333333333 | MIA
 James Harden       |     37.6000000000000000 | SAC
 Stephen Curry      |     36.3333333333333333 | CHO
 LeBron James       |     35.5000000000000000 | CLE
 Klay Thompson      |     34.3333333333333333 | IND
 LeBron James       |     34.3333333333333333 | NOP
 Russel Westbrook   |     34.0000000000000000 | PHO
 James Harden       |     33.8000000000000000 | GSW
 Anthony Davis      |     33.3333333333333333 | DET
 Stephen Curry      |     33.3333333333333333 | TOR
 Damian Lillard     |     32.6666666666666667 | TOR
 Stephen Curry      |     32.5000000000000000 | POR
 James Harden       |     32.3333333333333333 | TOR
 Stephen Curry      |     32.3333333333333333 | WAS
 LeBron James       |     32.0000000000000000 | DAL
 Carmelo Anthony    |     31.6666666666666667 | SAC
 Stephen Curry      |     31.5000000000000000 | PHO
 Stephen Curry      |     31.2000000000000000 | NOP
 James Harden       |     31.2000000000000000 | POR
 James Harden       |     31.1666666666666667 | LAL
 Kevin Durant       |     31.0000000000000000 | ORL

It seems we have found the result. However, it not the end. We can find out that there are 2 records which opponents’ name is “TOR”, which disobeys our goal. The reason is this view is just grouped by player names and their opponents’ names. Our final result is in view1, we need to dig it out. So we create view2:

create view view2 as 
select MAX(avg_points),opp_name 
from view1 
group by opp_name;

view2 is grouped by opp_name, and calculates the max average points of every opponent.

The last step is a SQL query:

select view1.player_name, view2.max,view2.opp_name 
from view1,view2
where view1.avg_points = view2.max and view1.opp_name = view2.opp_name limit 30;

We just need join view1 and view2 with view2’s max average points of every opponent, since there are only 30 teams in the league, so we just need to get 30 records for the result. The result is following:

Stephen Curry    | 39.3333333333333333 | MIA
 James Harden     | 37.6000000000000000 | SAC
 Stephen Curry    | 36.3333333333333333 | CHO
 LeBron James     | 35.5000000000000000 | CLE
 Klay Thompson    | 34.3333333333333333 | IND
 LeBron James     | 34.3333333333333333 | NOP
 Russel Westbrook | 34.0000000000000000 | PHO
 James Harden     | 33.8000000000000000 | GSW
 Anthony Davis    | 33.3333333333333333 | DET
 Stephen Curry    | 33.3333333333333333 | TOR
 Stephen Curry    | 32.5000000000000000 | POR
 Stephen Curry    | 32.3333333333333333 | WAS
 LeBron James     | 32.0000000000000000 | DAL
 James Harden     | 31.1666666666666667 | LAL
 Kevin Durant     | 31.0000000000000000 | ORL
 Stephen Curry    | 30.3333333333333333 | UTA
 LeBron James     | 30.0000000000000000 | LAC
 Kevin Durant     | 30.0000000000000000 | HOU
 Stephen Curry    | 29.8000000000000000 | OKC
 Stephen Curry    | 29.6666666666666667 | BOS
 James Harden     | 29.4000000000000000 | MIN
 Damian Lillard   | 29.3333333333333333 | BRK
 Kyrie Irving     | 29.3333333333333333 | SAS
 Stephen Curry    | 29.3333333333333333 | BRK
 James Harden     | 29.0000000000000000 | NYK
 DeMarcus Cousins | 28.6666666666666667 | PHI
 Stephen Curry    | 28.0000000000000000 | ATL
 Anthony Davis    | 28.0000000000000000 | MIL
 LeBron James     | 27.3333333333333333 | DEN
 Damian Lillard   | 27.3333333333333333 | CHI

We can find out Stephen Curry, James Harden and LeBron James are most dangerous opponents for most teams in NBA, since they have conquered most arenas in average points. Especially, Stephen Curry is the best scorer in 9 cities when he plays as a guest player, he is a master!

Things are similar in rebounds, assists, blocks and steals.

The result of rebounds is following:

DeAndre Jordan     | 20.6666666666666667 | PHI
 Andre Drummond     | 20.0000000000000000 | POR
 Pau Gasol          | 19.0000000000000000 | CHI
 DeAndre Jordan     | 18.3333333333333333 | IND
 DeAndre Jordan     | 18.1666666666666667 | DAL
 DeAndre Jordan     | 18.0000000000000000 | ATL
 Andre Drummond     | 18.0000000000000000 | GSW
 DeAndre Jordan     | 17.3333333333333333 | MIL
 Andre Drummond     | 17.0000000000000000 | NOP
 Andre Drummond     | 17.0000000000000000 | LAL
 Andre Drummond     | 16.8000000000000000 | BRK
 Andre Drummond     | 16.3333333333333333 | HOU
 Andre Drummond     | 16.0000000000000000 | SAS
 DeAndre Jordan     | 16.0000000000000000 | ORL
 Karl-Anthony Towns | 16.0000000000000000 | BOS
 Andre Drummond     | 15.6666666666666667 | PHO
 DeAndre Jordan     | 15.3333333333333333 | DET
 DeAndre Jordan     | 15.0000000000000000 | CHO
 Andre Drummond     | 15.0000000000000000 | SAC
 Andre Drummond     | 14.8000000000000000 | NYK
 DeAndre Jordan     | 14.8000000000000000 | MIN
 Anthony Davis      | 14.6666666666666667 | WAS
 Andre Drummond     | 14.6666666666666667 | LAC
 DeAndre Jordan     | 14.0000000000000000 | DEN
 Karl-Anthony Towns | 14.0000000000000000 | MIA
 DeAndre Jordan     | 13.3333333333333333 | CLE
 Karl-Anthony Towns | 12.5000000000000000 | MEM
 Karl-Anthony Towns | 12.0000000000000000 | OKC
 Andre Drummond     | 11.8000000000000000 | TOR
 Anthony Davis      | 11.8000000000000000 | UTA

In rebounds, DeAndre Jordan and Andre Drummond are experts.

The result of assists is following:

John Wall        | 13.3333333333333333 | LAL
 John Wall        | 13.0000000000000000 | HOU
 Chris Paul       | 11.5000000000000000 | POR
 Chris Paul       | 11.4000000000000000 | NOP
 John Wall        | 11.3333333333333333 | DAL
 John Wall        | 11.3333333333333333 | LAC
 Chris Paul       | 11.0000000000000000 | CLE
 Chris Paul       | 11.0000000000000000 | MIN
 John Wall        | 10.8333333333333333 | ATL
 John Wall        | 10.6666666666666667 | DEN
 Chris Paul       | 10.6666666666666667 | BOS
 John Wall        | 10.4000000000000000 | NYK
 Chris Paul       | 10.3333333333333333 | MIA
 John Wall        | 10.3333333333333333 | ORL
 Chris Paul       | 10.1666666666666667 | OKC
 John Wall        | 10.0000000000000000 | PHI
 John Wall        |  9.6666666666666667 | PHO
 Russel Westbrook |  9.6666666666666667 | IND
 John Wall        |  9.6666666666666667 | SAC
 John Wall        |  9.3333333333333333 | MEM
 John Wall        |  9.0000000000000000 | MIL
 LeBron James     |  8.8333333333333333 | TOR
 John Wall        |  8.8000000000000000 | CHI
 John Wall        |  8.6666666666666667 | CHO
 Chris Paul       |  8.3333333333333333 | WAS
 Chris Paul       |  8.3333333333333333 | GSW
 LeBron James     |  8.3333333333333333 | DET
 Chris Paul       |  8.0000000000000000 | BRK
 James Harden     |  8.0000000000000000 | BRK
 John Wall        |  7.3333333333333333 | SAS

John Wall is the King of assists.

The result of blocks is following:

Karl-Anthony Towns | 6.0000000000000000 | ORL
 Hassan Whiteside   | 6.0000000000000000 | DEN
 Hassan Whiteside   | 6.0000000000000000 | LAL
 Hassan Whiteside   | 5.0000000000000000 | DAL
 Brook Lopez        | 5.0000000000000000 | SAC
 Pau Gasol          | 5.0000000000000000 | CHI
 Anthony Davis      | 4.3333333333333333 | CLE
 Hassan Whiteside   | 4.2500000000000000 | DET
 Hassan Whiteside   | 4.0000000000000000 | CHO
 Hassan Whiteside   | 4.0000000000000000 | POR
 Dwight Howard      | 4.0000000000000000 | PHI
 Hassan Whiteside   | 3.5000000000000000 | NOP
 Hassan Whiteside   | 3.5000000000000000 | OKC
 Hassan Whiteside   | 3.5000000000000000 | BOS
 Serge Ibaka        | 3.4000000000000000 | HOU
 Derrick Favors     | 3.3333333333333333 | MIA
 Karl-Anthony Towns | 3.0000000000000000 | ATL
 Hassan Whiteside   | 3.0000000000000000 | MIN
 DeAndre Jordan     | 2.6666666666666667 | NYK
 DeAndre Jordan     | 2.6666666666666667 | IND
 Hassan Whiteside   | 2.5000000000000000 | LAC
 Hassan Whiteside   | 2.5000000000000000 | MEM
 Rudy Gobert        | 2.3333333333333333 | BRK
 Serge Ibaka        | 2.3333333333333333 | MIL
 Andre Drummond     | 2.3333333333333333 | GSW
 Rudy Gobert        | 2.3333333333333333 | MIL
 Anthony Davis      | 2.1666666666666667 | SAS
 DeAndre Jordan     | 2.0000000000000000 | UTA
 Hassan Whiteside   | 2.0000000000000000 | WAS
 Draymond Green     | 2.0000000000000000 | WAS

Towns and Whiteside are so amazing that they can get 6 blocks when they play against Orlando, Denver and Los Angeles!

The result of steals is following:

Paul Millsap     | 3.8000000000000000 | NYK
 Chris Paul       | 3.6666666666666667 | ATL
 James Harden     | 3.6666666666666667 | MIL
 Kemba Walker     | 3.6666666666666667 | MIN
 Paul Millsap     | 3.3333333333333333 | LAC
 Chris Paul       | 3.1666666666666667 | HOU
 Stephen Curry    | 3.0000000000000000 | PHI
 Paul Millsap     | 3.0000000000000000 | SAC
 Paul Millsap     | 3.0000000000000000 | SAS
 Mike Conley      | 3.0000000000000000 | MIA
 Kemba Walker     | 3.0000000000000000 | POR
 Kemba Walker     | 3.0000000000000000 | SAC
 Kyrie Irving     | 3.0000000000000000 | UTA
 Jimmy Butler     | 3.0000000000000000 | LAL
 Pau Gasol        | 3.0000000000000000 | CHI
 Kyle Lowry       | 3.0000000000000000 | LAL
 James Harden     | 2.8000000000000000 | PHO
 Russel Westbrook | 2.8000000000000000 | DAL
 John Wall        | 2.8000000000000000 | BOS
 Kemba Walker     | 2.8000000000000000 | TOR
 Chris Paul       | 2.8000000000000000 | NOP
 Kyle Lowry       | 2.6666666666666667 | OKC
 Kyrie Irving     | 2.6666666666666667 | GSW
 Kawhi Leonard    | 2.6666666666666667 | CLE
 Klay Thompson    | 2.6666666666666667 | WAS
 Kemba Walker     | 2.6666666666666667 | CLE
 Stephen Curry    | 2.6666666666666667 | MEM
 Khris Middleton  | 2.6666666666666667 | DEN
 Chris Paul       | 2.6666666666666667 | ORL
 James Harden     | 2.3333333333333333 | BRK

results matching ""

    No results matching ""