project1
530 project final report Team member: Fei Liang Yilun Lyu Jianfeng Zeng
### Project Description
Basketball Database
- Create our own criteria considering different player's statistics in all the games
- Analyze players performed best using our own criteria in different cities as a guest team player(For every city in past five years, which guest team players can get most points/assists/rebounds/steals/blocks in average?).
Data Source and Data Preparation
Generally the data will be collected from website BASKETBALL REFERENCE, including the following information: (1)Game results between teams in seasons from 2011 to 2016 ( add more results from earlier seasons if necessary ) (2)Team Statistics with information like team name, team home and players in the team (3)Player Statistics with information like player names, position, teams, points/assists/rebounds/steals/blocks per game and other statistics info
Because the data from BASKETBALL REFERENCE is well formed and organized, we don't need to do much work on cleaning or preparing it, though we do need to do some works like choosing useful attributes from all of them
Database Attributes
In this project, we have implemented following data tables with following attributes:
GameInfo table(each record describes the result of a game): Game time Home team Home team score Away team Away team score
Box table(each record describes one player’s statistic on one specified game): Player name Player team Date Is_away Opponent name Points Rebounds Assists Blocks Steals
Player table(each record describes the basic info of a player): Player Name Team Name Position Jersey Number
Team table(each record describes the basic info of a team): Team Name League
UML Diagram is in attachment
Data Prepossessing
Since basketballreference.com supports all data in the website be exported as .csv files, and the PostgreSQL also supports import .csv files into database as soon as we have created adequate data tables with correct data format, so in this project we do not write too many codes to import data into database.
Data Setup(Create table) Following are codes we create data tables:
creation of team table
create table team(
team_name character(3) primary key not null,
league text);
creation of player table
create table player(
player_name text primary key not null,
team_name character(3) references team(team_name),
position character(2),
jersey_number smallint);
creation of gameInfo table
create table gameInfo(
date text,
away_team_name character(3) references team(team_name),
away_points smallint,
home_team_name character(3) references team(team_name),
home_points smallint);
creation of box table
create table box(
player_name text references player(player_name),
date text,
team_name character(3) references team(team_name),
is_away character(1),
opp_name character(3) references team(team_name),
rbs smallint,
ast smallint,
stl smallint,
blk smallint,
pts smallint);
Database Insert Data
After we have created tables, we can import .csv files into database. The inner commands of PostgreSQL are following:
\copy team from '/home/warehouse/yilunlyu/team_info2.csv' with delimiter ',';
\copy player from '/home/warehouse/yilunlyu/530project/player.csv' with delimiter ',';
\copy gameinfo from '/home/warehouse/yilunlyu/530project/gameinfo.csv' with delimiter ',';
\copy box from '/home/warehouse/yilunlyu/530project/box.csv' with delimiter ',';
After finishing this we have imported all data into database.
Database Query
It is hard to use one SQL query to get the final result. So we decide to use views to store the partial results.
Let us use “get best player in scoring of different arenas when they play as a guest team player” for example.
First of all, we created view1:
create view view1 as
select player_name,avg(pts) as avg_points, opp_name
from box
where is_away = 'Y' and opp_name in
(select team_name from team)
group by player_name,opp_name
order by avg_points desc;
This view calculates every players average points in all arenas when they play away, and sort points from high to low.
Part of results of view1 looks like this:
Stephen Curry | 39.3333333333333333 | MIA
James Harden | 37.6000000000000000 | SAC
Stephen Curry | 36.3333333333333333 | CHO
LeBron James | 35.5000000000000000 | CLE
Klay Thompson | 34.3333333333333333 | IND
LeBron James | 34.3333333333333333 | NOP
Russel Westbrook | 34.0000000000000000 | PHO
James Harden | 33.8000000000000000 | GSW
Anthony Davis | 33.3333333333333333 | DET
Stephen Curry | 33.3333333333333333 | TOR
Damian Lillard | 32.6666666666666667 | TOR
Stephen Curry | 32.5000000000000000 | POR
James Harden | 32.3333333333333333 | TOR
Stephen Curry | 32.3333333333333333 | WAS
LeBron James | 32.0000000000000000 | DAL
Carmelo Anthony | 31.6666666666666667 | SAC
Stephen Curry | 31.5000000000000000 | PHO
Stephen Curry | 31.2000000000000000 | NOP
James Harden | 31.2000000000000000 | POR
James Harden | 31.1666666666666667 | LAL
Kevin Durant | 31.0000000000000000 | ORL
It seems we have found the result. However, it not the end. We can find out that there are 2 records which opponents’ name is “TOR”, which disobeys our goal. The reason is this view is just grouped by player names and their opponents’ names. Our final result is in view1, we need to dig it out. So we create view2:
create view view2 as
select MAX(avg_points),opp_name
from view1
group by opp_name;
view2 is grouped by opp_name, and calculates the max average points of every opponent.
The last step is a SQL query:
select view1.player_name, view2.max,view2.opp_name
from view1,view2
where view1.avg_points = view2.max and view1.opp_name = view2.opp_name limit 30;
We just need join view1 and view2 with view2’s max average points of every opponent, since there are only 30 teams in the league, so we just need to get 30 records for the result. The result is following:
Stephen Curry | 39.3333333333333333 | MIA
James Harden | 37.6000000000000000 | SAC
Stephen Curry | 36.3333333333333333 | CHO
LeBron James | 35.5000000000000000 | CLE
Klay Thompson | 34.3333333333333333 | IND
LeBron James | 34.3333333333333333 | NOP
Russel Westbrook | 34.0000000000000000 | PHO
James Harden | 33.8000000000000000 | GSW
Anthony Davis | 33.3333333333333333 | DET
Stephen Curry | 33.3333333333333333 | TOR
Stephen Curry | 32.5000000000000000 | POR
Stephen Curry | 32.3333333333333333 | WAS
LeBron James | 32.0000000000000000 | DAL
James Harden | 31.1666666666666667 | LAL
Kevin Durant | 31.0000000000000000 | ORL
Stephen Curry | 30.3333333333333333 | UTA
LeBron James | 30.0000000000000000 | LAC
Kevin Durant | 30.0000000000000000 | HOU
Stephen Curry | 29.8000000000000000 | OKC
Stephen Curry | 29.6666666666666667 | BOS
James Harden | 29.4000000000000000 | MIN
Damian Lillard | 29.3333333333333333 | BRK
Kyrie Irving | 29.3333333333333333 | SAS
Stephen Curry | 29.3333333333333333 | BRK
James Harden | 29.0000000000000000 | NYK
DeMarcus Cousins | 28.6666666666666667 | PHI
Stephen Curry | 28.0000000000000000 | ATL
Anthony Davis | 28.0000000000000000 | MIL
LeBron James | 27.3333333333333333 | DEN
Damian Lillard | 27.3333333333333333 | CHI
We can find out Stephen Curry, James Harden and LeBron James are most dangerous opponents for most teams in NBA, since they have conquered most arenas in average points. Especially, Stephen Curry is the best scorer in 9 cities when he plays as a guest player, he is a master!
Things are similar in rebounds, assists, blocks and steals.
The result of rebounds is following:
DeAndre Jordan | 20.6666666666666667 | PHI
Andre Drummond | 20.0000000000000000 | POR
Pau Gasol | 19.0000000000000000 | CHI
DeAndre Jordan | 18.3333333333333333 | IND
DeAndre Jordan | 18.1666666666666667 | DAL
DeAndre Jordan | 18.0000000000000000 | ATL
Andre Drummond | 18.0000000000000000 | GSW
DeAndre Jordan | 17.3333333333333333 | MIL
Andre Drummond | 17.0000000000000000 | NOP
Andre Drummond | 17.0000000000000000 | LAL
Andre Drummond | 16.8000000000000000 | BRK
Andre Drummond | 16.3333333333333333 | HOU
Andre Drummond | 16.0000000000000000 | SAS
DeAndre Jordan | 16.0000000000000000 | ORL
Karl-Anthony Towns | 16.0000000000000000 | BOS
Andre Drummond | 15.6666666666666667 | PHO
DeAndre Jordan | 15.3333333333333333 | DET
DeAndre Jordan | 15.0000000000000000 | CHO
Andre Drummond | 15.0000000000000000 | SAC
Andre Drummond | 14.8000000000000000 | NYK
DeAndre Jordan | 14.8000000000000000 | MIN
Anthony Davis | 14.6666666666666667 | WAS
Andre Drummond | 14.6666666666666667 | LAC
DeAndre Jordan | 14.0000000000000000 | DEN
Karl-Anthony Towns | 14.0000000000000000 | MIA
DeAndre Jordan | 13.3333333333333333 | CLE
Karl-Anthony Towns | 12.5000000000000000 | MEM
Karl-Anthony Towns | 12.0000000000000000 | OKC
Andre Drummond | 11.8000000000000000 | TOR
Anthony Davis | 11.8000000000000000 | UTA
In rebounds, DeAndre Jordan and Andre Drummond are experts.
The result of assists is following:
John Wall | 13.3333333333333333 | LAL
John Wall | 13.0000000000000000 | HOU
Chris Paul | 11.5000000000000000 | POR
Chris Paul | 11.4000000000000000 | NOP
John Wall | 11.3333333333333333 | DAL
John Wall | 11.3333333333333333 | LAC
Chris Paul | 11.0000000000000000 | CLE
Chris Paul | 11.0000000000000000 | MIN
John Wall | 10.8333333333333333 | ATL
John Wall | 10.6666666666666667 | DEN
Chris Paul | 10.6666666666666667 | BOS
John Wall | 10.4000000000000000 | NYK
Chris Paul | 10.3333333333333333 | MIA
John Wall | 10.3333333333333333 | ORL
Chris Paul | 10.1666666666666667 | OKC
John Wall | 10.0000000000000000 | PHI
John Wall | 9.6666666666666667 | PHO
Russel Westbrook | 9.6666666666666667 | IND
John Wall | 9.6666666666666667 | SAC
John Wall | 9.3333333333333333 | MEM
John Wall | 9.0000000000000000 | MIL
LeBron James | 8.8333333333333333 | TOR
John Wall | 8.8000000000000000 | CHI
John Wall | 8.6666666666666667 | CHO
Chris Paul | 8.3333333333333333 | WAS
Chris Paul | 8.3333333333333333 | GSW
LeBron James | 8.3333333333333333 | DET
Chris Paul | 8.0000000000000000 | BRK
James Harden | 8.0000000000000000 | BRK
John Wall | 7.3333333333333333 | SAS
John Wall is the King of assists.
The result of blocks is following:
Karl-Anthony Towns | 6.0000000000000000 | ORL
Hassan Whiteside | 6.0000000000000000 | DEN
Hassan Whiteside | 6.0000000000000000 | LAL
Hassan Whiteside | 5.0000000000000000 | DAL
Brook Lopez | 5.0000000000000000 | SAC
Pau Gasol | 5.0000000000000000 | CHI
Anthony Davis | 4.3333333333333333 | CLE
Hassan Whiteside | 4.2500000000000000 | DET
Hassan Whiteside | 4.0000000000000000 | CHO
Hassan Whiteside | 4.0000000000000000 | POR
Dwight Howard | 4.0000000000000000 | PHI
Hassan Whiteside | 3.5000000000000000 | NOP
Hassan Whiteside | 3.5000000000000000 | OKC
Hassan Whiteside | 3.5000000000000000 | BOS
Serge Ibaka | 3.4000000000000000 | HOU
Derrick Favors | 3.3333333333333333 | MIA
Karl-Anthony Towns | 3.0000000000000000 | ATL
Hassan Whiteside | 3.0000000000000000 | MIN
DeAndre Jordan | 2.6666666666666667 | NYK
DeAndre Jordan | 2.6666666666666667 | IND
Hassan Whiteside | 2.5000000000000000 | LAC
Hassan Whiteside | 2.5000000000000000 | MEM
Rudy Gobert | 2.3333333333333333 | BRK
Serge Ibaka | 2.3333333333333333 | MIL
Andre Drummond | 2.3333333333333333 | GSW
Rudy Gobert | 2.3333333333333333 | MIL
Anthony Davis | 2.1666666666666667 | SAS
DeAndre Jordan | 2.0000000000000000 | UTA
Hassan Whiteside | 2.0000000000000000 | WAS
Draymond Green | 2.0000000000000000 | WAS
Towns and Whiteside are so amazing that they can get 6 blocks when they play against Orlando, Denver and Los Angeles!
The result of steals is following:
Paul Millsap | 3.8000000000000000 | NYK
Chris Paul | 3.6666666666666667 | ATL
James Harden | 3.6666666666666667 | MIL
Kemba Walker | 3.6666666666666667 | MIN
Paul Millsap | 3.3333333333333333 | LAC
Chris Paul | 3.1666666666666667 | HOU
Stephen Curry | 3.0000000000000000 | PHI
Paul Millsap | 3.0000000000000000 | SAC
Paul Millsap | 3.0000000000000000 | SAS
Mike Conley | 3.0000000000000000 | MIA
Kemba Walker | 3.0000000000000000 | POR
Kemba Walker | 3.0000000000000000 | SAC
Kyrie Irving | 3.0000000000000000 | UTA
Jimmy Butler | 3.0000000000000000 | LAL
Pau Gasol | 3.0000000000000000 | CHI
Kyle Lowry | 3.0000000000000000 | LAL
James Harden | 2.8000000000000000 | PHO
Russel Westbrook | 2.8000000000000000 | DAL
John Wall | 2.8000000000000000 | BOS
Kemba Walker | 2.8000000000000000 | TOR
Chris Paul | 2.8000000000000000 | NOP
Kyle Lowry | 2.6666666666666667 | OKC
Kyrie Irving | 2.6666666666666667 | GSW
Kawhi Leonard | 2.6666666666666667 | CLE
Klay Thompson | 2.6666666666666667 | WAS
Kemba Walker | 2.6666666666666667 | CLE
Stephen Curry | 2.6666666666666667 | MEM
Khris Middleton | 2.6666666666666667 | DEN
Chris Paul | 2.6666666666666667 | ORL
James Harden | 2.3333333333333333 | BRK