Thursday, November 7, 2013

College Football at its Finest

Tonight is a big night in college football. Maybe even bigger than Saturday. And this is with only 2 games. For a college student like myself, this is the type of day when I get zero work done. Zilch. Nada. I'm all in on sports today (actually typing this in class). As my friend and I say, you're either all in or all out.

Baylor v. Oklahoma is a bigger game than most people believe. Baylor is ranked No. 6 in the BCS standings with an offense that could take out a city. Oklahoma is a top-10 team (by virtue of being exactly No. 10). A win for Baylor here would be huge, as it not only keeps their hopes alive for a possible shot at the championship but sets them up well for a tough finishing stretch (with the potential to leapfrog Ohio State later in the year for No. 4). A loss would drop them down in the rankings quite a bit, and would not be a good way to enter the homestretch. Oklahoma, on the other hand, is just good. They're not elite this year, but they are very good. I've only watched them play once, but I think they have enough to give Baylor a fight.

The only "real" teams Baylor has faced have been Kansas State and West Virginia. Both teams exposed some problems with the Baylor team: West Virginia, as a team with a half-decent offense of its own, dropped 42 on the Bears. And Kansas State, known for being fundamentally sound on defense, held the Bears to a season-low 35 points. I would gamble that Oklahoma has an offense that can put up 40+ on Baylor, and a defense that can hold Baylor to fewer than 50 points. Basically, this has the potential to be a very exciting, high-scoring game.

I don't know if I can say more about Oregon v. Stanford than has already been said. The Ducks can score with the best of them. Stanford has a very sound team all around. Both are top-5 teams, which means this has very direct consequences for the National Championship. If Stanford wins out, they still have a shot at the Natty (though very, very slim). A loss to Oregon would not be the end of the world for the Cardinal, but a loss to Stanford would destroy Oregon's hopes for the season. The Ducks came in hoping for a chance at the Natty, and if they win out they get to go. But winning out is the essential part, and Stanford on the road is easily the most difficult game left on their schedule.

So what will happen? As someone who comes from the West Coast and is a Duck fan, I think the Ducks will put up around 40 points. I don't think Stanford will score 30. Stanford just doesn't have the offense to stand up to a very underrated Oregon defense. Stanford thrives on running the ball and play-action passes off of an established run game, but Oregon's D-Line matches up with the Stanford O-Line. On the other side of the ball, with a week to rest and a finally healthy group of offensive terrors (Mariota, DAT, Marshall, Tyner, Addison, Huff, etc.), I don't think Stanford will make it through the second half. Like most Stanford-Oregon matchups, I predict a close game at the half but a quick Oregon run halfway through the third quarter to put the game out of reach.

I'm excited!

Friday, November 1, 2013

Deciphering the Football Blowout

Full disclosure: I am only a casual football fan. But I am an avid sports fan, and so I think I have the ability to write down my thoughts. This has been a question that has bothered me for quite some time: what does it mean for a football game to be a blowout, and how can we compare different blowouts?

For example, Oregon beat Tennessee at home in week 3 by a score of 59-14.
Alabama beat Tennessee at home in week 9 by a score of 45-10.

It seems pretty clear that both Oregon and Alabama are significantly better than Tennessee this year. But can we use these games against common opponents to determine which team is better?

At first glance, it would seem that the Oregon defense is about 30% worse than Alabama's while the offense is about 25% better. So the offense/defense comparison doesn't seem to work, since 5% is really a very small amount. But we know that neither team actually tried after the second quarter, at which point Alabama led 35-0 and Oregon led 38-7. You can jump to all sorts of conclusions based on this, too: Oregon was able to score more points but couldn't keep Tennessee from scoring, so they must be worse.
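For what it's worth, those rough percentages can be checked with a few lines of Python. This is only a sketch: it assumes Oregon's numbers are the baseline for each comparison (one of several reasonable choices), and the helper name pct_diff is made up for illustration.

```python
# Final scores from the two Tennessee games mentioned above.
oregon_for, oregon_against = 59, 14
alabama_for, alabama_against = 45, 10


def pct_diff(baseline, other):
    """Percent by which `other` differs from `baseline` (hypothetical helper)."""
    return 100.0 * (other - baseline) / baseline


# Assumption: Oregon is the baseline, so negative means Alabama had fewer points.
offense_gap = pct_diff(oregon_for, alabama_for)          # points scored
defense_gap = pct_diff(oregon_against, alabama_against)  # points allowed

print("Alabama scored %.1f%% relative to Oregon" % offense_gap)   # -23.7%
print("Alabama allowed %.1f%% relative to Oregon" % defense_gap)  # -28.6%
```

With Alabama as the baseline instead, the same gaps come out closer to 31% and 40%, which is one more reason not to read too much into them.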

But that's all missing the bigger picture: both Oregon and Alabama had their respective games in the bag at the start of halftime. In reality, it was clear about halfway through the 1st quarter that the game would be a blowout. Strange things happen in blowouts. Offenses start slowing down and stop playing at their normal intensity. Defenses start playing prevent, knowing they don't have to give it 100% on each play. And, at a certain point, the second string teams come in to play, leading to even fewer points scored and more points allowed.

In the world of college football, the timing of games is incredibly important. These two games were separated by 6 weeks, which is enough time for an entire Georgia team to get injured and fall out of the national title talk. Lots can happen. The team that played at Alabama was not the team that played at Oregon. At Oregon they were excited and overly energized for a new season against a strong opponent, and it led to several mistakes but some excitement early on. The team in Alabama was worn down from a tough string of games and looked defeated after the first Bama touchdown.

Basically, you can't compare two football games. I would even argue that you can't compare games in back-to-back weeks. Teams have good days and bad days, and sometimes it can be as little as the food the quarterback ate last night that determines the outcome of a jump ball.

A blowout is a blowout, and I don't see any reasonable way to compare these two games other than knowing that Alabama and Oregon are both significantly better than Tennessee.

Tuesday, August 27, 2013

Left is the New Right

I've been thinking about this for a long time, but now that I have to take the subway to and from work every day it's become just too much. New Yorkers are weird, and apparently left is the new right.

Scenario: I'm walking down the street and spot a Duane Reade (basically Walgreens for New Yorkers). It has two doors next to each other. I reach for the one on the right. After all, people in America tend to drive on the right, are right-handed, and think that the political left is bad. But as I walk through the door I collide with someone exiting. After an apology and some sidestepping, I turn around and see that the left door is actually the one marked Enter and the one I just walked through Exit. Maybe it has to do with the proximity to the checkout area, you say? Well, you'd be wrong - the checkout is right next to the Enter door. That's weird.

But it's everywhere in New York. Every Duane Reade with two doors is set up like this. Sometimes it makes sense because of the position of the checkout area, sometimes it doesn't. I walked into a Staples (I'll do some advertising here while I'm writing) and it was the same thing. It's confusing, and I keep bumping into people because of it.

The worst part is that it is starting to come into play in other parts of New York. Thankfully, drivers still drive on the right. And large swarms of people stay to the right. But the flow of things is consistently opposite what it should be. Case in point: every day I walk through the underground tunnel connecting the 42nd Street subway station with the Times Square subway station. It's narrow and it's hot, but it gets the job done. The stream of foot traffic works just like a road in that the flow stays to the right. But the faster lane isn't in the middle (the left), it's all the way against the wall to the right. It bothers me. It shouldn't, but it does. Every day for 10 minutes I have to go through this primitive traffic design, and with all the body odor floating around and no airflow it just drives me nuts.

So if you live in New York and do things on the right side, from eating to driving, be cool and be the first to start doing it on the left side. It's the hip thing to do. Bikes are already going against traffic everywhere in Manhattan. See what you can come up with!

Thursday, August 15, 2013

The Smartphone has Taken Over

You already knew that, but now the rest of the world knows, too!

While scrolling through Google News today I came across an interesting article. I thought it couldn't be true. Surely dumbphones weren't more popular than smartphones just a year ago! But that's the case. I guess it comes with living as an upper-middle class white kid who has always been around academic institutions and all those young trendsetters, but I really thought that smartphones had been beating out dumbphones for some time now. Well, now it's official. No longer is a phone about just talking and texting; it's about playing games and surfing the web and taking notes. The smartphone is one of the greatest inventions ever, and it looks like it's here to stay.

Thursday, August 8, 2013

Competitive Baseball Nerd

I've spent enough time thinking and reading about baseball in my life to write a book. Several books. Or an almanac. When people say things like "student of the game," it usually only means the athlete is capable of having conversations about their sport. It's not a particular honor to be called a student of the game, but I'm going to call myself that. I'm a student of the game in a much more literal sense.

I study baseball. I study the crap out of it. And I'm very competitive, like I'm fighting for medical school. If someone knows something I don't, I have to know it. I have to know how they know about it. And then the next time that thing, whatever it may be, comes up, I'll know about it. And I'll know it better than the person that showed me.

But like any other competitive person, it's not just other people that drive me. It's myself. I need to have the advantage before someone comes up to talk to me. I need to know exactly what the linear weights behind wRC+ mean. I need to know why they're linear and not some other, non-linear model. I need to know why I would choose to use FIP, xFIP, or SIERA over other forms of pitching evaluation. I compete with myself - I need to know more than I do. I suppose that's what could constitute an addiction or obsession, but I'm pretty happy with what I've learned so far. And happiness is good, so I'll stick with it.

Thursday, June 27, 2013

In the Land of the Blue Dress Shirt

This is the first time I've had to wear a dress shirt every day. In fact, it might be the first time I've ever worn a dress shirt all day. It's an interesting experience. I don't like the top button, but thankfully there are no ties in the Office of the Commissioner. No jackets, either, which I have mixed feelings about. I love the look, and I like the feel of the look, but I don't like the feel. Suit jackets are just that little bit awkward. That little bit stiff right in the wrong places (like trying to reach for something and noticing the shoulder doesn't even go to 90 degrees). That little bit too flexible in the wrong places (like trying to reach for something and watching the lapel ride up). And, since this is the summer in New York, that little bit too warm. I actually love how warm suit jackets are, but only in the winter. In a muggy city like this: no thanks.

I thought I'd see more white shirts here. Instead it's almost entirely blue. Like amazingly blue. Incredibly blue. Noticing the non-blue shirts is sort of an obsession of mine now, so I started counting today. Women don't count because they almost always wear black or white shirts. Come to think of it, I've never seen a woman wearing a blue shirt. Maybe the blue shirt is a sign of manliness? I never thought of faded/light blue that way.

My blue shirt watching from walking to and from work today:
        Estimated people seen total: ~1000
        Non-blue shirts: ~300 (it's hard to count that high)

Tune in next time for another exciting adventure!

Wednesday, May 29, 2013

What is a Power User?

I was reading an article by some techie about a week ago (I really don't remember where) that caught my attention. He was talking about how he built himself a Macintosh computer from scratch, and noted that he needed more processor power because he was a "power user." By which he meant that he had many applications open at once. This got me thinking, what does it mean to be a power user?

Wikipedia, of course, has an answer. But for phrases like this, connotations change and they mean different things to different people. I grew up in a household with a programmer and a math professor, both of whom were more than capable computer users, and certainly power users. My parents have been programming since the early 80's, and though recently my mom has only written basic shell scripts to automate her processes, they have kept up with the evolution of programming and computer applications as they have changed. My mom never really closes applications because she needs them for her work: 8 terminal windows (each one has its specific task - one for mail, one for her TeX files, etc.), her PDF reader, internet, a TeX editor, and Finder at the absolute minimum. My dad can have a lot of stuff open, depending on what he's doing. But neither says that having 12 applications open means you're a "power user."

Just looking at applications I have installed, I could have the internet, iTunes, Stickies, Terminal, Messages, and Twitter open without even thinking. When writing a paper, I could easily have a PDF reader, Pages, and OpenOffice (in fact, when OpenOffice is open I can almost guarantee Pages will be open) open, as well. Maybe even Evernote. And if I'm working on some computer science assignment, add in Sublime Text 2 and Xcode. I guess I could be waiting for a call, so throw in Skype. That's 13 applications. And I can't see myself using more than that for anything. Usually I close things if they're not being used. Does having only Sublime, the internet, Finder, and one or two Terminal windows open mean I'm not a power user? I like to think of myself as a power user. I have Terminal and Alfred shortcuts for everything. I can do basic programming tasks, I know my basics of debugging, and I can help others figure stuff out (though I get really impatient).

So what is a power user? I don't know. But it's not having applications open. It's a knowledge of how your computer works, automating your most common workflows, and being able to navigate applications and the operating system without having to reference the help guide repeatedly (but please read the manual when you get the thing), even knowing some tricks and easter eggs here and there (I'm going to exclude my knowledge of word processing applications here - never can find those stupid table settings...). It's taking an interest in learning more about the technology you have, wanting to find ways to improve it. But most importantly, it's knowing exactly when to read the fucking manual.

Thursday, May 16, 2013

The Best Time of Your Life?

People say that college is the best time of your life. You get to still be a kid, but you're on your own. It can be tough at times, but the overall experience is a greatly positive one. For my mother and many others, it is a time that can be relived over and over in memory. I would like to be able to say that this will be true of me, but at the rate I'm going I will be happy to leave college and not think about it ever again.

It's true that we usually only remember the good things in life. After all, we like those warm, fuzzy feelings. It's nostalgia. So it's likely my memories have been skewed, and what I'm going through right now is worse only by comparison. But I do remember my moody issues from sophomore year of high school, and what I'm experiencing right now is very similar to those. Only now I'm on my own.

My mom visited me last week. It was Mother's Day weekend and Scav, so I had a good excuse to avoid Scavving too much. This was the first time in a very long time that I got along with my mom. A very long time. We've been bickering with each other since at least 4th grade, however many years ago that was. For a few years I tried to avoid my house, and I had been looking forward to leaving for college for quite a while. And I love being out of my parents' house. They're still paying for a very sizeable portion of my education, but the physical distance between us has eased all the tensions. Maybe I just needed to not be in constant contact with her to calm down. Who knows. Regardless, I had a great weekend, just hanging out with my mom and my girlfriend. It's a little scary how well they get along, actually. Best friends forever, as they say.

But since then all the emotional and mental stress has come flooding back, and it's worse than ever. I knew it would, and so I started working out more often (working out releases endorphins, or so I'm told) to help ease the transition. But I'm still not happy where I am.

Where to begin? Every day when I wake up I have to stare at an ancient ceiling in an ancient building with ancient carpets. It was kind of pretty, but that charm wears off quickly when you start to deal with mice, silverfish, no central air, and other amenities common to old buildings. The new furniture this year was nice, but it was like putting lipstick on a pig. Once the University finishes building Campus North, or whatever they're going to call it, I hope they start to fix up Snell-Hitchcock.

So I can't stand my home. I wake up and sigh. I walk in the door and sigh. To say I'm unhappy with where I live is an understatement. I love the people, but when you can't stand the place itself, it's hard to make up for it. My hope was to get a job as an RA. I've wanted to be one since before I came to college. I've dealt with being the new kid many, many times in my life. I'm a passable cook, and I like to think I'm diplomatic enough to handle the stickier situations. But I wasn't selected (read last post). I still haven't recovered from that. Devastated might be the best word to describe it. I've only ever had the "heavy heart" feeling a few times in my life (that I remember): my last high school baseball game (I blame Jace Fry for the feeling there) and when I found out I was not selected to be an RA.

The baseball feeling went away quickly. I've lost plenty of games before. This one stung a lot more than normal, but not anything a drive home couldn't fix. But I'm still feeling it from the RA selection process.

I tried to get a room with a friend of mine in another dorm to fix part of the problem (staying in my current dorm), but that didn't go through. So I reserved a room in Snell and tried to think positive thoughts. The problem was: there weren't any besides being closer to my girlfriend. That was reason enough to consider staying, but when that feeling continued after my mother left, I went to the general lottery (stressful) and got a room with someone I'd met at the gym a few times before (stressful). It's not that I made the right decision. But I needed to get out more than I wanted anything else. That was last night.

I've been a train wreck all day today. I couldn't sit through class. I left the classroom to sit in the hall for a few minutes about halfway through to calm myself down. I was breaking into tears randomly for no reason. As soon as class was over I ran to the housing office to see if I could switch back. Not that I would, but I thought it would make me feel better to know. No chance, they said. So I walked back to my room and just, well, cried. Not full-on bawling or anything, but welling up every few minutes.

Since this has been going on for a few weeks now, I scheduled appointments with a counselor and my advisor (not at the same time). If it's anything like that counselor I had sophomore year of high school, this will not be much help at all, but I just need to vent to someone who's not my girlfriend right now (she's seen me sad enough these past weeks). And I'm talking to my advisor about the possibility of taking a leave of absence. I want to finish this quarter, but we'll see what happens.

I've been unable to form any real coherent thoughts all day. I've taken two trips to the grocery store to get fruit and have been staring blankly at my math textbook for a few hours. I went to the discussion session for math (since I have a midterm on Monday - oh, joy!) and was sort of able to pay attention. My mom wants me to try to take a medical leave before the quarter is over. I'm going to wait on my appointments next week, but we'll see what happens.

I hate to make this blog a venting place, but sometimes it just happens. I'm still really looking forward to this summer, so that's one positive thing I have going for me. Just right now it feels like one of the only things.

Saturday, May 4, 2013

It's the Little Things

It's been a very long time since I've posted, but that's because a lot has been happening.

First the disappointments: that math midterm a few weeks ago went absolutely horribly, I have a nerve contusion that is keeping me from throwing a baseball, and I was not chosen to be an RA next year. I'm particularly upset about the RA situation because that was something I was really looking forward to doing, but I guess now I'll just have a lot more free time than I had planned on next year. Also, I will still help out with Orientation regardless, so I'll still feel helpful.

The successes outweigh the failures. Well the one success: I will be an intern with Major League Baseball's Commissioner's Office this summer. It happened right when I thought I wouldn't be able to find a job for the summer, so that was both exciting and relieving. I already have housing reserved in New York, and I cannot wait for summer to start. I also had my first non-phone interview (well, if you don't count video chatting as a phone) and they had me solve a Rubik's cube at the beginning. Apparently quite a few people watched, making this the first time I've ever solved a cube for someone over video. The solve was pretty bad (I thought they were joking when they mentioned it before so I wasn't really prepared). Actually, it was acceptable until I got a Ua PLL. That 1.5 second algorithm took closer to 10, and I've been practicing it at every opportunity since then. Next time someone asks, I'll be prepared!

As for the post title, I recently jailbroke (jailbreaked?) my iPod, and it was the best thing I've done with it since I bought it. Apple really does a great job with its products, and it's always the little things that impress: uniform icon sizes, a responsive UI, the very nice notification center, etc. But there are things that I would like to have. For example, I don't want to be restricted to Apple's folder size limit, or be forced to deal with 12-15 default apps that I never use. So I fixed that, and I love it. It's still the same amazing core features, but it's the little things that make it so much better now. I love how the Calendar app displays the current date, and with a jailbroken device you can do the same with the weather (and clock, but I don't use that). I can remove app names and don't have to follow Apple's forced grid placement. And I can hide those pesky system apps.

It's been a long few weeks, but it's only 5 more till the end of the quarter.

Sunday, March 31, 2013

Post Spring Break

So winter quarter ended and spring break is coming to a close, so I thought I could contribute a little to the blog. The quarter was very stressful (hence the lack of posts), but I made it through. Funnily enough it was my best academic quarter ever, though that may be more of a result of the amount of work I had to put in to stay alive in my classes. I also got my first A ever in a course! I definitely deserved 3, but I'll take whatever they give me.

Both quarters this year I've had to bite the bullet on one of either computer science or real analysis. Full effort in both classes takes between 15 and 20 hours per week, which is far too much. Last quarter I poured my effort into analysis, this quarter into computer science (though I would have preferred the analysis grade). I guess that's just how the world works. Biology was a cakewalk - I studied for the quizzes the class before. And SOSC is a paper-writing class, which means that I will get a good grade but not a great one. Easily the most stressful quarter I've had so far, but maybe my most productive.

But after all that hell, I got to go to Florida! It was a great trip for me, though the team didn't do as well as we could have. I still have a shiny 0.00 ERA, but I've only thrown 4 innings. Still, I was able to spend a full week away from school, and I even have a decent tan (on my face). It was a great time with the guys and I'm looking forward to spring quarter.

Friday, February 22, 2013

Found out How to Put Code in Blog Posts!

It's an exciting day! Yesterday I figured out how to put code into my blog posts. It's sort of sad that I get excited by something like that, but, hey, I like it! I updated the only post I had with code in it, and I think it's beautiful.

In separate but related news, I finally got myself a GitHub account. Nothing is on there yet, but I have big things in mind for it in the future. Not that big, mind you, but big. My current hope is to sync up my Dropbox and GitHub repositories. What else am I going to do with 11GB of Dropbox storage? Not much else going on in my life, but woo code!

Sunday, February 17, 2013

The C Programming Language

I suppose it was inevitable, but my path into the world of programming has led me to C. C might be the most famous of all programming languages. I can't actually tell you why that is the case, but in my experience it has been true. Just as well known as Java (by known I mean people recognize the name) and C++, and far ahead of languages like Python or Ruby or ML, C predates them all.

There's a reason C survived all these new languages, but I have yet to find it. As my computer science professor explained, C is like learning to drive a stick-shift car. It's very difficult to learn and people will laugh at you for not using an automatic, but it allows you to do much more with your knowledge and makes you appreciate the car more. The problem with that analogy, for me at least: I learned to drive on a stick shift, and it didn't take me nearly as much time to learn that as it has for me to learn C.

After learning to program with a very "clean" (my own quotes) language in Haskell, and then a very English-friendly language in Python, learning C has been an adventure. I imagine myself as an explorer in the heart of Africa, paving new paths and meeting new people, always confused and amazed by their customs and constantly pondering how they are able to survive with such "primitive" methods. Just as those people were probably racist, I feel I am language-ist for thinking such thoughts. But I cannot see the need for the jungle of semicolons and curly brackets. My progressive programming languages had no need for such strange and unneeded grammar. But C does.

Harnessing C is difficult, and I am nowhere near where I should be. I spent a good 5 hours today trying to make my program read a file in 1 MB chunks instead of the whole thing at once. But it now works, and I think I'm a better programmer as a result. If learning Python was easy mode and Haskell hard mode, C would be god mode.
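The chunked-read idea is easier to show than to describe. The sketch below is in Python rather than C, and it is not the program from the class (that code isn't shown here); the function name read_in_chunks and the throwaway demo file are made up for illustration.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MB, matching the chunk size mentioned above


def read_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield successive chunks of a file so it is never loaded whole."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            yield chunk


# Demo on a temporary 2.5 MB file (hypothetical data, just to exercise the loop).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'x' * (2 * CHUNK_SIZE + CHUNK_SIZE // 2))

chunk_sizes = [len(chunk) for chunk in read_in_chunks(demo_path)]
os.remove(demo_path)
print(chunk_sizes)  # two full 1 MB chunks, then the 512 KB remainder
```

The C version has the same shape (a fixed-size buffer filled repeatedly until a read comes back short or empty), just with the buffer management done by hand.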

The thing I've found with C is that I improve much faster than I have with any other language. I wonder if that's because of the pace of the class (absurdly fast) or because the language just works like that. C is a stressful and complicated language, and I think working for 10-12 hours on "simple" programs trains your brain to deal with it better and faster next time. Somehow, being forced to solve a problem in C is training me to solve problems better and faster than any other language I have experience with. Also C programs run fast. If Haskell and Python were planes, C would be the Starship Enterprise. Not only are the file sizes smaller, but they just run faster. Significantly faster. And I can definitely appreciate that.

Lastly, I need to rant here for a bit. The big problem with teaching C in the modern day is Google. Google is the place to go for programming help. Need to know how to parse something in Haskell? A quick Google search gets you the function you need. Need to know how to work with URLs in Python? Google that shit. Need to know how to do anything in C? Nope. Because of languages like C++ and C#, searching for any C code on the internet is a giant pain in the butt. Everybody tries to give you C++ code, and while the languages are similar they are different enough that the code will not work. Conversations I've had with some people all seem to lead to the same conclusion: C is used by programmers who do not need help from the internet, so they don't put anything on the internet. It may be true that C is at a level beyond that of the average programmer (if such a being exists), but I'd really appreciate it if Google could give me some answers every now and then (yes, I know about Stack Overflow).

Wednesday, February 6, 2013

Obligatory Birthday Post

I should be writing a paper. And getting my computer science lab done. But why do that when I can blog?

Today is my 20th birthday, and that means very little to me. I prefer to celebrate every 1000 days rather than every 365.25. But society tells me that I should celebrate, so I did. I did some homework last night, I'll do some homework tonight, and I'll go to baseball practice and run some. Tomorrow I'll finish my paper and go to an info session on becoming an RA. If that's not celebrating, I don't know what is!

Actually, though, I'm celebrating by having some leftover soda from the Super Bowl party. I haven't had soda this whole quarter, so it is something of a big deal. Also related to food: I have eaten 3 full meals a day on almost 90% of days in school this year. Again, might not seem like a big thing, but that's something I've tried hard to improve on from last year.

I know I haven't addressed my workout schedule at all since, well, a long time ago. Now that baseball has started up I only give myself the option of working out on Thursdays (no practice) or Saturdays (might as well be no practice sometimes). My arm seems to be in pretty good shape but my ankle and legs still aren't 100% yet, which is very disappointing. Still, it's a big improvement from last year and I should be ready to go by the time games start up.

Till next time.

Friday, January 25, 2013

Not Feeling So Great About Myself

Unfortunately, I have to make an unhappy post. It's almost the reason I made this blog: so I could be the one student that didn't cry to the internet every time something went wrong. But I guess I just didn't have it. I'm technically still a teenager, so I guess I'm just going through the final throes of my teenage angst. Something like that. Regardless, I am a sad panda right now.

The worst part is I don't know why. Baseball started on Wednesday, and though I'm sore all over, I'm very happy to be back with the guys and doing things. Classes are going fine, though this internship thing is sort of stressing me out. I really don't want to go back home for the summer (though I can't even explain that) and all this down time is freaking me out. I was told in late December that I would be asked in January to schedule a phone interview (by two places I applied). Well, January is almost over but I haven't heard back from either one.

I don't know if I should contact them (my dad says they're busy and will get around to me), but I feel my Haskell comprehension slipping through my fingers as I spend less and less time programming for fun and more time programming for class. Honestly, who uses awk nowadays (don't answer - I know it's still around)? I need the Haskell knowledge because one of the positions involves programming in OCaml, which is very similar to Haskell from what I've seen of it. Also I think it's a shame I don't know how to use regex in Haskell, but that's mostly a product of my obsession with HTML scraping.

Next week is hell week for me: SOSC paper, analysis midterm, biology quiz, and a crazy lab of CS. I took the full brunt (as my House's vice president I have to take it) of some yelling during my neighbor house's House Meeting yesterday, which is unfortunate because most of them are my friends. I'm very unsure about my future because everywhere I turn someone is doing something better. Basically I'm unhappy right now with my life, and I doubt I'll be feeling better anytime soon.

Friday, January 18, 2013

First Two Weeks

I just realized I haven't posted in a while. Not that anything really exciting has happened. Classes started up and are taking up time like only classes can. Most of my time recently has been spent on fantasy baseball, though. Below is the code I use to get information on every active major league player. Once baseball practice starts up I might have more to say here.

#!/usr/bin/python
# Simon Swanson
# baseball-reference.Extraction.py


import sys
import re
import urllib2
from string import lowercase
from bs4 import BeautifulSoup


# takes in a letter of the alphabet and passes that to a URL listing all baseball players whose last names start with the given letter
def url_info(char):
    '''Opens the correct URLs then delegates work to other functions to get player information.'''
    req = urllib2.urlopen('http://www.baseball-reference.com/players/%s' % char)
    f = req.read()
    req.close()
    info_name = {}
    soup = BeautifulSoup(f, 'lxml')
    # lines with boldface are famous and/or active players
    boldface = soup.find_all('b')

    for line in boldface: 
        
        if line.find(text=re.compile(r'\s+\d+-201[12]')):
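            # some boldface entries may not wrap a link, so guard against a missing <a> tag
            link = line.a.get('href') if line.a else None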

            if link:
                req = urllib2.urlopen('http://www.baseball-reference.com%s' % link)
                f = req.read()
                req.close()
                soup = BeautifulSoup(f, 'lxml')
                player = player_info(soup)
                player_name = player[0]
                
                # if the name already exists, keep adding underscores to the end until it is unique
                while player_name in info_name:
                    player_name = player_name + '_'

                info_name[player_name] = (player[1], player[2], player[3])
                print player_name

    return info_name


def player_info(html):
    '''Gets the player's name, age, position eligibility, and statistics. Returns a tuple.'''
    # default values in case something goes wrong
    player_name = 'Unknown'
    age = None
    positions = []
    stats = []

    # gets fielding information only from 2012 to determine position eligibility (10+ games at the position previous year)
    fielding = html.find_all(onclick='sumSpan(this);', id='2012:standard_fielding')

    for lines in fielding:
        games = lines.find_all('td', align='right')
        line = lines.find_all('td', align='left')
        # age can also be found in this fielding section. problematic if player didn't play in 2012, so I have a more complicated check below
        age = int(games[0].string)

        if int(games[1].string) >= 10:
            positions.append(str(line[3].string))

    # for cases where the player didn't play the field and/or doesn't have a position from 2012, I refer to baseball-reference.com's position (which is wrong in several cases, but at least gives me something to work with)
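    if not positions:
        get_line = re.search(r'Positions?:([\w, ]+)\n', str(html))
        # falls back to the unknown label if the page has no position line at all
        positions = position_info(get_line.group(1)) if get_line else ['Unknown']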
    
    # from the position I can find the appropriate stats (because there are no pitcher/fielders in the game today except in 20-0 blowouts)
    if positions[0] == 'P':
        stats = pitcher_info(html)
    else:
        stats = batter_info(html)

    # player's name is in the first h1 tag
    player_name = html.h1.string

    # verifies that the player has an age
    if not age:
        tag = html.find('span', id='necro-birth')
        dob = tag['data-birth']
        age = 2013 - int(dob[:4])

    return (player_name, age, positions, stats)


# simple extraction of batting stats (24 per year), puts into list
def batter_info(html):
    '''Gets all batter stats.'''
    batting_stats = []
    hitting = html.find_all('tr', class_='full', id=re.compile(r'batting_standard\.\d+'))
    
    # a player with no batting table gets pitching stats instead
    if not hitting:
        return pitcher_info(html)

    # excludes first element of the list because it's the player's age for each year (which I have more efficient ways of getting)
    for lines in hitting:
        stats = lines.find_all('td', align='right')[1:]

        for stat in stats:
            stat = stat.string

            # sometimes there isn't a stat for something, and baseball-reference just leaves it blank
            if not stat:
                stat = 0

            batting_stats.append(float(stat))

    return batting_stats


# simple extraction of pitching stats (29 per year). Pitchers are more likely to succumb to the None value in their stats (more ratios) and can even have inf ratio values
def pitcher_info(html):
    '''Gets all statistics for pitchers.'''
    pitching_stats = []
    pitching = html.find_all('tr', class_='full', id=re.compile(r'pitching_standard\.\d+'))

    if not pitching:
        # no pitching table: return a numeric placeholder so every stat in the list is a float
        return [0.0]

    for lines in pitching:
        stats = lines.find_all('td', align='right')[1:]

        for stat in stats:
            stat = stat.string
            
            if not stat:
                stat = 0

            # it is possible to have a pitcher stat be divided by 0 in some extreme cases. 1000 seems like a punishable enough number to work with
            if stat == 'inf':
                stat = 1000
                
            pitching_stats.append(float(stat))

    return pitching_stats


# figures out what's in the baseball-reference.com position list
def position_info(pos):
    '''If the position was not available before, matches with positions listed at the top of the page.'''
    positions = []
    position_dict = {'Pitcher': 'P', 'Catcher': 'C', 'First Baseman': '1B', 'Second Baseman': '2B', 'Third Baseman': '3B', 'Shortstop': 'SS', 'Outfielder': 'OF', 'Leftfielder': 'LF', 'Centerfielder': 'CF', 'Rightfielder': 'RF', 'Designated Hitter': 'DH'}

    # loops through the dict to try and match the key with the string of positions
    for key in position_dict:

        if key in pos:
            positions.append(position_dict[key])

    # some people just don't have a real position (pinch runner, etc.) and get the unknown label
    if not positions:
        return ['Unknown']

    return positions


# passes each letter of the alphabet to url_info
def main():
    '''Gets the information for every active major league baseball player and writes it to a file as one dictionary.'''
    args = lowercase
    output = open('/Users/Swanson/Programs/Python/All_Player_Information.txt', 'w')
    info = {}

    # merges each letter's dictionary into one large dictionary
    for arg in args:
        info.update(url_info(arg))
        print 'Finished with letter %s!\n' % arg

    # writes the combined dictionary out as a single string
    output.write(str(info))
    output.close()


if __name__ == '__main__':
    main()
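
The script ends by combining the per-letter dictionaries into one and writing it to a file. Below is a rough sketch of one way that step could be done so the file can be read back in later. It is not part of the script above: the helper names (unique_key, merge_players, dump_players) and the sample players are made up for illustration, and saving as JSON is a swapped-in choice rather than what the script does. It uses only the standard library and should run on either Python 2 or 3.

```python
import json


def unique_key(name, existing):
    """Append underscores until name is not already a key in existing."""
    while name in existing:
        name += '_'
    return name


def merge_players(per_letter):
    """Merge a list of {name: (age, positions, stats)} dicts into one dict."""
    merged = {}
    for players in per_letter:
        for name, record in players.items():
            # same idea as the underscore trick in url_info, applied across letters
            merged[unique_key(name, merged)] = record
    return merged


def dump_players(players):
    """Serialize the merged dict as JSON text (tuples come back as lists)."""
    return json.dumps(players, sort_keys=True, indent=2)


# made-up sample data standing in for the results of url_info('a'), url_info('b'), ...
sample = [
    {'John Smith': (30, ['P'], [3.5, 10.0])},
    {},  # a letter with no active players
    {'John Smith': (25, ['OF'], [0.3]), 'Al Jones': (28, ['C'], [])},
]
merged = merge_players(sample)
text = dump_players(merged)
restored = json.loads(text)
```

The empty dictionary in the sample is there on purpose: merging the dictionaries directly means a letter with no active players needs no special handling, and json.loads can rebuild the data later without evaluating the file as Python code.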