Mining The Stock Market: Which Measure Is Best?

Published on July 7, 2009 by admin   ·   No Comments
  • File Title: Mining The Stock Market: Which Measure Is Best?
  • Source: http://www.ideal.ece.utexas.edu
  • Number of pages: 10
  • Short Description:

    cations; examples include stock market data (probably the … ties of stock market data. First of all, the best clustering

Sample Mining The Stock Market: Which Measure Is Best? PDF Content Inside (Please note: we sometimes have problems reading the content, since we use an automated process to stream the data.)

the sense that for each of these clusters (say C) the S&P cluster closest to C also chooses C as its closest cluster. All of the remaining clusters are not nearest neighbors of any S&P cluster.

The high quality of the clustering obtained using derivatives has very interesting implications, since the performance analysis for most of the time series data structures assumes that the sequences are smooth¹, which is clearly not the case for the derivatives. Therefore, our results suggest that new algorithmic techniques should be developed to capture the scenarios in which non-smooth time series data are present.
2. SETUP DESCRIPTION

The Data. We have used the Standard and Poor 500 index (S&P) historical stock data published at http://kumo.swcp.com/stocks/. There are approximately 500 stocks whose daily price fluctuations are recorded over the course of one year.

Each stock is a sequence of some length d, where d ≤ 252 (the latter number is the number of days in 1998 when the stock market was operational, but d can be smaller if the company is removed from the Index). We used only the day's opening price; the data also contains the closing price, and the low and high stock valuation for the day.
The data also contained the official S&P clustering information, which groups the different stocks into industry groups based on their primary business focus. This information was also used in our experiments, with the assumption that it provides us with a basis for a "ground-truth" with which we can compare and rate the results of our unsupervised clustering algorithm. We abstracted the 102 members of this S&P clustering into 62 "superclusters" by combining closely related ones together, e.g., "Automobiles" and "Auto (Parts)" or "Computers (Software)" with "Computers (Hardware)".

Feature Selection. Our feature selection approach consists of three main steps, depicted in the following picture:

[Figure 1: Feature extraction process. Representation (raw data, first derivative) → Normalization (none, global, piecewise) → Dimensionality reduction (none, aggregation, Fourier transform, PCA)]

1. Representation choice: in this step we map the original time series into a point in d-dimensional space, where d is close to the length of the sequence. We use two types of mapping: identity and first derivative (or FD for short). In the first case, the whole sequence is considered to be one 252-dimensional point. In the second case, the i-th coordinate of the derivative vector is equal to the difference between the (i+1)-th and i-th value of the sequence. Both mappings are natural in the context of time-series data.

¹ E.g., [1, 7] approximate a sequence by removing all but a few elements in the Fourier representation of a sequence; the quality of approximation in this case relies on the fact that high-frequency components of a signal have low amplitude, which is clearly not the case for the derivative sequence.

2. Normalization: in this step we decide if and how we should normalize the vectors. The standard normalization is done by computing the mean of the vector coordinates and subtracting it from all coordinates (note that in this way the mean becomes equal to 0) and then dividing the vector by its L2 norm. This step allows us to bring together stocks which follow similar trends but are valued differently, e.g., due to stock splits (note that our time series are not adjusted for splits). We also introduce a novel normalization method which we call piecewise normalization. The idea here is to split the sequence into windows, and perform normalization (as described above) separately within each window. In this way we take into account local similarities, as opposed to the global similarity captured by the normalization of the whole vector.

3. Dimensionality reduction: in this step we aim to reduce the dimensionality of the vector space while preserving (or perhaps even improving) the quality of the representation. Our first dimensionality reduction technique is based on the Principal Component Analysis (PCA). PCA maps vectors $x_n$ in a $d$-dimensional space $(x_1, \ldots, x_d)$ onto vectors $z_n$ in an $M$-dimensional space, where $M < d$. PCA finds $d$ orthonormal basis vectors $u_i$, also called principal components, and retains only a subset $M < d$ of these principal components to represent the projections of vectors $x_n$ into the lower-dimensional space. PCA exploits the technique of Singular Value Decomposition (SVD), which finds the eigenvalues and eigenvectors of the covariance matrix

$$\Sigma = \sum_n (x_n - \bar{x})(x_n - \bar{x})^T$$

where $\bar{x}$ is the mean of all vectors $x_n$. The principal components are shown to be the eigenvectors corresponding to the $M$ largest eigenvalues of $\Sigma$, and the input vectors are projected onto the eigenvectors to give the components of the transformed vectors $z_n$ in the $M$-dimensional space.

Our second technique, aggregation, is based on the assumption that local fluctuation of the stock (say, within the period of 10 days) is not as important as its global behavior, and therefore that we can replace a 10 day period by the average stock price during that time. In particular, we split the time domain into windows of length B (for B = 5, 10, 20, etc.) and replace each window by its average value. Clearly, this decreases the dimensionality by a factor of B.

Our third technique is based on the Fourier Transform (e.g., see [1] for the description). Basically, we used truncated spectral representations, i.e., we represented a time-series by only a few of its lowest frequencies.
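
The three feature-extraction steps above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the shorter trailing window when the length is not a multiple of the window size, the all-zeros result for a constant window, and the naive DFT are all assumptions made for illustration. PCA is left out because it needs an eigen-solver.

```python
import cmath
import math


def first_derivative(prices):
    """FD representation: coordinate i is prices[i + 1] - prices[i]."""
    return [b - a for a, b in zip(prices, prices[1:])]


def normalize(vec):
    """Global normalization: subtract the mean, then divide by the L2 norm."""
    mean = sum(vec) / len(vec)
    centered = [v - mean for v in vec]
    norm = math.sqrt(sum(c * c for c in centered))
    # Assumption: a constant vector (zero norm) is returned as all zeros.
    if norm == 0.0:
        return centered
    return [c / norm for c in centered]


def piecewise_normalize(vec, window):
    """Normalize each consecutive window separately.

    Assumption: a trailing window shorter than `window` is normalized as is.
    """
    out = []
    for start in range(0, len(vec), window):
        out.extend(normalize(vec[start:start + window]))
    return out


def aggregate(vec, b):
    """Replace each window of length b by its average value."""
    windows = [vec[s:s + b] for s in range(0, len(vec), b)]
    return [sum(w) / len(w) for w in windows]


def fourier_truncate(vec, k):
    """Keep the k lowest-frequency DFT coefficients (naive O(n * k) DFT)."""
    n = len(vec)
    return [
        sum(v * cmath.exp(-2j * math.pi * f * t / n) for t, v in enumerate(vec))
        for f in range(k)
    ]


# Made-up opening prices, purely for illustration.
prices = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
fd = first_derivative(prices)            # [2.0, -1.0, 4.0, -1.0, 4.0]
global_norm = normalize(prices)          # zero mean, unit L2 norm
local_norm = piecewise_normalize(prices, 3)
averaged = aggregate(prices, 2)          # [11.0, 13.0, 16.0]
spectrum = fourier_truncate(global_norm, 2)
```

Any combination of one representation, one normalization and one reduction from Figure 1 can be composed from these helpers, e.g. `aggregate(piecewise_normalize(first_derivative(prices), 3), 2)`.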

Until now we described how we compare sequences of identical length. In order to compare a pair of sequences of different lengths, we take only the relevant portion of the

